About Us

Building Honest AI Detection

AutheText-AI is a binary text classifier, trained with supervised machine learning, that distinguishes AI-generated content from human-written content. It was built as a real, end-to-end ML project, not an API wrapper.

Our Mission

To provide an accessible, transparent, and technically honest tool for identifying AI-generated text — helping educators, publishers, and content platforms maintain authenticity in the age of generative AI.

How It Works

Dataset Preparation

We curated 1,000+ labeled text samples — human-written content sourced from news articles, blogs, and essays, alongside AI-generated content with distinct linguistic patterns.
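As a rough illustration of how a labeled dataset like this could be held and split for evaluation, here is a minimal sketch. The `Sample` type, the label encoding (1 = AI-generated, 0 = human-written), the 80/20 split, and the placeholder texts are all assumptions made for the example, not details of the actual AutheText-AI dataset.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    text: str
    label: int  # assumed encoding: 1 = AI-generated, 0 = human-written


def stratified_split(samples, test_fraction=0.2, seed=0):
    """Split samples into train/test sets, keeping the label ratio similar in both."""
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted({s.label for s in samples}):
        group = [s for s in samples if s.label == label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Placeholder data standing in for the real corpus.
samples = [Sample(f"human text {i}", 0) for i in range(10)] + [
    Sample(f"ai text {i}", 1) for i in range(10)
]
train, test = stratified_split(samples)
print(len(train), len(test))
```

Splitting per label keeps both classes represented in the held-out set, which matters when the dataset is only around a thousand samples.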

TF-IDF Feature Extraction

Text is converted into 5,000-dimensional numerical feature vectors using Term Frequency–Inverse Document Frequency (TF-IDF) with unigram and bigram analysis. This captures word-level and phrase-level patterns that differ between human and AI writing.

Logistic Regression Classification

A trained Logistic Regression model classifies the input text and outputs a probability score indicating how likely the text is to be AI-generated rather than human-written.
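A minimal sketch of the classification step, assuming a scikit-learn pipeline that chains the TF-IDF vectorizer with `LogisticRegression`. The six training sentences, the label encoding (1 = AI-generated, 0 = human-written), and `max_iter=1000` are illustrative assumptions; a dataset this small says nothing about real-world accuracy.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny placeholder dataset; labels assume 1 = AI-generated, 0 = human-written.
texts = [
    "In conclusion, it is important to note that technology is evolving.",
    "Furthermore, it is important to note that many factors are involved.",
    "Overall, it is important to consider the various aspects involved.",
    "ugh, missed the bus again and my coffee went cold",
    "my grandmother's recipe never measured anything, just vibes",
    "we got lost twice and the dog hated the ferry",
]
labels = [1, 1, 1, 0, 0, 0]

model = make_pipeline(
    TfidfVectorizer(max_features=5000, ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

# predict_proba returns one column per class, ordered as in model.classes_.
proba = model.predict_proba(["It is important to note that results may vary."])[0]
p_human, p_ai = proba
print(f"P(human) = {p_human:.2f}, P(AI) = {p_ai:.2f}")
```

The two probabilities always sum to 1, so a single score for "AI-generated" is enough to report to the user.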

Browser-Based Delivery

The trained model is converted into a form that runs entirely within the client's browser, enabling real-time, private predictions without server round trips.
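To show the kind of arithmetic a client-side port has to reproduce, here is a dependency-free reference sketch in Python. It assumes the trained parameters are exported as JSON with hypothetical keys (`vocabulary`, `idf`, `coef`, `intercept`) and made-up values, and it mirrors scikit-learn's default TF-IDF weighting (raw term count times IDF, then L2 normalization) followed by a sigmoid. The actual export format and browser code are not described in this page.

```python
import json
import math
import re
from collections import Counter

# Hypothetical exported parameters; real values would come from the trained model.
PARAMS_JSON = """
{
  "vocabulary": {"important": 0, "to note": 1, "coffee": 2},
  "idf": [1.2, 1.5, 1.3],
  "coef": [0.9, 1.4, -1.1],
  "intercept": -0.1
}
"""

TOKEN_RE = re.compile(r"\b\w\w+\b")  # words of two or more characters


def ngrams(text):
    """Lowercased unigrams followed by bigrams of adjacent tokens."""
    tokens = TOKEN_RE.findall(text.lower())
    return tokens + [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def predict_ai_probability(text, params):
    """Return P(AI-generated) from TF-IDF features and logistic weights."""
    vocab = params["vocabulary"]
    counts = Counter(t for t in ngrams(text) if t in vocab)
    # TF-IDF weight per known term: raw count times IDF.
    weights = {t: c * params["idf"][vocab[t]] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0  # L2 norm
    logit = params["intercept"] + sum(
        (w / norm) * params["coef"][vocab[t]] for t, w in weights.items()
    )
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid


params = json.loads(PARAMS_JSON)
print(predict_ai_probability("It is important to note that coffee helps.", params))
```

Because inference is only a sparse dot product and a sigmoid, the same logic is straightforward to port to plain JavaScript and needs no server call.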

Tech Stack

Browser ML
Vanilla JS
scikit-learn (Training)
TF-IDF Vectorizer
Logistic Regression
HTML / CSS
Tailwind CSS

Limitations

This is a probabilistic model — results are not definitive proof of AI or human authorship.
Performance may vary with text from domains not well-represented in the training data.
Very short texts (under 50 words) do not provide sufficient signal for reliable classification.
Paraphrased or edited AI text may evade detection.
The model currently works best with English-language text.
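
The short-text limitation above suggests abstaining rather than scoring when the input is too brief. A minimal sketch of such a guard follows; the function names, the whitespace-based word count, and the choice to return `None` are assumptions for illustration, not a description of how AutheText-AI handles short input.

```python
MIN_WORDS = 50  # threshold taken from the limitation stated above


def classify_or_abstain(text, predict_fn, min_words=MIN_WORDS):
    """Return a probability, or None when the text is too short to score reliably."""
    if len(text.split()) < min_words:
        return None
    return predict_fn(text)


# Stand-in predictor used only for this example.
def dummy_predict(text):
    return 0.5


print(classify_or_abstain("Too short to judge.", dummy_predict))
```

Returning an explicit "no result" keeps a low-signal score from being read as evidence in either direction.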

Built with ❤️ as a real machine learning project — no API wrappers, no fake claims.