Building Honest AI Detection
AuthentiText AI is a supervised binary text classifier that distinguishes AI-generated content from human-written content. It was built as a real, end-to-end machine learning project, not an API wrapper.
🎯 Our Mission
To provide an accessible, transparent, and technically honest tool for identifying AI-generated text — helping educators, publishers, and content platforms maintain authenticity in the age of generative AI.
🔬 How It Works
Dataset Preparation
We curated more than 1,000 labeled text samples: human-written content sourced from news articles, blogs, and essays, alongside AI-generated content that exhibits distinct linguistic patterns.
TF-IDF Feature Extraction
Text is converted into 5,000-dimensional numerical feature vectors using Term Frequency–Inverse Document Frequency (TF-IDF) weighting over unigrams and bigrams. This captures word-level and phrase-level patterns that tend to differ between human and AI writing.
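As a rough illustration of this step, the sketch below builds TF-IDF features over unigrams and bigrams. It assumes scikit-learn's TfidfVectorizer, which this page does not name explicitly, and the three-sentence corpus is made up for the example rather than taken from the project's dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus for illustration only; not the project's dataset.
corpus = [
    "The council voted on the new budget late last night.",
    "In conclusion, it is important to note that technology is evolving.",
    "My grandmother's kitchen always smelled of burnt toast.",
]

# max_features caps the vocabulary at 5,000 terms (assumed to match the
# 5,000 dimensions described above); ngram_range=(1, 2) extracts both
# single words (unigrams) and two-word phrases (bigrams).
vectorizer = TfidfVectorizer(max_features=5000, ngram_range=(1, 2))
features = vectorizer.fit_transform(corpus)  # sparse matrix, one row per text

# vocabulary_ maps each learned term to its column index.
bigrams = [term for term in vectorizer.vocabulary_ if " " in term]

print(features.shape)
print(sorted(bigrams)[:5])
```

On a corpus this small the matrix has far fewer than 5,000 columns; the cap only takes effect once the training data contains more distinct unigrams and bigrams than that.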
Logistic Regression Classification
A trained logistic regression model classifies the input text and outputs a probability score indicating how likely the text is to be AI-generated versus human-written.
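A minimal sketch of this step, assuming scikit-learn and a label convention of 1 for AI-generated and 0 for human-written (both are assumptions, as are the toy training sentences, which stand in for the real dataset):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data for illustration only.
# Assumed label convention: 1 = AI-generated, 0 = human-written.
texts = [
    "It is important to note that there are several key factors to consider.",
    "In conclusion, technology plays a vital role in modern society.",
    "Overall, this approach offers numerous benefits and advantages.",
    "Missed the bus again, so I walked home in the rain and sulked.",
    "Grandma burnt the toast and blamed the dog, as usual.",
    "We argued about the referee for an hour after the match.",
]
labels = [1, 1, 1, 0, 0, 0]

# Chain the vectorizer and classifier so raw text goes in and
# class probabilities come out.
model = make_pipeline(
    TfidfVectorizer(max_features=5000, ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

# predict_proba returns one row per input; columns follow model.classes_.
probabilities = model.predict_proba(
    ["It is important to note that results may vary."]
)[0]
ai_index = list(model.classes_).index(1)
ai_probability = float(probabilities[ai_index])
human_probability = 1.0 - ai_probability

print(f"AI-generated: {ai_probability:.2f}, human-written: {human_probability:.2f}")
```

The two probabilities always sum to 1, so reporting either one is enough. Six sentences are far too few to learn anything meaningful, so the printed numbers here show only the shape of the output.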
REST API Delivery
The model is served via a Flask REST API, enabling real-time predictions from any frontend client through a simple POST request.
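The sketch below shows one way such an endpoint could look. The route name /predict, the JSON field "text", and the response keys are hypothetical, chosen for the example rather than taken from the project's actual API. The tiny inline model is a stand-in for the real trained one.

```python
from flask import Flask, jsonify, request
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Stand-in model trained on toy data; a real service would load the
# trained model from disk instead. Assumed labels: 1 = AI, 0 = human.
TOY_TEXTS = [
    "It is important to note that there are several key factors.",
    "In conclusion, technology plays a vital role in society.",
    "Missed the bus again, so I walked home in the rain.",
    "Grandma burnt the toast and blamed the dog, as usual.",
]
TOY_LABELS = [1, 1, 0, 0]

model = make_pipeline(
    TfidfVectorizer(max_features=5000, ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(TOY_TEXTS, TOY_LABELS)
AI_INDEX = list(model.classes_).index(1)

app = Flask(__name__)


@app.route("/predict", methods=["POST"])  # hypothetical route name
def predict():
    # silent=True yields None instead of raising on a malformed JSON body.
    payload = request.get_json(silent=True)
    text = payload.get("text") if isinstance(payload, dict) else None
    if not isinstance(text, str) or not text.strip():
        return jsonify({"error": "Field 'text' must be a non-empty string."}), 400

    ai_probability = float(model.predict_proba([text])[0][AI_INDEX])
    return jsonify(
        {
            "ai_probability": round(ai_probability, 4),
            "human_probability": round(1.0 - ai_probability, 4),
        }
    )


# Exercise the endpoint in-process with Flask's test client,
# so no server needs to be running.
with app.test_client() as client:
    response = client.post("/predict", json={"text": "It is important to note this."})
    result = response.get_json()
    print(response.status_code, result)
```

A frontend would send the same POST request over HTTP once the app is running behind a server. Rejecting empty or malformed input with a 400 response keeps the classifier from scoring text it cannot meaningfully evaluate.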
🛠 Tech Stack
⚠️ Limitations
- This is a probabilistic model — results are not definitive proof of AI or human authorship.
- Performance may vary with text from domains not well-represented in the training data.
- Very short texts (under 50 words) do not provide sufficient signal for reliable classification.
- Paraphrased or edited AI text may evade detection.
- The model currently works best with English-language text.
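The short-text limitation above could be enforced before a score is shown. This is a sketch only: the function names and message wording are hypothetical, and the 50-word threshold is taken from the list above.

```python
MIN_WORDS = 50  # threshold from the limitations list above


def has_enough_text(text: str, min_words: int = MIN_WORDS) -> bool:
    """Return True when the text has at least min_words whitespace-separated words."""
    return len(text.split()) >= min_words


def describe_result(text: str, ai_probability: float) -> str:
    """Turn a raw probability into a hedged, user-facing message."""
    if not has_enough_text(text):
        return "Text is too short for a reliable estimate."
    return (
        f"Estimated probability of AI authorship: {ai_probability:.0%} "
        "(an estimate, not proof of authorship)."
    )


print(describe_result("Too short.", 0.9))
print(describe_result("word " * 60, 0.82))
```

Splitting on whitespace is a coarse word count, but it is enough to keep very short inputs from receiving a score that looks more confident than it is.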
Built with ❤️ as a real machine learning project — no API wrappers, no fake claims.