AuthentiText AI

About Us

Building Honest AI Detection

AuthentiText AI is a binary text classification system, trained with supervised machine learning, that distinguishes AI-generated from human-written content. It was built as a real, end-to-end ML project, not an API wrapper.

🎯 Our Mission

To provide an accessible, transparent, and technically honest tool for identifying AI-generated text — helping educators, publishers, and content platforms maintain authenticity in the age of generative AI.

🔬 How It Works

1. Dataset Preparation

We curated more than 1,000 labeled text samples: human-written content sourced from news articles, blogs, and essays, alongside AI-generated content with distinct linguistic patterns.
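As a rough illustration of this step, the sketch below collects labeled samples in a Pandas DataFrame, drops empty and duplicate texts, and makes a stratified train/test split. The column names `text` and `label`, the 1 = AI / 0 = human encoding, the 25% test fraction, and the example sentences are hypothetical placeholders, not the project's actual data or settings.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy rows standing in for the real labeled corpus.
# Assumed encoding: label 1 = AI-generated, 0 = human-written.
df = pd.DataFrame({
    "text": [
        "I missed the bus again, so I walked and found a great bakery.",
        "Our town council argued for three hours about parking meters.",
        "My grandmother's recipe never measures anything exactly.",
        "The match was postponed after heavy rain flooded the pitch.",
        "In conclusion, it is important to note that technology offers many benefits.",
        "Furthermore, this approach provides a comprehensive overview of the topic.",
        "Overall, these factors play a crucial role in shaping outcomes.",
        "Additionally, it is essential to consider various perspectives.",
    ],
    "label": [0, 0, 0, 0, 1, 1, 1, 1],
})

# Basic cleaning: trim whitespace, drop empty and duplicate texts.
df["text"] = df["text"].str.strip()
df = df[df["text"] != ""].drop_duplicates(subset="text")

# Stratified split keeps the human/AI ratio the same in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.25, stratify=df["label"], random_state=42
)
print(len(X_train), len(X_test))
```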

2. TF-IDF Feature Extraction

Text is converted into 5,000-dimensional numerical feature vectors using Term Frequency–Inverse Document Frequency (TF-IDF) with unigram and bigram analysis. This captures word-level and phrase-level patterns that differ between human and AI writing.

3. Logistic Regression Classification

A trained Logistic Regression model classifies the input text and outputs a probability score indicating how likely the text is to be AI-generated; because the task is binary, the probability that it is human-written is the complement of that score.
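The sketch below chains the vectorizer and a `LogisticRegression` classifier in a scikit-learn pipeline and reads the AI-generated probability from `predict_proba`. The toy training sentences, the `classify` helper, its output keys, and the 0.5 decision threshold are illustrative assumptions rather than the project's code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in data; assumed encoding: 1 = AI-generated, 0 = human-written.
texts = [
    "I missed the bus again, so I walked and found a great bakery.",
    "The match was postponed after heavy rain flooded the pitch.",
    "In conclusion, it is important to note that technology offers many benefits.",
    "Furthermore, this approach provides a comprehensive overview of the topic.",
]
labels = [0, 0, 1, 1]

# Chaining vectorizer and classifier keeps training and inference consistent.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), max_features=5000),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)


def classify(text: str, threshold: float = 0.5) -> dict:
    """Return both class probabilities and a label (hypothetical helper)."""
    # predict_proba columns follow model.classes_, so look up the AI column explicitly.
    proba = model.predict_proba([text])[0]
    p_ai = float(proba[list(model.classes_).index(1)])
    return {
        "ai_probability": p_ai,
        "human_probability": 1.0 - p_ai,
        "label": "AI-generated" if p_ai >= threshold else "Human-written",
    }


print(classify("Furthermore, it is important to note these benefits."))
```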

4. REST API Delivery

The model is served via a Flask REST API, enabling real-time predictions from any frontend client through a simple POST request.
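A minimal Flask sketch of such an endpoint, assuming a `/predict` route that accepts JSON of the form `{"text": "..."}` and returns both class probabilities. The route name, payload shape, `create_app` factory, and the stand-in model are hypothetical; the demo calls the endpoint through Flask's built-in test client instead of starting a server.

```python
from flask import Flask, jsonify, request
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def create_app(model) -> Flask:
    """Wrap an already-fitted text pipeline (vectorizer + classifier) in a Flask app."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])  # hypothetical route name
    def predict():
        # silent=True returns None instead of raising on a missing or invalid JSON body.
        payload = request.get_json(silent=True)
        text = payload.get("text") if isinstance(payload, dict) else None
        if not isinstance(text, str) or not text.strip():
            return jsonify(error="Field 'text' must be a non-empty string."), 400

        proba = model.predict_proba([text])[0]
        p_ai = float(proba[list(model.classes_).index(1)])  # 1 = AI-generated (assumed)
        return jsonify(ai_probability=p_ai, human_probability=1.0 - p_ai)

    return app


# Demo with a tiny stand-in model; a real deployment would load the trained pipeline.
demo_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
demo_model.fit(["i walked to the shop after lunch", "furthermore it is important to note"], [0, 1])

client = create_app(demo_model).test_client()
response = client.post("/predict", json={"text": "It is important to note this."})
print(response.status_code, response.get_json())
```

In production, a WSGI server such as Gunicorn (listed in the stack) would typically serve the app instead of Flask's development server.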

🛠 Tech Stack

Python, scikit-learn, TF-IDF Vectorizer, Logistic Regression, Flask, Pandas, Joblib, Gunicorn, HTML / CSS / JS, Tailwind CSS
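Joblib is commonly used to save a fitted scikit-learn object to disk and reload it when the API starts. Assuming the project persists the vectorizer and classifier together as one pipeline (not confirmed here), a save/load round trip might look like this; the file name and toy data are placeholders.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in training data (assumed encoding: 1 = AI-generated, 0 = human-written).
pipeline = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
pipeline.fit(["i walked to the shop after lunch", "furthermore it is important to note"], [0, 1])

# Saving vectorizer and classifier as one object keeps vocabulary and weights in sync.
path = os.path.join(tempfile.mkdtemp(), "model.joblib")  # hypothetical file name
joblib.dump(pipeline, path)

# At serving time (e.g. when the API process starts), load it back once.
restored = joblib.load(path)
sample = ["it is important to note"]
print(restored.predict_proba(sample))
```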

⚠️ Limitations

  • This is a probabilistic model — results are not definitive proof of AI or human authorship.
  • Performance may vary with text from domains not well-represented in the training data.
  • Very short texts (under 50 words) do not provide sufficient signal for reliable classification.
  • Paraphrased or edited AI text may evade detection.
  • The model currently works best with English-language text.
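Given the 50-word limitation above, an application could decline to score shorter inputs instead of returning an unreliable probability. A minimal sketch, assuming simple whitespace-based word counting; the `check_length` helper is hypothetical and the project's actual input handling is not described here.

```python
MIN_WORDS = 50  # threshold taken from the limitation above


def check_length(text: str, min_words: int = MIN_WORDS) -> tuple[bool, int]:
    """Return (long_enough, word_count), counting whitespace-separated tokens."""
    word_count = len(text.split())
    return word_count >= min_words, word_count


ok, n = check_length("Too short to judge.")
if not ok:
    print(f"Only {n} words; at least {MIN_WORDS} are needed for a reliable score.")
```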

Built with ❤️ as a real machine learning project — no API wrappers, no fake claims.