AutheText-AI

About Us

Building Honest AI Detection

AutheText-AI is a supervised binary text classifier that distinguishes AI-generated from human-written content. It was built as a real, end-to-end ML project, not an API wrapper.

Our Mission

To provide an accessible, transparent, and technically honest tool for identifying AI-generated text — helping educators, publishers, and content platforms maintain authenticity in the age of generative AI.

How It Works

1. Dataset Preparation

We curated 1,000+ labeled text samples — human-written content sourced from news articles, blogs, and essays, alongside AI-generated content with distinct linguistic patterns.

2. TF-IDF Feature Extraction

Text is converted into 5,000-dimensional numerical feature vectors using Term Frequency–Inverse Document Frequency (TF-IDF) with unigram and bigram analysis. This captures word-level and phrase-level patterns that differ between human and AI writing.
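The feature extraction step can be sketched in dependency-free Python. This is an illustration of the idea only, not the project's code: the project reports using scikit-learn for training, and the function names here (ngrams, fit_tfidf, transform) are hypothetical. The sketch assumes a smoothed IDF and L2 normalisation, which match scikit-learn's documented defaults, and a simplified tokenizer.

```python
import math
import re
from collections import Counter


def ngrams(text):
    """Lowercase the text and return its unigrams followed by its bigrams."""
    words = re.findall(r"\b\w\w+\b", text.lower())
    bigrams = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + bigrams


def fit_tfidf(docs, max_features=5000):
    """Build a vocabulary capped at max_features terms plus smoothed IDF weights."""
    term_counts = Counter()  # total occurrences of each term in the corpus
    doc_freq = Counter()     # number of documents containing each term
    for doc in docs:
        terms = ngrams(doc)
        term_counts.update(terms)
        doc_freq.update(set(terms))
    # Keep the most frequent terms; ties are broken alphabetically.
    kept = sorted(term_counts, key=lambda t: (-term_counts[t], t))[:max_features]
    vocab = {term: i for i, term in enumerate(sorted(kept))}
    n = len(docs)
    idf = [0.0] * len(vocab)
    for term, i in vocab.items():
        # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
        idf[i] = math.log((1 + n) / (1 + doc_freq[term])) + 1.0
    return vocab, idf


def transform(text, vocab, idf):
    """Return an L2-normalised TF-IDF vector for a single text."""
    vec = [0.0] * len(vocab)
    for term, count in Counter(ngrams(text)).items():
        if term in vocab:
            vec[vocab[term]] = count * idf[vocab[term]]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


# Toy corpus for illustration, not the project's training data.
docs = ["the cat sat on the mat", "the dog sat on the log"]
vocab, idf = fit_tfidf(docs)
vector = transform("the cat sat", vocab, idf)
```

Terms that appear in fewer documents, such as "cat" in the toy corpus, receive a larger IDF weight than terms that appear everywhere, such as "the".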

3. Logistic Regression Classification

A trained Logistic Regression model classifies the input text and outputs a probability score indicating how likely the text is to be AI-generated rather than human-written.
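As a minimal sketch of how a logistic regression model turns a feature vector into a probability, the weighted sum of the features is passed through the sigmoid function. The function name and the toy weights below are hypothetical, and the sketch assumes the positive class is "AI-generated", which the source does not state.

```python
import math


def predict_proba_ai(features, weights, bias):
    """Return P(AI-generated) for one feature vector under logistic regression."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    # Numerically stable sigmoid: avoid exponentiating a large positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical toy parameters, not the trained model's values.
weights = [1.5, -2.0, 0.5]
bias = -0.1
p_ai = predict_proba_ai([0.6, 0.0, 0.8], weights, bias)
p_human = 1.0 - p_ai  # the two class probabilities sum to 1
```

A score near 0.5 means the model has little evidence either way, which is one reason the output should be read as a likelihood and not as a verdict.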

4. Browser-Based Delivery

The trained model is packaged to run entirely within the client's browser, enabling private, real-time predictions without server round trips.
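The source does not say how the model is packaged for the browser. One plausible approach, shown here purely as an assumption, is to export the fitted parameters to JSON on the training side so that a client-side script can load them and repeat the TF-IDF and sigmoid arithmetic locally. The function name and the payload keys are hypothetical.

```python
import json


def export_model(vocab, idf, coef, intercept):
    """Serialise fitted parameters into a compact JSON string for client-side use."""
    if not (len(vocab) == len(idf) == len(coef)):
        raise ValueError("vocab, idf and coef must have the same length")
    payload = {
        "vocab": vocab,                        # term -> column index
        "idf": [round(v, 6) for v in idf],     # one IDF weight per column
        "coef": [round(v, 6) for v in coef],   # one model weight per column
        "intercept": round(intercept, 6),
    }
    return json.dumps(payload, separators=(",", ":"))


# Hypothetical two-term model, for illustration only.
model_json = export_model({"cat": 0, "the": 1}, [1.405465, 1.0], [0.8, -0.3], 0.05)
```

Because only a vocabulary and a few arrays of numbers are needed for inference, the input text never has to leave the user's device.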

Tech Stack

  • Browser ML
  • Vanilla JS
  • scikit-learn (Training)
  • TF-IDF Vectorizer
  • Logistic Regression
  • HTML / CSS
  • Tailwind CSS

Limitations

  • This is a probabilistic model — results are not definitive proof of AI or human authorship.
  • Performance may vary with text from domains not well-represented in the training data.
  • Very short texts (under 50 words) do not provide sufficient signal for reliable classification.
  • Paraphrased or edited AI text may evade detection.
  • The model currently works best with English-language text.
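The short-text limitation above can be enforced with a simple guard before classification. This is a sketch under the assumption that words are counted by splitting on whitespace; the function name is hypothetical and the source does not describe how the tool handles short inputs.

```python
MIN_WORDS = 50  # threshold taken from the short-text limitation above


def has_enough_text(text, min_words=MIN_WORDS):
    """Return True when the text has at least min_words whitespace-separated words."""
    return len(text.split()) >= min_words
```

A caller could skip the prediction, or show a low-confidence warning, whenever this check returns False.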

Built with ❤️ as a real machine learning project — no API wrappers, no fake claims.