AI detection tools are everywhere now — Turnitin, GPTZero, Originality.ai, and dozens more. But how do they actually work? What do they measure? And how accurate are they really? This article cuts through the hype with a clear, honest look at the technology.
Most AI detection tools rely on two core statistical signals derived from language model research.
Perplexity measures how "surprising" a piece of text is to a language model. When an AI writes text, it naturally chooses high-probability, low-surprise word sequences — the words that statistically fit best in context. This produces low-perplexity text. Human writers, by contrast, make unexpected word choices, use unusual structures, and take creative risks — producing higher-perplexity text overall.
Detection tools measure this: low perplexity suggests AI authorship; high perplexity suggests human authorship.
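To make the idea concrete, here is a minimal sketch of perplexity scoring in Python. It uses the open-source Hugging Face `transformers` library and the small GPT-2 model purely for illustration; commercial detectors rely on their own (often proprietary) models and calibration, and this is not how any particular tool is implemented.

```python
# Minimal perplexity sketch: how "surprising" a text is to a language model.
# Illustrative only -- uses GPT-2 via Hugging Face transformers, not any real detector.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Return the model's perplexity for `text` (lower = more predictable)."""
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels == input_ids, the model returns the average
        # cross-entropy loss over the predicted tokens.
        loss = model(input_ids, labels=input_ids).loss
    return torch.exp(loss).item()

ai_like = "The sun rises in the east and sets in the west every single day."
human_like = "Dawn cracked sideways over the rooftops, rude and orange as a warning."

print(perplexity(ai_like))     # typically lower: predictable phrasing
print(perplexity(human_like))  # typically higher: unexpected word choices
```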
Burstiness refers to variation in sentence length and complexity. Human writing is "bursty" — we write long complex sentences, then short punchy ones, then medium ones. AI writing tends to be uniform: each sentence is roughly the same length and complexity, creating low burstiness. Detectors measure this variation (or lack of it) as an additional signal.
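There is no single agreed-upon formula for burstiness. The sketch below scores it as the coefficient of variation of sentence lengths (standard deviation divided by the mean), which is one plausible way to capture the idea; real detectors may compute it differently and also factor in sentence complexity.

```python
# Minimal burstiness sketch: variation in sentence length across a text.
# One plausible formulation (coefficient of variation), not any tool's actual method.
import re
import statistics

def burstiness(text: str) -> float:
    """Higher values = more variation in sentence length (more 'human-like')."""
    # Naive sentence split on ., !, or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    mean = statistics.mean(lengths)
    return statistics.stdev(lengths) / mean if mean else 0.0

uniform = ("The report covers the results. The results were positive. "
           "The team was pleased with them. The next phase starts soon.")
varied = ("It worked. After three failed prototypes, two budget reviews, and one "
          "very long week, the sensor finally returned clean data. Everyone cheered.")

print(burstiness(uniform))  # low: every sentence is about the same length
print(burstiness(varied))   # high: short and long sentences mixed together
```

In practice, detectors combine signals like these (plus others) and calibrate thresholds on labeled examples rather than relying on either number alone.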
| Tool | Primary audience | Method | Reported accuracy |
|---|---|---|---|
| GPTZero | Educators | Perplexity + burstiness scoring | ~85% on clean AI text |
| Turnitin AI | Universities | Proprietary ML + similarity | ~98% claimed, debated |
| Originality.ai | Publishers, SEO | Fine-tuned detection model | ~94% on GPT-4 content |
| Copyleaks | Enterprises | Multi-model detection | ~99% claimed |
| ZeroGPT | General public | Text analysis heuristics | Inconsistent in testing |
Note: Accuracy figures are self-reported or from limited testing. Real-world performance on edited or humanized text is significantly lower for all tools.
This is the most important and least-discussed issue with AI detection: false positives are common and consequential.
Research has consistently shown that certain types of human writing score very high on AI detection tools — not because they were AI-generated, but because they happen to share stylistic traits with AI output: formulaic structure, conventional phrasing, and a limited vocabulary range.
A 2023 study found that GPTZero flagged over 50% of essays written by non-native English speakers as AI-generated. This is a serious problem when detection results are used to make academic misconduct decisions.
"AI detectors don't detect AI — they detect writing that resembles AI. That's a crucial distinction, and most institutions aren't making it clearly enough."
Several factors consistently make AI-generated text harder for detection tools to catch: human editing and paraphrasing, mixing AI-drafted passages with original writing, and prompting the model toward more varied, specific phrasing.
AI detection is fundamentally an arms race that detection tools are losing. As language models improve, their output naturally becomes harder to distinguish from human writing. Each new model generation produces more varied, nuanced text — and the detection tools trained on older model outputs are less accurate on new ones.
OpenAI released and then quietly retired its own AI classifier in 2023, noting it was "not reliable enough." Anthropic has not released a detection tool, citing similar concerns about accuracy.
If you're a writer using AI tools for any purpose — whether that's drafting, editing, research, or brainstorming — the most important things to know are that detection scores are probabilistic signals rather than proof, that false positives are real and fall hardest on certain writers, and that accuracy drops sharply once text has been substantially edited or revised.
Humanizor rewrites AI text to have natural variation, specific vocabulary, and human rhythm. Free, no sign-up required.
✦ Try Humanizor free