May 29, 2026·8 min read·DATA

Where AI engines get their sources — 1.2M queries across ChatGPT, Claude, Gemini & Perplexity

We ran 1.2 million tracked queries through ChatGPT, Claude, Gemini, and Perplexity over Q1 2026. The citation patterns aren’t what SEO playbooks predict — and the four engines disagree more than you’d think.

Ben Seidel

CO-FOUNDER · CITEABLE

Last updated: 2026-05-29

TL;DR

Across 1.2 million tracked queries on ChatGPT, Claude, Gemini, and Perplexity in Q1 2026, 83% of cited brands ranked outside Google’s top ten for the equivalent query. The four engines diverge sharply: Perplexity cites editorial sources 2.4x more often than ChatGPT, ChatGPT pulls from Reddit at four times the rate of Gemini, and Claude cites long-tail domains more than the other three combined. Tracking top-10 SERP rank as a proxy for AI visibility is increasingly the wrong signal.

Why we ran this study

Most public data on AI citations covers one engine at a time. BrightEdge anchored the “83% beyond top-10” finding on Google AI Overviews. Profound’s 250M-response analysis focuses on ChatGPT. Peec’s 30M-source benchmark mixes engines without separating them.

Marketers ask a more specific question: where does each engine source from, and what should I do differently per engine? So we ran 1.2M tracked queries across four engines over Q1 2026, daily, on a fixed prompt set spanning 38 verticals. We tracked which brands got mentioned, which URLs got cited inline, and how those citations drifted from week to week.

How often do AI engines cite domains outside Google’s top ten?

83% of brand citations across our dataset came from pages that did not rank in Google’s top 10 for the matched query.

Some context for that number. According to BrightEdge, 2025, 86% of citations powering Google AI Overviews come from beyond the top 10. Our 83% is pooled across all four engines. According to Ahrefs, 2026, 38% of AI Overview citations do come from the top 10, which leaves 62% outside it. That is a smaller share than BrightEdge’s, but the two findings point the same way: most citations come from pages outside the top 10.

The practical takeaway: a page at position 15 with a quotable, specific paragraph beats a page at position 3 with generic prose on every engine we tested.
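A minimal sketch of how an outside-the-top-10 share can be computed once each cited URL has been matched against Google's results for the same query. This is an illustration, not the pipeline used in the study: the record layout (`query`, `url`, `google_rank`), the function name `share_outside_top_n`, and the sample rows are all assumptions, and `google_rank` is taken to be `None` when the cited page does not appear in the tracked results.

```python
from typing import Optional, TypedDict


class Citation(TypedDict):
    query: str
    url: str
    google_rank: Optional[int]  # None = not found in the tracked Google results


def share_outside_top_n(citations: list[Citation], n: int = 10) -> float:
    """Fraction of citations whose page ranks below position n or is unranked."""
    if not citations:
        raise ValueError("no citations to score")
    outside = sum(
        1 for c in citations
        if c["google_rank"] is None or c["google_rank"] > n
    )
    return outside / len(citations)


# Hypothetical sample rows, for illustration only.
sample: list[Citation] = [
    {"query": "best crm", "url": "https://example.com/a", "google_rank": 3},
    {"query": "best crm", "url": "https://example.org/b", "google_rank": 15},
    {"query": "best crm", "url": "https://example.net/c", "google_rank": None},
    {"query": "crm pricing", "url": "https://example.com/d", "google_rank": 42},
]
outside_share = share_outside_top_n(sample)
print(f"{outside_share:.0%} of sample citations fall outside the top 10")
```

Counting unranked pages as "outside" is a design choice in this sketch. Dropping them instead would lower the share, so the rule should be stated alongside any figure it produces.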

Where does ChatGPT get its sources?

ChatGPT’s top three source types are Reddit (28.4%), editorial publications (21.2%), and Wikipedia (12.6%) — a citation mix unlike any other engine in the study.

ChatGPT browses via Bing under the hood, so any page Bing has not indexed is invisible to it. That single factor explains roughly 14% of “ranks on Google, never cited by ChatGPT” cases in our data.

A second pattern: ChatGPT cites the same domain across multiple queries more aggressively than the other three engines. A domain that earns one citation in our tracked set earns 4.6 more, on average, within the next 90 days. Citation concentration on ChatGPT follows a Pareto curve — the top 15 domains account for 51% of all citations.
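The concentration figure is a top-k share: citations going to the k most-cited domains, divided by all citations. The sketch below shows that calculation under simple assumptions. The input is a flat list with one domain string per citation, and `top_k_share` and the sample domains are hypothetical names for illustration, not part of the study's tooling.

```python
from collections import Counter


def top_k_share(cited_domains: list[str], k: int) -> float:
    """Share of all citations that go to the k most frequently cited domains."""
    counts = Counter(cited_domains)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no citations to score")
    top = sum(count for _, count in counts.most_common(k))
    return top / total


# Hypothetical sample: one entry per citation.
sample_domains = (
    ["reddit.com"] * 5
    + ["wikipedia.org"] * 3
    + ["example.com", "example.org"]
)
share = top_k_share(sample_domains, k=2)
print(f"Top 2 domains account for {share:.0%} of sample citations")
```

Running the same function with `k=15` on one engine's citation log gives a number directly comparable to the 51% reported above.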

Where does Perplexity source from?

Perplexity cites editorial publications and primary research 2.4x more often than ChatGPT does.

This is the engine to optimize for if you publish research-heavy newsroom content or in-depth thought leadership. Across our dataset, Perplexity citations broke down as: editorial publications and primary research 38.7%, Reddit and community forums 22.4%, owned company domains 14.1%, Wikipedia and reference 9.8%, everything else 15.0%.

Perplexity also rewards recency more sharply than the other engines. A post updated within the last 90 days is cited 3.1x more often than the same post left untouched for 12 months.

Where does Gemini source from?

Gemini sources almost exclusively from Google’s index, which makes it the most rank-correlated of the four engines.

Gemini and AI Overviews share infrastructure. Practically, that means: if you already rank well on Google for a query, you have the best shot at citation on Gemini. If you do not rank well, the lift required is large.

Gemini also weights structured data more heavily than the other engines. Pages combining Article and FAQPage schema earned citations on Gemini at 1.8x the rate of equivalent pages without schema, controlling for rank.
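For readers who have not combined the two schema types before, here is a minimal sketch that generates JSON-LD with one `Article` node and one `FAQPage` node. The types and properties (`headline`, `dateModified`, `mainEntity`, `acceptedAnswer`) come from schema.org. The helper name `build_json_ld` and the choice of which properties to include are assumptions, and the study does not specify which properties the cited pages used.

```python
import json


def build_json_ld(headline: str, date_modified: str,
                  faqs: list[tuple[str, str]]) -> str:
    """Return a JSON-LD string with one Article node and one FAQPage node."""
    article = {
        "@type": "Article",
        "headline": headline,
        "dateModified": date_modified,  # ISO 8601 date
    }
    faq_page = {
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in faqs
        ],
    }
    return json.dumps(
        {"@context": "https://schema.org", "@graph": [article, faq_page]},
        indent=2,
    )


markup = build_json_ld(
    headline="Where AI engines get their sources",
    date_modified="2026-05-29",
    faqs=[(
        "How stable are AI citations over time?",
        "Only 31% of brands cited in week 1 of a tracked query were still "
        "cited in week 12 of the same query.",
    )],
)
# Embed the result in the page inside <script type="application/ld+json">.
print(markup)
```

Generating the markup from the same source as the visible page copy keeps the FAQ text in the schema identical to the text on the page.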

Where does Claude source from?

Claude cites long-tail domains (sites ranking in positions 11 to 50 on Google) more often than the other three engines combined.

Claude is the engine where a #25-on-Google page can outperform a #3 page, if the #25 page directly answers the query in the first 60 words. We saw multiple cases of niche industry blogs with low domain rating (DR) and low traffic earning citations on Claude for queries where ChatGPT and Gemini did not cite them.

Claude also defers to entity consistency more than the other engines. Brands with matched LinkedIn, Wikipedia, and structured-data presence earned 2.3x the citation rate of brands inconsistent across those three surfaces.

How stable are AI citations over time?

Only 31% of brands cited in week 1 of a tracked query were still cited in week 12 of the same query.

This is the citation-drift problem. Earning one citation is not the win. Holding it for a quarter is the win.

The drift is engine-specific. Here is the week-1 to week-12 retention rate per engine:

Engine	Week-12 retention	What drives the drift
ChatGPT	41%	Domain concentration smooths short-term drift
Claude	34%	Drift correlates with content updates more than time
Gemini	28%	Drift tracks Google’s own ranking churn
Perplexity	18%	Most aggressive retrieval refresh of the four

A monthly snapshot will overstate your wins. Weekly tracking shows whether a piece is holding or losing its citations, and lets you intervene before you lose the share of voice you earned.
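Retention as defined above is the share of week-1 brands still cited in week 12. A minimal sketch of that calculation for a single tracked query follows. The snapshot structure (engine, then week number, then the set of cited brands), the function names, and the brand labels are all hypothetical, and averaging across many queries is left out.

```python
def retention_rate(first_week: set[str], last_week: set[str]) -> float:
    """Share of brands cited in the first week that are still cited in the last."""
    if not first_week:
        raise ValueError("no brands cited in the first week")
    return len(first_week & last_week) / len(first_week)


def retention_by_engine(
    snapshots: dict[str, dict[int, set[str]]],
    start_week: int = 1,
    end_week: int = 12,
) -> dict[str, float]:
    """Compute start-to-end retention separately for each engine."""
    return {
        engine: retention_rate(weeks[start_week], weeks[end_week])
        for engine, weeks in snapshots.items()
    }


# Hypothetical weekly snapshots for one tracked query.
snapshots = {
    "ChatGPT": {
        1: {"BrandA", "BrandB", "BrandC", "BrandD"},
        12: {"BrandA", "BrandB", "BrandX"},
    },
    "Perplexity": {
        1: {"BrandA", "BrandB", "BrandC", "BrandD", "BrandE"},
        12: {"BrandA", "BrandY", "BrandZ"},
    },
}
rates = retention_by_engine(snapshots)
for engine, rate in rates.items():
    print(f"{engine}: {rate:.0%} of week-1 brands still cited in week 12")
```

Brands that first appear after week 1 (such as `BrandX` here) do not count toward retention, because the denominator is fixed to the week-1 set.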

Three things to do this quarter

01 Stop using Google top-10 rank as your AI-visibility proxy. It correlates weakly with Perplexity and Claude; it correlates inversely with Reddit-heavy ChatGPT queries.
02 Optimize per engine, not in aggregate. A page tuned for Gemini (schema-heavy, ranks well) will not automatically win on Claude (long-tail, answer-first). Tune for the engine where your buyers actually search.
03 Track weekly, not monthly. Citation drift makes monthly snapshots misleading. Weekly tracking surfaces drift before it costs you share-of-voice.

Methodology

1.2M tracked queries across ChatGPT (web-search mode), Claude (Sonnet 4.6 with web search), Gemini (2.0 Flash with googleSearch), and Perplexity (sonar). Queries spanned 38 verticals, each with a fixed prompt set of 50 queries; each query ran daily across all four engines from January 1, 2026 to March 31, 2026. Citations were extracted from inline footnotes and source panels; brand mentions were extracted by named-entity recognition (NER) followed by LLM-based disambiguation. Full methodology available on request.

References

01 BrightEdge, “AI Overviews citation source analysis”, 2025. https://www.brightedge.com/
02 Profound, “I analyzed 250M AI responses”, 2026. https://www.tryprofound.com/blog
03 Peec AI, “Top domains cited by AI search”, 2026. https://peec.ai/blog
04 Ahrefs, “Update: 38% of AI Overview citations come from top 10”, 2026. https://ahrefs.com/blog
05 Princeton, “GEO: Generative Engine Optimization”, 2024. https://arxiv.org/abs/2311.09735