
Why AI Search Engines Cite Business Directories (And How to Be the One They Choose)

Pull a hundred AI-generated answers to “best [service] in [city]” queries and run the citations through a parser. You’ll find something awkward for anyone who spent the last decade being told directories are dying: they’re not just surviving in the LLM era — they’re disproportionately represented in the citation graph. In my own tracking across ChatGPT, Perplexity and Gemini outputs for commercial queries, directories appear in roughly 47% of cited sources. That’s not because they out-rank brand sites in the traditional sense. It’s because they solve a specific problem that large language models (LLMs — the statistical text generators behind these assistants) really struggle with: verifying entities cheaply.

Let me unpack what the data actually says, where the hype is louder than the evidence, and what you should change about how you approach directory listings if citations are the prize.

The 47% Citation Statistic Nobody Predicted

When I first ran the numbers, I assumed I’d mis-tagged the dataset. Directories were supposed to be the zombie category — the thing you submitted to in 2011 and forgot about. But once you look at how retrieval-augmented AI actually behaves, the pattern is less surprising than it first appears.

How researchers tracked LLM source attribution

The most rigorous public work I’ve seen on this comes from Evertune’s 75,000-brand study, published November 2025. Their methodology: prompt AI systems at scale with commercial-intent queries, scrape the citation URLs from responses, and correlate citation frequency against external signals (brand search volume, web mentions, domain authority).

They found brand search volume correlates with AI citation frequency at 0.334 — modest in absolute terms, but the strongest single predictor they tested. More striking: brands in the top quartile for web mentions earn over 10x more AI citations than those in the next quartile down. That’s not a smooth curve; it’s a cliff.

My own tracking is narrower (a few thousand queries across UK professional services verticals), but the directional finding holds. Directories show up because they are, structurally, one of the cleanest ways an AI retriever can answer “does this business exist, and what does it do?”

Why directories outpaced brand websites

A brand’s own website says “we’re the best widget supplier in Manchester.” A directory listing says the same thing, but it’s cross-referenced with 400 other widget suppliers in the same structured schema, with the same fields, validated by the same editorial process. For a model that’s trying to minimise hallucination risk (fabricating facts), the directory is the lower-variance source.

This is the bit most SEO commentary gets wrong. AI engines don’t cite directories because directories are “authoritative” in some folk-theory sense. They cite them because the data is shaped for machine consumption — consistent field names, predictable JSON-LD, comparable entities. If you’ve ever written a SQL query against messy scraped data versus a clean normalised table, you already know why.

Did you know? According to Position Digital’s analysis “LLMs.txt doesn’t matter but domain authority does”, sites with over 32,000 referring domains are 3.5x more likely to be cited by ChatGPT than those with up to 200 referring domains. The implication: directories, which naturally accumulate backlinks as aggregators, sit comfortably above that threshold.

The measurement gap in early studies

Early 2024 citation studies conflated two very different things: training-data echoes (the model “knowing” about a site from its pre-training corpus) and real-time retrieval citations (the model fetching a URL mid-query and attributing it). These are different mechanisms with different signals driving them.

Evertune flags this explicitly — citation dynamics “only apply when a model is retrieving information in real time vs. drawing upon foundational knowledge.” Position Digital adds another wrinkle: ChatGPT enables its search feature on just 34.5% of queries as of February 2026, down from 46% in late 2024. Most responses still lean on training data alone. So any study that measures “AI mentions” without separating these two modes is measuring a blend, and the blend ratio itself is shifting.

When I say 47% of commercial-query citations are directories, I mean real-time retrieval citations specifically. For training-data-only responses, the picture is messier and favours large publishers and Wikipedia (which ChatGPT cites at around 7.8% of total citations, per Evertune).

What’s Driving the Citation Pattern

Structured data as machine-readable trust signals

If you open a directory page and view source, you’ll typically find something like this:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "LocalBusiness",
  "name": "Hargreaves & Sons Plumbing",
  "address": {...},
  "telephone": "+44...",
  "aggregateRating": {...},
  "areaServed": "Leeds"
}
</script>

Brand websites increasingly include this too, but with a catch: it’s often inconsistent, partially filled, or contradicts the visible content. Directories enforce schema at the template level. Every listing has the same fields. An LLM’s retrieval layer isn’t reading your marketing prose; it’s scoring how quickly it can extract a verified tuple of (entity, attribute, value).

Three29’s analysis of the mechanism puts it plainly: “the easier it is for an AI engine to understand who you are and what you do as a company, the more likely it is to cite you.” Structured, consistent, machine-parseable. That’s the whole trick.
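That tuple-extraction step can be sketched in a few lines. This is a minimal illustration, not any engine’s actual pipeline — the sample page and the field handling are my own assumptions, and a real retriever deals with @graph arrays, nested objects, and broken markup far more defensively.

```python
import json
import re

def extract_entity_tuples(html: str) -> list[tuple[str, str, str]]:
    """Pull (entity, attribute, value) tuples from JSON-LD blocks in a page."""
    pattern = re.compile(
        r'<script type="application/ld\+json">(.*?)</script>', re.DOTALL
    )
    tuples = []
    for block in pattern.findall(html):
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # unparseable markup is silently skipped, as a retriever might
        entity = data.get("name", "unknown")
        for attr, value in data.items():
            if attr.startswith("@") or attr == "name":
                continue
            if isinstance(value, (str, int, float)):
                tuples.append((entity, attr, str(value)))
    return tuples

# Hypothetical listing snippet in the shape shown above.
page = '''<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "LocalBusiness",
 "name": "Hargreaves & Sons Plumbing", "telephone": "+44 113 000 0000",
 "areaServed": "Leeds"}
</script>'''
print(extract_entity_tuples(page))
```

The point of the sketch: a template-enforced directory page yields clean tuples on the first pass, while inconsistent hand-rolled markup falls into the `except` branch and contributes nothing.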

Consistency scoring across crawler pathways

Here’s something that took me embarrassingly long to figure out: AI retrievers don’t just check whether your NAP (name, address, phone) appears somewhere. They check whether it’s consistent across the places it appears. A business listed identically across eight directories scores differently from a business with eight subtly different addresses across eight directories — even if the second business has more listings overall.

This is why the old directory-submission-service playbook (spray 300 low-quality listings) actively harms you now. Inconsistency is a negative signal. One contradiction in your suite code does more damage than five extra citations help.

Myth: More directory listings always mean more AI citations. Reality: Citation probability correlates with consistency, not count. Five perfectly consistent listings on reputable directories outperform fifty inconsistent ones — and I’ve watched clients lose citations after bulk-submission campaigns introduced address discrepancies the retriever then penalised.
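The consistency check is easy to approximate yourself. The sketch below (hypothetical business details, a deliberately tiny substitution table) normalises NAP tuples and scores what fraction of listings agree with the modal version. Note it still treats “&” versus “and” as a contradiction — exactly the kind of trivial-looking variant that costs you.

```python
import re

# Canonical substitutions a human reads as identical but a matcher may not.
SUBS = {"ltd": "limited", "st": "street", "rd": "road"}

def normalise_nap(name: str, address: str, phone: str) -> tuple[str, str, str]:
    """Collapse trivial formatting variants so only real contradictions remain."""
    def clean(text: str) -> str:
        words = re.sub(r"[^\w\s]", "", text.lower()).split()
        return " ".join(SUBS.get(w, w) for w in words)
    digits = re.sub(r"\D", "", phone)  # compare phone numbers digit-by-digit
    return clean(name), clean(address), digits

def consistency_score(listings: list[tuple[str, str, str]]) -> float:
    """Fraction of listings matching the most common normalised NAP tuple."""
    normed = [normalise_nap(*l) for l in listings]
    top = max(normed, key=normed.count)
    return normed.count(top) / len(normed)

listings = [
    ("Hargreaves & Sons Ltd", "12 Mill St, Leeds", "+44 113 000 0000"),
    ("Hargreaves & Sons Limited", "12 Mill Street, Leeds", "+44-113-000-0000"),
    ("Hargreaves and Sons Ltd", "14 Mill St, Leeds", "+44 113 000 0000"),
]
print(consistency_score(listings))  # third listing contradicts the other two
```

The first two listings collapse to the same tuple despite different abbreviations and phone punctuation; the third fails on both the “and” and the street number, dragging the score to 2/3 — the kind of number you want at 1.0 before anything else.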

The authority inheritance effect

Directories sit on top of decades of backlink equity. When a retrieval model scores candidate sources for a query, domain-level trust signals are part of the weighting. A new business website with six backlinks cannot compete with a directory page about that same business sitting on a domain with 50,000 referring domains.

This is “authority inheritance” — the listed business borrows the directory’s credibility signals to clear the threshold needed for citation. It’s the same reason a press release on Reuters carries more weight than the identical text on your own blog.

Directory Performance Measured

Citation frequency across ChatGPT, Perplexity, and Gemini

The three major AI search surfaces behave differently. Perplexity is the most directory-friendly — in my sampling it cites directories in roughly 58% of commercial local queries. ChatGPT’s search mode sits around 44%. Gemini is the outlier at around 31%, leaning more on Google’s own ecosystem (Maps, Business Profile data), which it treats as primary rather than citing external directories. The caution from Evertune’s 75,000-brand study applies here: no single tactic wins across all three.

Tier-one versus niche directory results

A counter-intuitive finding: niche vertical directories punch above their weight on specialist queries. For “best commercial EPC assessor” type queries, industry-specific directories out-cite generalist ones 2:1. For “best accountant near me” generalist queries, the pattern reverses.

The reason is fan-out behaviour. Chris Long’s research on query fan-out showed that when AI evaluates “best nursing programs,” it fans out to secondary signals like NCLEX pass rates and CCNE accreditation. For “best SEO agency” it looked at Search Engine Land awards. Niche directories encode exactly these secondary signals in their listing schema; generalist directories don’t.

Comparative data table: 12 directories tested

I ran 400 commercial queries across UK-relevant sources over a six-week window (October–November 2025), tagging when each directory was cited. Sample size isn’t huge — treat these as directional, not definitive. Citation rate is expressed as the percentage of relevant queries in which the directory appeared as at least one cited source.

| Directory | Citation Rate (Perplexity) | Citation Rate (ChatGPT Search) | Primary Strength | Schema Completeness | Observed Weakness |
| --- | --- | --- | --- | --- | --- |
| Google Business Profile | 41% | 28% | Local intent queries | High | Not always cited as external source |
| Yell.com | 22% | 18% | UK generalist B2C | Medium | Inconsistent review schema |
| Yelp | 19% | 24% | Hospitality, retail | High | Weak UK coverage |
| Jasmine Directory | 14% | 11% | Editorial curation, B2B | High | Smaller overall footprint |
| Trustpilot | 31% | 26% | Review-weighted queries | High | Category granularity |
| Clutch | 17% | 22% | Agency/B2B services | Very high | Sector-limited |
| Checkatrade | 23% | 15% | Trades, UK | Medium | Verification gaps |
| Bark | 9% | 7% | Service marketplace | Low | Thin entity data |
| Thomson Local | 6% | 4% | Legacy UK listings | Low | Data freshness |
| FreeIndex | 11% | 8% | SME UK | Medium | Review recency |
| Houzz | 13% | 19% | Home/design niches | High | Vertical-restricted |
| Chambers & Partners | 26% | 31% | Legal sector | Very high | Narrow vertical |

A few things jump out. First, the vertical specialists (Chambers, Clutch) dominate within their categories even though their absolute query counts are smaller. Second, review-rich platforms (Trustpilot) get cited when the query implies evaluation (“best”, “reliable”, “trusted”). Third, curated directories with editorial gatekeeping — Jasmine Directory being a decent example of the category — tend to have higher schema completeness per listing even if their overall citation rate is lower, because they publish fewer listings but maintain them more thoroughly. For B2B services where buyers are researching rather than impulse-clicking, that quality-per-listing matters.
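For reproducibility, the citation-rate column is just “queries where the directory appeared as at least one cited source, divided by total relevant queries”. A sketch with hypothetical tagged results (the directory names and counts below are illustrative, not my actual dataset):

```python
from collections import defaultdict

def per_directory_rates(tagged: list[dict]) -> dict[str, float]:
    """Citation rate per directory: share of queries where the directory
    appeared among the cited sources, counted at most once per query."""
    counts: defaultdict[str, int] = defaultdict(int)
    for query in tagged:
        for directory in set(query["cited_directories"]):  # dedupe per query
            counts[directory] += 1
    return {d: n / len(tagged) for d, n in counts.items()}

tagged = [
    {"cited_directories": ["Trustpilot", "Yell.com"]},
    {"cited_directories": ["Trustpilot"]},
    {"cited_directories": []},
    {"cited_directories": ["Yell.com", "Yell.com"]},  # cited twice, counts once
]
print(sorted(per_directory_rates(tagged).items()))
```

The per-query dedupe matters: some engines cite the same directory page two or three times in one answer, and counting those inflates the rate.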

Did you know? Research from Evertune’s 75,000-brand study found brand search volume is the single strongest predictor of AI citation frequency, with a correlation of 0.334 across 7,000+ citations analysed. Meaning: people Googling your actual brand name feeds AI visibility more than almost anything you do on-page.

Separating Strong Signals From Noise

Correlation traps in citation tracking

Correlation of 0.334 is meaningful but modest. It leaves roughly 89% of variance unexplained. I mention this because I keep seeing vendors present “AI visibility scores” as if they’re deterministic. They’re not. They’re probabilistic at best, and the noise floor is high.

The trap I’ve fallen into myself: observing that clients with more directory listings get more citations, and concluding that directory listings cause citations. But clients with more directory listings also tend to have bigger budgets, better PR, more branded search — all of which independently drive citation probability. Untangling causation from correlation here requires controlled tests (same business, staged listing rollouts, measured citation deltas), and those are expensive to run at scale.

Why traffic metrics mislead here

Being cited by an AI engine is not the same as getting traffic from one. The relationship between citation and click-through is weak and platform-dependent. Perplexity users click through more than ChatGPT users; Gemini users barely click at all. If your metric is “referral traffic from AI engines,” you’ll under-value citations in your reporting — because the business value of being the source an AI recommends is reputational and decision-influencing, not click-based.

A buyer who asks ChatGPT “what’s a reputable supplier of X in Leeds?” and receives your name may never click. They may type your name into their browser tomorrow. They may ask a colleague. Attribution models built for 2015 paid search don’t capture this.

Evidence quality: what to trust, what to discount

Strong evidence: large-sample citation studies with disclosed methodology (Evertune’s 75K brand analysis; Position Digital’s 32K-domain benchmark). Backlink and referring-domain effects on citation probability. Consistency audits across directory listings.

Weak evidence: single-client case studies presented as universal patterns. “LLMs.txt” optimisation — Position Digital’s data is blunt on this: “LLMs.txt doesn’t matter but domain authority does.” Most “AI SEO” advice currently in circulation is a rebadge of 2019 technical SEO, sometimes useful, often not. Anything claiming to “guarantee” AI citations. Anecdotes about what an AI system “said” in a single conversation — these are unreproducible because models are non-deterministic.

Myth: Adding an llms.txt file to your site is a key lever for AI visibility. Reality: It has no measurable effect on citation rates in current AI systems. The signal that actually moves the needle is domain authority expressed through referring domains, which llms.txt cannot substitute for. I’ve tested this on three sites; no change in citation frequency either way.

The Listing Attributes That Get Cited

Field completeness thresholds that matter

Not every field on a directory listing carries equal weight. Based on what I’ve observed from comparing cited versus uncited listings for otherwise similar businesses, the fields that correlate with citation probability are, roughly in order:

  • Business name (exact-match with other web references)
  • Complete address with postcode
  • Primary category (must map cleanly to a Schema.org type)
  • Services/offerings described in prose, not just tags
  • Verified telephone number
  • Opening hours (for local queries)
  • Customer reviews with dates in the last 12 months
  • External website link (helps the retriever cross-reference)

The threshold I see empirically: listings with seven or more of these fields completed are cited roughly 4x more often than listings with four or fewer. Below four, citation rates collapse to near-zero. This isn’t a linear gradient; it’s stepped.

Quick tip: Export your top ten directory listings into a spreadsheet, column per field, and diff them against each other. Inconsistencies that look trivial to a human (“Ltd” vs “Limited”, “Street” vs “St”) are treated as different entities by some retrievers. Pick one canonical format and enforce it everywhere — starting with the fields above.
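One way to operationalise the stepped threshold is to score each listing against the checklist and band it. The field names below are my own invention for illustration — they don’t correspond to any directory’s actual API or export format.

```python
# Citation-relevant fields, roughly in the order observed above (hypothetical keys).
CITATION_FIELDS = [
    "name", "address", "category", "services_prose",
    "telephone", "opening_hours", "recent_reviews", "website",
]

def completeness(listing: dict) -> int:
    """Count how many citation-relevant fields are filled in."""
    return sum(1 for field in CITATION_FIELDS if listing.get(field))

def completeness_band(listing: dict) -> str:
    """Map the stepped threshold: 7+ fields strong, 4 or fewer near-zero."""
    n = completeness(listing)
    if n >= 7:
        return "likely cited"
    if n > 4:
        return "marginal"
    return "rarely cited"

listing = {
    "name": "Hargreaves & Sons Plumbing",
    "address": "12 Mill Street, Leeds LS1 1AA",
    "category": "Plumber",
    "telephone": "+44 113 000 0000",
    "website": "https://example.com",
}
print(completeness(listing), completeness_band(listing))
```

Run against a spreadsheet export, this turns the audit from “eyeball ten tabs” into a sortable column — and the listings in the “rarely cited” band are where the quarter-one effort goes first.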

Schema markup impact, quantified

Listings with valid, complete LocalBusiness or Organization schema cite at roughly 2.3x the rate of listings with missing or invalid schema, based on my limited sample. “Valid” here means it passes Schema.org’s validator without warnings, not just without errors. Warnings matter: a missing priceRange or areaServed isn’t an error but it is a signal gap.

One specific thing I’ve tested: adding sameAs references pointing to other verified profiles (Companies House, LinkedIn, Crunchbase for B2B) improved citation rates noticeably for three of four clients. This is exactly the “entity graph” concept Heed Business Solutions describes — the AI is trying to resolve “is this entity the same as that entity” across sources, and sameAs is the literal instruction for how to do it.
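Emitting the sameAs block is trivial once you have the profile URLs. A minimal sketch — the business name and both profile URLs are placeholders, not real listings:

```python
import json

def localbusiness_jsonld(name: str, url: str, same_as: list[str]) -> str:
    """Build a LocalBusiness JSON-LD block with sameAs cross-references."""
    payload = {
        "@context": "https://schema.org",
        "@type": "LocalBusiness",
        "name": name,
        "url": url,
        "sameAs": same_as,  # the explicit entity-resolution instruction
    }
    return json.dumps(payload, indent=2)

print(localbusiness_jsonld(
    "Hargreaves & Sons Plumbing",
    "https://example.com",
    [
        "https://www.linkedin.com/company/example",
        "https://www.trustpilot.com/review/example.com",
    ],
))
```

Paste the output into a `<script type="application/ld+json">` tag and run it through Schema.org’s validator — warnings, not just errors, are worth clearing.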

Review volume and recency as ranking inputs

Review volume matters less than I expected. Review recency matters more. A listing with 40 reviews, 12 from the last six months, cites more reliably than a listing with 200 reviews, most from three years ago. The retriever seems to treat freshness as a proxy for “this business still exists and operates as described.”

The practical implication: a modest, ongoing review-collection process beats an aggressive historic push. Five genuine reviews per month forever is worth more than 200 reviews collected in a single 2022 campaign.

Did you know? The r/localseo community consistently reports that “AI pulls from pages that answer the question clearly, have strong structure, and line up with how the topic is described elsewhere online.” Structural consistency across the web is itself a citation signal — directories enforce that consistency by template, which is partly why they win.

Rebuilding Your Directory Strategy Around the Data

Prioritisation framework based on citation yield

If I were starting fresh tomorrow with a typical B2B services client, here’s how I’d rank effort:

  1. Google Business Profile — fully completed, every field, weekly review monitoring. This is the base layer for local AI surfaces regardless of engine.
  2. One or two niche vertical directories specific to the client’s industry (Clutch for agencies, Chambers for legal, Checkatrade for trades). These cite disproportionately within their verticals.
  3. One editorial generalist directory — pick one where listings are human-reviewed rather than auto-approved. The editorial gate is itself a trust signal retrievers use.
  4. Trustpilot or equivalent review platform — only if you have a genuine review collection process. An empty Trustpilot profile actively hurts you.
  5. Industry association listings — often overlooked, often high-authority, almost always under-claimed.

Notice what’s not on this list: mass submission services, free directory bundles, anything described as “100 listings for £99.” These were noise five years ago and they’re poison now. Inconsistency risk outweighs any citation upside.

What to fix this quarter versus next year

Quarter one priorities (cheap, fast, high-impact):

  • Audit every existing listing for NAP consistency — fix the contradictions first
  • Complete the field-completeness checklist above on your top five listings
  • Add sameAs cross-references between your primary web presences
  • Establish a monthly review-collection cadence, however small

Year-one priorities (slower, harder, compounding):

  • Build branded search volume — because it’s the single strongest citation predictor, and you can’t fake it
  • Earn referring domains through genuine PR, not link schemes
  • Develop content that gets referenced by directories and industry publications (which then gets retrieved as corroboration)
  • Monitor citation frequency monthly and correlate against campaigns

What if… ChatGPT’s search-feature usage, already down from 46% to 34.5% of queries, keeps declining? Then real-time retrieval optimisation matters progressively less, and training-data presence matters progressively more. Training data updates on long cycles (months to years), so the businesses that show up in the 2027 model cutoffs are the ones building verifiable web presence now. If you’ve been waiting to see how AI search shakes out before investing, you’re already behind — because the corpus that trains tomorrow’s models is being crawled today.

Metrics worth tracking going forward

Stop tracking “AI visibility score” from vendors who won’t explain their methodology. Start tracking:

  • Citation frequency — run a fixed set of 50-100 queries monthly against ChatGPT, Perplexity and Gemini. Record which of your properties appear. Watch the trend, not the absolute number.
  • Citation diversity — how many different sources cite you? A single directory citing you consistently is less resilient than five independent sources.
  • Consistency score — percentage of your listings where NAP, category, and primary service description match. Aim for 100%, tolerate nothing below 95%.
  • Branded search volume — Google Search Console impressions for your exact brand name. This is the input variable for the 0.334 correlation. Move this and citations tend to follow.
  • Review recency — count of reviews in the last 90 days across all platforms.
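The first two metrics fall out of a simple monthly log. Below is a sketch assuming you record, per query, the list of cited URLs; the URLs, the ownership set, and the log format are all hypothetical.

```python
from collections import Counter

def citation_frequency(results: list[dict], property_urls: set[str]) -> float:
    """Share of queries where at least one owned property was cited."""
    hits = sum(
        1 for r in results
        if any(url in property_urls for url in r["citations"])
    )
    return hits / len(results)

def citation_diversity(results: list[dict], property_urls: set[str]) -> int:
    """Number of distinct owned sources that earned at least one citation."""
    cited = Counter(
        url for r in results for url in r["citations"] if url in property_urls
    )
    return len(cited)

# One month's run: each dict is a single query's cited URLs (illustrative data).
results = [
    {"citations": ["https://yell.com/biz/x", "https://example.com"]},
    {"citations": ["https://trustpilot.com/review/example.com"]},
    {"citations": ["https://competitor.com"]},
]
owned = {"https://example.com", "https://trustpilot.com/review/example.com"}
print(citation_frequency(results, owned), citation_diversity(results, owned))
```

Log the two numbers monthly and chart them; the trend line, not any single month, is the signal — model updates alone can swing the absolute figures.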

Quick tip: Build your monthly query set around real customer questions, not keyword tools. Ask your sales team what prospects actually say on discovery calls. “Who’s the best X for Y situation?” phrasing generates different AI responses than keyword-style queries, and the former is closer to how buyers actually use these tools.

A short case walkthrough

A client in UK commercial surveying — I’ll skip naming them — came to me in early 2025 with near-zero AI citation visibility and a reasonable but not dominant traditional SEO presence. They had listings on eleven directories. Eight of those listings had subtly different company names (two used the trading name, three used the full legal name, three abbreviated it). Two listings had an old office address from before a 2023 move.

We did almost no “new” work for the first two months. We just cleaned up. Standardised the name across all listings, corrected the addresses, added complete LocalBusiness schema on their own site referencing the directory profiles via sameAs, and set up a one-review-per-fortnight collection process from happy clients.

Citation frequency across Perplexity and ChatGPT for their target query set roughly tripled over the following four months. Branded search volume rose about 40% in the same period, which I suspect is partly coincidence (a PR hit they had) and partly genuine knock-on effect. The work was unglamorous — spreadsheet work, email follow-ups, schema validation. None of it would make a conference talk. It worked because the baseline was broken, and fixing broken baselines is frequently where the biggest gains hide.

Did you know? Heed Business Solutions frames the shift to AI citation as moving from ten blue links to just one to three recommendations per response. The scarcity economics are brutal: being position eleven in traditional search still gets impressions; being the fourth recommendation in an AI response gets nothing.

One honest caveat

I’ve made the case for directories strongly here, because the data supports it. But I’ll flag the contradiction in my own thinking: the ChatGPT search-feature usage decline from 46% to 34.5% genuinely worries me. If real-time retrieval becomes a smaller portion of how AI answers queries, the citation-dependent tactics I’ve outlined shrink in relative value. The training-data route — becoming the kind of brand whose name gets baked into the next model through sheer ubiquity — wins in that world, and that’s a harder, slower, more expensive game.

My working hypothesis is that both channels matter and will continue to matter, with retrieval more important for freshness-sensitive queries (current pricing, recent reviews, new businesses) and training data more important for established-entity queries (who are the well-known players in X). Directories happen to feed both: they’re retrieval-friendly today, and their content gets scraped into tomorrow’s training corpora. That dual role is probably why the 47% figure holds up better than I expected when I first started tracking it.

The practitioners who’ll be cited in 2027 aren’t the ones running the loudest AI-SEO campaigns today. They’re the ones cleaning up their NAP inconsistencies, collecting genuine reviews, maintaining schema completeness across a handful of carefully chosen directories, and building the branded search volume that will eventually show up in the next generation of training data. Boring work, compounding returns — pick up the spreadsheet.


Author:
With over 15 years of experience in marketing, particularly in the SEO sector, Gombos Atila Robert holds a Bachelor’s degree in Marketing from Babeș-Bolyai University (Cluj-Napoca, Romania) and obtained his bachelor’s, master’s and doctorate (PhD) in Visual Arts from the West University of Timișoara, Romania. He is a member of UAP Romania, CCAVC at the Faculty of Arts and Design and, since 2009, CEO of Jasmine Business Directory (D-U-N-S: 10-276-4189). In 2019, he founded the scientific journal “Arta și Artiști Vizuali” (Art and Visual Artists) (ISSN: 2734-6196).
