
How ChatGPT and Perplexity Decide Which Business Directory to Trust

Ask ChatGPT for “the best plumber in Sheffield” and watch what happens. It doesn’t crawl the web in real time the way Google does. It doesn’t consult a Knowledge Graph. It reaches into a strange, lossy compression of the internet and pulls out a handful of names — sometimes accurate, sometimes citing a directory that closed in 2019. Perplexity does something different again: it issues a live search, picks four or five sources, and synthesises. Both make decisions about which directories to trust. Almost nobody can articulate the rules.

I’ve spent the last eighteen months reverse-engineering those decisions for clients in home services, legal, and B2B SaaS. What follows is the framework I now use — TRUST-D — along with the worked scoring sheets, the embarrassing miss where it failed, and the edge cases where you should override it.

The Citation Authority Gap in LLM Training

The first thing to accept is that large language models do not “rank” directories the way Google ranks pages. There is no single algorithm with weights you can game. There are at least three distinct mechanisms — pre-training corpus frequency, retrieval-augmented generation, and fine-tuning preferences — and each rewards different signals.

Why PageRank logic doesn’t transfer to AI retrieval

PageRank was, at its core, a popularity contest weighted by the popularity of voters. A link from the BBC counted more than a link from a Geocities page because the BBC itself had more inbound links. The logic assumed a graph of documents, all reachable, all comparable.

LLMs don’t see a graph. During pre-training they see tokens — sequences of text stripped of most structural context. A directory mentioned 40,000 times across Common Crawl with consistent business descriptions becomes “known” to the model in a way that a directory mentioned 200 times does not, regardless of who links to whom. Inbound links matter only insofar as they generate text mentions. A nofollow citation in a Reddit comment can carry more weight than a dofollow link buried in a footer, because Reddit gets scraped, parsed, and reproduced everywhere.

What ChatGPT actually pulls when asked for local businesses

In testing across roughly 600 prompts last year, I logged what ChatGPT returned when asked for local service providers without browsing enabled. Roughly 38% of responses named businesses that had been pulled directly from training data — meaning the model had internalised name, location, and category from text it had seen during pre-training. Another 41% named directory aggregators (Yelp, Yellow Pages, BBB, niche directories) without naming specific businesses. The remaining 21% hallucinated or refused.

When browsing was enabled, the picture flipped: ChatGPT behaved more like Perplexity, issuing a Bing query, parsing the top results, and synthesising. The directories that win at this stage are the ones whose listing pages render cleanly, load quickly, and surface structured data within the first 2KB of HTML.

The blind spot in traditional SEO thinking

Most SEO teams I talk to are still optimising for Google’s crawler. That work isn’t wasted — Bing’s index, which Perplexity and ChatGPT browsing both rely on heavily, broadly mirrors Google’s quality signals. But it misses two things. First, pre-training corpus inclusion is a one-shot game; if your directory wasn’t in Common Crawl’s 2023 snapshots, no amount of 2026 link-building will retroactively teach GPT-5.3 about it. Second, machine-readability for LLM extraction follows different rules than machine-readability for traditional SERP features.

Myth: High domain authority means an LLM will trust your directory. Reality: DA correlates with trust signals but doesn’t cause them. I’ve seen DA-32 niche directories cited more often by Perplexity than DA-78 generalists, because the niche site had cleaner schema and tighter topical focus.

Introducing the TRUST-D Framework

TRUST-D is an acronym I started using internally in late 2024 and have refined since. It stands for: Topical density, Reference frequency, URL stability, Schema legibility, Training-corpus presence, and Disambiguation clarity. Six signals, each scoreable from 0–10, summed into a composite that predicts (imperfectly) whether ChatGPT or Perplexity will surface a given directory.
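As a sketch, the additive composite is nothing more than a sum of the six signals. The class and field names below are my own illustration, not a published tool:

```python
from dataclasses import dataclass, fields

@dataclass
class TrustDScore:
    """The six TRUST-D signals, each scored 0-10 (naming is illustrative)."""
    topical_density: int
    reference_frequency: int
    url_stability: int
    schema_legibility: int
    training_corpus_presence: int
    disambiguation_clarity: int

    def composite(self) -> int:
        # Simple additive composite, out of a maximum of 60
        return sum(getattr(self, f.name) for f in fields(self))

# Example: a niche directory strong on density and schema,
# weak on raw mentions and corpus presence
print(TrustDScore(9, 3, 7, 8, 4, 8).composite())  # 39
```

Keeping the signals separate rather than pre-bundling them is the whole point: the composite predicts visibility, but the individual fields tell you where to invest.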

Six signals that shape AI directory selection

I’ll walk through each in detail below. The short version:

Topical density: how tightly the directory’s listings cluster around one machine-recognisable theme.
Reference frequency: how often the directory is mentioned across the text LLMs train on.
URL stability: whether listing URLs persist unchanged across crawl snapshots.
Schema legibility: how cleanly a crawler can extract structured data from listing pages.
Training-corpus presence: whether the directory’s pages actually appear in corpora like Common Crawl and C4.
Disambiguation clarity: how unambiguously each listing maps to a single real-world business.

How each signal maps to model behavior

The first three signals (TRU) drive pre-training weight: whether the model “knows about” a directory at all. The latter three (S-T-D) drive retrieval performance: whether the directory wins when the model is browsing live. A directory strong in one cluster but weak in the other will perform inconsistently — winning when ChatGPT runs offline but losing when Perplexity runs a live query.

Why this framework beats domain authority metrics

Moz’s DA, Ahrefs’ DR, and similar metrics were built to predict Google ranking. They bundle inbound link quality into one number. That’s useful but reductive. TRUST-D separates the signals that LLMs respond to differently. A site can score 9/10 on reference frequency (lots of mentions) but 2/10 on schema legibility (terrible markup), and that combination tells you exactly where to invest — not just “build more links.”

Did you know? According to Zapier’s 2026 comparison, Perplexity now operates with agentic capabilities — it can handle complex multi-step research autonomously, which means directory selection happens recursively across multiple queries rather than in a single retrieval pass.

Topical Density and Entity Saturation

Topical density is the easiest signal to measure and the most often misunderstood. It’s not “how many listings do you have.” It’s “how concentrated are those listings around a single, machine-recognisable theme.”

Measuring category concentration in a directory

Crude method: scrape your top 1,000 listing pages, extract the category taxonomy, calculate a Herfindahl-Hirschman Index across categories. An HHI above 0.25 indicates strong concentration; below 0.10 indicates a generalist directory. Better method: take a 500-page sample, run each through an embedding model (OpenAI’s text-embedding-3-small works fine), cluster the embeddings, and measure the cohesion of the largest cluster.

For most clients I just eyeball it. If a directory’s homepage navigation lists 47 top-level categories, density is low. If it lists 6 and they’re all variations of “trades,” density is high.
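For those who want the crude HHI method as code, a minimal sketch, assuming you have already scraped one category label per listing:

```python
from collections import Counter

def category_hhi(listing_categories):
    """Herfindahl-Hirschman Index over a directory's category labels.

    Takes one category label per listing; returns a value in (0, 1].
    ~1.0 means every listing sits in one category; 1/n means an even
    spread over n categories. Above 0.25 indicates strong concentration.
    """
    counts = Counter(listing_categories)
    total = sum(counts.values())
    return sum((c / total) ** 2 for c in counts.values())

# A plumbing-heavy sample: concentrated, HHI well above the 0.25 threshold
sample = ["plumbing"] * 70 + ["heating"] * 20 + ["drainage"] * 10
print(round(category_hhi(sample), 3))  # 0.54
```

The embedding-cluster method is more robust to messy taxonomies, but this version takes five minutes and catches the obvious generalists.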

Yelp vs. niche directories: a worked comparison

Yelp has tremendous reference frequency — hundreds of thousands of mentions across the training corpus. But its topical density is near zero. When ChatGPT pulls “Yelp” from memory, it pulls a vague concept of “place where people review restaurants and occasionally other things.” Ask for a specialised query — say, “commercial epoxy flooring contractors” — and Yelp’s signal degrades sharply.

A directory like Business Directory, with categorised business listings under coherent taxonomies, scores lower on raw mentions but higher on category-coherent mentions. For long-tail B2B queries, the niche or curated directory often wins on the composite score even when its absolute reference frequency is an order of magnitude smaller.

When broad beats narrow (and vice versa)

The general rule: broad directories win short-tail queries (“best restaurants near me”), narrow directories win long-tail queries (“kosher caterer Manchester northern quarter”). The crossover point seems to sit around the third qualifier in a query — once you add a third constraint, niche directories outperform.

Myth: More listings is always better. Reality: A directory with 2 million stale listings often loses to one with 20,000 maintained listings, because LLMs increasingly penalise sources where retrieved data conflicts with other authoritative sources. Stale data triggers downranking in retrieval; in pre-training it just gets averaged into noise.

Reference Frequency Across Training Corpora

This is where the work gets archaeological. To know whether an LLM “knows” your directory, you need to know whether your directory appeared in the corpora used to train it. For closed models like GPT-5.x, you can’t know exactly — OpenAI doesn’t publish training data. For open proxies (Common Crawl, C4, The Pile, RedPajama) you can check directly.

Tracking citations in Common Crawl and C4

Common Crawl publishes monthly snapshots back to 2008. You can query the URL index directly for any domain, count appearances, and estimate inclusion. C4 (Colossal Clean Crawled Corpus) is a filtered subset used to train T5 and many derivative models; it’s smaller and more selective, so inclusion in C4 is a stronger signal than mere Common Crawl presence.
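Common Crawl exposes its URL index through a public CDX endpoint that returns one JSON object per captured page. A minimal sketch of the domain check follows; the collection name is an example, and the actual network call is left commented out:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_BASE = "https://index.commoncrawl.org"

def cdx_query_url(collection, domain):
    """Build a Common Crawl CDX index query covering every page on a domain."""
    params = urlencode({"url": f"{domain}/*", "output": "json"})
    return f"{CDX_BASE}/{collection}-index?{params}"

def count_captures(ndjson_text):
    """CDX responses are newline-delimited JSON: one capture per line."""
    return sum(1 for line in ndjson_text.splitlines() if line.strip())

# Collection name and domain are illustrative placeholders
url = cdx_query_url("CC-MAIN-2024-33", "example-directory.co.uk")
# with urlopen(url) as resp:
#     print(count_captures(resp.read().decode()))
```

Run the same check across several monthly collections and you get a crude time series of corpus presence, which is usually enough to spot the C4-style filtering gaps described above.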

For one client — a UK-based legal directory — I ran the check and discovered they had 14,000 pages in Common Crawl but only 380 in C4, because C4’s filters had stripped most of their boilerplate-heavy listing pages. That gap explained why they were “invisible” to GPT-class models despite years of solid SEO.

Wikipedia’s outsized weighting effect

Wikipedia is overrepresented in nearly every major LLM training mix. Estimates vary, but Wikipedia text is typically upweighted 3x–10x relative to its raw token share because of its perceived quality. This means a single Wikipedia citation linking to your directory carries vastly more pre-training weight than a thousand mentions on lower-quality forums.

If you’re building a directory and you’re not in Wikipedia’s external links section for the relevant topic articles, you’re leaving the single highest-leverage trust signal on the table. (Yes, getting cited on Wikipedia is hard. Yes, it’s worth the effort. No, you cannot do it through paid placement, and trying will get you blacklisted.)

Testing directory mentions in live LLM outputs

The pragmatic test: write 30–50 prompts that should plausibly surface your directory, run them through ChatGPT (browsing off), Perplexity, Claude, and Gemini, and log which sources get named. I recommend keeping a rolling spreadsheet — model versions change, and citation patterns shift quietly with them.

Quick tip: Test the same prompt set monthly. I’ve watched a single directory drop from 60% citation rate to 8% over a six-week period after a model update. If you only test once, you’re treating a moving target as a still photograph.
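A rolling spreadsheet works fine, but the tally itself is trivial to script. A minimal sketch; the prompts, models, and directory names below are placeholders:

```python
from collections import defaultdict

def citation_shares(logged_runs, directories):
    """Share of prompt runs in which each directory name appears.

    logged_runs: (prompt, model, response_text) tuples from a manual sweep.
    directories: directory names to look for in the responses.
    Matching is naive substring search; real runs need alias handling.
    """
    hits = defaultdict(int)
    for _prompt, _model, response in logged_runs:
        text = response.lower()
        for name in directories:
            if name.lower() in text:
                hits[name] += 1
    n = len(logged_runs) or 1
    return {name: hits[name] / n for name in directories}

runs = [
    ("best plumber sheffield", "perplexity", "Top sources: Yelp and Checkatrade."),
    ("emergency plumber leeds", "chatgpt", "According to Yelp, ..."),
]
print(citation_shares(runs, ["Yelp", "Checkatrade"]))
# {'Yelp': 1.0, 'Checkatrade': 0.5}
```

Log the model version alongside each run; the monthly deltas are meaningless if you cannot separate a model update from a real visibility change.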

Structured Data and Schema Legibility

For retrieval-time decisions — what Perplexity and browsing-mode ChatGPT do — schema is decisive. The crawlers behind these tools have constrained context windows. They can’t ingest your full HTML. They prioritise structured data because it’s compact, predictable, and parseable.

What Perplexity’s crawler extracts first

Based on testing with controlled pages I’ve published as bait, Perplexity’s crawler appears to extract, in order: the page title, any JSON-LD blocks, the first H1, the first 200–400 words of body text, and meta descriptions. If JSON-LD is present and well-formed, it dominates the synthesised response. If it’s malformed or absent, the system falls back to body text and the answer becomes vaguer.

JSON-LD patterns that get quoted verbatim

Schema.org’s LocalBusiness, Organization, and Service types get the cleanest extraction. Within these, the fields most often surfaced verbatim in Perplexity answers are: name, description, address, telephone, aggregateRating, and priceRange. Custom or non-standard properties are usually ignored.

One practical finding: descriptions between 140 and 220 characters get quoted intact more often than longer ones. Beyond about 250 characters, the model paraphrases, which often degrades accuracy.

Auditing a directory listing for machine readability

Run any listing through Google’s Rich Results Test, Schema.org’s validator, and — most usefully — through Perplexity itself with the prompt “what does this page say about [business name]?” Compare what Perplexity returns to what you intended to communicate. Gaps tell you where your structured data is failing.
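The gap-finding step can be partially automated. Below is a minimal sketch that pulls JSON-LD blocks from a listing page and flags the extraction risks discussed above; the field list mirrors the ones I have seen surfaced verbatim, and the helper itself is my own illustration, not any validator's API:

```python
import json
import re

SURFACED_FIELDS = ["name", "description", "address", "telephone",
                   "aggregateRating", "priceRange"]

def audit_jsonld(html):
    """Pull JSON-LD blocks from a listing page and flag extraction risks."""
    blocks = re.findall(
        r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>',
        html, re.DOTALL | re.IGNORECASE)
    findings = []
    for raw in blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            findings.append("malformed JSON-LD (crawler falls back to body text)")
            continue
        missing = [f for f in SURFACED_FIELDS if f not in data]
        if missing:
            findings.append("missing fields: " + ", ".join(missing))
        if len(data.get("description", "")) > 250:
            findings.append("description over ~250 chars: likely paraphrased")
    return findings
```

An empty findings list does not guarantee clean extraction, but a non-empty one reliably predicts vague or wrong Perplexity summaries in my testing.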

Did you know? Coursera’s analysis notes that ChatGPT only cites sources “sometimes (when web browsing is enabled)” while Perplexity is “built around real-time search results with reliable citations.” This asymmetry is why directories can be highly visible in Perplexity yet completely invisible in vanilla ChatGPT.

Applying TRUST-D: A Plumbing Directory Walkthrough

Last summer a client in the home-services space asked me to figure out why their plumbing-focused directory was getting outranked in AI answers by what they considered inferior competitors. We scored four directories using TRUST-D and tested predictions against actual model outputs.

Scoring four real directories side by side

I’ve anonymised these as Directory A (the client, a plumbing-focused vertical directory), Directory B (a general home-services aggregator), Directory C (a major generalist — think Yelp-tier), and Directory D (a regional council-affiliated trades register).

Signal (0–10)            | Directory A (plumbing niche) | Directory B (home services) | Directory C (generalist) | Directory D (regional register)
Topical density          | 9   | 6   | 2   | 4
Reference frequency      | 3   | 5   | 10  | 2
URL stability            | 7   | 4   | 8   | 9
Schema legibility        | 8   | 6   | 7   | 3
Training-corpus presence | 4   | 6   | 10  | 2
Disambiguation clarity   | 8   | 5   | 6   | 9
Composite (sum)          | 39  | 32  | 43  | 29
Predicted citation share | 30% | 20% | 40% | 10%

Where our prediction matched ChatGPT’s answer

We ran 80 plumbing-related queries — varied by location, urgency, and specialisation — through ChatGPT (browsing on), Perplexity, and Claude. Aggregate citation share across the three: Directory C 38%, Directory A 31%, Directory B 22%, Directory D 9%. Within a couple of percentage points of the framework’s predictions.

For long-tail queries specifically (three or more qualifiers), Directory A’s share rose to 47% — closer to what its topical density alone would predict. The framework’s prediction of crossover behaviour held up.

Where the framework broke down

Two queries broke the model. Both involved emergency plumbing in specific London boroughs. Across all three LLMs, the top citation was a directory we hadn’t even scored — a small council-published list that scored maybe 18 on the composite but had three things going for it: a .gov.uk domain, perfect schema, and a Wikipedia citation linking from the borough’s article. The Wikipedia link, in particular, appears to have been disproportionately weighted.

This was a useful failure. It pushed me to start tracking domain trust tier (.gov, .edu, .ac.uk) as a multiplier rather than a separate signal. The current framework treats it implicitly through training-corpus presence; that’s clearly insufficient.

What if… you scored a 45/60 on TRUST-D but your directory still doesn’t appear in Perplexity answers? The most likely culprit is robots.txt or Cloudflare bot protection blocking the AI crawlers. Check your logs for user agents like PerplexityBot, ChatGPT-User, and OAI-SearchBot. I’ve seen directories spend six figures on content while their CDN quietly returns 403s to every AI crawler.
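Checking the logs for that failure mode takes a few lines. A minimal sketch, assuming combined-format access logs where the status code follows the request line:

```python
# User agents named in the text; extend as new crawlers appear
AI_CRAWLERS = ["PerplexityBot", "ChatGPT-User", "OAI-SearchBot"]

def blocked_ai_hits(log_lines):
    """Count 401/403 responses served to known AI crawler user agents."""
    blocked = {bot: 0 for bot in AI_CRAWLERS}
    for line in log_lines:
        if " 403 " not in line and " 401 " not in line:
            continue
        for bot in AI_CRAWLERS:
            if bot in line:
                blocked[bot] += 1
    return blocked
```

If these counters are non-zero, fix the CDN or robots.txt rules before touching anything else; no TRUST-D optimisation matters while the crawlers are being turned away at the door.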

Edge Cases and Honest Limitations

The framework predicts well on average and badly on edges. Here are the patterns where I now hesitate before quoting a TRUST-D score with confidence.

Hyperlocal directories that outperform their scores

Small, geographically-bounded directories — parish councils, business improvement districts, chamber-of-commerce listings — frequently overperform. The reason appears to be a combination of high disambiguation clarity (one listing per business, no duplicates), strong domain trust signals (often .gov or .org), and Wikipedia citations from the locality’s article. None of those individually break the framework, but combined they create a multiplier effect that simple summing misses.

If I were rebuilding TRUST-D from scratch, I’d probably make domain trust a multiplicative rather than additive factor. For now, just flag it as a manual override.
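As a sketch of what that override might look like, with multiplier values that are illustrative guesses rather than measured weights:

```python
# Hypothetical trust-tier multipliers; longest suffix wins so that
# .gov.uk is matched before the bare .gov entry
TRUST_TIER_MULTIPLIER = {
    ".gov.uk": 1.5, ".gov": 1.4, ".edu": 1.3, ".ac.uk": 1.3, ".org": 1.1,
}

def adjusted_composite(composite, domain):
    """Scale an additive TRUST-D composite by the domain's trust tier."""
    for suffix, mult in sorted(TRUST_TIER_MULTIPLIER.items(),
                               key=lambda kv: -len(kv[0])):
        if domain.endswith(suffix):
            return composite * mult
    return composite * 1.0

# A low-scoring council list can leapfrog a stronger commercial directory
print(adjusted_composite(18, "trades.camden.gov.uk"))    # 27.0
print(adjusted_composite(24, "plumberdirectory.co.uk"))  # 24.0
```

The domain names here are made up, but the shape of the fix is the point: a multiplier lets a .gov.uk register beat a commercial site that outscores it additively, which is exactly what the emergency-plumbing failure showed.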

Paywalled sources and retrieval bias

Perplexity and ChatGPT’s browsing modes both struggle with paywalled content. They can sometimes synthesise from preview text, but they downweight sources where the full content isn’t accessible. This creates a perverse incentive: directories that gate listing details behind sign-up walls (looking at you, certain B2B databases) score poorly on TRUST-D’s retrieval-side signals even when their topical density and authority are outstanding.

The fix isn’t to give everything away. It’s to make the unpaywalled tier rich enough — listing name, category, location, brief description, structured data — that the crawler has something substantial to cite, with the paywall protecting deeper analysis.

When to override the framework’s recommendation

Three situations where I tell clients to ignore TRUST-D’s verdict:

First, when a directory has been recently acquired or rebranded. The training corpus reflects the old entity; the live web reflects the new one. Models will be confused for 12–18 months. Don’t make important decisions during this transition based on either pre-training or retrieval signals alone.

Second, when targeting a vertical undergoing rapid regulatory change (legal cannabis, crypto, AI compliance). Models trained even six months ago carry stale assumptions about what’s authoritative. Newer, smaller directories that capture current regulatory state often outperform older incumbents.

Third, when the queries you care about are conversational rather than transactional. As Tom’s Guide demonstrated in their five-prompt comparison, ChatGPT’s strength on emotionally-laden queries means it often skips directories entirely and answers from synthesised internal knowledge. No amount of TRUST-D optimisation will get you cited if the model decides the question doesn’t warrant a directory consultation.

Myth: Perplexity and ChatGPT use the same trust signals. Reality: They have meaningful overlap, but Perplexity weights live retrievability more heavily while ChatGPT (without browsing) weights pre-training corpus presence almost exclusively. A directory can dominate one and be invisible in the other. G2’s testing found that despite their different positioning, “there’s a lot of overlap in their actual capabilities” — but the overlap is in capability, not in source selection.

The framework will need rebuilding within two years

I don’t claim TRUST-D is permanent. The infrastructure of AI retrieval is changing fast — agentic search, MCP servers, direct API access to canonical data sources, the slow death of the open web as we knew it. The signals that matter today will shift. TechPoint Africa’s testing already shows that “Perplexity often edges out in technical accuracy and concise responses” — a divergence that suggests the two systems are pulling apart rather than converging.

What I’m confident will persist: the underlying question of “which sources should a model trust for which queries” is going to matter more, not less. The directories that win the next decade will be the ones that built machine-readable, topically-coherent, persistently-addressed listings while everyone else was still optimising for human eyeballs and Google’s tenth-generation algorithm.

Score your directory honestly this quarter. Test against the four major models. Fix the lowest-scoring TRUST-D dimension first. Then test again in eight weeks. The work is unglamorous and the feedback loop is slow — but right now, while most operators are still pretending nothing has changed, the cost of getting this right is the lowest it will ever be.

Author:
With over 15 years of experience in marketing, particularly in the SEO sector, Gombos Atila Robert holds a Bachelor’s degree in Marketing from Babeș-Bolyai University (Cluj-Napoca, Romania) and obtained his bachelor’s, master’s and doctorate (PhD) in Visual Arts from the West University of Timișoara, Romania. He is a member of UAP Romania, CCAVC at the Faculty of Arts and Design and, since 2009, CEO of Jasmine Business Directory (D-U-N-S: 10-276-4189). In 2019, he founded the scientific journal “Arta și Artiști Vizuali” (Art and Visual Artists) (ISSN: 2734-6196).
