If a generative model has to choose between citing a brand’s own website and citing its Yelp page, why does it so often pick the directory?
That question hangs over almost every conversation about discoverability in 2025. Marketing teams have spent two decades building content hubs, schema-rich landing pages, and pillar architectures designed to please Google’s crawler. Then a model trained on web-scale text and structured data started answering customer queries — and the citations that surfaced did not match the assets the brand had spent the most money producing. They came from listings. Aggregator pages. Map data. Review platforms. The expensive thought-leadership PDFs sat unread in the corner.
This article examines the evidence behind that pattern and isolates four authority signals that large language models appear to weight when reading business listings: citation consistency, review velocity and sentiment density, category depth and entity linking, and photo and media authority. The argument is empirical rather than rhetorical — what the data suggests, what it does not, and where practitioners are still flying blind.
The 73% Citation Gap Nobody Talks About
Across a sample of branded queries posed to consumer-grade generative assistants in mid-2024, the proportion of responses that cited a third-party listing or directory page rather than the brand’s own domain hovered around 73%. That is not a published academic number; it is a recurring observation from agency-side audits and a figure that should be treated with appropriate caution. But the directional pattern shows up everywhere practitioners look. When a model needs to ground a factual claim about a local business — its hours, its specialism, its reputation — it tends to reach for a structured listing, not a marketing page.
The interesting question is not whether the gap exists. It is why models behave this way, and which signals inside a listing actually move the needle.
How AI Models Weight Listing Signals
Large language models do not “weight” anything in the way a search algorithm weights backlinks. They learn statistical associations from training corpora and, increasingly, retrieve grounded snippets at inference time from indexes their providers have curated. Two mechanisms are relevant here. First, repetition: a fact that appears in twenty consistent forms across a training corpus becomes a high-confidence association. Second, structural cleanliness: information served as JSON-LD, microdata, or in tabular form is easier to parse than prose buried in a hero section.
Listings tend to win on both counts. A canonical Google Business Profile is republished, scraped, syndicated, and re-presented across hundreds of downstream surfaces. Each republication reinforces the same name, address, phone number, and category. Meanwhile, the brand’s own “About” page may exist in exactly one place, with phrasing that varies between the homepage, the footer, and the contact page.
As Douglas Merrill argued in Harvard Business Review (2012), the practical work of reading signals in data is less about exotic techniques and more about disciplined attention to which inputs are reliable and which are noisy. Models, in their crude way, are doing exactly this: they trust inputs that agree with one another.
Measuring Authority Across Platforms
Authority, in the listing context, is a composite. It is partly the consistency of the underlying record across platforms; partly the volume and pattern of public engagement (reviews, photos, questions); partly the structural metadata that connects the entity to a known graph of concepts; and partly the freshness of the whole assembly. No single platform expresses authority cleanly. A 4.9-star average on Google means little if the same business appears on TripAdvisor with a misspelled name, a different phone number, and three reviews from 2019.
Measuring authority across platforms therefore requires triangulation. Practitioners who only look at their primary listing will miss the way a model assembles its picture from the cumulative agreement — or disagreement — among sources.
Why Listings Outrank Websites in LLM Training Data
Three structural reasons explain the asymmetry. Listings are densely linked: each profile sits inside a category taxonomy, a geographic hierarchy, and a review network, giving the model multiple paths to the same entity. Listings are deduplicated by aggregators: data providers like Foursquare, Factual (now part of Foursquare), and Localeze normalise records before redistribution, which means a model often sees a clean canonical version. And listings carry implicit verification signals — claimed status, owner replies, photo uploads — that resemble the kind of provenance markers a retrieval system would actively prefer.
By contrast, brand websites are messy. They use bespoke navigation, inconsistent schema, and content that frequently contradicts itself across product, marketing, and legal pages. A model parsing a corporate site is doing harder work for a noisier reward.
The Sample Set Behind These Findings
The figures discussed throughout this article draw on three pools of evidence. The first is the published academic and industry literature cited inline — Harvard Business Review, Forrester, eMarketer, MIT Sloan Management Review, Deloitte and Brookings reporting that bears, sometimes obliquely, on signal interpretation in digital systems. The second is a set of agency-side audits, where SEO teams have documented before-and-after changes in citation frequency by ChatGPT, Perplexity, Gemini, and Claude after structured listing remediation. The third is platform-published data: Google’s own disclosures about how Business Profiles are used, Apple Maps’ developer documentation, and the public schema specifications that govern how structured data is exchanged.
None of these pools is, on its own, sufficient. Read together, they sketch a coherent picture — but a sketch, not a photograph. Where evidence is strong, this article says so. Where it is suggestive, the article says that too.
Strong Versus Weak Evidence Categories
Strong evidence, in this context, means a finding that has been measured across multiple independent samples, reproduced by different teams, and reported with consistent magnitudes. The link between NAP consistency and local pack ranking, for example, has been replicated for over a decade. Weak evidence means a finding that has been observed once or twice, in narrow conditions, or reported by a vendor with a commercial interest in the result. The claim that “video listings get 3x more AI citations” sits firmly in the weak category until somebody runs a controlled experiment.
Practitioners should hold both kinds of evidence in mind simultaneously. The temptation is to act only on the strong stuff and ignore the weak. But weak digital signals, as the MIT Sloan Management Review has noted, often precede strong ones; the cost of ignoring them entirely is missing the next shift.
Signal One: Citation Consistency Across Directories
NAP Variance and Trust Decay
Name, address, phone — NAP — is the oldest local SEO signal and, as the data suggests, the one large language models have absorbed most thoroughly. When the same triple appears identically across dozens of listings, a model treats it as a high-confidence fact. When the triple varies — “Smith & Co Ltd” versus “Smith and Company Limited” versus “Smith Co” — the model’s confidence decays. In retrieval-augmented systems, decayed confidence often manifests as omission rather than error: the model declines to cite the entity at all.
One agency audit of a 240-location restaurant group found that locations with NAP consistency scores above 95% were cited by ChatGPT in 62% of relevant queries; locations with consistency below 70% were cited in 11%. The numbers are not generalisable beyond the sample, but the gradient is striking and consistent with what local SEO professionals have measured against Google’s own ranking systems for years.
Cross-Platform Match Rates by Industry
Match rates vary by industry in ways that surprise people. Restaurants and hospitality businesses tend to have high match rates because their data is aggressively syndicated by reservation platforms. Professional services — law firms, accountants, consultants — have notoriously low match rates because partners join, leave, and rebrand without updating every listing. Healthcare sits in between, complicated by the fact that practitioner records and practice records often live in separate hierarchies.
Multi-location retailers have a further wrinkle: corporate marketing teams often push canonical data from a master record, while individual store managers update their own profiles inconsistently. The resulting variance is not random; it follows the org chart.
Threshold Where AI Stops Citing You
There is no public threshold disclosed by any model provider. But empirical observation suggests a soft cliff somewhere in the 70–80% consistency range. Below it, citation frequency drops sharply. Above 90%, the marginal returns of further consistency improvements appear to flatten. This is consistent with how confidence thresholds behave in many machine learning contexts: a sigmoid curve, not a linear ramp.
Practitioners chasing the last percentage point of consistency may be over-investing. The first jump, from 60% to 85%, is where the meaningful gains live.
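The sigmoid shape described above can be illustrated with a logistic function. This is a minimal sketch under loudly stated assumptions. The midpoint (0.75) and steepness (25) are hypothetical values, chosen only to place the cliff in the 70–80% band. No model provider publishes such a curve, and the output is not a measured citation rate.

```python
import math

# Illustrative logistic (sigmoid) curve relating NAP consistency to citation
# likelihood. MIDPOINT and STEEPNESS are HYPOTHETICAL values picked to put the
# "soft cliff" in the 70-80% band; no provider discloses real parameters.
MIDPOINT = 0.75   # consistency score at which likelihood is 50%
STEEPNESS = 25.0  # how sharply the curve rises around the midpoint


def citation_likelihood(consistency: float) -> float:
    """Map a 0-1 consistency score to an illustrative 0-1 likelihood."""
    return 1.0 / (1.0 + math.exp(-STEEPNESS * (consistency - MIDPOINT)))


if __name__ == "__main__":
    first_jump = citation_likelihood(0.85) - citation_likelihood(0.60)
    last_points = citation_likelihood(1.00) - citation_likelihood(0.90)
    print(f"60% -> 85%: +{first_jump:.2f}")
    print(f"90% -> 100%: +{last_points:.2f}")
```

Under these assumed parameters, moving from 60% to 85% raises the illustrative likelihood by about 0.90, while moving from 90% to 100% adds only about 0.02. That is the diminishing-returns pattern the text describes.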
Measured Impact on ChatGPT Mentions
In one widely circulated case study from a regional dental group, remediation of NAP variance across 47 listings, which took the consistency score from 64% to 91% over six weeks, corresponded with a roughly fourfold increase in ChatGPT mentions for branded and category queries. The case is anecdotal and uncontrolled. It is not proof. But it sits alongside enough similar reports to constitute a pattern worth taking seriously.
Forrester research on buying signals, while focused on B2B contexts, observes that buyers now expect providers to “use information to deliver relevant experiences.” The same logic applies to models: when the data is internally consistent, the system has something to deliver.
Auditing Your Current Consistency Score
An audit begins with a fixed canonical record: the version of the business name, address, and phone that the organisation considers official. Every listing is then compared character by character against this canonical record. Tools like Yext, BrightLocal, Whitespark, and Moz Local automate this. The resulting score is rarely as high as the marketing team expects. It is common to find that 30–40% of listings carry some form of variance, often introduced by aggregators that reformat data during ingestion.
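The scoring step of the audit can be sketched in a few lines. This is a minimal illustration, not how Yext, BrightLocal, Whitespark, or Moz Local compute their scores. It assumes that the `NapRecord` type and the function names are hypothetical, that a listing counts as consistent only if all three fields match exactly, and that only leading and trailing whitespace is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NapRecord:
    """Name, address and phone exactly as they appear on one listing (hypothetical type)."""
    name: str
    address: str
    phone: str


def _fields(record: NapRecord) -> tuple[str, str, str]:
    # Only surrounding whitespace is ignored; everything else is compared verbatim.
    return (record.name.strip(), record.address.strip(), record.phone.strip())


def consistency_score(canonical: NapRecord, listings: list[NapRecord]) -> float:
    """Share of listings whose NAP matches the canonical record character by character."""
    if not listings:
        return 0.0
    target = _fields(canonical)
    matches = sum(1 for listing in listings if _fields(listing) == target)
    return matches / len(listings)


def mismatched_fields(canonical: NapRecord, listing: NapRecord) -> list[str]:
    """Names of the NAP fields on one listing that differ from the canonical record."""
    return [
        field
        for field in ("name", "address", "phone")
        if getattr(listing, field).strip() != getattr(canonical, field).strip()
    ]
```

Because the comparison is exact, "Smith & Co Ltd" and "Smith and Company Limited" register as a name mismatch. That is deliberate: the audit exists to surface exactly that kind of variance before remediation.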
The remediation is laborious but mechanical. The pay-off, in citation frequency, tends to materialise within a few weeks of platforms re-indexing the corrected records.
Signal Two: Review Velocity and Sentiment Density
Volume, Recency and Semantic Patterns
Review signals are richer than star averages. Models appear to read three things from a review corpus: how many reviews exist, how recently they were posted, and what semantic patterns they contain. A business with 400 reviews averaging 4.6 stars, all posted between 2017 and 2020, looks dead. A business with 80 reviews averaging 4.4 stars, with a steady cadence of new entries every month, looks alive — and “alive” is what models implicitly prefer when they ground a recommendation.
Semantic density refers to the specificity and variety of language across the review set. A corpus where 200 reviews all say “great service” carries less informational weight than a corpus where reviews mention specific menu items, named staff members, particular procedures, or distinctive aspects of the experience. The latter gives a model substrate for grounded answers; the former gives it nothing to say beyond a numerical average.
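One crude way to make the contrast above concrete is a lexical-diversity proxy: the ratio of distinct words to total words across the review set (a type-token ratio). This is a swapped-in illustrative measure. Nothing suggests any model computes it, and the function name is hypothetical.

```python
import re


def distinct_term_ratio(reviews: list[str]) -> float:
    """Distinct lower-cased words divided by total words across all reviews.

    A type-token ratio, used here only as an illustrative proxy for
    "semantic density"; it is not a metric any model is known to compute.
    """
    words = [w for review in reviews for w in re.findall(r"[a-z']+", review.lower())]
    if not words:
        return 0.0
    return len(set(words)) / len(words)
```

Two hundred copies of "great service" score 2/400, close to zero. A handful of reviews naming dishes, staff, and occasions scores far higher. The ratio naturally falls as a corpus grows, so it is only meaningful when comparing review sets of similar size.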
The eMarketer reporting on signal loss is instructive here, even though it concerns advertising rather than local search. According to eMarketer, 89% of US brand and agency buyers reported in February 2024 that personalisation tactics had been somewhat or significantly impacted by privacy legislation and signal loss. The tactical implication is that first-party signals — including reviews, which customers volunteer publicly — have become disproportionately valuable as third-party signals erode. Reviews are one of the few channels where rich, identifiable, semantically dense customer data remains both legal and abundant.
Velocity matters because models trained or fine-tuned on rolling corpora favour recent data. A review posted last week is more likely to surface in a grounded response than one from three years ago, even if the older review is more articulate. This creates a structural reason for businesses to treat review acquisition as a continuous operational discipline rather than an episodic campaign.
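Recency can be expressed as a decay-weighted review count, in which each review's weight halves after a fixed interval. This is a minimal sketch. The 180-day half-life and the 90-day window are hypothetical tuning values, not published figures, and the function names are illustrative.

```python
from datetime import date


def recency_weighted_count(review_dates: list[date], today: date,
                           half_life_days: float = 180.0) -> float:
    """Sum of review weights, each halving every `half_life_days` (hypothetical value)."""
    return sum(
        0.5 ** ((today - d).days / half_life_days)
        for d in review_dates
        if d <= today  # ignore dates in the future
    )


def reviews_in_last(review_dates: list[date], today: date, days: int = 90) -> int:
    """Count of reviews posted within the trailing window."""
    return sum(1 for d in review_dates if 0 <= (today - d).days <= days)
```

Applied to the two businesses described earlier, 400 reviews posted between 2017 and 2020 contribute almost nothing under this weighting. Eighty reviews spread evenly up to the present dominate, which matches the "dead versus alive" intuition.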
Sentiment patterns matter because models are increasingly capable of distinguishing between superficially positive language (“nice place”) and substantively positive language (“the consultant explained the warranty exclusions clearly before I signed”). The latter contains evidence; the former contains affect. When models cite reviews to justify a recommendation, they tend to quote the evidence-bearing fragments.
One quietly important point about the semantic dimension: review responses by the business itself contribute to the corpus. A thoughtful owner reply that re-states the service offered, acknowledges a specific complaint, and references a resolution adds entity-relevant text to the listing. That text is read alongside the customer’s review. Generic “Thank you for your feedback!” replies waste the surface area.
Forrester’s observation, in its work on buying signals, that “linear tactic sequences” no longer suffice and that buyers’ needs are “in constant motion” applies neatly to the review surface. Reviews are not a static asset. They are an ongoing conversation between the business, its customers, and now the models that read both sides.
Signal Three: Category Depth and Entity Linking
Primary Versus Secondary Category Weighting
Category selection looks trivial. It is not. The primary category of a Google Business Profile or Apple Maps listing functions as the principal entity-type assignment that downstream systems inherit. A “law firm” classified as “law firm” is generic. A “law firm” classified as “intellectual property law firm” is specific, and that specificity propagates into the way models answer queries about IP law in a given city.
Google Business Profile Category Data
Google publishes roughly 4,000 categories, of which a meaningful subset are useful for any given business. Practitioners commonly choose a primary category that is too broad: “Restaurant” rather than “Vietnamese restaurant”; “Doctor” rather than “Endocrinologist”. The cost of broadness is that the listing competes in a crowded entity class with weak distinguishing features. Models faced with the query “best Vietnamese restaurant in Bristol” tend to favour the listings whose primary category matches exactly.
Secondary categories provide additional entity links but carry less weight. They are useful for capturing legitimate adjacencies — a restaurant that genuinely also functions as a wine bar — but should not be used to claim coverage the business does not actually deliver. Models cross-check secondary category claims against review content and photo content; mismatches dampen authority rather than enhancing it.
Apple Maps Category Comparisons
Apple Maps uses a different, smaller category taxonomy with somewhat different semantic boundaries. A business should not assume that the categories chosen on Google translate one-to-one. Apple’s taxonomy tends to be more rigid, which has the effect of producing cleaner entity assignments at the cost of finer granularity. For models that ingest Apple’s data — and there is reason to believe several major providers do — the primary category on Apple Maps is consequential even when traffic from Apple Maps itself is modest.
Entity Disambiguation Through Categories
Categories do work that names cannot. Two businesses called “Atlas Consulting” can only be distinguished by what they do, where they are, and how they are categorised. Without category data, a model has no reliable way to know which Atlas the user is asking about. Category depth — the specificity of the assignment — is therefore a disambiguation signal as much as a relevance signal.
This matters most for businesses whose names are common nouns or phrases already associated with other entities. A bakery called “The Daily Bread” needs aggressive category specificity to avoid being conflated with the religious phrase. A consultancy called “Compass” needs disambiguation to avoid the cartographic association.
Knowledge Graph Connections
Wikidata Cross-References
Wikidata is, for better or worse, one of the most influential structured data sources informing how models disambiguate entities. A Wikidata entry for a business, with properties linking it to its parent company, founders, location, and industry codes, provides a rich anchor that models can resolve queries against. Most local businesses do not have Wikidata entries, and those that do often have incomplete ones. Investing in a properly populated Wikidata record — particularly the P-properties for industry classification, headquarters location, and parent organisation — is a low-cost, high-impact move.
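A quick completeness check on an existing Wikidata record can be scripted against the entity JSON that Wikidata serves (for example, `Special:EntityData/<QID>.json`, where statements sit under `claims`, keyed by property ID). This is a minimal sketch. It assumes the JSON has already been downloaded and parsed, it only checks whether statements exist, and the function name is hypothetical. P452 (industry), P159 (headquarters location), and P749 (parent organization) are the Wikidata properties that correspond to the fields named above.

```python
# Wikidata property IDs for the fields highlighted in the text.
KEY_PROPERTIES = {
    "P452": "industry",
    "P159": "headquarters location",
    "P749": "parent organization",
}


def missing_key_properties(entity: dict) -> list[str]:
    """Labels of key properties that have no statements on the entity.

    `entity` is one entity object from Wikidata's JSON output, e.g. the value
    at data["entities"][qid]. No network access happens here; fetching and
    parsing the JSON is left to the caller.
    """
    claims = entity.get("claims", {})
    return [label for pid, label in KEY_PROPERTIES.items() if not claims.get(pid)]
```

An empty result only means the statements exist. It says nothing about whether their values are correct, which still needs a human review.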
Schema.org Structured Data Pickup
Schema.org markup on the brand’s own website extends and reinforces the listing data. The most relevant types for a local business are LocalBusiness (with its many subtypes), Organization, Place, and Review. When the schema on the website agrees with the data in the listings, the model has a coherent entity to bind to. When the schema is missing or inconsistent, the website becomes a weak input rather than a reinforcing one.
Particular fields that practitioners under-use include sameAs (which explicitly links the entity to its profiles on social platforms, Wikipedia, Wikidata, and industry directories), areaServed (which constrains geographic relevance), and knowsAbout (which signals topical authority). Each of these fields is read by structured data parsers and contributes to how the model resolves the entity.
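The fields above can be generated as a JSON-LD block for embedding in a `<script type="application/ld+json">` tag. This is a minimal sketch. `LocalBusiness`, `PostalAddress`, `sameAs`, `areaServed`, and `knowsAbout` are real schema.org terms, but the helper function and every value in the usage test (business name, postcode, phone, URLs) are hypothetical. The NAP values passed in should be the same canonical record used in the listing audit.

```python
import json


def local_business_jsonld(name: str, street: str, locality: str, postcode: str,
                          country: str, phone: str, same_as: list[str],
                          area_served: list[str], knows_about: list[str],
                          business_type: str = "LocalBusiness") -> str:
    """Build schema.org JSON-LD whose NAP should mirror the canonical listing record."""
    data = {
        "@context": "https://schema.org",
        "@type": business_type,  # a LocalBusiness subtype where one fits
        "name": name,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": locality,
            "postalCode": postcode,
            "addressCountry": country,
        },
        "telephone": phone,
        "sameAs": same_as,            # profiles on other platforms, Wikidata, etc.
        "areaServed": area_served,    # constrains geographic relevance
        "knowsAbout": knows_about,    # signals topical authority
    }
    return json.dumps(data, indent=2, ensure_ascii=False)
```

Generating the markup from the canonical record, rather than hand-editing it per page, is one way to keep the website's schema and the listings from drifting apart.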
Measured Lift From Category Refinement
Before-and-After Citation Frequency
In the cleanest case studies — and clean cases are rare in this domain — category refinement alone has produced measurable lift in citation frequency. One regional veterinary chain that moved from a generic “Veterinarian” primary category to “Emergency veterinarian service” on the locations that genuinely offered 24-hour care reported a roughly threefold increase in mentions when prompts contained urgency language (“my dog ate something now”). The refinement worked because it aligned the listing’s claimed entity type with the semantic shape of the query.
Statistical Significance of Results
Most before-and-after studies in this space lack the controls needed to claim statistical significance in a formal sense. The samples are small. The interventions are not isolated; teams typically refine categories and update photos and prompt for reviews simultaneously. Confounders abound. What can be said is that the direction of effect is consistent across many such reports, and that the magnitude is large enough to dwarf the noise floor in most cases.
Practitioners should be cautious about citing a specific multiplier as gospel. The honest claim is: refining category depth tends to increase citation frequency, often substantially, and the cost of doing it is low.
Signal Four: Photo and Media Authority
Image Volume and Update Cadence
Owner-Uploaded Versus User Photos
Photos function as both content and verification. A listing with a hundred owner-uploaded photos, updated monthly, signals an active, present operator. A listing with three photos from 2018 signals a dormant entity, regardless of whether the business is in fact thriving. Models, like search algorithms before them, treat the photo cadence as a liveness check.
The split between owner-uploaded and user-uploaded photos is informative in a different way. Owner photos tell the model what the business says about itself; user photos tell the model what customers actually see. Disagreement between the two — slick studio shots from the owner versus grainy customer shots showing a different interior — is a corroboration failure that models can and do detect.
Geotagged Image Verification
Geotagged photos, where the EXIF location matches the listed address within a reasonable margin, function as a quiet verification signal. Most smartphones embed location metadata in photos when location access is enabled for the camera. When a business uploads photos taken at the listed location, the metadata corroborates the address. When the photos were taken somewhere else, such as agency or stock shots from a different city, the metadata tells a different story. In practice, most platforms strip EXIF data on upload, which mutes this signal; where the data survives, it informs verification.
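"Within a reasonable margin" can be made concrete with a great-circle (haversine) distance check. This is a minimal sketch under several assumptions: the EXIF GPS values have already been converted from degrees, minutes, and seconds into signed decimal degrees; the listing's coordinates are known; and the 150-metre margin is a hypothetical threshold.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def photo_corroborates_address(photo_lat: float, photo_lon: float,
                               listing_lat: float, listing_lon: float,
                               margin_m: float = 150.0) -> bool:
    """True if the photo's GPS position lies within `margin_m` (hypothetical) of the listing."""
    return haversine_m(photo_lat, photo_lon, listing_lat, listing_lon) <= margin_m
```

A photo shot a few dozen metres from the premises passes. One geotagged in a city 170 km away fails, which is the corroboration gap described above.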
Monthly Upload Benchmarks
There is no published benchmark for monthly photo uploads, but informal industry guidance clusters around the idea that two to four owner photos per month, plus an active user-upload stream, is enough to maintain a “fresh” signal. Fewer than one upload per month begins to read as inactive. More than ten per month is fine but produces diminishing returns; the ceiling is set by the model’s interest in photographic novelty, which is finite.
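The informal bands above can be turned into a simple monitoring check. This is a minimal sketch. The six-month window, the 30-day month, the function names, and the label for the 1–2 uploads range (which the guidance does not address) are all hypothetical choices.

```python
from datetime import date


def uploads_per_month(upload_dates: list[date], today: date, months: int = 6) -> float:
    """Average owner uploads per month over a trailing window of 30-day months."""
    window_days = 30 * months
    recent = [d for d in upload_dates if 0 <= (today - d).days < window_days]
    return len(recent) / months


def cadence_label(per_month: float) -> str:
    """Map an upload rate onto the informal bands described in the text."""
    if per_month < 1:
        return "reads as inactive"
    if per_month < 2:
        return "below the informal 2-4 guidance"
    if per_month <= 10:
        return "fresh"
    return "fresh, diminishing returns"
```

Averaging over several months smooths out the natural lumpiness of uploads, so a single quiet month does not trigger a false "inactive" flag.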
Alt Text and EXIF Data Reading
What Vision Models Extract
Modern vision-language models extract a surprisingly rich set of features from photographs. Object recognition identifies what is in the frame: dishes, products, people, equipment, signage. Scene classification labels the setting: interior, exterior, retail, medical, hospitality. Text recognition reads visible signs, menus, and product labels. Aesthetic scoring approximates how pleasant the image is to look at. Each of these features can become a token in the entity’s representation.
The implication is that a listing’s photographic content is, in effect, additional unstructured text from the model’s point of view. A photo of a menu adds menu items to the entity’s profile. A photo of staff in scrubs adds healthcare context. A photo of a dusty unused corner adds nothing — or worse, adds confusion.
Metadata Fields That Matter
Where platforms preserve metadata, the fields that matter are the timestamp (which establishes recency), the location coordinates (which corroborate the address), the camera model (which is a weak liveness signal — a phone is more credible than a 2009 DSLR for current operations), and any embedded captions or descriptions. Most of this is invisible to humans browsing the listing. It is read by machines.
Video Presence Across Listings
YouTube Channel Linkage
A business with a linked YouTube channel — referenced in the website’s schema sameAs property and verified through ownership — gains a connection to a content-rich, well-indexed entity graph. YouTube is exhaustively scraped, transcribed, and indexed by many of the same systems that train language models. Linking a YouTube channel that contains genuine business content (procedures explained, products demonstrated, premises toured) extends the model’s ability to ground claims about the business.
Embedded Video Performance
Videos embedded directly into listings, where the platform supports it, behave similarly to photos but with richer payload. A 30-second walkthrough of the premises functions as an immersive verification: the model can extract the layout, the signage, the product range, the apparent activity level. Performance metrics on these embeds — views, completion rates — are less consequential than the existence of the asset.
Caption and Transcript Signals
Captions and transcripts are where video becomes legible to language models. A video without a transcript is, to the model, just a set of frames sampled at intervals. A video with a clean transcript is text — searchable, embeddable, citable. Closed captions on YouTube, transcripts in podcast feeds, and subtitle files attached to social video are unglamorous infrastructure that disproportionately move the model’s understanding of the entity.
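Subtitle files are the easiest way to see why transcripts matter: once the cue numbers and timings are removed, what remains is plain, citable text. This is a minimal sketch for SubRip (`.srt`) files. It assumes well-formed cues and a hypothetical function name, and it would drop a caption line consisting only of digits.

```python
import re

# Matches SubRip cue timing lines such as "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}")


def srt_to_text(srt: str) -> str:
    """Strip cue numbers and timings from an .srt file, keeping the spoken words."""
    kept = []
    for raw in srt.lstrip("\ufeff").splitlines():  # drop a leading byte-order mark
        line = raw.strip()
        if not line or line.isdigit() or TIMING.match(line):
            continue  # blank separator, cue number, or timing line
        kept.append(line)
    return " ".join(kept)
```

The output is the same kind of text a language model can index and quote, which is the point the paragraph above makes about captions.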
Comparative Data Table Across Four Signals
To bring the four signals into a single view, Table 1 sets out, for each signal and its component parts, the strength of evidence, typical audit effort, time to measurable impact, and cost profile, as observed across published case material and agency-side audits.
Table 1: Comparative profile of the four authority signals AI models read from business listings
| Signal | Evidence strength | Typical audit effort | Time to measurable impact | Cost profile |
|---|---|---|---|---|
| NAP consistency across directories | Strong; replicated for over a decade | Low (automated tooling) | 2–6 weeks | Low — mostly tooling and labour |
| Aggregator data syndication | Strong; documented by data providers | Medium | 4–12 weeks | Medium — provider fees |
| Review volume | Strong; correlates with citation frequency | Medium | 3–6 months | Medium — outreach systems |
| Review recency | Strong; recency decay observable | Low | Continuous | Low if integrated into operations |
| Review semantic density | Moderate; harder to engineer ethically | High | 6–12 months | Medium — staff training |
| Owner reply quality on reviews | Moderate; under-studied | Low | 2–4 weeks | Low — operational discipline |
| Primary category specificity | Strong; aligns with query semantics | Low | 4–8 weeks | Negligible |
| Secondary category breadth | Moderate; diminishing returns | Low | 4–8 weeks | Negligible |
| Wikidata entity entry | Moderate; rare but high impact | Medium | 3–6 months | Low |
| Schema.org structured data on website | Strong; long-established | Medium | 4–12 weeks | Low to medium |
| Owner-uploaded photo cadence | Moderate; correlates with liveness | Low | 1–3 months | Low |
| User-uploaded photo volume | Moderate; harder to influence | Low to monitor | Continuous | Low |
| Geotagged image metadata | Weak in practice; often stripped | Medium | Variable | Low |
| Linked YouTube channel | Moderate; depends on content | High | 3–9 months | Medium to high |
| Video transcripts and captions | Moderate; underused | Medium | 1–3 months | Low |
| Embedded video on listings | Weak; platform-dependent | Medium | Variable | Medium |
| Cross-platform sameAs linkage | Strong; explicit graph signal | Low | 4–8 weeks | Negligible |
The shape of the table matters as much as any individual cell. Strong evidence clusters around the structural signals — consistency, categorisation, schema, sameAs — while the media-related signals trail behind in evidentiary clarity. This does not mean media signals are unimportant; it means the field has not yet measured them as carefully.
Correlating Media Depth With AI Citations
Across the case material reviewed for this article, businesses in the top quartile of media depth — measured as a composite of photo count, photo recency, video presence, and transcript availability — were cited more frequently in generative answers than businesses in the bottom quartile. The correlation is real but the causal direction is unclear. Businesses that invest in media depth also tend to invest in everything else; isolating the contribution of media specifically requires controls that almost no published study has imposed.
Practitioners should treat media depth as a probable contributor rather than a proven driver. The cost of investing in it is moderate, and the secondary benefits — engagement on social platforms, content for paid campaigns, training material for staff — are substantial whether or not the AI citation effect materialises.
Weak Signals That Look Strong
Vanity Metrics to Ignore
The list of metrics that look impressive in slide decks but appear to do little for AI citation frequency is long and uncomfortable. Profile views, photo views, and direction requests on Google Business Profile are useful for understanding consumer behaviour but do not, on the available evidence, correlate cleanly with how often a model cites the listing. Star average, taken in isolation from review volume and recency, is similarly limited; a 5.0 average from twelve reviews is weaker than a 4.4 average from four hundred.
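The point about twelve perfect reviews versus four hundred good ones can be shown with a Bayesian (shrinkage) average, which pulls small samples toward a prior mean. This is a swapped-in illustration of why volume matters, not a claim about how any model scores ratings. The prior mean of 4.0 and prior weight of 50 are hypothetical.

```python
def bayesian_average(mean_rating: float, review_count: int,
                     prior_mean: float = 4.0, prior_weight: int = 50) -> float:
    """Shrink a star average toward a (hypothetical) prior until reviews accumulate."""
    total = prior_weight + review_count
    return (prior_mean * prior_weight + mean_rating * review_count) / total
```

With these assumptions, a 5.0 average from twelve reviews shrinks to about 4.19, while a 4.4 average from four hundred reviews holds at about 4.36. Once volume is accounted for, the larger review set ranks higher.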
Follower counts on social platforms tied to the listing are another vanity metric. A million followers on a moribund Facebook page contributes little. Engagement rate is closer to meaningful, but only when the engagement contains substantive text — comments that reference the business, not “🔥🔥🔥”.
Misleading Engagement Numbers
Engagement numbers reported by social and listing platforms are typically constructed to flatter the platform’s own narrative about its value. A “view” can mean a fraction of a second of a feed scroll. A “like” can be reflexive. These numbers are not lies, but they are not the signals models read. Models read what is written, what is photographed, what is structured, and what is consistent.
The MIT Sloan Management Review’s framing of weak digital signals is relevant in reverse here: just as practitioners can miss strong signals embedded in weak data, they can also be misled by weak signals dressed up as strong ones. The remedy is the same: disciplined attention to what the data actually supports.
Platform Reach Without Authority
A presence on every platform is not the same as authority on any platform. Many businesses spread thin across Yelp, TripAdvisor, Foursquare, Bing Places, Apple Maps, Yellow Pages, Trustpilot, Glassdoor, and a dozen industry-specific sites. The result is dozens of half-maintained profiles, each adding noise rather than signal. A focused presence on the four or five platforms that genuinely contribute to the model’s understanding of the entity outperforms a sprawling presence across twenty.
The selection should be made on the basis of which platforms feed the data ecosystems that models actually train on or retrieve from. For most businesses in English-speaking markets, that core set includes Google Business Profile, Apple Maps, the dominant industry-specific platform (Yelp for restaurants, Healthgrades for medical, Avvo for legal), and one or two well-curated general business platforms. For organisations interested in the methodology behind that selection, a published examination of how curated listings differ from automated aggregations offers a useful reference point.
What The Data Says Practitioners Should Do
Prioritising the Four Signals by ROI
Quick Wins Within 30 Days
The fastest gains come from NAP remediation and category refinement. Both can be executed within 30 days for most businesses, both rely on existing tools and platforms, and both produce measurable changes in citation frequency in the following 4–8 weeks. The order of operations matters: fix consistency first, because category changes propagate through inconsistent records imperfectly. Then refine categories. Then add or correct sameAs links.
A reasonable 30-day plan: week one, audit consistency and identify the canonical record. Week two, push corrections through the primary platforms and aggregators. Week three, refine primary and secondary categories. Week four, audit and update schema on the brand’s website to align with the corrected listings. The work is unglamorous. It is also the highest-ROI work available in this space.
Long-Term Authority Building
The medium- and long-term work concerns review velocity, media depth, and entity graph integration. Reviews require a sustained operational discipline: a clearly assigned owner, an integration with the customer journey (transactional emails, post-service prompts), and a reply protocol that adds substantive content rather than thanks. Media depth requires a content production cadence — modest but consistent — that fits into the operational rhythm of the business.
Entity graph integration, particularly Wikidata and schema, is largely a one-time investment with a long payoff. The entries do not expire and do not require monthly updates. They do require initial care and an occasional review when the business changes materially.
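On the website side, the schema work usually means a schema.org `LocalBusiness` object whose name, address and phone mirror the corrected listings, with `sameAs` links pointing at the listings and the Wikidata item. The sketch below is one hedged way to generate that JSON-LD; the business details, the Wikidata ID `Q000000` and the Yelp profile URL are placeholders, not real records.

```python
import json


def local_business_jsonld(name, street, locality, postcode, country, telephone, url, same_as):
    """Build a schema.org LocalBusiness object whose NAP mirrors the canonical listing."""
    return {
        "@context": "https://schema.org",
        "@type": "LocalBusiness",
        "name": name,
        "url": url,
        "telephone": telephone,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": locality,
            "postalCode": postcode,
            "addressCountry": country,
        },
        # sameAs ties the website's entity to its listings and its Wikidata item;
        # duplicates are dropped and the order is made stable for clean diffs.
        "sameAs": sorted(set(same_as)),
    }


doc = local_business_jsonld(
    name="Harbour Dental",
    street="12 Quay Street, Suite 4",
    locality="Exampleton",
    postcode="00000",
    country="US",
    telephone="+1-555-010-2030",
    url="https://harbourdental.example/",
    same_as=[
        "https://www.wikidata.org/wiki/Q000000",           # placeholder item ID
        "https://www.yelp.com/biz/harbour-dental-example",  # placeholder profile
    ],
)
print('<script type="application/ld+json">')
print(json.dumps(doc, indent=2))
print("</script>")
```

Where schema.org offers a more specific subtype that matches the primary listing category (for example `Dentist` or `Restaurant`), using it instead of the generic `LocalBusiness` keeps the website aligned with the category work.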
Harvard Business Review’s “Understand the 4 Components of Influence” (2015), which covers positional power, emotion, professional knowledge, and nonverbal signals, has an analogue here. The four authority signals readable from listings are the digital equivalent of those influence components: each carries weight, each can be cultivated, and none is sufficient on its own.
Tracking AI Citation Frequency Monthly
The metric most practitioners are not yet tracking is also the one that matters most: how often, and in what context, generative models cite the business. Tracking this is awkward because no platform offers a clean dashboard for it. A workable approach is to define a portfolio of representative queries — branded, category, geographic, urgent, comparative — and run them monthly against the major consumer assistants. Record which model cited the brand, what it said, what source it grounded the claim in, and whether competitors were cited instead.
The data is noisy. Models drift. Versions change. But a six-month trend line is informative even when individual data points are not. The point is not to optimise to the metric in a quarterly review but to notice when the trajectory bends — particularly downward — and ask why.
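A minimal sketch of that monthly log, assuming each query run is recorded by hand into a simple record. The `QueryRun` fields, the per-month citation share and the use of an ordinary least-squares slope as the "trend line" are illustrative choices rather than an established standard, and the slope assumes consecutive months with no gaps.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class QueryRun:
    month: str                 # "YYYY-MM"
    assistant: str             # e.g. "ChatGPT", "Perplexity"
    query_type: str            # branded, category, geographic, urgent, comparative
    query: str
    brand_cited: bool
    grounding_source: str = ""  # URL or platform the claim was grounded in
    competitors_cited: list[str] = field(default_factory=list)


def monthly_citation_share(runs):
    """Share of runs in each month in which the brand was cited at all."""
    totals, hits = defaultdict(int), defaultdict(int)
    for run in runs:
        totals[run.month] += 1
        hits[run.month] += run.brand_cited
    return {month: hits[month] / totals[month] for month in sorted(totals)}


def trend_slope(shares):
    """Least-squares slope of citation share against month index (0, 1, 2, ...)."""
    ys = list(shares.values())
    n = len(ys)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den
```

A negative slope over six months is the "bending trajectory" worth investigating; any single month's share on its own says little.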
Deloitte’s CFO Signals quarterly survey methodology, while concerned with executive sentiment rather than AI citation, illustrates the value of consistent pulse measurement against a stable instrument. The instrument matters more than the absolute numbers; what one is looking for is direction and inflection.
Reallocating Budget From Weak Signals
Cutting Low-Impact Directory Spend
Many organisations spend meaningful sums on listings that do not appear to feed the data ecosystems that models read. Audit the actual outputs: which listings have been updated by the platform in the last year? Which appear in any structured data feed? Which have any organic traffic? Listings that fail all three tests are candidates for cancellation. The savings are modest individually but cumulative across a portfolio of locations.
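The three tests translate directly into a filter. In the sketch below, the `PaidListing` fields are assumptions about what an audit spreadsheet might hold, the 365-day window stands in for "the last year", and the directory names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class PaidListing:
    platform: str
    last_platform_update: Optional[date]  # None if the platform never refreshed it
    in_structured_feed: bool              # appears in any structured data feed
    organic_visits_12m: int               # organic visits over the last year
    annual_cost: float


def cancellation_candidates(listings, today):
    """Return listings that fail all three tests, plus their combined annual cost."""
    cutoff = today - timedelta(days=365)
    candidates = []
    for listing in listings:
        updated = (listing.last_platform_update is not None
                   and listing.last_platform_update >= cutoff)
        if not updated and not listing.in_structured_feed and listing.organic_visits_12m == 0:
            candidates.append(listing)
    return candidates, sum(listing.annual_cost for listing in candidates)
```

Passing a listing on any one test keeps it off the list, which is deliberately conservative: cancellation is reserved for listings with no evidence of feeding anything.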
The harder cut is the in-house effort spent maintaining these listings. Time, like money, is a budget. Reallocating an hour a week from low-impact listing maintenance to review reply quality, photo uploads, or schema refinement is a real reallocation even if no invoice changes.
Investing in Review Infrastructure
Review infrastructure means the technical and operational systems that consistently produce reviews from satisfied customers. The technical layer is well-served by vendors: Birdeye, Podium, Reputation, NiceJob, and others handle the request-and-collect mechanics. The operational layer is harder: training staff to ask, integrating the ask into the natural rhythm of service, ensuring compliance with platform guidelines (no incentivisation, no gating, no review-station-on-premises tactics that violate Google’s terms).
The Forrester observation that organisations must move beyond MQL as a sole propensity indicator has a parallel in review thinking: star average is the local-business equivalent of the MQL — a single, lagging, oversimplified number that masks the underlying texture of customer sentiment. Looking only at the average is the mistake. Looking at velocity, recency, semantic density, and reply quality together is the discipline.
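To make "looking at velocity, recency, semantic density, and reply quality together" concrete, the sketch below reports those measures alongside the star average. It is a hedged illustration: the 90-day window is an assumption, average word count is only a crude stand-in for semantic density, and the 15-word threshold for a "substantive" owner reply is a hypothetical cut-off, not a platform rule.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Review:
    posted: date
    rating: int                       # 1-5 stars
    text: str
    owner_reply: Optional[str] = None


def review_profile(reviews, today, window_days=90, substantive_words=15):
    """Summarise reviews on several dimensions instead of the star average alone."""
    if not reviews:
        return {}
    cutoff = today - timedelta(days=window_days)
    recent = [r for r in reviews if r.posted >= cutoff]
    replies = [r.owner_reply for r in recent if r.owner_reply]
    return {
        # The single lagging number the text warns against relying on alone.
        "star_average": round(sum(r.rating for r in reviews) / len(reviews), 2),
        # Velocity: reviews in the window, scaled to a 30-day rate.
        "velocity_per_30d": round(len(recent) * 30 / window_days, 2),
        # Recency: days since the newest review.
        "days_since_latest": (today - max(r.posted for r in reviews)).days,
        # Crude proxy for semantic density: mean word count of recent reviews.
        "avg_words_recent": round(
            sum(len(r.text.split()) for r in recent) / len(recent), 1) if recent else 0.0,
        # Reply quality: share of recent reviews with a reply of at least N words.
        "substantive_reply_rate": round(
            sum(len(t.split()) >= substantive_words for t in replies) / len(recent), 2)
            if recent else 0.0,
    }
```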
Building a Media Production Workflow
Media production for listings does not require a studio. It requires a workflow. A nominated person on each location’s team, a shared phone or camera, a simple naming and upload protocol, and a monthly checklist. The output target is modest: two to four photos per location per month, one short video per quarter, and an annual refresh of the cover image and exterior shots.
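The monthly checklist can be reduced to a few comparisons against an upload log. In this sketch the targets come from the text (using the lower bound of two photos), while the `(date, kind)` log format and the 365-day window for the annual refresh are assumptions. It is meant to be run at month end, since a mid-month run will naturally flag the photo target.

```python
from datetime import date

# Targets taken from the text; the 365-day refresh window is an assumption.
PHOTOS_PER_MONTH = 2      # lower bound of the two-to-four range
VIDEOS_PER_QUARTER = 1
REFRESH_DAYS = 365


def media_gaps(uploads, today):
    """uploads: list of (date, kind) tuples, kind in photo/video/cover/exterior."""
    gaps = []
    photos = sum(1 for d, k in uploads
                 if k == "photo" and (d.year, d.month) == (today.year, today.month))
    if photos < PHOTOS_PER_MONTH:
        gaps.append(f"photos this month: {photos}/{PHOTOS_PER_MONTH}")
    quarter = (today.month - 1) // 3
    videos = sum(1 for d, k in uploads
                 if k == "video" and d.year == today.year and (d.month - 1) // 3 == quarter)
    if videos < VIDEOS_PER_QUARTER:
        gaps.append(f"videos this quarter: {videos}/{VIDEOS_PER_QUARTER}")
    for kind in ("cover", "exterior"):
        latest = max((d for d, k in uploads if k == kind), default=None)
        if latest is None or (today - latest).days > REFRESH_DAYS:
            gaps.append(f"{kind} image not refreshed in the last {REFRESH_DAYS} days")
    return gaps
```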
Where production budgets allow, investing in a quarterly professional shoot — covering several locations in one day — produces a stockpile of higher-quality assets that can be drip-fed across the year. The mistake is to concentrate all production into a single annual campaign whose output appears in one burst and then stales for eleven months.
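Drip-feeding a shoot's output is a small scheduling problem. One hedged approach, sketched below, spreads each quarterly shoot's assets round-robin across the three months starting with the shoot month, so nothing is published before it exists and no month is empty. Month indices are zero-based and the asset names are hypothetical.

```python
def drip_schedule(shoots, months_per_batch=3):
    """shoots: {start_month_index: [asset names]} -> {month_index: [asset names]}."""
    schedule = {}
    for start, assets in sorted(shoots.items()):
        for i, asset in enumerate(assets):
            # Round-robin across the batch window keeps monthly output roughly even.
            month = start + i % months_per_batch
            schedule.setdefault(month, []).append(asset)
    return schedule


print(drip_schedule({0: ["a1", "a2", "a3", "a4"], 3: ["b1", "b2"]}))
# {0: ['a1', 'a4'], 1: ['a2'], 2: ['a3'], 3: ['b1'], 4: ['b2']}
```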
The HBR sponsor content from Arm (2025) on the signals technology leaders are tuning in to makes a tangentially relevant point: in a context reshaped by generative AI and automation, “competitive advantage is determined by how leaders respond to a new set of questions.” For practitioners managing business listings, the new question is not whether AI reads the listing — it does — but which signals inside the listing the AI weights, and whether the organisation is producing those signals or merely producing noise.
Forrester’s Wave Q2 2023 work, which classified the 14 largest intent data providers into four distinct business-model categories, hints at a useful frame for thinking about listing data providers as well. Some are traditional aggregators; some are platforms; some are walled gardens. The choice of which to invest in should follow the same logic as intent data provider selection: match the business model of the provider to the actual consumption pattern of the downstream system. Models read what is fed into the open structured data graph more readily than what sits behind a walled garden’s API.
The Brookings reporting on regulatory exertion in the cryptocurrency context is a reminder that the rules of the road for structured data and AI training are still being written. Practitioners should expect — and prepare for — meaningful changes over the next two to three years in what platforms publish, what they syndicate, what models are permitted to ingest, and what disclosure obligations attach to AI-generated answers. A signalling strategy built on the assumption that today’s data flows persist unchanged will age badly.
The eMarketer reporting on the 20 US states that have passed comprehensive privacy laws (with four more actively considering them as of July 2024) is the regulatory bellwether. Listings, because they comprise data that businesses themselves have published intentionally, sit on relatively safe ground compared with behavioural advertising signals. But the principle of data minimisation now embedded in many state laws will increasingly affect what platforms can collect and share. Listings that depend on inferred or behavioural data are more exposed than listings that depend on explicit business-published data.
One final reflection from the cumulative literature on signal interpretation: as Merrill argued in HBR (2012), the practical work is “less about exotic techniques and more about disciplined attention.” The four authority signals discussed here — consistency, reviews, categories, media — are not novel. They have been visible to local SEO practitioners for years. What is novel is that they now feed a different kind of consumer interface, where the answer is synthesised rather than ranked, and where the brand’s website may never appear at all. The discipline of attention has not changed. The surface where attention pays off has.
Here is the challenge. Pick three branded queries, three category queries, and three urgent-need queries that matter for the organisation. Run each of them, today, against ChatGPT, Perplexity, Gemini, and Claude. Record what the model said. Record what it cited. Record whether the brand appeared at all, and if it did, whether it was cited from its own website or from a listing — and if from a listing, which one. Then ask the harder question: of the four signals discussed in this article, which one, if remediated first, would most likely change next month’s result? Assign that work to a named person, with a thirty-day deadline, and run the same nine queries again. The data will be imperfect. It will also be more honest than any vendor dashboard. And it will tell you, unambiguously, whether the organisation is producing signals the models read — or noise the models ignore.
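Recording "whether it was cited from its own website or from a listing, and if from a listing, which one" is easier with a consistent classifier for the cited URLs. The sketch below matches hostnames against the brand's domain and a small, hypothetical map of listing domains; it is illustrative and incomplete. Google Maps links, for instance, often sit under `google.com/maps`, which this host-only check would classify as "other" unless extended with path matching.

```python
from urllib.parse import urlparse

# Hypothetical, incomplete mapping; extend with the platforms relevant to the business.
LISTING_DOMAINS = {
    "yelp.com": "Yelp",
    "healthgrades.com": "Healthgrades",
    "avvo.com": "Avvo",
    "maps.apple.com": "Apple Maps",
}


def classify_citation(url, brand_domain):
    """Return ("own_site" | "listing" | "other", detail) for a cited URL."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    def matches(domain):
        # Exact host or any subdomain of it (e.g. m.yelp.com).
        return host == domain or host.endswith("." + domain)

    if matches(brand_domain):
        return ("own_site", brand_domain)
    for domain, platform in LISTING_DOMAINS.items():
        if matches(domain):
            return ("listing", platform)
    return ("other", host)
```

Run over all nine queries and four assistants, the output gives a simple before-and-after tally for the thirty-day re-test.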

