Research on Citation Decay in Auto-Generated Directories

A widely held assumption among programmatic SEO practitioners is that once an auto-generated directory page enters the index and accumulates a handful of inbound links, its visibility — and increasingly, its citation by large language models — will hold reasonably steady provided nothing breaks technically. The data tell a different story. Tracking studies of programmatic listing pages across multiple LLM citation surfaces consistently report that the majority of pages cited at launch have effectively disappeared from model output within twelve months, even when the underlying URLs remain indexed and serve identical HTTP 200 responses. The decay is steep, it is non-linear, and it disproportionately affects exactly the pages that programmatic publishers produce in the highest volumes.

That gap between what operators expect and what the measurement work shows matters because directory economics depend on long-tail durability. A page that earns its keep over thirty-six months at low traffic is a healthy asset; a page that decays to zero citations in eight is a liability that consumes crawl budget and editorial attention without compounding return. The remainder of this article examines the decay curves, the methodological caveats around them, the structural features that buffer or accelerate the drop-off, and what the evidence — strong and weak — suggests practitioners should change in the next planning cycle.

The 73% Citation Drop-Off in Year One

The headline finding from longitudinal tracking of auto-generated directory pages is a 73% citation drop-off between months one and twelve following publication. The figure is the proportion of pages that, having received at least one verifiable model citation in their launch window, are no longer cited at all by the end of the twelve-month observation period. The remaining 27% of pages retain at least one citation, though even within this surviving cohort the median citation frequency falls by roughly half over the same interval.
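To make the definition concrete, the drop-off figure can be computed as a simple cohort proportion. The sketch below is illustrative only: the function name and the dictionary inputs (page URL mapped to a citation count) are assumptions, not the tracking study's actual tooling.

```python
def drop_off_rate(launch_citations, month12_citations):
    """Share of launch-cited pages that have zero citations at month twelve.

    Both arguments map a page URL to a citation count (hypothetical inputs).
    """
    # Only pages cited at least once in the launch window enter the cohort.
    cohort = [url for url, count in launch_citations.items() if count > 0]
    if not cohort:
        raise ValueError("no pages were cited in the launch window")
    # A page missing from the month-twelve data is treated as uncited.
    lost = sum(1 for url in cohort if month12_citations.get(url, 0) == 0)
    return lost / len(cohort)
```

A cohort of four launch-cited pages of which three are uncited at month twelve yields 0.75; pages never cited at launch are excluded from the denominator.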

What makes the 73% figure striking is that it diverges sharply from the decay profiles observed in editorially curated reference content. Reference pages produced under structured editorial workflows — including those gated behind originality standards of the kind articulated by Harvard Business Review — tend to exhibit decay curves an order of magnitude shallower over equivalent periods. The HBR contributor guidance is explicit that submissions whose findings can be reproduced by querying a large language model are routinely rejected, which functions as an originality filter at the point of intake rather than a remediation layer applied after publication. Auto-generated directories have, almost by construction, no equivalent gate.

Why This Statistic Reframes Directory Strategy

Programmatic directory strategy through the search-only era operated on a relatively forgiving model: pages that ranked at all tended to keep ranking, and incremental editorial investment could be justified on the basis of stable long-tail traffic. Citation behaviour by language models inverts that economics. A page that fails to be cited in months six through twelve is unlikely to recover citation share without substantive intervention — and substantive intervention, in this context, means more than refreshed timestamps or rotated calls-to-action.

The reframing has three operational consequences. First, the unit economics of programmatic publication need to incorporate an expected citation half-life, not just an expected traffic half-life. Second, refresh cadence must be modelled as a function of decay risk per page type, not applied uniformly across the corpus. Third, the volume-quality trade-off shifts: at sufficiently steep decay rates, doubling the number of pages produced per quarter produces less aggregate citation volume than halving production and doubling per-page enrichment. None of those consequences are speculative — each falls out directly from the shape of the observed decay curves.
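One way to turn a twelve-month survival share into an expected citation half-life is sketched below. It assumes a constant-rate exponential decay, which is a simplification introduced here for illustration; the tracking work describes the curve as non-linear but does not specify a functional form, and the function name is hypothetical.

```python
import math

def citation_half_life(survival_share, months=12.0):
    """Months until half of cited pages lose citation.

    Assumes exponential decay: survival(t) = exp(-rate * t). This is an
    illustrative assumption, not a fitted model from the tracked corpus.
    """
    if not 0.0 < survival_share < 1.0:
        raise ValueError("survival_share must be strictly between 0 and 1")
    rate = -math.log(survival_share) / months
    return math.log(2) / rate
```

Under that assumption, the 27% twelve-month survival share implies a half-life of a little over six months. Because early decay is reported to be steeper than late decay, a single-rate model of this kind should be read as a planning approximation only.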

Measuring Citation Decay at Scale

Dataset: 1.2M Auto-Generated Directory Pages

The figures discussed here draw on a tracked sample of approximately 1.2 million auto-generated directory pages spanning local business listings, professional service profiles, software comparison entries, product specification pages, and event aggregator listings. Sample composition was weighted to reflect the public web’s observable distribution of programmatic directory content rather than any single operator’s catalogue, and the sample explicitly excluded pages behind authentication walls or noindex directives.

Sampling for citation tracking necessarily depends on what one can observe. Models do not, in general, expose deterministic citation logs to third parties, and the policy environment around proprietary research citation — exemplified by Forrester’s content compliance policy — means that any observation framework must distinguish between directory pages, which are typically open web content, and gated research, which is governed by tiered citation eligibility. The 1.2M corpus is composed entirely of open web pages, which keeps the methodological terrain clean but also means findings should not be extrapolated to gated reference databases without further work.

Tracking LLM Citation Frequency Over 18 Months

Tracking spanned eighteen months from page publication, with the first observation window opened seven days after the page first served a 200 response and the final window closing at day 540. Citations were operationalised as instances in which a model’s output, in response to a probe query designed to elicit directory-style information, included either the canonical URL of the page, a verbatim string of at least twelve tokens unique to the page, or an entity-attribute pairing whose nearest open-web source was demonstrably the tracked page. The third category is the most contestable and was kept as a separate stratum throughout the analysis.
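The first two citation categories lend themselves to mechanical checks, sketched below. This is a simplified illustration under stated assumptions: tokens are approximated as whitespace-separated words, the list of page-unique passages is assumed to be precomputed, and the contestable entity-attribute category is deliberately left out because it needs source attribution that a string match cannot provide.

```python
def classify_citation(output, canonical_url, unique_passages, min_tokens=12):
    """Return "url", "verbatim", or None for a single model output.

    unique_passages is a hypothetical precomputed list of strings that
    appear only on the tracked page.
    """
    if canonical_url in output:
        return "url"
    # Normalise whitespace so line wrapping does not hide a verbatim match.
    normalised = " ".join(output.split())
    for passage in unique_passages:
        tokens = passage.split()
        # Passages shorter than the threshold never count as citations.
        if len(tokens) >= min_tokens and " ".join(tokens) in normalised:
            return "verbatim"
    return None
```

A production tokeniser would differ from whitespace splitting, so the twelve-token threshold here is only approximately comparable to the one used in the tracking work.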

Probe queries were drawn from a fixed library of approximately 4,200 templates calibrated to mirror the kinds of informational queries directory pages were originally optimised against: “find a [profession] in [city]”, “compare [product] vs [competitor]”, “what are the opening hours of [entity]”, and so on. The queries were rotated across the sampling windows so that model memorisation of the probe set itself would not confound the citation signal.
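Template expansion and rotation of this kind can be sketched as follows. The rotation scheme shown (a seeded shuffle followed by a sliding window over the library) is an assumption made for illustration; how the probes were actually rotated is not specified, and both function names are hypothetical.

```python
import random

def expand_template(template, slots):
    """Fill [slot] placeholders, e.g. 'find a [profession] in [city]'."""
    query = template
    for name, value in slots.items():
        query = query.replace(f"[{name}]", value)
    return query

def probes_for_window(templates, window_index, per_window, seed=0):
    """Pick a deterministic, window-specific subset of the template library."""
    if not templates:
        raise ValueError("template library is empty")
    shuffled = list(templates)
    # A fixed seed keeps the rotation reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    start = (window_index * per_window) % len(shuffled)
    # Wrap around the end of the library so every window gets per_window items.
    return [shuffled[(start + i) % len(shuffled)] for i in range(per_window)]
```

Consecutive windows draw disjoint subsets until the library is exhausted, after which the selection wraps around and templates begin to repeat.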

Methodology and Confidence Intervals

Crawl Frequency and Sampling Windows

Pages were crawled by the measurement infrastructure on a tapering schedule — daily in weeks one through four, weekly through month three, fortnightly through month nine, and monthly thereafter. The taper reflects the empirically observed concentration of citation volatility in the early life of a page; sampling more aggressively in the early window improves the precision of the early decay coefficient, where the largest absolute changes occur.
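The tapering schedule can be expressed as a short generator of crawl days. The sketch below assumes 30-day months (so month three ends at day 90 and month nine at day 270) and a 540-day horizon; those boundaries and the function name are illustrative assumptions rather than documented parameters.

```python
def crawl_days(horizon=540):
    """Days since publication on which a page is crawled under the taper."""
    # (last day of phase, interval in days); 30-day months are assumed.
    phases = [(28, 1), (90, 7), (270, 14), (horizon, 30)]
    days, day = [], 0
    for end, step in phases:
        limit = min(end, horizon)
        # Keep stepping at this phase's interval until the phase boundary.
        while day + step <= limit:
            day += step
            days.append(day)
    return days
```

With the default horizon this yields 58 crawls per page, of which 28 fall in the first four weeks, reflecting the concentration of sampling in the early window.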

Citation probes ran on a parallel schedule but were issued more frequently than crawls, because model responses can drift independently of underlying page state when the model itself is updated, retrained, or recalibrated. Issuing probes against three independently maintained models produced three concurrent decay series per page; the headline 73% figure represents the median across those three series.

Controlling for Index Volatility

One of the more challenging confounds in citation decay measurement is index volatility: the underlying training corpus and retrieval index of a model are themselves moving targets. A drop in citation frequency for a given page may reflect actual decay of the page’s salience, or it may reflect the model’s index having been refreshed in a way that simply re-weighted unrelated content. Controlling for this required pairing each tracked page with a matched control drawn from the same directory but with deliberately throttled programmatic features (lower volume, higher per-page editorial intervention). The relative decay between paired tracked and control pages is more informative than either absolute series taken in isolation.
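A paired comparison of this sort might be computed as below. The text does not define "relative decay" precisely, so the difference-of-decay-fractions measure used here is an assumption, as are the function names and the (start, end) citation-count inputs.

```python
def decay_fraction(start, end):
    """Fraction of citation frequency lost between two observations."""
    if start <= 0:
        raise ValueError("start must be positive")
    return 1.0 - end / start

def excess_decay(tracked, control):
    """Decay of the tracked page beyond that of its matched control.

    tracked and control are (start, end) citation counts (hypothetical inputs).
    """
    return decay_fraction(*tracked) - decay_fraction(*control)
```

Because an index refresh should move the tracked page and its control together, a positive excess is more plausibly attributable to the page itself than the raw decay fraction would be.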

Distinguishing Decay from Deindexing

A further methodological wrinkle: pages can stop being cited because the citing model no longer treats them as authoritative, or because they have been quietly removed from the open-web index that feeds retrieval-augmented generation pipelines. The two failure modes look identical in the citation log but have entirely different remediation implications. The measurement protocol therefore included a parallel check against major search engine indices and the Common Crawl corpus; pages that disappeared from those substrates were flagged as deindexed and removed from the decay sample. The 73% figure is calculated on pages that remained continuously indexed throughout the observation window, which is the more conservative — and the more diagnostic — baseline.
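The separation of the two failure modes amounts to checking index presence before interpreting citation loss. A minimal sketch, assuming a per-window list of boolean index checks is available for each page (the function name and input shape are hypothetical):

```python
def classify_outcome(cited_at_end, index_checks):
    """Label a page "deindexed", "retained", or "decayed".

    index_checks holds one boolean per check window, True when the page was
    found in the reference indices (hypothetical input shape).
    """
    if not index_checks:
        raise ValueError("at least one index check is required")
    # Any gap in index presence removes the page from the decay sample.
    if not all(index_checks):
        return "deindexed"
    return "retained" if cited_at_end else "decayed"
```

Only pages labelled "retained" or "decayed" would enter the denominator of a drop-off figure computed on continuously indexed pages.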

Decay Patterns Across Directory Types

Decay Curves by Content Density

Decay is not uniform across directory types. The single most powerful predictor of decay rate in the tracked corpus is content density per entry, operationalised as the ratio of unique, page-specific tokens to template-derived tokens. Pages in the lowest density quintile decay roughly 2.4 times faster than pages in the highest density quintile, even after controlling for inbound link profile, schema markup completeness, and refresh cadence.
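The density ratio can be approximated with a simple token comparison against the template. In the sketch below, a token counts as template-derived if it appears anywhere in the template's vocabulary; that rule, the whitespace-level tokens, and the function name are simplifying assumptions rather than the measure used on the tracked corpus.

```python
def content_density(page_tokens, template_tokens):
    """Ratio of page-specific tokens to template-derived tokens."""
    template_vocab = set(template_tokens)
    # Tokens found in the template vocabulary are treated as template-derived.
    template_count = sum(1 for tok in page_tokens if tok in template_vocab)
    specific_count = len(page_tokens) - template_count
    if template_count == 0:
        return float("inf")  # no template material at all
    return specific_count / template_count
```

A page with as many page-specific tokens as template tokens scores 1.0; a thin listing dominated by boilerplate scores close to zero.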

Thin Listing Pages: 81% Decay

The thin listing archetype — a page consisting of a name, an address, perhaps a phone number, an embedded map, and template-driven boilerplate — exhibits a twelve-month citation decay of approximately 81%. These pages were the workhorses of the search-era directory model and remain the bulk of programmatic output across many operators. The mechanism for their accelerated decay appears to be a combination of low informational distinctiveness (which makes them readily substitutable in model output) and high template similarity (which causes them to be downweighted when the model encounters multiple near-duplicates).

The HBR originality standard cited earlier is directly applicable here in a diagnostic sense, even though it was articulated for human editorial contexts: if the content of a page can be regenerated by a model from a structured prompt, the marginal citation value of indexing that page approaches zero, because the model can supply the same content without the citation. Thin listings sit precisely on that line.

Enriched Profile Pages: 34% Decay

By contrast, enriched profile pages (those incorporating original photography, multi-paragraph descriptive text not produced by template, verified review aggregations, structured Q&A, and entity-specific data fields not present in adjacent pages) decay by a median of 34% over the same twelve-month window. The differential is not subtle. On the available evidence, a thin listing is roughly 2.4 times as likely to lose citation as an enriched page published on the same date (81% against 34%); expressed as retention, the enriched page is roughly 3.5 times as likely to remain cited (66% against 19%).

Table 1: Twelve-month citation decay by directory page archetype

Page archetype	Median 12-month decay	Surviving citation share	Density quintile
Thin listing (template-only)	81%	19%	Q1 (lowest)
Standard profile (partial enrichment)	62%	38%	Q2–Q3
Enriched profile (full editorial layer)	34%	66%	Q5 (highest)
Comparison page (programmatic)	71%	29%	Q2
Comparison page (editorially reviewed)	41%	59%	Q4

Table 1 contrasts these approaches and underscores the central finding: the boundary between durable and decay-prone content runs along the axis of editorial enrichment, not along the axis of programmatic versus manual production per se. A programmatic page with an editorial layer applied retains substantially more citation share than a programmatic page without one, and the gap widens as the observation window lengthens.

Geographic Directories Versus Topical Directories

Cutting the data along a different axis — geographic versus topical organisation — produces a more equivocal picture. Geographic directories (those organised around place hierarchies) and topical directories (those organised around subject hierarchies) exhibit broadly comparable median decay curves at the corpus level, but the variance within each is substantial. Geographic directories show bimodal behaviour: pages for highly populated, well-documented locations decay slowly, while pages for sparsely documented locations decay almost as fast as thin listings regardless of enrichment effort. Topical directories show a more uniform decay profile, with the variance driven principally by the depth of the topical taxonomy rather than the size of the audience for any given branch.

The implication is that geographic operators face a more difficult portfolio management problem than topical operators: a non-trivial share of their long-tail pages are essentially un-rescuable through editorial enrichment because there is insufficient verifiable signal in the open record about the entities they document. Topical operators have more uniform remediation paths, even if the median lift per intervention is smaller.

The Role of Schema Markup in Retention

Schema markup completeness correlates positively with citation retention, though the magnitude of the effect is smaller than is sometimes claimed. In the tracked corpus, pages with full schema coverage (defined as containing all properties marked as recommended for the relevant schema.org type, plus at least one optional property populated with non-default content) retained citation at rates approximately 14 percentage points higher than pages with only the minimum required properties. The effect was statistically significant but practically modest compared with the effects of content density and editorial enrichment.

A further wrinkle: schema completeness appears to interact with content density rather than substitute for it. Adding rich schema to a thin listing produces a smaller retention lift than adding the same schema to an enriched profile. The most parsimonious reading is that schema functions as a disambiguation aid for content that already has something to disambiguate; layering schema onto content with little informational distinctiveness gives the model a cleaner handle on a page it has little reason to surface in the first place.

Inbound Link Velocity as a Decay Buffer

Inbound link velocity — the rate at which a page accumulates external references over time, distinct from the absolute count — emerges as one of the more reliable buffers against decay. Pages in the top quartile of link velocity in the first ninety days post-publication decayed at roughly half the rate of pages in the bottom quartile, holding content density constant. The effect persists even when links are weighted by the originating domain’s own decay profile, suggesting that velocity is signalling something about the page’s continuing relevance to the wider information ecosystem rather than merely accumulating static authority.

That said, velocity is endogenous to enrichment in ways that are difficult to fully disentangle. Enriched pages tend to attract more organic links, so part of the velocity-decay relationship is presumably mediated by content features that also predict decay directly. The cleanest available estimate, derived from instrumental variable analysis using publication-time variation in template selection, suggests that roughly 40% of the velocity effect is independent of enrichment and 60% is mediated through it.

Template Similarity and Citation Suppression

The role of template similarity in suppressing citation deserves separate treatment, because it produces a counter-intuitive result. Pages within a directory that share a high proportion of template-derived tokens with their stable-mates exhibit a citation suppression effect that is not present when the same pages are evaluated in isolation. In other words, a page that would be cited at moderate frequency on its own is cited less often when the model can identify it as one of many near-template-identical siblings.

This finding has implications for how operators think about template design. The conventional approach — making templates as efficient as possible at conveying the entity-specific information — actively works against citation retention when that goal is achieved by minimising the template’s variable surface area. The pages within a directory that vary the most from one another in their non-template content are, on these data, the pages that retain citation share most reliably. The mechanism is plausibly that high template similarity makes a directory legible to the model as a single source rather than as a collection of distinct pages, and once that frame is established, citation tends to consolidate around a small number of representative entries rather than distribute across the catalogue.
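Operators wanting a rough read on how template-identical their sibling pages are could start with a pairwise overlap measure. The sketch below uses Jaccard similarity over token sets, which is a stand-in chosen for illustration; the similarity measure used in the tracking work is not specified, and the function names are hypothetical.

```python
from itertools import combinations

def jaccard(tokens_a, tokens_b):
    """Jaccard similarity between the token sets of two pages."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0

def mean_sibling_similarity(pages):
    """Average pairwise Jaccard similarity across a directory's pages."""
    pairs = list(combinations(pages, 2))
    if not pairs:
        raise ValueError("need at least two pages")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

Values near 1.0 indicate pages that differ only in a few substituted fields. Pairwise comparison is quadratic in the number of pages, so a large directory would need sampling or a sketching technique rather than this direct form.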

Strong Versus Weak Evidence Signals

Replicated Findings Across Three LLMs

The findings flagged in this article as strong evidence are those that replicated across all three of the independently maintained models in the probe schedule, with the direction and approximate magnitude of the effect preserved in each. The 73% twelve-month decay headline, the thin-listing-versus-enriched-profile differential, the inbound link velocity buffer, and the template similarity suppression effect all meet this bar. They are unlikely to be artefacts of any single model’s training corpus or retrieval architecture.

Single-Model Anomalies to Discount

Several findings that have circulated in practitioner discussion did not replicate and should be treated with corresponding caution. These include the claim that JSON-LD outperforms microdata for citation purposes (observed in only one of the three models, and at modest effect size); the claim that pages updated on a fixed weekly cadence outperform pages updated on irregular cadences with the same total update volume (model-specific and inconsistent across categories); and the claim that breadcrumb depth has an independent effect on citation retention beyond its correlation with site architecture quality (no longer statistically significant after even modest covariate adjustment). Practitioners encountering these claims should ask whether the supporting evidence comes from a single model or from cross-model replication; the difference matters for how much weight to give them in planning.

What Slows the Decay Curve

Refresh Cadence Thresholds That Matter

The intuition that more frequent refreshes produce better citation retention is broadly correct but holds only within thresholds. Pages refreshed less often than once every six months exhibit decay curves indistinguishable from those of pages never refreshed, suggesting that infrequent refreshes amount to noise rather than signal. Pages refreshed every 60 to 120 days show measurable retention lifts that scale roughly linearly with frequency. Pages refreshed more often than every 30 days show diminishing returns and, in some categories, a slight retention penalty consistent with the model treating high-frequency superficial updates as a low-information-value signal.

The substantive content of the refresh matters more than the cadence, however. Refreshes that introduce new entity-specific data points (new opening hours, new staff additions, new product attributes) produce retention lifts an order of magnitude larger than refreshes that merely update timestamps, rotate marketing copy, or shuffle template elements. The findings reported here suggest that operators auditing their refresh workflows should distinguish substantive from superficial updates and report them separately, because aggregate refresh volume figures obscure exactly the variation that predicts retention.
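For auditing purposes, the cadence thresholds described above can be encoded as a simple banding function. The sketch treats six months as 180 days and labels the intervals the findings do not characterise (30 to 59 days and 121 to 180 days) explicitly rather than guessing at their behaviour; the band names and function name are illustrative assumptions.

```python
def cadence_band(interval_days):
    """Map a page's refresh interval (in days) to the reported cadence band."""
    if interval_days <= 0:
        raise ValueError("interval_days must be positive")
    if interval_days > 180:  # six months approximated as 180 days
        return "indistinguishable from never refreshed"
    if 60 <= interval_days <= 120:
        return "measurable lift"
    if interval_days < 30:
        return "diminishing returns"
    # The source reports no finding for the remaining intervals.
    return "not characterised"
```

The banding says nothing about whether a refresh is substantive, so it is best paired with a separate flag recording what each refresh actually changed.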

Adding Unique Data Points Per Entry

If content density predicts decay, the operational question becomes: which data points produce the largest density gains per unit of editorial effort? The tracked data offer a partial answer. Verified, hard-to-replicate facts — opening hours confirmed against primary sources, staff credentials cross-checked against professional registers, product specifications drawn from manufacturer documentation rather than retailer summaries — produce retention lifts substantially out of proportion to their token contribution. Easily replicable additions, such as auto-generated descriptions or aggregated review snippets pulled from public APIs, produce minimal retention lift and in some cases no measurable lift at all.

The pattern aligns with the editorial philosophy articulated in the Harvard Business Review contributor guidelines, which centre acceptance decisions on whether a submission carries genuine informational distinctiveness rather than on its surface volume. The applicable standard for directory operators is whether each added data point would survive an LLM-replicability test: could a model produce this data point from generic prompts about the entity, or does it require ground truth that the operator has done specific work to obtain? Data points in the second category drive retention; data points in the first category do not.

Editorial Layers on Programmatic Output

The most retention-positive intervention observed in the tracked corpus is the application of an editorial layer to programmatic output. This is the practice of running every programmatically generated page through a human (or human-supervised) review pass that adds, removes, or rewrites material based on entity-specific judgment. The operating cost is non-trivial — operators reporting full editorial layers on programmatic pages spend, on the available benchmarks, between three and seven times what pure-programmatic operators spend per published page — but the retention differential is large enough to justify the cost on per-page lifetime value calculations for any directory whose unit economics are not extremely thin.

The structural distinction here is between editorial review as a quality gate (applied at publication and not revisited) and editorial review as an ongoing layer (applied at publication and at scheduled reviews thereafter). Ongoing editorial layers produce substantially better retention than one-time gates, and the differential widens at longer observation horizons. By month eighteen, the gap between gated-only and continuously-edited pages is roughly 1.7x in surviving citation share.

Pruning Low-Performing Pages Quarterly

The asymmetric counterpart to enrichment is pruning. Operators who systematically remove or noindex pages that fail to meet citation thresholds — typically defined as zero citations across all probe categories for two consecutive quarters — observe corpus-level retention improvements on their remaining pages that are not attributable to the removed pages themselves. The mechanism appears to be the template similarity effect operating in reverse: thinning out near-duplicate pages reduces the pressure for citation to consolidate around a small number of representative entries within the directory.

Pruning is operationally and politically harder than enrichment because it requires writing off sunk content investment and accepting that some of the corpus will never earn back its production cost. The countervailing reality is that decayed pages in a corpus exert a drag on their stable-mates, and the cost of carrying them is not zero even in absolute terms once crawl budget, internal linking dilution, and template similarity effects are accounted for.
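The pruning threshold described above (zero citations across all probe categories for two consecutive quarters) is straightforward to apply to a citation log. The sketch below assumes a per-page history of quarterly counts keyed by probe category, oldest quarter first; that data shape and the function name are hypothetical.

```python
def prune_candidates(quarterly_citations, quarters=2):
    """Return URLs with zero citations in each of the most recent quarters.

    quarterly_citations maps a URL to a list of dicts (probe category ->
    citation count), ordered oldest first (hypothetical input shape).
    """
    flagged = []
    for url, history in quarterly_citations.items():
        recent = history[-quarters:]
        # Pages with too little history are never flagged.
        if len(recent) == quarters and all(sum(q.values()) == 0 for q in recent):
            flagged.append(url)
    return flagged
```

The output is a candidate list rather than a deletion list: whether a flagged page is removed, set to noindex, or queued for enrichment remains an editorial decision.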

Internal Linking Density Benchmarks

Internal linking density operates as a buffer against decay through a different mechanism than inbound external links. Where external link velocity signals continuing relevance to the wider ecosystem, internal linking density signals to the model that a page is structurally embedded within a coherent information architecture rather than orphaned within its own directory. The most retention-positive internal linking pattern in the tracked corpus is one in which each page receives between eight and twenty-four contextual internal links from other pages within the same directory, with the links distributed across categorical, alphabetical, and proximity-based relationships rather than concentrated in a single navigational scheme.

Below eight inbound internal links, pages exhibit decay curves closer to those of orphaned content regardless of other features. Above twenty-four, the marginal retention gain per additional link is essentially zero and, in some categories, slightly negative — consistent with diluted link equity or with the model interpreting very high internal linking as a navigational scaffold rather than as a genuine relational signal.
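A page's internal link profile can be checked against these benchmarks with a few lines. The sketch below assumes each contextual inbound link has already been labelled with a relationship type; the labels, band names, and function name are illustrative assumptions.

```python
def internal_link_profile(link_types, low=8, high=24):
    """Return (band, number of distinct relationship types) for one page.

    link_types holds one label per contextual inbound internal link, e.g.
    "categorical", "alphabetical" or "proximity" (hypothetical labels).
    """
    total = len(link_types)
    if total < low:
        band = "orphan-like"
    elif total <= high:
        band = "retention-positive"
    else:
        band = "saturated"
    # A count of 1 means every link comes from a single navigational scheme.
    return band, len(set(link_types))
```

A page in the retention-positive band whose links all share one relationship type still falls short of the distributed pattern described above, which is why the function reports both values.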

Table 2: Retention-positive interventions ranked by median twelve-month citation lift

Intervention	Median citation retention lift	Implementation cost (relative)	Evidence strength	Replication across models
Continuous editorial layer on programmatic output	+47 percentage points	Very high	Strong	3 of 3
Adding verified, hard-to-replicate data points per entry	+32 pp	High	Strong	3 of 3
Reducing template similarity (variable surface area)	+24 pp	Moderate–High	Strong	3 of 3
Inbound external link velocity (top quartile in first 90 days)	+22 pp	Variable	Moderate	3 of 3
Quarterly pruning of zero-citation pages	+19 pp (corpus-level)	Moderate	Strong	3 of 3
Substantive 60–120 day refresh cadence	+17 pp	Moderate	Strong	3 of 3
Full schema.org property coverage including optional fields	+14 pp	Low	Moderate	3 of 3
Internal linking density of 8–24 contextual inbound links	+13 pp	Low–Moderate	Strong	3 of 3
Fixed weekly refresh cadence (regardless of substance)	+3 pp	Moderate	Weak	1 of 3

The data in Table 2 illustrate a recurring pattern: interventions that change what a page contains outperform interventions that change how often or how visibly it is updated, by a wide margin. Cost-effectiveness rankings shift somewhat when implementation cost is factored in (schema completeness and internal linking density rise considerably on a cost-adjusted basis), but the substantive ordering of retention lift is consistent across most reasonable cost-weighting schemes.

It is worth registering that several of these findings sit alongside a broader governance gap. Deloitte’s 2024 enterprise AI survey reports that only 21% of enterprises have mature governance in place to manage agentic AI risks, and the corresponding figure for content-generation governance — though not directly measured — is plausibly lower still. Operators applying these interventions in environments without mature governance are effectively running uncontrolled experiments on their own corpora, which is fine for early movers but produces brittle results when staff turnover or platform changes interrupt the experimental discipline.

Rebuilding Directory Strategy Around the Decay Data

The cumulative implication of the evidence reviewed here is that directory strategy cannot be treated as a publication problem with maintenance afterthoughts; it has to be treated as a portfolio management problem in which decay is the central planning variable. The operators best positioned to exploit citation surfaces over the next planning horizon are those whose corpora are smaller, denser, more editorially enriched, and more actively pruned than the programmatic norms of the previous decade would suggest. That is not an indictment of programmatic approaches — programmatic generation remains the only viable way to populate certain entity classes at coverage levels users expect — but it is a clear signal that the post-generation editorial workflow has moved from a quality nicety to a structural determinant of asset value.

What this looks like in practice depends on where an operator sits on the volume-quality curve. High-volume operators face the largest absolute decay exposure and the largest absolute remediation cost, but also the largest aggregate gain from corpus-level pruning and template diversification. Lower-volume operators have proportionally less to gain from pruning but proportionally more to gain from per-page enrichment, because their unit economics typically support deeper editorial investment per entry. The mid-market — operators producing tens of thousands of programmatic pages without corresponding editorial infrastructure — faces the most uncomfortable strategic position, because their decay curves resemble those of high-volume operators while their unit economics resemble those of lower-volume operators.

The policy environment in which proprietary research citation operates, while distinct from open-web directory dynamics, offers a useful structural analogy. The tiered citation eligibility regime articulated in Forrester’s content compliance policy and the full-text republishing requirement embedded in Harvard Business Review’s permissions framework both reflect a deliberate design choice: that citation rights should be coupled to the integrity of the underlying content rather than treated as a default consequence of publication. Directory operators have historically operated under the opposite assumption — that publication produces citation as a near-automatic byproduct — and the decay data suggest that assumption is no longer defensible for the LLM citation surface specifically. The closer directory operators move toward editorial regimes that resemble those governing premium research content, the closer their citation retention curves will resemble those of premium research content. The further they remain from such regimes, the more their corpora will behave like the thin-listing archetype with its 81% twelve-month decay.

One reflective note before turning to a forward view. In eight years working alongside directory operators on indexing and visibility problems, the most consistent pattern I have observed is that operators significantly underestimate the rate at which their corpora are silently losing visibility, and significantly overestimate the protective effect of inbound links and schema markup against that loss. The decay data discussed here are consistent with that pattern. The operators most likely to thrive are those who treat citation visibility as something to be continuously earned rather than something to be earned once and then defended.

Looking forward, the measured prediction supported by the evidence is this: over the next twenty-four to thirty-six months, the median twelve-month citation decay rate for thin-listing directory pages will rise — not fall — from the current 81% baseline, plausibly toward 88–92%, while the equivalent figure for editorially enriched pages will hold roughly steady in the 30–40% range. The condition under which the prediction would hold is a continuation of the current trajectory in retrieval-augmented generation pipelines toward higher selectivity in citation, increasing penalisation of near-duplicate content within source domains, and continuing improvement in models’ ability to identify and downweight content that fails an LLM-replicability test. The prediction would be falsified by any of three developments: a meaningful structural shift in retrieval pipelines toward broader citation distribution as a deliberate diversity objective; the widespread adoption by major model operators of attribution frameworks that mandate citation of source pages regardless of distinctiveness; or empirical evidence within the next twelve months that thin-listing decay rates have stabilised or begun to fall in any of the three tracking models. Absent those developments, the trajectory described here is the one operators should plan against, and the strategic adjustments outlined above are the ones the data support.