Author. Gombos Atila Robert, PhD — Owner and Chief Executive Officer, Jasmine Business Directory (D-U-N-S 10-276-4189), Valley Cottage, New York. ORCID: 0000-0001-6468-2811. Correspondence through the author profile.
Data statement. The empirical material analysed in this study was extracted directly from the production database of the Jasmine Business Directory. The directory has operated since 2009, has never used paid advertising to acquire listings, and adds approximately ninety per cent of its entries through manual editorial review. The analysed export is current as of 25 May 2026. This study is the fourth in a connected series; its companions examine category concentration, listing completeness, and the geography of the same corpus.
Abstract
This study examines how the 14,362 listings of a curated business directory accumulated over the seventeen years between the directory’s founding and the analysis date. The data comprise the listings table of the Jasmine Business Directory; the directory has operated since 2009, has never used paid advertising, and adds approximately ninety per cent of its entries through manual editorial review. For each listing the study reads the timestamp recording when the listing was added to the database, assigns it to a calendar-year cohort, and reconstructs the directory’s yearly intake and cumulative growth.
The accumulation is found to be highly uneven. Growth was negligible in the directory’s first three years, rose sharply in 2012, and reached its maximum in 2013 and 2014, when 4,334 and 2,829 listings were added; those two years alone account for 49.9% of the entire corpus. After 2014 the yearly intake fell to a few hundred listings and has remained near that level since, with a modest rise in 2025.
The study reads the 2013–2014 concentration alongside the directory’s documented record of editorial recognition in those same years, and it treats the period as one of intensive editorial accumulation rather than as an unexplained spike. It interprets the resulting age structure of the corpus through the data-quality literature, in which the currency of a record is an established dimension (Wang & Strong, 1996), and it cautions, on the evidence of the directory’s update field, against reading that field as a measure of genuine content revision. Reasoned projections for the directory’s growth are offered.
Keywords. business directories; directory growth; temporal analysis; listing cohorts; cumulative growth; data currency; editorial curation; data quality; web directory history; online visibility; corpus accumulation; directory listings.
Introduction
A business directory is not built in a single act. It accumulates, listing by listing, over a span of years, and the corpus that a directory holds at any moment is the deposit left by everything that came before. The shape of that accumulation — whether it was steady or uneven, front-loaded or recent, the work of a sustained burst or of a long slow gathering — is a history, and it can be read directly from the dates the directory records against its listings.
The question this study examines is when the listings in a curated directory were added, and what the pattern of their addition reveals. For each listing, the directory records the moment it entered the database; the study reads that timestamp for all 14,362 listings, groups the listings into calendar-year cohorts, and reconstructs both the directory’s intake in each year and the cumulative total the corpus reached at the end of each year. The result is a year-by-year account of how the directory grew from its founding to the date of analysis.
The approach is descriptive and quantitative. The timestamp recording when each listing was added was extracted for every listing; the listings were tallied by year; the cumulative total was computed; and the years were grouped into broader eras to make the long pattern legible. A second timestamp, recording when each listing was last updated, was examined separately and, for reasons the methodology sets out, treated with caution. As with its companion studies, this is an exploratory empirical analysis of a single corpus, and its claims are calibrated to that scope.
The data and their setting are the same as in the companion studies. The corpus is the production database of the Jasmine Business Directory, founded in 2009 and headquartered in Valley Cottage, New York. The directory has never used paid advertising to acquire listings, and it adds approximately ninety per cent of its entries through manual editorial review. The export analysed here was taken on 25 May 2026 and contains 14,362 listings whose addition dates span the seventeen years from June 2009 to May 2026. One point of interpretation should be fixed at the outset: the date a listing was added is the date it entered the directory’s database, which, for an editorially added listing, is the date an editor added it — not the date the business itself was founded.
This distinction is easy to state and important to keep. A listing added by an editor in 2013 may describe a business founded decades earlier or one founded that same year; the addition date records only when the listing entered the directory, and it carries no information about the age of the business itself. The growth curve of this study is, accordingly, a curve of listing creation, and every statement made from it is a statement about the directory’s activity, not about the formation of the businesses it lists.
Why the temporal shape of a directory matters can be put at three levels. For a business that holds a listing, the era in which its listing was created is the era of the information the listing is likely still to carry, and an old listing left untended is an old representation of the business. For the directory, the history of its own accumulation is the context for every decision about how the corpus should be maintained and grown. And for the wider understanding of digital discovery, the growth curve of an advertising-free directory shows how a corpus assembles when no marketing budget is pacing its expansion. Each level is developed in the discussion.
The contribution of the study is threefold. It provides, first, a precise year-by-year reconstruction of how a substantial real-world directory accumulated its corpus. It interprets, second, the markedly uneven shape of that accumulation — and in particular the concentration of growth in 2013 and 2014 — in the light of the directory’s documented editorial history, rather than leaving the pattern unexplained. It draws, third, the consequences of an age-skewed corpus for data quality, and offers reasoned projections for the directory’s future growth. The paper proceeds through a review of the relevant literature, a description of the dataset, the methodology, the results, the discussion, the projections, the limitations, and the concluding remarks.
Background and related work
The study draws on three bodies of work: the understanding of directories as institutions that accumulate a corpus over time, the data-quality treatment of currency and the age of a record, and the relationship between editorial curation and the pace at which a corpus grows. Each is summarised here, and the section closes by stating the gap the study addresses.
Directories as accumulating institutions
In the first period of the public web, curated directories were a primary means by which people found sites and businesses. The early literature on web search recognised the directory and the algorithmic search engine as the two principal models of discovery: Brin and Page (1998), in describing an early search engine, wrote against the backdrop of human-curated directories as the established alternative, and Kleinberg (1999) analysed the link structure of the web in terms of hubs and authorities, a framing in which curated lists of links act as hubs pointing to authoritative pages. Arasu, Cho, Garcia-Molina, Paepcke, and Raghavan (2001), surveying how the web was searched, placed directory-style organisation alongside crawling and indexing as a recognised approach.
What these accounts share is an understanding of the directory as a corpus built deliberately and over time. A directory has value in proportion to the breadth and the organisation of what it has gathered, and that gathering is cumulative: each year’s additions rest on the years before. A directory’s corpus at a given date is therefore a historical object, and its size is the sum of a sequence of annual contributions that may have been very unequal. This study reconstructs exactly that sequence for one directory.
The sequence is worth recovering because it is ordinarily invisible. A directory presents itself to a visitor as a corpus of a certain size, organised in a certain way, with no indication of the order in which its listings arrived; the history is folded into the present total. The addition timestamp is the record that unfolds it again, and reading that timestamp across a whole corpus is what allows a directory’s history to be examined rather than merely inferred.
The cumulative character of a directory has one further consequence worth naming. Because each year’s listings are added to, rather than replacing, those of the years before, a directory carries its whole history forward with it; nothing in the corpus is automatically retired as it ages. A directory’s present state is therefore the accumulated residue of every editorial decision it has ever made, and a temporal analysis is, in effect, an excavation of that residue layer by layer.
Cohorts, currency, and the age of a record
The data-quality literature supplies the concepts for thinking about the age of the records a directory holds. Wang and Strong (1996) established currency, sometimes termed timeliness, as a dimension of data quality distinct from accuracy and completeness: a record may have been correct when created and have since fallen out of date, and currency is the dimension that captures this. Pipino, Lee, and Wang (2002) treated currency as something that can be assessed, often as a function of how long ago a record was created or last verified against the world.
A directory listing is subject to exactly this form of decay. A business changes its address, its telephone number, its ownership, or ceases trading altogether, and a listing created years earlier and never revisited will not reflect those changes. The age of a listing — the cohort it belongs to — is therefore not a neutral fact but a standing risk to the listing’s currency, and a corpus heavily weighted toward old cohorts carries that risk across a large share of its records. The temporal analysis in this study is, in this sense, also a map of where the currency risk in the corpus is concentrated.
It is worth being precise about what the cohort does and does not tell us here. The cohort fixes the age of a listing as a record, and age is a risk factor for staleness rather than staleness itself; an old listing may have been revised many times and be entirely current, and a recent listing may already be wrong. What the cohort analysis yields is therefore a map of risk, not a map of error, and the distinction is maintained throughout the discussion that follows.
Editorial curation and the pace of accumulation
How a directory acquires its listings shapes the rhythm of its growth. A directory that relies on unmediated self-submission grows at a pace set by the external flow of businesses choosing to submit themselves, which tends to be relatively steady. A directory that adds the majority of its listings through editorial review, as the directory studied here does, grows at a pace set in part by editorial effort — and editorial effort can be concentrated, directed at a backlog or a campaign of curation, in a way that an external submission flow cannot.
This distinction matters for the interpretation of an uneven growth curve. Where a self-submission directory shows a sudden burst of additions, the explanation must be sought outside the directory; where an editorially curated directory shows one, the explanation may lie in a deliberate period of intensive editorial work. The economics of information frames why such work has value: a directory lowers the cost of search (Stigler, 1961) and organises information about sellers for buyers (Nelson, 1970), and a curated directory does so through deliberate editorial judgement about what to include. A burst of curation is, on this account, a burst of that judgement being exercised.
The contrast with a self-submission directory can be put one more way. In a self-submission directory, the question prompted by a burst of growth is what happened in the world to send so many businesses to the directory at once. In an editorially curated directory, the question is what the directory chose to do, because the directory’s own staff are the proximate cause of most additions. The present study, examining a directory of the second kind, accordingly looks inward for its explanation of the 2013 and 2014 peak.
Self-submission and editorial addition as two growth regimes
It is worth drawing the distinction between the two ways a directory grows a little more sharply, because the shape of the curve reported in this study depends on it. Under self-submission, a directory grows as businesses choose, of their own accord, to submit themselves; the rate is set by external demand to be listed, and that demand, being the aggregate of many independent decisions, tends to vary smoothly rather than abruptly.
Under editorial addition, the directory’s own staff add listings, and the rate is set by editorial capacity and editorial direction. This regime can produce a far less even curve. A directory may devote a concentrated period to building out its corpus — clearing a backlog, covering a set of categories systematically, pursuing a deliberate campaign of expansion — and then return to a lower rate of maintenance. Editorial growth can, in short, be intensified and relaxed in a way that demand-driven growth cannot.
The directory studied here adds approximately ninety per cent of its listings editorially, and its growth is therefore predominantly of the second kind. An uneven curve, with periods of intensive accumulation separated by periods of steadier maintenance, is the shape such a directory would be expected to show, and it is against this expectation that the results of the study should be read. A burst in the curve is not, for an editorially built directory, an anomaly to be explained away; it is the visible trace of a period of directed editorial work.
The gap addressed by this study
These literatures are well developed individually, but they are seldom brought together on a precise, openly documented reconstruction of how a real, substantial, advertising-free directory accumulated its corpus over its full history. The literature on directories explains why accumulation is cumulative and historical; the data-quality literature explains why the age of the resulting records matters; the distinction between editorial and self-submission growth explains why an uneven curve is to be expected and how it should be read. The present study occupies the point at which these meet. It does not claim that its growth curve generalises to other directories; it claims to reconstruct one directory’s history exactly, and to interpret that history, editorial context included, in a way that is informative beyond itself.
The dataset: seventeen years of a directory’s listings
The empirical material for this study is the production database of the Jasmine Business Directory, the same corpus analysed in the companion studies of category concentration, listing completeness, and geography. The directory was founded in 2009 by Pécsi András and Robert Gomboș, is headquartered in Valley Cottage, New York, and has operated continuously since then as a general business directory organised by subject category.
Two characteristics of the directory’s operation bear on the interpretation of its growth. The directory has never used paid advertising to acquire listings; its corpus has accumulated through organic submission and editorial addition. And it adds approximately ninety per cent of its entries through manual editorial review. The growth curve measured here is therefore the curve of an organically and editorially assembled corpus, not the curve of a corpus paced by a marketing budget.
The same provenance gives the temporal record a particular evidential value. A growth curve shaped by advertising spend would tell a reader chiefly about the directory’s marketing decisions; a growth curve shaped by organic submission and editorial work tells a reader about the directory’s editorial history and about the rate at which businesses sought it out unprompted. The curve reconstructed here is a record of the second kind, and it is, for that reason, a reasonably direct trace of how the directory was actually built.
The unit of analysis is the individual listing record, of which the export contains 14,362. Each listing record carries a timestamp, in the field named added, recording the date and time the listing was entered into the directory’s database; every one of the 14,362 listings carries a valid value in this field, so the temporal analysis covers the corpus without exception. Each listing also carries a second timestamp, in a field recording when the listing was last updated, and this field is examined separately. The addition dates span from June 2009 to May 2026. Table 1 summarises the dataset and the time fields examined.
| Attribute | Value |
|---|---|
| Source | Jasmine Business Directory, production database |
| Directory founded | 2009 |
| Headquarters | Valley Cottage, New York |
| Listing-acquisition model | Organic submission and editorial addition; no paid advertising |
| Editorial curation | Approximately 90% of entries added through manual review |
| Export reference date | 25 May 2026 |
| Listings (universe of analysis) | 14,362 |
| Listings carrying a valid addition date | 14,362 (100%) |
| Span of addition dates | June 2009 to May 2026 |
| Primary time field analysed | Addition timestamp (date and time a listing entered the database; analysed by calendar year) |
| Secondary time field examined | Last-updated timestamp (examined separately, treated with caution) |
Methodology
The methodology is set out in five parts: the extraction of the addition timestamp, the construction of yearly cohorts and cumulative totals, the separate and cautious treatment of the update timestamp, the consequences of the reference date for the partial final year, and a note on the descriptive design. The analytical framework is shown in Figure 1.
Data extraction and the addition timestamp
The dataset was obtained as a complete export of the directory’s production database on the reference date. The addition timestamp is held in the listings table itself, as a full date-and-time value, and required no joining of tables; the year of addition was read directly from it for each of the 14,362 listings.
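The reading of the year from each timestamp can be sketched as follows. This is a minimal illustration rather than the study’s own script: the timestamp format (`YYYY-MM-DD HH:MM:SS`), the row layout, and the sample values are assumptions; only the field name `added` comes from the dataset description.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows; the real export holds 14,362 listings.
# The timestamp format used here is an assumption about the export.
rows = [
    {"id": 1, "added": "2009-06-14 10:32:05"},
    {"id": 2, "added": "2013-10-02 08:15:44"},
    {"id": 3, "added": "2013-10-19 17:01:12"},
    {"id": 4, "added": "2026-05-20 09:00:00"},
]


def year_of(timestamp: str) -> int:
    """Return the calendar year of a date-and-time string."""
    return datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").year


# Tally listings by the calendar year of their addition timestamp.
intake = Counter(year_of(row["added"]) for row in rows)
```

Because `strptime` raises an error on a malformed value, a run over the full table would also confirm the absence of invalid timestamps.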
One feature of the field is worth recording, because it is unusual among the fields examined across this series of studies. Every one of the 14,362 listings carries a valid addition timestamp; there are no missing or malformed values. This completeness is itself informative: it indicates that the addition date is set automatically by the database when a listing is created, rather than entered by hand, and an automatically set timestamp is a reliable record of when a listing entered the corpus. The temporal analysis therefore rests on a field that is both complete and trustworthy as to what it records.
This reliability is worth contrasting with the fields examined in the companion studies. Those studies confronted fields filled, or left empty, by many hands over many years — address fields, content fields, geographic fields — whose completeness was itself one of the findings. The addition timestamp is of a different character: set by the database rather than by a person, it is present without exception and records what it purports to record. The temporal study therefore begins from firmer ground than the studies of completeness and geography, and its central limitation, set out later, concerns interpretation rather than the reliability of the data.
Yearly cohorts and cumulative growth
Each listing was assigned to the calendar year of its addition timestamp, forming year cohorts from 2009 to 2026. Two quantities were then computed. The yearly intake is the number of listings in each year’s cohort — the count of listings added in that year. The cumulative total is the running sum of the yearly intakes, giving the size the corpus had reached at the end of each year. Together these describe both the rhythm of growth and the trajectory of the total.
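The two quantities can be illustrated with a short sketch that takes the yearly counts reported in Table 2 as its input. The variable names are illustrative only, and the code is not presented as the script used in the study.

```python
from itertools import accumulate

# Yearly intake as reported in Table 2 (2026 is a partial year).
intake = {
    2009: 132, 2010: 96, 2011: 148, 2012: 1547, 2013: 4334, 2014: 2829,
    2015: 404, 2016: 371, 2017: 569, 2018: 472, 2019: 432, 2020: 478,
    2021: 475, 2022: 300, 2023: 349, 2024: 400, 2025: 767, 2026: 259,
}

years = sorted(intake)
total = sum(intake.values())

# Cumulative total: the running sum of the yearly intakes.
cumulative = dict(zip(years, accumulate(intake[y] for y in years)))

# Cumulative share of the corpus, in per cent, to one decimal place.
cumulative_share = {y: round(100 * cumulative[y] / total, 1) for y in years}
```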
To make the long pattern legible, the seventeen years were also grouped into four eras, with boundaries placed where the yearly intake itself changes character: a formative period, a period of intensive accumulation, and two later periods of steadier growth. The era boundaries are a presentational device, chosen after inspecting the yearly figures, and the results report both the year-by-year detail and the era summary so that the grouping can be checked against the underlying data.
The reader should treat the four eras, then, as a reading aid rather than as a natural feature of the data. The underlying reality is the eighteen yearly figures; the eras are a way of holding those figures in mind, drawn so that each era is broadly similar in its rate of intake, with the caveat that the formative era joins three quiet years to the transitional year 2012. A reader who preferred different boundaries could draw them from Table 2 without losing anything, and the substantive findings of the study (the concentration in 2013 and 2014, and the front-loaded cumulative curve) do not depend on where the era lines are placed.
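The point that the boundaries are adjustable can be made concrete with a sketch in which the era definitions are a parameter. The function name `era_totals` and the tuple layout are hypothetical; the yearly counts are those of Table 2, and the alternative grouping is the three-phase reading given later in the results.

```python
# Yearly intake as reported in Table 2 (2026 is a partial year).
intake = {
    2009: 132, 2010: 96, 2011: 148, 2012: 1547, 2013: 4334, 2014: 2829,
    2015: 404, 2016: 371, 2017: 569, 2018: 472, 2019: 432, 2020: 478,
    2021: 475, 2022: 300, 2023: 349, 2024: 400, 2025: 767, 2026: 259,
}


def era_totals(intake, eras):
    """Sum yearly intake within each (name, first_year, last_year) era.

    Returns {name: (listings_added, percent_of_corpus)}.
    """
    total = sum(intake.values())
    result = {}
    for name, first, last in eras:
        added = sum(n for year, n in intake.items() if first <= year <= last)
        result[name] = (added, round(100 * added / total, 1))
    return result


# The four eras used in the study.
four_eras = era_totals(intake, [
    ("Formative", 2009, 2012),
    ("Intensive accumulation", 2013, 2014),
    ("Steady growth", 2015, 2019),
    ("Recent growth", 2020, 2026),
])

# An alternative grouping: the three phases described in the results.
three_phases = era_totals(intake, [
    ("Slow start", 2009, 2011),
    ("Intensive build", 2012, 2014),
    ("Long plateau", 2015, 2026),
])
```

Under either grouping the totals sum to the same 14,362 listings; only the placement of 2012 changes the picture.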
The update timestamp and why it is treated cautiously
The listings table carries a second timestamp recording when each listing was last updated, and this field was examined with the thought that it might indicate how current the corpus is. The examination led to the field being set aside as a measure of genuine revision, and the reason is given here so that the decision is transparent.
The update timestamps do not spread smoothly across the years. They cluster, instead, in a small number of years in blocks far too large to represent ordinary editorial revision: as the results report, particular years each carry the update stamp of more than a third of the entire corpus. A pattern in which a third of all listings are recorded as updated within a single year is far more consistent with a database-wide batch operation — a migration, a schema change, a bulk re-save — than with the editorial revision of individual listings. The update field is therefore reported in the results as a finding in its own right, but it is not used as a measure of how current the corpus’s content is, because it does not reliably record that.
Setting the update field aside is a deliberate choice in keeping with the conservative approach taken across this series. Where a field cannot be shown to record what it would need to record to support a claim, the field is reported as an observation and the claim is not made. The update timestamp is genuinely informative about one thing, the occurrence of database-wide write events, and the results report it for that. It is simply not informative about editorial revision, and the study declines to use it as though it were.
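The check that led to this decision can be sketched as a simple share-per-year test. The function name `batch_years` is hypothetical, and the sample data are invented purely for illustration and are not the directory’s update stamps; the one-third threshold follows the description above.

```python
from collections import Counter


def batch_years(update_years, threshold=1 / 3):
    """Return {year: share} for years whose share of all update stamps
    exceeds the threshold, suggesting a database-wide write event."""
    counts = Counter(update_years)
    n = len(update_years)
    return {
        year: count / n
        for year, count in sorted(counts.items())
        if count / n > threshold
    }


# Invented toy data (NOT the directory's figures): 100 update stamps,
# most of them falling in two years.
sample = [2016] * 40 + [2019] * 5 + [2021] * 45 + [2023] * 10
flagged = batch_years(sample)
```

A year flagged by such a test is a candidate batch event; the test says nothing about whether any individual listing was editorially revised.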
The reference date and the partial final year
Two consequences of the export date warrant a note. The export was taken on 25 May 2026, and every figure in this study is therefore current as of that date; the cumulative totals are the totals reached by then, and a later export would extend them.
The more important consequence concerns the final year. Because the export was taken in May, the 2026 cohort contains only the listings added in the first part of that year, and the figure of 259 listings for 2026 is not comparable with the full-year figures that precede it. The study marks 2026 as a partial year wherever it appears, and it does not include 2026 in any comparison that depends on a full year’s intake; the year is reported for completeness of the record but is read as the fragment it is.
The reference date also fixes the vantage point from which the corpus is described as old. When the study observes that the typical listing is more than a decade old, that age is measured from the addition date to 25 May 2026; a study of the same corpus conducted later would find the same cohorts correspondingly older, and the currency concern correspondingly sharper.
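The statement that the typical listing is more than a decade old can be checked from the yearly counts alone. The sketch below is one way of doing so, under the assumption that "typical" is read as the median listing; it finds the cohort containing that listing and computes a lower bound on its age at the export date. The function name `median_cohort` is illustrative.

```python
from datetime import date
from itertools import accumulate

# Yearly intake as reported in Table 2 (2026 is a partial year).
intake = {
    2009: 132, 2010: 96, 2011: 148, 2012: 1547, 2013: 4334, 2014: 2829,
    2015: 404, 2016: 371, 2017: 569, 2018: 472, 2019: 432, 2020: 478,
    2021: 475, 2022: 300, 2023: 349, 2024: 400, 2025: 767, 2026: 259,
}
EXPORT_DATE = date(2026, 5, 25)


def median_cohort(intake):
    """Return the first year whose cumulative count reaches half the corpus."""
    years = sorted(intake)
    half = sum(intake.values()) / 2
    for year, running in zip(years, accumulate(intake[y] for y in years)):
        if running >= half:
            return year
    return None


median_year = median_cohort(intake)

# Lower bound on the median listing's age: it was added no later than
# 31 December of its cohort year.
min_age_years = (EXPORT_DATE - date(median_year, 12, 31)).days / 365.25
```

On the Table 2 figures the median listing falls in the 2014 cohort, which puts its age at the export date above eleven years.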
The descriptive design and reproducibility
As in the companion studies, the design is descriptive. The corpus is the entire production database of one directory, analysed in full; the yearly counts and cumulative totals are exact properties of that corpus rather than estimates, and no inferential test is applicable, because there is no sampling beyond the corpus itself. The procedure is reproducible: each figure is a count or a running sum of counts, derived by a deterministic reading of a single timestamp field. The study reports what the corpus is, with precision, and reserves interpretation for the discussion.
Results
The results are presented in six parts: the yearly intake of listings; the cumulative growth of the corpus; the grouping of the seventeen years into four eras; the concentration of growth within 2013 and 2014; a single description of the shape of the growth curve; and the separate finding from the update timestamp. Each part is accompanied by the relevant figure or table, and every figure and table is referred to directly in the text.
The yearly intake of listings
The directory’s intake of listings varied enormously from year to year. Figure 2 shows the number of listings added in each year from 2009 to 2026, and Table 2 reports the same counts together with the cumulative total and its share of the corpus.
| Year | Listings added | % of corpus | Cumulative total | Cumulative % |
|---|---|---|---|---|
| 2009 | 132 | 0.9% | 132 | 0.9% |
| 2010 | 96 | 0.7% | 228 | 1.6% |
| 2011 | 148 | 1.0% | 376 | 2.6% |
| 2012 | 1,547 | 10.8% | 1,923 | 13.4% |
| 2013 | 4,334 | 30.2% | 6,257 | 43.6% |
| 2014 | 2,829 | 19.7% | 9,086 | 63.3% |
| 2015 | 404 | 2.8% | 9,490 | 66.1% |
| 2016 | 371 | 2.6% | 9,861 | 68.7% |
| 2017 | 569 | 4.0% | 10,430 | 72.6% |
| 2018 | 472 | 3.3% | 10,902 | 75.9% |
| 2019 | 432 | 3.0% | 11,334 | 78.9% |
| 2020 | 478 | 3.3% | 11,812 | 82.2% |
| 2021 | 475 | 3.3% | 12,287 | 85.6% |
| 2022 | 300 | 2.1% | 12,587 | 87.6% |
| 2023 | 349 | 2.4% | 12,936 | 90.1% |
| 2024 | 400 | 2.8% | 13,336 | 92.9% |
| 2025 | 767 | 5.3% | 14,103 | 98.2% |
| 2026 (partial) | 259 | 1.8% | 14,362 | 100.0% |
The pattern is unmistakable. The directory’s first three years were quiet: 132 listings were added in 2009, 96 in 2010, and 148 in 2011, so that by the end of 2011 the corpus held only 376 listings, or 2.6% of its size at the export date. The year 2012 marked a sharp change, with 1,547 listings added. The two years that followed were the directory’s largest by a wide margin: 4,334 listings were added in 2013, which is 30.2% of the entire corpus in a single year, and 2,829 in 2014.
After 2014 the intake fell back abruptly, to 404 listings in 2015, and it has remained in the range of a few hundred listings a year ever since, with one modest exception: 2025, in which 767 listings were added. The figure for 2026, 259 listings, covers only the part of the year to the export date and is not comparable with the full years.
One feature of the yearly figures deserves emphasis before the cumulative view is taken up. The drop after 2014 is not gradual but abrupt: the intake falls from 2,829 in 2014 to 404 in 2015, a reduction of more than four-fifths in a single year. A decline of that steepness is not the natural tailing-off of a process losing momentum; it is the mark of a deliberate change of pace, the end of one regime of activity and the beginning of another. The yearly figures, in other words, do not describe a directory that gradually slowed but one that changed gear.
Cumulative growth
The cumulative total tells the same history as a trajectory rather than a sequence of annual figures. Figure 3 plots the size the corpus had reached at the end of each year.
The curve is sharply front-loaded. It is almost flat through 2011, when the corpus stood at 376 listings; it then climbs steeply for three years, reaching 1,923 listings at the end of 2012, 6,257 at the end of 2013, and 9,086 at the end of 2014; and from 2015 onward it settles into a long, shallow ascent that carries it from 9,086 to the final 14,362. The single most compact statement of the directory’s history is that it reached 63.3% of its size at the export date by the end of 2014, in its sixth year of operation, and accumulated the remaining 36.7% over the following eleven and a half years.
The cumulative curve also makes a point about the directory’s present that the yearly figures alone might obscure. Although the directory continues to add listings every year, the curve is now so close to flat that each year’s contribution is barely visible against the accumulated total. A directory in this condition grows in absolute terms while being, in proportional terms, essentially stable; its size changes little from one year to the next, and its character is set by the listings it accumulated long ago.
Four eras of accumulation
Grouping the years into eras makes the long pattern legible at a glance. Figure 4 and Table 3 divide the seventeen years into four eras, with boundaries placed where the yearly intake changes character.
| Era | Years | Listings added | % of corpus |
|---|---|---|---|
| Formative | 2009–2012 | 1,923 | 13.4% |
| Intensive accumulation | 2013–2014 | 7,163 | 49.9% |
| Steady growth | 2015–2019 | 2,248 | 15.7% |
| Recent growth | 2020–2026 | 3,028 | 21.1% |
The era view states the central fact of the directory’s history in one line. The intensive era of 2013 and 2014, a span of just two years, added 7,163 listings, or 49.9% of the entire corpus: almost exactly as many as the other three eras combined (7,199 listings, or 50.1%), even though the formative era spans four years, the steady-growth era five, and the recent era seven. The formative era contributed 13.4%, the steady-growth era 15.7%, and the recent era 21.1%. Half of everything the directory holds was gathered in one two-year window, and the other half was gathered across the fifteen years around it.
The era figures repay one further comparison. Measured as an average annual intake, the contrast is starker still than the totals suggest: the intensive era added roughly 3,580 listings a year, while the formative era added roughly 480, the steady-growth era roughly 450, and the recent era roughly 430 (or roughly 460 if the partial year 2026 is excluded and the average is taken over 2020 to 2025). The intensive era’s annual rate was, on these figures, around eight times that of the two eras that followed it and more than seven times that of the formative era. The two years of 2013 and 2014 were not merely the largest era; they operated at a rate the directory has not approached before or since.
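The annual rates follow directly from Table 3, as the sketch below shows. The number of calendar years assigned to each era is taken from the era definitions; treating the partial year 2026 as one of the recent era’s seven years is an assumption that slightly understates that era’s rate, so a full-years-only figure is computed alongside it.

```python
# Listings added and calendar years spanned, per era (from Table 3).
eras = {
    "Formative": (1923, 4),
    "Intensive accumulation": (7163, 2),
    "Steady growth": (2248, 5),
    "Recent growth": (3028, 7),  # includes the partial year 2026
}

# Average annual intake per era.
rates = {name: added / years for name, (added, years) in eras.items()}

# Recent era over full years only: drop the 259 listings of partial 2026
# and divide by the six full years 2020 to 2025.
recent_full_years_rate = (3028 - 259) / 6

# How many times faster the intensive era ran than the steady-growth era.
ratio_to_steady = rates["Intensive accumulation"] / rates["Steady growth"]
```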
The concentration of growth within 2013 and 2014
The concentration of growth is visible not only between years but within them. Examined month by month, the intensive era is itself a sequence of discrete bursts rather than a sustained even flow. The single largest month in the directory’s history is October 2013, in which 1,772 listings were added — more listings in that one month than in the whole of the directory’s first three years together. March and April 2013 added 655 and 734 listings; July 2014 added 731. Between these bursts lay quiet months: June 2013, for instance, added 31.
This monthly pattern is the signature of editorial accumulation proceeding in concentrated campaigns rather than as a steady trickle. A burst of several hundred or more listings in a single month, separated from the next burst by quiet months, is what the processing of a backlog, or a directed period of curation, looks like in the data. The interpretation of why these campaigns fell in 2013 and 2014 is taken up in the discussion; the result reported here is simply that the directory’s largest era was, internally, a series of distinct intensive efforts.
The monthly view also disposes of one possible misreading of the annual figures. A year that added several thousand listings might be imagined as a year of uniformly heightened activity, a steady elevated flow sustained across all twelve months. The months show that this is not what happened: the intake within 2013 and 2014 was itself concentrated, piled into particular months and thin in others. The directory’s largest period was built not from a sustained high rate but from a sequence of sharp, separable pushes.
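The monthly reading described above can be sketched in a few lines of Python. The sketch counts listings per calendar month from their addition timestamps and flags the months that meet a burst threshold. The threshold of 500 is a hypothetical cut-off chosen only for illustration, the function names are assumptions, and the only real figures used are the five monthly counts quoted in the text.

```python
from collections import Counter
from datetime import datetime


def monthly_counts(added_at):
    """Count listings per (year, month) from an iterable of datetimes."""
    return Counter((ts.year, ts.month) for ts in added_at)


def burst_months(counts, threshold):
    """Return, in date order, the months whose intake meets the threshold."""
    return sorted(month for month, n in counts.items() if n >= threshold)


# Monthly figures quoted in the text; every other month is omitted here.
REPORTED = Counter({
    (2013, 3): 655,
    (2013, 4): 734,
    (2013, 6): 31,
    (2013, 10): 1_772,
    (2014, 7): 731,
})

BURST_THRESHOLD = 500  # hypothetical cut-off, not a value used by the study
bursts = burst_months(REPORTED, BURST_THRESHOLD)
largest = max(REPORTED, key=REPORTED.get)

# monthly_counts works on raw timestamps; these three rows are invented.
sample = [datetime(2013, 10, 1), datetime(2013, 10, 15), datetime(2013, 6, 2)]
sample_counts = monthly_counts(sample)

print(bursts, largest, sample_counts[(2013, 10)])
```

A fixed threshold is the simplest possible rule; a full analysis might instead compare each month with the median of its neighbours, but that choice lies outside what the text reports.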
The shape of the growth curve in one description
The yearly intake, the cumulative curve, and the four eras are three views of a single shape, and it is worth stating that shape in one description. The directory’s history has three phases. There is a slow start, the three years from 2009 to 2011, in which the corpus barely grew. There is an intensive build, the three years from 2012 to 2014, in which the curve rises almost vertically. And there is a long plateau, the eleven and a half years from 2015 onward, in which the curve climbs only gently.
The defining feature of this shape is that it is a step rather than a slope. A directory might, in principle, have grown along a straight line, adding a similar number of listings every year, or along a smooth accelerating curve. This directory did neither. It grew in a single great step, a near-vertical rise set between two near-horizontal stretches, and the step is narrow: it occupies three of the seventeen years and accounts for the majority of the corpus.
Describing the curve as a step is not merely a figure of speech; it has an interpretive consequence carried through the discussion. A corpus built in a step is a corpus most of whose listings share an age, because most of them were created in the same narrow window. The step shape and the age skew examined later are the same fact stated twice, once as a curve and once as a distribution of listing ages.
The update field: batch events rather than organic revision
The second timestamp, recording when each listing was last updated, was examined separately, and its pattern is reported here as a finding in its own right. Table 4 shows, for each year, how many listings carry an update stamp from that year.
| Year of last update | Listings | % of corpus |
|---|---|---|
| 2012 | 9 | 0.1% |
| 2013 | 19 | 0.1% |
| 2014 | 127 | 0.9% |
| 2015 | 9 | 0.1% |
| 2016 | 1,047 | 7.3% |
| 2017 | 10 | 0.1% |
| 2018 | 13 | 0.1% |
| 2019 | 114 | 0.8% |
| 2020 | 5,236 | 36.5% |
| 2021 | 398 | 2.8% |
| 2022 | 213 | 1.5% |
| 2023 | 5,679 | 39.5% |
| 2024 | 424 | 3.0% |
| 2025 | 794 | 5.5% |
| 2026 | 270 | 1.9% |
The update timestamps do not behave like the record of ordinary editorial revision. They are concentrated in two years to an implausible degree: 5,236 listings, 36.5% of the corpus, carry an update stamp from 2020, and 5,679, 39.5%, carry one from 2023, while no other year accounts for more than 7.3% and most account for less than 1%. A pattern in which more than a third of all listings are recorded as updated within a single calendar year, twice over, is consistent with database-wide batch operations (a migration, a re-save, a structural change applied across the table) and not with the individual revision of listings by editors. For this reason, as the methodology stated, the update field is reported here as an observation about the data but is not used as a measure of how current the corpus’s content is; it records when rows were written, which is not the same as when listings were meaningfully reviewed.
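The concentration in Table 4 can be checked directly. The following Python sketch transcribes the table, confirms that the yearly counts sum to the corpus total, and flags any year holding at least a given share of all update stamps. The 25% cut-off is an assumption made for illustration; the study itself does not define a numerical rule for what counts as a batch event.

```python
# Listings by year of last update, transcribed from Table 4.
UPDATES_BY_YEAR = {
    2012: 9, 2013: 19, 2014: 127, 2015: 9, 2016: 1_047,
    2017: 10, 2018: 13, 2019: 114, 2020: 5_236, 2021: 398,
    2022: 213, 2023: 5_679, 2024: 424, 2025: 794, 2026: 270,
}


def update_shares(counts):
    """Share of all update stamps that falls in each year."""
    total = sum(counts.values())
    return {year: n / total for year, n in counts.items()}


def batch_candidates(counts, min_share=0.25):
    """Years whose share meets min_share (a hypothetical cut-off)."""
    shares = update_shares(counts)
    return sorted(year for year, share in shares.items() if share >= min_share)


total = sum(UPDATES_BY_YEAR.values())
shares = update_shares(UPDATES_BY_YEAR)
candidates = batch_candidates(UPDATES_BY_YEAR)
combined = sum(shares[year] for year in candidates)

print(total, candidates, f"{combined:.1%}")
```

On the Table 4 figures the two flagged years, 2020 and 2023, together hold 10,915 of the 14,362 update stamps, about 76% of the corpus. A share-based flag of this kind identifies candidate batch years; it does not by itself establish what operation produced them.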
Discussion
The results establish that the directory’s growth was sharply uneven, that half the corpus was added in the two years 2013 and 2014, that the cumulative curve is front-loaded, and that the update timestamp records batch events rather than editorial revision. The discussion now interprets these findings: it reads the 2013–2014 concentration against the directory’s documented editorial history, considers what it means that the corpus is mostly more than a decade old, draws out the consequences for data currency, relates the temporal finding to the three companion studies, examines the steady recent years, and sets out the implications for businesses and for the directory.
The 2013–2014 period and the directory’s editorial recognition
The concentration of half the corpus in two years calls for an explanation, and the editorially curated model of the directory, set out in the background, indicates where to look for one. A directory that adds the majority of its listings through editorial review can accumulate in concentrated bursts, because editorial effort can be directed and intensified in a way that an external flow of self-submissions cannot.
There is, in this case, an independent and documented record that bears directly on the period. During 2013 and 2014 the directory received eight awards recognising its editorial practice — specifically, its exercise of editorial discretion and its practice of adding resources through manual review. The two records describe the same two years from two directions: the addition dates show, from inside the database, that 2013 and 2014 were the directory’s most intensive period of accumulation, and the awards show, from outside it, that those same two years were when the directory’s editorial curation was most actively recognised. It can be concluded that 2013 and 2014 were a deliberate period of intensive editorial accumulation, in which the directory built the larger part of its corpus through manual curation and was externally acknowledged for exactly that work. The monthly bursts within the period — the 1,772 listings of October 2013 foremost among them — are consistent with this reading: they are what concentrated editorial campaigns, rather than a steady external trickle, leave in the data.
It is worth being careful about the strength of this conclusion. The addition dates and the award record are two independent pieces of evidence, and they agree; that agreement is genuine corroboration, and it makes the reading of 2013 and 2014 as a deliberate editorial period considerably more than a guess. What the two records together cannot supply is the finer detail of why the effort was mounted when it was, or how it was organised internally. The conclusion is therefore confident as to the character of the period — intensive, deliberate, editorial — and silent, by necessity, on the particulars the data do not contain.
Why an organically and editorially built corpus grows this way
The provenance of the corpus, as in the companion studies, conditions how the growth curve should be read. Because the directory has never used paid advertising to acquire listings, the curve is not paced by an advertising budget; there is no media spend whose rises and falls the intake would track. The curve reflects, instead, editorial decisions and the organic flow of submissions, and the first of these is the larger component.
This permits a reasoned supposition about what the curve records. It may be supposed that the growth curve is, to a large degree, a fairly direct trace of editorial intent: a record of when the directory decided to build its corpus intensively and when it decided to maintain it. The step shape is, on this reading, the fingerprint of a particular history of decisions: a concentrated effort to assemble a substantial corpus in 2012 to 2014, followed by a settled judgement to maintain that corpus at a steady rate rather than to keep expanding it at the earlier pace.
One qualification, familiar from the companion studies, keeps the supposition honest. Some part of the intake in every year is genuine self-submission, not editorial addition, and the curve is the sum of both. The term editorially paced should therefore be read as a statement about the dominant component of the growth, not as a claim that no listing arrived except by an editor’s hand. The directory’s own estimate, that approximately ninety per cent of entries are added editorially, is the basis for treating editorial decision as the principal driver of the shape.
Reading a corpus that is mostly more than a decade old
A direct consequence of the front-loaded growth curve is that the corpus, viewed from the 2026 analysis date, is mostly old. By the end of 2014, the directory had accumulated 63.3% of its eventual size; the median listing, ordered by addition date, belongs to the intensive era of 2013 and 2014. The typical listing in this directory is, as a database record, some eleven or twelve years old.
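The position of the median listing can be located from figures already reported. The Python sketch below accumulates cohort shares in date order and returns the first cohort at which the running total reaches 50%. It uses the reported era percentages for the formative, steady-growth, and recent eras and the reported yearly counts for 2013 and 2014; mixing rounded percentages with exact counts is a simplification, and the function names are illustrative.

```python
TOTAL_LISTINGS = 14_362

# Cohorts in date order as (label, share of corpus in %).
# Era shares are the rounded figures from the text; 2013 and 2014 use exact counts.
COHORTS = [
    ("formative era", 13.4),
    ("2013", 4_334 / TOTAL_LISTINGS * 100),
    ("2014", 2_829 / TOTAL_LISTINGS * 100),
    ("steady-growth era", 15.7),
    ("recent era", 21.1),
]


def cumulative_shares(cohorts):
    """Running total of cohort shares, in the order given."""
    running, out = 0.0, []
    for label, share in cohorts:
        running += share
        out.append((label, running))
    return out


def median_cohort(cohorts):
    """First cohort at which the cumulative share reaches 50%."""
    for label, cumulative in cumulative_shares(cohorts):
        if cumulative >= 50.0:
            return label
    raise ValueError("cohort shares do not reach 50%")


cumulative = dict(cumulative_shares(COHORTS))
median_label = median_cohort(COHORTS)
print(median_label, round(cumulative["2014"], 1))
```

On these inputs the cumulative share stands below 50% at the end of 2013 and at 63.3% at the end of 2014, so the median listing falls in the 2014 cohort, roughly twelve years before the 2026 reference date.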
This is a defining structural fact about the corpus, and it reframes how the directory should be understood. It is not a recent corpus with a long tail of older entries; it is a largely decade-old corpus with a comparatively thin recent layer. Any property of the corpus measured at the analysis date — its completeness, its geographic coverage, its category structure — is, in the main, a property of listings created during or before 2014. The companion studies, in measuring the corpus as it stands, were therefore measuring, for the most part, the state of a set of listings created more than a decade ago, and the temporal finding supplies that context to all of them.
What an age-skewed corpus means for currency
The data-quality literature identifies currency, or timeliness, as a dimension of quality distinct from completeness and accuracy (Wang & Strong, 1996; Pipino, Lee, & Wang, 2002), and an age-skewed corpus is, by its nature, a corpus at risk on that dimension. Figure 5 sets out the mechanism.
A listing is, in the ordinary case, correct when it is created. What follows is the problem: in the years that pass afterward, the business may move premises, change its telephone number, change ownership, or cease trading altogether, and a listing that is never revisited will reflect none of these changes. For a corpus in which the typical listing is more than a decade old, the lower path in Figure 5 — the untended listing whose currency has decayed — is not a marginal case but a description of a large part of the corpus.
This finding bears directly on a limitation of the companion study of completeness. That study measured whether a field held a value, not whether the value was accurate or current, and it noted that this made its figures an upper bound on completeness in any stricter sense. The temporal finding shows why that limitation matters: a filled field in a listing created a decade ago and never revised is materially more likely to be stale than a filled field in a recent one. And the update-timestamp finding compounds the concern, because it removes the reassurance that the update field might have offered: since that field records batch writes rather than editorial review, the data cannot tell us which old listings have genuinely been checked. This study cannot measure how stale the corpus is — that would require verifying listings against the world — but it can locate where the currency risk is concentrated, and the answer is the large 2013 and 2014 cohorts.
Growth, completeness, and geography across the series
This is the fourth of four analytical studies of a single corpus, and the temporal finding supplies a dimension the other three did not have. The first study measured how listings are distributed across categories; the second, how complete listings are; the third, where listings are located. Each measured the corpus as it stands at the analysis date. The present study establishes that the corpus as it stands is, in the main, a corpus of listings created during or before 2014.
This permits a reasoned conjecture that joins the studies together. The incompleteness documented in the second study and the geographic gaps documented in the third are properties of a corpus that is mostly old, and an old listing left unrevised is at once a completeness risk and a currency risk. It may be conjectured that the sparse listings and the old cohorts substantially overlap — that a listing created quickly during a 2013 editorial campaign and never afterward revisited would be, today, both an old listing and an incomplete one. The present study cannot demonstrate that overlap, because it has not crossed the addition date with the completeness measures listing by listing; a combined analysis, noted below as future work, could establish it directly. The conjecture is offered as the most plausible way the four studies’ findings fit together.
If that conjecture were borne out, it would carry a constructive implication rather than only a critical one. A directory able to identify the listings that are at once old, incomplete, and unrevised would have, in that identification, a precise work list: the same listings would appear on it whether the directory approached the corpus from the angle of completeness, of geography, or of currency. The four studies, taken together, would then point not to four separate maintenance tasks but to one population of listings on which all four concerns converge.
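The combined analysis and the work list described above could take a form like the following Python sketch. It is an illustration under stated assumptions, not the directory’s method: the `Listing` fields, the definition of incompleteness (a missing telephone number or address), the 2014 cut-off, and every sample row are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Listing:
    listing_id: int
    added: date                # addition date, as used throughout the study
    phone: Optional[str]       # hypothetical field
    address: Optional[str]     # hypothetical field


def is_incomplete(listing):
    """Treat a listing as incomplete if either field is missing or empty."""
    return not (listing.phone and listing.address)


def review_worklist(listings, cutoff_year=2014):
    """Listings that are both old and incomplete, oldest first."""
    hits = [
        item for item in listings
        if item.added.year <= cutoff_year and is_incomplete(item)
    ]
    return sorted(hits, key=lambda item: item.added)


def overlap_share(listings, cutoff_year=2014):
    """Share of incomplete listings that belong to the old cohorts."""
    incomplete = [item for item in listings if is_incomplete(item)]
    if not incomplete:
        return 0.0
    old = [item for item in incomplete if item.added.year <= cutoff_year]
    return len(old) / len(incomplete)


# Invented rows for illustration only; not drawn from the directory.
SAMPLE = [
    Listing(1, date(2013, 10, 4), None, "1 Main St"),
    Listing(2, date(2013, 3, 12), "555-0100", "2 High St"),
    Listing(3, date(2014, 7, 9), "555-0101", None),
    Listing(4, date(2021, 5, 1), None, None),
]

worklist_ids = [item.listing_id for item in review_worklist(SAMPLE)]
share = overlap_share(SAMPLE)
print(worklist_ids, round(share, 2))
```

Run against the real corpus, `overlap_share` would test the conjecture directly: a value well above the old cohorts’ share of all listings would indicate that incompleteness is concentrated among them.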
The recent years: a steady directory rather than a growing one
From 2015 onward, the directory’s intake settled into a narrow band of a few hundred listings a year, and it has stayed there. This steadiness is itself a finding. After the intensive accumulation of 2013 and 2014, the directory did not continue to grow rapidly, nor did it stop adding listings; it shifted into what is best described as a maintenance phase, adding modestly and steadily rather than expanding.
One recent year departs from the band. In 2025, 767 listings were added, against the 300 to 570 of the surrounding years — a modest rise, and possibly the sign of a renewed editorial effort. It would be unwise to read a single year as a trend, particularly when the following year, 2026, is represented in the data only by its first months. Whether 2025 marks the start of a new period of growth or is a single busier year among steady ones is a question that only a later export can answer, and it is taken up among the projections.
The directory in the wider history of web discovery
The timeline reconstructed here can be placed against the broader history of how the web has been searched. Curated directories were a primary mode of discovery in the web’s first years, but by the time this directory was founded in 2009, and still more by its intensive era of 2013 and 2014, the algorithmic search engine had long been the dominant means by which most people found most things (Brin & Page, 1998; Kleinberg, 1999). The directory built the larger part of its corpus, in other words, in a period when the directory as a category was already past its position as the public’s first recourse.
That the directory nonetheless built and has maintained its corpus through that period invites a reasoned conjecture about the value such a directory offers. Its value proposition shifted: from being a primary way to find sites, a curated directory became, increasingly, a structured and editorially vetted source of organised information about businesses, useful less as a destination in itself than as a curated dataset. It may be conjectured that this shift is now continuing in a particular direction. As discovery becomes mediated by systems that read and synthesise structured data, a curated, organised, human-vetted body of business information may find a renewed kind of relevance, and a directory’s persistence through the intervening period would, on that conjecture, prove to have been worthwhile.
Implications for businesses
For a business that holds a listing in this directory, the temporal finding carries a direct and practical message. Given that the typical listing was created more than a decade ago, a business whose listing dates from the directory’s intensive era should assume, as a working hypothesis, that the listing now carries information that is at least partly out of date. The age of the corpus makes staleness the expected condition of an untended listing rather than an unlucky exception.
The action that follows is the same one the companion studies recommended, with the temporal finding adding urgency to it. A business should locate its listing, read it as a stranger would, and bring its content — the address, the telephone number, the description, the location — up to date. For a listing created during 2013 or 2014, this is not a refinement but a correction of a record that has had more than a decade in which to fall out of step with the business it describes. Refreshing an old listing is, in effect, completing and locating it anew, and it is no more costly than doing so for the first time.
Implications for the directory
For the directory, the age structure of the corpus is an agenda. A corpus in which most listings belong to decade-old cohorts is a corpus with a large, identifiable body of currency risk, and that risk can be addressed systematically: a programme of review that works through the corpus cohort by cohort, beginning with the oldest, would target the listings most likely to have decayed. The intensive era that built half the corpus is also, a decade on, the era most in need of revisiting.
A second implication concerns measurement. The update timestamp, as this study found, cannot at present distinguish a genuine editorial review from a database-wide batch write, and so the directory has no reliable internal measure of how current its corpus is. A directory that recorded genuine review events separately from batch operations would gain exactly that measure, and could then track currency over time as the companion studies proposed tracking completeness and coverage. The steady recent intake, finally, is a choice point: the directory may continue in its maintenance phase, or it may, as the 2025 figure perhaps hints, undertake a renewed period of growth, and the temporal record gathered here is the baseline against which either path would be measured.
Whichever path the directory takes, the value of having measured its own history should not be understated. A directory that does not know the shape of its own accumulation cannot tell a maintenance phase from a decline, nor a genuine revival from a single busy year; it has only its present size, which is a sum that conceals its own composition. The reconstruction offered here gives the directory a record against which any future change — in intake, in currency, in the balance between the two — can be read as a change rather than merely observed as a state.
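The measurement point raised in this section, recording genuine review events separately from batch operations, can be illustrated with a small Python sketch. It assumes a second timestamp, here called `last_reviewed_at`, that only an editorial review may advance; that field, the `WriteKind` labels, and the `ListingRecord` class are all hypothetical and do not describe the directory’s actual schema.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class WriteKind(Enum):
    EDITORIAL_REVIEW = "editorial_review"
    BATCH_OPERATION = "batch_operation"


@dataclass
class ListingRecord:
    listing_id: int
    updated_at: Optional[date] = None        # advanced by any write
    last_reviewed_at: Optional[date] = None  # advanced only by a genuine review

    def record_write(self, when, kind):
        """Register a write, keeping review events distinct from batch writes."""
        self.updated_at = when
        if kind is WriteKind.EDITORIAL_REVIEW:
            self.last_reviewed_at = when


record = ListingRecord(listing_id=1)

# A database-wide batch write touches the row but is not a review.
record.record_write(date(2023, 6, 1), WriteKind.BATCH_OPERATION)
after_batch = (record.updated_at, record.last_reviewed_at)

# An editor checks the listing; both timestamps advance.
record.record_write(date(2025, 2, 10), WriteKind.EDITORIAL_REVIEW)

# A later batch write moves updated_at only.
record.record_write(date(2026, 1, 5), WriteKind.BATCH_OPERATION)
print(record.updated_at, record.last_reviewed_at)
```

With such a field in place, the share of listings reviewed within a chosen window would be a direct currency measure, which the update timestamp alone cannot provide.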
Projections and future developments
This study is a snapshot of one corpus at one reference date, and it is designed to be repeated. The projections below are reasoned conjectures drawn from the observed pattern and the mechanisms discussed; they are not statistical forecasts, and they are marked as conjectures.
The first projection concerns the rate of growth. It can be projected that, absent a deliberate new campaign of editorial accumulation, the directory’s intake will continue in the band of a few hundred listings a year that has held since 2015, and that the corpus will therefore grow only slowly from its present size. Whether the elevated figure of 2025 is the beginning of a departure from that band or a single busier year cannot be determined from one year of data, and it is precisely the kind of question a later export would resolve.
The second projection concerns currency rather than size. It can be projected that, in the absence of a systematic review programme, the currency risk identified in the discussion will grow, simply because the large old cohorts age further with every passing year. The gap between the directory’s nominal size — the count of listings it holds — and its genuinely current size — the count of listings that still describe their businesses accurately — may be projected to widen over time unless review is undertaken. Size and currency are different measures, and a corpus can grow in the first sense while declining in the second.
The third projection concerns the changing cost of staleness. As retrieval-augmented and answer-composing systems increasingly mediate discovery (Lewis et al., 2020; Aggarwal et al., 2024), an out-of-date listing is no longer merely unhelpful; it becomes a source of active misinformation, supplying such a system with a wrong address or a defunct telephone number to present as fact. It may be conjectured that currency will accordingly become a more decisive measure of a directory’s value than size. A future version of this study could cross the addition-date cohorts with the completeness and geographic measures of the companion studies, testing directly the conjecture that old and incomplete listings overlap; and repeating the temporal analysis against later exports would show whether any review programme the directory undertakes is succeeding.
The projections are, in the end, an invitation to repetition rather than a set of predictions to be scored. The deliberate language of conjecture marks the distance between what the data establish and what they merely suggest, and that distance is bridged only by gathering the data again. A second export, read against this one, would convert the snapshot of this study into the first interval of a time series, and it is as the opening entry in such a series, more than as a freestanding account, that the present study is best understood.
Limitations of the study
The limitations follow from the design and are stated plainly. The study analyses a single corpus, the database of one directory, and is descriptive rather than inferential; it characterises that corpus and does not generalise to directories at large, and no significance testing is applied because there is no sampling beyond the corpus itself.
The central interpretive limitation concerns what the addition date records. It is the date a listing entered the directory’s database, which, for an editorially added listing, is the date an editor created the record — not the date the business itself was founded, nor the date the business first became known to the directory. The growth curve is therefore a curve of editorial and submission activity, not a census of business formation, and it should be read as such. The reliability of the field as a record of database entry is, by contrast, high: the timestamp is set automatically and is present for every listing.
A further limitation concerns the interpretation of the quiet years. The low intake of 2009 to 2011 is read in this study as a formative period, and the low intake from 2015 onward as a maintenance phase, but the addition dates alone cannot distinguish a year of deliberate restraint from a year in which the directory simply had little editorial capacity to spare. The characterisation of the eras as formative, intensive, steady, and recent is therefore a description of the rates observed, not an account of the intentions behind them, and it should be read in that spirit.
Several narrower limitations should also be recorded. The grouping of the seventeen years into four eras is a presentational device, and the era boundaries, though placed where the yearly intake changes character, are a choice; the year-by-year table is provided so that the grouping can be checked. The update timestamp is reported as an observation but, as explained, is not used as a measure of revision.
The study measures when listings were added, not their present accuracy or currency, so it cannot state how stale the corpus is, only where the risk is concentrated. The year 2026 is represented only by its first months, up to the export date. And the entire analysis reflects the single reference date of 25 May 2026.
Concluding remarks
This study set out to reconstruct how the 14,362 listings of a curated business directory accumulated over the seventeen years of the directory’s operation. The accumulation was found to be sharply uneven. After three quiet formative years, the directory’s intake rose steeply in 2012 and reached its maximum in 2013 and 2014, when 4,334 and 2,829 listings were added; those two years alone account for 49.9% of the entire corpus. The cumulative curve is correspondingly front-loaded: the directory reached 63.3% of its eventual size by the end of 2014 and accumulated the remaining 36.7% over the following eleven and a half years, settling into a steady intake of a few hundred listings a year with a single modest rise in 2025.
The concentration of growth in 2013 and 2014 was read, not as an unexplained spike, but against the directory’s documented record of editorial recognition in those same two years; the internal evidence of the addition dates and the external evidence of the directory’s eight awards describe one period of intensive, deliberate editorial accumulation. The lasting consequence is an age-skewed corpus: the typical listing is more than a decade old, and an old listing left untended carries a real and rising currency risk, one that the directory’s update timestamp — recording batch operations rather than genuine review — cannot be used to measure.
Across the connected series, the temporal finding gives the studies of completeness and geography their missing dimension: the gaps those studies measured are gaps in a corpus that is mostly old, and old, incomplete, and untended are conditions that plausibly coincide in the same listings. For a business, the implication is to treat an old listing as one needing review; for the directory, it is to address the currency risk cohort by cohort and to make currency measurable in the first place. This study is the fourth in a connected series analysing the same corpus; the fifth and final study will draw the findings of all four together into a single account of the state of the directory’s listings.
A closing reflection concerns, as in the companion studies, the standing of this work. An analysis of a directory’s own database, conducted and published by the directory, is at once authored by its subject and about its subject, and the temporal study reports findings that are, in part, uncomfortable for that subject: that the corpus is mostly old, and that its update field cannot vouch for its currency. Reporting them plainly is what the descriptive design requires, and the methodology, the treatment of the partial final year, and the cautious handling of the update field have all been set out so that any reader may reproduce the figures and weigh the interpretation independently.
Related reading
- How complete is the average business listing? A field-level analysis of 14,362 records in a curated directory
- The geography of a curated business directory: where 14,362 listings are located, and where they are not
- The state of business listings in 2026: a synthesis of four analyses of a curated directory of 14,362 records
- What we learned from analysing all 14,362 listings in our directory
References
Aggarwal, P., et al. (2024). Improving search systems with large language models. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (pp. 5–16). Association for Computing Machinery.
Akerlof, G. A. (1970). The market for “lemons”: Quality uncertainty and the market mechanism. The Quarterly Journal of Economics, 84(3), 488–500.
Arasu, A., Cho, J., Garcia-Molina, H., Paepcke, A., & Raghavan, S. (2001). Searching the Web. ACM Transactions on Internet Technology, 1(1), 2–43.
Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1–7), 107–117.
Broder, A. (2002). A taxonomy of web search. ACM SIGIR Forum, 36(2), 3–10.
Google. (n.d.). Google Search Essentials. Google Search Central documentation. [Industry guidance, not peer-reviewed.]
Kleinberg, J. M. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5), 604–632.
Lewis, P., et al. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems (Vol. 33, pp. 9459–9474).
Nelson, P. (1970). Information and consumer behavior. Journal of Political Economy, 78(2), 311–329.
Newman, M. E. J. (2005). Power laws, Pareto distributions and Zipf’s law. Contemporary Physics, 46(5), 323–351.
Pipino, L. L., Lee, Y. W., & Wang, R. Y. (2002). Data quality assessment. Communications of the ACM, 45(4), 211–218.
Pirolli, P., & Card, S. K. (1999). Information foraging. Psychological Review, 106(4), 643–675.
Spence, M. (1973). Job market signaling. The Quarterly Journal of Economics, 87(3), 355–374.
Stigler, G. J. (1961). The economics of information. Journal of Political Economy, 69(3), 213–225.
Wang, R. Y., & Strong, D. M. (1996). Beyond accuracy: What data quality means to data consumers. Journal of Management Information Systems, 12(4), 5–33.

