How complete is the average business listing? A field-level analysis of 14,362 records in a curated directory

Author. Gombos Atila Robert, PhD. Owner and Chief Executive Officer, Jasmine Business Directory (D-U-N-S 10-276-4189), Valley Cottage, New York. ORCID: 0000-0001-6468-2811. Correspondence through the author profile.

Data statement. The material analysed in this study was taken directly from the production database of the Jasmine Business Directory. The directory has operated since 2009, has never used paid advertising to acquire listings, and adds about ninety per cent of its entries through manual editorial review. The analysed export is current as of 25 May 2026. This study is the second in a connected series; its companion examines category concentration in the same corpus.

Abstract

This study measures how complete the individual business listings in a curated web directory are, field by field, across the directory’s entire corpus. The data are the complete listings table of the Jasmine Business Directory, joined to its address, contact, and content tables. The directory has operated since 2009, has grown without paid advertising, and adds most of its entries through manual editorial review. The analysed export, current as of 25 May 2026, contains 14,362 listings. For each listing the study records whether ten informational fields carry a value: company name, street, city, state, postal code, country, telephone, title, description, and keywords. It then computes two composite measures of completeness.

The corpus is substantially incomplete. The address and contact fields are filled for roughly a third of listings, between 30.2% and 34.3%. The content fields, title and description, are filled for a little over half, at 56.8% and 56.7%. The mean listing carries 3.67 of the ten fields. A composite of four core contact fields is sharply U-shaped: 64.6% of listings have none of the four filled, 28.1% have all four, and few sit between.

The study reads these findings through the data-quality literature, in which completeness is an established dimension (Wang & Strong, 1996; Pipino, Lee, & Wang, 2002), and through the economics of information, in which a listing is information offered to a searcher (Nelson, 1970; Stigler, 1961). It argues that an incomplete listing carries a real but largely invisible cost, that the cost rises with the crowding of the listing’s category, and that completing a listing is an unusually inexpensive form of competitive advantage. Reasoned projections for the corpus’s completeness are offered.

Keywords. business directories; listing completeness; data quality; data completeness; online visibility; business listings; information economics; local search; digital discovery; curated directory; field-fill rate; contact information.

Introduction

A business directory listing can be one of two very different things. It can be substantial: the business named, described, placed at a verifiable address, reachable by telephone, its work explained in a few accurate sentences. Or it can be a near-empty shell, little more than a name and a link, with the fields that would tell a searching person who the business is and how to reach it left blank. Between a directory of the first kind of listing and a directory of the second lies a large difference in usefulness, and yet how complete the listings in a real directory actually are is seldom measured directly.

The question this study asks is therefore simple when put precisely. For each of the listings in a curated business directory, which informational fields carry a value and which are empty? Completeness is treated field by field (whether a listing has a company name, an address, a telephone number, a title, a description, and keywords) and is summarised by two composite measures. Throughout, the study measures the presence of information and not its accuracy or quality. A field containing any value is counted as filled, and whether that value is correct or current is a separate matter the analysis does not reach.

The approach is descriptive and quantitative. The complete listings table of the directory was extracted and joined to the address, contact, and content tables that hold the relevant fields. For each of the 14,362 listings the presence of each field was recorded, and the resulting field-fill rates, together with two composite completeness measures, were computed across the whole corpus. No survey and no experiment are involved. As with its companion study, this is an exploratory empirical analysis of a single corpus, and its claims are held to that scope.

The data and their setting are the same as in the companion study and warrant brief restatement. The corpus is the production database of the Jasmine Business Directory, founded in 2009 and headquartered in Valley Cottage, New York. The directory has never used paid advertising to acquire listings, and it adds about ninety per cent of its entries through manual editorial review rather than unmediated self-submission. The export analysed here was taken on 25 May 2026 and contains 14,362 listings. Because the corpus has accumulated organically and through editorial curation, its completeness reflects how listings have actually been created and maintained, not how a promotional process has dressed them.

An advertising-free corpus is, for a study of completeness, an unusually informative object. Where listings are acquired through paid placement, their completeness partly reflects the requirements of the acquisition process rather than the unprompted behaviour of the businesses listed. Here, no such process intervenes, and the completeness measured is closer to the completeness that businesses and editors produce when nothing obliges them to do more. The corpus therefore gives a relatively clean view of how complete listings are when their completeness is no one’s commercial requirement.

Why listing completeness matters can be stated at three levels. For the business that holds a listing, completeness determines whether a person who finds the listing is given the means to act on it, and it bears on whether the listing is surfaced by the systems that mediate discovery. For the directory, the aggregate completeness of its listings is a direct measure of the usefulness of the asset it maintains. And for the wider understanding of digital discovery, the completeness of a real, advertising-free corpus shows how much information businesses actually supply when they are not prompted to do so by a paid process. Each of these is developed in the discussion.

The study makes three contributions. First, it gives a precise field-level measurement of listing completeness in a substantial real-world directory. Second, it reads that measurement through the data-quality literature and the economics of information, connecting an observed pattern to established theory. Third, it argues that the cost of an incomplete listing is real, that it rises with the crowding of the listing’s category, and that completeness is therefore an inexpensive and reliable form of advantage; and it sets out reasoned projections for how the corpus’s completeness may evolve. The paper proceeds through a review of the relevant literature, a description of the dataset, the methodology, the results, the discussion, the projections, the limitations, and the concluding remarks.

Literature review

The study draws on three bodies of work: the treatment of completeness as a dimension of data quality, the economics of information as it bears on what a listing is, and the literature on how completeness affects discovery and choice. Each is summarised here, and the section closes by stating the gap the study addresses.

Data quality and the dimension of completeness

The systematic study of data quality established that the quality of data is not a single property but a set of distinct dimensions. Wang and Strong (1996), in the work that organised the field, built a hierarchical framework of data-quality dimensions grounded in what matters to the consumers of data rather than only to its producers. Their framework places completeness, the extent to which data are of sufficient breadth and are not missing, among the dimensions that data consumers consistently regard as important. Quality, on this account, is fitness for the use to which data are put, and a record missing the values a user needs is deficient regardless of how accurate its present values may be.

Pipino, Lee, and Wang (2002) carried the framework toward measurement, setting out principles for building usable data-quality metrics and treating completeness as a dimension that can be assessed by the ratio of present values to the values that should be present. A directory listing is, in these terms, a data record, and the directory’s corpus is a body of records whose completeness can be assessed in exactly the manner the data-quality literature prescribes: by counting, field by field, the values present against the values a complete record would carry. This study adopts that approach directly, and its central measures, field-fill rates and composite completeness scores, are completeness metrics in the established sense.

The directory listing as a record and as a representation

A directory listing is two things at once, and the distinction organises the literature this study draws on. It is a database record, a structured set of fields held in a directory’s tables, and it is a public representation of a business, the form in which that business presents itself to anyone who consults the directory. The two aspects are not in tension, but they are assessed by different literatures.

The data-quality literature treats the listing as a record. From that standpoint, completeness is a measurable property of the record: the proportion of its fields that carry values, assessed exactly and without reference to anyone’s purposes. The economics of information treats the listing as a representation. From that standpoint, completeness matters because of what the representation does: how much it tells a searcher, and how credibly it signals a real and serious business.

This study measures completeness as a record property, because that is what the data support and what can be assessed exactly. But it reads completeness through the listing’s role as a representation, because that is where the consequences of completeness are felt. The two views are complementary, and keeping both in mind prevents two errors: treating a field count as if it were the whole story, and discussing what a listing communicates without measuring what it actually contains.

A listing as information offered to a searcher

The economics of information clarifies what a directory listing is for, and therefore what an incomplete one fails to do. Stigler (1961) established that search is costly and that institutions which reduce the cost of search have value; a directory listing is an instrument through which a business becomes findable at lower cost to the searcher. Nelson (1970, 1974) analysed how buyers acquire information about sellers, separating qualities that can be assessed before contact from those that cannot, and treating the information available about a seller as the raw material of the buyer’s decision.

A listing is, in this light, information offered to a searching party, and an incomplete listing simply offers less of it. The point extends through the related literature on informational asymmetry and signalling. Akerlof (1970) showed that markets function poorly when buyers cannot assess sellers, and Spence (1973) showed that observable attributes can serve as signals of unobservable quality; Darby and Karni (1973) extended the analysis to qualities a buyer cannot verify even after purchase. A complete listing takes part in these dynamics as a fuller and more credible signal than a bare shell: it conveys more information, and by its evident completeness it indicates a business that has taken the trouble to present itself properly. An empty listing conveys little and signals less.

Completeness, discovery, and limited attention

The third body of work concerns how completeness affects whether a listing is found and acted upon. Pirolli and Card (1999) introduced the notion of information scent: the cues, such as titles and descriptions, by which a searcher judges the likely value of a source before committing to it. A listing’s content fields are precisely such cues; a listing without them emits little scent and is harder for a foraging searcher to evaluate. Miller (1968) and the broader literature on attention establish that a searcher’s capacity to consider options is narrow, and Nah (2004) documented how readily users abandon a digital interaction that fails to reward them quickly.

Broder’s (2002) taxonomy of search intent supplies a final connection. A searcher with transactional intent, one seeking to contact or transact with a business, needs precisely the contact information that an incomplete listing lacks; for such a searcher, a listing without an address or a telephone number has failed at the moment of greatest need. The systems that mediate discovery also depend on the information a listing carries: industry guidance on local discovery holds that complete and consistent information helps a business be surfaced for relevant queries (Google, local-ranking guidance). An incomplete listing is, by this combined account, both harder to surface and, once surfaced, less able to convert the searcher’s attention into action.

The gap addressed by this study

These literatures are well developed individually, but they are seldom brought to bear together on a precise, openly documented measurement of completeness in a real, substantial, advertising-free business directory. The data-quality literature supplies the concept and the metric; the economics of information explains what completeness is for; the discovery literature explains why it matters for being found and chosen. The present study occupies the point at which the three meet. It does not claim that its specific figures apply to all directories; it claims to measure one well-defined corpus exactly, and to read that measurement in a way that is informative beyond itself.

The dataset: a curated directory’s listing records

The empirical material for this study is the production database of the Jasmine Business Directory, the same corpus analysed in the companion study of category concentration. The directory was founded in 2009 by Pecsi Andras and Robert Gombos, is headquartered in Valley Cottage, New York, and has operated continuously since then as a general business directory organised by subject category.

Two characteristics of the directory’s operation bear on how completeness should be read, and they belong to the dataset’s provenance. The directory has never used paid advertising to acquire listings; its corpus has accumulated through organic submission and editorial addition. And the directory adds about ninety per cent of its entries through manual editorial review, an editorial practice for which it received eight awards during 2013 and 2014. The completeness measured here is therefore the completeness of listings as they have actually been created and maintained over seventeen years, not the completeness of records dressed by a promotional process.

The directory’s seventeen-year operating history also bears on the reading. The corpus is not a recent snapshot of newly created listings but an accumulation reaching back to 2009, and the completeness measured here is therefore the completeness of listings of widely varying age. Some have been present for many years; others are recent. The single reference date captures all of them at once, and the figures should be read as the state of a long-accumulated corpus rather than as the completeness of any single cohort.

The unit of analysis is the individual listing record, of which the export contains 14,362. For each listing, the directory’s database can hold information of three kinds, spread across linked tables.

An address table can hold a company name, a street, a city, a state or region, a postal code, and a country. A contact table can hold a telephone number. A content table can hold a title, a description, and keywords.

The directory’s data also carry an internal flag marking a listing as a full listing. This study examines the ten informational fields named above, together with the full-listing flag, across all 14,362 listings. Table 1 summarises the dataset and the fields examined.

**Table 1.** The dataset and the fields examined.
Attribute	Value
Source	Jasmine Business Directory, production database
Directory founded	2009
Headquarters	Valley Cottage, New York
Listing-acquisition model	Organic submission and editorial addition; no paid advertising
Editorial curation	Approximately 90% of entries added through manual review
Export reference date	25 May 2026
Listings (universe of analysis)	14,362
Address fields examined	Company name, street, city, state/region, postal code, country
Contact field examined	Telephone
Content fields examined	Title, description, keywords
Internal flag examined	Full-listing flag
Listings with no address record at all	123

Methodology

The methodology is set out in four parts: the extraction and joining of the tables that hold the fields, the definition of the presence criterion by which a field is judged filled, the construction of the completeness measures, and a note on the descriptive design. The analytical framework that connects the database records to the study’s completeness measures is shown in Figure 1.

Figure 1. The analytical framework of the study. Each listing is joined to the tables holding its fields; a presence criterion judges each field filled or empty; and three completeness measures are derived from the resulting matrix of presence.

Data extraction and the joined tables

The dataset was obtained as a complete export of the directory’s production database on the reference date. The fields relevant to completeness are not held in a single table but spread across the linked tables that Figure 1 depicts: an address table, a contact table, and a content table, each of which records information against a listing identifier. The analysis joined each of these tables to the listings table by that identifier, so that for every one of the 14,362 listings the presence or absence of each field could be determined.

One consequence of the joined structure must be recorded. A listing has the fields of a given table only if a corresponding row exists in that table; where no row exists, the listing is treated as not having those fields filled. In the address table, 123 listings have no corresponding row at all, and these are counted as lacking all six address fields. This is the correct treatment for a study of completeness, since a listing with no address record is, for the searcher, a listing with no address, but it is stated here so that the handling of the joined tables is explicit.

The joined structure is a property of the directory’s database design and not an obstacle introduced by the analysis. Holding address, contact, and content information in separate tables linked by identifier is a conventional and sound way to organise a directory’s data. The analysis simply reverses the separation for the purpose of measurement, reuniting each listing’s fields so that completeness can be assessed at the level of the listing as a searcher would meet it.
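The join described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the directory's actual code: the table layout, the field identifiers, and the sample rows are all hypothetical, since the paper does not give the schema. It shows only the one rule the text states, that a listing with no row in a table is treated as having none of that table's fields.

```python
# Hypothetical miniature tables keyed by listing identifier.
# The real schema and column names are not given in the paper.
ADDRESS_FIELDS = ("company_name", "street", "city", "state", "postal_code", "country")
CONTACT_FIELDS = ("telephone",)
CONTENT_FIELDS = ("title", "description", "keywords")

listings = [101, 102, 103]
address = {
    101: {"company_name": "Acme", "street": "1 Main St", "city": "Springfield",
          "state": "NY", "postal_code": "10989", "country": "US"},
}
contact = {101: {"telephone": "555-0100"}, 102: {"telephone": ""}}
content = {
    101: {"title": "Acme", "description": "Tools.", "keywords": "tools"},
    102: {"title": "Bolt Co", "description": "Fasteners.", "keywords": ""},
}

def join_listing(listing_id):
    """Left-join one listing to its three tables.

    A missing row yields None for every field of that table, so the
    listing is later counted as not having those fields filled.
    """
    record = {}
    for table, fields in ((address, ADDRESS_FIELDS),
                          (contact, CONTACT_FIELDS),
                          (content, CONTENT_FIELDS)):
        row = table.get(listing_id, {})
        for field in fields:
            record[field] = row.get(field)
    return record

joined = {lid: join_listing(lid) for lid in listings}
```

In this sketch listing 103 has no row in any table, so all ten of its fields come back as `None`, the same treatment the study applies to the 123 listings without an address record.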

Field definitions and the presence criterion

The presence criterion is deliberately simple. A field is counted as filled if it contains any non-empty value, and as empty otherwise. The criterion records the presence of information, not its accuracy, currency, or quality; a field holding an outdated address or a perfunctory description is counted as filled, because it holds a value. This is a genuine limitation of the measure and is treated as such in the limitations section, but it is also the criterion that the data can support cleanly and that another analyst would apply identically.
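A presence test of this kind can be written in a few lines. The sketch below is illustrative only; the function name is hypothetical, and the handling of whitespace-only strings is an assumption, since the paper states the criterion as "any non-empty value" without saying how blank strings are treated.

```python
def is_filled(value):
    """Return True if the field holds any non-empty value.

    Assumption (not stated in the paper): None and whitespace-only
    strings count as empty. Accuracy and currency are not examined.
    """
    if value is None:
        return False
    return str(value).strip() != ""
```

Under this definition an outdated address or a one-word description is still counted as filled, which is exactly the behaviour the criterion is meant to have.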

Measuring presence rather than quality is also the conservative choice for the study’s central argument. If completeness measured as mere presence is already low, then completeness measured more demandingly, requiring each present value to be accurate and current, can be no higher, because every field that passes the stricter test also passes the presence test. The figures reported here are therefore upper bounds on the corpus’s completeness in any stricter sense, and the finding that the corpus is substantially incomplete is not weakened by the simplicity of the criterion.

Completeness measures

Three measures are computed from the matrix of field presence. The first is the field-fill rate: for each of the ten informational fields, the proportion of the 14,362 listings that have it filled. The second is the count of fields filled per listing, an integer between zero and ten, whose distribution across the corpus reveals whether listings cluster at particular levels of completeness. The third is a contact composite: a count, between zero and four, of how many of four core contact fields (country, company name, street, and telephone) a listing has filled. These four were chosen because together they answer the searcher’s basic questions of who a business is, where it is, and how to reach it.
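The three measures can be sketched over a small presence matrix. The code below is an illustration under assumptions: the field identifiers and the three toy listings are hypothetical, and each row is taken to be a mapping from field name to a boolean already produced by the presence criterion.

```python
FIELDS = ("company_name", "street", "city", "state", "postal_code",
          "country", "telephone", "title", "description", "keywords")
CORE_CONTACT = ("country", "company_name", "street", "telephone")

def fill_rates(presence):
    """Field-fill rate: share of listings with each field filled."""
    n = len(presence)
    return {f: sum(row[f] for row in presence) / n for f in FIELDS}

def fields_filled(row):
    """Count of the ten informational fields filled (0 to 10)."""
    return sum(row[f] for f in FIELDS)

def contact_composite(row):
    """Count of the four core contact fields filled (0 to 4)."""
    return sum(row[f] for f in CORE_CONTACT)

# Three toy listings: empty, title and description only, and complete.
empty = {f: False for f in FIELDS}
content_pair = {**empty, "title": True, "description": True}
complete = {f: True for f in FIELDS}
presence = [empty, content_pair, complete]

rates = fill_rates(presence)
per_listing = [fields_filled(r) for r in presence]
contact = [contact_composite(r) for r in presence]
```

The toy rows are chosen to mirror the three kinds of listing the results describe; a listing carrying only the content pair scores two on the field count and zero on the contact composite.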

Two further quantities are reported as context. The full-listing flag, an internal marker set by the directory, is reported as found, so that the directory’s own notion of a complete listing can be set against the field-level measurement. And a cross-tabulation of two representative fields, the title, a content field, and the street, an address field, is reported to show how the two kinds of field co-occur. These contextual quantities are described where they appear in the results.

The choice of ten fields and two composites

The ten fields examined were not chosen arbitrarily; they are the fields through which a listing answers the questions a searcher brings to it. Four broad questions organise them, and each maps onto a group of fields. The question of who the business is maps onto the company name and the title; the question of what it does maps onto the description and the keywords; the question of where it is maps onto the street, city, state, postal code, and country; and the question of how it is reached maps onto the telephone. A listing that answers all four questions is, in the ordinary sense, complete, and the field count measures how many of the answers a listing actually supplies.

The two composites serve two different readings of completeness. The count of fields filled, from zero to ten, measures overall completeness and reveals, through its distribution, whether listings cluster at particular levels. The four-field contact composite isolates the subset of fields a searcher needs in order to act (identity, location, and a means of contact), because a listing may be informative about what a business does and still leave the searcher unable to reach it. The first composite describes the listing as a whole; the second describes its fitness for the searcher’s final step.

One choice within the field selection should be made explicit. Keywords are treated here as a content field, alongside title and description, because they describe what a business does rather than where it is or how it is reached. The internal full-listing flag, by contrast, is not counted among the ten, because it is the directory’s own marker rather than a field of business information; it is reported separately, as context.

The descriptive design and reproducibility

As in the companion study, the design is descriptive. The corpus is not a sample from a population of directories; it is the entire production database of one directory, analysed in full. The measures reported are therefore exact properties of the corpus rather than estimates, and no inferential test or confidence interval applies, because there is no sampling beyond the corpus itself.

The procedure is fully reproducible. Each measure is a count or a ratio of counts, computed by a deterministic join and a deterministic presence criterion; an analyst applying the same procedure to the same export would obtain the same figures. The rating and vote fields carried by the listings table were examined and, as in the companion study, excluded as unreliable on the evidence of default or seeded values; no claim in this paper rests on them. The study reports what the corpus is, with precision, and reserves interpretation for the discussion.

A final methodological note concerns comparability with the companion study. Both studies analyse the same export of the same corpus, taken on the same reference date, and both adopt the same descriptive design; the listing counts, the universe of 14,362, and the directory’s characteristics are therefore identical across the two. A reader moving between them is moving between two analyses of one body of data, not two separate datasets. This shared basis is what allows the interaction examined in the discussion, between completeness and category crowding, to be drawn without qualification.

Results

The results are presented in five parts: the field-fill rates across the corpus; the contrast between content fields and contact fields; the distribution of completeness per listing; the contact-completeness composite; and the full-listing flag. Each part is accompanied by the relevant figure or table, and every figure and table is referred to directly in the text.

Field-fill rates across the corpus

The central finding can be stated without qualification: the corpus is substantially incomplete. For most of the ten informational fields, a minority of listings carry a value, and for none of them does the fill rate approach completeness. Figure 2 shows the fill rate of each field, and Table 2 reports the same figures with the field groups identified.

Figure 2. Field-fill rates across all 14,362 listings. The content fields, title and description (shown in the accent colour), are filled for a little over half of listings; the address and contact fields for roughly a third; keywords for under a third.

**Table 2.** Field-fill rates across the corpus, by field group (universe: 14,362 listings).
Field	Group	Listings filled	Share
Title	Content	8,151	56.8%
Description	Content	8,138	56.7%
Keywords	Content	4,251	29.6%
Country	Address	4,931	34.3%
Company name	Address	4,800	33.4%
City	Address	4,632	32.3%
Postal code	Address	4,526	31.5%
Street	Address	4,452	31.0%
State / region	Address	4,425	30.8%
Telephone	Contact	4,335	30.2%
Full-listing flag set	Internal marker	8,761	61.0%

The pattern in Figure 2 and Table 2 is consistent. The six address fields and the telephone field are each filled for between 30.2% and 34.3% of listings; roughly a third of listings carry address and contact information, and roughly two-thirds carry none. The content fields fare better but not well: a title and a description are each present for a little over half of listings, and keywords for under a third. The most economical statement of the finding is that the typical listing in this corpus has a name and a link, more likely than not a title and a short description, and in most cases no address and no telephone number.

The narrowness of the band into which the address and contact fields fall is itself worth noting. The seven address-and-contact fields range only from 30.2% to 34.3%, a spread of 4.1 percentage points across seven distinct fields. Such tight clustering suggests that these fields are, in practice, supplied together or not at all: a listing that has an address tends to have all of it, and a listing that lacks one such field tends to lack the rest. The contact composite, examined below, confirms this directly.
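The shares in Table 2 and the width of the address-and-contact band can be recomputed from the counts alone. The sketch below copies the counts from Table 2; the dictionary keys are illustrative names, not the directory's column names. As a cross-check, the ten counts summed and divided by the number of listings give the mean number of fields filled per listing.

```python
# Counts of listings with each field filled, copied from Table 2.
filled = {
    "title": 8151, "description": 8138, "keywords": 4251,
    "country": 4931, "company_name": 4800, "city": 4632,
    "postal_code": 4526, "street": 4452, "state": 4425,
    "telephone": 4335,
}
TOTAL = 14362
CONTENT = ("title", "description", "keywords")

shares = {f: round(100 * n / TOTAL, 1) for f, n in filled.items()}

# The seven address-and-contact fields are everything outside the content group.
band = [shares[f] for f in filled if f not in CONTENT]
low, high = min(band), max(band)

# Mean fields filled per listing equals the sum of the fill counts over N.
mean_fields = sum(filled.values()) / TOTAL

print(low, high, round(mean_fields, 2))  # 30.2 34.3 3.67
```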

Content fields and contact fields

The data show a consistent unevenness between the two kinds of field: the title and the description are each filled between about 1.7 and 1.9 times as often as any address or contact field (56.8% and 56.7% against 30.2% to 34.3%), although keywords, at 29.6%, fall just below that band. The reason for the gap is open to reasoned inference. A title and a short description are close to the minimum needed to make a listing intelligible at all, whereas an address and a telephone number require whoever creates the listing to supply specific, verifiable, real-world detail; the path of least effort fills the title and description and stops short of the contact fields.

The cross-tabulation of two representative fields makes the relationship precise. Taking the title as a representative content field and the street as a representative address field, the 14,362 listings divide as follows: 35.4% have neither; 23.2% have both; 33.6% have a title but no street; and 7.8% have a street but no title. The largest of the partial groups is substantial: a third of all listings carry content but no address, which shows that the two kinds of field are only loosely coupled. A listing in this corpus may describe a business adequately while giving no indication of where the business is or how to reach it, and a third of the corpus does precisely that.
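The four cells of the cross-tabulation can be checked against the marginal fill rates reported in Table 2. The sketch below works from the rounded percentages quoted in the text, so the conditional share it derives is approximate; the cell labels are illustrative.

```python
from decimal import Decimal

# Cross-tabulation cells in per cent of all listings, as quoted in the text.
cells = {
    ("title", "street"): Decimal("23.2"),
    ("title", "no street"): Decimal("33.6"),
    ("no title", "street"): Decimal("7.8"),
    ("no title", "no street"): Decimal("35.4"),
}

total = sum(cells.values())
title_share = cells[("title", "street")] + cells[("title", "no street")]
street_share = cells[("title", "street")] + cells[("no title", "street")]

print(total, title_share, street_share)  # 100.0 56.8 31.0

# Approximate share of titled listings that lack a street,
# derived from rounded cells rather than from raw counts.
titled_without_street = cells[("title", "no street")] / title_share
```

The row and column sums reproduce the title and street fill rates of 56.8% and 31.0%, so the cross-tabulation and Table 2 agree.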

The unevenness matters because the two kinds of field do different work for the searcher. A description tells the searcher what a business does; the address and telephone tell the searcher who and where it is and how to act. A listing strong on description and empty of contact detail describes a business that the searcher is then given no direct means of reaching, which, as the discussion argues, is a real cost borne by the business.

The cross-tabulation also has a constructive reading for a business. The single largest partial group, the third of listings with content but no address, consists of listings that are already half-built: they carry a title and a description, which means someone has attended to them once. For such a listing, completeness is not a matter of starting from nothing but of finishing what is begun, and the missing element is specifically the contact information. The data thus identify not only how many listings are incomplete but, for a large group of them, exactly which part is missing.

The distribution of completeness per listing

Field-fill rates describe each field in isolation; the distribution of completeness per listing describes the listings themselves. The mean listing carries 3.67 of the ten informational fields. The distribution behind that mean, however, is not concentrated around it; Figure 3 and Table 3 show that listings cluster at several distinct levels of completeness.

Figure 3. The distribution of the number of informational fields filled per listing. The distribution is multimodal, with peaks at zero fields (34.1% of listings), two fields (25.6%), and all ten fields (19.0%); the values between these three peaks are low.

**Table 3.** Distribution of the number of informational fields filled per listing (universe: 14,362 listings; mean 3.67 of 10).
Fields filled (of 10)	Listings	Share
0	4,895	34.1%
1	103	0.7%
2	3,677	25.6%
3	744	5.2%
4	192	1.3%
5	126	0.9%
6	249	1.7%
7	961	6.7%
8	128	0.9%
9	561	3.9%
10	2,726	19.0%

The distribution is multimodal. It has three clear peaks: at zero fields, where 34.1% of listings sit; at two fields, where 25.6% sit; and at ten fields, the fully complete listing, where 19.0% sit. The values between these peaks are low, with no single intermediate level holding even seven per cent of the corpus. A listing in this corpus tends, in other words, to belong to one of three recognisable kinds rather than to occupy a smooth continuum of partial completeness.

The two-field peak has a specific composition, which a separate count of the three content fields confirms. That count shows that title and description behave almost as a single unit: the listings with exactly one of the three content fields number only thirteen in the entire corpus, while 27.1% have exactly two content fields and 29.6% have all three. The two-field peak in Figure 3 is therefore, in the main, the population of listings that carry a title and a description and nothing else: intelligible as descriptions of a business, but empty of every address and contact field. This kind of listing is the second most common in the corpus.

The multimodal shape conveys more than a single average could. A mean of 3.67 fields might suggest a typical listing that is moderately complete, carrying about a third of its fields. Figure 3 shows that few listings are of that description: the levels at three and four fields together hold only 6.5% of the corpus. The corpus is made up of listings that are mostly empty, listings that carry the content pair, and listings that are essentially complete, and the mean falls in a gap between these populations rather than describing any of them.
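The relation between the mean and the peaks can be checked directly from the counts in Table 3. The sketch below is illustrative only: the variable and function names are introduced here and are not part of the study's own tooling.

```python
# Counts copied from Table 3: fields filled -> number of listings.
TABLE_3 = {0: 4895, 1: 103, 2: 3677, 3: 744, 4: 192, 5: 126,
           6: 249, 7: 961, 8: 128, 9: 561, 10: 2726}


def summarise(counts):
    """Return (total listings, mean fields filled, share per level in percent)."""
    total = sum(counts.values())
    mean = sum(level * n for level, n in counts.items()) / total
    shares = {level: 100 * n / total for level, n in counts.items()}
    return total, mean, shares


total, mean, shares = summarise(TABLE_3)

# The three most populated levels of completeness.
peaks = sorted(sorted(shares, key=shares.get, reverse=True)[:3])

# Combined share of the two levels that bracket the mean (three and four fields).
near_mean = shares[3] + shares[4]

print(total, round(mean, 2), peaks, round(near_mean, 1))
# 14362 3.67 [0, 2, 10] 6.5
```

The mean of 3.67 lies between the levels at three and four fields, which together hold about 6.5% of listings, while the three peaks hold the large majority.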

The contact-completeness composite

The contact composite isolates the four core fields that answer a searcher’s basic questions (country, company name, street, and telephone) and counts how many of them each listing carries. Figure 4 shows the distribution, and Table 4 reports it with cumulative shares.

Figure 4. The contact-completeness composite. Of four core contact fields, 64.6% of listings have none filled and 28.1% have all four; the intermediate values are sparse. The distribution is sharply U-shaped.

**Table 4.** The contact-completeness composite: number of four core contact fields filled (universe: 14,362 listings).
Core contact fields filled	Listings	Share	Cumulative
0	9,273	64.6%	64.6%
1	283	2.0%	66.5%
2	217	1.5%	68.0%
3	555	3.9%	71.9%
4	4,034	28.1%	100.0%

The contact composite is sharply U-shaped. Of the 14,362 listings, 9,273 (64.6%) have none of the four core contact fields filled, and 4,034 (28.1%) have all four. The three intermediate values together account for 1,055 listings, or 7.3% of the corpus. A listing in this corpus is therefore, with respect to contact information, almost always at one extreme or the other: it is either substantially complete or substantially empty, and the partially completed listing is uncommon.
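The shares and cumulative shares of the contact composite can be recomputed from the counts in Table 4. The sketch below is a minimal illustration with names chosen here; it derives every percentage from the raw counts, so a cumulative value can differ by a tenth of a point from a running sum of already rounded shares.

```python
from itertools import accumulate

# Counts copied from Table 4: core contact fields filled -> number of listings.
TABLE_4 = {0: 9273, 1: 283, 2: 217, 3: 555, 4: 4034}

total = sum(TABLE_4.values())
levels = sorted(TABLE_4)
running = list(accumulate(TABLE_4[k] for k in levels))

# One row per level: (level, listings, share %, cumulative share %).
rows = [
    (k, TABLE_4[k], round(100 * TABLE_4[k] / total, 1), round(100 * c / total, 1))
    for k, c in zip(levels, running)
]

# Listings holding one, two, or three of the four fields.
intermediate = sum(TABLE_4[k] for k in (1, 2, 3))
intermediate_share = round(100 * intermediate / total, 1)

for row in rows:
    print(row)
print(intermediate, intermediate_share)
# (0, 9273, 64.6, 64.6)
# (1, 283, 2.0, 66.5)
# (2, 217, 1.5, 68.0)
# (3, 555, 3.9, 71.9)
# (4, 4034, 28.1, 100.0)
# 1055 7.3
```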

This U-shape echoes, in a stricter form, the multimodal distribution of Figure 3. Where the ten-field count showed three peaks, the four-field contact composite shows two, because it excludes the content fields that produce the middle peak. Read together, the two distributions describe the same underlying structure: the corpus does not contain listings of smoothly varying completeness but a small number of distinct populations, and the contact composite isolates the two that matter most for a searcher seeking to act: the complete and the empty.

Completeness summarised: the corpus in three figures

Before turning to the directory’s internal flag, it is useful to draw the results so far into a compact summary. Three figures capture the state of the corpus. The first is the content reach: a title and a description are present for a little under 57% of listings, so a slight majority of listings are at least intelligible as descriptions of a business. The second is the contact reach: the address and contact fields are present for roughly 31% of listings, so under a third of listings tell a searcher where the business is and how to reach it.

The third figure is the mean completeness, 3.67 of the ten informational fields. A mean below four, on a ten-field scale, states directly that the average listing is far closer to empty than to complete. Taken together, the three figures describe a corpus in which a slight majority of listings can be understood, under a third can be acted upon, and the mean listing carries slightly more than a third of the information it could.

These summary figures should be read alongside the distributions that produced them, because the averages conceal the multimodal structure that Figures 3 and 4 revealed. The corpus is not a body of average listings; it is a mixture of distinct populations, and the mean of 3.67 is the arithmetic of that mixture rather than a description of any typical listing. The summary figures are offered as a way to hold the scale of the finding in mind, not as a substitute for the distributions.

The full-listing flag

The directory’s database carries an internal marker, the full-listing flag, which records whether a listing is treated as a full listing. The flag is set for 8,761 listings, or 61.0% of the corpus. It is reported here as context rather than as a completeness measure, and the comparison with the field-level findings is instructive.

At 61.0%, the full-listing flag sits above the fill rate of the content fields, near 57%, and well above the roughly 31% fill rate of the address and contact fields. The flag and the field-level measurement are therefore clearly not the same quantity. The most reasonable reading is that the flag reflects the directory’s own internal notion of listing type or tier rather than a field-by-field audit of completeness: a listing can carry the full-listing flag and still, on inspection of its fields, be missing its address and its telephone number. For the purpose of this study, how complete listings actually are, the direct field-level measurement is the more reliable guide, and the flag is recorded for completeness of reporting rather than relied upon.

Discussion

The results establish that the corpus is substantially incomplete, that its content fields are filled nearly twice as often as its contact fields, and that listings cluster at a few distinct levels of completeness rather than spreading along a continuum. The discussion now reads these findings in six steps: it names the finding in the vocabulary of data quality, explains why completeness matters for being found, sets out the cost an incomplete listing carries, considers why listings are left unfinished, examines how completeness interacts with the category crowding measured by the companion study, and draws the implications for businesses and for the directory.

Completeness as a data-quality dimension

The data-quality literature supplies the finding with its proper name and its proper measure. Completeness is an established dimension of data quality (Wang & Strong, 1996), and it can be assessed, as Pipino, Lee, and Wang (2002) set out, as the ratio of the values actually present to the values that a complete record would carry. By that measure the corpus scores low: a mean of 3.67 filled fields out of ten is a completeness ratio of roughly 0.37, and the contact composite, on which most listings hold none of four core fields, is lower still.
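The ratio can be written out explicitly. The notation below is introduced here for illustration and is not the study's own; the contact figure is derived from the counts reported in Table 4.

```latex
\[
C_{\text{all}}
  = \frac{\text{mean number of fields filled}}{\text{fields in a complete record}}
  = \frac{3.67}{10} \approx 0.37
\]
\[
C_{\text{contact}}
  = \frac{0(9{,}273) + 1(283) + 2(217) + 3(555) + 4(4{,}034)}{4 \times 14{,}362}
  = \frac{18{,}518}{57{,}448} \approx 0.32
\]
```

On this reading the contact composite has a completeness ratio of about 0.32, below the ten-field ratio of about 0.37.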

The data-quality frame also clarifies what such a finding means. Quality, in this literature, is fitness for the use to which data are put; a directory listing’s use is to inform a searcher, and a listing missing the fields a searcher needs is unfit for that use to the precise extent of what it omits, however accurate its present values may be. The frame carries one further reminder. Completeness is only one dimension of quality among several, including accuracy and currency, and this study measures only completeness; the corpus could be weaker still on the dimensions not measured here, which is why the figures produced by the presence criterion are an upper bound on usable completeness rather than a conservative estimate.

It is worth being explicit about how completeness relates to the other dimensions of data quality. A record can be complete but inaccurate, or accurate but incomplete; the dimensions are distinct, and a deficiency on one is not offset by sufficiency on another. This study isolates completeness because it is the dimension the available data allow to be measured cleanly and exactly. A fuller data-quality assessment of the same corpus would add accuracy and currency, and the present study should be read as one component of such an assessment rather than as the whole of it.

Why completeness matters for being found

A directory listing is, in the terms of the economics of information, a body of information offered to a searching party (Nelson, 1970), and an incomplete listing simply offers less of it. The deficit operates at two distinct stages of discovery, and it is worth separating them.

At the stage of evaluation, a searcher judges a listing by its cues before committing attention to it. Pirolli and Card (1999) termed these cues information scent, and a listing’s title and description are exactly such cues; a listing without them emits little scent and is difficult for a foraging searcher to assess. At the stage of action, a searcher who has decided to contact a business needs the contact fields, and a searcher with transactional intent (Broder, 2002) who reaches a listing with no address and no telephone has been stopped at the moment of greatest need. Discovery systems compound the effect: industry guidance on local discovery holds that complete and consistent information helps a business be surfaced for relevant queries (Google, local-ranking guidance), so an incomplete listing is also harder to surface in the first place. Incompleteness, in short, makes a listing both harder to find and, once found, harder to act upon.

The two stages are worth holding apart because they imply different remedies. A deficit of content cues is remedied by supplying a genuine title and description, which improve the listing’s information scent and help a searcher decide it is worth examining. A deficit of contact fields is remedied by supplying the address and telephone, which give the searcher who has already decided to act the means of doing so. A listing can fail at either stage independently, and a business reviewing its listing benefits from knowing which stage its own gaps belong to.

What an incomplete listing costs a business

The cost of an incomplete listing is real, and it is for the most part invisible. Its largest component is the searcher who found the listing, discovered no way to act on it, and moved on; the business never learns of this person, because a contact that does not happen leaves no trace. The cost has no entry on any account, which is precisely why it is so easily overlooked by the business that bears it.

Two further components complete the cost. There is the discovery the listing did not receive, because the systems that surface listings had little information to work with. And there is the trust a sparse listing forfeits: a searcher comparing a complete listing with a bare shell reasonably reads the complete one as the more established and serious business, an inference that the signalling literature would recognise as rational (Spence, 1973). None of these costs is visible to the business, and all of them are borne, quietly and continuously, for as long as the listing remains unfinished. It is worth naming who bears the cost: not the directory, and not the searcher, who simply moves to a listing that serves them, but the business itself, the very party the listing exists to help.

It is fair to ask how large this cost is, and the honest answer is that this study cannot put a figure on it. What the study can establish is the direction of the cost and the party that bears it: the cost is real, it falls on the business, and it accrues quietly for as long as the listing remains unfinished. A cost that cannot be measured precisely is still a cost worth removing, and the case for removing it is strengthened by the argument, developed below, that doing so is inexpensive.

Completeness, organic accumulation, and the absence of advertising

The provenance of the corpus, set out in the dataset section, bears on how the incompleteness should be read. Because the directory has never used paid advertising to acquire listings, the incompleteness documented here is not the residue of a promotional funnel in which listings are created as a by-product of an advertising transaction. It reflects, instead, how listings are filled when their creation is organic or editorial rather than paid.

This permits a reasoned supposition. It may be supposed that, absent a paid intake or a strongly prompting process, a large share of listings settle at minimal completeness, which is to say that the natural completeness of an unprompted listing is low. The multimodal distribution supports the supposition: the two low peaks, at zero fields and at two, are close to what an unprompted process would be expected to produce, and the high peak at ten fields is the minority of listings that were, for whatever reason, carried through to completion.

The editorial character of the directory refines the point rather than overturning it. Because about ninety per cent of entries are added through manual review, editorial work ensures that listings are relevant and correctly placed; but field-level completeness depends on business information that the editorial process cannot itself invent. The finding of low completeness is therefore, in part, a finding about unprompted human behaviour: it describes what listings look like when neither a payment nor a pointed request has obliged anyone to finish them.

Why listings are left unfinished

The data establish that listings are incomplete; they do not, by themselves, explain why. An explanation can be inferred, and the multimodal distribution of Figure 3 is itself the strongest clue available.

The peak at two fields is, as the results showed, in the main the population of listings carrying a title and a description and nothing else. This is the signature of a listing created with the minimum needed to be intelligible and never afterwards extended. The peak at zero fields is the signature of a listing registered as little more than a name and a link. Both peaks point to the same process: a listing created quickly begins life as a shell, and unless someone returns to complete it, a shell is what it remains. A business owner who created a listing and moved on has no natural prompting occasion to return, and does not see their own listing as a searching stranger sees it; the empty fields are, to the owner, invisible.

These are inferences and not findings, and they are presented as such. But they bear directly on the remedy. If listings are incomplete chiefly because they were created minimally and never revisited, then the problem is not a hard one but an unaddressed one, and what it requires is an occasion to revisit and a few minutes of attention. The distinction between a hard problem and an unaddressed one is the difference between a finding that discourages and one that does not.

The inference also reframes the role of the directory. If incompleteness is largely the result of listings created minimally and never revisited, then the directory holds one of the levers of the remedy, because it controls both the moment of creation and the occasions, if any, on which a business is invited back. The remedy for an unaddressed problem is to address it, and a directory is well placed to supply the prompt that an individual listing’s history otherwise lacks. This point is developed in the implications for the directory below.

The interaction with category crowding

This study measures the completeness of listings; its companion measures the crowding of the categories those listings occupy. The two properties interact, and Figure 5 maps the interaction as a conceptual matrix.

Figure 5. The interaction of listing completeness and category crowding. The visibility cost of an incomplete listing is not constant: it is modest in a thin category, where attention reaches most listings, and severe in a crowded category, where it does not.

The interaction follows from the divided-attention mechanism set out in the companion study. In a thin category, attention reaches most listings, so a sparse listing is still likely to be found, even though the searcher who finds it may then be unable to act. In a crowded category, attention reaches only the few listings that ordering and relevance bring to the front, and completeness is among the factors that determine which listings those are.

The consequence is that the visibility cost of an incomplete listing is not a constant. It is modest in a thin category and severe in a crowded one; the worst position in Figure 5, the cell where the cost is highest, is the sparse listing in the crowded category. The joint reading produces a sharper prescription than either study reaches alone: a business should assess two things together, how complete its listing is and how crowded its category is, because the urgency of completing the listing depends on both.

What a complete listing contains, and why the bar is low

Against a corpus of largely sparse listings, it is worth setting out plainly what a complete listing actually contains, because the standard turns out not to be a demanding one. A complete listing carries the business’s real name; its address, so that a searcher knows where the business is; a telephone number or another genuine means of contact, so that the searcher can act; and a title and a description that state, accurately and plainly, what the business does and for whom.

That is the whole of it. A complete listing is not an elaborate one. It requires no professional copywriting, no photography, and no expenditure; it requires only that the fields the listing already provides are filled with information that is true and current. The roughly 28% of listings that this study found to hold all four core contact fields had cleared the contact part of this modest bar, and the 19.0% holding all ten fields had cleared the whole of it, and no higher one.

Stating the standard plainly matters because it removes any suggestion that completeness is difficult. A business looking at its own sparse listing is not facing a hard or costly task; it is facing a short and inexpensive one, whose entire content is to supply true information for the fields that are empty. The gap between a sparse listing and a complete one, which this study has measured at the scale of the whole corpus, is at the scale of the individual listing a gap that an afternoon’s attention closes.

Implications for businesses

For a business that holds a listing, the practical reading of this study is, on balance, encouraging, because the finding points to something readily acted upon. If most listings in the corpus are substantially incomplete, then a complete listing is, by that very fact, distinguished; completeness is an unusually inexpensive way to stand apart from the roughly two-thirds of listings that carry no contact detail.

The concrete step is to examine one’s own listing field by field, and the data indicate where to look first. Because the address and contact fields are the ones most often missing, they are the ones a business should check before any other; the likeliest gap is the one the field-fill rates identify. The work asks no budget and no expertise, only the attention needed to enter true information once and keep it current, and it is foundational, in the sense that the other components of being found (a sound website, a genuine reputation) all assume a listing that is actually complete. A business that has not finished its listing has not yet laid that foundation.

There is a sequencing point worth adding. Because completeness is foundational, it should be the first improvement a business makes to its presence in a directory, before effort is spent on anything more elaborate. A business that invests in refining its description while leaving its address and telephone blank has improved the part of the listing that was already adequate and left untouched the part that was failing. The order of work follows from the data: fill the empty contact fields first, because they are both the most often missing and the most consequential when absent.
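The order of work can be expressed as a small check that a business, or a directory prompting one, might run against a single listing. This is a sketch under stated assumptions: the field keys and the grouping of the ten fields into contact and content sets are hypothetical names chosen for illustration, and a blank or whitespace-only value is assumed to count as empty.

```python
# Hypothetical field keys. Contact fields come first, following the order of
# work argued in the text; content fields follow.
CONTACT_FIELDS = ("company_name", "street", "city", "state",
                  "postal_code", "country", "telephone")
CONTENT_FIELDS = ("title", "description", "keywords")


def missing_in_priority_order(listing):
    """Return the empty fields of a listing, contact fields before content fields."""
    def is_empty(name):
        value = listing.get(name)
        return value is None or str(value).strip() == ""

    return [name for name in CONTACT_FIELDS + CONTENT_FIELDS if is_empty(name)]


# An invented listing that carries only the content pair.
listing = {"title": "Plumber", "description": "Emergency repairs.", "telephone": ""}
missing = missing_in_priority_order(listing)
print(missing)
```

For the invented listing above, the seven address and contact fields are reported first and the keywords field last, which mirrors the recommendation to fill the contact fields before refining content.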

Implications for the directory

Completeness is, in the first place, the business’s own responsibility, because the information in a listing is the business’s to supply. But the directory is not a passive party to it. A directory shapes completeness through how it gathers listings, since an intake process that asks clearly, and at the right moment, for the fields that matter will produce more complete listings than one that allows a bare-shell listing to be created and left; and through what it does afterward, since a periodic prompt to review a listing supplies the return visit that an unfinished listing otherwise never receives.

The multimodal distribution of Figure 3 indicates where directory effort would be best directed. The two-field population, the listings carrying a title and a description and nothing else, is large and clearly identifiable; it is the group most readily moved toward completeness, because it already shows evidence of having been worked on once. A finding that two-thirds of listings lack contact detail is therefore, in part, an account of how businesses behave and, in part, an invitation to the directory to examine its own intake and maintenance processes.

A further directory-side observation concerns measurement itself. A directory that does not measure the completeness of its listings cannot know which listings are sparse, nor track whether completeness is improving over time. The field-level method used in this study is inexpensive to run and fully repeatable, and a directory could apply it to its own corpus at regular intervals as a routine quality measure. What is measured can be managed, and a completeness figure reported and watched is the first instrument by which a directory could hold its own corpus to a standard.
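A routine measurement of this kind could look like the sketch below. It is not the study's own code: the field keys, the `audit` function, and the rule that a blank or whitespace-only value counts as absent are assumptions made for illustration, and the three sample listings are invented.

```python
from collections import Counter

# The ten informational fields named in the study; the keys are hypothetical.
FIELDS = ("company_name", "street", "city", "state", "postal_code",
          "country", "telephone", "title", "description", "keywords")


def present(value):
    """Presence criterion (assumed): the field holds a non-blank value."""
    return value is not None and str(value).strip() != ""


def audit(listings, fields=FIELDS):
    """Return (fill rate per field, Counter of fields filled per listing, mean)."""
    if not listings:
        raise ValueError("audit needs at least one listing")
    n = len(listings)
    fill_rate = {
        f: sum(present(rec.get(f)) for rec in listings) / n for f in fields
    }
    per_listing = Counter(
        sum(present(rec.get(f)) for f in fields) for rec in listings
    )
    mean = sum(level * count for level, count in per_listing.items()) / n
    return fill_rate, per_listing, mean


# Invented sample: a bare shell, a content-pair listing, and a complete listing.
sample = [
    {},
    {"title": "Plumber", "description": "Emergency repairs."},
    {f: "x" for f in FIELDS},
]
fill_rate, per_listing, mean = audit(sample)
print(dict(per_listing), mean)
```

Run against successive exports, the same function would yield comparable fill rates and distributions, which is the property a routine quality measure needs.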

Projections and future developments

This study is a snapshot of one corpus at one reference date, and it is designed to be repeated. The projections below are reasoned conjectures drawn from the observed pattern and the mechanisms discussed; they are not statistical forecasts, and they are marked as conjectures.

The first projection concerns the trajectory of completeness itself. Completeness can be projected to rise only slowly if the directory does not act on its intake and prompting processes, because the dominant cause of incompleteness inferred above, minimal creation followed by no return visit, is self-perpetuating; and it could rise more quickly if the directory does act, since the two-field population is a large and addressable group. Absent intervention, the multimodal shape, and in particular the U-shaped contact composite, can be projected to persist.

The second projection concerns the changing value of completeness. As retrieval-augmented and answer-composing systems increasingly mediate how businesses are found (Lewis et al., 2020; Aggarwal et al., 2024), discovery comes to depend on the structured information a listing carries, and a listing empty of address, contact detail, and genuine description offers such a system very little to act on. It may therefore be conjectured that the visibility gap between the complete listing and the bare shell will widen, not narrow, as discovery becomes more automated; the plain task of finishing a listing becomes more valuable, not less.

The third development is methodological. A future version of this study could examine completeness by category and by the age of a listing, asking whether the bare shells concentrate in particular industries or in particular periods of the directory’s history; the present study reports completeness across the corpus as a whole and leaves that refinement for future work. Repeating the analysis against future exports would, in any case, convert this snapshot into a longitudinal record, and the thinning of the empty end of the distribution would be the natural measure of progress.

The projections share, as in the companion study, a common character that should be stated. Each extrapolates from an observed pattern and a named mechanism; none is a quantitative forecast with an associated margin of error, and the deliberate language of projection and conjecture marks this. Their value is that a future export will test them: if completeness rises, if the U-shape softens, if the bare-shell population thins, the projections will have been borne out, and if not, they will be revised. The study is built to be repeated precisely so that such expectations can be checked rather than merely asserted.

Limitations of the study

The limitations follow from the design and are stated plainly. The study analyses a single corpus, the database of one directory, and is descriptive rather than inferential; it characterises that corpus and does not generalise to directories at large, and no significance testing is applied because there is no sampling beyond the corpus itself.

The central limitation is the presence criterion. The study measures whether a field holds a value, not whether that value is accurate, current, or well written; a listing counted as having a filled field may carry information that is wrong or out of date. As the methodology noted, this makes the reported figures an upper bound on completeness in any stricter sense, but it remains a genuine limitation, and a fuller study would assess accuracy and currency alongside presence.

A second limitation concerns interpretation rather than measurement. The study infers, from the multimodal distribution, why listings are left unfinished, and those inferences (minimal creation followed by no return visit) are reasoned but not directly evidenced; the data show the pattern of incompleteness without recording the intentions behind it. A study able to observe how listings are created and revised over time could test these inferences directly. They are offered here as the most plausible reading of the pattern, and they are labelled as inference throughout.

Several narrower limitations should also be recorded. The 123 listings with no address record at all are counted as lacking all address fields, which is the correct treatment for a study of completeness but is noted for transparency. The contact composite rests on a chosen subset of four fields, and a different subset would yield somewhat different figures.

The full-listing flag is the directory’s own internal marker, reported as context and not relied upon as a completeness measure. The rating and vote fields were excluded as unreliable. And the entire analysis reflects the single reference date of 25 May 2026; a different export date would yield somewhat different figures.

Concluding remarks

This study set out to measure how complete the listings in a curated business directory are, field by field, across the entire corpus. The central finding is that the corpus is substantially incomplete. The address and contact fields are filled for roughly a third of the 14,362 listings; the content fields, title and description, for a little over half; and the mean listing carries 3.67 of the ten informational fields examined.

Behind those averages lies a more specific structure. The distribution of completeness per listing is multimodal, with peaks at zero fields, at two fields, and at all ten; and the contact composite is sharply U-shaped, with 64.6% of listings holding none of four core contact fields and 28.1% holding all four. The corpus is best understood not as a body of listings of smoothly varying completeness but as a small number of distinct populations: the bare shell, the listing carrying only a title and a description, and the fully complete listing.

Read through the data-quality literature, the finding is one of low completeness on an established and measurable dimension; read through the economics of information, it is a finding that most listings offer a searcher less than a listing exists to offer. The cost of an incomplete listing is real but largely invisible, it is borne by the business itself, and, as Figure 5 set out, it rises with the crowding of the category the listing occupies. The practical conclusion is nonetheless an encouraging one: because so many listings are sparse, completing one’s own listing is both an inexpensive act and a genuinely distinguishing one. This study is the second in a connected series analysing the same corpus; a later synthesis will draw its findings together with those of the companion studies into a single account of the state of the directory’s listings.

A closing reflection concerns the standing of a study of this kind. An analysis of a directory’s own database, conducted and published by the directory, is at once authored by its subject and about its subject. That position is stated openly here, as it was in the companion study, because transparency about it is what allows the work to be read as research rather than as promotion. The methodology, the presence criterion, the exclusions, and the limitations have all been set out so that any reader may reproduce the figures and weigh the interpretation independently.

References

Aggarwal, P., et al. (2024). Improving search systems with large language models. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (pp. 5–16). Association for Computing Machinery.

Akerlof, G. A. (1970). The market for “lemons”: Quality uncertainty and the market mechanism. The Quarterly Journal of Economics, 84(3), 488–500.

Broder, A. (2002). A taxonomy of web search. ACM SIGIR Forum, 36(2), 3–10.

Darby, M. R., & Karni, E. (1973). Free competition and the optimal amount of fraud. The Journal of Law and Economics, 16(1), 67–88.

Google. Improve your local ranking on Google. Google Business Profile Help. [Industry guidance, not peer-reviewed.]

Lewis, P., et al. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems (Vol. 33, pp. 9459–9474).

Miller, R. B. (1968). Response time in man-computer conversational transactions. In Proceedings of the AFIPS Fall Joint Computer Conference (Vol. 33, pp. 267–277).

Nah, F. F.-H. (2004). A study on tolerable waiting time: How long are web users willing to wait? Behaviour & Information Technology, 23(3), 153–163.

Nelson, P. (1970). Information and consumer behavior. Journal of Political Economy, 78(2), 311–329.

Nelson, P. (1974). Advertising as information. Journal of Political Economy, 82(4), 729–754.

Pipino, L. L., Lee, Y. W., & Wang, R. Y. (2002). Data quality assessment. Communications of the ACM, 45(4), 211–218.

Pirolli, P., & Card, S. K. (1999). Information foraging. Psychological Review, 106(4), 643–675.

Spence, M. (1973). Job market signaling. The Quarterly Journal of Economics, 87(3), 355–374.

Stigler, G. J. (1961). The economics of information. Journal of Political Economy, 69(3), 213–225.

Wang, R. Y., & Strong, D. M. (1996). Beyond accuracy: What data quality means to data consumers. Journal of Management Information Systems, 12(4), 5–33.