The geography of a curated business directory: where 14,362 listings are located, and where they are not

Author. Gombos Atila Robert, PhD, Owner and Chief Executive Officer, Jasmine Business Directory (D-U-N-S 10-276-4189), Valley Cottage, New York. ORCID: 0000-0001-6468-2811. Correspondence through the author profile.

Data statement. The empirical material analysed in this study was taken directly from the production database of the Jasmine Business Directory. The directory has operated since 2009, has never used paid advertising to acquire listings, and adds about ninety per cent of its entries through manual editorial review. The analysed export is current as of 25 May 2026. This study is the third in a connected series; its companions look at category concentration and listing completeness in the same corpus.

Abstract

This study looks at where the businesses listed in a curated web directory are located and, just as much, at how many of them disclose a location at all. The data are the listings table of the Jasmine Business Directory joined to its address table; the directory has operated since 2009, has grown without paid advertising, and adds most of its entries through manual editorial review. The analysed export, current as of 25 May 2026, contains 14,362 listings. For each listing the study records the country, the state or region, and the city, normalising inconsistently entered values before analysis.

The first finding is about coverage. Only 4,931 listings, 34.3% of the corpus, carry a country, so a geographic analysis of this corpus necessarily describes the geo-tagged minority rather than the whole. Within that minority, the distribution is concentrated and strongly anglophone: the United States accounts for 55.9% of geo-tagged listings, the United Kingdom for 21.8%, and the three largest countries together for 82.5%, with ninety-one countries appearing in all. Among United States listings, California and Florida alone account for 31.2%.

The study reads these findings through the work on geographic information retrieval and local discovery (Jones et al., 2008), through the economics of search (Stigler, 1961), and through the data-quality treatment of coverage (Wang & Strong, 1996). It argues that the missing two-thirds of locations are themselves the most consequential finding, that a disclosed location is what makes a listing answerable to a geographically specific query, and it offers reasoned projections for the corpus’s geographic coverage.

Keywords. business directories; geographic distribution; local search; geographic information retrieval; data coverage; online visibility; business listings; geo-tagging; curated directory; country distribution; digital discovery; location data.

Introduction

A great deal of commercial activity is local. A person looking for a plumber, a dentist, a solicitor, or a restaurant is, in most cases, looking for one within a bounded distance, and the search is shaped from the start by where the person is. A business directory that records where its listed businesses are located can answer such a search; a directory that does not, cannot. The geographic information held against a listing is, in this sense, not a decorative detail but the property that connects a business to the large class of searches that begin with a place.

The question this study looks at has two parts, and the second matters as much as the first. The first part asks where the businesses in a curated directory are located: in which countries, in which regions, in which cities. The second part asks how many of the listings disclose a location at all, because a geographic analysis is only as representative as the share of the corpus that carries geographic data. The study treats both parts as findings, and it is careful, throughout, to separate what is true of the listings that carry a location from what is true of the corpus as a whole.

The approach is descriptive and quantitative. The complete listings table of the directory was joined to its address table; for each of the 14,362 listings the country, the state or region, and the city were recorded; inconsistently entered values were normalised, so that a state entered once as a full name and once as an abbreviation would be counted as one place; and the resulting distributions were characterised with standard measures of concentration. No survey and no experiment are involved. As with its companion studies, this is an exploratory empirical analysis of a single corpus, and its claims are set to that scope.

The data and their setting are the same as in the companion studies. The corpus is the production database of the Jasmine Business Directory, founded in 2009 and headquartered in Valley Cottage, New York. The directory has never used paid advertising to acquire listings, and it adds about ninety per cent of its entries through manual editorial review. The export analysed here was taken on 25 May 2026 and contains 14,362 listings. Because the corpus has accumulated organically and through editorial curation, the geographic pattern it shows is where listed businesses actually are, and which of them chose to disclose it, rather than the footprint of any advertising campaign.

An advertising-free corpus is, for a geographic study, an informative object for the same reason it was in the companion studies. Where listings are acquired through paid campaigns, their geographic spread partly shows where a budget was directed. Here, no budget intervened, and the geography measured is closer to where businesses that sought out the directory on their own are actually located. The corpus offers, within the limits of its coverage, a relatively clean view of an organically assembled geographic footprint.

Why the geography of a directory matters can be put at three levels. For the business that holds a listing, a disclosed location is what lets the listing be matched to a searcher seeking a provider nearby, and its absence quietly removes the listing from that entire class of searches. For the directory, the geographic spread of its listings, and the share of listings that carry a location at all, describe the reach and the usefulness of the asset it maintains. And for the wider understanding of digital discovery, the geography of an advertising-free corpus shows where businesses that list themselves without being paid to do so actually cluster. Each level is developed in the discussion.

The contribution of the study has three parts. It gives, first, a precise measurement of geographic coverage and geographic distribution in a substantial real-world directory. It reads, second, that measurement through the work on local discovery and the economics of search, and it argues that the incompleteness of geographic coverage is itself the most consequential finding. It offers, third, reasoned projections for how the corpus’s geography may change as discovery becomes more automated. The paper moves through a review of the relevant literature, a description of the dataset, the methodology, the results, the discussion, the projections, the limitations, and the concluding remarks.

Literature review

The study draws on three bodies of work: the study of geographic intent in search and of local discovery, the economics of searching for a nearby provider, and the treatment of coverage as a dimension of data quality. Each is summarised here, and the section closes by stating the gap the study addresses.

Geography, local search, and discovery

A large share of what people search for has a geographic dimension, whether stated or implied. Jones, Zhang, Rey, Jhala, and Stipp (2008) studied geographic intent in web search and showed that a large fraction of queries carry a geographic component, a named place or an implied locality, and that recognising and serving that component is a distinct problem for a discovery system. A query for a service is often, in effect, a query for that service in a place.

The difference between a stated and an implied place matters here. Some queries name a location outright, as when a searcher types the name of a city alongside the service sought; others imply one, as when a searcher relies on a device’s known position to mean nearby. A directory listing must be matched against both kinds of geographic query, and in each case the match depends on the listing carrying location data of its own. The implication runs one way only: an explicit or implied place in a query is of no use if the listings have no place to be matched against it.

The implication for a business listing is direct. To be matched against a geographically specific query, a listing must carry the geographic information that the query can be matched to; a listing with no country, region, or city is, with respect to such queries, invisible. Broder’s (2002) taxonomy of search intent supports the point: a searcher with transactional intent who wants a local provider needs a result that is both relevant and correctly placed. Industry guidance on local discovery holds, in the same vein, that complete and consistent location information helps a business be surfaced for geographically relevant queries (Google, local-ranking guidance). Geographic data is, on this combined account, the hinge on which a listing’s part in local discovery turns.

The hinge comparison is worth taking seriously. A hinge is small relative to the door it carries, and the geographic fields are likewise small relative to a listing as a whole, three fields among many. But the door turns on the hinge, and a listing’s part in geographically specific discovery turns, the same way, on whether those few fields are filled. The gap between the size of the geographic fields and the consequence of their absence is why this study treats them apart from the listing’s other content.

The directory listing as a located object

A business directory is, among other things, a register of where businesses are. A listed business is not an abstraction; it occupies premises, serves an area, and exists somewhere, and a directory that records that somewhere does something close to the work of a gazetteer alongside its work as a catalogue. The geographic fields of a listing are where this register is held.

Location is, for most businesses, an intrinsic property rather than an optional attribute. A restaurant, a clinic, a law firm, a builder: each is constituted in part by where it operates, and a searcher’s interest in such a business is inseparable from where the searcher is. Even a business that trades online and ships widely has a base, a jurisdiction, and a place from which it operates. The geographic field records a property that the business already has; it does not invent one.

The three scales looked at in this study, country, region, and city, form a hierarchy of increasing precision, and each answers a different grain of geographic question. The country admits a listing to national searches and signals the jurisdiction in which a business operates; the region narrows it to a state or county; the city places it precisely enough for a local search. A listing that carries all three is locatable at every scale, and one that carries none is, geographically, an unplaced object; the study’s coverage finding measures how many listings fall into each condition.

The economics of finding a nearby provider

The economics of information explains why geography bounds so much searching. Stigler (1961) established that search is costly and that a searcher stops when the expected gain from further search no longer justifies the cost; for goods and services consumed locally, the cost of search is itself geographic, since a distant provider is, for many purposes, no provider at all. Nelson (1970) analysed how buyers acquire information about sellers, and for a locally consumed service the relevant set of sellers is bounded by place before anything else applies.

A directory listing that records a location lowers the geographic cost of search: it lets a buyer find a nearby provider without canvassing an area unaided. A listing that records no location cannot do that, however complete it may be in other respects. The geographic field is therefore not interchangeable with the other fields of a listing; it does a specific kind of work, and its absence has a specific consequence, namely the silent exclusion of the listing from every search that begins with a place.

The word silent in that sentence is doing real work. A business whose listing is excluded from local searches gets no notice of the searches it missed; the exclusion produces no error and no warning, only an absence of contacts that the business has no way to trace to a missing field. This silence is what lets the geographic gap persist: a problem that announces itself tends to be fixed, and one that does not tends to be left.

Geographic coverage as a dimension of data quality

The data-quality literature supplies the vocabulary for the coverage question. Wang and Strong (1996) established completeness as a dimension of data quality, and Pipino, Lee, and Wang (2002) treated it as measurable by the ratio of values present to values that should be present. Applied to geography, this gives the notion of geographic coverage: the share of records in a corpus that carry a usable location. A corpus may be geographically informative or geographically sparse, and the data-quality frame makes that property something to be measured rather than assumed.

Coverage also bears on how a distribution should be read. Where only some records carry a location, the observed geographic distribution describes those records and not the corpus as a whole, and a concentration measure computed on the geo-tagged subset (Newman, 2005, gives the general treatment of such heavy-tailed distributions) characterises that subset alone. The data-quality literature thus does double duty in this study: it frames the coverage finding, and it warns, in advance, against generalising the distribution beyond the records that produced it.

This double role of the data-quality frame is worth holding onto through the rest of the study. When the results report a country distribution, the frame insists that the distribution belongs to the geo-tagged records; when the discussion reads that distribution, the frame insists that the reading respect the same boundary. The coverage caveat is not a single disclaimer to be made once and forgotten but a condition that qualifies every geographic statement the study makes.

The gap addressed by this study

These literatures are well developed, but they are seldom brought together on a precise, openly documented measurement of geographic coverage and distribution in a real, substantial, advertising-free directory. The local-discovery literature explains why geographic data matters; the economics of search explains why so much searching is geographically bounded; the data-quality literature explains why coverage must be measured before a distribution is read. This study occupies the point at which the three meet. It does not claim that its specific geographic figures hold for all directories; it claims to measure one well-defined corpus exactly, coverage caveat included, and to read that measurement in a way that is informative beyond itself.

The dataset: a curated directory’s location records

The empirical material for this study is the production database of the Jasmine Business Directory, the same corpus analysed in the companion studies of category concentration and listing completeness. The directory was founded in 2009 by Pecsi Andras and Robert Gomboș, is headquartered in Valley Cottage, New York, and has operated continuously since then as a general business directory organised by subject category.

Two features of the directory’s operation bear on how its geography should be read. The directory has never used paid advertising to acquire listings; its corpus has accumulated through organic submission and editorial addition. And it adds about ninety per cent of its entries through manual editorial review. The geographic pattern measured here is therefore the pattern of where listed businesses actually are, and of which businesses disclosed it, rather than the geographic footprint of a marketing budget.

The directory’s seventeen-year operating history bears on the geographic reading as well. The corpus is an accumulation reaching back to 2009, and the geographic pattern it shows is the cumulative result of that long period rather than a snapshot of recent listing activity. A market in which the directory became known early would, other things being equal, contribute more listings than one reached only recently, so the distribution carries the imprint of the directory’s history of reach as well as of its present reach.

The unit of analysis is the individual listing record, of which the export contains 14,362. Geographic information for a listing is held in the directory’s address table, which can record, against a listing, a country, a state or region, a city, a street, and a postal code. This study looks at the three fields that locate a business at successive scales, the country, the state or region, and the city, across all 14,362 listings. Table 1 summarises the dataset and the geographic fields examined.

**Table 1.** The dataset and the geographic fields examined.
Attribute	Value
Source	Jasmine Business Directory, production database
Directory founded	2009
Headquarters	Valley Cottage, New York
Listing-acquisition model	Organic submission and editorial addition; no paid advertising
Editorial curation	Approximately 90% of entries added through manual review
Export reference date	25 May 2026
Listings (universe of analysis)	14,362
Listings carrying a country	4,931 (34.3%)
Listings carrying a city	4,632 (32.3%)
Distinct countries represented	91
Geographic fields examined	Country, state/region, city

Methodology

The methodology is set out in four parts: the extraction of the geographic fields, the normalisation applied to inconsistently entered values, the coverage caveat and the two bases on which distribution figures are reported, and a note on the concentration measures and the descriptive design. The analytical framework, including the place of the coverage caveat within it, is shown in Figure 1.

Figure 1. The analytical framework of the study. A coverage check divides the corpus before any distribution is computed: the geographic distribution describes the 4,931 geo-tagged listings, while the 9,431 without a country are themselves the substance of the coverage finding.

Data extraction and the geographic fields

The dataset was obtained as a complete export of the directory’s production database on the reference date. Geographic information is held in the address table, which records location values against a listing identifier; the analysis joined that table to the listings table by the identifier, so that for every one of the 14,362 listings the country, state, and city values could be read. Where a listing has more than one address row, the values of a single representative row are used; where a listing has no address row, it is treated as carrying no geographic information, so that it stays in the corpus of 14,362 and counts against coverage rather than dropping out of the analysis.
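The join described above can be sketched as a left join in Python. This is a minimal illustration, not the directory’s actual query: the record shapes and field names (`listing_id`, `row_id`, `country`, `state`, `city`) are hypothetical, and choosing the representative row as the one with the lowest `row_id` is an assumption, since the study does not say how its representative row is chosen. The property that matters is that every listing survives the join, with empty geography where no address row exists.

```python
from collections import defaultdict

# Hypothetical record shapes: the directory's real table and column names
# are not given in the text, so these are illustrative stand-ins.
listings = [{"listing_id": 1}, {"listing_id": 2}, {"listing_id": 3}]
addresses = [
    {"row_id": 10, "listing_id": 1, "country": "US", "state": "CA", "city": "Los Angeles"},
    {"row_id": 11, "listing_id": 1, "country": "US", "state": "CA", "city": "San Diego"},
    {"row_id": 12, "listing_id": 2, "country": "UK", "state": None, "city": "Leeds"},
]


def join_geography(listings, addresses):
    """Left-join listings to address rows, keeping one representative row per listing.

    Assumption: the representative row is the one with the lowest row_id.
    """
    rows_by_listing = defaultdict(list)
    for row in addresses:
        rows_by_listing[row["listing_id"]].append(row)

    joined = {}
    for listing in listings:
        lid = listing["listing_id"]
        rows = rows_by_listing.get(lid)  # .get does not create an empty entry
        if rows:
            rep = min(rows, key=lambda r: r["row_id"])
            joined[lid] = {k: rep.get(k) for k in ("country", "state", "city")}
        else:
            # No address row: the listing stays in the corpus with no geography.
            joined[lid] = {"country": None, "state": None, "city": None}
    return joined


joined = join_geography(listings, addresses)
print(len(joined))  # 3: every listing survives the join
```

Because the join starts from the listings rather than the addresses, the denominator for every coverage figure remains the full corpus.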

The three fields examined locate a business at three successive scales. The country places a business in one of the world’s nations; the state or region places it within a country; and the city places it within a region. A listing may carry any combination of the three, and the analysis records each independently, so that the coverage of each scale can be reported separately.

Recording the three scales independently is what makes the coverage finding precise rather than approximate. Had the study collapsed the three fields into a single yes-or-no judgement of whether a listing has a location, it could not have shown that country coverage and city coverage are close to one another, nor that state coverage among United States listings is high. The separate treatment of the scales is a small methodological choice that the results section puts to direct use.
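To illustrate the difference between independent per-scale counts and a single collapsed judgement, the following Python sketch computes both. The field names and the toy records are invented for illustration; only the idea of counting each scale separately comes from the text.

```python
GEO_FIELDS = ("country", "state", "city")


def filled(value):
    """Treat None and whitespace-only strings as missing."""
    return value is not None and str(value).strip() != ""


def scale_coverage(records):
    """Count, separately for each scale, the records with a non-empty value."""
    return {field: sum(filled(r.get(field)) for r in records) for field in GEO_FIELDS}


def any_location(records):
    """The collapsed alternative: records with at least one geographic field."""
    return sum(any(filled(r.get(f)) for f in GEO_FIELDS) for r in records)


toy = [  # invented records, for illustration only
    {"country": "US", "state": "CA", "city": "Fresno"},
    {"country": "GB", "state": None, "city": "Leeds"},
    {"country": None, "state": None, "city": "Springfield"},
    {"country": None, "state": "", "city": None},
]
print(scale_coverage(toy))  # {'country': 2, 'state': 1, 'city': 3}
print(any_location(toy))    # 3
```

The collapsed count of 3 would hide the fact that the three scales are filled to different degrees, which is exactly the information the per-scale counts preserve.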

Normalising the country and state fields

Geographic fields entered by many hands over many years are not entered consistently, and a normalisation step was needed before the values could be counted. The country field carried the same nation under several spellings: the United States appeared as USA, US, United States, and other variants, and the United Kingdom appeared as UK, United Kingdom, and Great Britain. These variants were mapped to a single normalised code per country, so that a nation entered under several names would be counted once.

The state field needed the same treatment, and more of it. Within United States listings, a state appeared sometimes as a full name and sometimes as a two-letter abbreviation: California and CA, for example, both occur in the data. A mapping between full names and abbreviations was applied so that each state would be counted once whichever form had been entered; values that matched neither a recognised name nor a recognised abbreviation were set aside under an explicit unrecognised category rather than dropped silently. This normalisation is the one substantive cleaning step in the study, and it is recorded here so that the treatment of the geographic fields is fully transparent.

One point about the normalisation should be made explicit so that it is not mistaken for something it is not. The normalisation changes how values are counted, not which listings are counted as geo-tagged; a listing whose country was entered as USA rather than US was always a geo-tagged listing, and the normalisation merely makes sure it is tallied under the same heading as a listing entered as US. The coverage figure of 34.3% is therefore untouched by the normalisation; what the normalisation affects is the accuracy of the country and state distributions.

The coverage caveat and the two bases of analysis

The single most important methodological point in this study is about coverage. Of the 14,362 listings, only 4,931 carry a country; the remaining 9,431 do not. Any geographic distribution computed from this corpus therefore describes, of necessity, the 34.3% of listings that disclose a location, and not the corpus as a whole. This is not a defect to be apologised for and worked around; it is a finding in its own right, and it is treated as the first result of the study.

To keep the reader continuously aware of the distinction, distribution figures in this study are reported on two bases wherever both are informative. The first base is the geo-tagged subset: the share of the 4,931 located listings that fall in a given country or region. The second base is the whole corpus: the share of all 14,362 listings. A country holding a large share of the geo-tagged subset may hold a much smaller share of the whole corpus, and reporting both prevents the geo-tagged minority from being mistaken for the entire directory.
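The two bases differ only in the denominator, so a single helper can report both. This Python sketch uses the study’s totals (4,931 geo-tagged listings, 14,362 in the corpus); the function name is illustrative.

```python
GEO_TAGGED = 4_931  # listings carrying a country
CORPUS = 14_362     # all listings in the export


def two_bases(count, geo_tagged=GEO_TAGGED, corpus=CORPUS):
    """Return (% of geo-tagged subset, % of whole corpus), each rounded to 0.1."""
    return round(100 * count / geo_tagged, 1), round(100 * count / corpus, 1)


print(two_bases(2_754))  # United States: (55.9, 19.2)
print(two_bases(1_077))  # United Kingdom: (21.8, 7.5)
```

Every geo-tagged share is divided by the same factor, 14,362 / 4,931 ≈ 2.91, to obtain the whole-corpus share, which is why a large share of the located minority can be a modest share of the directory.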

What a recorded location is taken to mean

A short definitional note is warranted, because the coverage finding depends on what the study counts as a listing having a location. The study takes the country field as the primary marker of geographic coverage: a listing is treated as geo-tagged if it carries a country, and the headline coverage figure of 34.3% is the share of listings that do.

The country was chosen as the primary marker for a specific reason. A listing that carries a country but no city still has a coarse but usable location, enough at least for national searches; a listing that carries a city but no country is geographically ambiguous, because many city names recur across countries and a city without a country cannot be placed with confidence. The country is therefore the most reliable single indicator of whether a listing is locatable at all, and it is the natural basis for the coverage measure.

The other two scales are not thereby ignored. The study reports the coverage of the country, the region, and the city separately, so that a reader can see how far each scale is filled, and it reports the distribution across all three. The country merely serves as the gate for the headline coverage figure; the finer scales are described in their own right in the results.

Concentration measures and the descriptive design

The distributions are characterised with the same family of measures used in the companion studies: counts, shares, and cumulative shares of the largest countries and regions. These describe how concentrated the geographic distribution is: whether the located listings spread across many places or cluster in a few.

As in the companion studies, the design is descriptive. The corpus is the entire production database of one directory, analysed in full; the measures reported are exact properties of that corpus rather than estimates, and no inferential test applies, because there is no sampling beyond the corpus itself. The procedure can be reproduced: each figure is a count or a ratio of counts, computed by a deterministic join, a deterministic normalisation, and a deterministic tally. The study reports what the corpus is, with precision, and holds interpretation for the discussion.

Results

The results are presented in five parts: the geographic coverage of the corpus; the distribution of located listings across countries; the concentration of that distribution and its anglophone weighting; the distribution across United States listings; and city-level concentration. Each part comes with the relevant figure or table, and every figure and table is referred to directly in the text.

Geographic coverage: how many listings carry a location

The first result is the coverage finding, and it conditions everything that follows. Of the 14,362 listings, 4,931 carry a country; that is 34.3% of the corpus. A similar share, 4,632 listings or 32.3%, carry a city. Roughly two listings in three, then, record no country (9,431 listings, 65.7%), and a similar share (9,730 listings, 67.7%) record no city. Figure 4 sets out the coverage as a funnel.

Figure 4. The geographic-coverage funnel. Each stage is a subset of the one above it; the widths narrow to reflect the successive reductions. The decisive narrowing is the first: only about a third of the corpus carries a country at all.

The funnel makes the structure of the analysis visible. The corpus of 14,362 narrows at the first stage to the 4,931 listings that carry a country, and it is this subset, and only this subset, that the distribution analysis can describe. The later stages narrow further: of the geo-tagged listings, 2,754 are located in the United States, and of those, 2,613 carry a state value that the normalisation could place. The rest of this section describes the geo-tagged subset; the coverage gap itself is taken up again in the discussion, where it is argued to be the most consequential of the study’s findings.
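The stages of the funnel can be expressed as two ratios each: the share of the previous stage retained and the share of the whole corpus. This Python sketch uses the stage counts reported above; the stage labels are descriptive only.

```python
FUNNEL = [
    ("All listings", 14_362),
    ("Carry a country", 4_931),
    ("Located in the United States", 2_754),
    ("US listings with a placeable state", 2_613),
]


def funnel_retention(stages):
    """For each stage after the first: (name, % of previous stage, % of corpus)."""
    corpus = stages[0][1]
    out = []
    for (_, prev), (name, n) in zip(stages, stages[1:]):
        out.append((name, round(100 * n / prev, 1), round(100 * n / corpus, 1)))
    return out


for row in funnel_retention(FUNNEL):
    print(row)
```

The first stage retains only 34.3% of the corpus, while the state stage retains 94.9% of United States listings. The United States stage, at 55.9%, is a selection by country rather than a loss of data.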

It is worth pausing on how close the coverage figures are across two of the three scales. That 34.3% of listings carry a country and 32.3% carry a city is consistent with the tendency, taken up in the next subsection, of a listing’s geographic fields to be filled together. On that reading, the funnel of Figure 4 narrows chiefly at its first stage: once a listing has any geographic data, it tends to have most of it.

Coverage across the three geographic scales

Before the distribution of located listings is examined, it is worth setting the coverage of the three geographic scales side by side. The country is carried by 34.3% of listings; the city by 32.3%; and a state value, among United States listings, by 94.9% of those listings. The first two figures are close to one another, and the closeness is itself informative.

Country coverage and city coverage within two percentage points of each other are what one would expect if the geographic fields were, in practice, supplied together or not at all. Close totals do not by themselves establish this, since two fields filled independently at similar rates would also produce similar totals; the decisive evidence is the overlap, the number of listings that carry both. The reading is supported by a finding of the companion study on completeness, which showed that the address fields of a listing move as a group: a listing that has one address field tends to have them all, and a listing that lacks one tends to lack the rest. The geographic data of this corpus is, in that sense, close to an all-or-nothing property of a listing.

The high state coverage among United States listings, at 94.9%, is consistent with the same pattern read from the other direction. Once a listing has crossed the threshold of carrying geographic information at all, it tends to carry that information at every scale; the state value is missing for only about one United States listing in twenty. The geographic question for this corpus is therefore not which scale a listing records but whether it records any geography at all, and the coverage finding answers that question.
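The all-or-nothing reading can be tested directly by cross-tabulating the two fields. The Python sketch below counts listings by whether they carry a country and whether they carry a city, using hypothetical field names. It also computes a reference point from the reported totals: if the two fields were filled independently at the observed rates, about 4,931 × 4,632 / 14,362 ≈ 1,590 listings would carry both, whereas if they are supplied together the count should approach 4,632. The overlap itself is not reported in this section, so the sketch shows how it could be computed rather than what it is.

```python
from collections import Counter


def filled(value):
    """Treat None and whitespace-only strings as missing."""
    return value is not None and str(value).strip() != ""


def presence_crosstab(records):
    """Count records by (has country, has city)."""
    return Counter((filled(r.get("country")), filled(r.get("city"))) for r in records)


def expected_both_if_independent(n_country, n_city, n_total):
    """Records expected to carry both fields if the two were filled independently."""
    return n_country * n_city / n_total


# Reference point computed from the corpus totals:
print(round(expected_both_if_independent(4_931, 4_632, 14_362)))  # 1590
```

On the real export, a `(True, True)` cell near 4,632 would confirm joint filling; a cell near 1,590 would contradict it.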

The distribution across countries

Within the 4,931 located listings, ninety-one distinct countries appear. The distribution across them is uneven. Figure 2 shows the twelve countries with the most listings, and Table 2 reports the same figures on both bases of analysis, as a share of the geo-tagged subset and as a share of the whole corpus.

Figure 2. The twelve countries holding the most listings, as a share of the 4,931 geo-tagged listings. The United States and the United Kingdom together hold more than three-quarters of located listings; the descent below the fourth country is steep.

**Table 2.** Country distribution of geo-tagged listings, on two bases (geo-tagged subset: 4,931; whole corpus: 14,362).
Country	Listings	% of geo-tagged	% of whole corpus
United States	2,754	55.9%	19.2%
United Kingdom	1,077	21.8%	7.5%
Canada	238	4.8%	1.7%
Australia	230	4.7%	1.6%
China	49	1.0%	0.3%
South Africa	38	0.8%	0.3%
India	37	0.8%	0.3%
Singapore	33	0.7%	0.2%
United Arab Emirates	31	0.6%	0.2%
New Zealand	31	0.6%	0.2%
Germany	27	0.5%	0.2%
Ireland	23	0.5%	0.2%
Other 79 countries (combined)	363	7.4%	2.5%

The United States holds 2,754 of the geo-tagged listings, which is 55.9% of that subset; the United Kingdom holds 1,077, or 21.8%. After these two, the distribution falls away sharply: Canada and Australia hold 4.8% and 4.7% respectively, and no other country reaches even 1.5%. The eighty-seven countries from China downward hold, between them, 632 listings, or 12.8% of the geo-tagged subset, about an eighth. Read on the second basis, against the whole corpus, the same figures are smaller in the same proportion: the United States accounts for 19.2% of all 14,362 listings, and the United Kingdom for 7.5%, the rest of the corpus being listings with no country at all.

The two bases of analysis are worth keeping distinct precisely here, where the figures are largest. On the geo-tagged basis, the United States is a clear majority of the located corpus; on the whole-corpus basis, it is under a fifth of all listings, and the United Kingdom under a thirteenth. Neither basis is the correct one to the exclusion of the other: the first describes the located listings accurately, the second places those listings in the context of a corpus that is mostly unlocated. A reader should hold both figures in view at once.

Concentration and the anglophone weighting

The geographic distribution is, on the evidence of the cumulative shares, highly concentrated. The single largest country holds 55.9% of geo-tagged listings; the three largest hold 82.5% between them; the five largest hold 88.2%; and the ten largest hold 91.6%. The eighty-one countries beyond the tenth share the remaining 8.4%. The United States and the United Kingdom alone hold 77.7% of the located listings, more than three-quarters of the geo-tagged subset.

The concentration has a clear linguistic character, and it is worth naming. The six predominantly anglophone countries in the distribution, the United States, the United Kingdom, Canada, Australia, New Zealand, and Ireland, hold 4,353 of the 4,931 geo-tagged listings between them, which is 88.3% of the located corpus. The directory’s located listings are, in other words, overwhelmingly in the English-speaking world. This fits with the directory operating in English and serving, in the main, an English-language audience; the geographic pattern reflects the directory’s own linguistic and editorial reach, and the discussion returns to this reading.
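The cumulative shares and the anglophone share can be recomputed from the counts in Table 2. In this Python sketch the “Other 79 countries” row is left out of the per-country dictionary; the top-k calculation remains valid because Figure 2 shows the twelve countries with the most listings, so every country in the “other” row ranks below them. The grouping into six anglophone countries follows the text.

```python
GEO_TAGGED = 4_931
COUNTRY_COUNTS = {  # Table 2, geo-tagged subset, twelve largest countries
    "United States": 2754, "United Kingdom": 1077, "Canada": 238, "Australia": 230,
    "China": 49, "South Africa": 38, "India": 37, "Singapore": 33,
    "United Arab Emirates": 31, "New Zealand": 31, "Germany": 27, "Ireland": 23,
}
ANGLOPHONE = {"United States", "United Kingdom", "Canada", "Australia",
              "New Zealand", "Ireland"}


def top_k_share(counts, k, base):
    """Cumulative % of `base` held by the k largest entries."""
    top = sorted(counts.values(), reverse=True)[:k]
    return round(100 * sum(top) / base, 1)


def group_share(counts, members, base):
    """% of `base` held by a named group of entries."""
    return round(100 * sum(counts[c] for c in members) / base, 1)


for k in (1, 2, 3, 5, 10):
    print(k, top_k_share(COUNTRY_COUNTS, k, GEO_TAGGED))
print("anglophone", group_share(COUNTRY_COUNTS, ANGLOPHONE, GEO_TAGGED))
```

Computing each cumulative share from the summed counts, rather than by adding rounded percentages, keeps rounding error from accumulating across the ranks.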

The anglophone weighting is visible in the figure as well as in the totals. The bars rendered in the accent colour in Figure 2, the anglophone countries, are not scattered through the ranking but occupy, with few exceptions, its upper reaches; the United States, the United Kingdom, Canada, and Australia are the four largest countries in the geo-tagged subset. The concentration is therefore a matter of rank as well as of size: every one of the top four positions in the located corpus is held by an English-speaking country.

The distribution across United States listings

Because the United States holds the majority of geo-tagged listings, its internal distribution can be examined in its own right. Of the 2,754 United States listings, 2,613 carry a state value; 141, or 5.1% of United States listings, carry a country but no state. Figure 3 shows the fourteen states with the most listings, and Table 3 reports them with the unallocated remainder.

Figure 3. The fourteen United States states with the most listings, as a share of the 2,754 United States listings. California and Florida lead by a clear margin; the remaining states descend along a gradual slope.

**Table 3.** State distribution of United States listings (base: 2,754 US listings).
State	Listings	Share of US listings
California	477	17.3%
Florida	381	13.8%
Texas	202	7.3%
New York	185	6.7%
Illinois	116	4.2%
Georgia	91	3.3%
New Jersey	72	2.6%
Pennsylvania	66	2.4%
Arizona	64	2.3%
Massachusetts	62	2.3%
Colorado	58	2.1%
Washington	57	2.1%
North Carolina	57	2.1%
Ohio	54	2.0%
Other states (combined)	671	24.4%
No state given	141	5.1%

Within the United States, California and Florida lead clearly: California holds 477 listings, or 17.3% of United States listings, and Florida holds 381, or 13.8%, so the two states together hold 858 listings, 31.2% of all United States listings. Texas and New York follow at 7.3% and 6.7%, and below them the distribution descends along a gradual slope into a long tail of less-represented states. The state-level distribution is, like the country-level one, concentrated, though less extremely so: the four largest states hold 45.2% of United States listings between them, where the four largest countries held 87.2% of the geo-tagged subset.
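The cumulative shares used to describe concentration can be computed directly from the counts in Table 3. The Python sketch below (the helper `top_k_share` is hypothetical) sorts the fourteen named states by listings and divides the sum of the k largest by the 2,754 United States listings that form the table’s base:

```python
# Listings per state, from Table 3 (the fourteen named states only).
state_counts = {
    "California": 477, "Florida": 381, "Texas": 202, "New York": 185,
    "Illinois": 116, "Georgia": 91, "New Jersey": 72, "Pennsylvania": 66,
    "Arizona": 64, "Massachusetts": 62, "Colorado": 58, "Washington": 57,
    "North Carolina": 57, "Ohio": 54,
}
US_LISTINGS = 2_754  # base of Table 3, including "Other states" and "No state given"


def top_k_share(counts: dict[str, int], k: int, base: int) -> float:
    """Cumulative percentage of base held by the k largest entries."""
    largest = sorted(counts.values(), reverse=True)[:k]
    return round(100 * sum(largest) / base, 1)


print(top_k_share(state_counts, 2, US_LISTINGS))  # California + Florida
print(top_k_share(state_counts, 4, US_LISTINGS))  # four largest states
```

Computed from the raw counts, the two-state and four-state shares come to 31.2% and 45.2%; adding the individually rounded percentages in Table 3 instead would give 31.1% and 45.1%, a difference due only to rounding.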

The two remainder rows of Table 3 deserve a brief note, because they are of different kinds. The 671 listings in states below the fourteenth (24.4%) are located at state level and are simply dispersed across the long tail. The 141 listings (5.1%) that carry a country but no state are genuinely unallocated, and they are a reminder that even within the located corpus the geographic data thins as the scale becomes finer. A listing may be placed in a country with confidence and yet not be placed in a state, and the state distribution is, accordingly, slightly less complete than the country distribution that contains it.

City-level concentration

The third geographic scale is the city, and 4,632 listings, 32.3% of the corpus, carry a city value. The city field is the noisiest of the three, because city names are entered freely and the same place can appear under several spellings, so the city counts should be read as indicative rather than exact. Table 4 reports the twelve cities most frequently recorded.

**Table 4.** The twelve cities most frequently recorded (base: 4,632 listings carrying a city; counts indicative).
City	Listings
London	190
New York	88
Los Angeles	65
Tampa	55
Chicago	51
Houston	50
Atlanta	49
Austin	43
Las Vegas	42
Miami	38
Orlando	38
Beverly Hills	38

London is the single most frequently recorded city, with 190 listings, ahead of New York at 88 and Los Angeles at 65. The prominence of London fits with the United Kingdom being the second-largest country in the corpus. The United States cities that follow are, for the most part, large metropolitan centres such as New York, Los Angeles, Chicago, and Houston, or, like Tampa, Miami, and Orlando, cities in Florida, the second-ranked United States state. No single city dominates the way the United States dominates the country distribution; the city counts spread across many places, and even London, the largest, accounts for only about four per cent of the 4,632 listings that carry a city. The city scale, in short, is the least concentrated of the three and also the least completely recorded, and its counts are best read as a broad indication of where located listings cluster rather than as a precise census.

One pattern in the city data is nonetheless worth drawing out, because it is robust to the noise. The cities that appear most often are, almost without exception, large metropolitan centres in the countries that lead the country distribution: London in the United Kingdom, and New York, Los Angeles, Chicago, and Houston in the United States. The city scale, for all its inconsistency, tells the same story as the country and state scales: the located listings of this corpus cluster in the major urban centres of the anglophone world, and they thin quickly beyond them.
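Because the city field was not normalised, the counts in Table 4 are split across spelling variants. One way a later version of the analysis could group such variants is to count cities by a normalised key, as in the Python sketch below. The functions `city_key` and `count_cities` and the sample entries are hypothetical; this is an illustration under assumptions, not the procedure used to build Table 4.

```python
import unicodedata
from collections import Counter


def city_key(raw: str) -> str:
    """Crude matching key: drop accents, ignore case, turn punctuation into spaces."""
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = "".join(ch if ch.isalnum() else " " for ch in text.casefold())
    return " ".join(text.split())  # collapse runs of whitespace


def count_cities(raw_values: list[str]) -> Counter:
    """Count cities by normalised key, skipping blank entries."""
    return Counter(key for key in map(city_key, raw_values) if key)


# Hypothetical free-text entries, not drawn from the directory's data.
entries = ["New York", "new york ", "New-York", "NEW YORK", "Montréal", "Montreal", ""]
print(count_cities(entries))
```

A key of this kind merges differences of case, accents, punctuation, and spacing, but it cannot reconcile abbreviations such as “NYC”, and it would wrongly merge two different places that share a name; both cases would still need an alias table or manual review.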

Discussion

The results establish that only about a third of the corpus carries a location, that the located listings are concentrated in a few countries and strongly weighted toward the English-speaking world, and that within the United States the listings cluster in a small number of states. The discussion now reads these findings: it sets out how a geographic distribution should be read when most of the corpus is not geo-tagged, considers why the located corpus is anglophone, examines what the missing locations omit, explains why geographic data matters for discovery, relates the geographic finding to the companion studies, and draws the implications for businesses and for the directory.

Reading geography from an incompletely geo-tagged corpus

The coverage finding governs every other geographic claim in this study, and the first task of the discussion is to state precisely what may and may not be concluded from a corpus that is two-thirds unlocated. What may be said is that, among the listings that disclose a location, the distribution is the one the results report. What may not be said is that the corpus as a whole has that distribution, because the corpus as a whole is, in geographic terms, mostly silent.

A sharper question is whether the geo-tagged listings are a representative sample of the rest, and the honest answer is that they almost certainly are not. The companion study of listing completeness found that the address and contact fields tend to be filled together or not at all; a listing that carries a country is, by that finding, a listing that tends also to carry the rest of its address and to be more complete in general. The geo-tagged subset is therefore not a random third of the corpus but, in effect, the more complete third. The geographic distribution reported here is the distribution of the more complete listings, and it should be read as such: an accurate account of where the located, fuller listings are, and not a census of the whole directory.

This qualification does not empty the geographic finding of value; it sharpens what the finding is a finding about. An accurate account of where the located, fuller listings cluster is genuinely useful: it describes the part of the corpus that is actually working as a discovery instrument, and it identifies the markets in which the directory’s usable listings are concentrated. The qualification asks only that this account not be silently promoted into a claim about the corpus as a whole, two-thirds of which the geographic data does not describe.

Why the located corpus is anglophone

The most striking single feature of the distribution is its linguistic concentration: 88.3% of located listings are in the six predominantly anglophone countries. A reasoned explanation is available, and it is about the directory rather than the world.

The Jasmine Business Directory operates in English. Its category names, its editorial process, and the audience it has accumulated since 2009 are English-language. A business is more likely to list itself in a directory whose interface it reads comfortably and whose users are the customers it wants, and an English-language directory is, for that reason, more visible and more useful to an anglophone business than to one operating in another language. It can therefore be concluded, as a reasoned supposition, that the anglophone concentration reflects the directory’s own language and reach rather than any claim about where commercial activity occurs in the world. The directory is, in geographic terms, a window onto the businesses of the anglophone web; a business in a non-anglophone market is underrepresented here not because it is less significant but because the directory is less visible to it.

It follows that the anglophone weighting should not be read as a judgement about the relative size or importance of markets. A great deal of commercial activity occurs in languages other than English, and the near-absence of those markets from this corpus reflects the directory’s reach rather than their scale. A directory operating in another language, all else equal, would be expected to show a quite different geographic distribution, weighted toward the markets its own language serves. The distribution measured here is, in that sense, a property of this directory and not a finding about the world’s businesses.

What the missing locations omit

The 9,431 listings that carry no country are not an empty space in the data; they are real businesses whose location the directory does not record. It is worth being concrete about what their missing locations cost. Each such listing is excluded from every search that begins with a place, because there is no place to match it against; the directory cannot route a searcher looking for a nearby provider to a listing that does not say where it is; and for the directory as a whole, the geographic value of the asset is realised for about a third of the corpus and stays latent for the other two-thirds.

The missing locations are, moreover, not an isolated problem. The companion study established that the listings without contact fields are, in the main, the sparse listings, and the country field belongs to the address group that moves with those contact fields. It follows that the geographic-coverage gap documented here and the completeness gap documented in the companion study are, to a large extent, the same gap viewed from two angles. Improving geographic coverage and improving listing completeness are therefore best treated as largely one task rather than two: the listing that gains a country is, as a rule, the same listing that gains an address and a telephone.

This overlap between the two gaps has a practical upside. A business or a directory that sets out to improve listing completeness will, in the same act, improve geographic coverage, because the address fields that completeness requires include the country and the city that coverage requires. There is little need to choose between the two tasks or to sequence them; the work that closes one gap largely closes the other, and a single effort, finishing the sparse listings, addresses the completeness gap and the coverage gap at once.

Geography, local search, and being found nearby

Why a disclosed location matters for discovery can be stated through the literature on local search, and Figure 5 sets out the mechanism.

Figure 5. A place-based search can only reach listings that carry a place. A listing with no location is excluded from the search at the first step, regardless of how complete or well written it is in every other respect.

Much searching is geographic: Jones et al. (2008) showed that a substantial fraction of web queries carry a geographic component, whether a named place or an implied locality. A search of that kind can only be matched against listings that themselves carry a place; a listing with no location is, as Figure 5 shows, excluded at the first step. Stigler’s (1961) account of search applies directly: for a service consumed locally, the cost of search is geographic, and a located listing lowers that cost in a way an unlocated one cannot. A searcher with transactional intent seeking a nearby provider (Broder, 2002) needs a result that is correctly placed, and industry guidance on local discovery holds, consistently, that complete location information helps a business be surfaced for geographically relevant queries (Google, local-ranking guidance).
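The first step of Figure 5 can be expressed as a filter that runs before any ranking. In the Python sketch below, a hypothetical illustration rather than the behaviour of any particular search engine, listings are first filtered on the requested place and only then ordered by completeness (that ranking rule is an assumption chosen for simplicity). A listing with no city is removed at the filter, however complete it is:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Listing:
    name: str
    city: Optional[str]   # None when the listing discloses no city
    completeness: float   # fraction of fields filled, 0.0 to 1.0


def place_search(listings: list[Listing], place: str) -> list[Listing]:
    """Step 1: keep listings whose city matches the place. Step 2: rank them."""
    wanted = place.casefold()
    matches = [item for item in listings
               if item.city is not None and item.city.casefold() == wanted]
    # An unlocated listing never reaches this ranking step.
    return sorted(matches, key=lambda item: item.completeness, reverse=True)


listings = [
    Listing("Listing A", "Tampa", 0.6),
    Listing("Listing B", None, 1.0),    # fully complete, but placeless
    Listing("Listing C", "Tampa", 0.9),
]
print([item.name for item in place_search(listings, "tampa")])
```

Listing B, the most complete of the three, is absent from the result because it offers nothing for the place filter to match.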

There is, in addition, a signalling dimension. A business that states plainly where it is presents itself as locatable and accountable, and a searcher may reasonably read a disclosed, specific address as a mark of a more established business, in the manner the signalling literature describes (Spence, 1973). For both reasons, eligibility for local matching and the signal that disclosure sends, the geographic fields are not simply three fields among a listing’s many. They are the fields that admit a listing to an entire mode of discovery, and their absence closes that mode off.

The phrase “a mode of discovery” is meant precisely. Place-based searching is not a minor subset of how businesses are found; for locally consumed goods and services it is plausibly the principal mode, and a listing excluded from it is excluded from what is likely the main way its customers would look for it. To say that the geographic fields admit a listing to that mode is therefore to say that, for a great many businesses, those three fields can be the difference between being discoverable by the relevant audience and not being discoverable by it at all.

The interaction with completeness and category crowding

This study is the third analysis of a single corpus, and its finding is best understood alongside the two that precede it. The companion study of completeness measured how many of a listing’s fields carry values; this study looks at three of those fields in particular, the geographic ones, and finds that they are among the fields most often empty. Geographic coverage is, in that sense, a particular case of the completeness problem, and the two findings reinforce each other.

The first companion study, on category concentration, adds the third dimension. A business that competes against many others in a crowded category and serves a local market needs both to be complete enough to be selected and to carry a location specific enough to be matched to its locality. The three studies converge on a single composite picture: a listing that is complete, that discloses where it is, and whose owner understands how crowded its category is, is positioned to be found; a listing that is sparse, unlocated, and in a crowded category is, on all three counts, positioned not to be. The geographic finding is the third leg of that account, and it is no more separable from the other two than a location is separable from the business it locates.

The geography of an organically grown directory

The provenance of the corpus, as in the companion studies, bears on how the geographic distribution should be read. Because the directory has never used paid advertising to acquire listings, the country distribution is not the footprint of a marketing campaign that chose to target particular national markets. No budget steered listings toward the United States or away from elsewhere; the distribution is, instead, where the businesses that found the directory organically happen to be located.

This permits a reasoned supposition. It can be concluded that the geographic pattern is a reasonably faithful map of the directory’s organic reach, of the markets in which an English-language directory, operating since 2009 without paid promotion, has become known and used. The concentration in the United States and the United Kingdom is, on this reading, a measure of where the directory’s organic visibility is strongest, and the long tail of lightly represented countries is a measure of how thinly that visibility extends beyond the anglophone core.

One qualification keeps the supposition honest. The directory adds most of its entries through editorial review, and an editorial team operating in English will, in the natural course of its work, meet and add anglophone businesses more readily than others. The geographic distribution is therefore shaped both by where businesses found the directory and by where the directory’s editors looked; the term “organic” should be read, here as elsewhere in this series, as free of paid distortion rather than free of all human shaping.

Implications for businesses

For a business that holds a listing, the practical reading of this study is direct. If the business serves customers who are, in any degree, local, and most businesses do, then a listing that discloses no location forfeits the entire class of searches that begin with a place, and it forfeits them silently, because a search that never matches the listing leaves no trace the business could notice.

The remedy is, once again, inexpensive. It consists of making sure that the country, the region, and the city are present in the listing and entered accurately; the work is a matter of minutes and needs no budget. Even a business that serves a wide area, or operates online, gains from stating a base location, because doing so makes the listing eligible for regional searches and presents the business as locatable rather than placeless. The geographic fields should be regarded, alongside the contact fields identified in the companion study, as part of the foundational minimum that a listing must carry before any more elaborate effort at visibility is worthwhile.

Coverage, not distribution, is the actionable finding

It is worth distinguishing, before the directory-side implications are drawn, between the two kinds of finding this study reports, because they differ in what can be done about them. The distribution, that the located listings concentrate in the United States, the United Kingdom, and a few other countries, is informative, but it is not, for any single party, actionable. A business cannot alter the country distribution of a directory, and a directory cannot relocate the businesses it lists.

The coverage finding is different. That two-thirds of listings carry no location is a condition that can be changed, and changed by exactly the parties this study addresses. A business can add its own location to its own listing; a directory can ask for location at intake and prompt for it afterward. The coverage gap is, in other words, the part of the geographic picture that responds to action, and it is therefore the part on which the implications concentrate.

This is why the study has insisted, from its first result onward, that the coverage gap is the most consequential finding. It is consequential not only because it conditions the distribution but because it is the finding a reader can act on. The distribution describes a state of affairs; the coverage gap describes a task.

Implications for the directory

For the directory, the coverage finding is both a measurement of the asset and an agenda for improving it. A directory in which two-thirds of listings carry no location is a directory realising only a fraction of its potential value for local discovery, and the gap is large enough that closing even part of it would materially increase the directory’s usefulness for the geographically specific searches that Figure 5 describes.

Two levers are available. The first is intake: an intake process that asks clearly for a location at the moment a listing is created will produce more located listings than one that lets a placeless listing be entered and left. The second is normalisation at the point of entry: the inconsistency this study had to correct after the fact, the same country and the same state entered under several forms, could be prevented at intake by offering standardised choices rather than free text. A directory that both asks for location and records it consistently would, over time, raise the coverage figure that this study has measured and improve the quality of the geographic data behind it.
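Offering standardised choices at intake amounts to validating each entry against a controlled vocabulary. The Python sketch below is a hypothetical illustration: the short country list and the alias table are assumptions (a production list would more plausibly be drawn from a standard such as ISO 3166-1), and an unrecognised or empty entry returns `None`, so that the form can prompt for a choice instead of storing free text.

```python
from typing import Optional

# Hypothetical controlled vocabulary: canonical names plus aliases mapped to them.
CANONICAL_COUNTRIES = {"United States", "United Kingdom", "Canada", "Australia"}
COUNTRY_ALIASES = {
    "usa": "United States", "us": "United States", "u.s.a.": "United States",
    "united states of america": "United States",
    "uk": "United Kingdom", "u.k.": "United Kingdom",
}


def normalise_country(raw: str) -> Optional[str]:
    """Return the canonical country name, or None if the entry is unrecognised."""
    cleaned = " ".join(raw.split())  # trim and collapse internal whitespace
    if not cleaned:
        return None
    for name in CANONICAL_COUNTRIES:
        if name.casefold() == cleaned.casefold():
            return name
    return COUNTRY_ALIASES.get(cleaned.casefold())


print(normalise_country("USA"), normalise_country("  united   kingdom "))
```

Normalising at entry means the same country is stored in one form from the start, which removes the after-the-fact correction this study had to perform.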

There is a measurement point for the directory here as well, parallel to the one made in the companion study. A directory that does not track its own geographic coverage cannot know whether that coverage is improving, and the coverage figure reported in this study, 34.3%, is exactly the kind of measure a directory could compute routinely and watch over time. A single number, recomputed at each export and reported alongside the completeness measures of the companion study, would give the directory a standing instrument for holding its own corpus to account.
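The coverage figure is straightforward to recompute at each export. A minimal Python sketch, assuming (hypothetically) that an export can be read as a sequence of country values in which a missing or blank value means the listing carries no country; the function name is illustrative:

```python
from typing import Iterable, Optional


def geographic_coverage(countries: Iterable[Optional[str]]) -> float:
    """Percentage of listings whose country value is present and not blank."""
    values = list(countries)
    if not values:
        raise ValueError("export contains no listings")
    located = sum(1 for value in values if value is not None and value.strip())
    return round(100 * located / len(values), 1)


# Reproduces the reported figure from the published totals (4,931 of 14,362).
export = ["United States"] * 4_931 + [None] * 9_431
print(geographic_coverage(export))  # 34.3
```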

Projections and future developments

This study is a snapshot of one corpus at one reference date, and it is designed to be repeated. The projections below are reasoned conjectures drawn from the observed pattern and the mechanisms discussed; they are not statistical forecasts, and they are marked as conjectures.

The first projection is about geographic coverage. Coverage will probably rise only slowly if the directory does not change how it gathers location data, since the cause of the gap, listings created without a location and never revisited, is the same self-perpetuating pattern identified for completeness in general; and coverage could rise more quickly if the directory acts on intake and prompting. The anglophone weighting, by contrast, can be projected to persist for as long as the directory operates in English, because that weighting reflects the directory’s own reach rather than a transient condition.

The second projection is about the changing value of a disclosed location. As retrieval-augmented and answer-composing systems increasingly mediate discovery (Lewis et al., 2020; Aggarwal et al., 2024), and as such systems are asked place-based questions, the structured location data a listing carries becomes the material from which a geographically specific answer is composed. A listing with no location offers such a system nothing to place, and it may therefore be conjectured that the visibility gap between located and unlocated listings will widen as discovery becomes more automated, just as the companion study projected for completeness in general.

The third development is methodological. A future version of this study could examine geographic coverage by category and by the age of a listing, asking whether located listings concentrate in particular industries, those whose businesses are most strongly local, or in particular periods of the directory’s history. This study reports geography across the corpus as a whole and leaves that refinement to future work. Repeating the analysis against later exports would, in any case, convert this snapshot into a longitudinal record, and a rising coverage figure would be the natural measure of progress.
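The refinement proposed here, coverage broken down by category or by the period in which a listing was added, is the same computation applied within groups. A Python sketch under stated assumptions: each listing is reduced to a (group, has_country) pair, the grouping key could equally be a category or a year of addition, and the group labels in the example are hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def coverage_by_group(rows: Iterable[tuple[str, bool]]) -> dict[str, float]:
    """Return, for each group, the percentage of its listings that carry a country."""
    totals: dict[str, int] = defaultdict(int)
    located: dict[str, int] = defaultdict(int)
    for group, has_country in rows:
        totals[group] += 1
        located[group] += int(has_country)
    return {group: round(100 * located[group] / totals[group], 1) for group in totals}


sample = [("Restaurants", True), ("Restaurants", True), ("Restaurants", False),
          ("Software", True), ("Software", False), ("Software", False), ("Software", False)]
print(coverage_by_group(sample))
```

Comparing groups in this way would show whether located listings concentrate in the most strongly local industries, as the projection above anticipates.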

The projections of this study, like those of its companions, are extrapolations from an observed pattern and a named mechanism rather than quantitative forecasts, and the deliberate language of projection and conjecture marks the difference. Their purpose is to be tested. A later export will show whether coverage has risen, whether the anglophone weighting has held, and whether the gap between located and unlocated listings has widened as discovery has become more automated; the study is built to be repeated so that these expectations can be checked against evidence rather than left as assertions.

Limitations of the study

The limitations follow from the design and are stated plainly. The study analyses a single corpus, the database of one directory, and is descriptive rather than inferential; it characterises that corpus and does not generalise to directories at large, and no significance testing is applied because there is no sampling beyond the corpus itself.

The central limitation is the one made prominent throughout: geographic coverage. Only 34.3% of listings carry a country, and the geo-tagged subset is, as the discussion argued, very likely the more complete third of the corpus rather than a representative sample of it. The geographic distribution reported here therefore describes the located, fuller listings, and a reader must not extend it to the corpus as a whole. This is a genuine limitation of what the data can support, and it is also, restated, the study’s first finding.

A further interpretive limitation concerns the inference, drawn in the discussion, that the geo-tagged subset is the more complete third of the corpus. That inference rests on the companion study’s finding that address fields move together, and it is well supported, but it is an inference across two studies rather than a direct measurement within this one. A combined analysis, examining completeness and geographic coverage listing by listing in a single pass, could establish the relationship directly, and it is noted here as a natural extension of this work.
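Such a combined analysis could begin by comparing the mean completeness of located and unlocated listings. The Python sketch below assumes, hypothetically, that each listing is reduced to whether it carries a country and how many of a fixed set of fields are filled; a clear gap between the two means would support the inference directly, though on its own it would remain a description rather than a causal account.

```python
from statistics import mean
from typing import Iterable, Optional


def completeness_by_location(
    listings: Iterable[tuple[bool, int, int]],
) -> dict[str, Optional[float]]:
    """Mean completeness (filled / total fields) for located vs unlocated listings."""
    groups: dict[str, list[float]] = {"located": [], "unlocated": []}
    for has_country, filled, total in listings:
        key = "located" if has_country else "unlocated"
        groups[key].append(filled / total)
    # None marks a group with no listings, where a mean is undefined.
    return {key: (mean(values) if values else None) for key, values in groups.items()}


# Hypothetical listings: (carries a country, fields filled, fields in the schema).
sample = [(True, 7, 8), (True, 6, 8), (False, 2, 8), (False, 3, 8)]
print(completeness_by_location(sample))
```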

Several narrower limitations should also be recorded. The country and state fields were normalised, but the city field was not, so the city counts in Table 4 are indicative rather than exact, and the same place may be split across spelling variants. Where a listing has more than one address row, a single representative row was used, so a business operating in several places is recorded at one of them. And the entire analysis reflects the single reference date of 25 May 2026; a different export date would yield somewhat different figures.
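The representative-row choice can be sketched as a reduction from address rows to one row per listing. The rule below, keeping the row with the lowest address identifier, is a hypothetical stand-in, since the study does not state which rule selected the representative row, and the field names are assumptions. The example shows how a business with addresses in two countries ends up counted in only one:

```python
def one_address_per_listing(address_rows: list[dict]) -> dict[int, dict]:
    """Keep one address row per listing: the one with the lowest address_id (assumed rule)."""
    chosen: dict[int, dict] = {}
    for row in address_rows:
        current = chosen.get(row["listing_id"])
        if current is None or row["address_id"] < current["address_id"]:
            chosen[row["listing_id"]] = row
    return chosen


rows = [
    {"listing_id": 1, "address_id": 12, "country": "United Kingdom"},
    {"listing_id": 1, "address_id": 5, "country": "United States"},
    {"listing_id": 2, "address_id": 8, "country": "Canada"},
]
kept = one_address_per_listing(rows)
print({listing: row["country"] for listing, row in kept.items()})
```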

Concluding remarks

This study set out to measure where the businesses listed in a curated directory are located, and how many of them disclose a location at all. The answer to the second question frames the answer to the first. Only 4,931 of the 14,362 listings, 34.3%, carry a country; roughly two listings in three disclose no location.

Within the located minority, the distribution is concentrated and strongly anglophone. The United States holds 55.9% of geo-tagged listings and the United Kingdom 21.8%; the three largest countries hold 82.5% between them; ninety-one countries appear in all; and the six predominantly anglophone countries hold 88.3% of located listings. Within the United States, California and Florida alone account for 31.2% of listings, and London is the single most frequently recorded city. The anglophone concentration is best read, as the discussion argued, as a reflection of the directory’s own English-language reach rather than as a map of world commerce.

The most consequential finding, however, is the coverage gap itself. A listing that discloses a location can be matched to the large class of searches that begin with a place; a listing that does not is excluded from those searches at the first step, however complete it may be in every other respect. The geographic-coverage gap is, moreover, largely the same gap that the companion study measured as a deficit of listing completeness, seen from a different angle. For a business, the implication is to disclose its location, plainly and accurately, as part of the foundational minimum a listing must carry; for the directory, the implication is that intake and normalisation are the levers by which the coverage figure can be raised. This study is the third in a connected series analysing the same corpus; a later synthesis will draw its findings together with those of the companion studies into a single account of the state of the directory’s listings.

A closing reflection concerns, as in the companion studies, the standing of a study of this kind. An analysis of a directory’s own database, conducted and published by the directory, is at once authored by its subject and about its subject, and the geographic finding is in one respect uncomfortable for that subject, since it records that two-thirds of the directory’s listings carry no location. Reporting it plainly is what the descriptive design requires, and the methodology, the normalisation, and the coverage caveat have all been set out so that any reader may reproduce the figures and weigh the interpretation independently.

References

Aggarwal, P., et al. (2024). GEO: Generative engine optimization. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (pp. 5–16). Association for Computing Machinery.

Akerlof, G. A. (1970). The market for “lemons”: Quality uncertainty and the market mechanism. The Quarterly Journal of Economics, 84(3), 488–500.

Broder, A. (2002). A taxonomy of web search. ACM SIGIR Forum, 36(2), 3–10.

Google. (n.d.). Improve your local ranking on Google. Google Business Profile Help. [Industry guidance, not peer-reviewed.]

Jones, R., Zhang, W. V., Rey, B., Jhala, P., & Stipp, E. (2008). Geographic intention and modification in web search. International Journal of Geographical Information Science, 22(3), 229–246.

Lewis, P., et al. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems (Vol. 33, pp. 9459–9474).

Miller, R. B. (1968). Response time in man-computer conversational transactions. In Proceedings of the AFIPS Fall Joint Computer Conference (Vol. 33, pp. 267–277).

Nelson, P. (1970). Information and consumer behavior. Journal of Political Economy, 78(2), 311–329.

Nelson, P. (1974). Advertising as information. Journal of Political Economy, 82(4), 729–754.

Newman, M. E. J. (2005). Power laws, Pareto distributions and Zipf’s law. Contemporary Physics, 46(5), 323–351.

Pipino, L. L., Lee, Y. W., & Wang, R. Y. (2002). Data quality assessment. Communications of the ACM, 45(4), 211–218.

Pirolli, P., & Card, S. K. (1999). Information foraging. Psychological Review, 106(4), 643–675.

Spence, M. (1973). Job market signaling. The Quarterly Journal of Economics, 87(3), 355–374.

Stigler, G. J. (1961). The economics of information. Journal of Political Economy, 69(3), 213–225.

Wang, R. Y., & Strong, D. M. (1996). Beyond accuracy: What data quality means to data consumers. Journal of Management Information Systems, 12(4), 5–33.