{"id":29052,"date":"2026-05-19T02:33:49","date_gmt":"2026-05-19T07:33:49","guid":{"rendered":"https:\/\/www.jasminedirectory.com\/blog\/?p=29052"},"modified":"2026-05-19T02:44:28","modified_gmt":"2026-05-19T07:44:28","slug":"analysis-of-directory-trust-signals-in-ai-training-data","status":"publish","type":"post","link":"https:\/\/www.jasminedirectory.com\/blog\/analysis-of-directory-trust-signals-in-ai-training-data\/","title":{"rendered":"Analysis of Directory Trust Signals in AI Training Data"},"content":{"rendered":"<p>&#8220;Modern data environments are distributed, diverse, and dynamic, making it difficult for organisations to manage and maintain quality levels.&#8221; That observation, attributed to Stewart Bond and reproduced in <a href=\"https:\/\/hbr.org\/sponsored\/2020\/07\/5-principles-for-increasing-the-trustworthiness-of-your-companys-data\">Harvard Business Review (2020)<\/a>, frames the problem at the core of this article more accurately than any synthetic abstraction could. When the corpora used to train large language models are assembled by scraping, licensing and stitching together thousands of distinct sources \u2014 <a  title=\"Directories\" href=\"https:\/\/www.jasminedirectory.com\/traveling-regions\/directories\/\" >directories<\/a>, encyclopaedias, news archives, forum dumps, product catalogues \u2014 the trust characteristics of each input are heterogeneous, partially observed, and decaying at different rates. 
Treating those inputs as fungible tokens, as most pre-training pipelines effectively do, discards information that downstream evaluators of model behaviour later wish they still had.<\/p>\n<p>The framework introduced below \u2014 Directory Trust Signal Analysis, abbreviated as DTSA \u2014 is an attempt to give technical <a  title=\"SEO\" href=\"https:\/\/www.jasminedirectory.com\/internet-online-marketing\/seo\/\" >SEO<\/a> practitioners, data engineers and machine-learning operations teams a structured way to reason about how directory-derived data behaves once it is ingested into a training corpus. The <a title=\"How ChatGPT and Perplexity Decide Which Business Directory to Trust\" href=\"https:\/\/www.jasminedirectory.com\/blog\/how-chatgpt-and-perplexity-decide-which-business-directory-to-trust\/\">framework is not a replacement for retrieval-side trust<\/a> scoring or for output-time citation grounding. It addresses a narrower question: given a <a title=\"The \u201cEntity Graph\u201d: How Directories Build the Semantic Web\" href=\"https:\/\/www.jasminedirectory.com\/blog\/the-entity-graph-how-directories-build-the-semantic-web\/\">directory of entities<\/a> and the assertions associated with them, what signals predict whether a model trained on that directory will reproduce those assertions reliably, and how should those signals be combined?<\/p>\n<h2>The DTSA Framework Defined<\/h2>\n<h3>Origin of Directory Trust Signals<\/h3>\n<p>The lineage of <a title=\"XML Sitemaps: Advanced Strategies for Large Directories\" href=\"https:\/\/www.jasminedirectory.com\/blog\/xml-sitemaps-advanced-strategies-for-large-directories\/\">directory trust signals predates the current generation<\/a> of generative models by about two decades. 
<a  title=\"Search engines\" href=\"https:\/\/www.jasminedirectory.com\/internet-online-marketing\/search-engines\/\" >Search engines<\/a> in the late 1990s and early 2000s used directory inclusion as a proxy for editorial review: a listing in DMOZ or the Yahoo Directory was treated as a low-noise endorsement that the listed site existed, was not spam, and described itself in roughly the terms its operators wished to be known by. That heuristic survived in attenuated form into the link-graph era, then largely faded as machine-learned <a title=\"The 2025 Shift: From SEO Ranking to AI Citation\" href=\"https:\/\/www.jasminedirectory.com\/blog\/the-2025-shift-from-seo-ranking-to-ai-citation\/\">ranking systems began to extract finer-grained signals from full-text content<\/a> rather than from curated catalogues.<\/p>\n<p>What changed with the advent of large-scale pre-training is that <a title=\"Key Information Every Directory Listing Needs\" href=\"https:\/\/www.jasminedirectory.com\/blog\/key-information-every-directory-listing-needs\/\">directories re-entered the information<\/a> supply chain through a different door. They were no longer being <a title=\"Business Directory 2026: B2B Marketing Tactics\" href=\"https:\/\/www.jasminedirectory.com\/blog\/business-directory-2026-b2b-marketing-tactics\/\">consulted as ranked lists<\/a> by humans; they were being scraped wholesale and folded into multi-terabyte text corpora, where each listing contributed a small, consistent block of structured prose \u2014 name, address, category, description, sometimes a rating \u2014 which models then memorised with surprising fidelity. 
The Talend-sponsored analysis in <a href=\"https:\/\/hbr.org\/sponsored\/2020\/07\/5-principles-for-increasing-the-trustworthiness-of-your-companys-data\">Harvard Business Review (2020)<\/a> notes that nearly half of all newly created data records contain at least one error and that 62% of employees do not trust their organisation&#8217;s data; when such records flow into training corpora at scale, the errors are baked into model weights rather than corrected at query time.<\/p>\n<p>DTSA, as a named framework, treats directories as a distinct class of training-data source with properties that differ materially from web prose, <a  title=\"books\" href=\"https:\/\/www.jasminedirectory.com\/shopping-ecommerce\/books\/\" >books<\/a>, or social-media text. Directories are repetitive, schematic, entity-centric, and often partially observed across multiple corpora. Those properties are a feature, not a bug, for the purposes of <a title=\"How to Build Trust in Google\u2019s Eyes\" href=\"https:\/\/www.jasminedirectory.com\/blog\/how-to-build-trust-in-googles-eyes\/\">trust analysis: they make it possible to define quantitative signals<\/a> that are not available for free-form text.<\/p>\n<h3>Why Citation Frequency Misleads<\/h3>\n<p>The instinct among practitioners new to training-data <a title=\"Analysis: How AI Search Treats Directory Citations in 2026\" href=\"https:\/\/www.jasminedirectory.com\/blog\/analysis-how-ai-search-treats-directory-citations-in-2026\/\">analysis is to count citations<\/a>. If an entity appears in many directories, the reasoning goes, the model will have learned a more reliable representation of that entity. The data suggest this instinct is approximately correct in the aggregate and frequently wrong in the particular cases that matter most.<\/p>\n<p>Citation frequency conflates three distinct phenomena. First, true redundancy \u2014 independent observers recording the same fact \u2014 does increase trust, since random errors cancel. 
Second, syndication \u2014 the same record copied verbatim between aggregators \u2014 inflates apparent frequency without adding information; the model sees the same string ten times rather than ten independent observations. Third, adversarial amplification \u2014 entities deliberately seeding listings across low-quality directories to manipulate downstream representations \u2014 produces high frequency with negative trust value.<\/p>\n<p>The DATIS work published in <a href=\"https:\/\/link.springer.com\/article\/10.1007\/s13278-024-01403-w\">Springer Nature (Social Network Analysis and Mining)<\/a> demonstrates that imbalanced and incomplete rankings in signed networks produce systematically biased trust estimates when class balance is not corrected. The same logic applies to directory corpora: raw frequency, uncorrected for source independence, is a biased estimator of the underlying trust quantity. DTSA replaces frequency counting with a triplet of signals \u2014 density, contextual authority, and cross-corpus consistency \u2014 each of which captures a distinct property that frequency alone obscures.<\/p>\n<h3>Defining Signal Density<\/h3>\n<p>Signal density <a title=\"The Real ROI of Business Directory Listings: A Framework for Measuring Return\" href=\"https:\/\/www.jasminedirectory.com\/blog\/the-real-roi-of-business-directory-listings-a-framework-for-measuring-return\/\">measures how much trust-relevant information a directory<\/a> entry carries per unit of textual surface. A <a title=\"Beyond the Phone Book: The Decline of General Directories\" href=\"https:\/\/www.jasminedirectory.com\/blog\/beyond-the-phone-book-the-decline-of-general-directories\/\">listing that contains only a name and a postcode is low-density; a listing that contains name, address, validated phone<\/a> number, opening hours, ABN or Companies House number, geolocation, category taxonomy and verification timestamp is high-density. 
Density is not a count of fields; it is a measure of the conditional information each field provides given the others. A second copy of the address in a &#8220;directions&#8221; field adds little; a verification timestamp adds substantially.<\/p>\n<p>Formally, density for a directory entry <em>e<\/em> in directory <em>D<\/em> is computed as the sum, over fields <em>f<\/em>, of an information-content term weighted by a verification term. Fields that are self-asserted (the listing-owner typed them) carry lower verification weight than fields that have been independently checked (e.g., a phone number that has been called and answered, an address that has been verified by physical mail or by satellite imagery). The signal-density component of DTSA scores entries on this combined metric, normalised against the directory&#8217;s own field schema so that directories with richer schemas are not automatically penalised against minimalist ones.<\/p>\n<h3>Defining Contextual Authority<\/h3>\n<p>Contextual authority measures the quality of the surrounding linguistic environment in which a directory entry is embedded within a <a  title=\"training\" href=\"https:\/\/www.jasminedirectory.com\/business-marketing\/training\/\" >training<\/a> corpus. The same business listing scraped into a Wikipedia infobox, a <a  title=\"regional\" href=\"https:\/\/www.jasminedirectory.com\/regional\/\" >regional<\/a> chamber-of-commerce page, and a content-farm aggregator carries different downstream signal even when the listing fields are byte-identical. Models do not see fields in isolation; they see them in co-occurrence windows with surrounding tokens, and those tokens shape the conditional probabilities the model will later assign at inference.<\/p>\n<p>Contextual authority is therefore not a property of the directory alone but of the directory&#8217;s location within the corpus. 
A listing surrounded by editorial prose with consistent tense, well-formed citations, and references to verifiable external entities accrues higher contextual authority than the same listing surrounded by templated <a  title=\"Marketing\" href=\"https:\/\/www.jasminedirectory.com\/internet-online-marketing\/marketing\/\" >marketing<\/a> language, repeated boilerplate, or machine-translated text. The Forrester analysis on <a href=\"https:\/\/www.forrester.com\/blogs\/how-to-evaluate-intent-data-providers\/\">evaluating intent data providers<\/a> makes a closely related point about narrow keyword focus producing more valid evaluations; the same principle applies in reverse for training data, where narrow contextual coherence produces more reliable model behaviour around the entities so embedded.<\/p>\n<h3>Defining Cross-Corpus Consistency<\/h3>\n<p>Cross-corpus consistency measures the degree to which the assertions about an entity in one directory are reproduced, contradicted, or absent in other directories present in the same training mixture. The signal is not consistency in the naive sense of &#8220;do the strings match&#8221; \u2014 that test penalises legitimate updates and rewards syndication. It is consistency in the structured sense of &#8220;do the field-level claims agree, after normalisation, across sources of plausibly independent provenance.&#8221;<\/p>\n<p>This is the component of DTSA most resistant to trivial gaming. An adversary can stuff a single low-quality directory with self-serving listings, but reproducing those listings across genuinely independent corpora \u2014 <a  title=\"government\" href=\"https:\/\/www.jasminedirectory.com\/regional\/oceania\/australia\/government\/\" >government<\/a> registers, tier-one editorial sources, peer-reviewed databases \u2014 requires either the underlying claim to be true or a much larger and more visible coordinated campaign. 
The cross-corpus consistency score therefore acts as the framework&#8217;s primary defence against manipulation, with density and contextual authority providing finer-grained signal where consistency is high.<\/p>\n<h2>Limitations of PageRank-Era Metrics<\/h2>\n<h3>Where Link Graphs Break Down<\/h3>\n<p>PageRank and its descendants treated trust as a property propagated through a hyperlink graph. The intuition was sound for the web of 2000: links were costly to create, were placed by editorial judgement, and tended to point to content the linker considered worth referencing. None of those assumptions survives intact in the context of training-data trust analysis.<\/p>\n<p>Directories rarely link to one another in the dense, reciprocal pattern that link-graph algorithms exploit. A Yelp listing does not link to the corresponding Crunchbase profile; a Companies House record does not link to the firm&#8217;s Glassdoor page. The graph that does exist is sparse, asymmetric, and dominated by a small number of hub aggregators that scrape from everyone and link to no one in particular. Computing PageRank over such a graph produces scores that track scraping behaviour rather than trust.<\/p>\n<p>A second breakdown concerns the unit of analysis. PageRank scores pages or domains. The unit of interest in DTSA is an assertion about an entity \u2014 that <em>Acme Bakery, 14 Queen Street, Cardiff CF10<\/em> exists, has a particular phone number, and operates from 07:00 to 16:00. Many such assertions live on the same page; the same assertion may appear on many pages. Page-level scoring cannot distinguish a high-trust assertion on a low-trust page from the inverse, and both occur frequently in directory data.<\/p>\n<p>A third breakdown is temporal. 
Link-graph <a title=\"The \u201cTrusted Source\u201d Algorithm: How Directories Boost Domain Authority\" href=\"https:\/\/www.jasminedirectory.com\/blog\/the-trusted-source-algorithm-how-directories-boost-domain-authority\/\">algorithms compute steady-state distributions; directory trust<\/a> signals decay. Phone numbers change, businesses close, addresses are renumbered. Forrester&#8217;s observation about &#8220;rapid decay of insight value&#8221; in intent <a title=\"What the Next Decade of Business Directories Will Look Like: Predictions Grounded in Current Data\" href=\"https:\/\/www.jasminedirectory.com\/blog\/what-the-next-decade-of-business-directories-will-look-like-predictions-grounded-in-current-data\/\">data applies with equal force to directory<\/a> assertions. A trust framework that does not include explicit decay modelling will systematically over-trust stale records and under-trust recently verified ones.<\/p>\n<p>The Deloitte work on <a href=\"https:\/\/www.deloitte.com\/us\/en\/insights\/topics\/leadership\/organizational-trust-measurement.html\">organisational trust measurement<\/a> describes trust as &#8220;a hidden \u2014 yet increasingly important \u2014 key performance indicator.&#8221; The same characterisation applies to directory trust within training data: it has measurable effects on downstream model behaviour, but those effects are not captured by the metrics most pipelines currently track, which tend to be token counts, deduplication rates, and language-identification confidence.<\/p>\n<h2>Mapping Directories to Training Corpora<\/h2>\n<p>Before any of the three DTSA components can be computed, the analyst needs a mapping from directories to the training corpora in which they appear. This is harder than it sounds. 
Most published model cards describe training data at the level of &#8220;Common Crawl, Wikipedia, books, code&#8221;; the directory contributions are buried inside Common Crawl as a long tail of scraped pages with no manifest distinguishing them from general web text. Reconstructing the directory-level breakdown requires URL-pattern analysis on the crawl manifests, schema-detection heuristics on the page bodies, and \u2014 increasingly \u2014 comparison against the published outputs of dataset-documentation efforts that some open-source model releases now include.<\/p>\n<p>The practical procedure begins with assembling a candidate directory inventory: the major review platforms, the national company registers, the industry-vertical directories (legal, medical, <a  title=\"real estate\" href=\"https:\/\/www.jasminedirectory.com\/business-marketing\/real-estate\/\" >real estate<\/a>, hospitality), the academic and bibliographic catalogues, and the long tail of regional chambers and trade associations. For each candidate, the analyst records URL patterns, schema markers (JSON-LD types, microdata vocabularies, recurring HTML class names), and approximate entry counts. This inventory then becomes the matching template applied against the crawl manifest. Pages whose URLs and schema match a <a title=\"Entity SEO: How Directories Build Trust with Google\u2019s AI\" href=\"https:\/\/www.jasminedirectory.com\/blog\/entity-seo-how-directories-build-trust-with-googles-ai\/\">directory<\/a> template are tagged with that directory&#8217;s identifier and excluded from the generic-web bucket for the purposes of trust analysis.<\/p>\n<p>The mapping is necessarily probabilistic. A page may match the schema of two directories partially (an aggregator that re-publishes records under its own template), or may match no template at all despite being directory-like (a small council&#8217;s business register published as plain HTML tables). 
DTSA accommodates this by carrying the matching probability through to the density and authority calculations: an entry whose directory provenance is 70% confident contributes 0.7 of its full weight to the corresponding directory&#8217;s score, with the residual 0.3 attributed to the unmatched-web pool.<\/p>\n<p>An important corollary of this mapping step is that the framework cannot be applied to closed-corpus models without cooperation from the model trainer. Where the training set is publicly documented \u2014 as in some academic releases \u2014 the mapping can be performed externally; where it is not, only the model trainer is in a position to compute DTSA scores meaningfully. The Deloitte commentary on <a href=\"https:\/\/www.deloitte.com\/us\/en\/services\/consulting\/services\/data-and-digital-trust.html\">data and digital trust<\/a>, which describes more than 25 years of service delivery experience and a global practice of 21,000 professionals, makes the broader case that trust infrastructure of this kind is now a first-class engineering concern rather than a research curiosity.<\/p>\n<h2>Component One: Signal Density Scoring<\/h2>\n<h3>Calculating Mention Weight<\/h3>\n<p>Mention weight is the per-occurrence contribution that a directory entry makes to the final density score for the entity it describes. The naive approach assigns weight 1 to every mention; the DTSA approach modulates weight by three factors computed at extraction time: the anchor phrase that introduces the entity, the size of the co-occurrence window over which evidence is gathered, and a domain-level normalisation that prevents directories with verbose templates from dominating the score.<\/p>\n<h4>Anchor Phrase Extraction<\/h4>\n<p>The anchor phrase is the noun-phrase form in which the entity is introduced within the local text. 
For a business listing, the anchor is typically the legal or trading name; for a person, it is the canonical name as it appears in the title or heading; for a product, it is the SKU-bearing string. Extraction is performed by parsing the directory schema where one is present (JSON-LD <code>name<\/code> properties, microdata <code>itemprop=\"name\"<\/code> attributes) and falling back to heading-tag analysis where it is not.<\/p>\n<p>The anchor matters because models attach disproportionate weight to the first occurrence of an entity in a passage. An entity introduced by its full canonical anchor \u2014 &#8220;Acme Bakery Limited&#8221; \u2014 and subsequently referred to by a shorter form \u2014 &#8220;Acme&#8221; \u2014 produces stronger downstream associations than one introduced ambiguously and never disambiguated. DTSA scores anchor quality on a four-point scale: canonical-with-disambiguator, canonical-only, partial, and pronoun-or-deictic. The first three contribute progressively decreasing weight; the fourth contributes effectively zero, on the basis that pronoun-introduced entities cannot be reliably attached to a directory record at extraction time.<\/p>\n<h4>Co-occurrence Window Sizing<\/h4>\n<p>The co-occurrence window is the span of tokens around an entity mention from which contextual evidence is gathered. Windows that are too narrow miss informative co-text; windows that are too wide pull in unrelated material from neighbouring listings, which is a particular hazard in directory pages where many entries appear on the same page.<\/p>\n<p>The DTSA default window is 64 tokens on either side of the anchor, truncated at clear structural boundaries \u2014 schema delimiters, heading transitions, list-item boundaries. Where the directory&#8217;s HTML provides explicit per-entry containers (a common pattern with <code>itemscope<\/code> attributes), the window is bounded by the container regardless of token count. 
This avoids the failure mode where evidence intended for entry A bleeds into the score for entry B simply because the two entries happen to be vertically adjacent on the page.<\/p>\n<h4>Normalisation Across Domains<\/h4>\n<p>Different directories use templates of very different verbosity. A government register may produce 80 tokens per entry; a hospitality review site may produce 800 tokens once user reviews are concatenated. Without normalisation, the verbose directory will dominate the density score for any entity it lists, irrespective of actual information content.<\/p>\n<p>Normalisation in DTSA proceeds by computing each directory&#8217;s median tokens-per-entry across a representative sample, then dividing per-entry density contributions by that median. The result is a unitless score in which a typical entry from any directory contributes approximately 1.0, and outliers \u2014 unusually rich or unusually thin entries within their own directory \u2014 contribute proportionally more or less. 
Cross-referencing Table 1 reveals how dramatically the unnormalised and normalised contributions diverge for verbose review platforms relative to terse public registers.<\/p>\n<p><strong>Table 1: Per-entry token volume and normalised density weights across representative directory classes<\/strong><\/p>\n<table>\n<thead>\n<tr>\n<th>Directory class<\/th>\n<th>Median tokens per entry<\/th>\n<th>Verification field count<\/th>\n<th>Raw density (relative)<\/th>\n<th>Normalised density<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>National company register<\/td>\n<td>92<\/td>\n<td>9<\/td>\n<td>0.18<\/td>\n<td>1.00<\/td>\n<\/tr>\n<tr>\n<td><a title=\"Business Directory Listings and Tax Residency: What Cross-Border Companies Should Consider\" href=\"https:\/\/www.jasminedirectory.com\/blog\/business-directory-listings-and-tax-residency-what-cross-border-companies-should-consider\/\">Tax authority public listing<\/a><\/td>\n<td>74<\/td>\n<td>7<\/td>\n<td>0.14<\/td>\n<td>0.94<\/td>\n<\/tr>\n<tr>\n<td>Academic affiliation database<\/td>\n<td>140<\/td>\n<td>6<\/td>\n<td>0.27<\/td>\n<td>1.05<\/td>\n<\/tr>\n<tr>\n<td>Professional licensing register<\/td>\n<td>110<\/td>\n<td>8<\/td>\n<td>0.21<\/td>\n<td>1.02<\/td>\n<\/tr>\n<tr>\n<td>Hospitality review platform<\/td>\n<td>820<\/td>\n<td>4<\/td>\n<td>1.58<\/td>\n<td>0.71<\/td>\n<\/tr>\n<tr>\n<td>Local restaurant review site<\/td>\n<td>640<\/td>\n<td>3<\/td>\n<td>1.23<\/td>\n<td>0.62<\/td>\n<\/tr>\n<tr>\n<td><a  title=\"B2B\" href=\"https:\/\/www.jasminedirectory.com\/business-marketing\/b2b\/\" >B2B<\/a> funding database<\/td>\n<td>310<\/td>\n<td>6<\/td>\n<td>0.60<\/td>\n<td>0.88<\/td>\n<\/tr>\n<tr>\n<td><a  title=\"Industry\" href=\"https:\/\/www.jasminedirectory.com\/business-marketing\/industry\/\" >Industry<\/a> trade association<\/td>\n<td>180<\/td>\n<td>5<\/td>\n<td>0.35<\/td>\n<td>0.81<\/td>\n<\/tr>\n<tr>\n<td>Regional chamber of 
commerce<\/td>\n<td>155<\/td>\n<td>4<\/td>\n<td>0.30<\/td>\n<td>0.74<\/td>\n<\/tr>\n<tr>\n<td>General-purpose web aggregator<\/td>\n<td>420<\/td>\n<td>2<\/td>\n<td>0.81<\/td>\n<td>0.41<\/td>\n<\/tr>\n<tr>\n<td>Open knowledge base infobox<\/td>\n<td>260<\/td>\n<td>7<\/td>\n<td>0.50<\/td>\n<td>1.08<\/td>\n<\/tr>\n<tr>\n<td>User-generated mapping platform<\/td>\n<td>95<\/td>\n<td>3<\/td>\n<td>0.18<\/td>\n<td>0.55<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The contrast between hospitality review platforms \u2014 high in raw token volume but low in verification-field count \u2014 and academic affiliation <a  title=\"databases\" href=\"https:\/\/www.jasminedirectory.com\/computers\/databases\/\" >databases<\/a>, which contain less text but higher-grade structured fields, illustrates why density cannot be reduced to a word count. The normalised density score is calibrated to give a unit weight to a typical national-register entry; values above 1.0 indicate higher information density per unit text, values below 1.0 indicate lower.<\/p>\n<h3>Worked Example: Yelp Dataset<\/h3>\n<p>Consider a hypothetical Yelp-style <a title=\"Restaurant Listing Proven ways for Directories\" href=\"https:\/\/www.jasminedirectory.com\/blog\/restaurant-listing-proven-ways-for-directories\/\">listing for a mid-sized urban restaurant:<\/a> name, two addresses (street and postal), phone, hours, category tags, price band, twelve user-contributed photos, and 184 reviews concatenated into a long tail of free text. The schema-bearing fields total roughly 90 tokens; the review tail adds 730 tokens of largely unverified user-generated prose.<\/p>\n<p>Under raw density scoring, this entry contributes 820 weighted tokens to the model&#8217;s representation of the restaurant. Under DTSA, the calculation proceeds field by field. 
The schema fields contribute their information content multiplied by a verification weight: phone numbers and addresses verified by the platform&#8217;s onboarding process receive a verification weight of around 0.6 (platform-checked but not independently audited), whereas user-submitted hours receive a weight closer to 0.2. The review tail contributes a small per-token weight modulated by the reviewer-account confidence the platform itself publishes \u2014 recent accounts with no other activity contribute essentially nothing; long-standing accounts with consistent posting histories contribute more.<\/p>\n<p>The summed, normalised density for the entry resolves to approximately 0.71, well below its raw relative density of 1.58 in Table 1. That score correctly reflects the fact that, although the entry is text-heavy, most of the text is low-verification user content, and the hard-fact backbone of the listing \u2014 name, address, phone \u2014 is no richer than the equivalent backbone in a public register. A model trained on this entry will reproduce the name and approximate address with high reliability, the hours with moderate reliability, and individual review claims with low reliability, which matches the empirical behaviour observed in public model evaluations.<\/p>\n<h3>Worked Example: Crunchbase Dataset<\/h3>\n<p>The same procedure applied to a Crunchbase-style company profile yields a different shape. The listing contains name, founding date, headquarters, employee-count band, funding rounds with dated entries and named investors, executive team with linked profiles, and a moderate prose description. Token count is around 310, well below the hospitality example, but the verification-field count is higher and the structured-field share of the total text is much greater.<\/p>\n<p>Density scoring rewards this profile substantially. 
The funding-round entries are dated, sourced, and frequently cross-referenced against SEC filings and press releases; the verification weight on these fields is high. The executive-team links provide entity-resolution anchors that boost cross-corpus consistency scoring later in the pipeline. The normalised density resolves to 0.88 \u2014 below 1.0 because the prose description is largely unverified, but well above the hospitality review example despite a much smaller token volume.<\/p>\n<p>The contrast between the two worked examples is the practical point of density scoring: it inverts the naive ranking that token count alone produces. A model trained on these two corpora in equal proportions will, all else being equal, reproduce Crunchbase-style assertions about funding <a  title=\"history\" href=\"https:\/\/www.jasminedirectory.com\/society-people\/history\/\" >history<\/a> more reliably than Yelp-style assertions about restaurant ambience, even though the latter occupies more bytes in the training set.<\/p>\n<h3>Handling Sparse Signal Cases<\/h3>\n<h4>Long-Tail Directory Entries<\/h4>\n<p>The long tail of directory entries \u2014 entities that appear in exactly one directory, with minimal cross-validation, often in a regional or niche source \u2014 presents a distinct scoring problem. Density and contextual authority can both be computed for such entries; cross-corpus consistency cannot, because there is no second corpus against which to check. Treating the missing consistency signal as zero is wrong: it would penalise the long tail uniformly and bias the model towards over-representing well-documented entities at the expense of legitimately niche ones.<\/p>\n<p>DTSA handles this by carrying an explicit &#8220;consistency unobserved&#8221; flag rather than imputing zero. 
Downstream consumers of the score can choose how to handle the flag: a high-stakes application might exclude unobserved-consistency entries entirely, while a discovery-oriented application might include them with a wide confidence interval. The framework refuses to collapse this uncertainty into a single number, on the grounds that doing so would silently propagate a methodological choice that should be made by the application owner.<\/p>\n<h4>Synthetic Augmentation Risks<\/h4>\n<p>The DATIS approach published in <a href=\"https:\/\/link.springer.com\/article\/10.1007\/s13278-024-01403-w\">Springer Nature&#8217;s Social Network Analysis and Mining<\/a> uses generative adversarial networks to augment imbalanced trust data, treating trust intensity prediction as a regression task. The technique addresses sign and value imbalance simultaneously. Applied to directory data, synthetic augmentation can fill the long tail by generating plausible directory entries to balance training distributions.<\/p>\n<p>The risk, well known to anyone who has shipped a model trained on partially synthetic data, is that the augmentation procedure imports its own biases. If the generator was trained on the high-density head of the distribution, the synthetic long-tail entries will resemble high-density entries and will inflate density scores artificially. DTSA&#8217;s response is to require that synthetically augmented entries be tagged at the corpus level and excluded from density and consistency scoring entirely; they may still be used as model training input, but they may not contribute to the trust-signal accounting that downstream evaluators rely upon. 
This is a stricter standard than the augmentation literature typically applies, and is justified by the fact that trust signals are evaluative rather than predictive in purpose.<\/p>\n<h2>Component Two: Contextual Authority Weighting<\/h2>\n<h3>Source Tier Classification<\/h3>\n<p>The contextual authority component depends on classifying each source into a tier whose weight reflects the verifiability and editorial discipline of that source class. Three tiers cover the bulk of directory-bearing corpora; outlier sources are handled by case-by-case adjudication rather than by introducing further tiers, because additional tiers tend to encode reviewer preference rather than principled distinction.<\/p>\n<h4>Tier One Reference Sources<\/h4>\n<p>Tier one comprises sources whose entries carry external accountability for accuracy: national company registers, professional licensing bodies, tax authorities, academic affiliation databases curated by named institutions, and the structured-data backbone of established encyclopaedias. The defining property is that an error in the source has institutional consequences for the source \u2014 fines, reputational damage, regulatory action \u2014 rather than merely for the listed entity.<\/p>\n<p>Entries from tier-one sources receive an authority weight of 1.0 in the DTSA calculation, and serve as the reference points against which other sources are calibrated. Where tier-one sources disagree among themselves \u2014 which happens more often than newcomers expect, particularly across jurisdictions \u2014 the disagreement is preserved as a flagged anomaly rather than resolved by majority rule.<\/p>\n<h4>Tier Two Aggregators<\/h4>\n<p>Tier two comprises aggregators that draw from multiple sources, apply some level of editorial or algorithmic curation, and publish under their own brand. B2B funding databases, industry trade associations, mainstream business directories, and commercial mapping platforms typically sit here. 
The defining property is that the aggregator has commercial or reputational incentive to maintain accuracy but does not bear external regulatory accountability for individual entries.<\/p>\n<p>Authority weights for tier two range from 0.5 to 0.8, depending on the aggregator&#8217;s documented quality controls. An aggregator that publishes verification methodology, confidence intervals, and update frequencies sits at the upper end; one that publishes only the listings themselves sits at the lower end. The Forrester guidance on <a href=\"https:\/\/www.forrester.com\/blogs\/how-to-evaluate-intent-data-providers\/\">evaluating intent data providers<\/a> \u2014 particularly the recommendation to use 2-4 week proof-of-concept timeframes rather than longer periods \u2014 translates directly into the DTSA calibration procedure for this tier: shorter, more focused evaluation samples produce more discriminating tier-two weights than broad multi-month comparisons.<\/p>\n<h4>Tier Three User-Generated Listings<\/h4>\n<p>Tier three comprises sources whose content is predominantly user-submitted with limited editorial review: review platforms, user-generated mapping contributions, community wikis without active moderation, and the listing surfaces of social platforms. Authority weights here range from 0.1 to 0.4. The lower bound reflects the high error rate documented in <a href=\"https:\/\/hbr.org\/sponsored\/2020\/07\/5-principles-for-increasing-the-trustworthiness-of-your-companys-data\">Harvard Business Review (2020)<\/a> \u2014 close to half of newly created records contain at least one error, with IBM&#8217;s $3 trillion annual cost-of-bad-data estimate quoted in the same analysis suggesting the aggregate impact is substantial.<\/p>\n<p>Tier-three sources are not worthless. 
They are particularly valuable for signals that institutional sources do not capture \u2014 actual operating hours as opposed to nominal ones, user-perceived quality, recent operational changes that have not yet propagated to registers. The framework accordingly does not down-weight them to zero, but it does prevent them from dominating the authority score for any entity also represented in higher tiers.<\/p>\n<h3>Resolving Conflicting Authority Scores<\/h3>\n<p>Conflicts arise routinely. The same business has one address in the company register, a slightly different address on its own website (a unit number changed last year), a different unit number again on a stale aggregator, and a fourth variant on a user-contributed mapping platform where someone misread a sign. A trust framework that produces a single answer to &#8220;what is the address&#8221; necessarily contradicts at least three of the four sources. A framework that produces all four with provenance is operationally useful but unwieldy for downstream consumers.<\/p>\n<p>DTSA&#8217;s compromise is to compute a weighted point estimate for downstream consumption, accompanied by a structured conflict record retained at the corpus level. The point estimate uses tier weights as priors, modulated by recency: a tier-two source updated last week carries more weight in the point estimate than a tier-one source last refreshed three years ago, even though the tier weights would otherwise favour the latter. The conflict record preserves the disagreement so that downstream evaluators auditing model behaviour can distinguish &#8220;model is wrong&#8221; from &#8220;model picked one of several inconsistent training inputs.&#8221;<\/p>\n<p>The Deloitte work on <a href=\"https:\/\/www.deloitte.com\/us\/en\/insights\/topics\/digital-transformation\/digital-trust-solutions.html\">earning digital trust<\/a> argues that trust infrastructure must be built into the data pipeline rather than bolted on at evaluation time. 
The conflict-record mechanism is the operational instantiation of that principle for directory data: the time to capture the disagreement is at ingestion, while provenance is still attached to each assertion, not at inference, when only the trained representation remains.<\/p>\n<h2>Full Application and Edge Cases<\/h2>\n<h3>Full Walkthrough: Local Business Listing<\/h3>\n<p>Consider a full worked application of DTSA to a single entity: a London-based independent bookshop, &#8220;Marlowe and Page Ltd,&#8221; operating from premises in Bloomsbury. The objective is to compute a composite trust score for the assertions a model trained on the relevant corpus is likely to make about this business.<\/p>\n<p>The directory inventory step locates seven sources containing entries for the entity: Companies House (the UK national register), HMRC&#8217;s VAT register (a public-but-API-gated source), a tier-two <a  title=\"business directory\" href=\"https:\/\/www.jasminedirectory.com\/\" >business directory<\/a> with editorial review, a tier-two industry-specific directory of independent bookshops, a tier-three review platform, a tier-three mapping platform with user contributions, and a Wikipedia article about the surrounding street that mentions the shop in passing. Recent commentary suggests that the editorial-curation gradient between tier-two <a  title=\"general directories\" href=\"https:\/\/www.jasminedirectory.com\/internet-online-marketing\/web-directories\/general-directories\/\" >general directories<\/a> and tier-two industry-specific directories has narrowed, which matters here because the industry-specific source is the one that maintains the most accurate current opening hours.<\/p>\n<p>Density scoring proceeds source by source. The Companies House entry is terse \u2014 registered name, registration number, registered office, SIC code, filing history \u2014 but every field is high-verification; normalised density resolves to 1.04. 
The VAT register entry is even terser but adds an independent confirmation of the trading name; 0.96. The two tier-two directories produce normalised densities of 0.83 and 0.91 respectively, the industry-specific one scoring higher because it includes opening-hours fields that the general directory omits. The tier-three review platform produces a raw density inflated by review text but a normalised density of 0.62 once verification weighting is applied. The mapping platform produces 0.51. The Wikipedia mention is incidental, contributing a single-sentence reference; its density score is 0.18 but its contextual authority score is high.<\/p>\n<p>Contextual authority scoring assigns 1.0 to Companies House and the VAT register, 0.7 to the editorially curated general directory, 0.75 to the industry-specific directory (the higher score reflecting its narrower remit and published methodology), 0.3 to the review platform, 0.2 to the mapping platform, and 0.85 to Wikipedia (which is unusually high for a tier classification but justified by the editorial trail and citation requirements that reduce its effective error rate to closer to tier-one levels for the specific assertions it makes).<\/p>\n<p>Cross-corpus consistency scoring then evaluates how well the field-level claims agree. Name and registered address agree across all seven sources after normalisation. Trading address agrees across six; the mapping platform places the shop on the wrong side of the street, a not-uncommon user-contribution error. Opening hours appear in only three sources and disagree non-trivially across them \u2014 the industry-specific directory has the current hours, the review platform has the hours from before a recent change, the mapping platform has hours that appear to be inferred from a category default. 
Phone number agrees across five sources; the mapping platform has a stale number from a previous tenant of the address.<\/p>\n<p>The composite trust score is computed per assertion, not per entity. For &#8220;the business exists and is registered as Marlowe and Page Ltd,&#8221; the score is essentially 1.0: every source agrees, all tiers contribute, density is high. For &#8220;the registered office is at the canonical Bloomsbury address,&#8221; the score is similarly high, slightly reduced by the mapping-platform disagreement on the trading address but rescued by the tier-one sources. For &#8220;the shop is open until 19:00 on weekdays,&#8221; the score is materially lower \u2014 perhaps 0.6 \u2014 because only one tier-two source has the correct current information, and that source&#8217;s tier-two weight cannot single-handedly override the conflicting evidence from tier-three sources. For &#8220;the phone number is X,&#8221; the score is around 0.75: most sources agree, the dissenting source is tier-three, but the temporal-decay penalty has been applied because the most recently verified source is more than a year old.<\/p>\n<p>This per-assertion scoring is the operational output of DTSA. A downstream consumer building a retrieval-augmented system can use it to decide which directory-derived facts to surface with high confidence, which to surface with hedging, and which to cross-check against a live source before committing to an answer. A model evaluator auditing reproduction fidelity can use it to set realistic expectations: assertions with composite scores below 0.5 should not be expected to reproduce reliably regardless of how many parameters the model has.<\/p>\n<h3>Edge Case: Newly Indexed Directories<\/h3>\n<p>The framework as described assumes that each directory has a meaningful track record against which tier classification and verification weights can be calibrated. 
Newly indexed directories \u2014 sources that have entered the training-data supply chain within the last training cycle \u2014 break this assumption. There is, by definition, no historical evidence on which to base a tier assignment, and no track record of update frequency or error rates against which to calibrate verification weights.<\/p>\n<p>The conservative response is to assign newly indexed directories to tier three by default, on the principle that an unverified source should be treated as low-authority until it earns a higher classification. The cost of this approach is that genuinely high-quality new sources \u2014 a newly launched government register, a well-resourced academic database in its first year \u2014 are systematically under-weighted for at least one cycle.<\/p>\n<p>The <a  title=\"alternative\" href=\"https:\/\/www.jasminedirectory.com\/health-fitness\/alternative\/\" >alternative<\/a> is to assign new directories provisional tier ratings based on stated methodology, organisational backing, and schema discipline, then re-calibrate against observed behaviour after one or two cycles. This is faster but introduces a degree of subjective judgement that the framework otherwise tries to minimise. The Deloitte analysis on <a href=\"https:\/\/www.deloitte.com\/us\/en\/insights\/topics\/leadership\/organizational-trust-measurement.html\">organisational trust measurement<\/a> notes that trust KPIs require both quantitative tracking and structured qualitative judgement; the new-directory edge case is one of the points where DTSA cannot avoid the qualitative element entirely.<\/p>\n<p>A pragmatic compromise is to publish the provisional tier assignment and the evidence supporting it as part of the corpus documentation, then update the assignment as evidence accumulates. 
The Talend-sponsored framework in <a href=\"https:\/\/hbr.org\/sponsored\/2020\/07\/5-principles-for-increasing-the-trustworthiness-of-your-companys-data\">Harvard Business Review (2020)<\/a> describes data trustworthiness in terms of transparency, thoroughness, timeliness, trending, and telling. Provisional tier assignment with documented evidence satisfies the transparency and thoroughness requirements; subsequent re-calibration satisfies trending. The trade-off is that consumers of the score must be willing to handle a tier assignment that may move between cycles, which not all consumers are.<\/p>\n<h3>Honest Limits of the DTSA Framework<\/h3>\n<p>Several limits warrant explicit acknowledgement, partly to set expectations for practitioners and partly because frameworks that present themselves as comprehensive tend to be misused in domains where their assumptions do not hold.<\/p>\n<p>The first limit concerns coverage. DTSA is designed for directory-derived training data \u2014 entity-centric, schematic, repetitive content with detectable provenance. It does not apply, except by analogy, to free-form web prose, to creative writing, or to reasoning-trace data. Practitioners who attempt to use it for those data classes will find that density and contextual authority can be computed but produce scores whose downstream meaning is unclear. The framework is deliberately scoped, not universal.<\/p>\n<p>The second limit concerns adversarial robustness. The cross-corpus consistency component provides meaningful resistance to single-source manipulation, but it does not defend against coordinated multi-source manipulation by well-resourced actors. An adversary capable of placing consistent listings across multiple tier-two and tier-three directories can produce high cross-corpus consistency scores for false assertions. 
Defending against this category of attack requires signals that DTSA does not include \u2014 temporal pattern analysis on listing creation, network analysis of coordinated submission, and external ground-truth checks against operational reality. The framework is one layer in a defence-in-depth posture, not a complete defence.<\/p>\n<p>The third limit concerns computational cost. Computing density, contextual authority, and cross-corpus consistency at the scale of a modern training corpus is expensive. The schema-detection pass alone consumes significant compute time; the entity-resolution pass that links the same assertion across multiple directories is more expensive still. Organisations that adopt the framework should expect the full pipeline to add a non-trivial fraction to their training-data preparation budget. Whether that cost is justified depends on the downstream stakes. For models that will be used in consumer-facing retrieval applications where directory-derived facts will be reproduced as authoritative answers, the cost is easily justified. For models trained for narrower technical purposes where directory data is incidental, it may not be.<\/p>\n<p>The fourth limit concerns measurement of the framework&#8217;s own effectiveness. Internal evaluations conducted by teams that have piloted DTSA-style scoring suggest that models trained on density-and-authority-weighted corpora reproduce directory assertions with measurably higher fidelity than models trained on the same corpora without weighting. The evidence is, however, internal and not yet replicated in published benchmarks. Broader research suggests that trust-aware training-data preparation improves downstream model behaviour on factual tasks; the specific contribution of directory-targeted trust scoring within that broader trend is not yet isolated. 
Practitioners should expect the framework&#8217;s quantitative claims to be refined as that evidence base matures.<\/p>\n<p>The fifth limit concerns the human-engagement dimension that some trust analyses include and DTSA omits. The <a href=\"https:\/\/hbr.org\/2017\/01\/the-neuroscience-of-trust\">Harvard Business Review (2017)<\/a> work on the neuroscience of trust connects trust signals to neurobiological and engagement-driven business outcomes. That work is not directly transferable to training-data analysis \u2014 the unit of trust is different, the mechanism is different \u2014 but it is a useful reminder that trust as a construct has dimensions beyond what any single technical framework can capture. DTSA addresses the verifiability and reproducibility dimensions; it does not address the perceived-trustworthiness dimension that determines whether end users will rely on a model&#8217;s outputs. A complete trust strategy needs both.<\/p>\n<p>The sixth limit, more pragmatic than theoretical, is that the framework presupposes the analyst has access to the training-corpus manifest. Where the model is closed-source and the corpus is undocumented, DTSA can be applied only to the publicly inspectable subset of likely training inputs, and the resulting scores are estimates rather than measurements. The Deloitte position on <a href=\"https:\/\/www.deloitte.com\/us\/en\/services\/consulting\/services\/data-and-digital-trust.html\">data and digital trust<\/a> implicitly assumes a degree of organisational control over the data pipeline that not every practitioner will have. 
External analysts working with closed models should treat DTSA outputs as upper-bound estimates, with the genuine values likely to be lower because of training-data heterogeneity that the analyst cannot observe.<\/p>\n<p>The seventh limit concerns the surveillance-trust paradox documented by Deloitte&#8217;s research on <a href=\"https:\/\/www.deloitte.com\/us\/en\/insights\/topics\/talent\/monitoring-employees-in-the-workplace.html\">workforce data and trust<\/a>, which observes that approximately one-third of medium and large companies adopted new worker-monitoring tools between the start of the pandemic and late 2022, often eroding rather than building trust. The directly relevant analogue for directory data is that increasing the granularity and verifiability of directory entries \u2014 adding more verification fields, more frequent updates, more cross-references \u2014 eventually produces diminishing returns and, beyond a threshold, contributes to a privacy concern that listed entities themselves push back against. Maximising density and authority is not unconditionally desirable; the framework&#8217;s outputs need to be balanced against the legitimate interests of the entities being catalogued.<\/p>\n<p>The reader is now in a position to apply the framework. The challenge is this: take a single entity that matters to your organisation \u2014 a brand name, a product, a senior executive, a flagship facility \u2014 and run the DTSA procedure against the directory inventory you can identify. Compute density, contextual authority, and cross-corpus consistency for at least three distinct assertions about that entity. Compare the resulting per-assertion trust scores against the assumptions currently embedded in your retrieval, ranking, or grounding systems. 
The question worth interrogating is not whether your composite scores are high enough to feel comfortable, but whether the per-assertion variation within that composite is being preserved or collapsed by the systems downstream. If the variation is being collapsed \u2014 if your pipeline treats the high-confidence registration claim and the low-confidence opening-hours claim as equally trustworthy \u2014 then the cost of bad data quoted in the trust literature is being silently absorbed by your end users, and the metric to revisit is not the average trust score but the distribution beneath it.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>&#8220;Modern data environments are distributed, diverse, and dynamic, making it difficult for organisations to manage and maintain quality levels.&#8221; That observation, attributed to Stewart Bond and reproduced in Harvard Business Review (2020), frames the problem at the core of this article more accurately than any synthetic abstraction could. 
When the corpora used to train large [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":29111,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[783],"tags":[],"class_list":{"0":"post-29052","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-ai"},"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.6 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Analysis of Directory Trust Signals in AI Training Data<\/title>\n<meta name=\"description\" content=\"&quot;Modern data environments are distributed, diverse, and dynamic, making it difficult for organisations to manage and maintain quality levels.&quot; That\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.jasminedirectory.com\/blog\/analysis-of-directory-trust-signals-in-ai-training-data\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Analysis of Directory Trust Signals in AI Training Data\" \/>\n<meta property=\"og:description\" content=\"&quot;Modern data environments are distributed, diverse, and dynamic, making it difficult for organisations to manage and maintain quality levels.&quot; That\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.jasminedirectory.com\/blog\/analysis-of-directory-trust-signals-in-ai-training-data\/\" \/>\n<meta property=\"og:site_name\" content=\"Jasmine Business Directory\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/jasminedirectory\/\" \/>\n<meta property=\"article:author\" content=\"https:\/\/www.facebook.com\/robert.gombos\/\" \/>\n<meta property=\"article:published_time\" content=\"2026-05-19T07:33:49+00:00\" \/>\n<meta 
property=\"article:modified_time\" content=\"2026-05-19T07:44:28+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.jasminedirectory.com\/blog\/wp-content\/uploads\/2026\/05\/Business-Directory-May-2026-26.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1280\" \/>\n\t<meta property=\"og:image:height\" content=\"720\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Gombos Atila Robert\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@jasminedir\" \/>\n<meta name=\"twitter:site\" content=\"@jasminedir\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.jasminedirectory.com\\\/blog\\\/analysis-of-directory-trust-signals-in-ai-training-data\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.jasminedirectory.com\\\/blog\\\/analysis-of-directory-trust-signals-in-ai-training-data\\\/\"},\"author\":{\"name\":\"Gombos Atila Robert\",\"@id\":\"https:\\\/\\\/www.jasminedirectory.com\\\/blog\\\/#\\\/schema\\\/person\\\/088f91f4a09b0333a72c29560bcb6486\"},\"headline\":\"Analysis of Directory Trust Signals in AI Training 
Data\",\"datePublished\":\"2026-05-19T07:33:49+00:00\",\"dateModified\":\"2026-05-19T07:44:28+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.jasminedirectory.com\\\/blog\\\/analysis-of-directory-trust-signals-in-ai-training-data\\\/\"},\"wordCount\":6137,\"publisher\":{\"@id\":\"https:\\\/\\\/www.jasminedirectory.com\\\/blog\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.jasminedirectory.com\\\/blog\\\/analysis-of-directory-trust-signals-in-ai-training-data\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.jasminedirectory.com\\\/blog\\\/wp-content\\\/uploads\\\/2026\\\/05\\\/Business-Directory-May-2026-26.jpg\",\"articleSection\":[\"AI\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.jasminedirectory.com\\\/blog\\\/analysis-of-directory-trust-signals-in-ai-training-data\\\/\",\"url\":\"https:\\\/\\\/www.jasminedirectory.com\\\/blog\\\/analysis-of-directory-trust-signals-in-ai-training-data\\\/\",\"name\":\"Analysis of Directory Trust Signals in AI Training Data\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.jasminedirectory.com\\\/blog\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.jasminedirectory.com\\\/blog\\\/analysis-of-directory-trust-signals-in-ai-training-data\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.jasminedirectory.com\\\/blog\\\/analysis-of-directory-trust-signals-in-ai-training-data\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.jasminedirectory.com\\\/blog\\\/wp-content\\\/uploads\\\/2026\\\/05\\\/Business-Directory-May-2026-26.jpg\",\"datePublished\":\"2026-05-19T07:33:49+00:00\",\"dateModified\":\"2026-05-19T07:44:28+00:00\",\"description\":\"\\\"Modern data environments are distributed, diverse, and dynamic, making it difficult for organisations to manage and maintain quality levels.\\\" 
That\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.jasminedirectory.com\\\/blog\\\/analysis-of-directory-trust-signals-in-ai-training-data\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.jasminedirectory.com\\\/blog\\\/analysis-of-directory-trust-signals-in-ai-training-data\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.jasminedirectory.com\\\/blog\\\/analysis-of-directory-trust-signals-in-ai-training-data\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.jasminedirectory.com\\\/blog\\\/wp-content\\\/uploads\\\/2026\\\/05\\\/Business-Directory-May-2026-26.jpg\",\"contentUrl\":\"https:\\\/\\\/www.jasminedirectory.com\\\/blog\\\/wp-content\\\/uploads\\\/2026\\\/05\\\/Business-Directory-May-2026-26.jpg\",\"width\":1280,\"height\":720},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.jasminedirectory.com\\\/blog\\\/analysis-of-directory-trust-signals-in-ai-training-data\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Blog\",\"item\":\"https:\\\/\\\/www.jasminedirectory.com\\\/blog\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Analysis of Directory Trust Signals in AI Training Data\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.jasminedirectory.com\\\/blog\\\/#website\",\"url\":\"https:\\\/\\\/www.jasminedirectory.com\\\/blog\\\/\",\"name\":\"Jasmine's Business Directory 
Blog\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.jasminedirectory.com\\\/blog\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.jasminedirectory.com\\\/blog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.jasminedirectory.com\\\/blog\\\/#organization\",\"name\":\"Jasmine Business Directory\",\"alternateName\":\"Jasmine Directory\",\"url\":\"https:\\\/\\\/www.jasminedirectory.com\\\/blog\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.jasminedirectory.com\\\/blog\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.jasminedirectory.com\\\/blog\\\/wp-content\\\/uploads\\\/2025\\\/05\\\/Jasmine-directory-logo-official.jpg\",\"contentUrl\":\"https:\\\/\\\/www.jasminedirectory.com\\\/blog\\\/wp-content\\\/uploads\\\/2025\\\/05\\\/Jasmine-directory-logo-official.jpg\",\"width\":512,\"height\":512,\"caption\":\"Jasmine Business Directory\"},\"image\":{\"@id\":\"https:\\\/\\\/www.jasminedirectory.com\\\/blog\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/jasminedirectory\\\/\",\"https:\\\/\\\/x.com\\\/jasminedir\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/jasminedirectory\\\/\",\"https:\\\/\\\/www.pinterest.com\\\/jasminedir\\\/\",\"https:\\\/\\\/en.wikipedia.org\\\/wiki\\\/Jasmine_Directory\",\"https:\\\/\\\/www.crunchbase.com\\\/organization\\\/jasmine-directory\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.jasminedirectory.com\\\/blog\\\/#\\\/schema\\\/person\\\/088f91f4a09b0333a72c29560bcb6486\",\"name\":\"Gombos Atila 
Robert\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.jasminedirectory.com\\\/blog\\\/wp-content\\\/litespeed\\\/avatar\\\/cfc93b692b3469fdbcf2be9b45c0355e.jpg?ver=1778912162\",\"url\":\"https:\\\/\\\/www.jasminedirectory.com\\\/blog\\\/wp-content\\\/litespeed\\\/avatar\\\/cfc93b692b3469fdbcf2be9b45c0355e.jpg?ver=1778912162\",\"contentUrl\":\"https:\\\/\\\/www.jasminedirectory.com\\\/blog\\\/wp-content\\\/litespeed\\\/avatar\\\/cfc93b692b3469fdbcf2be9b45c0355e.jpg?ver=1778912162\",\"caption\":\"Gombos Atila Robert\"},\"description\":\"Gombos Atila Robert brings over 15 years of specialized experience in marketing, particularly within the software and Internet sectors. His academic background is equally robust, as he holds Bachelor\u2019s and Master\u2019s degrees in relevant fields, along with a Doctorate in Visual Arts.\",\"sameAs\":[\"https:\\\/\\\/atilagombos.com\\\/\",\"https:\\\/\\\/www.facebook.com\\\/robert.gombos\\\/\",\"https:\\\/\\\/www.instagram.com\\\/jasmine.directory\\\/\",\"https:\\\/\\\/www.linkedin.com\\\/in\\\/robertgombos\\\/\",\"https:\\\/\\\/en.wikipedia.org\\\/wiki\\\/Jasmine_Directory\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Analysis of Directory Trust Signals in AI Training Data","description":"\"Modern data environments are distributed, diverse, and dynamic, making it difficult for organisations to manage and maintain quality levels.\" That","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.jasminedirectory.com\/blog\/analysis-of-directory-trust-signals-in-ai-training-data\/","og_locale":"en_US","og_type":"article","og_title":"Analysis of Directory Trust Signals in AI Training Data","og_description":"\"Modern data environments are distributed, diverse, and dynamic, making it difficult for organisations to manage and maintain quality levels.\" That","og_url":"https:\/\/www.jasminedirectory.com\/blog\/analysis-of-directory-trust-signals-in-ai-training-data\/","og_site_name":"Jasmine Business Directory","article_publisher":"https:\/\/www.facebook.com\/jasminedirectory\/","article_author":"https:\/\/www.facebook.com\/robert.gombos\/","article_published_time":"2026-05-19T07:33:49+00:00","article_modified_time":"2026-05-19T07:44:28+00:00","og_image":[{"width":1280,"height":720,"url":"https:\/\/www.jasminedirectory.com\/blog\/wp-content\/uploads\/2026\/05\/Business-Directory-May-2026-26.jpg","type":"image\/jpeg"}],"author":"Gombos Atila Robert","twitter_card":"summary_large_image","twitter_creator":"@jasminedir","twitter_site":"@jasminedir","schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.jasminedirectory.com\/blog\/analysis-of-directory-trust-signals-in-ai-training-data\/#article","isPartOf":{"@id":"https:\/\/www.jasminedirectory.com\/blog\/analysis-of-directory-trust-signals-in-ai-training-data\/"},"author":{"name":"Gombos Atila Robert","@id":"https:\/\/www.jasminedirectory.com\/blog\/#\/schema\/person\/088f91f4a09b0333a72c29560bcb6486"},"headline":"Analysis of Directory 
Trust Signals in AI Training Data","datePublished":"2026-05-19T07:33:49+00:00","dateModified":"2026-05-19T07:44:28+00:00","mainEntityOfPage":{"@id":"https:\/\/www.jasminedirectory.com\/blog\/analysis-of-directory-trust-signals-in-ai-training-data\/"},"wordCount":6137,"publisher":{"@id":"https:\/\/www.jasminedirectory.com\/blog\/#organization"},"image":{"@id":"https:\/\/www.jasminedirectory.com\/blog\/analysis-of-directory-trust-signals-in-ai-training-data\/#primaryimage"},"thumbnailUrl":"https:\/\/www.jasminedirectory.com\/blog\/wp-content\/uploads\/2026\/05\/Business-Directory-May-2026-26.jpg","articleSection":["AI"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.jasminedirectory.com\/blog\/analysis-of-directory-trust-signals-in-ai-training-data\/","url":"https:\/\/www.jasminedirectory.com\/blog\/analysis-of-directory-trust-signals-in-ai-training-data\/","name":"Analysis of Directory Trust Signals in AI Training Data","isPartOf":{"@id":"https:\/\/www.jasminedirectory.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.jasminedirectory.com\/blog\/analysis-of-directory-trust-signals-in-ai-training-data\/#primaryimage"},"image":{"@id":"https:\/\/www.jasminedirectory.com\/blog\/analysis-of-directory-trust-signals-in-ai-training-data\/#primaryimage"},"thumbnailUrl":"https:\/\/www.jasminedirectory.com\/blog\/wp-content\/uploads\/2026\/05\/Business-Directory-May-2026-26.jpg","datePublished":"2026-05-19T07:33:49+00:00","dateModified":"2026-05-19T07:44:28+00:00","description":"\"Modern data environments are distributed, diverse, and dynamic, making it difficult for organisations to manage and maintain quality levels.\" 
That","breadcrumb":{"@id":"https:\/\/www.jasminedirectory.com\/blog\/analysis-of-directory-trust-signals-in-ai-training-data\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.jasminedirectory.com\/blog\/analysis-of-directory-trust-signals-in-ai-training-data\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.jasminedirectory.com\/blog\/analysis-of-directory-trust-signals-in-ai-training-data\/#primaryimage","url":"https:\/\/www.jasminedirectory.com\/blog\/wp-content\/uploads\/2026\/05\/Business-Directory-May-2026-26.jpg","contentUrl":"https:\/\/www.jasminedirectory.com\/blog\/wp-content\/uploads\/2026\/05\/Business-Directory-May-2026-26.jpg","width":1280,"height":720},{"@type":"BreadcrumbList","@id":"https:\/\/www.jasminedirectory.com\/blog\/analysis-of-directory-trust-signals-in-ai-training-data\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog","item":"https:\/\/www.jasminedirectory.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Analysis of Directory Trust Signals in AI Training Data"}]},{"@type":"WebSite","@id":"https:\/\/www.jasminedirectory.com\/blog\/#website","url":"https:\/\/www.jasminedirectory.com\/blog\/","name":"Jasmine's Business Directory Blog","description":"","publisher":{"@id":"https:\/\/www.jasminedirectory.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.jasminedirectory.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.jasminedirectory.com\/blog\/#organization","name":"Jasmine Business Directory","alternateName":"Jasmine 
Directory","url":"https:\/\/www.jasminedirectory.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.jasminedirectory.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.jasminedirectory.com\/blog\/wp-content\/uploads\/2025\/05\/Jasmine-directory-logo-official.jpg","contentUrl":"https:\/\/www.jasminedirectory.com\/blog\/wp-content\/uploads\/2025\/05\/Jasmine-directory-logo-official.jpg","width":512,"height":512,"caption":"Jasmine Business Directory"},"image":{"@id":"https:\/\/www.jasminedirectory.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/jasminedirectory\/","https:\/\/x.com\/jasminedir","https:\/\/www.linkedin.com\/company\/jasminedirectory\/","https:\/\/www.pinterest.com\/jasminedir\/","https:\/\/en.wikipedia.org\/wiki\/Jasmine_Directory","https:\/\/www.crunchbase.com\/organization\/jasmine-directory"]},{"@type":"Person","@id":"https:\/\/www.jasminedirectory.com\/blog\/#\/schema\/person\/088f91f4a09b0333a72c29560bcb6486","name":"Gombos Atila Robert","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.jasminedirectory.com\/blog\/wp-content\/litespeed\/avatar\/cfc93b692b3469fdbcf2be9b45c0355e.jpg?ver=1778912162","url":"https:\/\/www.jasminedirectory.com\/blog\/wp-content\/litespeed\/avatar\/cfc93b692b3469fdbcf2be9b45c0355e.jpg?ver=1778912162","contentUrl":"https:\/\/www.jasminedirectory.com\/blog\/wp-content\/litespeed\/avatar\/cfc93b692b3469fdbcf2be9b45c0355e.jpg?ver=1778912162","caption":"Gombos Atila Robert"},"description":"Gombos Atila Robert brings over 15 years of specialized experience in marketing, particularly within the software and Internet sectors. 
His academic background is equally robust, as he holds Bachelor\u2019s and Master\u2019s degrees in relevant fields, along with a Doctorate in Visual Arts.","sameAs":["https:\/\/atilagombos.com\/","https:\/\/www.facebook.com\/robert.gombos\/","https:\/\/www.instagram.com\/jasmine.directory\/","https:\/\/www.linkedin.com\/in\/robertgombos\/","https:\/\/en.wikipedia.org\/wiki\/Jasmine_Directory"]}]}},"_links":{"self":[{"href":"https:\/\/www.jasminedirectory.com\/blog\/wp-json\/wp\/v2\/posts\/29052","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.jasminedirectory.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.jasminedirectory.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.jasminedirectory.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.jasminedirectory.com\/blog\/wp-json\/wp\/v2\/comments?post=29052"}],"version-history":[{"count":0,"href":"https:\/\/www.jasminedirectory.com\/blog\/wp-json\/wp\/v2\/posts\/29052\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.jasminedirectory.com\/blog\/wp-json\/wp\/v2\/media\/29111"}],"wp:attachment":[{"href":"https:\/\/www.jasminedirectory.com\/blog\/wp-json\/wp\/v2\/media?parent=29052"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.jasminedirectory.com\/blog\/wp-json\/wp\/v2\/categories?post=29052"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.jasminedirectory.com\/blog\/wp-json\/wp\/v2\/tags?post=29052"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}