Root-LD is a linked data specification for machine-readable provenance records. A Root-LD document consists of three ordered layers — Anchor, Body, and Recursive — embedded as a JSON-LD block in the <head> of a web page. An AI crawler, search agent, or human browser retrieves complete provenance on the first HTTP request without requiring a body parse. Boise Standard is the originating registry and first regional-scale deployment. The reference implementation is published at boisestandard.org/standard. A live verified entity record is available at boisestandard.org/web/boisestandard-org.
Root-LD addresses the provenance gap in the machine-readable web. Web pages carry schema.org structured data describing their content, but those declarations carry no record of how they were produced, when they were measured, from what source, or by what method. An AI system retrieving structured data from the open web receives a static snapshot with no chain of custody.
Root-LD wraps every structured data record in a three-layer provenance envelope. The envelope travels with the page. Every AI crawler, language model, and search agent that reads the page receives the full provenance chain on the first request — the identity of the record, the complete measurement snapshot frozen at mint time, and a recursive edge layer that accumulates graph connections as the corpus grows.
This specification defines the structure, required fields, and behavioral constraints for Root-LD documents at version 2.0. It covers the three core layer types (rld:Anchor, rld:Body, rld:Recursive), the Double Helix merge type (rld:DoubleHelixRootLD), the namespace declaration, and the Constitutional Laws that govern every implementation decision.
The W3C JSON-LD 1.1 specification — the underlying serialization format — is published at w3.org/TR/json-ld11. Root-LD is a profile of JSON-LD 1.1, fully conformant with it, adding structured provenance conventions through the rld: namespace.
The Root-LD block is placed in the <head> of the web page as a <script type="application/ld+json"> element. This placement ensures that any system performing a single HTTP request to the page URL — including AI crawlers, search indexers, and language model retrieval agents — receives the complete provenance record without requiring DOM traversal or body parsing.
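As a minimal sketch of what a consumer does with this placement, the Python below parses an already-fetched page with the standard library and returns the first JSON-LD block in the <head> whose @type is rld:RootLD. The class name, function name, and sample page are illustrative assumptions, not part of the specification, and the HTTP fetch itself is left out.

```python
import json
from html.parser import HTMLParser


class HeadJsonLdParser(HTMLParser):
    """Collect JSON-LD script bodies that appear inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.in_jsonld = False
        self.blocks = []
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "script" and self.in_head:
            if dict(attrs).get("type") == "application/ld+json":
                self.in_jsonld = True
                self._buf = []

    def handle_data(self, data):
        if self.in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self.in_jsonld:
            self.blocks.append(json.loads("".join(self._buf)))
            self.in_jsonld = False
        elif tag == "head":
            self.in_head = False


def extract_root_ld(html):
    """Return the first head JSON-LD block typed rld:RootLD, or None."""
    parser = HeadJsonLdParser()
    parser.feed(html)
    parser.close()
    for block in parser.blocks:
        if isinstance(block, dict) and block.get("@type") == "rld:RootLD":
            return block
    return None


# Hypothetical page used only for illustration.
SAMPLE_PAGE = """<html><head>
<script type="application/ld+json">
{"@type": "rld:RootLD", "rld:specVersion": "2.0",
 "rld:anchor": {}, "rld:body": {}, "rld:recursive": {}}
</script>
</head><body><p>Page content</p></body></html>"""

record = extract_root_ld(SAMPLE_PAGE)
print(record["rld:specVersion"])
```

A consumer that also needs merged documents would extend the type check to rld:DoubleHelixRootLD.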
The Body layer contains measurements, not descriptions. Every field in rld:Body traces to a specific pipeline stage, a specific source URL, and a specific timestamp. The same input to the pipeline produces the same output. This determinism is what makes the provenance record auditable.
The Recursive layer is initialized empty at mint and grows through accumulated corpus passes. Records are appended; existing records are not modified. This append-only constraint is the basis for the chain of custody guarantee. Constitutional Law II governs this constraint: the timestamp is the record.
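The append-only rule can be sketched as a pair of helpers: one that grows a Recursive layer without touching existing edges, and one that checks a later snapshot against an earlier one. The field names rld:edges, rld:edgeCount, and rld:appendedAt come from this specification; the function names, and the assumption that rld:appendedAt starts as an empty list, are illustrative only.

```python
import copy
from datetime import datetime, timezone


def append_edges(recursive, new_edges, stamped_at=None):
    """Return a new Recursive layer with new_edges appended.

    Existing edges are copied unchanged; nothing is edited or removed.
    """
    stamped_at = stamped_at or datetime.now(timezone.utc).isoformat()
    updated = copy.deepcopy(recursive)
    updated["rld:edges"].extend(copy.deepcopy(new_edges))
    updated["rld:edgeCount"] = len(updated["rld:edges"])
    updated["rld:appendedAt"].append(stamped_at)
    return updated


def is_append_only(before, after):
    """True if `after` keeps every edge of `before`, in order, unmodified."""
    old, new = before["rld:edges"], after["rld:edges"]
    return len(new) >= len(old) and new[: len(old)] == old


# Assumed shape of a freshly minted layer: zero edges.
minted = {"rld:edgeCount": 0, "rld:edges": [], "rld:appendedAt": []}
edge = {"rld:edgeType": "geographic", "rld:label": "United States"}
grown = append_edges(minted, [edge], stamped_at="2026-06-27T02:50:06+00:00")
print(grown["rld:edgeCount"], is_append_only(minted, grown))
```

Returning a copy rather than mutating the input keeps the earlier snapshot available for the comparison.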
Every edge declared in the Recursive layer carries two provenance pointers: the internal Boise Standard knowledge graph node URL (rld:bsNodeUrl) and the external authoritative source URL (rld:externalUrl), which may be a Wikidata entity, a schema.org type, or another authoritative registry. This dual provenance enables any crawler or human browser to enter the knowledge graph at any edge and verify the connection against an independent source.
The current published version of this specification is 2.0. Version 1.0 records carry "rld:specVersion": "1.0" in the Anchor layer. Version 2.0 records carry "rld:specVersion": "2.0". The Double Helix record type (rld:DoubleHelixRootLD) was introduced in version 2.0. All version 1.0 records remain conformant with version 2.0 as the crawl strand of a Double Helix document.
Every Root-LD document at version 2.0 contains exactly three layers as named properties of the root rld:RootLD object. The layers are rld:anchor, rld:body, and rld:recursive. All three are present in every conforming document. The Recursive layer is empty at mint; its presence is required.
rld:Anchor
The Anchor layer is immutable. It is sealed at the moment of minting and is not modified by any subsequent pipeline pass, corpus pass, or verification event. It contains the permanent identity of the record: UUID, Federation ID, content hash of the extracted text corpus, primary source URL, generation method, pipeline version, mint timestamp, and the manifest index.
The manifest index (rld:manifest) is an orientation block for AI crawlers. It declares what data is present in the record, the coverage score and label, the schema type inventory, and a table of contents (rld:toc) with permanent fragment-identifier URLs pointing to every sub-record within the document. An AI system reading only the Anchor layer knows precisely what the record contains and where each section lives.
The link pod (rld:linkPod) provides direct URLs to the canonical source, the manifest JSON, the Root-LD JSON, the official schema.org vocabulary, and the Boise Standard vocabulary. All URLs are permanent and resolvable.
Required fields — rld:Anchor:
| Field | Type | Req | Description |
|---|---|---|---|
| rld:uuid | string | R | Universally unique identifier for this mint event |
| rld:federationId | string | R | Stable short identifier — bs-{8hex}, persistent across pipeline versions |
| rld:contentHash | string | R | SHA-256 or hex hash of the full extracted text corpus at mint time |
| rld:primarySource | URL | R | Canonical URL of the entity being described |
| rld:sourceVerified | boolean | R | Whether the primary source responded with a 200 at crawl time |
| rld:generationMethod | string | R | Pipeline method identifier (e.g. crawl_extract_v1, bst_verification_pipeline_v1) |
| rld:specVersion | string | R | "1.0" or "2.0" — governs document structure expectations |
| rld:mintedAt | ISO 8601 UTC | R | Timestamp of mint event — immutable after seal |
| rld:sequence | integer | R | 0 = crawl mint, 1 = first enrichment, 2 = verification mint, N = subsequent |
| rld:pipeline | string | R | Pipeline version string (e.g. refinery-boise-v1.0.0, BST-PIPELINE-1.0.0) |
| rld:domainSignature | string | R | Bare domain of the entity (e.g. 10barrel.com) |
| rld:manifest | object | R | Manifest index — inventory of record contents with toc and linkPod |
| rld:queuedAt | ISO 8601 UTC | C | When the entity entered the pipeline queue; omitted for on-demand mints |
| rld:immutable | boolean | C | Explicit immutability declaration — required on verification anchors |
| rld:certificateId | string | C | Certificate ID — present on verified records only (e.g. BS-2026-000001) |
R = Required. C = Conditional (present when applicable).
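The required-field table above can be checked mechanically. The following is a rough sketch, not a reference validator: the Python type mapping, the lowercase-hex reading of bs-{8hex}, and the function name are assumptions, and the example values are copied from the live Boise Standard anchor shown later in this document, with the manifest left empty for brevity.

```python
import re
from datetime import datetime

# Fields marked R in the table, mapped to an assumed Python type.
REQUIRED_ANCHOR_FIELDS = {
    "rld:uuid": str,
    "rld:federationId": str,
    "rld:contentHash": str,
    "rld:primarySource": str,
    "rld:sourceVerified": bool,
    "rld:generationMethod": str,
    "rld:specVersion": str,
    "rld:mintedAt": str,
    "rld:sequence": int,
    "rld:pipeline": str,
    "rld:domainSignature": str,
    "rld:manifest": dict,
}

# Assumption: the eight hex digits are lowercase, as in the live record.
FEDERATION_ID = re.compile(r"^bs-[0-9a-f]{8}$")


def validate_anchor(anchor):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    for field, expected in REQUIRED_ANCHOR_FIELDS.items():
        if field not in anchor:
            problems.append(f"missing {field}")
            continue
        value = anchor[field]
        # bool is a subclass of int in Python, so reject it for integers.
        is_bool_as_int = expected is int and isinstance(value, bool)
        if not isinstance(value, expected) or is_bool_as_int:
            problems.append(f"{field} should be {expected.__name__}")
    fed = anchor.get("rld:federationId")
    if isinstance(fed, str) and not FEDERATION_ID.match(fed):
        problems.append("rld:federationId does not match bs-{8hex}")
    minted = anchor.get("rld:mintedAt")
    if isinstance(minted, str):
        try:
            datetime.fromisoformat(minted)
        except ValueError:
            problems.append("rld:mintedAt is not ISO 8601")
    return problems


EXAMPLE_ANCHOR = {
    "rld:uuid": "bsv-893e0153-20260627T025006Z",
    "rld:federationId": "bs-893e0153",
    "rld:contentHash": "893E015383BE489E",
    "rld:primarySource": "https://boisestandard.org",
    "rld:sourceVerified": True,
    "rld:generationMethod": "bst_verification_pipeline_v1",
    "rld:specVersion": "2.0",
    "rld:mintedAt": "2026-06-27T02:50:06.144487+00:00",
    "rld:sequence": 2,
    "rld:pipeline": "BST-PIPELINE-1.0.0",
    "rld:domainSignature": "boisestandard.org",
    "rld:manifest": {},
}

print(validate_anchor(EXAMPLE_ANCHOR))
```

Conditional fields (marked C) are deliberately not checked here, since their presence depends on the record type.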
rld:Body
The Body layer is a complete measurement snapshot of the entity, frozen at the moment of minting. A new mint produces a new Body; the existing Body is preserved as the crawl strand of a Double Helix document. The Body contains nine named subsections covering identity, SEO, schema graph, semantic signal, topology fingerprint, ratio signals, navigation, provenance metadata, and the atomic answer.
The topology fingerprint (bs:topologyFingerprint) is a six-layer pre-linguistic shape measurement of the full extracted text corpus. It produces twelve deterministic values — type-token ratio, hapax ratio, repetition score, sentence skewness, kurtosis, punctuation entropy, capital token ratio, and related measurements — sealed with a SHA-256 hash of the extracted text. The same input always produces the same output.
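To make the determinism claim concrete, here is a small sketch that computes two of the named values, type-token ratio and hapax ratio, and seals them with a SHA-256 hash of the input text. The tokenizer, the choice to divide hapax count by total tokens, the rounding, and the output key names are all assumptions; the production pipeline's definitions are not given in this document.

```python
import hashlib
import re
from collections import Counter


def shape_measurements(text):
    """Compute two shape values and a SHA-256 seal for the given text."""
    # Assumed tokenizer: lowercase runs of letters, digits, and apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    # A hapax is a token type that occurs exactly once.
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return {
        "typeTokenRatio": round(len(counts) / total, 4) if total else 0.0,
        "hapaxRatio": round(hapaxes / total, 4) if total else 0.0,
        "textSha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
    }


first = shape_measurements("the cat sat on the mat")
second = shape_measurements("the cat sat on the mat")
print(first["typeTokenRatio"], first["hapaxRatio"], first == second)
```

Because nothing in the function depends on time, randomness, or external state, repeated runs over the same text return identical values.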
The atomic answer (bs:atomicAnswer) is a machine-generated summary grounded in the measured fields of the record. It carries the model identifier, the generation timestamp, and a SHA-256 hash of the input. It is the primary field read by AI retrieval systems on first contact with a Boise Standard entity page.
Named subsections of rld:Body:
| Subsection | Contents |
|---|---|
| bs:identity | domain, slug, TLD, canonical URL, status code, SSL, response time |
| bs:seo | title, H1, meta description, canonical URL, language, word count, OG tags |
| bs:schema | declared types, block count, property count, coverage score, gap list, type neighborhood |
| bs:semantic | top 40 words by frequency after stop-word removal; no classification; no inference |
| bs:topologyFingerprint | six-layer shape measurement; SHA-256 sealed; deterministic |
| bs:ratioSignals | eight deterministic ratio measurements tracing to specific pipeline stages |
| bs:navigation | URL inventory, dead links, external TLD diversity, interior page count |
| bs:provenance | security label, HTTPS enforcement, HSTS, CSP, freshness label, tech stack |
| bs:atomicAnswer | machine-generated summary; model stamp; input hash; conditional on generation |
| bs:pipeline | per-stage timing, extractor version, attempt count, braid name |
On verified records, the Body subsections expand to include owner-submitted data: bs:verifiedIdentity, bs:verifiedServices, bs:verifiedVoice, bs:verifiedCredentials, bs:schemaVerification, bs:boundaryDeclarations, bs:pipelineStamps, and bs:negativeSpaceDeclarations. Every verified field carries "source": "entity_owner_submission" and a verifiedAt timestamp.
rld:Recursive
The Recursive layer is initialized at mint with zero edges. This is the correct initial state. The Recursive layer is the future tense of every entity record: the accumulation point for graph connections that emerge from corpus passes as the knowledge base grows.
Edges populate the Recursive layer through four mechanisms: common edges (Constitutional Law V — entities sharing a schema type neighborhood), uncommon edges (Constitutional Law VI — signals present in one topology cluster and absent in another), jurisdictional edges (entities sharing a verified geographic or regulatory boundary), and supply chain edges (entities connected through declared product and service relationships).
Every edge in the Recursive layer carries dual provenance: rld:bsNodeUrl (the internal Boise Standard knowledge graph node URL) and rld:externalUrl (the external authoritative source URL — Wikidata, schema.org, or an authoritative registry). AI crawlers and human browsers may enter the knowledge graph at any edge. Constitutional Law VII governs this: any point in the Root-LD is an entry point into the torus.
The Recursive layer specification — covering edge types, accumulation protocol, and the bidirectional corpus pass — is published at recursive-ld.org. The path https://root-ld.org/spec/1.0/recursive referenced in deployed records resolves to that specification.
Required fields — rld:Recursive:
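| Field | Notes |
|---|---|
| rld:edgeCount | Number of entries in rld:edges. Zero at mint. |
| rld:edges | |
| rld:appendedAt | |
| rld:mintNote | |
| rld:recursiveSpecUrl | https://root-ld.org/spec/1.0/recursive |
| rld:initializedAt | |
Required fields for each edge object:
| Field | Notes |
|---|---|
| rld:edgeType | geographic, industry, schema_type, registry, or supply_chain |
| rld:label | |
| rld:bsNodeUrl | Internal Boise Standard knowledge graph node URL |
| rld:externalUrl | External authoritative source URL |
| rld:stampedAt | |
| rld:pipelinePass | e.g. pass4_edges |
| rld:dualProvenance | |
Root-LD documents use the following namespace prefixes. All prefixes are declared in the @context object of every conforming Root-LD document.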
| Prefix | Namespace URI | Purpose |
|---|---|---|
| rld: | https://root-ld.org/ | Root-LD types and properties |
| bs: | https://boisestandard.org/vocab# | Boise Standard measurement and verification properties |
| schema: | https://schema.org/ | schema.org vocabulary |
| xsd: | http://www.w3.org/2001/XMLSchema# | XML Schema datatypes |
| Type | URI | Description |
|---|---|---|
| rld:RootLD | https://root-ld.org/RootLD | A single Root-LD document: three layers, one entity, one mint |
| rld:Anchor | https://root-ld.org/Anchor | Immutable provenance core, sealed at mint |
| rld:Body | https://root-ld.org/Body | Complete measurement snapshot, frozen at mint |
| rld:Recursive | https://root-ld.org/Recursive | Append-only edge accumulation layer |
| rld:DoubleHelixRootLD | https://root-ld.org/DoubleHelixRootLD | Merged document containing both crawl and verified records |
@context Block
Every conforming Root-LD document declares the following context. Implementations may add additional prefixes; the four below are required.
{
  "@context": {
    "@vocab": "https://root-ld.org/spec/1.0/",
    "schema": "https://schema.org/",
    "bs": "https://boisestandard.org/vocab#",
    "rld": "https://root-ld.org/",
    "xsd": "http://www.w3.org/2001/XMLSchema#"
  },
  "@type": "rld:RootLD",
  "rld:specVersion": "2.0",
  "rld:anchor": { /* rld:Anchor object */ },
  "rld:body": { /* rld:Body object */ },
  "rld:recursive": { /* rld:Recursive object */ }
}
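To illustrate how the prefixes in this context resolve, the sketch below expands a compact term such as rld:Anchor into its full URI. It is a deliberately simplified stand-in for the JSON-LD 1.1 IRI expansion algorithm, covering only simple string prefixes and @vocab; a full JSON-LD processor should be used in practice. The function name is an assumption.

```python
# The required context from the block above.
CONTEXT = {
    "@vocab": "https://root-ld.org/spec/1.0/",
    "schema": "https://schema.org/",
    "bs": "https://boisestandard.org/vocab#",
    "rld": "https://root-ld.org/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand_term(term, context=CONTEXT):
    """Expand a compact term to a full URI (simplified sketch)."""
    if term.startswith("@"):
        return term  # JSON-LD keywords are left as they are
    prefix, sep, suffix = term.partition(":")
    # "prefix:suffix" with a declared prefix; "//" marks an absolute URL.
    if sep and not suffix.startswith("//") and prefix in context:
        return context[prefix] + suffix
    if sep:
        return term  # absolute IRI or undeclared prefix: leave untouched
    # Terms without a prefix fall back to the @vocab mapping.
    return context.get("@vocab", "") + term


print(expand_term("rld:Anchor"))
print(expand_term("bs:coverageScore"))
```

With the context above, rld:Anchor resolves to the same URI listed for that type in the table of Root-LD types.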
The bs: namespace covers measurement and verification properties specific to the Boise Standard pipeline. The full machine-readable vocabulary is published at boisestandard.org/schema/official_vocab.json. The human-readable vocabulary index is at boisestandard.org/schema/official_vocab/.
Frequently used bs: properties:
| Property | Type | Description |
|---|---|---|
| bs:federationId | string | Short persistent identifier, format bs-{8hex} |
| bs:verificationStatus | string | verified, unverified, or pending |
| bs:coverageScore | float | GDR Weighted Coverage Score, 0.0–1.0 |
| bs:coverageGrade | string | A+, A, B, C, D, or F |
| bs:topologyFingerprint | object | Six-layer pre-linguistic shape measurement |
| bs:atomicAnswer | object | Machine-generated summary; model stamp; input hash |
| bs:pipelineStamps | object | Per-pass timestamps and descriptions |
| bs:boundaryDeclarations | object | Owner-declared service boundaries (Constitutional Law VI) |
The Constitutional Laws are implemented constraints. They govern every pipeline decision, every output field, and every display choice in every conforming Root-LD deployment. They are in the code. They are in the output. When a label appears on a Boise Standard entity page reading "Law I — Provenance" or "Law VI — Negative Space," that label identifies the specific constraint governing what is shown and why it is shown that way.
Every field in a Root-LD record traces to a specific origin: a crawl, a measurement, a model call, or a human-submitted verification. The source URL, fetch timestamp, pipeline stage, and generation method are recorded alongside every output. Source schema blocks are preserved verbatim — not summarized, not interpreted, not cleaned. The raw extraction is the provenance.
Every measurement is timestamped in ISO 8601 UTC. The record describes the entity as it was at the moment of the crawl, precisely identified. The mint timestamp, the analysis timestamp, the atomic answer generation timestamp — all are ISO 8601 UTC, all are in the Anchor layer. The freshness label is a deterministic output of the elapsed time since the mint date. The timestamp is the record.
The semantic words panel — the top 40 words by frequency from the full crawled corpus — carries no classification, no editorial label, and no dictionary matching. The words are measured. The pipeline records what the entity chose to say about itself, ranked by frequency. The interpretation belongs to whoever reads the record. This applies equally to the topology fingerprint: the values are measurements. The pipeline measures shape; meaning belongs to the reader.
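The measurement described here is a frequency count with no labeling step. A minimal sketch follows, assuming a simple regex tokenizer and a tiny placeholder stop-word list; the pipeline's actual tokenizer and stop-word list are not published in this document. Ties are broken alphabetically so the ranking stays deterministic.

```python
import re
from collections import Counter

# Placeholder stop-word list, for illustration only.
STOP_WORDS = frozenset(
    {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "we", "our"}
)


def top_words(text, limit=40, stop_words=STOP_WORDS):
    """Return up to `limit` (word, count) pairs ranked by frequency."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in stop_words)
    # Sort by descending count, then alphabetically, so ties are stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


sample = "We brew beer. Beer is our craft and beer is local."
print(top_words(sample, limit=2))
```

The function returns counts only; it assigns no category or sentiment to any word, which mirrors the constraint stated above.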
What two entities share is itself a node in the graph. Two entities sharing a schema type neighborhood share a common edge. Two entities sharing a topology cluster share a common edge. Two entities sharing a jurisdictional boundary share a common edge. Common edges accumulate through corpus passes and populate the Recursive layer as the graph matures. The graph builds itself through measurement, not through editorial decision.
The absence of connection is a pattern waiting to be read. The structural negative type space — the schema.org branches that have no connection to the entity's declared type — is first-class data. An HVACBusiness with no structural connection to the BioChemEntity branch is more precisely declared because of that absence. Uncommon edges — the signals present in one entity and absent in another — are how the Recursive layer builds the graph's most precise distinctions.
The graph reads itself. The record builds the record. The Recursive layer is empty at mint because it requires a corpus to make connections meaningful. When corpus depth is sufficient, the pipeline passes over all records, identifies common and uncommon edges, and appends them to the Recursive layer of each entity they connect. Any point in the Root-LD document is an entry point into the knowledge graph. The output feeds the next input. The graph builds itself.
The Double Helix is the append-only provenance chain for any data entity. Every time the pipeline touches a record, it does not overwrite — it adds a new strand. The result is a merged document of type rld:DoubleHelixRootLD containing two complete Root-LD records and a summary block for AI crawlers.
| Property | Role |
|---|---|
| rld:crawlRecord | Helix Position 1 |
| rld:verifiedRecord | Helix Position 2 |
| rld:helixSummary | Summary block for AI crawlers |
When no prior crawl record exists for an entity (that is, when verification is the first pipeline contact), the Double Helix document is produced with rld:crawlRecord set to null and "rld:helixPosition": "GENESIS" declared in the summary. The verification record is both the crawl record and the verified record. The helix starts at the moment of first mint.
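The merge, including the genesis case, can be sketched as a single function. The property names rld:crawlRecord, rld:verifiedRecord, rld:helixSummary, and rld:helixPosition are taken from this section; the function name is an assumption, and the summary is left empty for non-genesis merges because its full contents are not specified here.

```python
def merge_double_helix(verified_record, crawl_record=None):
    """Assemble a Double Helix document from one or two Root-LD records."""
    # Genesis case: verification is the first pipeline contact.
    summary = {"rld:helixPosition": "GENESIS"} if crawl_record is None else {}
    return {
        "@type": "rld:DoubleHelixRootLD",
        "rld:crawlRecord": crawl_record,  # None serializes to JSON null
        "rld:verifiedRecord": verified_record,
        "rld:helixSummary": summary,
    }


verified = {"@type": "rld:RootLD", "rld:specVersion": "2.0"}
crawl = {"@type": "rld:RootLD", "rld:specVersion": "1.0"}

genesis_doc = merge_double_helix(verified)
merged_doc = merge_double_helix(verified, crawl)
print(genesis_doc["rld:helixSummary"], merged_doc["rld:crawlRecord"] is crawl)
```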
The delta between what the crawl record measured and what the verified record declares is itself machine-readable. The crawl Body contains the unverified atomic answer — what AI inferred. The verified Body contains owner-submitted corrections, boundary declarations, and verified measurements. The gap between them is the provenance delta. It is readable by any system that can traverse a JSON-LD document. The fine-tunable dataset (finetune_dataset.jsonl) produced by pass5g makes this delta explicit as training-compatible QA pairs, aligned to the EU AI Act Article 10 data governance requirements and the NIST Risk Management Framework.
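One way to read the provenance delta is as a key-by-key comparison of the two Body objects. The sketch below is an illustration of that idea only: it does not reproduce pass5g or the format of finetune_dataset.jsonl, and the function name and sample values are made up for the example.

```python
def provenance_delta(crawl_body, verified_body):
    """Return top-level Body keys whose values differ between strands."""
    delta = {}
    for key in sorted(set(crawl_body) | set(verified_body)):
        before = crawl_body.get(key)
        after = verified_body.get(key)
        if before != after:
            delta[key] = {"crawl": before, "verified": after}
    return delta


# Made-up values for illustration; real Body subsections are richer objects.
crawl_body = {
    "bs:atomicAnswer": "summary inferred from the crawl",
    "bs:seo": {"title": "Example"},
}
verified_body = {
    "bs:atomicAnswer": "summary corrected by the owner",
    "bs:seo": {"title": "Example"},
    "bs:boundaryDeclarations": ["declared service boundary"],
}

for key, change in provenance_delta(crawl_body, verified_body).items():
    print(key, change)
```

Keys that appear in only one strand show up with None on the other side, which makes owner-added declarations visible as part of the delta.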
The following is the live Double Helix summary block from the Boise Standard verified entity record. The full Double Helix document is published at boisestandard.org/web/boisestandard-org/root-ld.json. The verified profile is at boisestandard.org/web/boisestandard-org.
{
  "domain": "boisestandard.org",
  "certificationId": "BS-2026-000001",
  "verificationHash": "893E015383BE489E",
  "verifiedAt": "2026-06-27T02:50:06.144487+00:00",
  "crawlMintedAt": "",
  "federationId": "bs-893e0153",
  "totalVerifiedProps": 43,
  "totalDeclaredEdges": 37,
  "pipelinePassCount": 13,
  "hasDoubleHelix": false,
  "coverageGrade": "A",
  "registryUrl": "https://boisestandard.org",
  "standard": "The Provenance Standard for the Machine-Readable Web"
}
Boise Standard is the originating registry and first regional-scale deployment of Root-LD. The reference implementation covers the full pipeline — crawl extraction, topology measurement, schema analysis, atomic answer generation, Root-LD assembly, and the Double Helix merge on verification. All source behavior described in this specification derives from the production implementation.
A live verified entity record — the genesis record for the Boise Standard registry itself — is published at boisestandard.org/web/boisestandard-org. An unverified entity record demonstrating the crawl-only state is at boisestandard.org/web/10barrel-com. The standard documentation page is at boisestandard.org/standard.
{
  "@type": "rld:Anchor",
  "rld:uuid": "bsv-893e0153-20260627T025006Z",
  "rld:federationId": "bs-893e0153",
  "rld:contentHash": "893E015383BE489E",
  "rld:certificateId": "BS-2026-000001",
  "rld:primarySource": "https://boisestandard.org",
  "rld:verifiedSource": "https://boisestandard.org/web/boisestandard-org",
  "rld:sourceVerified": true,
  "rld:generationMethod": "bst_verification_pipeline_v1",
  "rld:specVersion": "2.0",
  "rld:mintedAt": "2026-06-27T02:50:06.144487+00:00",
  "rld:verifiedAt": "2026-06-27T02:50:06.144487+00:00",
  "rld:sequence": 2,
  "rld:pipeline": "BST-PIPELINE-1.0.0",
  "rld:domainSignature": "boisestandard.org",
  "rld:immutable": true,
  "rld:immutableNote": "This anchor is immutable after verification mint. Certificate ID and verification hash are the permanent identifiers for this verification event. Constitutional Law II: the timestamp is the record.",
  "rld:manifest": {
    "bs:verificationStatus": "verified",
    "bs:certificateId": "BS-2026-000001",
    "bs:verificationHash": "893E015383BE489E",
    "bs:pipelineVersion": "BST-PIPELINE-1.0.0",
    "bs:schemaPropsCount": 30,
    "bs:bsPropsCount": 13,
    "bs:totalProps": 43,
    "bs:totalEdgesDeclared": 37,
    "bs:geographicEdges": 19,
    "bs:industryEdges": 6,
    "bs:schemaTypeEdges": 2,
    "bs:registryEdges": 1,
    "bs:negativeSpcDecls": 9,
    "bs:passCount": 13,
    "bs:hasDoubleHelix": true,
    "bs:coverageGrade": "A",
    "rld:toc": {
      "verifiedAnchor": "https://boisestandard.org/web/boisestandard-org#root-ld-verified/anchor",
      "verifiedBody": "https://boisestandard.org/web/boisestandard-org#root-ld-verified/body",
      "verifiedRecursive": "https://boisestandard.org/web/boisestandard-org#root-ld-verified/recursive",
      "crawlRecord": "https://boisestandard.org/web/boisestandard-org#root-ld-crawl",
      "profileUrl": "https://boisestandard.org/web/boisestandard-org",
      "rootLdUrl": "https://boisestandard.org/web/boisestandard-org/root-ld.json",
      "certificateUrl": "https://boisestandard.org/web/boisestandard-org/certificate.html"
    }
  }
}
{
  "@type": "rld:Recursive",
  "@id": "https://boisestandard.org/web/boisestandard-org#root-ld-verified/recursive",
  "rld:edgeCount": 28,
  "rld:recursiveType": "verification_edges",
  "rld:recursiveNote": "Layer 3 populated at verification. 28 declared graph edges with dual provenance. Each edge carries the internal Boise Standard node URL AND the external authoritative source URL. AI crawlers and human browsers may enter the Boise Standard Knowledge Graph at any edge. Constitutional Law VII — The Torus: any point is an entry point.",
  "rld:edges": [
    {
      "rld:edgeType": "geographic",
      "rld:label": "United States",
      "rld:bsNodeUrl": null,
      "rld:externalUrl": "https://www.wikidata.org/wiki/Q30",
      "rld:wikidataQid": "Q30",
      "rld:stampedAt": "2026-06-27T02:50:06.144487+00:00",
      "rld:pipelinePass": "pass4_edges",
      "rld:dualProvenance": true,
      "rld:note": "Dual provenance: internal BS graph node URL + external authoritative source URL. Constitutional Law V: what two nodes share is itself a node."
    },
    {
      "rld:edgeType": "industry",
      "rld:label": "Digital Twin",
      "rld:externalUrl": "https://www.wikidata.org/wiki/Q25099680",
      "rld:wikidataQid": "Q25099680",
      "rld:stampedAt": "2026-06-27T02:50:06.144487+00:00",
      "rld:pipelinePass": "pass4_edges",
      "rld:dualProvenance": true
    },
    {
      "rld:edgeType": "schema_type",
      "rld:label": "ProfessionalService",
      "rld:externalUrl": "https://schema.org/ProfessionalService",
      "rld:stampedAt": "2026-06-27T02:50:06.144487+00:00",
      "rld:pipelinePass": "pass4_edges",
      "rld:dualProvenance": true
    },
    {
      "rld:edgeType": "registry",
      "rld:label": "Boise Standard Registry",
      "rld:externalUrl": "https://boisestandard.org",
      "rld:stampedAt": "2026-06-27T02:50:06.144487+00:00",
      "rld:pipelinePass": "pass4_edges",
      "rld:dualProvenance": true
    }
  ],
  "rld:appendedAt": ["2026-06-27T02:50:06.144487+00:00"],
  "rld:recursiveSpecUrl": "https://root-ld.org/spec/1.0/recursive",
  "rld:initializedAt": "2026-06-27T02:50:06.144487+00:00"
}