Provenance — verifiable lineage
A tamper-evident mechanism for tracking and verifying when, by whom, and from what inputs a data point, model, or decision was produced. The input layer of verifiable AI and one of Lemma's foundational pillars.
Definition
Provenance captures the relationship graph of "where did this object come from" and "what transformations did it undergo." The conceptual model is W3C PROV (PROV-DM / PROV-O), pairing entities, activities, and agents in time order.
Domain-specific standards extend this. C2PA defines content provenance signatures (capture, edit, AI generation). SLSA defines build provenance for software; SCITT defines transparency logs. AI lacks a unified standard for training data, model, and inference history — which is exactly where verifiable AI enters.
The key distinction is that provenance is a provable history, not merely a log. Log files are mutable after the fact and carry weak evidentiary value. Pinning each stage with a commitment plus a signature is what makes a lineage hold up to third-party scrutiny.
Lemma Oracle implementation
Lemma pins lineage as docHash plus a metadata commitment. docHash covers the document's byte digest; the metadata covers timestamp, author, and the link to the prior stage. The chain collapses to a single hash whose existence a downstream zero-knowledge proof can attest.
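The pinning described above, as an added sketch that is not Lemma's code: it assumes SHA-256, canonical JSON for the metadata, and hypothetical names (`GENESIS`, `meta`, `stageHash`, the sample authors). Signatures and the zero-knowledge proof are left out; only the commitment chain and its single head hash are shown.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumption: marker for "no prior stage"

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def commit_stage(doc: bytes, timestamp: str, author: str, prev: str) -> dict:
    """Pin one stage: docHash over the bytes, plus a commitment over the
    metadata (timestamp, author, link to the prior stage)."""
    meta = {"timestamp": timestamp, "author": author, "prev": prev}
    # Canonical JSON so the same metadata always yields the same digest.
    meta_bytes = json.dumps(meta, sort_keys=True, separators=(",", ":")).encode()
    doc_hash, meta_commitment = sha256_hex(doc), sha256_hex(meta_bytes)
    stage_hash = sha256_hex(bytes.fromhex(doc_hash) + bytes.fromhex(meta_commitment))
    return {"docHash": doc_hash, "meta": meta, "stageHash": stage_hash}

def build_chain(stages):
    """stages: iterable of (doc_bytes, timestamp, author). Returns (records, head)."""
    records, prev = [], GENESIS
    for doc, ts, author in stages:
        rec = commit_stage(doc, ts, author, prev)
        records.append(rec)
        prev = rec["stageHash"]
    return records, prev

def verify_chain(records, docs, pinned_head: str) -> bool:
    """Recompute every stage from the raw bytes and stored metadata, then
    compare the result with the independently pinned head hash."""
    if len(records) != len(docs):
        return False
    prev = GENESIS
    for rec, doc in zip(records, docs):
        meta = rec["meta"]
        redo = commit_stage(doc, meta["timestamp"], meta["author"], prev)
        if meta["prev"] != prev or redo["stageHash"] != rec["stageHash"]:
            return False
        prev = redo["stageHash"]
    return prev == pinned_head

# Constructed sample lineage: raw data -> cleaned data -> model card.
SAMPLE = [
    (b"id,value\n1,10\n2,20\n", "2024-01-10T09:00:00Z", "collector@example.org"),
    (b"id,value\n1,10\n", "2024-01-11T14:30:00Z", "cleaner@example.org"),
    (b"model-card: trained on cleaned set v1", "2024-01-12T08:15:00Z", "trainer@example.org"),
]
```

A chain rebuilt from an edited document is internally consistent, yet `verify_chain` still rejects it because its head no longer equals the hash that was pinned earlier.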
Combined with selective disclosure, individual attributes from along the chain — "the producer is in the EU," "the data was collected after the regulation took effect" — can be proven without exposing the chain itself. This is how GDPR, trade secret, and state secret constraints coexist with regulatory adherence.
Lemma Civic applies this to public-sector data, Lemma Critical to manufacturing supply-chain parts, Lemma Compliance to customer attributes, and verifiable AI to the document corpora a RAG pipeline cites — all on the same lineage substrate.