Definition
Provenance captures the relationship graph of "where did this object come from" and "what transformations did it undergo." The conceptual model is W3C PROV (PROV-DM / PROV-O), pairing entities, activities, and agents in time order.
Domain-specific standards extend this. C2PA defines content provenance signatures (capture, edit, AI generation). SLSA defines build provenance for software; SCITT defines transparency logs. AI lacks a unified standard for training data, model, and inference history — which is exactly where verifiable AI enters.
The key distinction is that provenance is a provable history, not merely a log. Log files are mutable after the fact and carry weak evidentiary value. Pinning each stage with a commitment plus a signature is what makes a lineage hold up to third-party scrutiny.
Lemma implementation
Lemma pins lineage as docHash plus a metadata commitment. docHash covers the document's byte digest; the metadata covers timestamp, author, and the link to the prior stage. The chain collapses to a single hash that a downstream zero-knowledge proof can attest exists.
Combined with selective disclosure, individual attributes from along the chain — "the producer is in the EU," "the data was collected after the regulation took effect" — can be proven without exposing the chain itself. This is how GDPR, trade secret, and state secret constraints coexist with regulatory adherence.
Lemma Civic applies this to public-sector data, Lemma Critical to manufacturing supply-chain parts, Lemma Compliance to customer attributes, and verifiable AI to the document corpora a RAG pipeline cites — all on the same lineage substrate.