Content Identifier — multiformats
A self-describing, content-addressed identifier. Three multiformats components encode metadata directly into the identifier: multihash records the hash algorithm (and digest length), multicodec the data format, and multibase the string encoding.
Definition
CID, standardized through multiformats and used heavily across the IPFS/IPLD ecosystem, comes in two versions. CIDv0 is the minimal form: a bare base58btc-encoded SHA-256 multihash (the familiar Qm… strings), with the dag-pb codec implied rather than written out. CIDv1 prepends a multibase prefix, a version number, and an explicit multicodec to the multihash, which makes it forward-extensible.
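A minimal, standard-library-only Python sketch of how the pieces fit together. It hashes a block with SHA-256, wraps the digest as a multihash (code 0x12, length 0x20), and encodes it both as a CIDv0 (bare base58btc) and as a CIDv1 (version 1, raw codec 0x55, base32 behind the `b` multibase prefix). The helper names are illustrative, not an existing library API. The input is treated as an already-serialized block; `ipfs add` first wraps files in UnixFS dag-pb nodes, so its CIDs for the same file will differ from these.

```python
import base64
import hashlib

SHA2_256 = 0x12  # multihash code for sha2-256
RAW = 0x55       # multicodec code for raw bytes

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def varint(n: int) -> bytes:
    """Unsigned LEB128 varint, the integer encoding multiformats uses."""
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        n >>= 7
    out.append(n)
    return bytes(out)


def base58btc(data: bytes) -> str:
    num = int.from_bytes(data, "big")
    chars = []
    while num:
        num, rem = divmod(num, 58)
        chars.append(B58_ALPHABET[rem])
    zeros = len(data) - len(data.lstrip(b"\x00"))  # each leading 0x00 -> '1'
    return "1" * zeros + "".join(reversed(chars))


def sha256_multihash(block: bytes) -> bytes:
    digest = hashlib.sha256(block).digest()
    return varint(SHA2_256) + varint(len(digest)) + digest


def cid_v0(multihash: bytes) -> str:
    # CIDv0: no multibase prefix, no version, no codec; dag-pb is implied.
    return base58btc(multihash)


def cid_v1(codec: int, multihash: bytes) -> str:
    binary = varint(1) + varint(codec) + multihash
    # Multibase 'b' = RFC 4648 base32, lowercase, unpadded.
    return "b" + base64.b32encode(binary).decode("ascii").lower().rstrip("=")


mh = sha256_multihash(b"hello world\n")
print(cid_v0(mh))       # 46 characters, starts with "Qm"
print(cid_v1(RAW, mh))  # 59 characters, starts with "bafkrei"
```

The fixed leading bytes explain the recognizable prefixes: every SHA-256 CIDv0 starts with `Qm`, and every base32 CIDv1 with the raw codec and SHA-256 starts with `bafkrei`.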
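Self-description means the identifier itself tells you which hash algorithm and which data format you're holding. Migrating between SHA-256 and SHA-3, or between JSON and CBOR, doesn't require changing the identifier format: the new algorithm or format gets a different code in the same field, so the CID value changes but any CID-aware parser can still read it.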
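To illustrate self-description, the sketch below reads a base32 CIDv1 back into its parts using only the standard library: version, codec, hash function, and digest length all come out of the identifier itself. Only a handful of codes from the multicodec table are included, other multibase prefixes are rejected, and `describe_cid` is a hypothetical helper, not part of any multiformats library.

```python
import base64

# A small subset of the multicodec table; the real table has many more entries.
CODECS = {0x55: "raw", 0x70: "dag-pb", 0x71: "dag-cbor", 0x0129: "dag-json", 0x0200: "json"}
HASHES = {0x12: "sha2-256", 0x13: "sha2-512", 0x16: "sha3-256"}


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one unsigned LEB128 varint; return (value, next position)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def describe_cid(cid: str) -> dict:
    if cid.startswith("Qm") and len(cid) == 46:
        # CIDv0 carries no version or codec fields; both are implied.
        return {"version": 0, "codec": "dag-pb", "hash": "sha2-256", "digest_len": 32}
    if not cid.startswith("b"):
        raise ValueError("this sketch only decodes the base32 ('b') multibase")
    body = cid[1:]
    binary = base64.b32decode(body + "=" * (-len(body) % 8), casefold=True)
    version, pos = read_varint(binary, 0)
    codec, pos = read_varint(binary, pos)
    hash_code, pos = read_varint(binary, pos)
    digest_len, pos = read_varint(binary, pos)
    if len(binary) - pos != digest_len:
        raise ValueError("digest length does not match the multihash header")
    return {
        "version": version,
        "codec": CODECS.get(codec, hex(codec)),
        "hash": HASHES.get(hash_code, hex(hash_code)),
        "digest_len": digest_len,
    }


print(describe_cid("bafkreihdwdcefgh4dqkjv67uzcmw7ojee6xedzdetojuzjevtenxquvyku"))
# {'version': 1, 'codec': 'raw', 'hash': 'sha2-256', 'digest_len': 32}
```

A CID built with SHA-3 or a JSON codec parses with the same code path; only the looked-up names differ.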
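Because the identifier is derived from the content (content-addressing), the same bytes produce the same CID regardless of who uploads them, provided the CID version, codec, hash function, and (for files split into blocks) chunking settings are also the same; change any of these and identical bytes get a different CID. In the other direction, equal CIDs imply equal bytes, up to the collision resistance of the hash function.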
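A short sketch of both directions of that claim, assuming base32 CIDv1 over sha2-256: identical bytes under identical settings give identical CIDs, the same bytes under a different codec give a different CID, and a fetched block is checked by re-hashing it and comparing against the digest embedded in the CID. `cid_v1` and `verify` are illustrative helpers, not a library API.

```python
import base64
import hashlib

RAW, JSON = 0x55, 0x0200  # multicodec codes for raw bytes and JSON


def varint(n: int) -> bytes:
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def cid_v1(block: bytes, codec: int = RAW) -> str:
    """Base32 CIDv1 over a sha2-256 multihash (code 0x12, 32-byte digest)."""
    multihash = b"\x12\x20" + hashlib.sha256(block).digest()
    binary = varint(1) + varint(codec) + multihash
    return "b" + base64.b32encode(binary).decode("ascii").lower().rstrip("=")


def verify(cid: str, block: bytes) -> bool:
    """Check fetched bytes against the sha2-256 digest embedded in a CIDv1."""
    body = cid[1:]  # drop the 'b' multibase prefix
    binary = base64.b32decode(body + "=" * (-len(body) % 8), casefold=True)
    # The multihash is the tail: 0x12 (sha2-256), 0x20 (32 bytes), digest.
    if binary[-34:-32] != b"\x12\x20":
        raise ValueError("this sketch only verifies sha2-256 CIDs")
    return binary[-32:] == hashlib.sha256(block).digest()


doc = b'{"title":"report"}'
assert cid_v1(doc) == cid_v1(doc)             # same bytes, same settings
assert cid_v1(doc, RAW) != cid_v1(doc, JSON)  # same bytes, different codec
assert verify(cid_v1(doc), doc)
assert not verify(cid_v1(doc), b'{"title":"forged"}')
```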
Lemma Oracle implementation
Lemma references objects in distributed storage, such as RAG documents, provenance metadata, and license files, by CID. docHash is Lemma's internal hashing surface, while CID is the external surface used for interoperability.
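Each node in a provenance graph links to its predecessor by CID. Because a node's CID covers the predecessor link stored inside it, altering any earlier node changes its CID, so every later node and the head CID would have to change too; the chain is therefore verifiable at the storage layer.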
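The linking pattern can be sketched as a hash chain over an in-memory content-addressed store. Each node is a JSON object holding an event label and its predecessor's CID; `put` stores canonical JSON bytes under their CIDv1 (json codec 0x0200, sha2-256), and `verify_chain` walks from the head back to the first node, re-hashing every block. The node shape, the dict standing in for storage, and all function names are assumptions for illustration; real IPLD data would encode the link as a dag-json `{"/": ...}` link rather than a plain string.

```python
import base64
import hashlib
import json

JSON_CODEC = 0x0200  # multicodec "json"
SHA2_256 = 0x12      # multihash "sha2-256"


def varint(n: int) -> bytes:
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def cid_of(block: bytes) -> str:
    digest = hashlib.sha256(block).digest()
    binary = varint(1) + varint(JSON_CODEC) + varint(SHA2_256) + varint(len(digest)) + digest
    return "b" + base64.b32encode(binary).decode("ascii").lower().rstrip("=")


def put(store: dict, node: dict) -> str:
    # Canonical serialization, so equal nodes always yield equal bytes.
    block = json.dumps(node, sort_keys=True, separators=(",", ":")).encode("utf-8")
    cid = cid_of(block)
    store[cid] = block
    return cid


def build_chain(store: dict, events: list[str]) -> str:
    prev = None
    for event in events:
        prev = put(store, {"event": event, "prev": prev})
    return prev  # the head CID commits to the whole history


def verify_chain(store: dict, head: str) -> list[str]:
    """Walk from head to the first node, rechecking every block against its CID."""
    events, cid = [], head
    while cid is not None:
        block = store[cid]
        if cid_of(block) != cid:
            raise ValueError(f"block {cid} does not match its CID")
        node = json.loads(block)
        events.append(node["event"])
        cid = node["prev"]
    return list(reversed(events))


store = {}
head = build_chain(store, ["ingest", "chunk", "embed"])
print(verify_chain(store, head))  # ['ingest', 'chunk', 'embed']
```

Editing the first event changes its CID, which changes the `prev` field of the next node and therefore every CID after it, so comparing head CIDs is enough to detect a rewritten history.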
Inside the ZK circuit, a CID is not processed as a full string; it is reduced to a compact Poseidon hash (a field element), which is cheap to compute in-circuit. This keeps the interop format (CID) separate from circuit efficiency (Poseidon).
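One common way to feed a CID into a circuit is to first pack its binary form into field elements, which a Poseidon hash then absorbs. The sketch below shows only that packing step and assumes the BN254 scalar field used by many circom/Poseidon setups; Lemma's actual field, chunk size, and encoding are not stated in this document. The Poseidon permutation itself is omitted because a correct implementation depends on parameter-specific round constants and MDS matrices, and `pack_cid`/`unpack_cid` are hypothetical names.

```python
# BN254 (alt_bn128) scalar field modulus; an assumption, not confirmed for Lemma.
BN254_R = 21888242871839275222246405745257275088548364400416034343698204186575808495617
CHUNK = 31  # 31 bytes = 248 bits, always below BN254_R (about 2**253.6)


def pack_cid(cid_bytes: bytes) -> list[int]:
    """Binary CID -> [byte length, chunk_0, chunk_1, ...] as field elements."""
    # The length prefix keeps packing injective: without it, b"\x01" and
    # b"\x00\x01" would both pack to the single element 1.
    elems = [len(cid_bytes)]
    for i in range(0, len(cid_bytes), CHUNK):
        elems.append(int.from_bytes(cid_bytes[i:i + CHUNK], "big"))
    return elems


def unpack_cid(elems: list[int]) -> bytes:
    """Inverse of pack_cid, useful for checking that the encoding round-trips."""
    length, out = elems[0], bytearray()
    for i, value in enumerate(elems[1:]):
        size = min(CHUNK, length - i * CHUNK)
        out += value.to_bytes(size, "big")
    return bytes(out)


# A binary CIDv1 (raw codec, sha2-256) with a dummy all-zero digest.
cid = bytes([0x01, 0x55, 0x12, 0x20]) + bytes(32)
elems = pack_cid(cid)
print(len(elems))  # 3: the length plus chunks of 31 and 5 bytes
assert all(e < BN254_R for e in elems)
assert unpack_cid(elems) == cid
```

The resulting short list of elements would be the input to a single Poseidon call, whose one-element output is what the circuit carries instead of the full CID string.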