TL;DR
Users of an AI service assume the chats they type are handled safely. In January 2025, Wiz Research found that the AI company DeepSeek had left a backend ClickHouse database publicly exposed with no authentication. Anyone could reach it over open ports (8123 / 9000), and it exposed over a million log lines, plaintext chat history, API keys, secret tokens, and backend information. We analyze the incident under Pillar 03 (Agent Authority Proof): access to the AI service’s sensitive-data backend required no authentication or authority verification at all, so network reachability amounted to full retrieval. Wiz disclosed responsibly, and DeepSeek remediated within about 30 minutes. The failure primitive is not a single misconfiguration but the lack of separation between reachability and authorization on the data plane. The incident connects to Brief 056 (guessable credential plus missing authorization), Brief 013 (raw PII kept for compliance becomes the breach surface), Brief 036 (provenance/consent of AI training data), and Brief 006 (the verification gap of a credential’s revocation attribute).
Incident overview
- Subject: DeepSeek’s (AI / LLM service) backend infrastructure
- Discoverer: Wiz Research (during a security assessment of external infrastructure)
- Exposure point: A ClickHouse instance publicly exposed with no authentication (oauth2callback.deepseek.com:9000, dev.deepseek.com:9000). The open ports were 8123 (HTTP) / 9000, and arbitrary SQL queries could be run via a web interface without authentication
- What was exposed: Over a million log lines (the log_stream table), plaintext user chat history, API keys, secret access tokens, backend operational metadata
- Data range: Logs accumulated from at least 2025-01-06 onward
- Response: Wiz disclosed responsibly to DeepSeek; DeepSeek remediated the exposure within about 30 minutes
- The crux: There was no authentication or authority verification for access to the sensitive-data backend; reachability was retrieval
Note: This Brief does not assert the presence or absence of illicit retrieval by third parties; its object of analysis is the structure of absent authentication / authority verification.
Timeline
- 2025-01-06: The exposed database’s logs accumulate from this date onward
- 2025-01: Wiz Research discovers two ClickHouse instances during an external-infrastructure assessment
- 2025-01-29: Wiz discloses responsibly to DeepSeek; DeepSeek remediates within about 30 minutes
How exposure propagates into “reachable = fully retrievable”
This incident stems from a structure in which reachability and authorization to the AI service’s data backend are not separated.
- Unauthenticated external reach: The ClickHouse instance was open externally on public ports (8123 / 9000), reachable over the network without authentication
- Absence of authority verification: For the party that reached it, there was no verification of access authority to the data at all, allowing near-full-control operations
- Full retrieval of sensitive data: Plaintext chat history, API keys, secret tokens, and backend information could be retrieved without further verification. Reachability becomes full retrieval
- Chain of secondary risk: Exposed API keys and tokens can become the starting point for lateral movement and privilege escalation to other systems
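The propagation above can be illustrated with a small model. This is a sketch under stated assumptions, not a reconstruction of DeepSeek’s actual configuration: the `Caller` and `Backend` classes, the `read:<table>` scope naming, and the sample rows are all hypothetical. It contrasts a backend that treats reaching the port as permission with one that checks authority separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caller:
    """Hypothetical caller: what it can reach and what authority it holds."""
    can_reach_port: bool                 # network reachability only
    scopes: frozenset = frozenset()      # authority the caller can show


@dataclass
class Backend:
    """Hypothetical data backend; `tables` maps table name to rows."""
    tables: dict
    require_authorization: bool

    def query(self, caller: Caller, table: str) -> list:
        if not caller.can_reach_port:
            raise ConnectionError("port not reachable")
        # Conflated design: reaching the port is treated as permission.
        if not self.require_authorization:
            return self.tables[table]
        # Separated design: reachability is necessary but not sufficient.
        if f"read:{table}" not in caller.scopes:
            raise PermissionError(f"no authority for read:{table}")
        return self.tables[table]


if __name__ == "__main__":
    tables = {"log_stream": ["<chat line>", "<api key>"]}
    anonymous = Caller(can_reach_port=True)

    exposed = Backend(tables, require_authorization=False)
    print(exposed.query(anonymous, "log_stream"))   # every row is returned

    gated = Backend(tables, require_authorization=True)
    try:
        gated.query(anonymous, "log_stream")
    except PermissionError as err:
        print("denied:", err)
```

In the conflated case, the only control left is the network boundary, so a single open port exposes every table. In the separated case, the same open port yields a denial unless the caller also holds the matching scope.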
Structural analysis
This incident belongs to the identity-auth category under Pillar 03 (Agent Authority Proof). The central failure primitive is that, for the AI service’s sensitive-data backend, the accessing party’s authentication and authority attributes go unverified, so network reachability connects directly to full retrieval. As secondary primitives we note attribute-proof-bypass (bypass of access-authority verification) and data-provenance (handling of chat history as sensitive data).
Brief 036 (PII in public training data) addresses the provenance/consent layer of data; this case concerns a different layer, because what was exposed is the operational data of a running AI service. It shares the “absence of authority verification” primitive with Brief 056 (McHire), but whereas McHire combined a guessable credential with IDOR, DeepSeek is the more fundamental absence of authentication itself. It illustrates a pattern in which, as AI companies prioritize scale, data backends are exposed without reach control and authorization being separated.
The gap between detection and proof
Here the detection chain functioned: Wiz Research’s external scan and responsible disclosure, followed by DeepSeek’s prompt remediation, closed the exposure before any evidence of widespread exploitation emerged. It is an example of continuous external attack-surface monitoring working, and this Brief does not deny its role.
The problem is that an external scan, even when it detects the exposure, operates after the fact, once the exposure already exists. It is not a layer that proves, at the moment of access, that the access rests on legitimate authority. On a backend with no authentication, there is simply no means of distinguishing whether the party that reached it is legitimate. Whether a port was open is something detection can establish; it is not proof that an access was legitimate.
Today, many AI-service data backends operate reach control (network) and authorization (who can access what) without separating them. Pre-execution attestation places, ahead of the data-backend access path, an attribute proof that “the party legitimately holds the authority for this scope,” structurally separating reachability from authorization. Detection (external scans, disclosure) helps limit harm, while pre-execution attestation (authority verification at access time) makes authorization independently verifiable; the two are complementary. For verifying independently before the action, see “Proof-as-Auth: Sign In Without Ever Sending Your Key” (Lemma, 2026-05); for the detection-and-proof thesis, see “The Last Layer Left for Cyber Defense in the Age of AI” (Lemma, 2026-05).
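A minimal sketch of an authority check placed ahead of the access path follows. It is not Lemma’s design: it uses a plain HMAC-signed grant as a stand-in, and `ISSUER_KEY`, `issue_grant`, `verify_grant`, `run_query`, and the `read:<table>` scope format are hypothetical names chosen for illustration. Because HMAC is symmetric, only a holder of the issuer key can verify the grant; independent third-party verification would need an asymmetric signature or a proof system instead.

```python
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical key held by the authorization service, never by callers.
ISSUER_KEY = b"demo-only-issuer-key"


def _sign(claims: dict) -> str:
    """HMAC-SHA256 over a canonical JSON encoding of the claims."""
    payload = json.dumps(claims, sort_keys=True).encode()
    return hmac.new(ISSUER_KEY, payload, hashlib.sha256).hexdigest()


def issue_grant(subject: str, scope: str, ttl_s: int,
                now: Optional[float] = None) -> dict:
    """Issue a signed statement that `subject` holds `scope` until `exp`."""
    now = time.time() if now is None else now
    claims = {"sub": subject, "scope": scope, "exp": now + ttl_s}
    return {"claims": claims, "tag": _sign(claims)}


def verify_grant(grant: dict, required_scope: str,
                 now: Optional[float] = None) -> bool:
    """Check integrity, scope, and expiry before any data is touched."""
    now = time.time() if now is None else now
    if not hmac.compare_digest(_sign(grant["claims"]), grant["tag"]):
        return False  # claims were altered or signed with another key
    claims = grant["claims"]
    return claims["scope"] == required_scope and claims["exp"] > now


def run_query(grant: Optional[dict], table: str) -> str:
    """Gate in front of the backend: no valid grant, no query."""
    if grant is None or not verify_grant(grant, f"read:{table}"):
        raise PermissionError("no valid grant for this scope")
    return f"query on {table} permitted"  # placeholder for the real query
```

The point of the design is ordering: the scope check runs before the query is built, so a caller that merely reaches the endpoint without a grant is rejected regardless of which port is open.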
Response and industry trends
- Discovery and fix: Following Wiz Research’s responsible disclosure, DeepSeek remediated the exposure within about 30 minutes
- Industry point: That a fast-growing AI service could expose a backend holding data as sensitive as chat history with no authentication put basic operational security back in focus
- Secondary risk: Because exposed API keys and secret tokens can be a starting point for lateral movement and privilege escalation, the incident underscored the importance of secret management and rotation
- Geopolitical / regulatory context: Overlapping with debates over national regulation and usage restrictions around DeepSeek, it heightened interest in AI-service data protection and governance
The failure to separate reachability from authorization on a sensitive-data backend is not one vendor’s misconfiguration; it remains an operational-security issue across AI services that prioritize scale.
Lemma’s analysis
DeepSeek exposed a gap: access to the AI service’s sensitive-data backend required no authentication or authority verification, so reachability led directly to full retrieval. Against that gap, Lemma proposes a design that fixes the basis for each access, at the moment it occurs, as an independently verifiable cryptographic proof.
- Pre-execution attestation of authority attributes: Before accessing the data backend, prove as an independently verifiable attribute that “the party legitimately holds the authority for this scope.” Unauthenticated network reach alone does not pass
- Separating reachability from authorization: Structurally separate network reachability from authorization to the data, so an open port does not immediately become full retrieval
- Handling of secrets: Do not hold or expose API keys and tokens in plaintext; move to proof-based authorization (send a proof, not the key)
- Selective disclosure: Prove only that “the access was within the scope of authority,” without exposing the sensitive data itself, such as chat history
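The “send a proof, not the key” idea in the list above can be illustrated with a basic challenge-response exchange. This is a simplified stand-in, not Lemma’s Seal or Proof-as-Auth protocol: `new_challenge`, `prove`, and `check` are hypothetical helpers, and in this sketch the verifier still holds the same secret as the client, which a real proof-based design would avoid. What it does show is that the key itself never crosses the wire and that the response is bound to one requested scope.

```python
import hashlib
import hmac
import secrets


def new_challenge() -> bytes:
    """Verifier side: a fresh random 32-byte challenge per access attempt."""
    return secrets.token_bytes(32)


def prove(key: bytes, challenge: bytes, scope: str) -> str:
    """Client side: respond to the challenge without transmitting the key.

    The response covers the challenge and the requested scope, so it
    cannot be reused for a different scope.
    """
    message = challenge + scope.encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def check(key: bytes, challenge: bytes, scope: str, response: str) -> bool:
    """Verifier side: recompute the expected response and compare safely."""
    expected = prove(key, challenge, scope)
    return hmac.compare_digest(expected, response)


if __name__ == "__main__":
    shared_key = secrets.token_bytes(32)   # provisioned out of band
    challenge = new_challenge()
    response = prove(shared_key, challenge, "read:log_stream")
    print(check(shared_key, challenge, "read:log_stream", response))
```

For replay protection the verifier would also need to accept each challenge only once; that bookkeeping is omitted here. A captured response is useless for any other challenge or scope, unlike a plaintext API key sitting in a log table.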
Proof fixed at the moment of action serves as evidence that can be independently verified later, without disclosing the sensitive data, when the question is “was this access legitimate?” As noted above, detection and pre-execution attestation are complementary: the former limits harm, the latter makes authorization verifiable. For the design and scope see Pillar 03 (Agent Authority Proof) and Seal (send a proof, not the key).
Sources
- Wiz Research: “Wiz Research Uncovers Exposed DeepSeek Database Leaking Sensitive Information, Including Chat History” (primary — the original account of the discovery; 2025-01) — https://www.wiz.io/blog/wiz-research-uncovers-exposed-deepseek-database-leak
- BleepingComputer: “DeepSeek exposes database with over 1 million chat records” (ports, exposed content, remediation) — https://www.bleepingcomputer.com/news/security/deepseek-exposes-database-with-over-1-million-chat-records/
- The Register: “DeepSeek database left open, exposing sensitive info” (2025-01-30; timeline) — https://www.theregister.com/2025/01/30/deepseek_database_left_open/
About Brief distribution
The Lemma Critical Brief is a threat-intelligence brief published by Lemma. This material is a structured analysis of public information and is not an audit, diagnosis, or recommendation for any specific organization. If you use it as a reference for decision-making, please consult your Lemma Critical contact directly.
(c) 2026 FRAME00, INC. — Built for decisions that matter.