B.1 POST · 01 MAY 2026

Privacy in RAG Is an Architecture Problem

PRAG proves E2E-encrypted RAG is technically possible. But it only protects embedding vectors, not your documents.

Encryption doesn't help if the key sits with the provider.

The most important privacy decision for a RAG system is not which encryption scheme you use. It is where your server runs. On-premise deployment and strict tenant isolation solve most of the problem at the architecture level. End-to-end encryption is a useful last mile, but the research shows it does not yet deliver what its proponents claim.

What PRAG actually does

PRAG, a paper by Li, Xu, Cheng, and colleagues published in April 2026, proposes an end-to-end privacy-preserving RAG system. The mechanism is CKKS homomorphic encryption, originally described by Cheon, Kim, Kim, and Song, which allows approximate arithmetic on encrypted vectors. The client encrypts document embeddings before upload. The server performs all similarity calculations (clustering, HNSW graph traversal, ranking) directly on ciphertexts. It never sees a plaintext vector. It never holds the decryption key.

This is technically real. The system achieves 72–74% recall on TriviaQA at 100k documents, with retrieval latency of 1.29 seconds in non-interactive mode (PRAG-I) and 7.91 seconds in interactive mode (PRAG-II). The encrypted HNSW graph resists graph-reconstruction attacks down to 1.1% edge recovery when dummy traversals and periodic re-encryption are enabled.

These are not toy numbers. For an academic proof-of-concept, PRAG demonstrates that fully homomorphic retrieval is practical at small scale. The engineering is careful.

What PRAG does not protect

Here is the gap. PRAG encrypts embedding vectors. It does not encrypt the original documents.

The paper’s protocol stores chunk texts locally on the client (DB_local in their terminology). The server holds encrypted vectors and an encrypted index. After retrieval, the client decrypts the top-k chunk identifiers, fetches the corresponding plaintexts from its local store, and passes them to the LLM.

This means: if you want E2E-encrypted RAG, you keep the documents yourself. You encrypt only the numerical representations that let the server find relevant chunks. The server never sees your data, but it also never processes your data. You have built a remote index over your local filing cabinet.
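The shape of this split is easy to see in code. The sketch below mimics PRAG's data flow only: a random orthogonal matrix stands in for the CKKS secret key (because it preserves inner products, the server can rank vectors it cannot interpret), while chunk texts stay in a client-side store. This is emphatically not encryption, and none of the names come from the paper; it is a toy illustration of who holds what.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Client-side secret: a random orthogonal matrix Q stands in for the CKKS
# secret key. Since (Qx) . (Qy) == x . y, the server can rank by inner
# product without seeing original coordinates. This is NOT encryption,
# let alone CKKS; it only mimics the protocol's data flow.
Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))

def embed(_chunk_id):
    """Illustrative stand-in for an embedding model (unit-norm vectors)."""
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

# DB_local: chunk texts never leave the client.
db_local = {0: "volcano chunk", 1: "treaty chunk", 2: "enzyme chunk"}
embeddings = {i: embed(i) for i in db_local}

# Upload step: only transformed vectors reach the server.
server_index = {i: Q @ v for i, v in embeddings.items()}

def server_top_k(enc_query, k=1):
    """Server side: ranks opaque vectors; holds neither Q nor any text."""
    scores = {i: float(enc_query @ v) for i, v in server_index.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Query: transform locally, receive chunk ids, resolve them in DB_local.
ids = server_top_k(Q @ embeddings[1])
context_chunks = [db_local[i] for i in ids]   # plaintext stays client-side
```

The point of the sketch is the last line: the server's answer is a list of identifiers, and only the client can turn identifiers into text.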

For the use case the paper addresses, a single client outsourcing vector search to an untrusted cloud, this is a coherent design. For the use case most RAG platforms serve, teams and organizations uploading documents to a shared knowledge base and querying them through a hosted LLM pipeline, the design does not map. The documents have to live somewhere the pipeline can read them, or the pipeline cannot do its job.

There are more limitations:

  • Single-key setting. PRAG assumes one client, one secret key. Multi-tenant deployments, the standard for any hosted RAG platform, would require separate encrypted indices per tenant, each with its own key management. The paper does not address this.
  • Performance at scale. 1.29 seconds for 100k vectors. A production knowledge platform with millions of chunks per workspace faces multiplicative overhead. Setup time for 100k vectors is already 400 seconds because encrypted K-Means clustering is expensive.
  • Recall loss. 72–74% versus plaintext baselines. The Chebyshev polynomial approximation that makes non-interactive ranking possible introduces numerical noise. PRAG-II recovers accuracy by adding client roundtrips, at the cost of 6x higher latency and 19x more bandwidth per query.
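The recall loss has a concrete mechanical cause worth seeing. CKKS evaluates only additions and multiplications, so the comparisons inside top-k selection must be replaced by a polynomial approximation of functions like sign(x). The sketch below (degree and interval are illustrative choices, not PRAG's actual parameters) shows where that approximation hurts:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Least-squares Chebyshev fit of sign(x), the building block of
# polynomial comparison under CKKS. Degree 15 over [-1, 1] is an
# illustrative choice, not PRAG's configuration.
xs = np.linspace(-1.0, 1.0, 2001)
sign_poly = C.Chebyshev.fit(xs, np.sign(xs), deg=15)

# Clearly separated scores are ranked correctly: the polynomial is
# close to +1 / -1 well away from zero...
well_separated = abs(sign_poly(0.8) - 1.0)

# ...but near-ties (score difference ~0.01) fall in the region where
# the polynomial is far from +/-1. Near-ties are exactly where ranking
# mistakes, and therefore recall loss, occur.
near_tie = abs(sign_poly(0.01) - 1.0)
```

Interactive mode (PRAG-II) sidesteps this by letting the client do exact comparisons in plaintext, which is where the extra roundtrips, latency, and bandwidth come from.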

None of this is a criticism of the paper. PRAG is honest about its scope and its numbers. The criticism is directed at the claim that this approach solves cloud RAG privacy. It solves one narrow version of the problem, under constraints that exclude most real deployments.

The architectural answer

The alternative is to make the architecture decision first. If your RAG platform runs on a server you control, in a jurisdiction you trust, with tenant isolation that prevents cross-access, you have solved the privacy problem for most regulated use cases, without homomorphic encryption.

This is the reasoning behind Enchilada’s deployment model:

  1. On-premise deployment is available for organizations that cannot accept any third-party hosting. The software runs on their infrastructure, under their network policies, with their key management.
  2. Hosted deployment is available for less sensitive workloads, with the same workspace isolation and RBAC, in a Frankfurt data center under EU jurisdiction.
  3. Workspace-level isolation means each tenant’s documents, indices, and graph structures live in separate storage. No shared index. No cross-tenant queries unless explicitly configured.
  4. Node-level access control within a workspace’s knowledge graph gives granular control over who can see which entities and relations.
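A hypothetical sketch of what points 3 and 4 look like in code. All class and method names here are illustrative, not Enchilada's actual API; the structural claim is just that isolation is an object boundary (separate storage per tenant) and node-level access is a check inside that boundary:

```python
from dataclasses import dataclass, field

@dataclass
class Workspace:
    """Illustrative tenant: private storage, no shared index."""
    name: str
    index: dict = field(default_factory=dict)     # tenant-private entities
    node_acl: dict = field(default_factory=dict)  # entity id -> allowed roles

    def query(self, user_role: str, entity_id: str):
        # Node-level check inside the tenant boundary: even a valid
        # workspace member only sees entities their role is granted.
        if user_role not in self.node_acl.get(entity_id, set()):
            raise PermissionError(f"{user_role} may not read {entity_id}")
        return self.index[entity_id]

# Two tenants are two objects with disjoint storage. There is no code
# path from one workspace's query to the other's data.
acme = Workspace("acme")
globex = Workspace("globex")
acme.index["contract-42"] = "draft contract text"
acme.node_acl["contract-42"] = {"legal"}
```

The design choice this encodes: cross-tenant access is impossible by construction rather than forbidden by a filter, which is the difference between "no shared index" and "a shared index with a WHERE clause."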

This does not solve every threat model. An insider at the hosting provider can, in principle, access plaintext data on the server. A subpoena could compel disclosure. These are real risks. But they are the same risks that exist for any database, any file server, any email system running in a data center, and the mitigation for those systems has always been jurisdictional control, contractual guarantees, and audit trails, not homomorphic encryption of the storage layer.

The difference between this approach and the encryption approach is where the trust boundary sits. With architecture-level isolation, you trust the operator of the infrastructure, but you pick the operator, you pick the jurisdiction, and you can replace both. With homomorphic encryption, you trust the mathematics and the implementation of the cryptographic library. Neither trust boundary is zero-width. The question is which one you can verify, change, and audit in practice. For most organizations running RAG in production, the answer is the operator, not the crypto.

The mailbox problem

The use case that makes this concrete: enterprise mailbox integration with RAG. Organizations want to make their email archives queryable, not just searchable by keyword, but answerable by an LLM that understands context, threads, and intent.

Email is among the most sensitive data any organization holds. It contains contract negotiations, personnel matters, legal correspondence, and strategic discussions. Putting a mailbox into a cloud-hosted RAG system means uploading all of that to a server operated by someone else, under a jurisdiction the organization may not control.

PRAG-style encryption would protect the embedding vectors of those emails. The actual email text, the content that matters, would still need to live somewhere the RAG pipeline can access it for chunk delivery to the LLM. You have encrypted the index but not the payload.

The honest answer for this use case is on-premise deployment with strict access controls. The emails stay on infrastructure the organization controls. The knowledge graph that structures them is isolated per workspace. RBAC determines who can query what. The architecture solves the problem at the level where it actually exists.

Why the research still matters

PRAG is not irrelevant. The direction is important for specific threat models: regulated industries that must outsource computation but cannot trust the compute provider. The CKKS approach to encrypted similarity search will improve. Latency will decrease as hardware support for homomorphic operations matures. Multi-key schemes will emerge.

But today, for a knowledge platform serving EU organizations, the practical privacy stack is:

  1. Jurisdiction control: server runs under EU law
  2. Tenant isolation: no shared state between customers
  3. RBAC within tenants: granular access to graph entities
  4. Encrypt-at-rest and TLS-in-transit: standard data protection
  5. Audit logging: every query traceable
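Point 5 deserves one concrete note: "every query traceable" is only meaningful if the log itself is tamper-evident. A minimal sketch of one common approach, a hash chain where each entry commits to its predecessor (field names are illustrative, not any product's schema):

```python
import hashlib
import json
import time

log = []

def record_query(user: str, workspace: str, query: str) -> None:
    """Append a log entry that hashes the previous entry's hash."""
    prev = log[-1]["hash"] if log else "genesis"
    entry = {"ts": time.time(), "user": user, "ws": workspace,
             "q": query, "prev": prev}
    entry["hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()).hexdigest()
    log.append(entry)

def verify_chain() -> bool:
    """Recompute every hash; any edit or deletion breaks the chain."""
    prev = "genesis"
    for e in log:
        body = {k: v for k, v in e.items() if k != "hash"}
        expected = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        if e["prev"] != prev or e["hash"] != expected:
            return False
        prev = e["hash"]
    return True

record_query("alice", "acme", "Q3 contract terms")
record_query("bob", "acme", "vendor shortlist")
```

Editing or dropping any recorded query invalidates every subsequent hash, so an auditor can detect tampering without trusting the operator's word.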

Homomorphic encryption of the retrieval index is not on this list, because the threat it addresses, a compromised or malicious cloud operator reading your data, is better mitigated by running on infrastructure you control in the first place.

Encryption doesn’t help if the key sits with the provider.

The same logic applies to the server itself. If you control the server, you control the keys. If you do not control the server, no encryption layer removes the need to trust the operator. Architecture first, cryptography second. This is not a design preference. It is the engineering reality of how RAG systems handle documents today.

What needs to change for E2E to work

If homomorphic encryption becomes fast enough to encrypt not just vectors but the full document retrieval pipeline (chunks, rankings, and LLM context delivery), then the threat model shifts. At that point, you could run a RAG system on untrusted infrastructure without exposing any plaintext. The research would need to solve three problems:

  • Full-document encryption through the LLM call. The LLM itself would need to operate on encrypted context, or the decryption would need to happen in a trusted execution environment the provider cannot inspect. Neither exists in production today.
  • Multi-key support for multi-tenant deployments. Each tenant’s data encrypted under their own key, with the server unable to correlate across tenants even at the index level.
  • Performance at millions-of-chunks scale with sub-second latency. Today’s numbers, 1.3 seconds at 100k, are two to three orders of magnitude from production requirements.
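A back-of-envelope check on that last gap, under a stated and pessimistic assumption: that encrypted query cost grows roughly linearly with index size (HNSW traversal itself is closer to logarithmic, but dummy traversals and re-encryption add work that tracks the index). The workspace size and latency budget are illustrative:

```python
# Pessimistic linear-scaling estimate; an assumption, not PRAG's
# measured behavior beyond 100k vectors.
latency_100k_s = 1.29      # PRAG-I, 100k vectors (from the paper)
chunks = 5_000_000         # illustrative production workspace
budget_s = 0.5             # illustrative sub-second target

scaled = latency_100k_s * (chunks / 100_000)   # ~64.5 s if linear
gap = scaled / budget_s                        # ~129x shortfall
```

Even if real scaling is gentler than linear, the shortfall sits around two orders of magnitude, which is why hardware acceleration rather than parameter tuning is the plausible path to closing it.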

These are hard problems. They will take years. In the meantime, the architectural answer, on-prem where required, hosted in your jurisdiction where acceptable, with strict isolation in both cases, is the honest solution. It does not promise what the research cannot yet deliver. As we argued in Retrieval Is Not Memory, honest boundaries in the architecture produce better engineering than blurry claims. The same principle applies here. Privacy in RAG is not a feature you add. It is a constraint you build for from the start, by deciding where the data lives before you decide how to encrypt it.


Sources