RXP Module
RAG Retrieval Poisoning (RXP) measures whether malicious documents rank highly in semantic retrieval, that is, whether poisoned documents are likely to be retrieved when querying a retrieval-augmented generation (RAG) system.
What is Retrieval Poisoning?
RAG systems augment language models by retrieving relevant documents from a knowledge base before answering questions. A poisoned document contains malicious instructions (like a jailbreak or exploit) disguised as legitimate content. The vulnerability: if a poisoned document ranks in the top-k retrieval results, the language model may follow malicious instructions. RXP tests whether your knowledge base documents are vulnerable to retrieval poisoning attacks.
Optional Dependencies
RXP requires additional dependencies that are not installed by default. If they are missing, the launcher shows a warning with the install command.
Attack Scenario
- Attacker poisons document — Injects malicious instructions into a legitimate-looking document
- Document gets indexed — Poisoned document is embedded and added to knowledge base
- User queries — A user asks a question like “What is our password reset policy?”
- Retrieval happens — RAG system retrieves documents semantically similar to the query
- Poisoned document ranks high — If the poisoned document is in top-k results, it reaches the LLM
- Malicious instruction executes — LLM follows the poisoned instruction instead of the legitimate policy
What RXP Validates
RXP measures retrieval rank effectiveness. For each query in a profile:
- Create a poisoned version of a legitimate document
- Embed the poisoned document
- Run the user query against the knowledge base
- Record the rank position of the poisoned document
- Calculate retrieval effectiveness (does it appear in top-k?)
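The steps above can be sketched in a few lines. This is a minimal illustration, not RXP's implementation: it uses hand-made 3-dimensional vectors in place of real sentence-transformers embeddings, and the names `cosine`, `poison_rank`, and the document ids are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def poison_rank(query_vec, corpus, poison_id, top_k=3):
    """Return (rank, score, in_top_k) for the poisoned document.

    corpus maps document id -> embedding vector; rank is 1-based.
    """
    scored = sorted(
        ((cosine(query_vec, vec), doc_id) for doc_id, vec in corpus.items()),
        key=lambda pair: pair[0],
        reverse=True,
    )
    for rank, (score, doc_id) in enumerate(scored, start=1):
        if doc_id == poison_id:
            return rank, score, rank <= top_k
    raise KeyError(poison_id)

# Toy "embeddings" for illustration only (assumed values, not model output).
corpus = {
    "vacation-policy": [0.1, 0.9, 0.1],
    "password-reset": [0.9, 0.1, 0.2],
    "password-reset-poisoned": [0.95, 0.05, 0.1],
    "expense-policy": [0.1, 0.2, 0.9],
}
query = [1.0, 0.0, 0.1]  # stands in for "What is our password reset policy?"

rank, score, in_top_k = poison_rank(query, corpus, "password-reset-poisoned", top_k=2)
print(rank, in_top_k)  # 1 True
```

In this toy corpus the poisoned copy sits slightly closer to the query than the legitimate policy, so it takes rank 1 and would reach the LLM.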
Key Concepts
Domain Profiles
Domain profiles define standard queries for a knowledge domain, for example an HR policy profile.
Embedding Models
RXP uses local sentence-transformers models to convert text to vectors. Three models are registered by default:
- minilm-l6 — MiniLM-L6-v2, 384 dimensions (default)
- minilm-l12 — MiniLM-L12-v2, 384 dimensions
- bge-small — BGE small English v1.5, 384 dimensions
Retrieval Effectiveness
Measured as:
- Rank position — Where the poisoned document appears (1 = best, 100+ = not retrieved)
- Relevance score — Cosine similarity between query and poisoned document (0-1)
- Inclusion in top-k — Whether poisoned document is in top-k results
Interpretive Bands
RXP v0.5.0+ maps retrieval rates to severity levels:
- Critical — ≥75% of queries retrieve the poisoned document (very high risk)
- High — ≥50% retrieval rate
- Medium — ≥25% retrieval rate
- Low — Below 25% retrieval rate (poisoning ineffective)
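The band thresholds above can be expressed as a small lookup. This is a sketch under one assumption: the retrieval rate is the fraction of queries whose top-k results included the poisoned document. The function names `retrieval_rate` and `severity_band` are hypothetical and not part of RXP's API.

```python
def retrieval_rate(hits):
    """Fraction of queries whose top-k results included the poisoned document."""
    if not hits:
        raise ValueError("at least one query result is required")
    return sum(1 for hit in hits if hit) / len(hits)

def severity_band(rate):
    """Map a retrieval rate in [0, 1] to the bands listed above."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    if rate >= 0.75:
        return "Critical"
    if rate >= 0.50:
        return "High"
    if rate >= 0.25:
        return "Medium"
    return "Low"

hits = [True, True, False, True]  # 3 of 4 queries retrieved the poison
rate = retrieval_rate(hits)
print(f"{rate:.0%} -> {severity_band(rate)}")  # 75% -> Critical
```

Thresholds are inclusive at the lower bound, so a rate of exactly 75% is Critical and exactly 25% is Medium.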
Workflow
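1. Prepare Knowledge Base
Gather documents for your domain.
2. Create Poisoned Document
Embed malicious instructions in a legitimate-looking document.
3. Run Validation
Run the profile queries against the knowledge base and record where the poisoned document ranks.
4. Interpret Results
Compare the rank positions and retrieval rate against the interpretive bands.
5. Mitigate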
- Lower-rank results — Poisoning is less effective; current defenses are working
- Higher-rank results — Poisoning succeeds; investigate why the document ranks highly and consider content filtering or switching embedding models
- Different queries — Test with varied questions to confirm robustness across query patterns
- Different models — Try alternative embedding models to evaluate whether the vulnerability is model-specific or systemic
RXP → IPI Pipeline
In the Test Document Ingestion workflow, RXP retrieval results can gate IPI payload generation. When RXP is enabled (via the “Pre-validate retrieval rank with RXP” toggle in the launcher, or rxp_enabled: true in the workflow config), the workflow runs RXP first, then passes the retrieval results to IPI as a gate.
How Gating Works
Gating is document-level, not per-query. The decision logic:
- RXP disabled — IPI generates all payloads (default behavior, no gating)
- RXP enabled, zero retrieval — no queries retrieved the poison document. IPI skips generation entirely and marks all queries as non-viable
- RXP enabled, any retrieval — at least one query retrieved the poison document. IPI generates all payloads and annotates which queries were non-viable in the result
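The three cases above can be written as one decision function. This is a sketch only: `GateDecision`, `gate_ipi`, and the shape of the `rxp_hits` mapping are hypothetical names chosen for illustration, not the workflow's actual interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class GateDecision:
    generate: bool                                        # should IPI generate payloads?
    non_viable: List[str] = field(default_factory=list)   # queries that missed the poison

def gate_ipi(rxp_hits: Optional[Dict[str, bool]]) -> GateDecision:
    """Document-level gate.

    rxp_hits maps each query to whether it retrieved the poison document;
    None means RXP is disabled.
    """
    if rxp_hits is None:
        # RXP disabled: no gating, generate everything.
        return GateDecision(generate=True)
    non_viable = [query for query, hit in rxp_hits.items() if not hit]
    if not any(rxp_hits.values()):
        # Zero retrieval: skip generation and mark every query non-viable.
        return GateDecision(generate=False, non_viable=non_viable)
    # At least one hit: generate all payloads, annotate the misses.
    return GateDecision(generate=True, non_viable=non_viable)

print(gate_ipi({"reset policy?": True, "vacation days?": False}))
```

Because the gate is document-level, a single retrieving query is enough to generate payloads for every query; the misses are only annotated.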
Graceful Degradation
- If RXP fails (dependency missing, embedding model error, etc.), IPI runs ungated — same as if RXP were disabled
- If RXP optional dependencies are not installed, the launcher shows a warning with the install command
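The fallback behavior can be pictured as a wrapper that turns any RXP failure into "no gate". This is a hedged sketch: `run_rxp_or_none` and the convention that `None` means "run IPI ungated" are assumptions made for illustration.

```python
def run_rxp_or_none(run_rxp):
    """Call run_rxp(); on any failure return None so the caller runs IPI ungated."""
    try:
        return run_rxp()
    except Exception as exc:  # e.g. ImportError for missing extras, model load errors
        print(f"RXP unavailable ({exc!r}); continuing without gating")
        return None

def broken_rxp():
    # Simulates the optional dependencies not being installed.
    raise ImportError("optional RXP dependencies not installed")

result = run_rxp_or_none(broken_rxp)
print(result)  # None
```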
RXP pre-validation uses an ephemeral ChromaDB collection, not the production RAG system. Retrieval results are an approximation — a document that ranks well in the ephemeral collection may behave differently in the target system’s actual retrieval pipeline.
When To Use RXP
- Measuring whether a specific RAG system is vulnerable to document poisoning
- Comparing embedding models to see if poisoning results are consistent across models
- Identifying which document/query pairs are most susceptible to poisoning
- Validating retrieval-layer defenses against adversarial documents
What to Expect
RXP is automated and deterministic. You provide a corpus, poison documents, queries, and an embedding model, and you get a report showing whether your poison documents are retrieved and at what rank. No manual iteration is required. New domain profiles and embedding models can be added without code changes.
Next Steps
- Read the CLI reference for available commands
- Explore models and profiles to choose your testing setup
- Review interpretive bands to understand scoring