RXP Module Architecture

The RXP (RAG Retrieval Poisoning) module measures whether adversarial documents appear in top-k retrieval results. It validates that a poisoned document ranks highly for target queries, which indicates that the document would reach the LLM context window in a real RAG system. RXP depends on optional packages (sentence-transformers, chromadb); install them with pip install "q-uestionable-ai[rxp]".

Module Structure

rxp/
├── validator.py         # validate_retrieval() — core validation logic
├── embedder.py          # Embedding model interface (sentence-transformers)
├── collection.py        # ChromaDB collection management
├── registry.py          # Embedding model registry (list_models, resolve_model)
├── models.py            # CorpusDocument, ValidationResult, QueryResult, ModelConfig
├── profiles/            # Built-in domain profiles (YAML + corpus/poison .txt files)
├── db.py                # RXP-specific DB operations
├── cli.py               # validate, list-models, list-profiles commands
├── adapter.py           # RXPAdapter for orchestrator integration
├── mapper.py            # persist_validation() — bridges to core DB
└── _deps.py             # require_rxp_deps() — lazy import guard
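
A lazy import guard such as require_rxp_deps() typically checks for the optional packages only when an RXP command actually runs, so the rest of the tool works without them. The sketch below is one way to do that, not the module's actual code: the helper name require_deps, its parameters, and the error wording are all assumptions made for illustration.

```python
import importlib.util


def require_deps(modules, extra):
    """Raise ImportError with an install hint if any module is missing.

    Hypothetical helper: the real require_rxp_deps() may work differently.
    `modules` holds top-level import names, `extra` is the pip extra to suggest.
    """
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    if missing:
        raise ImportError(
            f"Missing optional dependencies: {', '.join(missing)}. "
            f'Install them with: pip install "q-uestionable-ai[{extra}]"'
        )


def require_rxp_deps_sketch():
    # Import names for the two packages the RXP extra pulls in.
    require_deps(["sentence_transformers", "chromadb"], extra="rxp")
```

Calling the guard at the top of each command keeps the heavy imports out of module load time and gives users a single, actionable error message.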

Validation Pipeline

validate_retrieval() is the core function. It runs five steps:

1. Ingest corpus: Load legitimate documents and poison documents into a ChromaDB collection.
2. Embed: Generate embeddings for all documents using the selected model (sentence-transformers).
3. Query: For each query, retrieve the top-k documents by cosine similarity.
4. Measure: Check whether the poison document appears in the top-k results.
5. Return: A ValidationResult with retrieval_rate, mean_poison_rank, and per-query details.
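
The pipeline above can be illustrated with a dependency-free sketch. It is not the real validate_retrieval(): a toy bag-of-words embedding stands in for sentence-transformers, an in-memory list stands in for the ChromaDB collection, and the function name, parameters, and dictionary output are assumptions chosen for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def validate_retrieval_sketch(corpus, poison, queries, top_k=3):
    # Steps 1-2: ingest legitimate and poison documents, then embed them.
    docs = [(t, False) for t in corpus] + [(t, True) for t in poison]
    embedded = [(embed(t), is_poison) for t, is_poison in docs]

    results = []
    for query in queries:
        # Step 3: rank all documents by cosine similarity, keep the top k.
        q = embed(query)
        ranked = sorted(embedded, key=lambda d: cosine(q, d[0]), reverse=True)
        top = ranked[:top_k]
        # Step 4: 1-based rank of the first poison document, if any.
        rank = next((i + 1 for i, (_, p) in enumerate(top) if p), None)
        results.append(
            {"query": query, "poison_retrieved": rank is not None, "poison_rank": rank}
        )

    # Step 5: aggregate into the summary metrics.
    hits = [r["poison_rank"] for r in results if r["poison_retrieved"]]
    return {
        "poison_retrievals": len(hits),
        "total_queries": len(results),
        "retrieval_rate": len(hits) / len(results) if results else 0.0,
        "mean_poison_rank": sum(hits) / len(hits) if hits else None,
        "query_results": results,
    }
```

Note that mean_poison_rank is averaged only over queries where the poison was retrieved, matching the "when retrieved" qualifier in the data model.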

Key Data Models

CorpusDocument: id, text, source (file path), is_poison (boolean flag)
ValidationResult: model_id, poison_retrievals (count), total_queries, retrieval_rate (float 0–1, where 1.0 = 100%), mean_poison_rank (when retrieved), query_results (list of QueryResult)
QueryResult: query (string), poison_retrieved (boolean), poison_rank (int or None)
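
As a rough picture of how these models fit together, the fields listed above can be written as dataclasses. The field names come from this page; the type annotations, defaults, and the summarize() helper are assumptions for illustration, and the real definitions in models.py may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CorpusDocument:
    id: str
    text: str
    source: str              # file path the text was loaded from
    is_poison: bool = False


@dataclass
class QueryResult:
    query: str
    poison_retrieved: bool
    poison_rank: Optional[int] = None   # None when the poison was not retrieved


@dataclass
class ValidationResult:
    model_id: str
    poison_retrievals: int
    total_queries: int
    retrieval_rate: float               # 0.0 to 1.0
    mean_poison_rank: Optional[float]   # None when nothing was retrieved
    query_results: List[QueryResult] = field(default_factory=list)


def summarize(model_id, query_results):
    """Hypothetical helper: derive the aggregate fields from per-query results."""
    ranks = [q.poison_rank for q in query_results
             if q.poison_retrieved and q.poison_rank is not None]
    total = len(query_results)
    return ValidationResult(
        model_id=model_id,
        poison_retrievals=len(ranks),
        total_queries=total,
        retrieval_rate=len(ranks) / total if total else 0.0,
        mean_poison_rank=sum(ranks) / len(ranks) if ranks else None,
        query_results=list(query_results),
    )
```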

Domain Profiles

Built-in profiles provide pre-configured corpus documents, poison documents, and test queries for common domains. Each profile is a YAML file under profiles/ referencing .txt files for corpus and poison content. Use qai rxp list-profiles to see available profiles. Use --profile on validate to load one.

Embedding Model Registry

Models are registered in registry.py. Each ModelConfig has an id, name, dimensions, and description. Use qai rxp list-models to see available models. Default model: minilm-l6 (MiniLM-L6-v2, local, no API key needed). Use --model all to compare across all registered models.
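
A registry of this shape might look like the sketch below. The function names list_models and resolve_model and the ModelConfig fields come from this page, but their signatures, the list return type, and the error behaviour are assumptions. The dimensions value of 384 is the output size of the sentence-transformers all-MiniLM-L6-v2 model, which is assumed to be the model meant by MiniLM-L6-v2.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ModelConfig:
    id: str
    name: str
    dimensions: int
    description: str


DEFAULT_MODEL_ID = "minilm-l6"

# Illustrative registry with only the default model filled in.
_REGISTRY: Dict[str, ModelConfig] = {
    "minilm-l6": ModelConfig(
        id="minilm-l6",
        name="MiniLM-L6-v2",
        dimensions=384,
        description="Local model, no API key needed",
    ),
}


def list_models() -> List[ModelConfig]:
    """Return every registered model."""
    return list(_REGISTRY.values())


def resolve_model(model_id: str = DEFAULT_MODEL_ID) -> List[ModelConfig]:
    """Resolve an id to model configs; "all" expands to the full registry."""
    if model_id == "all":
        return list_models()
    try:
        return [_REGISTRY[model_id]]
    except KeyError:
        known = ", ".join(sorted(_REGISTRY))
        raise ValueError(f"Unknown model '{model_id}'. Known models: {known}") from None
```

Returning a list in both cases lets the caller loop over models the same way whether one model or all of them were requested.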

Severity Thresholds

The RXP adapter maps retrieval rates to severity levels for findings:

Rate (0–1)	Percentage	Severity
≥ 0.75	≥ 75%	CRITICAL
≥ 0.50	≥ 50%	HIGH
≥ 0.25	≥ 25%	MEDIUM
< 0.25	< 25%	LOW

See Interpretive Bands for details.
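
The threshold table translates directly into a small mapping function. The thresholds and severity names below are taken from the table; the function name and the range check are assumptions, and the adapter's own implementation may be structured differently.

```python
def severity_for_rate(rate: float) -> str:
    """Map a retrieval rate in [0, 1] to a severity label."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError(f"retrieval rate must be between 0 and 1, got {rate}")
    if rate >= 0.75:
        return "CRITICAL"
    if rate >= 0.50:
        return "HIGH"
    if rate >= 0.25:
        return "MEDIUM"
    return "LOW"
```

For example, a poison document retrieved for 3 of 5 queries has a rate of 0.6 and would be reported as HIGH.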

Adapter

RXPAdapter wraps validate_retrieval() for orchestrator workflows. Because embedding is CPU-bound work, the adapter runs the validation via asyncio.to_thread(), then persists the results and emits findings with severity based on the retrieval rate thresholds. It uses best_effort error handling.
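
The pattern of offloading blocking work from an async orchestrator can be sketched with the standard library alone. Everything here other than asyncio.to_thread() is hypothetical: blocking_validate is a stand-in for validate_retrieval(), the result dictionaries are invented for illustration, and catching the exception and returning an error status is only one reading of what best_effort means.

```python
import asyncio


def blocking_validate(queries):
    """Hypothetical stand-in for validate_retrieval().

    Pretends that every other query retrieves the poison document.
    """
    hits = sum(1 for i, _ in enumerate(queries) if i % 2 == 0)
    return {"retrieval_rate": hits / len(queries) if queries else 0.0}


async def run_adapter_sketch(queries):
    try:
        # Run the blocking call in a worker thread so the event loop stays free.
        result = await asyncio.to_thread(blocking_validate, queries)
    except Exception as exc:
        # Assumed best-effort behaviour: report the failure, do not raise.
        return {"status": "error", "error": str(exc), "findings": []}
    finding = {"retrieval_rate": result["retrieval_rate"]}
    return {"status": "ok", "findings": [finding]}


if __name__ == "__main__":
    print(asyncio.run(run_adapter_sketch(["q1", "q2", "q3", "q4"])))
```

asyncio.to_thread() requires Python 3.9 or later.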
