The RXP (RAG Retrieval Poisoning) module measures whether adversarial documents appear in top-k retrieval results. It validates that a poisoned document ranks highly for target queries, which indicates that it would reach the LLM context window in a real RAG system. RXP requires optional dependencies (sentence-transformers, chromadb); install them with pip install "q-uestionable-ai[rxp]".

Module Structure

rxp/
├── validator.py         # validate_retrieval() — core validation logic
├── embedder.py          # Embedding model interface (sentence-transformers)
├── collection.py        # ChromaDB collection management
├── registry.py          # Embedding model registry (list_models, resolve_model)
├── models.py            # CorpusDocument, ValidationResult, QueryResult, ModelConfig
├── profiles/            # Built-in domain profiles (YAML + corpus/poison .txt files)
├── db.py                # RXP-specific DB operations
├── cli.py               # validate, list-models, list-profiles commands
├── adapter.py           # RXPAdapter for orchestrator integration
├── mapper.py            # persist_validation() — bridges to core DB
└── _deps.py             # require_rxp_deps() — lazy import guard
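
As an illustration of the lazy import guard idea behind _deps.py, here is a minimal sketch. Only the name require_rxp_deps() and the two package names come from this page; the helper require_deps(), the error type, and the message wording are assumptions, and the real module may work differently.

```python
import importlib.util


def require_deps(modules, extra):
    """Raise a helpful ImportError if any optional module is not installed.

    Hypothetical helper: checks availability without importing the modules.
    """
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    if missing:
        raise ImportError(
            f"Missing optional dependencies: {', '.join(missing)}. "
            f'Install them with: pip install "q-uestionable-ai[{extra}]"'
        )


def require_rxp_deps():
    # Assumed body: the import names of sentence-transformers and chromadb.
    require_deps(["sentence_transformers", "chromadb"], extra="rxp")
```

Calling the guard at the start of each RXP entry point, rather than at import time, would keep the rest of the package usable without the extra installed.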

Validation Pipeline

validate_retrieval() is the core function:
  1. Ingest corpus: Load legitimate documents and poison documents into a ChromaDB collection
  2. Embed: Generate embeddings for all documents using the selected model (sentence-transformers)
  3. Query: For each query, retrieve top-k documents by cosine similarity
  4. Measure: Check whether the poison document appears in the top-k results
  5. Return: ValidationResult with retrieval_rate, mean_poison_rank, and per-query details
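
The five steps can be sketched in plain Python. This is not the module's implementation: a toy bag-of-words embedding stands in for sentence-transformers, an in-memory list stands in for the ChromaDB collection, and the function name validate_retrieval_sketch, its parameters, and the dict it returns are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding; a stand-in for a sentence-transformers model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def validate_retrieval_sketch(corpus, poison, queries, top_k=3):
    # 1. Ingest: tag each document with whether it is poison.
    docs = [(text, False) for text in corpus] + [(text, True) for text in poison]
    # 2. Embed every document once.
    index = [(embed(text), is_poison) for text, is_poison in docs]
    query_results = []
    for query in queries:
        # 3. Query: rank documents by cosine similarity and keep the top k.
        q = embed(query)
        ranked = sorted(index, key=lambda item: cosine(q, item[0]), reverse=True)[:top_k]
        # 4. Measure: 1-based rank of the first poison document, if any.
        rank = next((i + 1 for i, (_, is_p) in enumerate(ranked) if is_p), None)
        query_results.append(
            {"query": query, "poison_retrieved": rank is not None, "poison_rank": rank}
        )
    # 5. Return aggregate metrics plus per-query details.
    ranks = [r["poison_rank"] for r in query_results if r["poison_retrieved"]]
    return {
        "retrieval_rate": len(ranks) / len(queries) if queries else 0.0,
        "mean_poison_rank": sum(ranks) / len(ranks) if ranks else None,
        "query_results": query_results,
    }
```

A poison document that echoes the wording of a target query will score well under any similarity measure, which is the effect the real pipeline measures with dense embeddings.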

Key Data Models

CorpusDocument: id, text, source (file path), is_poison (boolean flag)
ValidationResult: model_id, poison_retrievals (count), total_queries, retrieval_rate (float 0–1, where 1.0 = 100%), mean_poison_rank (when retrieved), query_results (list of QueryResult)
QueryResult: query (string), poison_retrieved (boolean), poison_rank (int or None)
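
One way to picture these models is as dataclasses. The field names below come from the list above; the type annotations, the defaults, and the summarize() helper are assumptions, and the actual models.py may use a different base class or derive the aggregates elsewhere.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CorpusDocument:
    id: str
    text: str
    source: str              # file path
    is_poison: bool = False


@dataclass
class QueryResult:
    query: str
    poison_retrieved: bool
    poison_rank: Optional[int] = None   # None when the poison document was not retrieved


@dataclass
class ValidationResult:
    model_id: str
    poison_retrievals: int
    total_queries: int
    retrieval_rate: float               # 0-1, where 1.0 = 100%
    mean_poison_rank: Optional[float]   # only meaningful when something was retrieved
    query_results: List[QueryResult] = field(default_factory=list)


def summarize(model_id, query_results):
    """Hypothetical helper: derive the aggregate fields from per-query results."""
    ranks = [q.poison_rank for q in query_results
             if q.poison_retrieved and q.poison_rank is not None]
    total = len(query_results)
    return ValidationResult(
        model_id=model_id,
        poison_retrievals=len(ranks),
        total_queries=total,
        retrieval_rate=len(ranks) / total if total else 0.0,
        mean_poison_rank=sum(ranks) / len(ranks) if ranks else None,
        query_results=list(query_results),
    )
```

For example, four queries with the poison document retrieved at ranks 1 and 3 give a retrieval_rate of 0.5 and a mean_poison_rank of 2.0.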

Domain Profiles

Built-in profiles provide pre-configured corpus documents, poison documents, and test queries for common domains. Each profile is a YAML file under profiles/ referencing .txt files for corpus and poison content. Use qai rxp list-profiles to see available profiles. Use --profile on validate to load one.

Embedding Model Registry

Models are registered in registry.py. Each ModelConfig has an id, name, dimensions, and description. Use qai rxp list-models to see available models. Default model: minilm-l6 (MiniLM-L6-v2, local, no API key needed). Use --model all to compare across all registered models.
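
A registry along these lines could be sketched as follows. The ModelConfig fields, the minilm-l6 id, and the function names list_models and resolve_model come from this page; the signatures, the return types, the handling of "all", and the description text are assumptions. The dimensions value of 384 is the embedding size of the public all-MiniLM-L6-v2 model, which this entry is assumed to refer to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    id: str
    name: str
    dimensions: int
    description: str


# Only minilm-l6 is named in the docs; a real registry would hold more entries.
_REGISTRY = {
    "minilm-l6": ModelConfig(
        id="minilm-l6",
        name="MiniLM-L6-v2",
        dimensions=384,  # assumed: size of all-MiniLM-L6-v2 embeddings
        description="Local sentence-transformers model, no API key needed",
    ),
}

DEFAULT_MODEL = "minilm-l6"


def list_models():
    """Return every registered model configuration."""
    return list(_REGISTRY.values())


def resolve_model(model_id=DEFAULT_MODEL):
    """Return the configs to run: every model for "all", otherwise exactly one."""
    if model_id == "all":
        return list_models()
    try:
        return [_REGISTRY[model_id]]
    except KeyError:
        known = ", ".join(sorted(_REGISTRY))
        raise ValueError(f"Unknown model {model_id!r}; known models: {known}") from None
```

Returning a list in both cases lets the caller loop over the result the same way whether one model or all of them were requested.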

Severity Thresholds

The RXP adapter maps retrieval rates to severity levels for findings:
Rate (0–1)    Percentage    Severity
≥ 0.75        ≥ 75%         CRITICAL
≥ 0.50        ≥ 50%         HIGH
≥ 0.25        ≥ 25%         MEDIUM
< 0.25        < 25%         LOW
See Interpretive Bands for details.
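
The threshold mapping is simple enough to express directly. The function name severity_for_rate and the range check are assumptions; the cut-offs are the ones listed above.

```python
def severity_for_rate(retrieval_rate):
    """Map a retrieval rate in [0, 1] to a severity label (illustrative helper)."""
    if not 0.0 <= retrieval_rate <= 1.0:
        raise ValueError(f"retrieval_rate must be between 0 and 1, got {retrieval_rate}")
    if retrieval_rate >= 0.75:
        return "CRITICAL"
    if retrieval_rate >= 0.50:
        return "HIGH"
    if retrieval_rate >= 0.25:
        return "MEDIUM"
    return "LOW"
```

Checking the highest band first means each boundary value falls into the more severe band, matching the ≥ comparisons in the table.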

Adapter

RXPAdapter wraps validate_retrieval() for orchestrator workflows. Because embedding is CPU-bound work, the adapter runs validation via asyncio.to_thread() so that it does not block the event loop. It then persists the results and emits findings whose severity follows the retrieval rate thresholds. The adapter uses best_effort error handling.
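
The threading pattern can be sketched as below. This is not RXPAdapter itself: validate_retrieval_stub and run_adapter are hypothetical names, persistence and finding emission are omitted, and "best effort" is interpreted here as "log the failure and return nothing instead of aborting the workflow", which is an assumption about what best_effort means. asyncio.to_thread() requires Python 3.9 or later.

```python
import asyncio
import logging

logger = logging.getLogger("rxp.adapter.sketch")


def validate_retrieval_stub(queries):
    """Stand-in for validate_retrieval(); pretends every other query retrieved poison."""
    hits = sum(1 for i, _ in enumerate(queries) if i % 2 == 0)
    return {"retrieval_rate": hits / len(queries), "total_queries": len(queries)}


async def run_adapter(queries, validate=validate_retrieval_stub):
    try:
        # The blocking call runs in a worker thread so the event loop stays responsive.
        result = await asyncio.to_thread(validate, queries)
    except Exception:
        # Assumed best-effort behaviour: log the error and report no result.
        logger.exception("RXP validation failed")
        return None
    return result
```

An orchestrator could then await run_adapter(...) alongside other coroutines and treat a None result as "no findings from this module".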