The import module brings external tool results into qai’s unified findings table. Findings from Garak, PyRIT, or any SARIF-producing security tool are parsed, normalized with taxonomy bridging, and stored alongside native audit and inject findings. Imported findings can then inform inject template selection in the assess workflow.
| Format | Tool | Input | What Gets Extracted |
|---|---|---|---|
| `garak` | Garak | JSONL report | Eval-level summaries — probe, detector, pass rate, taxonomy tags |
| `pyrit` | PyRIT | JSON conversation export | Final scored results — orchestrator, score value/type, labels |
| `sarif` | Any SARIF producer | SARIF 2.1.0 JSON | Results with rule metadata, severity, and message text |
## CLI Usage

```bash
# Import a Garak report
qai import report.jsonl --format garak

# Import with target association (enables workflow integration)
qai import report.jsonl --format garak --target my-target-id

# Preview what would be imported without writing to the database
qai import report.jsonl --format garak --dry-run

# Import PyRIT results
qai import conversations.json --format pyrit --target my-target-id

# Import SARIF output from any tool
qai import results.sarif --format sarif
```
## Flags

| Flag | Short | Description |
|---|---|---|
| `--format` | `-f` | Required. Source format: `garak`, `pyrit`, or `sarif` |
| `--target` | `-t` | Target ID to associate imported findings with. Enables imported findings to inform inject template selection in workflows |
| `--dry-run` | | Parse and display what would be imported without writing to the database |
## Taxonomy Bridging
External tools use their own vulnerability taxonomies. qai maps these to its internal audit categories where a meaningful equivalent exists.
Each mapping has a confidence level:
| Confidence | Meaning |
|---|---|
| `direct` | The external category maps directly to a qai audit category |
| `adjacent` | Related but not identical — the qai category covers overlapping concerns |
| `none` | No infrastructure-level equivalent in qai (typically model-level concerns) |
### Current OWASP LLM Top 10 Mappings

| External ID | OWASP LLM Top 10 | qai Category | Confidence |
|---|---|---|---|
| LLM01 | Prompt Injection | prompt_injection | direct |
| LLM02 | Insecure Output Handling | token_exposure | adjacent |
| LLM03 | Training Data Poisoning | — | none |
| LLM04 | Model Denial of Service | — | none |
| LLM05 | Supply Chain Vulnerabilities | — | none |
| LLM06 | Sensitive Information Disclosure | permissions | adjacent |
| LLM07 | Insecure Plugin Design | — | none |
| LLM08 | Excessive Agency | — | none |
| LLM09 | Overreliance | — | none |
| LLM10 | Model Theft | — | none |
Entries with `none` confidence are preserved in the database with their original taxonomy IDs but don’t map to a qai category. They won’t influence inject template selection (which matches on qai categories), but they’re visible as imported findings.
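The bridging table above amounts to a lookup from external taxonomy ID to a qai category and confidence. A minimal sketch in Python — the names `TAXONOMY_BRIDGE` and `bridge` and the tuple layout are illustrative, not qai internals:

```python
# Illustrative sketch of the OWASP LLM Top 10 -> qai bridging table.
# Dict name and tuple layout are assumptions, not qai's implementation.
TAXONOMY_BRIDGE = {
    "LLM01": ("prompt_injection", "direct"),
    "LLM02": ("token_exposure", "adjacent"),
    "LLM06": ("permissions", "adjacent"),
    # LLM03-05 and LLM07-10 have no infrastructure-level equivalent.
}

def bridge(external_id: str):
    """Return (qai_category, confidence); unmapped IDs get (None, "none")."""
    return TAXONOMY_BRIDGE.get(external_id, (None, "none"))
```

Unmapped entries still carry their original ID through to storage; only the qai-category side is empty.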
## How Target Association Works
The --target flag connects imported findings to a qai target. When you later run an assess workflow against the same target, the inject adapter queries both native audit findings and imported findings for that target to inform template selection. This means external model-testing results from Garak or PyRIT can guide which inject payloads are prioritized.
Without --target, findings are stored but isolated — they won’t inform any workflow.
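As a sketch of the association behavior, assuming a SQLite findings table with `target_id`, `source`, and `category` columns (a hypothetical schema, not qai's actual one):

```python
import sqlite3

def findings_for_target(db: sqlite3.Connection, target_id: str):
    """Collect native and imported findings for one target.

    Schema (a findings table with target_id/source/category columns)
    is a hypothetical illustration of the documented behavior.
    """
    rows = db.execute(
        "SELECT source, category FROM findings "
        "WHERE target_id = ? AND source IN ('audit', 'inject', 'import')",
        (target_id,),
    ).fetchall()
    # Findings imported without --target would have target_id = NULL,
    # never match this query, and so stay isolated from workflows.
    return rows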
## Provenance
Every import creates a parent run record (module=import) with metadata tracking:
- Tool name and version — detected from the input file (Garak version from the `start_run` setup entry; PyRIT version not available in export format; SARIF tool version from `runs[].tool.version`)
- Parser version — qai version at the time of import
- Source file checksum — SHA-256 of the imported file for integrity verification
Raw evidence and metadata are stored as evidence records on the import run, preserving the original data alongside the normalized findings.
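The source-file checksum recorded in provenance is a standard SHA-256 digest. Computing one looks like this (the function name is illustrative):

```python
import hashlib

def file_sha256(path: str) -> str:
    """SHA-256 of a file, read in chunks so large reports stay cheap."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Re-running the digest over the original file and comparing it to the stored value verifies the import hasn't drifted from its source.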
## What Each Parser Expects

### Garak

A JSONL file where each line is a JSON object. The parser looks for:

- A `start_run` setup entry (required — validates the file is a Garak report)
- `eval` entries with probe name, detector name, pass/fail counts, and optional `owasp_llm` taxonomy tags
- `attempt` entries, which are counted but not imported (stored in raw evidence)
Severity is derived from the detector pass rate: ≤ 25% pass → Critical, ≤ 50% → High, ≤ 75% → Medium, < 100% → Low, 100% → Info.
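The pass-rate bands above can be sketched as a small function (name and lowercase return values are illustrative):

```python
def garak_severity(passed: int, total: int) -> str:
    """Map a Garak detector pass rate to a severity bucket.

    Bands follow the documented thresholds: <=25% pass -> Critical,
    <=50% -> High, <=75% -> Medium, <100% -> Low, 100% -> Info.
    """
    rate = passed / total
    if rate <= 0.25:
        return "Critical"
    if rate <= 0.50:
        return "High"
    if rate <= 0.75:
        return "Medium"
    if rate < 1.0:
        return "Low"
    return "Info"
```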
### PyRIT

A JSON file containing an array of conversation objects. Each conversation should include:

- A `score` object with `score_value` and `score_type` fields
- An `orchestrator` field identifying the test orchestrator
- Optional `labels` for taxonomy preservation
Severity mapping depends on score_type: boolean true/false scores map to High/Info; numeric scores use threshold bands.
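A sketch of that mapping. The boolean High/Info split follows the description above; the numeric bands and the `score_type` string values shown are placeholder assumptions, since the actual thresholds aren't documented here:

```python
def pyrit_severity(score_type: str, score_value) -> str:
    """Map a PyRIT score to severity.

    Boolean -> High/Info follows the doc; the numeric bands and the
    "true_false" type string are assumptions, not qai's actual values.
    """
    if score_type == "true_false":
        return "High" if str(score_value).lower() == "true" else "Info"
    value = float(score_value)  # numeric score, assumed on a 0.0-1.0 scale
    if value >= 0.75:
        return "High"
    if value >= 0.5:
        return "Medium"
    if value > 0.0:
        return "Low"
    return "Info"
```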
### SARIF

A SARIF 2.1.0 JSON file with standard structure. The parser extracts:

- Results from `runs[].results[]` with rule ID, message text, and severity
- Rule metadata from `runs[].tool.driver.rules[]` for descriptions and taxonomy
- Tool name and version from `runs[].tool.driver`
Severity uses the `security-severity` rule property (CVSS-style thresholds) when available, falling back to the SARIF `level` field.
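A sketch of that fallback chain. The CVSS-style bands follow the widely used `security-severity` convention (≥ 9.0 Critical, ≥ 7.0 High, ≥ 4.0 Medium); the `level`-to-severity table is an assumed illustration, not necessarily qai's exact mapping:

```python
def sarif_severity(rule_properties: dict, level: str) -> str:
    """Severity from `security-severity` bands, else from the SARIF level.

    The level fallback (error/warning/note) is an assumed illustration.
    """
    raw = rule_properties.get("security-severity")
    if raw is not None:
        score = float(raw)  # CVSS-style scale, 0.0-10.0
        if score >= 9.0:
            return "Critical"
        if score >= 7.0:
            return "High"
        if score >= 4.0:
            return "Medium"
        return "Low"
    return {"error": "High", "warning": "Medium", "note": "Low"}.get(level, "Info")
```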
qai maintains best-effort parsers for current versions of Garak and PyRIT. External tool output formats may change between releases. If a parser fails on a newer format version, file an issue.
## Next Steps