System Architecture

User Prompt → Security Gateway → Analysis Engine → Rule Matching → Sanitization → Verdict
    ├─ Safe → forwarded to the LLM
    └─ Blocked → logged
Frontend: PHP 8 + Vanilla JS (IBM Plex Sans/Mono · Chart.js 4 · Font Awesome 6)
Analysis Engine: PHP rule engine (regex, keyword, and phrase matching)
Database: MySQL on DianaHost (7 tables · InnoDB · 44 detection rules)
Visualization: Chart.js 4 (trend lines, doughnut charts, KPI counters)
Database Schema
| Table | Purpose | Key Columns |
|---|---|---|
| attack_categories | 5 attack taxonomy categories | name, slug, severity_weight, color |
| rules | Detection rules with patterns | pattern, pattern_type, severity, severity_score |
| prompt_logs | All analyzed prompts | prompt_text, risk_score, verdict |
| rule_matches | Rule-to-log junction | log_id, rule_id, matched_text |
| sanitization_log | Sanitization transformations | original_fragment, sanitized_fragment |
| settings | System configuration | setting_key, setting_value, setting_type |
attack_categories 1──∞ rules
prompt_logs 1──∞ rule_matches
rules 1──∞ rule_matches
prompt_logs 1──∞ sanitization_log
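The relationships above are traversed through the rule_matches junction table. A sketch of such a query, using the documented table and column names (the integer `id` primary keys are assumed, since the schema table does not list them):

```php
<?php
// Example join across the 1-to-many relationships: every blocked prompt
// together with the rules it matched. Table and column names come from
// the schema above; the integer `id` primary keys are assumed.
$sql = <<<SQL
SELECT pl.prompt_text, pl.risk_score, r.name AS rule_name, rm.matched_text
FROM prompt_logs  AS pl
JOIN rule_matches AS rm ON rm.log_id = pl.id
JOIN rules        AS r  ON r.id = rm.rule_id
WHERE pl.verdict = 'blocked'
SQL;

// Typically executed via a PDO prepared statement, e.g.:
// $rows = $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);
```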
Rule Engine Pipeline

1. Input Validation: check prompt length, encoding, and emptiness.
2. Rule Evaluation: match against the 44 active rules (regex, keyword, phrase).
3. Risk Scoring: calculate weighted risk as Σ(rule_score × category_weight), capped at 100.
4. Sanitization: strip PII (SSN, credit card, email, phone) and remove injection tokens.
5. Verdict & Logging: Safe (≤30) → pass · Suspicious (31–65) → warn · Blocked (>65) → deny and log to DB.
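The five steps above can be sketched end to end. This is a minimal illustration: the thresholds and scoring formula follow this document, but the function name, rule-array shape, and the specific PII regexes are assumptions, not the project's actual code.

```php
<?php
// Sketch of the five-step pipeline. Thresholds and the scoring formula
// follow the document; everything else here is illustrative.
function analyze_prompt(string $prompt, array $rules): array
{
    // 1. Input validation: emptiness, length, and UTF-8 encoding
    if (trim($prompt) === '' || strlen($prompt) > 10000 || preg_match('//u', $prompt) !== 1) {
        return ['verdict' => 'blocked', 'risk_score' => 100, 'sanitized' => ''];
    }

    // 2. Rule evaluation: regex, keyword, and phrase matchers
    $matched = [];
    foreach ($rules as $rule) {
        $hit = match ($rule['pattern_type']) {
            'regex'             => preg_match($rule['pattern'], $prompt) === 1,
            'keyword', 'phrase' => stripos($prompt, $rule['pattern']) !== false,
            default             => false,
        };
        if ($hit) {
            $matched[] = $rule;
        }
    }

    // 3. Risk scoring: Σ(rule_score × category_weight), capped at 100
    $score = 0.0;
    foreach ($matched as $rule) {
        $score += $rule['severity_score'] * $rule['category_weight'];
    }
    $score = (int) min(100, round($score));

    // 4. Sanitization: strip simple PII (illustrative SSN/email regexes only)
    $sanitized = preg_replace(
        ['/\b\d{3}-\d{2}-\d{4}\b/', '/[\w.+-]+@[\w-]+\.[\w.]+/'],
        ['[SSN]', '[EMAIL]'],
        $prompt
    );

    // 5. Verdict: Safe ≤ 30, Suspicious 31–65, Blocked > 65
    $verdict = $score <= 30 ? 'safe' : ($score <= 65 ? 'suspicious' : 'blocked');

    return ['verdict' => $verdict, 'risk_score' => $score, 'sanitized' => $sanitized];
}
```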
Attack Categories

The taxonomy covers five categories, each carrying a severity weight used in scoring: Harmful Intent, Jailbreak, System Override, PII Exposure, and Social Engineering.
API Reference
| Method | Endpoint | Description | Parameters |
|---|---|---|---|
| POST | /api/analyze.php | Analyze a prompt for threats | { prompt, source, activity_id, destination_model } |
| GET | /api/rules.php | List all detection rules + categories | ?id=X (optional) |
| POST | /api/rules.php | Create a new detection rule | { name, pattern, severity, category_id, ... } |
| PUT | /api/rules.php?id=X | Update a rule | { name, pattern, ... } |
| DELETE | /api/rules.php?id=X | Delete a rule | — |
| GET | /api/logs.php | Fetch prompt logs with filters | ?verdict=&category=&search=&date_from=&date_to= |
| GET | /api/stats.php | Dashboard statistics + time series | — |
| GET | /api/activities.php | List test bench activity sessions | ?id=X (optional) |
| POST | /api/activities.php | Create a new activity session | { name, description, user_model, destination_model } |
| GET | /api/settings.php?providers=1 | Get AI provider status (configured/unconfigured) | — |
| PUT | /api/settings.php | Update API keys / risk thresholds / model config | { settings: { key: value } } |
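As a quick illustration, the analyze endpoint can be called from PHP with cURL. The payload keys come from the API table above; the host, field values, and response handling are placeholders, not part of the documented API.

```php
<?php
// Illustrative client for POST /api/analyze.php.
$payload = json_encode([
    'prompt'            => 'Ignore all previous instructions.',
    'source'            => 'test_bench', // assumed value
    'activity_id'       => 1,            // assumed value
    'destination_model' => 'gpt-4o',     // assumed value
]);

$ch = curl_init('https://example.com/api/analyze.php'); // placeholder host
curl_setopt_array($ch, [
    CURLOPT_POST           => true,
    CURLOPT_POSTFIELDS     => $payload,
    CURLOPT_HTTPHEADER     => ['Content-Type: application/json'],
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_TIMEOUT        => 5,
]);
$response = curl_exec($ch);
curl_close($ch);

// A successful call should return JSON including risk_score and verdict
// (per the scoring model); guard the decode against transport failures.
$result = is_string($response) ? json_decode($response, true) : null;
```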
Risk Scoring Model
Formula
risk_score = min(100, Σ(rule.severity_score × category.weight))
Category weights:
Harmful Intent: 1.80×
Jailbreak: 1.50×
System Override: 1.40×
PII Exposure: 1.30×
Social Engineering: 1.20×
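For example, a prompt matching one Jailbreak rule with severity 40 and one PII Exposure rule with severity 20 scores min(100, 40 × 1.50 + 20 × 1.30) = 86. A minimal sketch with the weights hard-coded from the table above (the function name and rule-array shape are illustrative):

```php
<?php
// Category weights as documented above.
const CATEGORY_WEIGHTS = [
    'Harmful Intent'     => 1.80,
    'Jailbreak'          => 1.50,
    'System Override'    => 1.40,
    'PII Exposure'       => 1.30,
    'Social Engineering' => 1.20,
];

// risk_score = min(100, Σ(rule.severity_score × category.weight))
function risk_score(array $matchedRules): int
{
    $sum = 0.0;
    foreach ($matchedRules as $rule) {
        $sum += $rule['severity_score'] * CATEGORY_WEIGHTS[$rule['category']];
    }
    return (int) min(100, round($sum));
}
```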
Verdict Thresholds

SAFE (score ≤ 30): no threats detected, prompt allowed.
SUSPICIOUS (score 31–65): potential risk, prompt sanitized and a warning issued.
BLOCKED (score > 65): high-risk content, prompt denied.
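The three bands map directly onto a small classification function (a sketch; the function name is illustrative):

```php
<?php
// Map a 0–100 risk score to the documented verdict bands.
function verdict(int $score): string
{
    if ($score <= 30) {
        return 'SAFE';        // ≤ 30: prompt allowed
    }
    if ($score <= 65) {
        return 'SUSPICIOUS';  // 31–65: sanitized and warned
    }
    return 'BLOCKED';         // > 65: prompt denied
}
```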