AI GUARDRAILS

Policy enforcement inside the AI path.

Guardrails should not be optional code snippets bolted onto each application. In Kosmoy they run inside the Gateway path, applied the same way across every app, assistant, RAG system, and agent.

Three layers of enforcement: built-in configurable categories, customer-specific custom rules, and fine-tuned small language models for fast pattern checks at scale.

Each guardrail decision is an event. Allow. Block. Redact. Route to human review. Log. Every event lands in the Insights Dashboard and the AI Act dossier — searchable, exportable, and connected to the use case.
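As a minimal sketch of what such an event might carry — the field names and values below are illustrative assumptions, not Kosmoy's actual schema:

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

# Illustrative guardrail event record. Field names and example values
# are assumptions for this sketch, not Kosmoy's real event schema.
@dataclass
class GuardrailEvent:
    rule: str       # which guardrail rule fired
    outcome: str    # allow | block | redact | review | log
    use_case: str   # ties the event back to its use case / dossier
    model: str      # model that served the request
    user: str       # requesting user
    timestamp: str  # when the decision was made

event = GuardrailEvent(
    rule="pii.email",
    outcome="redact",
    use_case="support-assistant",
    model="example-model",
    user="u-123",
    timestamp=datetime.now(timezone.utc).isoformat(),
)
print(asdict(event)["outcome"])  # redact
```

Because each event is a flat, structured record, it can be indexed for search, exported, and joined to its use case.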


Six guardrail types.

Toxic content (Built-in)

Hate speech, harassment, violence. Configurable thresholds. Multilingual.

PII detection (Built-in + fine-tuned SLM)

Email addresses, phone numbers, government IDs, payment data. Block, redact, or log.

Prompt injection (Built-in + fine-tuned SLM)

Detect attempts to override system instructions, exfiltrate prompts, or break the chain of trust.

EU AI Act risk (Built-in)

Article 5 prohibited practices. Article 50 transparency triggers. Custom risk categories per organization.

Custom regex / list (Custom)

Internal product names, vendor names, regulated terms, blocklists, allowlists.

Custom LLM-judge (Custom)

Bring your own judge prompt and evaluator model. For domain-specific policies that don't fit the built-in categories.
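The custom regex / list type is the simplest to picture. A minimal sketch, assuming a hypothetical blocklist of internal product names (the terms and function names here are invented for illustration):

```python
import re

# Illustrative blocklist guardrail. The blocked terms are hypothetical
# examples, not real Kosmoy rules.
BLOCKLIST = re.compile(r"\b(project\s+atlas|vendor\s+x)\b", re.IGNORECASE)

def check(prompt: str) -> str:
    """Return 'block' if a blocklisted term appears, else 'allow'."""
    return "block" if BLOCKLIST.search(prompt) else "allow"

print(check("Tell me about Project Atlas pricing"))  # block
print(check("What's the weather today?"))            # allow
```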


Five outcomes.

Every rule produces one of five decisions: allow, block, redact, review (route to human), log. Configurable per rule, per route, per environment.
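One way to picture per-rule, per-route, per-environment configuration — the structure and keys below are a sketch, not Kosmoy's actual config format:

```python
# Illustrative outcome policy keyed by (rule, route, environment).
# Structure and names are assumptions for this sketch.
POLICY = {
    ("pii.email", "support-assistant", "prod"): "redact",
    ("pii.email", "support-assistant", "dev"):  "log",
    ("toxicity",  "support-assistant", "prod"): "block",
}

def outcome(rule: str, route: str, env: str) -> str:
    # Fall back to human review when no explicit policy is set,
    # so an unconfigured rule fails safe rather than silently allowing.
    return POLICY.get((rule, route, env), "review")

print(outcome("pii.email", "support-assistant", "prod"))  # redact
print(outcome("pii.email", "support-assistant", "dev"))   # log
```

The same rule can redact in production and merely log in development, without touching application code.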


Module questions, answered straight.

Are guardrails optional?

Per use case. Some apps need full enforcement; some need logging-only; some need none. The Gateway lets the platform team configure guardrails per route, per app, per environment.

How fast are the fast-path checks?

Built-in regex and list checks are sub-10ms. Fine-tuned SLM-based guardrails are sub-200ms. Frontier-model judges (when explicitly enabled) are slower and used only where policy demands them.

Can guardrails redact instead of block?

Yes. Each guardrail rule can be configured to any of the five outcomes — allow, block, redact, review (route to human), log. Most PII rules in production redact + log; toxic content rules block + log; injection attempts block + log + alert.
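A redact-and-log outcome for an email PII rule might look like this — the regex and names are a sketch, not Kosmoy internals:

```python
import re

# Illustrative email pattern for a redact outcome. Pattern, placeholder,
# and function name are assumptions for this sketch.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def redact_emails(text: str) -> tuple[str, int]:
    """Replace emails with a placeholder; return text and redaction count."""
    redacted, count = EMAIL.subn("[REDACTED:email]", text)
    return redacted, count

out, n = redact_emails("Contact alice@example.com for details.")
print(out)  # Contact [REDACTED:email] for details.
print(n)    # 1
```

Redaction lets the request proceed with the sensitive span removed, while the count feeds the logged event.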

What happens to the evidence?

Every guardrail decision is an event with the input, the rule that triggered, the outcome, the model, the user, the timestamp. Events land in the Insights Dashboard and the AI Act dossier.

See guardrails in production.

Walk through built-in, custom, and SLM-based enforcement on real prompts.