SAFETY

Guardrails

Guardrails inside the Kosmoy LLM Gateway let organizations define and enforce policies on every prompt and response, powered by Kosmoy's proprietary, ultra-efficient Small Language Models (SLMs) or by any foundation LLM you choose.

Enforced at the gateway level, Kosmoy guardrails satisfy GDPR and EU AI Act requirements while preserving content integrity. Dedicated, proprietary Kosmoy Small Language Models (SLMs) scan for toxicity, Personally Identifiable Information (PII), EU AI Act violations, and prompt-injection attempts, at a fraction of the cost of full-size models. You can also create custom guardrails with your own keywords or regexes and run them on any foundation LLM of your choice.
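
To make the pattern concrete, here is a minimal sketch of what gateway-level enforcement can look like. Every name in it (Guardrail, Verdict, GatewayPipeline, enforce) is an illustrative placeholder under assumed semantics, not Kosmoy's actual API:

    from dataclasses import dataclass

    @dataclass
    class Verdict:
        allowed: bool
        reason: str = ""

    class Guardrail:
        """One policy check, e.g. toxicity, PII, or a custom rule."""
        def check(self, text: str) -> Verdict:
            raise NotImplementedError

    class GatewayPipeline:
        """Runs every prompt and response through the configured guardrails."""
        def __init__(self, guardrails: list[Guardrail]):
            self.guardrails = guardrails

        def enforce(self, text: str) -> Verdict:
            # The first failing guardrail blocks the text before it travels further.
            for guardrail in self.guardrails:
                verdict = guardrail.check(text)
                if not verdict.allowed:
                    return verdict
            return Verdict(allowed=True)

Because policies live in one enforce() step, they are defined once and applied to every application that routes through the gateway.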

SAFETY

BENEFITS

Gateway-Level Guardrails

FINE-TUNED SLMs FOR EFFICIENT GUARDRAILS

Purpose-built Small Language Models (SLMs) for Toxicity, Personally Identifiable Information (PII), EU AI Act compliance and Prompt Injection scan every prompt at the Gateway level, blocking profanity, hate speech, sensitive-data leaks, EU high-risk uses and jailbreak exploits. Because filtering runs inside the Gateway, every chatbot or API inherits the same shield with zero duplicated code and no added latency.
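
Building on the hypothetical GatewayPipeline sketch above, the helper below illustrates why gateway-level filtering means zero duplicated code: both the prompt and the model's response pass through the same checks, and any chatbot or API that routes its calls through the gateway inherits them automatically. gateway_completion and llm_call are assumed names for illustration, not real Kosmoy functions:

    def gateway_completion(pipeline: GatewayPipeline, llm_call, prompt: str) -> str:
        # Scan the prompt on the way in.
        verdict = pipeline.enforce(prompt)
        if not verdict.allowed:
            return "Blocked by guardrail: " + verdict.reason
        # Call whichever foundation LLM the application has chosen.
        response = llm_call(prompt)
        # Scan the response on the way out.
        verdict = pipeline.enforce(response)
        if not verdict.allowed:
            return "Blocked by guardrail: " + verdict.reason
        return response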

CUSTOMIZABLE RULES

Custom guardrails give you total flexibility to implement your own policies: drag and drop your own terms, regexes or external compliance endpoints into the policy editor, then decide which Small Language Model (SLM) or foundation Large Language Model (LLM) enforces them.
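
As one illustration of such a custom rule, the sketch below (reusing the hypothetical Guardrail and Verdict types from the first example) blocks any text that matches user-supplied regexes. The credit-card pattern is deliberately simplistic, an assumption for demonstration rather than a production-grade detector:

    import re

    class RegexGuardrail(Guardrail):
        """A custom rule built from your own terms or regexes."""
        def __init__(self, name: str, patterns: list[str]):
            self.name = name
            self.patterns = [re.compile(p, re.IGNORECASE) for p in patterns]

        def check(self, text: str) -> Verdict:
            for pattern in self.patterns:
                if pattern.search(text):
                    return Verdict(allowed=False, reason="matched custom rule: " + self.name)
            return Verdict(allowed=True)

    # Example: a naive rule for 13-16 digit card-like numbers.
    credit_card_rule = RegexGuardrail("credit-card", [r"\b(?:\d[ -]?){13,16}\b"])
    pipeline = GatewayPipeline([credit_card_rule])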

SAFETY AND SECURITY

Guardrails Categories

Each guardrail category ships with a Kosmoy-tuned SLM out of the box, and can be paired with any other LLM or a fully custom rule with one click in the Gateway UI (a sketch of one possible policy configuration follows the list below).

  • Toxic Content Guardrails filter out messages containing harmful or inappropriate content, such as violence, self-harm, explicit material, drugs, weapons, and illegal activities. These guardrails help maintain a safe and professional environment by ensuring that all communications stay free of toxic or dangerous content.

  • PII Guardrails ensure compliance with GDPR by preventing the sharing of personally identifiable information (PII), including names, dates of birth, US Social Security numbers, addresses, location data, email addresses, and credit card information. These safeguards protect sensitive data and ensure your AI applications adhere to stringent privacy regulations.

  • Malicious Code and URL Guardrails protect your GenAI systems by blocking the sharing of harmful code, suspicious URLs, and phishing links. These guardrails help prevent malicious activities such as code injection, malware distribution, and unauthorized access, ensuring your AI applications remain secure and free from external threats.

  • Hallucination Guardrails are designed to detect and flag responses from the LLM that are likely fabrications or inaccuracies. By identifying content that deviates from factual data or trusted sources, these guardrails help ensure the reliability of your AI-generated responses, reducing the risk of misleading or incorrect information.

  • EU AI Act Guardrails rely on a purpose-tuned Small Language Model (SLM) that flags prompts and responses falling under the EU AI Act's prohibited or high-risk categories, such as social scoring or biometric identification, so you can auto-block, log or escalate before the request reaches production.

  • Adversarial Attack Guardrails protect the GenAI system from manipulation attempts, such as context-switching attacks or ‘Do Anything Now’ exploits. These guardrails maintain the integrity of interactions by preventing users from bypassing constraints or tricking the AI into executing unintended actions, ensuring the system stays secure and focused on its intended tasks.
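
One plausible shape for the resulting policy is sketched below as a hypothetical Python mapping; the model identifiers and action values are invented for illustration and are not Kosmoy product names:

    # Each category pairs a default Kosmoy-tuned SLM with an enforcement action;
    # any entry could instead point at a foundation LLM or a custom rule.
    guardrail_policy = {
        "toxic_content":      {"model": "kosmoy-slm-toxicity",   "action": "block"},
        "pii":                {"model": "kosmoy-slm-pii",        "action": "block"},
        "malicious_code_url": {"model": "kosmoy-slm-security",   "action": "block"},
        "hallucination":      {"model": "kosmoy-slm-factuality", "action": "flag"},
        "eu_ai_act":          {"model": "kosmoy-slm-eu-ai-act",  "action": "escalate"},
        "adversarial_attack": {"model": "kosmoy-slm-jailbreak",  "action": "block"},
    }

The action values mirror the options described above: block a request outright, flag and log it, or escalate it for human review.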
