SAFETY
Guardrails
Guardrails within Kosmoy Studio are a set of tools and protocols that allow organizations to define and enforce policies on the type of content that can be passed to and from Large Language Models (LLMs).
Guardrails are essential for ensuring compliance with strict data protection regulations like GDPR while upholding content integrity. They automatically filter out profanity, hate speech, and sensitive topics such as weapons or drugs. In addition, Kosmoy’s guardrails offer advanced protection against malicious attacks, including prompt injection, and help reduce LLM hallucinations—keeping your AI systems secure and reliable.
BENEFITS
Key features
CONTENT FILTERING
Automatically screen and block predefined categories of sensitive information, ensuring that personally identifiable information (PII), confidential corporate data, or inappropriate content never compromises your GenAI interactions.
CUSTOMIZABLE RULES
Tailor the guardrail system to meet your specific requirements. Whether it’s compliance with industry-specific regulations or alignment with corporate values, you have the flexibility to set the parameters that best protect your organization and your users.
SAFETY AND SECURITY
Guardrails Types
Kosmoy Studio provides multiple types of guardrails to safeguard your GenAI application:
Toxic Content Guardrails filter out messages containing harmful or inappropriate content, such as violence, self-harm, explicit material, drugs, weapons, and illegal activities. These guardrails help maintain a safe and professional environment by ensuring that all communications stay free of toxic or dangerous content.
PII Guardrails ensure compliance with GDPR by preventing the sharing of personally identifiable information (PII), including names, dates of birth, US Social Security numbers, addresses, location data, email addresses, and credit card information. These safeguards protect sensitive data and ensure your AI applications adhere to stringent privacy regulations.
Malicious Code and URL Guardrails protect your GenAI systems by blocking the sharing of harmful code, suspicious URLs, and phishing links. These guardrails help prevent malicious activities such as code injection, malware distribution, and unauthorized access, ensuring your AI applications remain secure and free from external threats.
Hallucination Guardrails are designed to detect and flag responses from the LLM that are likely fabrications or inaccuracies. By identifying content that deviates from factual data or trusted sources, these guardrails help ensure the reliability of your AI-generated responses, reducing the risk of misleading or incorrect information.
Off-Topic Guardrails ensure that the GenAI system stays focused by preventing it from being asked irrelevant questions or providing off-topic responses. These guardrails keep interactions aligned with the intended context, ensuring that the system delivers accurate and relevant information, while filtering out distractions or unrelated content.
Adversarial Attack Guardrails protect the GenAI system from manipulation attempts, such as context-switching attacks or ‘Do Anything Now’ exploits. These guardrails maintain the integrity of interactions by preventing users from bypassing constraints or tricking the AI into executing unintended actions, ensuring the system stays secure and focused on its intended tasks.