RETRIEVAL-AUGMENTED GENERATION
From raw files to a working chatbot in half a day
No-code ingestion pipelines. Reliable retrievers. Any cloud or on-prem asset.
Stop burning weeks wiring up and hand-coding pipelines and retrievers. Kosmoy's RAG-in-a-box solution takes you from a bunch of files in an object store to a production-grade RAG chatbot in hours — governed, observable, and ready for scale.

NO-CODE INGESTION PIPELINES
What it does: Connect your object stores and file repos, pick an embedding model, choose the chunking strategy, and load into your vector DB—no Python required.
Typical sources: AWS S3, Azure Blob Storage, and on-prem S3-compatible stores, plus productivity and data platforms like OneDrive, Google Drive, Databricks, and Snowflake.
Why it matters: You get a repeatable pipeline from “raw files” to “searchable vectors” that your assistants can use immediately.
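For a sense of what this replaces: hand-rolled ingestion usually looks something like the Python below. This is a minimal sketch, not Kosmoy code; the bucket name, the embedding model, and the naive fixed-size chunking are all illustrative.

```python
"""Minimal sketch of the glue code a no-code ingestion pipeline replaces.
Bucket, prefix, and embedding model are illustrative choices."""
import boto3
from openai import OpenAI

s3 = boto3.client("s3")
llm = OpenAI()  # any embedding provider works; OpenAI shown for brevity


def iter_documents(bucket: str, prefix: str = ""):
    """Yield (key, text) for every object under the prefix."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
            yield obj["Key"], body.decode("utf-8", errors="ignore")


def embed(texts: list[str]) -> list[list[float]]:
    resp = llm.embeddings.create(model="text-embedding-3-small", input=texts)
    return [d.embedding for d in resp.data]


for key, text in iter_documents("my-docs-bucket"):
    # Naive fixed-size chunking; see the chunking section below for better options.
    chunks = [text[i:i + 1000] for i in range(0, len(text), 1000)]
    vectors = embed(chunks)
    # ...upsert (key, chunk, vector, metadata) into your vector DB
```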
CHUNKING & ENRICHMENT THAT KEEP MEANING
What it does: Choose Max-size, Hierarchical, or Hybrid chunking. Capture standard and custom metadata. Track records and deletions to keep the index clean.
Why it matters: Good chunking and metadata drive better recall and fewer hallucinations—especially for long PDFs, tables, and mixed content.
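To make the chunking choices concrete, here is a minimal Python sketch of max-size chunking layered over a hierarchical (section-based) split, with metadata attached per chunk. The splitting heuristic and metadata fields are illustrative, not Kosmoy's schema.

```python
"""Illustrative sketch: hierarchical split, then a max-size cap per section,
with per-chunk metadata. Field names are hypothetical."""
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


def hierarchical_chunks(doc: str, source: str, max_chars: int = 1200) -> list[Chunk]:
    chunks = []
    # Level 1: split on blank lines (a stand-in for real section detection).
    for s_idx, section in enumerate(doc.split("\n\n")):
        # Level 2: cap each section at max_chars so no chunk overflows
        # the embedding model's context window.
        for offset in range(0, len(section), max_chars):
            chunks.append(Chunk(
                text=section[offset:offset + max_chars],
                # Custom metadata like this is what later enables
                # self-query filters and clean deletions.
                metadata={"source": source, "section": s_idx, "offset": offset},
            ))
    return chunks
```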
BRING YOUR OWN STORAGE & VECTOR DB
What it does: Point Kosmoy at your AWS S3, Azure Blob, or on-prem S3-compatible buckets. Load into the vector DBs your team already uses—PostgreSQL/pgvector, Weaviate, Pinecone, and others (see Integrations for the full list).
Why it matters: No replatforming. Keep your data where it lives and your stack the way you like it.
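As one concrete example, loading into PostgreSQL/pgvector (one of the supported targets above) can be as small as the sketch below. The table layout, connection string, and vector dimension are illustrative assumptions.

```python
"""Minimal sketch of loading chunks into PostgreSQL/pgvector.
Table name, DSN, and dimension are illustrative."""
import psycopg2

DDL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS chunks (
    id bigserial PRIMARY KEY,
    source text,
    body text,
    embedding vector(1536)  -- must match your embedding model's dimension
);
"""


def load(rows):
    """rows: iterable of (source, body, embedding) tuples."""
    conn = psycopg2.connect("dbname=rag")  # your existing Postgres instance
    with conn, conn.cursor() as cur:
        cur.execute(DDL)
        for source, body, emb in rows:
            # pgvector accepts a '[...]' literal cast to vector.
            cur.execute(
                "INSERT INTO chunks (source, body, embedding) VALUES (%s, %s, %s::vector)",
                (source, body, "[" + ",".join(map(str, emb)) + "]"),
            )
```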
HALF A DAY, NOT HALF A QUARTER
What it does: Prebuilt ingestion, Data Channels, and observability eliminate most of the custom code and glue work.
Why it matters: Launch the first version in half a day; iterate on retrieval quality and prompts the same week.
KOSMOY DATA CHANNEL = RETRIEVERS YOU CONTROL
What it does: A Data Channel binds an assistant to one or more collections and defines how retrieval works end-to-end (sketched in code after this list):
- Pre-retrieval: query rewriting (multi-query rephrase) and self-query with metadata constraints.
- Retrieval: lexical, semantic (embedding-based), or hybrid search.
- Post-retrieval: reranking and filtering before context hits the LLM.
Why it matters: You tune retrieval for each use case—precision for policies, breadth for exploratory search, or hybrid for both.
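In hand-rolled form, the three stages look roughly like the sketch below. Every helper callable here (rewrite_query, lexical_search, semantic_search, rerank) is a hypothetical stand-in for a component a Data Channel configures declaratively.

```python
"""Hand-rolled sketch of the three retrieval stages a Data Channel
configures for you. All helpers are hypothetical stand-ins."""
from typing import Callable

Doc = dict  # {"id": ..., "text": ...}


def retrieve(
    question: str,
    rewrite_query: Callable[[str], list[str]],      # pre-retrieval: multi-query rephrase
    lexical_search: Callable[[str], list[Doc]],     # e.g. BM25
    semantic_search: Callable[[str], list[Doc]],    # embedding similarity
    rerank: Callable[[str, list[Doc]], list[Doc]],  # e.g. a cross-encoder
    k: int = 5,
) -> list[str]:
    # Pre-retrieval: several phrasings, so recall doesn't hinge on one wording.
    queries = [question] + rewrite_query(question)

    # Retrieval: hybrid = lexical + semantic, de-duplicated by id.
    candidates: dict = {}
    for q in queries:
        for doc in lexical_search(q) + semantic_search(q):
            candidates[doc["id"]] = doc

    # Post-retrieval: rerank against the original question; only the
    # top-k survivors ever reach the LLM's context window.
    return [d["text"] for d in rerank(question, list(candidates.values()))[:k]]
```

De-duplicating by id before reranking keeps the reranker's input small even when the rephrased queries overlap heavily, which is the common case.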
MANAGED AND OBSERVABLE BY DESIGN
What it does: Every chatbot and Data Channel runs behind the Kosmoy LLM Gateway with gateway-level guardrails (toxic content, prompt injection, PII, EU AI Act), spend controls, conversation logging, and routing.
Why it matters: Your central AI team controls allowed models, safety, and budgets once—and it’s enforced everywhere.
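Conceptually, gateway-level enforcement works like the sketch below: one choke point runs every registered check, applies the budget, routes, and logs. The guardrail heuristic and budget scheme are toy illustrations, not Kosmoy's implementation.

```python
"""Conceptual sketch of a gateway as a single enforcement point.
Checks, budgets, and logging shown here are illustrative only."""

GUARDRAILS = []  # centrally managed checks: PII, prompt injection, toxicity, ...
BUDGETS = {"support-bot": 50.0}  # remaining spend per app (illustrative)


def guardrail(check):
    """Register a check that runs on every request, for every app."""
    GUARDRAILS.append(check)
    return check


@guardrail
def block_prompt_injection(prompt: str) -> None:
    # Toy heuristic; a real gateway would use a trained classifier.
    if "ignore previous instructions" in prompt.lower():
        raise PermissionError("prompt-injection guardrail tripped")


def gateway_call(app: str, prompt: str, model=lambda p: "(model answer)") -> str:
    for check in GUARDRAILS:          # safety policy, enforced once for all apps
        check(prompt)
    if BUDGETS.get(app, 0.0) <= 0.0:  # spend control per app
        raise PermissionError(f"budget exhausted for {app!r}")
    answer = model(prompt)            # routing to an allowed model happens here
    print(f"[log] app={app} prompt_chars={len(prompt)}")  # conversation logging
    return answer
```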