RETRIEVAL-AUGMENTED GENERATION

From raw files to a working chatbot in half a day

No-code ingestion pipelines. Reliable retrievers. Any cloud or on-prem data store.

Stop burning weeks hand-coding pipelines and retrievers. Kosmoy's RAG-in-a-box solution takes you from a bunch of files in an object store to a production-grade RAG chatbot in hours: governed, observable, and ready for scale.

RAG IN A BOX

NO-CODE INGESTION PIPELINES

What it does: Connect your object stores and file repos, pick an embedding model, choose the chunking strategy, and load into your vector DB—no Python required.

Typical sources: AWS S3, Azure Blob Storage, and on-prem S3-compatible stores; plus productivity/data platforms like OneDrive, Google Drive, Databricks, and Snowflake.

Why it matters: You get a repeatable pipeline from “raw files” to “searchable vectors” that your assistants can use immediately.
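For teams curious what such a pipeline automates, here is a minimal code sketch of the same flow, assuming an S3 bucket, an OpenAI embedding model, and a pgvector table; the bucket, table, and chunk size are illustrative placeholders, not Kosmoy defaults.

```python
# Sketch of what a no-code ingestion pipeline automates: list files in
# an object store, chunk, embed, and load into a vector DB.
import boto3
import psycopg
from openai import OpenAI

s3 = boto3.client("s3")
oai = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def max_size_chunks(text: str, limit: int = 1000) -> list[str]:
    # Flat max-size chunking; see the next section for richer strategies.
    return [text[i:i + limit] for i in range(0, len(text), limit)]

with psycopg.connect("postgresql://localhost/rag") as conn:
    for obj in s3.list_objects_v2(Bucket="my-docs").get("Contents", []):
        body = s3.get_object(Bucket="my-docs", Key=obj["Key"])["Body"].read()
        text = body.decode("utf-8", errors="ignore")
        for seq, piece in enumerate(max_size_chunks(text)):
            emb = oai.embeddings.create(
                model="text-embedding-3-small", input=piece
            ).data[0].embedding
            conn.execute(
                "INSERT INTO chunks (source, seq, text, embedding)"
                " VALUES (%s, %s, %s, %s::vector)",
                (obj["Key"], seq, piece, "[" + ",".join(map(str, emb)) + "]"),
            )
```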


CHUNKING & ENRICHMENT THAT KEEPS MEANING

What it does: Choose Max-size, Hierarchical, or Hybrid chunking. Capture standard and custom metadata. Track records and deletions to keep the index clean.

Why it matters: Good chunking and metadata drive better recall and fewer hallucinations—especially for long PDFs, tables, and mixed content.
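To make the difference concrete, here is a dependency-free sketch contrasting a flat max-size split with a hierarchical split that respects section boundaries and attaches metadata; the splitting heuristics are illustrative assumptions, not Kosmoy's implementation.

```python
def max_size_chunks(text: str, limit: int = 800) -> list[str]:
    # Max-size: flat fixed-width slices; simple, but can cut mid-thought.
    return [text[i:i + limit] for i in range(0, len(text), limit)]

def hierarchical_chunks(text: str, limit: int = 800) -> list[dict]:
    # Hierarchical: split on structural boundaries first (blank lines as
    # a crude proxy for sections), then enforce the size limit inside
    # each section, and attach metadata retrieval can later filter on.
    chunks = []
    for section_no, section in enumerate(text.split("\n\n")):
        for piece in max_size_chunks(section, limit):
            chunks.append({
                "text": piece,
                "metadata": {"section": section_no, "chars": len(piece)},
            })
    return chunks

doc = "Travel policy\nBook via portal.\n\nExpenses\nReceipts over 50 EUR."
print(hierarchical_chunks(doc)[0]["metadata"])  # {'section': 0, 'chars': 30}
```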


BRING YOUR OWN STORAGE & VECTOR DB

What it does: Point Kosmoy at your AWS S3, Azure Blob, or on-prem S3-compatible buckets. Load into the vector DBs your team already uses: PostgreSQL/pgvector, Weaviate, Pinecone, and others (see Integrations for the full list).

Why it matters: No replatforming. Keep your data where it lives and your stack the way you like it.
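As an illustration of what "no replatforming" means at query time, the sketch below searches an existing pgvector table in place; the connection string, table, and column names are assumptions carried over from the ingestion sketch above.

```python
import psycopg

# The query embedding must come from the same model used at ingest; a
# fake constant vector keeps this sketch self-contained.
query_embedding = "[" + ",".join(["0.01"] * 1536) + "]"

with psycopg.connect("postgresql://localhost/rag") as conn:
    rows = conn.execute(
        # <=> is pgvector's cosine-distance operator; smaller is closer.
        "SELECT source, text FROM chunks ORDER BY embedding <=> %s::vector LIMIT 5",
        (query_embedding,),
    ).fetchall()
    for source, text in rows:
        print(source, text[:80])
```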


HALF A DAY, NOT HALF A QUARTER

What it does: Prebuilt ingestion, Data Channels, and observability eliminate most custom code and glue work.

Why it matters: Launch the first version in half a day; iterate on retrieval quality and prompts the same week.


KOSMOY DATA CHANNELS = RETRIEVERS YOU CONTROL

What it does: A Data Channel binds an assistant to one or more collections and defines how retrieval works end-to-end (a toy sketch follows this list):

- Pre-retrieval: query rewriting (multi-query rephrase), self-query with metadata constraints.

- Retrieval: lexical, semantic (embeddings), or hybrid search.

- Post-retrieval: reranking and filtering before context hits the LLM.

Why it matters: You tune retrieval for each use case—precision for policies, breadth for exploratory search, or hybrid for both.
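Here is a self-contained toy showing the three stages in order; the scoring functions are deliberately naive stand-ins (real deployments use BM25, embedding similarity, and a reranker model), not Kosmoy's retrievers.

```python
import re

DOCS = [
    "expense policy: receipts required over 50 eur",
    "travel policy: book flights via portal",
    "onboarding guide: request a laptop on day one",
]

def tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))

def rewrite_query(question: str) -> list[str]:
    # Pre-retrieval: multi-query rephrase. Trivial variants here; in
    # production this step is typically an LLM call.
    return [question, question.replace("rules", "policy")]

def lexical_score(query: str, doc: str) -> float:
    # Retrieval, lexical leg: shared-word count stands in for BM25.
    return len(tokens(query) & tokens(doc))

def semantic_score(query: str, doc: str) -> float:
    # Retrieval, semantic leg: stand-in for embedding similarity.
    return 0.5 * lexical_score(query, doc)

def retrieve(question: str, k: int = 2) -> list[str]:
    pool: dict[str, float] = {}
    for q in rewrite_query(question):  # pre-retrieval
        for doc in DOCS:               # hybrid retrieval: merge both legs
            score = lexical_score(q, doc) + semantic_score(q, doc)
            pool[doc] = max(pool.get(doc, 0.0), score)
    # Post-retrieval: rerank the merged pool, keep top-k before the LLM.
    return sorted(pool, key=pool.get, reverse=True)[:k]

print(retrieve("what are the expense rules"))
```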


MANAGED AND OBSERVABLE BY DESIGN

What it does: Every chatbot and Data Channel runs behind the Kosmoy LLM Gateway with gateway-level guardrails (toxic content, prompt injection, PII, EU AI Act), spend controls, conversation logging, and routing.

Why it matters: Your central AI team controls allowed models, safety, and budgets once—and it’s enforced everywhere.
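Conceptually, the enforcement point looks like the toy gateway below: a single choke point that applies guardrails, spend controls, and logging before any model is called. The regex rules, budget numbers, and stub model call are illustrative only; real gateways use classifier models and configured policies, not hard-coded patterns.

```python
import re

INJECTION = re.compile(r"ignore (all|previous) instructions", re.IGNORECASE)
PII_SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # US SSN shape as a PII stand-in
BUDGETS = {"team-a": 100.0}                      # remaining spend per team
AUDIT_LOG: list[dict] = []                       # conversation logging

def call_model(prompt: str) -> str:
    return f"(model reply to: {prompt!r})"       # stub for the routed LLM call

def gateway(team: str, prompt: str, est_cost: float) -> str:
    if BUDGETS.get(team, 0.0) < est_cost:        # spend control
        raise PermissionError("budget exceeded")
    if INJECTION.search(prompt) or PII_SSN.search(prompt):
        raise ValueError("guardrail triggered")  # blocked before any model sees it
    BUDGETS[team] -= est_cost
    AUDIT_LOG.append({"team": team, "prompt": prompt, "cost": est_cost})
    return call_model(prompt)                    # routed to an allowed model

print(gateway("team-a", "Summarize our travel policy.", est_cost=0.02))
```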
