RAG-IN-A-BOX

Production RAG without the months of Python.

Ingest documents. Chunk and embed. Retrieve. Govern the answer. Connect your vector DB. Ship in days, not months.

Figure: RAG-in-a-Box · Ingestion to Governed Answer. One governance path. From any source to a trusted answer.

Ingestion pipeline: Sources (object stores, file stores, SharePoint / Confluence) → Extract (text, tables, images, graphs, metadata) → Process (chunking strategies, metadata enrichment, record manager) → Vectorize (embeddings, deletion strategies, load to vector DB) → Vector DB (Snowflake, Databricks, Pinecone, Weaviate, on-prem options)

Retrieval and answer path: User query (from Kosmoy Chat, custom app, agent) → Pre-retrieval (multi-query rephrase, self-query) → Retrieval (lexical, semantic, hybrid) → Post-retrieval (re-ranking, filtering) → Governed answer (routed via AI Gateway, guardrails, logged)

Same governance path as every other AI call: identity and RBAC · guardrails and policies · logs and evidence · cost and observability.

Four stages. All pre-built.

Ingestion

  • Connect object stores, file stores, SharePoint, Confluence, Salesforce, ServiceNow
  • Parse documents — text, tables, images, graphs
  • Chunking strategies — max-size, hierarchical, hybrid
  • Standard and custom metadata capture
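Max-size chunking, the simplest of the strategies listed above, fits in a few lines. This is a minimal sketch, assuming character-based limits and a fixed overlap between neighbouring chunks; `chunk_max_size` and its parameters are hypothetical names, not Kosmoy configuration.

```python
def chunk_max_size(text: str, max_chars: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most max_chars characters.

    Consecutive chunks share `overlap` characters, so text near a
    boundary keeps some of its surrounding context.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be in [0, max_chars)")
    step = max_chars - overlap  # distance between the starts of consecutive chunks
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):  # this chunk already reaches the end
            break
    return chunks


print(chunk_max_size("abcdefghij", max_chars=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

Hierarchical and hybrid strategies typically split on document structure (headings, sections) first and fall back to a max-size split like this one only for sections that are still too large.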

Vectorization

  • Embeddings (provider-agnostic)
  • Record manager with deletion strategies
  • Load to Snowflake, Databricks, Pinecone, Weaviate, Vertex, pgvector, on-prem
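A record manager is what makes re-ingestion incremental: it remembers what was indexed, so unchanged documents are skipped and documents that disappeared from the source are deleted from the vector DB. A minimal sketch, assuming one record manager per source and a SHA-256 content hash per document ID; `RecordManager` and `sync` are hypothetical, and Kosmoy's actual deletion strategies may differ.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class RecordManager:
    """Hypothetical record manager for a single source."""
    hashes: dict[str, str] = field(default_factory=dict)  # document ID -> content hash

    def sync(self, docs: dict[str, str]) -> tuple[list[str], list[str]]:
        """docs maps document ID -> current text from one crawl of the source.

        Returns (ids_to_upsert, ids_to_delete) for the vector DB.
        """
        to_upsert: list[str] = []
        for doc_id, text in docs.items():
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            if self.hashes.get(doc_id) != digest:  # new or changed: re-embed
                to_upsert.append(doc_id)
                self.hashes[doc_id] = digest
        # One possible deletion strategy: anything indexed earlier but
        # missing from the current crawl is removed.
        to_delete = [doc_id for doc_id in self.hashes if doc_id not in docs]
        for doc_id in to_delete:
            del self.hashes[doc_id]
        return to_upsert, to_delete


rm = RecordManager()
print(rm.sync({"a.pdf": "v1", "b.pdf": "v1"}))  # (['a.pdf', 'b.pdf'], [])
print(rm.sync({"a.pdf": "v1", "b.pdf": "v2"}))  # (['b.pdf'], [])
print(rm.sync({"a.pdf": "v1"}))                 # ([], ['b.pdf'])
```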

Retrieval

  • Multi-query rephrase, self-query
  • Lexical, semantic, hybrid
  • Re-ranking and filtering
  • Source-document filtering by user permissions
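Multi-query rephrase runs several phrasings of the same question and merges the resulting lists. A minimal sketch, assuming reciprocal rank fusion (with the commonly used constant k = 60) as the merge step; `rephrase` and `retrieve` stand in for an LLM call and a vector DB query and are hypothetical.

```python
from collections.abc import Callable


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of chunk IDs: each list contributes 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Python's sort is stable (also with reverse=True), so ties keep first-seen order.
    return sorted(scores, key=lambda cid: scores[cid], reverse=True)


def multi_query_retrieve(
    query: str,
    rephrase: Callable[[str], list[str]],
    retrieve: Callable[[str], list[str]],
    top_k: int = 5,
) -> list[str]:
    """Retrieve for the original query plus its rephrasings, then fuse."""
    variants = [query] + rephrase(query)
    return reciprocal_rank_fusion([retrieve(v) for v in variants])[:top_k]


# Toy stand-ins for the LLM rephraser and the vector DB.
index = {"reset password": ["c1", "c2"], "password recovery": ["c2", "c3"]}
hits = multi_query_retrieve(
    "reset password",
    rephrase=lambda q: ["password recovery"],
    retrieve=lambda q: index.get(q, []),
)
print(hits)  # ['c2', 'c1', 'c3']
```

Chunk `c2` wins because both phrasings found it, which is the point of asking more than once.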

Answer path

  • Routed via the AI Gateway
  • Guardrails enforced
  • Citations, conversation logs, feedback to Insights
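The answer path's order of operations (check policy, generate, cite, log) can be sketched as below. This is a minimal illustration under stated assumptions: in Kosmoy the guardrails are enforced at the AI Gateway rather than in application code, and the regex policy, `governed_answer`, and `generate` are hypothetical.

```python
import re
from collections.abc import Callable
from datetime import datetime, timezone

# Hypothetical input guardrail: block questions containing US SSN-like strings.
BLOCKED_PATTERNS = [re.compile(r"\b\d{3}-\d{2}-\d{4}\b")]


def redact(text: str) -> str:
    for pattern in BLOCKED_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


def governed_answer(
    user: str,
    question: str,
    chunks: list[tuple[str, str]],        # (source document, chunk text)
    generate: Callable[[str, str], str],  # hypothetical gateway-routed LLM call
    log: list[dict],
) -> tuple[str, list[str]]:
    if any(p.search(question) for p in BLOCKED_PATTERNS):
        answer, citations = "Request blocked by policy.", []
    else:
        context = "\n\n".join(text for _, text in chunks)
        answer = generate(question, context)
        citations = sorted({doc for doc, _ in chunks})  # cite each source once
    # Every call is logged, blocked or not; sensitive input is redacted first.
    log.append({
        "ts": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "question": redact(question),
        "answer": answer,
        "citations": citations,
    })
    return answer, citations
```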

Months of custom Python vs days with Kosmoy.

Pattern | Months of custom Python | Days with Kosmoy
Document ingestion | Build extractors per format | Pre-built
Chunking and embeddings | Pick libraries, tune params | Configured no-code
Retrieval pipeline | Build, tune, evaluate | Built-in lexical, semantic, hybrid
Prompt management | Hand-rolled versioning | Versioned, diffable
Guardrails | Roll-your-own checks per app | Gateway-enforced
Model fallback | Custom retry logic | Algorithmic router
Cost tracking | Logs, scripts, spreadsheets | Insights Dashboard
QA before rollout | Ad-hoc | Recorded sessions
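The "Model fallback" row is about what happens when a provider times out or rate-limits. A minimal sketch of the simplest policy, an ordered fallback list; Kosmoy describes its router as algorithmic, and how it actually picks models is not shown here. `call_with_fallback` and the model callables are hypothetical.

```python
from collections.abc import Callable


def call_with_fallback(
    prompt: str,
    models: list[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, call) in priority order; return (name, answer) from the first success."""
    errors: list[str] = []
    for name, call in models:
        try:
            return name, call(prompt)
        except Exception as exc:  # e.g. timeout, rate limit, provider outage
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


def flaky(prompt: str) -> str:
    raise TimeoutError("upstream timeout")


print(call_with_fallback("hi", [("primary", flaky), ("backup", lambda p: "ok: " + p)]))
# ('backup', 'ok: hi')
```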

Module questions, answered straight.

Does it work with our existing vector DB?

Yes. RAG-in-a-Box loads to Snowflake, Databricks, Pinecone, Weaviate, Vertex, pgvector, and supported on-prem options. The Kosmoy platform doesn't replace your vector DB; it loads your content into it and queries it.

Can we use private embeddings?

Yes. Embeddings are provider-agnostic — public LLM embeddings, private fine-tuned embeddings, on-prem embeddings. Selected per use case.
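Provider-agnostic usually means the pipeline codes against one small interface and the concrete model is chosen per use case. A minimal sketch, assuming a Python `Protocol` as that interface; `Embedder`, `HashingEmbedder`, and the `EMBEDDERS` registry are hypothetical, and the hashing embedder is a toy stand-in, not a real embedding model.

```python
import hashlib
import math
from typing import Protocol


class Embedder(Protocol):
    """Anything that turns texts into vectors: public API, fine-tuned, or on-prem."""

    def embed(self, texts: list[str]) -> list[list[float]]: ...


class HashingEmbedder:
    """Toy local embedder (feature hashing of words), standing in for a real model."""

    def __init__(self, dim: int = 64) -> None:
        self.dim = dim

    def embed(self, texts: list[str]) -> list[list[float]]:
        vectors = []
        for text in texts:
            vec = [0.0] * self.dim
            for word in text.lower().split():
                bucket = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % self.dim
                vec[bucket] += 1.0
            norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero on empty text
            vectors.append([v / norm for v in vec])
        return vectors


# Hypothetical per-use-case registry; the pipeline depends only on the Embedder protocol.
EMBEDDERS: dict[str, Embedder] = {"hr-policies": HashingEmbedder(dim=64)}


def embed_for(use_case: str, texts: list[str]) -> list[list[float]]:
    return EMBEDDERS[use_case].embed(texts)
```

Switching a use case to a private or on-prem model means registering another class with the same `embed` method. Vectors from different models are not comparable, so an index must be built and queried with the same embedder.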

How does it handle access control on retrieved chunks?

Source-document filtering applies the user's permissions at retrieval time. A document the user can't read is never returned, even if it matches the query.
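The rule is simple: resolve which documents the user may read, then drop every candidate chunk from any other document before anything is ranked or sent to the model. A minimal sketch, assuming group-based ACLs stored per source document; `Chunk`, `readable_documents`, and `filter_by_permissions` are hypothetical names. In production the same condition is often pushed into the vector DB query as a metadata filter, so the top-k results aren't used up by chunks the user can't see.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    source_doc: str
    text: str


def readable_documents(user_groups: set[str], doc_acls: dict[str, set[str]]) -> set[str]:
    """Documents whose ACL shares at least one group with the user."""
    return {doc for doc, groups in doc_acls.items() if user_groups & groups}


def filter_by_permissions(candidates: list[Chunk], readable: set[str]) -> list[Chunk]:
    """Keep only chunks whose source document the user is allowed to read."""
    return [c for c in candidates if c.source_doc in readable]


acls = {"handbook.pdf": {"all-staff"}, "salaries.xlsx": {"hr"}}
hits = [Chunk("c1", "salaries.xlsx", "Salary bands..."), Chunk("c2", "handbook.pdf", "Leave policy...")]
allowed = readable_documents({"all-staff"}, acls)
print([c.chunk_id for c in filter_by_permissions(hits, allowed)])  # ['c2']
```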

What about hybrid search?

Out of the box. Lexical + semantic, configurable weights per use case, with re-ranking on top.
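Weighted hybrid search needs the lexical and semantic score scales made comparable before they are combined. A minimal sketch, assuming min-max normalization and a single weight `alpha` for the semantic side; this is one common fusion scheme, not necessarily Kosmoy's, and `hybrid_rank` is a hypothetical name. A re-ranker would then re-score the top of the fused list.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; a single distinct value maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def hybrid_rank(
    lexical: dict[str, float],
    semantic: dict[str, float],
    alpha: float = 0.5,
) -> list[tuple[str, float]]:
    """Fuse scores as alpha * semantic + (1 - alpha) * lexical.

    A chunk missing from one result list scores 0 on that side.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    lex, sem = min_max(lexical), min_max(semantic)
    fused = {
        cid: alpha * sem.get(cid, 0.0) + (1 - alpha) * lex.get(cid, 0.0)
        for cid in set(lex) | set(sem)
    }
    # Highest score first; ties broken by chunk ID so the output is deterministic.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


bm25 = {"c1": 12.0, "c2": 3.0}     # lexical scores, e.g. BM25 (unbounded)
cosine = {"c2": 0.91, "c3": 0.80}  # semantic similarity scores
print([cid for cid, _ in hybrid_rank(bm25, cosine, alpha=0.7)])  # ['c2', 'c1', 'c3']
print([cid for cid, _ in hybrid_rank(bm25, cosine, alpha=0.2)])  # ['c1', 'c2', 'c3']
```

Moving `alpha` per use case is what "configurable weights" amounts to here: keyword-heavy corpora (part numbers, error codes) favour a low value, paraphrase-heavy questions a high one.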

Ship a governed RAG system in days.

Walk through ingestion, vectorization, retrieval, and the governed answer path.