Activate Dataset (Corpus)

Overview

Flow ID: corpus-activation
Category: Dataset Management
Estimated Duration: < 5 seconds
User Role: All Users
Complexity: High (Impact)

Purpose: Designate which Knowledge Base (Dataset) is used for RAG (Retrieval-Augmented Generation) in Global Search or Chat. Typically only one dataset is active per chat session at a time, which prevents context from one dataset bleeding into answers about another.


Trigger

What initiates this flow:

  • User manually initiates

Specific trigger: The “Active Corpus” dropdown in the header, or “Make Active” on the Datasets page.


Prerequisites

Before starting, users must have:


Step-by-Step Flow

Main Path (Happy Path)

Step 1: Open Selector

  • User Action: Click the Dataset dropdown in the chat header.
  • System Response: A list of available datasets appears, each with its status (Ready/Processing).

Step 2: Select

  • User Action: Click a dataset, for example “Engineering Docs”.
  • System Response:
    • The UI updates to show “Engineering Docs” as the active context.
    • Subsequent queries in this chat session search this dataset's index.

Step 3: Deactivate

  • User Action: Select “None (General Chat)” in the same dropdown.
  • System Response: RAG is disabled. Chat reverts to the LLM's standard knowledge, with no dataset retrieval.
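
The select and deactivate steps above can be sketched as a single state transition. This is only an illustration under stated assumptions, not the contents of src/actions/corpus-actions.js: the session object shape, the `activeCorpusId` and `ragEnabled` fields, the sample dataset IDs, and the use of `null` for “None (General Chat)” are all hypothetical. Only the `setActiveCorpus` name comes from the Technical References.

```javascript
// Hypothetical sketch: the session shape and field names are assumptions.
// Returns a new session object with the active dataset changed.
function setActiveCorpus(session, datasets, corpusId) {
  // null models the "None (General Chat)" option: RAG is turned off.
  if (corpusId === null) {
    return { ...session, activeCorpusId: null, ragEnabled: false };
  }
  const dataset = datasets.find((d) => d.id === corpusId);
  if (!dataset) {
    throw new Error(`Unknown dataset: ${corpusId}`);
  }
  // Only one dataset is active per session, so the previous value is replaced.
  return { ...session, activeCorpusId: dataset.id, ragEnabled: true };
}

// Sample data for illustration only.
const datasets = [
  { id: "eng-docs", name: "Engineering Docs", status: "Ready" },
  { id: "hr-docs", name: "HR Policies", status: "Processing" },
];

let session = { id: "chat-1", activeCorpusId: null, ragEnabled: false };
session = setActiveCorpus(session, datasets, "eng-docs"); // Step 2: select
const afterSelect = session;
session = setActiveCorpus(session, datasets, null); // Step 3: deactivate
```

Returning a new object rather than mutating the session keeps the earlier state available, which can be convenient if the UI needs to revert a selection.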

Error States & Recovery

Error 1: Dataset Empty

Cause: The user created the dataset, but its processing jobs failed or are still pending, so it contains no searchable content.
User Experience: A warning appears: “Dataset is empty. Search will yield no results.”
Recovery: Run a Blockify Job to populate the dataset.
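
A check like the one below could produce that warning at activation time. It is a sketch only: the `blockCount` field is an assumed name for the number of indexed entries in a dataset, and `getActivationWarning` is a hypothetical helper. The warning text itself is taken from this article.

```javascript
// Warning text as quoted in this article.
const EMPTY_DATASET_WARNING = "Dataset is empty. Search will yield no results.";

// Hypothetical helper: `blockCount` is an assumed field name.
// Returns the warning string for an empty dataset, or null otherwise.
function getActivationWarning(dataset) {
  return dataset.blockCount === 0 ? EMPTY_DATASET_WARNING : null;
}

// Sample datasets for illustration only.
const emptyWarning = getActivationWarning({ id: "new-docs", blockCount: 0 });
const populatedWarning = getActivationWarning({ id: "eng-docs", blockCount: 128 });
```

In this sketch the warning does not block activation. It only tells the user that queries will return nothing until a job has populated the dataset.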


Pain Points & Friction

  1. “Why can’t I search two datasets?”: The system currently allows only one active dataset per chat session (1-to-1).
    • Mitigation: Encourage users who need cross-domain search to merge their documents into a single larger dataset.

Design Considerations

  • Persistence: Remember the active dataset per Chat Session (see Pinned Chat).
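
One way to picture the persistence consideration is a lookup keyed by chat session ID. The sketch below uses an in-memory `Map`; the function names and the store itself are hypothetical, and a real implementation would presumably persist the value with the rest of the session data.

```javascript
// Hypothetical in-memory store: chat session ID -> active dataset ID.
const activeCorpusBySession = new Map();

// Record the user's choice for one chat session.
function rememberActiveCorpus(sessionId, corpusId) {
  activeCorpusBySession.set(sessionId, corpusId);
}

// Look up the saved choice when a session is reopened.
// Sessions with no saved choice fall back to null ("None (General Chat)").
function restoreActiveCorpus(sessionId) {
  return activeCorpusBySession.has(sessionId)
    ? activeCorpusBySession.get(sessionId)
    : null;
}

rememberActiveCorpus("chat-1", "eng-docs");
const restored = restoreActiveCorpus("chat-1");
const neverSet = restoreActiveCorpus("chat-2");
```

Keying by session keeps each chat's selection independent, so switching datasets in one chat leaves the others unchanged.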


Technical References

  • src/actions/corpus-actions.js (setActiveCorpus)
