Activate Dataset (Corpus)
Overview
Flow ID: corpus-activation
Category: Dataset Management
Estimated Duration: < 5 seconds
User Role: All Users
Complexity: High (Impact)
Purpose: Designate which Knowledge Base (Dataset) is used for RAG (Retrieval-Augmented Generation) in the Global Search or Chat context. Only one dataset can be active per chat session at a time, which keeps content from one dataset from bleeding into answers about another.
Trigger
What initiates this flow:
- User manually initiates
Specific trigger: The “Active Corpus” dropdown in the header, or “Make Active” on the Datasets page.
Prerequisites
Before starting, users must have:
- At least one processed Dataset (see Create Blockify Job).
Step-by-Step Flow
Main Path (Happy Path)
Step 1: Open Selector
- User Action: Click the “Active Corpus” dropdown in the chat header.
- System Response: Displays the list of available datasets, each with its status (Ready/Processing).
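A minimal sketch of how the selector entries in Step 1 might be assembled. The dataset records, their field names, and the buildSelectorOptions helper are assumptions for illustration, not the product's actual data model.

```javascript
// Hypothetical dataset records; field names are assumed for illustration.
const datasets = [
  { id: "eng-docs", name: "Engineering Docs", status: "Ready" },
  { id: "sample-2", name: "Sample Dataset", status: "Processing" },
];

// Build the dropdown entries: the "None (General Chat)" option first,
// then one entry per dataset, labeled with its processing status.
function buildSelectorOptions(items) {
  const none = { id: null, label: "None (General Chat)" };
  const entries = items.map((d) => ({
    id: d.id,
    label: `${d.name} (${d.status})`,
  }));
  return [none, ...entries];
}

const options = buildSelectorOptions(datasets);
for (const option of options) {
  console.log(option.label);
}
```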
Step 2: Select
- User Action: Click “Engineering Docs”.
- System Response:
  - UI updates to show “Engineering Docs” as the active context.
  - Subsequent queries in this chat search this dataset’s index.
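The selection in Step 2 can be sketched as a small state update. This document names setActiveCorpus but does not show its signature, so the parameters, the session shape, and the createChatSession helper below are assumptions.

```javascript
// Hypothetical session shape: one active dataset id per chat session, or null.
function createChatSession(id) {
  return { id, activeCorpusId: null };
}

// Sketch of an activation action. Returns a new session object so that
// exactly one dataset (or none) is active for the session.
function setActiveCorpus(session, corpusId, knownDatasets) {
  const exists = knownDatasets.some((d) => d.id === corpusId);
  if (corpusId !== null && !exists) {
    throw new Error(`Unknown dataset: ${corpusId}`);
  }
  return { ...session, activeCorpusId: corpusId };
}

// Usage: activate "Engineering Docs" for one chat session.
const knownDatasets = [{ id: "eng-docs", name: "Engineering Docs" }];
const session = setActiveCorpus(
  createChatSession("chat-1"),
  "eng-docs",
  knownDatasets
);
console.log(session.activeCorpusId);
```

Returning a new object instead of mutating the session is a design choice of this sketch; it makes the before and after states easy to compare in a UI store.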
Step 3: Deactivate
- User Action: Select “None (General Chat)”.
- System Response: RAG is disabled. The chat falls back to the LLM’s general knowledge, with no dataset retrieval.
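A sketch of how a query might be routed depending on whether a dataset is active. The retrieve stand-in and the request shape are hypothetical; the actual retrieval pipeline is not described in this document.

```javascript
// Hypothetical stand-in for the retrieval call, so the sketch runs on its own.
function retrieve(corpusId, query) {
  return [`[${corpusId}] passage matching "${query}"`];
}

// With no active dataset, skip retrieval and send the query as-is.
// With an active dataset, attach retrieved passages as context.
function buildChatRequest(session, query) {
  if (session.activeCorpusId === null) {
    return { ragEnabled: false, context: [], query };
  }
  return {
    ragEnabled: true,
    context: retrieve(session.activeCorpusId, query),
    query,
  };
}

const generalRequest = buildChatRequest({ activeCorpusId: null }, "deploy steps");
const ragRequest = buildChatRequest({ activeCorpusId: "eng-docs" }, "deploy steps");
console.log(generalRequest.ragEnabled, ragRequest.ragEnabled);
```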
Error States & Recovery
Error 1: Dataset Empty
Cause: The user created a dataset, but its Blockify jobs failed or are still pending.
User Experience: Warning: “Dataset is empty. Search will yield no results.”
Recovery: Run a Blockify Job to populate it.
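The empty-dataset check could look like the following sketch. The blockCount field is an assumed name for the number of indexed items in a dataset; only the warning text comes from this flow.

```javascript
// Returns the warning text for an empty dataset, or null when it has content.
// blockCount is a hypothetical field: the number of indexed items.
function activationWarning(dataset) {
  if (dataset.blockCount === 0) {
    return "Dataset is empty. Search will yield no results.";
  }
  return null;
}

const emptyWarning = activationWarning({ id: "new-ds", blockCount: 0 });
const populatedWarning = activationWarning({ id: "eng-docs", blockCount: 120 });
console.log(emptyWarning);
```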
Pain Points & Friction
- “Why can’t I search two datasets?”: The system currently allows only one active dataset per chat session (1-to-1).
  - Mitigation: Encourage users who need cross-domain search to merge their documents into a single, larger dataset.
Design Considerations
- Persistence: Remember the active dataset per Chat Session (see Pinned Chat).
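One way to sketch per-session persistence is a map from chat session id to active dataset id that can be serialized and restored. The ActiveCorpusStore class and its storage format are assumptions; where the product actually stores this value is not stated here.

```javascript
// Hypothetical store: remembers the active dataset id for each chat session.
class ActiveCorpusStore {
  constructor(entries = []) {
    this.bySession = new Map(entries);
  }

  set(sessionId, corpusId) {
    this.bySession.set(sessionId, corpusId);
  }

  // Sessions with no saved choice default to null (General Chat).
  get(sessionId) {
    return this.bySession.has(sessionId) ? this.bySession.get(sessionId) : null;
  }

  // Serialize as a JSON array of [sessionId, corpusId] pairs.
  serialize() {
    return JSON.stringify([...this.bySession]);
  }

  static deserialize(json) {
    return new ActiveCorpusStore(JSON.parse(json));
  }
}

// Usage: save a choice, then restore it as if after a reload.
const store = new ActiveCorpusStore();
store.set("chat-1", "eng-docs");
const restored = ActiveCorpusStore.deserialize(store.serialize());
console.log(restored.get("chat-1"));
```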
Related Flows
- Create Blockify Job
- Pinned Chat
Technical References
src/actions/corpus-actions.js (setActiveCorpus)