Configure Advanced Chunk Settings
Overview
Flow ID: configure-advanced-chunks
Category: Blockify Processing
Estimated Duration: 2-5 minutes
User Role: Power Users / Admins
Complexity: Advanced
Purpose: Allows users to fine-tune exactly how documents are split (“chunked”) before processing. This flow features a real-time preview that visualizes chunk boundaries, helping users optimize for specific document structures (e.g., technical manuals vs. narrative text).
Trigger
What initiates this flow:
- User manually initiates
Specific trigger: Clicking the “Show Advanced Settings” or “Preview Chunks” toggle within the job creation workflow.
Prerequisites
Before starting, users must have:
- At least one file uploaded to the job
- Basic understanding of chunk size (tokens) and overlap
User Intent Analysis
Primary Intent
Verify and optimize how the system splits text to ensure important context isn’t lost at cut-off points.
Secondary Intents
- Debugging poor search results (often caused by bad chunking)
- Adjusting settings for specific file types (e.g., small chunks for FAQs, large for articles)
Step-by-Step Flow
Main Path (Happy Path)
Step 1: Open Advanced Panel
- User Action: Click the “Show Advanced Settings” or “Preview Chunks” toggle
- System Response: The panel expands, showing a preview area and the settings sliders.
- UI Elements Visible:
  - File Tabs (one for each uploaded file)
  - Chunk Size Slider / Input
  - Overlap Slider / Input
  - Text Preview Window
Step 2: Select File to Preview
- User Action: Click on a specific file tab (e.g., manual.pdf)
- System Response: Preview window loads the text of that file.
- Visual Cues: Colored highlights indicating separate chunks (e.g., alternating blue/green backgrounds).
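The alternating backgrounds can be produced by cycling through a small palette: with at least two distinct entries, neighboring chunks never share a color. This is a minimal sketch; `chunkClassNames` and the CSS class names are hypothetical placeholders, not names taken from the codebase.

```javascript
// Hypothetical CSS class names standing in for the highlight colors
// (e.g., blue and green backgrounds). Real colors should be checked for
// contrast and should not rely on hue alone.
const CHUNK_CLASSES = ["chunk-blue", "chunk-green"];

// Cycle through the palette. With at least two distinct entries, chunk i and
// chunk i + 1 always land on different palette slots, so neighbors differ.
function chunkClassNames(chunkCount, palette = CHUNK_CLASSES) {
  if (palette.length < 2) {
    throw new RangeError("Need at least two classes to tell neighbors apart");
  }
  return Array.from({ length: chunkCount }, (_, i) => palette[i % palette.length]);
}

console.log(chunkClassNames(5)); // alternates blue, green, blue, green, blue
```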
Step 3: Adjust Chunk Size
- User Action: Drag Chunk Size slider (e.g., from 512 to 1024)
- System Response:
  - Chunks in the preview resize in near real time.
  - The total chunk count updates.
- Feedback: Users can see whether sentences are split mid-way or whether paragraphs fit within a single chunk.
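The chunk count shown in Step 3 follows from a fixed-size sliding window over the tokens. A minimal sketch, assuming the text is already tokenized (the tokenizer is not specified here) and that boundaries are cut purely by token count; `computeChunkBoundaries` is a hypothetical name, and the real preview logic may also respect sentence or paragraph breaks.

```javascript
// Hypothetical helper: computes chunk boundaries over a text that has already
// been split into `tokenCount` tokens. Each chunk spans at most `chunkSize`
// tokens; consecutive chunks share `overlap` tokens, so the window advances
// by (chunkSize - overlap) tokens each step.
function computeChunkBoundaries(tokenCount, chunkSize, overlap = 0) {
  if (!(chunkSize > 0) || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("Require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const stride = chunkSize - overlap;
  const boundaries = [];
  for (let start = 0; start < tokenCount; start += stride) {
    const end = Math.min(start + chunkSize, tokenCount);
    boundaries.push({ start, end }); // half-open token range [start, end)
    if (end === tokenCount) break; // this chunk already reaches the end
  }
  return boundaries;
}

// Dragging the slider from 512 to 1024 on a 2,000-token file:
console.log(computeChunkBoundaries(2000, 512).length);  // 4
console.log(computeChunkBoundaries(2000, 1024).length); // 2
```

The early `break` matters once overlap is non-zero: without it, a window that already reached the end of the text could be followed by a final chunk lying entirely inside it.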
Step 4: Adjust Overlap
- User Action: Drag Overlap slider (e.g., 0 to 50 tokens)
- System Response: The amount of text shared between adjacent chunks grows or shrinks with the slider.
- Visual Cues: Overlapping regions may be shown with a darker highlight or boundary markers.
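To highlight overlaps, the preview needs the token ranges that adjacent chunks share. This sketch assumes chunk boundaries arrive as half-open token ranges `{ start, end }`, such as a sliding-window chunker would produce; `overlapRegions` is a hypothetical helper, not an existing function in the codebase.

```javascript
// Hypothetical helper: given chunk boundaries as half-open token ranges
// [start, end), return the range shared by each pair of adjacent chunks.
// The preview could render these ranges with a darker highlight or a marker.
function overlapRegions(boundaries) {
  const regions = [];
  for (let i = 0; i + 1 < boundaries.length; i++) {
    const current = boundaries[i];
    const next = boundaries[i + 1];
    if (next.start < current.end) {
      regions.push({ start: next.start, end: current.end, between: [i, i + 1] });
    }
  }
  return regions;
}

// Example: 1,000 tokens, 512-token chunks, 50-token overlap (stride 462).
const chunks = [
  { start: 0, end: 512 },
  { start: 462, end: 974 },
  { start: 924, end: 1000 },
];
console.log(overlapRegions(chunks)); // two 50-token regions: [462, 512) and [924, 974)
```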
Step 5: Confirm Settings
- User Action: Collapse the panel or proceed with the job
- System Response: Settings are applied to ALL files in the current job (unless per-file settings are supported).
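The scope rule in Step 5 (one configuration for the whole job, with per-file settings only if supported) can be modeled as a job-level default merged with optional overrides. This is a sketch under assumptions: `resolveChunkSettings` and the override shape are hypothetical, and `faq.pdf` is an example file name.

```javascript
// Hypothetical settings model: one job-level configuration shared by every
// file, plus optional per-file overrides (only relevant if supported).
function resolveChunkSettings(jobSettings, fileNames, perFileOverrides = {}) {
  const resolved = {};
  for (const name of fileNames) {
    // Later spread wins, so override fields replace the job-level values.
    resolved[name] = { ...jobSettings, ...(perFileOverrides[name] || {}) };
  }
  return resolved;
}

const jobSettings = { chunkSize: 1024, overlap: 50 };
const files = ["manual.pdf", "faq.pdf"];

// Global behavior: every file in the job gets the same settings.
console.log(resolveChunkSettings(jobSettings, files));

// With a hypothetical per-file override, only faq.pdf uses smaller chunks.
console.log(resolveChunkSettings(jobSettings, files, { "faq.pdf": { chunkSize: 256 } }));
```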
Error States & Recovery
Error 1: Preview Not Loading
Cause: Text extraction pending or failed
User Experience: “Loading preview…” never completes, or the preview stays blank
Recovery: Wait for extraction to complete; if the preview stays stuck, re-upload the file.
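One way to detect the “stuck” case instead of waiting indefinitely is to race the extraction against a timeout and surface the re-upload hint when the timer wins. A sketch only: `withTimeout`, the 100 ms value, and the simulated extraction promise are illustrative assumptions, not the product's actual loading logic.

```javascript
// Hypothetical helper: settle with the extraction result, or reject if it has
// not finished within `timeoutMs`, so the UI can show a recovery hint
// (for example, suggesting a re-upload) instead of spinning forever.
function withTimeout(promise, timeoutMs) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error("Preview timed out")), timeoutMs);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Simulated extraction that never completes (the "stuck" case).
const stuckExtraction = new Promise(() => {});

withTimeout(stuckExtraction, 100)
  .then((text) => console.log(`Preview ready (${text.length} characters)`))
  .catch((err) => console.log(`${err.message}; try re-uploading the file.`));
```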
Error 2: Settings Too Extreme
Cause: Overlap ≥ Chunk Size, or Chunk Size < 50 tokens
User Experience: Validation error “Overlap must be smaller than chunk size”
Recovery: System auto-corrects or blocks invalid ranges.
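Both recovery options (blocking and auto-correcting) can share the same rules: overlap must stay below chunk size, and chunk size must be at least 50 tokens. A minimal sketch; the function names are hypothetical, and clamping into range is only one possible auto-correction policy.

```javascript
const MIN_CHUNK_SIZE = 50; // minimum chunk size named in this error state

// Returns human-readable errors; an empty array means the settings are valid.
function validateChunkSettings({ chunkSize, overlap }) {
  const errors = [];
  if (!Number.isInteger(chunkSize) || chunkSize < MIN_CHUNK_SIZE) {
    errors.push(`Chunk size must be a whole number of at least ${MIN_CHUNK_SIZE} tokens`);
  }
  if (!Number.isInteger(overlap) || overlap < 0) {
    errors.push("Overlap must be a non-negative whole number");
  } else if (overlap >= chunkSize) {
    errors.push("Overlap must be smaller than chunk size");
  }
  return errors;
}

// One possible auto-correction policy (an assumption): clamp into the valid
// range instead of blocking the user.
function autoCorrectChunkSettings({ chunkSize, overlap }) {
  const size = Number.isFinite(chunkSize)
    ? Math.max(MIN_CHUNK_SIZE, Math.round(chunkSize))
    : MIN_CHUNK_SIZE;
  const safeOverlap = Number.isFinite(overlap) ? Math.round(overlap) : 0;
  return { chunkSize: size, overlap: Math.min(Math.max(0, safeOverlap), size - 1) };
}

console.log(validateChunkSettings({ chunkSize: 512, overlap: 600 }));
// [ 'Overlap must be smaller than chunk size' ]
console.log(autoCorrectChunkSettings({ chunkSize: 512, overlap: 600 }));
// { chunkSize: 512, overlap: 511 }
```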
Pain Points & Friction
- Global vs. Local Settings: Users often want different settings for different files in the same job, but settings usually apply globally.
  - Workaround: Create separate jobs for different file types.
- Technical Complexity: “Tokens” are abstract to non-technical users.
  - Improvement: Show approximate word count (e.g., “512 tokens ≈ 380 words”).
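The word-count hint can be derived from a rough words-per-token ratio. This sketch assumes about 0.75 English words per token, a commonly cited rule of thumb for GPT-style tokenizers; the real ratio depends on the tokenizer and the text, so the hint is rounded to stay clearly approximate. `formatTokenHint` is a hypothetical name.

```javascript
// Rough heuristic (an assumption): about 0.75 English words per token.
// The real ratio depends on the tokenizer and the language of the text.
const WORDS_PER_TOKEN = 0.75;

function formatTokenHint(tokens, wordsPerToken = WORDS_PER_TOKEN) {
  // Round to the nearest 10 words so the hint does not imply false precision.
  const words = Math.round((tokens * wordsPerToken) / 10) * 10;
  return `${tokens} tokens ≈ ${words} words`;
}

console.log(formatTokenHint(512));  // 512 tokens ≈ 380 words
console.log(formatTokenHint(1024)); // 1024 tokens ≈ 770 words
```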
Design Considerations
- Color Coding: Use distinct, accessible colors to differentiate adjacent chunks.
- Performance: Debounce preview recalculation so that dragging a slider does not trigger a recompute (and visible lag) on every intermediate value.
- Persistence: Remember last-used settings for convenience.
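The performance and persistence considerations fit naturally into one handler: debounce slider input so the preview recomputes only after dragging pauses, and save the settings at that moment. A sketch under assumptions: the 150 ms delay, the storage key, and the default values are hypothetical, and `memoryStorage` merely stands in for `window.localStorage` so the example runs outside a browser.

```javascript
// Trailing-edge debounce: `fn` runs once, `waitMs` after the last call.
function debounce(fn, waitMs) {
  let timer = null;
  return (...args) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), waitMs);
  };
}

// Hypothetical storage key; `storage` is anything with getItem/setItem,
// such as window.localStorage in a browser.
const STORAGE_KEY = "blockify.advancedChunkSettings";

function loadLastUsedSettings(storage, defaults) {
  try {
    const raw = storage.getItem(STORAGE_KEY);
    return raw ? { ...defaults, ...JSON.parse(raw) } : { ...defaults };
  } catch {
    return { ...defaults }; // corrupt or unreadable value: use defaults
  }
}

// In-memory stand-in for localStorage so the sketch runs outside a browser.
const memoryStorage = {
  data: new Map(),
  getItem(key) { return this.data.has(key) ? this.data.get(key) : null; },
  setItem(key, value) { this.data.set(key, String(value)); },
};

let recomputeCount = 0;
// Runs only once the slider has been idle for 150 ms.
const onSettingsSettled = debounce((settings) => {
  recomputeCount += 1; // stands in for re-rendering the chunk preview
  memoryStorage.setItem(STORAGE_KEY, JSON.stringify(settings));
}, 150);

// Simulate a drag that emits many intermediate values in quick succession.
for (const chunkSize of [512, 640, 768, 896, 1024]) {
  onSettingsSettled({ chunkSize, overlap: 50 });
}

setTimeout(() => {
  console.log(recomputeCount); // 1
  console.log(loadLastUsedSettings(memoryStorage, { chunkSize: 512, overlap: 0 }));
  // { chunkSize: 1024, overlap: 50 }
}, 300);
```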
Related Flows
- Configure Basic Chunks - The simplified view
- Create Blockify Job - Parent workflow
Technical References
- src/components/blockify-corpus/advanced-settings.js
- src/utils/chunking-preview.js