Select Embedding Model for Job
Overview
Flow ID: select-embedding-for-job
Category: Blockify Processing
Estimated Duration: 1 minute
User Role: All Users
Complexity: Simple
Purpose: Choose the embedding model used to generate vector representations of processed chunks. A model is required to create a searchable dataset. Whether the model can be chosen depends on whether the job creates a new dataset or adds to an existing one.
Trigger
What initiates this flow:
- User manually initiates
Specific trigger: During job creation, the user must select an embedding model for final dataset creation.
Prerequisites
Before starting, users must have:
- At least one embedding model uploaded
- Job creation form open
- Dataset destination selected (new or existing)
User Intent Analysis
Primary Intent
Select appropriate embedding model for generating vectors that enable semantic search of processed documents.
Secondary Intents
- Match model to dataset requirements
- Optimize search quality vs. speed
- Ensure consistency with existing datasets
Step-by-Step Flow
Main Path A - New Dataset (Happy Path)
Step 1: Locate Embedding Model Selector
- User Action: On job creation form with “Create New Dataset” selected, find embedding model dropdown
- System Response: Dropdown is active and available
- UI Elements Visible:
  - Label: “Embedding Model”
  - Dropdown selector
  - May show currently selected model or “Select model…”
  - List of available embedding models
Step 2: Open Dropdown
- User Action: Click embedding model dropdown
- System Response: List of models appears
- UI Elements Visible:
  - Dropdown list showing:
    - Model names (e.g., “Jina Embeddings”, “BGE-Small”)
    - Model types (all show “Embeddings”)
    - Current selection highlighted if any
Step 3: Select Embedding Model
- User Action: Click desired embedding model
- System Response:
  - Dropdown closes
  - Selected model appears in field
  - This model will be used for all items in new dataset
- UI Elements Visible:
  - Selected model name in dropdown
  - Model confirmed for use
Final Step: Embedding Model Selected for New Dataset
- Success Indicator: Model selected and shown
- System State Change: Job will use this model for embedding generation
- Next Actions: Continue job creation (upload files, set chunks, start)
Alternative Path B - Existing Dataset
Step 1: View Locked Model
- User Action: When existing dataset selected as target, observe embedding model field
- System Response: Embedding model field is disabled/locked
- UI Elements Visible:
  - Embedding model field showing dataset’s existing model
  - Field grayed out or marked as read-only
  - Cannot change selection
  - Tooltip or note: “Model locked to dataset’s embedding model”
Step 2: Verify Model Matches Needs
- User Action: Confirm dataset’s embedding model is acceptable
- System Response: Model name displayed but not editable
- UI Elements Visible: Read-only model name
Final Step: Locked to Dataset’s Model
- Success Indicator: User understands that the model is predetermined by the dataset choice
- System State Change: Job will use dataset’s embedding model
- Next Actions: Continue job creation with locked model
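The two paths above reduce to one rule: a new dataset leaves the selector open, while an existing dataset locks it to that dataset's model. The following is a minimal sketch of that rule, not the application's actual code. `SelectorState` and `resolve_selector_state` are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SelectorState:
    """Hypothetical view of the embedding model field on the job form."""
    enabled: bool                    # False means the field is read-only
    selected_model: Optional[str]    # None means "Select model…" is shown
    note: Optional[str] = None       # Tooltip or note text, if any


def resolve_selector_state(
    available_models: Sequence[str],
    existing_dataset_model: Optional[str] = None,
    user_choice: Optional[str] = None,
) -> SelectorState:
    """Decide how the embedding model field behaves for a job."""
    if existing_dataset_model is not None:
        # Path B: the target dataset dictates the model; any user choice is ignored.
        return SelectorState(
            enabled=False,
            selected_model=existing_dataset_model,
            note="Model locked to dataset's embedding model",
        )
    # Path A: new dataset, so the user may pick any uploaded model.
    if user_choice in available_models:
        return SelectorState(enabled=True, selected_model=user_choice)
    return SelectorState(enabled=True, selected_model=None)
```

For example, `resolve_selector_state(["Jina Embeddings", "BGE-Small"], existing_dataset_model="Jina Embeddings", user_choice="BGE-Small")` returns a disabled state that still shows “Jina Embeddings”, matching the locked field in Path B.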
Error States & Recovery
Error 1: No Embedding Models Available
Cause: No embedding models uploaded
User Experience:
- Dropdown empty or shows “No models available”
- Error message: “Embedding model required”
- Cannot proceed with job
Recovery Steps:
- Navigate to Settings
- Upload embedding model (see embedding-model-upload.md)
- Return to job creation
- Model should now be selectable
Error 2: Wrong Model Selected
Cause: User selects a model, then realizes it’s incompatible
User Experience:
- Job is created with the wrong model
- Dataset searches may not work optimally
Recovery Steps:
- Before starting job: Select different model from dropdown
- After job started: Cannot change; must cancel and recreate job
- Prevention: Verify model selection before clicking “Start”
QA Note: No validation error for wrong model; user judgment call on which is “right.”
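The difference between the two error states can be sketched as a single check: a missing selection blocks the job, but any uploaded model passes, because nothing evaluates whether it is the “right” one. This is an illustrative sketch under that assumption. `check_embedding_selection` is a hypothetical helper, not a function from the codebase.

```python
from typing import Optional, Sequence


def check_embedding_selection(
    available_models: Sequence[str],
    selected_model: Optional[str],
) -> Optional[str]:
    """Return a blocking error message, or None if the job may proceed."""
    # Error 1: nothing selected, or nothing uploaded to select from.
    if selected_model is None or selected_model not in available_models:
        return "Embedding model required"
    # Error 2 is never caught here: any uploaded model is accepted,
    # so suitability remains the user's judgment call.
    return None
```

With an empty model list the check always fails, which corresponds to the “Cannot proceed with job” state until a model is uploaded.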
Pain Points & Friction
Identified Issues:
Cannot Change Model for Existing Dataset
- Impact: Model is locked in once the dataset is created
- Frequency: When adding to existing datasets
- Potential Improvement: Explain the reason upfront and link to technical documentation
No Guidance on Model Selection
- Impact: Users unsure which embedding model to use
- Frequency: Users with multiple embedding models
- Potential Improvement:
  - Recommendations: “Best for general use”, “Fastest”, “Most accurate”
  - Model comparison information
  - Explain differences between models
Auto-Selection Inconsistent
- Impact: Sometimes auto-selects first model, sometimes doesn’t
- Frequency: New dataset creation
- Potential Improvement:
  - Always auto-select if only one model
  - Remember last-used model as default
Design Considerations
Following Contextual Design Principles:
Automation Opportunities:
- Auto-select if only one embedding model available
- Remember last-used model as default
- Recommend model based on dataset size/type
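The first two automation ideas could be combined into one deterministic default rule, which would also address the inconsistent auto-selection noted earlier. The sketch below shows one possible ordering. It describes a proposed behavior, not the current one, and `pick_default_model` is a hypothetical name.

```python
from typing import Optional, Sequence


def pick_default_model(
    available_models: Sequence[str],
    last_used: Optional[str] = None,
) -> Optional[str]:
    """Pick a default for the dropdown, or None to show "Select model…"."""
    if len(available_models) == 1:
        # Only one model uploaded: nothing to decide, so select it.
        return available_models[0]
    if last_used in available_models:
        # Several models: fall back to the one used for the previous job,
        # provided it is still uploaded.
        return last_used
    # No safe default: leave the choice to the user.
    return None
```

Returning `None` when neither rule applies keeps the user in control instead of silently selecting the first model in the list.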
Simplification Opportunities:
- Hide selection if only one model exists
- Smart defaults
- Clear “recommended” badge on preferred model
User Trust:
- Clear indication which model will be used
- Locked state prevents accidental changes for existing datasets
- Transparent about model’s role
Cognitive Load:
- Simple dropdown selection
- Don’t require deep understanding of embeddings
- Clear when choice is available vs. locked
Related Flows
- Select Target Dataset for Job - Determines if model is selectable
- Upload Embedding Model - Add models to choose from
- Create New Blockify Job - Parent workflow
- Select Active Embedding Model - Global embedding selection
Technical References
Knowledge Base Sections:
- src/components/blockify-corpus/new-job-screen.js - Embedding model selector
- src/components/ui/model-selection.js - Model selection component
Version History
| Date | Version | Author | Changes |
|---|---|---|---|
| 2025-10-04 | 1.1 | Iternal Technologies | Initial documentation |
Notes
Important Considerations:
- Embedding model is permanent for each dataset; it cannot be changed after the dataset is created
- Must use same model when adding to existing dataset
- All items in dataset must use same embedding model for search to work
Model Selection Scenarios:
- New Dataset: Can choose any available embedding model
- Existing Dataset: Automatically uses dataset’s model (cannot change)
- No Models Available: Must upload embedding model first
Why Model Cannot Change: Each embedding model maps text into its own vector space. Vectors from different models can differ in length and are not comparable with one another, so all items in a dataset must use the same model for semantic search to work correctly.
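A toy example makes the incompatibility concrete. The three “models” below are deliberately trivial stand-ins invented for this illustration (they count characters and are not real embedding models), but they show the two ways mixing models breaks search: vectors of different lengths cannot be compared at all, and vectors of the same length can still place the same text in unrelated positions.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity; refuses vectors of different lengths."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms


# Hypothetical toy "models": each turns text into a vector in its own way.
def toy_model_a(text: str) -> List[float]:
    return [len(text), text.count("e"), text.count(" ")]   # 3 dimensions


def toy_model_b(text: str) -> List[float]:
    return [text.count("a"), len(text.split())]            # 2 dimensions


def toy_model_c(text: str) -> List[float]:
    return list(reversed(toy_model_a(text)))               # 3 dimensions, axes reordered


text = "embedding model"

# Same model on both sides: the text matches itself (similarity close to 1).
same_model = cosine_similarity(toy_model_a(text), toy_model_a(text))

# Same length but a different model: identical text no longer looks similar.
mixed_models = cosine_similarity(toy_model_a(text), toy_model_c(text))

# Different lengths: the comparison cannot be computed at all.
try:
    cosine_similarity(toy_model_a(text), toy_model_b(text))
except ValueError as err:
    mismatch_message = str(err)
```

Here `same_model` is close to 1 while `mixed_models` is far lower, even though both compare the same text. A dataset that mixed models would have the same problem between its stored items and the query.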
Best Practices:
- Choose embedding model carefully for new datasets (permanent)
- Use consistent model across related datasets if possible
- Document which model used with dataset for future reference
- If unsure, use recommended general-purpose model (e.g., Jina)
Common User Questions:
- “Which embedding model should I choose?” - Jina or BGE models are good for general use
- “Can I change model later?” - No, permanent for dataset
- “What if I picked wrong model?” - Must create new dataset with correct model
- “Why is model locked for existing dataset?” - Must match for semantic search consistency
- “Do different models give different search results?” - Yes, significantly different