# Quality Validation Skill
This skill provides systematic validation methodology to ensure ResearchPacks and Implementation Plans meet quality standards before proceeding to implementation.
## When Claude Should Use This Skill
Claude will automatically invoke this skill when:
- ResearchPack completed and needs validation before planning
- Implementation Plan completed and needs validation before coding
- User explicitly requests quality check ("validate this", "is this complete?")
- About to proceed to next workflow phase (quality gate trigger)
## Core Principles (BRAHMA Constitution)
1. **Verification over speculation** - Validate with objective criteria
2. **Quality gates** - Don't proceed with bad inputs
3. **Reproducibility** - Same input quality = same score
4. **Explicit defects** - List specific problems, not vague "could be better"
## Validation Targets
### Research Type Detection
Before scoring, detect the research type so the appropriate rubric is applied (a minimal detection sketch follows at the end of this section):
#### Type 1: API/Library Research
**Indicators**:
- Contains API endpoints, function signatures, method calls
- Code examples with specific library imports
- Configuration/setup steps for external dependencies
- Version numbers for libraries/frameworks
**Scoring**: Use API Research Rubric (80+ pass threshold)
#### Type 2: Philosophy Research
**Indicators**:
- Contains themes, principles, patterns, methodologies
- Thematic organization (Theme 1, Theme 2, etc.)
- Cross-source synthesis
- Engineering philosophy or best practices analysis
- Pattern extraction from multiple sources
**Scoring**: Use Philosophy Research Rubric (70+ pass threshold)
**Examples**: Engineering philosophy, architectural patterns, best practices, methodology research
#### Type 3: Pattern Research
**Indicators**:
- Contains code patterns, design patterns, anti-patterns
- Architectural decisions and tradeoffs
- Implementation strategies
- Performance optimization patterns
**Scoring**: Use Pattern Research Rubric (70+ pass threshold)
**Why Different Thresholds?**
- API research is more objective (APIs exist or don't, versions are correct or wrong)
- Philosophy research is more subjective (thematic organization, synthesis quality)
- Philosophy research provides strategic value even if not as "complete" as API docs
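To make the type detection reproducible, here is a minimal sketch. The keyword lists and regexes are illustrative assumptions about how the indicators above typically surface in ResearchPack text, not a fixed specification.

```python
import re

# Illustrative signal lists; the indicator bullets above are the source of truth,
# and these patterns are assumptions about how they appear in a ResearchPack.
API_SIGNALS = [r"endpoint", r"\bimport\b", r"function signature", r"\bv?\d+\.\d+(\.\d+)?\b"]
PHILOSOPHY_SIGNALS = [r"\btheme\b", r"\bprinciple", r"\bmethodolog", r"\bsynthesis\b", r"best practice"]
PATTERN_SIGNALS = [r"\bpattern\b", r"anti-pattern", r"\btradeoff", r"\barchitectur", r"optimization"]

THRESHOLDS = {"api": 80, "philosophy": 70, "pattern": 70}

def detect_research_type(text: str) -> tuple[str, int]:
    """Return (research_type, pass_threshold) based on which indicator set dominates."""
    counts = {
        "api": sum(len(re.findall(p, text, re.IGNORECASE)) for p in API_SIGNALS),
        "philosophy": sum(len(re.findall(p, text, re.IGNORECASE)) for p in PHILOSOPHY_SIGNALS),
        "pattern": sum(len(re.findall(p, text, re.IGNORECASE)) for p in PATTERN_SIGNALS),
    }
    research_type = max(counts, key=counts.get)
    return research_type, THRESHOLDS[research_type]
```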
### 1. ResearchPack Validation - API/Library Type
**Purpose**: Ensure research is complete, accurate, and actionable before planning
**Validation Rubric for API/Library Research** (100 points total, 80+ pass threshold):
#### Completeness (40 points)
- ✓ Library/API identified with version (10 pts)
- ✓ At least 3 key APIs documented (10 pts)
- ✓ Setup/configuration steps provided (10 pts)
- ✓ At least 1 complete code example (10 pts)
#### Accuracy (30 points)
- ✓ All API signatures match official docs exactly (15 pts)
- Check: No paraphrasing; exact parameter types and correct return types
- ✓ Version numbers correct and consistent (5 pts)
- ✓ URLs all valid and point to official sources (10 pts)
- Test: Each URL should be from an official domain
#### Citation (20 points)
- ✓ Every API has source URL (10 pts)
- ✓ Sources include version and section references (5 pts)
- ✓ Confidence level stated and justified (5 pts)
#### Actionability (10 points)
- ✓ Implementation checklist provided (5 pts)
- ✓ Open questions identify real decisions (5 pts)
**Passing Score**: 80/100 or higher
**Validation Process**:
```python
# Pseudo-code for validation logic
def validate_research_pack(research_pack):
    score = 0
    defects = []

    # Completeness checks
    if has_library_with_version(research_pack):
        score += 10
    else:
        defects.append("CRITICAL: Library/version not identified")

    api_count = count_documented_apis(research_pack)
    if api_count >= 3:
        score += 10
    elif api_count > 0:
        score += (api_count / 3) * 10
        defects.append(f"MINOR: Only {api_count} APIs documented, need 3+")
    else:
        defects.append("CRITICAL: No APIs documented")

    # ... (continue for all criteria)

    return {
        "score": score,
        "grade": "PASS" if score >= 80 else "FAIL",
        "defects": defects,
        "recommendations": generate_recommendations(defects)
    }
```
**Output Format**:
```markdown
## 📊 ResearchPack Validation Report
**Overall Score**: [X]/100
**Grade**: [PASS ✅ / FAIL ❌]
### Breakdown
- Completeness: [X]/40
- Accuracy: [X]/30
- Citation: [X]/20
- Actionability: [X]/10
### Defects Found ([N])
#### CRITICAL (blocks implementation)
1. [Specific defect with example]
2. [Another defect]
#### MAJOR (should fix before proceeding)
1. [Defect]
#### MINOR (nice to have)
1. [Defect]
### Recommendations
**To reach passing score**:
1. [Specific action to take]
2. [Another action]
**If score >= 80**: ✅ **APPROVED** - Proceed to implementation-planner
**If score < 80**: ❌ **BLOCKED** - Fix critical/major defects and re-validate
```
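As a bridge between the pseudo-code above and this report template, a minimal rendering sketch is shown below. It assumes the result dict returned by `validate_research_pack` and fills only the top-level fields.

```python
def render_report(result: dict) -> str:
    """Render a validation result dict into the report skeleton above.
    Only the top-level fields are shown; defect severity grouping and the
    per-category breakdown would be added the same way (assumption)."""
    lines = [
        "## 📊 ResearchPack Validation Report",
        f"**Overall Score**: {result['score']}/100",
        f"**Grade**: {'PASS ✅' if result['grade'] == 'PASS' else 'FAIL ❌'}",
        f"### Defects Found ({len(result['defects'])})",
    ]
    lines += [f"{i}. {d}" for i, d in enumerate(result["defects"], start=1)]
    lines.append("### Recommendations")
    lines += [f"- {r}" for r in result.get("recommendations", [])]
    return "\n".join(lines)
```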
### 1b. ResearchPack Validation - Philosophy Research Type
**Purpose**: Ensure philosophy/pattern research is well-organized, sourced, and actionable
**Validation Rubric for Philosophy Research** (100 points total, 70+ pass threshold):
#### Thematic Organization (30 points)
- ✓ Clear themes/patterns identified with descriptive names (10 pts)
- Check: Each theme has a clear title and scope
- Examples: "Agent Architecture", "Context Engineering", "Multi-Agent Patterns"
- ✓ Each theme well-documented with examples and evidence (10 pts)
- Check: Themes have sub-sections, not just bullet points
- Check: Examples or quotes support each theme
- ✓ Cross-theme synthesis and relationships explained (10 pts)
- Check: "How patterns connect" or "Synthesis" section present
- Check: Explains how themes relate or build on each other
#### Source Quality (20 points)
- ✓ Official/authoritative sources cited (10 pts)
- Check: URLs from official domains (anthropic.com, docs.*, official repos)
- Examples: Anthropic blog, official documentation, framework guides
- ✓ Multiple sources per theme (5 pts)
- Check: Each major theme cites 2+ sources
- No single-source themes (indicates narrow research)
- ✓ Date/version information when applicable (5 pts)
- Check: Article dates, release versions, "as of [date]" present
- Helps determine if research is current
#### Actionable Insights (30 points)
- ✓ Implementation checklist provided (15 pts)
- Check: Concrete next steps for applying research
- Format: "Enhancement 1.1:", "Step 1:", "Action Items"
- Examples: "Add think protocol to agents", "Create context-engineering skill"
- ✓ Specific patterns extracted and documented (10 pts)
- Check: Patterns section with clear pattern names
- Check: Each pattern has description and when to use
- Examples: "Pattern 1: Minimal Scaffolding", "Pattern 2: Think Before Act"
- ✓ Open questions identified for planning phase (5 pts)
- Check: Research acknowledges what's unknown or needs deciding
- Examples: "Which agents need think tool?", "When to use multi-agent?"
#### Depth & Coverage (20 points)
- ✓ Comprehensive coverage of topic (10 pts)
- Check: Multiple aspects of topic covered
- Check: Not surface-level (goes beyond basic definitions)
- Examples: 7+ themes, 10+ sources for major topics
- ✓ Sufficient detail for implementation (10 pts)
- Check: Enough context to make decisions
- Check: Includes performance metrics, tradeoffs, examples
- Examples: "39% improvement", "15x cost", specific numbers
**Passing Score**: 70/100 or higher
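A minimal tally of this rubric is sketched below; each boolean stands in for one of the checks above and would in practice be evaluated against the ResearchPack text or by a reviewer.

```python
from dataclasses import dataclass

@dataclass
class PhilosophyScore:
    """Scoring skeleton for the philosophy rubric above (weights sum to 100)."""
    clear_themes: bool = False              # 10 pts
    theme_documentation: bool = False       # 10 pts
    cross_synthesis: bool = False           # 10 pts
    official_sources: bool = False          # 10 pts
    multiple_sources_per_theme: bool = False  # 5 pts
    date_version_info: bool = False         # 5 pts
    implementation_checklist: bool = False  # 15 pts
    patterns_extracted: bool = False        # 10 pts
    open_questions: bool = False            # 5 pts
    comprehensive_coverage: bool = False    # 10 pts
    sufficient_detail: bool = False         # 10 pts

    def total(self) -> int:
        weights = [
            (self.clear_themes, 10), (self.theme_documentation, 10), (self.cross_synthesis, 10),
            (self.official_sources, 10), (self.multiple_sources_per_theme, 5), (self.date_version_info, 5),
            (self.implementation_checklist, 15), (self.patterns_extracted, 10), (self.open_questions, 5),
            (self.comprehensive_coverage, 10), (self.sufficient_detail, 10),
        ]
        return sum(w for passed, w in weights if passed)

    def grade(self) -> str:
        return "PASS" if self.total() >= 70 else "FAIL"
```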
**Why Lower Threshold Than API Research?**
Philosophy research is inherently more subjective and thematic. A well-organized thematic analysis with 7 patterns from 11 sources (like the Anthropic ResearchPack) deserves to pass even if it doesn't have "3+ API endpoints with exact signatures."
Philosophy research provides **strategic value**:
- Informs how to build, not just what APIs to call
- Establishes principles that apply across implementations
- Captures institutional knowledge and best practices
- Enables better decision-making during planning
**Example: Anthropic Engineering Philosophy ResearchPack**
Would score:
- **Thematic Organization**: 30/30 (7 clear themes, cross-synthesis section)
- **Source Quality**: 20/20 (11 official Anthropic articles, all dated)
- **Actionable Insights**: 28/30 (Implementation checklist present, 7 patterns extracted, open questions listed)
- **Depth & Coverage**: 18/20 (Comprehensive, but more examples would help)
- **Total**: 96/100 ✅ **PASS** (well above 70 threshold)
**Output Format**:
```markdown
## 📊 ResearchPack Validation Report (Philosophy Research)
**Overall Score**: [X]/100
**Grade**: [PASS ✅ / FAIL ❌]
**Research Type**: Philosophy/Pattern Research
### Breakdown
**Thematic Organization** ([X]/30):
- Clear themes: [Y/10] [✓/✗]
- Theme documentation: [Y/10] [✓/✗]
- Cross-synthesis: [Y/10] [✓/✗]
**Source Quality** ([X]/20):
- Official sources: [Y/10] [✓/✗]
- Multiple sources per theme: [Y/5] [✓/✗]
- Date/version info: [Y/5] [✓/✗]
**Actionable Insights** ([X]/30):
- Implementation checklist: [Y/15] [✓/✗]
- Patterns extracted: [Y/10] [✓/✗]
- Open questions: [Y/5] [✓/✗]
**Depth & Coverage** ([X]/20):
- Comprehensive coverage: [Y/10] [✓/✗]
- Sufficient detail: [Y/10] [✓/✗]
### Defects Found ([N])
#### CRITICAL (blocks implementation)
1. [Defect - if no themes identified, no patterns extracted, etc.]
#### MAJOR (should fix before proceeding)
1. [Defect - if only 1 source per theme, missing implementation checklist, etc.]
#### MINOR (nice to have)
1. [Defect - if some themes lack examples, could use more sources, etc.]
### Recommendations
**To reach passing score** (if < 70):
1. [Specific action to take]
2. [Another action]
**If score >= 70**: ✅ **APPROVED** - Proceed to implementation-planner
**If score < 70**: ❌ **BLOCKED** - Fix critical/major defects and re-validate
**Philosophy Research Note**: This research provides strategic guidance for implementation. Even if specific API details are needed later, the principles and patterns documented here are valuable for decision-making.
```
### 2. Implementation Plan Validation
**Purpose**: Ensure plan is complete, safe, and executable before coding
**Validation Rubric** (100 points total):
#### Completeness (35 points)
- ✓ All file changes listed with purposes (10 pts)
- ✓ Step-by-step implementation sequence (10 pts)
- ✓ Each step has verification method (10 pts)
- ✓ Test plan included (5 pts)
#### Safety (30 points)
- ✓ Rollback plan complete and specific (15 pts)
- Must include: exact commands, verification steps, triggers
- ✓ Risk assessment done (10 pts)
- At least 3 risks identified with mitigations
- ✓ Changes are minimal (fewest files possible) (5 pts)
#### Clarity (20 points)
- ✓ Steps are actionable (no ambiguity) (10 pts)
- ✓ Success criteria defined (5 pts)
- ✓ Time estimates provided (5 pts)
#### Alignment (15 points)
- ✓ Plan matches ResearchPack APIs (10 pts)
- ✓ Plan addresses all requirements from user (5 pts)
**Passing Score**: 85/100 or higher (higher bar than research)
**Validation Process**:
```python
def validate_implementation_plan(plan, research_pack):
    score = 0
    defects = []

    # Completeness checks
    if has_file_changes_list(plan):
        score += 10
    else:
        defects.append("CRITICAL: No file changes specified")

    steps = extract_steps(plan)
    if all(step_has_verification(s) for s in steps):
        score += 10
    else:
        missing = [s for s in steps if not step_has_verification(s)]
        score += (len(steps) - len(missing)) / len(steps) * 10
        defects.append(f"MAJOR: Steps {missing} lack verification")

    # Safety checks
    rollback = extract_rollback_plan(plan)
    if has_exact_commands(rollback) and has_triggers(rollback):
        score += 15
    elif has_rollback_section(plan):
        score += 8
        defects.append("MAJOR: Rollback plan incomplete (missing commands or triggers)")
    else:
        defects.append("CRITICAL: No rollback plan")

    # Alignment checks
    apis_used = extract_apis_from_plan(plan)
    research_apis = extract_apis_from_research(research_pack)
    if all(api_matches_research(a, research_apis) for a in apis_used):
        score += 10
    else:
        mismatches = find_api_mismatches(apis_used, research_apis)
        defects.append(f"CRITICAL: APIs don't match ResearchPack: {mismatches}")

    # ... (continue for all criteria)

    return {
        "score": score,
        "grade": "PASS" if score >= 85 else "FAIL",
        "defects": defects,
        "recommendations": generate_recommendations(defects)
    }
```
**Output Format**:
```markdown
## 📊 Implementation Plan Validation Report
**Overall Score**: [X]/100
**Grade**: [PASS ✅ / FAIL ❌]
### Breakdown
- Completeness: [X]/35
- Safety: [X]/30
- Clarity: [X]/20
- Alignment: [X]/15
### Defects Found ([N])
#### CRITICAL (blocks implementation)
1. [Specific defect]
#### MAJOR (should fix)
1. [Defect]
#### MINOR (nice to have)
1. [Defect]
### API Alignment Check
✅ All APIs match ResearchPack
OR
❌ Mismatches found:
- Plan uses `foo(x, y)` but ResearchPack shows `foo(x: string, y?: number)`
### Recommendations
**To reach passing score**:
1. [Action]
**If score >= 85**: ✅ **APPROVED** - Proceed to code-implementer
**If score < 85**: ❌ **BLOCKED** - Fix defects and re-validate
```
## Quality Gate Protocol
**Gates are MANDATORY checkpoints** - cannot proceed to next phase without passing validation.
### Gate 1: Research → Planning
```
Trigger: @docs-researcher completes ResearchPack
Action: Validate ResearchPack
Decision:
- Score >= 80: ✅ Allow @implementation-planner to proceed
- Score < 80: ❌ Block, return to @docs-researcher with defect list
```
### Gate 2: Planning → Implementation
```
Trigger: @implementation-planner completes Implementation Plan
Action: Validate Implementation Plan + check alignment with ResearchPack
Decision:
- Score >= 85 AND APIs match: ✅ Allow @code-implementer to proceed
- Score < 85 OR APIs mismatch: ❌ Block, return to @implementation-planner with defect list
```
### Gate 3: Implementation → Completion
```
Trigger: @code-implementer reports completion
Action: Validate tests passed, build succeeded, no regressions
Decision:
- All checks pass: ✅ Mark complete
- Any check fails: ❌ Trigger self-correction loop (up to 3 attempts)
```
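A sketch of the Gate 1 and Gate 2 decision logic is shown below, assuming the API-research threshold of 80 for Gate 1 (substitute 70 when type detection selects the philosophy or pattern rubric). Gate 3 is a test/build check rather than a score, so it is omitted here; the dict layout is an assumption.

```python
# Gate routing for Gates 1 and 2; the agent names come from the gate descriptions above.
GATES = {
    "research_to_planning": {"threshold": 80, "next": "@implementation-planner", "fix": "@docs-researcher"},
    "planning_to_implementation": {"threshold": 85, "next": "@code-implementer", "fix": "@implementation-planner"},
}

def gate_decision(gate: str, score: int, apis_match: bool = True) -> dict:
    """Return whether the gate passes and which agent receives the artifact next."""
    cfg = GATES[gate]
    passed = score >= cfg["threshold"] and apis_match
    if passed:
        reason = None
    elif score < cfg["threshold"]:
        reason = f"score {score} below threshold {cfg['threshold']}"
    else:
        reason = "APIs do not match ResearchPack"
    return {"passed": passed, "route_to": cfg["next"] if passed else cfg["fix"], "reason": reason}
```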
## Validation Automation
**These validations should be automated via hooks** (see hooks implementation):
```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "implementation-planner",
        "command": "validate-research-pack.sh",
        "action": "block_if_fails"
      },
      {
        "matcher": "code-implementer",
        "command": "validate-implementation-plan.sh",
        "action": "block_if_fails"
      }
    ]
  }
}
```
**Validation scripts return**:
- Exit code 0: Validation passed, proceed
- Exit code 1: Validation failed, defects printed to stdout, block
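A sketch of how a caller honors this exit-code contract; the script names are those listed in the hooks config above, and passing the artifact path as the first argument is an assumption.

```python
import subprocess

def run_validation(script: str, artifact_path: str) -> bool:
    """Run a validation hook script and honor the exit-code contract above."""
    result = subprocess.run(["bash", script, artifact_path], capture_output=True, text=True)
    print(result.stdout, end="")       # defect list, if any, is printed to stdout
    return result.returncode == 0      # 0 = proceed, non-zero = block
```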
## Common Validation Failures
### ResearchPack Failures
**Hallucinated APIs**:
```
❌ CRITICAL: API `redis.client.fetch()` not found in official docs
ResearchPack cites: redis.io/docs/clients/nodejs
Actual API: `client.get()` (verified at redis.io/docs/clients/nodejs#get)
FIX: Replace all instances of `fetch` with correct `get` API
```
**Version mismatch**:
```
❌ MAJOR: ResearchPack uses v3.x docs but project has v4.6.0
Example: v3 uses callbacks, v4 uses promises
FIX: Re-fetch docs for v4.6.0 specifically
```
**Missing citations**:
```
❌ MAJOR: 5 APIs listed without source URLs
APIs: set(), del(), ttl(), exists(), keys()
FIX: Add source URL for each (format: docs.com/path#section)
```
### Implementation Plan Failures
**No rollback plan**:
```
❌ CRITICAL: Rollback plan missing
FIX: Add section "## 🔄 Rollback Plan" with:
- Exact git commands to revert
- Configuration restoration steps
- Verification after rollback
- Triggers for when to rollback
```
**Ambiguous steps**:
```
❌ MAJOR: Step 3 says "Update the service" (too vague)
FIX: Specify:
- Which service? (path/to/ServiceName.ts)
- What update? (Add method X, modify method Y)
- How to verify? (run `npm test path/to/test.ts`)
```
**API misalignment**:
```
❌ CRITICAL: Plan uses `client.fetch(key)` but ResearchPack shows `client.get(key)`
FIX: Update plan to use correct API signature from ResearchPack
```
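The `find_api_mismatches` helper referenced in the plan-validation pseudo-code could look roughly like the sketch below. The call-site regex and the assumption that ResearchPack APIs are listed as `name(signature)` strings are both illustrative, not a fixed contract.

```python
import re

def find_api_mismatches(plan_text: str, research_apis: set) -> list:
    """Flag method calls in the plan whose names are not documented in the ResearchPack."""
    plan_calls = set(re.findall(r"\b\w+\.(\w+)\s*\(", plan_text))            # e.g. client.fetch( -> "fetch"
    documented = {re.match(r"\w+", api).group(0) for api in research_apis}   # e.g. "get(key: string)" -> "get"
    return [f"Plan uses `.{call}()` but ResearchPack documents no such API"
            for call in sorted(plan_calls - documented)]
```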
## Performance Targets
- **Validation time**: < 15 seconds per validation
- **Defect detection rate**: 95%+ of major issues caught
- **False positive rate**: < 5% (don't block good work)
## Integration with Hooks
Hooks provide deterministic enforcement (always run, not LLM-dependent):
**Research validation hook**:
```bash
#!/bin/bash
# .claude/hooks/validate-research-pack.sh
RESEARCH_FILE="$1"  # Path to ResearchPack file

# Check completeness
if ! grep -q "Target Library:" "$RESEARCH_FILE"; then
  echo "❌ CRITICAL: Library not identified"
  exit 1
fi

# Check API count (grep -c still prints a count when it exits non-zero, so use || true)
API_COUNT=$(grep -c "^###.*API" "$RESEARCH_FILE" || true)
if [ "$API_COUNT" -lt 3 ]; then
  echo "⚠️ MINOR: Only $API_COUNT APIs documented, need 3+"
  # Don't block for this, just warn
fi

# Check citations
if ! grep -q "Source:" "$RESEARCH_FILE"; then
  echo "❌ CRITICAL: No source citations found"
  exit 1
fi

echo "✅ ResearchPack validation passed (score: [calculated]/100)"
exit 0
```
**Plan validation hook** (similar structure).
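For illustration, the equivalent checks are sketched in Python below (a real hook would be a shell script like the one above). The section headings searched for are assumptions about the plan template.

```python
import re
import sys

def validate_plan_file(path: str) -> int:
    """Minimal plan checks mirroring the research hook: block on a missing
    rollback plan or file-changes list, warn on missing verification steps."""
    text = open(path, encoding="utf-8").read()
    blocked = False
    if not re.search(r"rollback plan", text, re.IGNORECASE):
        print("❌ CRITICAL: No rollback plan section found")
        blocked = True
    if not re.search(r"file changes", text, re.IGNORECASE):
        print("❌ CRITICAL: No file changes listed")
        blocked = True
    if not re.search(r"verif", text, re.IGNORECASE):
        print("⚠️ MAJOR: No verification steps found")  # warn only, do not block
    if blocked:
        return 1
    print("✅ Implementation Plan validation passed")
    return 0

if __name__ == "__main__":
    sys.exit(validate_plan_file(sys.argv[1]))
```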
---
**This skill ensures quality gates are objective and automated, and that they deterministically enforce the Research → Plan → Implement workflow.**