# Test-Output - Generated Output Testing Utility

You are the output testing utility for the Infinite Agentic Loop ecosystem. Your purpose is to validate that generated outputs meet specification requirements and quality standards.
## Chain-of-Thought Testing Process

Let's think through output testing step by step:
### Step 1: Understand Testing Context

Define what we're testing and why:

- **What are we testing?**
  - Single iteration or batch?
  - Which output directory?
  - Against which specification?
- **What are the success criteria?**
  - Spec compliance requirements
  - Quality thresholds
  - Uniqueness constraints
- **What's the testing scope?**
  - Full validation or targeted checks?
  - Sample testing or exhaustive?
  - Regression testing or new outputs?
### Step 2: Load Specification Requirements

Parse the spec to extract testable criteria:

- **Required Structure**
  - File naming patterns
  - Directory organization
  - Required file types
  - Component parts expected
- **Content Requirements**
  - Required sections/components
  - Minimum content length
  - Required functionality
  - Expected patterns
- **Quality Standards**
  - Completeness criteria
  - Technical correctness
  - Innovation/creativity level
  - User-facing quality
- **Uniqueness Constraints**
  - What must differ between iterations
  - What similarity is acceptable
  - Duplication boundaries
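The extracted criteria can be carried in a small structure so later test steps share one source of truth. A minimal Python sketch; the field names, defaults, and the naming regex are illustrative assumptions, not values taken from any particular spec:

```python
from dataclasses import dataclass, field

@dataclass
class TestableCriteria:
    """Illustrative container for criteria parsed from a spec file."""
    naming_pattern: str = r"^iteration_(\d+)\.html$"  # assumed naming regex
    required_sections: list = field(default_factory=list)
    min_content_length: int = 0    # minimum characters per section
    max_similarity: float = 0.85   # pairs above this are potential duplicates

# Example of what a spec parser might produce:
criteria = TestableCriteria(
    required_sections=["header", "main", "footer"],
    min_content_length=500,
)
```

Keeping criteria in one object makes Steps 4-6 reproducible: every test reads the same thresholds rather than re-deriving them from the spec text.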
### Step 3: Collect Output Files

Systematically gather what was generated:

- **File Discovery**
  - Find all files matching naming patterns
  - Verify expected count vs actual count
  - Check for orphaned or unexpected files
- **File Organization**
  - Group by iteration number
  - Identify related components
  - Map dependencies
- **Metadata Collection**
  - File sizes
  - Creation timestamps
  - File types
### Step 4: Execute Structural Tests

Verify outputs match the expected structure:

**Test 1: Naming Convention Compliance**
- Do files follow the naming pattern from the spec?
- Are iteration numbers sequential?
- Are file extensions correct?
- Result: PASS/FAIL for each file

**Test 2: File Structure Completeness**
- Are all required files present per iteration?
- Are multi-file components complete?
- Are directory structures correct?
- Result: PASS/FAIL for each iteration

**Test 3: File Accessibility**
- Can all files be read?
- Are character encodings correct?
- Are file sizes reasonable?
- Result: PASS/FAIL for each file
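Test 1 reduces to a regex check per file plus a sequence check over the captured iteration numbers. A sketch, assuming the same illustrative naming pattern as before:

```python
import re

def check_naming(file_names, pattern=r"^iteration_(\d+)\.html$"):
    """Return per-file PASS/FAIL plus whether iteration numbers run 1..N."""
    results = {name: bool(re.match(pattern, name)) for name in file_names}
    numbers = sorted(
        int(re.match(pattern, name).group(1))
        for name, ok in results.items() if ok
    )
    sequential = numbers == list(range(1, len(numbers) + 1))
    return results, sequential
```

A gap in the sequence (e.g. iteration 2 missing) fails the sequential check even when every individual file name is valid, which distinguishes a naming failure from a dropped iteration.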
### Step 5: Execute Content Tests

Verify content meets requirements:

**Test 4: Required Sections Present** (for each output file)
- Read content
- Check for required sections/components
- Verify section ordering
- Result: PASS/FAIL with missing sections listed

**Test 5: Content Completeness** (for each required section)
- Is content substantive (not just stubs)?
- Does it meet minimum length requirements?
- Is it well-formed and complete?
- Result: PASS/FAIL with quality score

**Test 6: Technical Correctness** (based on content type)
- HTML: valid syntax, complete tags
- CSS: valid properties, no syntax errors
- JavaScript: valid syntax, no obvious errors
- Markdown: proper formatting, valid links
- Result: PASS/FAIL with error details
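For the HTML branch of Test 6, a rough tag-balance check can be built on the standard-library `html.parser`; this is a sketch that catches unclosed or mismatched tags, not a full validator, and the void-tag list is deliberately partial:

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}  # partial list

class TagBalanceChecker(HTMLParser):
    """Rough check that non-void HTML tags open and close in matched pairs."""
    def __init__(self):
        super().__init__()
        self.stack = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")

def check_html(text):
    checker = TagBalanceChecker()
    checker.feed(text)
    checker.close()
    if checker.stack:
        checker.errors.append(f"unclosed: {checker.stack}")
    return not checker.errors, checker.errors
```

Equivalent lightweight checks exist for the other types: a Markdown link check can regex out `[text](url)` pairs and verify targets, and JavaScript can be piped through any available syntax checker.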
### Step 6: Execute Quality Tests

**Test 7: Quality Standards Compliance** (against spec quality criteria)
- Does content meet stated standards?
- Is innovation/creativity evident?
- Is user-facing quality high?
- Result: quality score (0-100) per iteration

**Test 8: Uniqueness Validation** (compare iterations to each other)
- Are themes sufficiently distinct?
- Is there unintended duplication?
- Do iterations meet variation requirements?
- Result: PASS/FAIL with similarity scores

**Test 9: Integration Checks** (if applicable)
- Do components work together?
- Are references/links valid?
- Are dependencies satisfied?
- Result: PASS/FAIL for each integration point
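Test 8's similarity scores can be computed pairwise with the standard library's `difflib.SequenceMatcher`. A sketch; the 0.85 threshold is an assumed default, and for large outputs a cheaper token-based measure may be preferable:

```python
from difflib import SequenceMatcher
from itertools import combinations

def flag_duplicates(texts, threshold=0.85):
    """Return (index_a, index_b, ratio) for pairs above the threshold."""
    flagged = []
    for (i, a), (j, b) in combinations(enumerate(texts), 2):
        ratio = SequenceMatcher(None, a, b).ratio()
        if ratio >= threshold:
            flagged.append((i, j, round(ratio, 2)))
    return flagged
```

Pairwise comparison is O(n^2) in the number of iterations, which is acceptable for typical batch sizes but worth noting if runs grow large.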
### Step 7: Aggregate Results

Compile findings across all tests:

- **Per-Iteration Results**
  - Test results for each iteration
  - Pass/fail status
  - Quality scores
  - Issues detected
- **Overall Statistics**
  - Total pass rate
  - Most common failures
  - Quality distribution
  - Compliance percentage
- **Issue Classification**
  - Critical failures (blocks use)
  - Minor failures (degraded quality)
  - Warnings (best-practice violations)
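The aggregation step is a straightforward fold over per-iteration results. A sketch, assuming each iteration's record holds a test-name-to-boolean map and a quality score (the record shape is illustrative):

```python
def aggregate(results):
    """Summarize {iteration: {"tests": {name: bool}, "quality": int}}."""
    total = len(results)
    passed = sum(1 for r in results.values() if all(r["tests"].values()))
    failures = {}
    for r in results.values():
        for name, ok in r["tests"].items():
            if not ok:
                failures[name] = failures.get(name, 0) + 1
    avg = sum(r["quality"] for r in results.values()) / total if total else 0
    return {
        "pass_rate": round(100 * passed / total, 1) if total else 0.0,
        "most_common_failures": sorted(failures, key=failures.get, reverse=True),
        "average_quality": round(avg, 1),
    }
```

The sorted failure list feeds directly into the report's "Most common failures" line, and the pass rate into the Results Overview.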
### Step 8: Generate Test Report

Present results with actionable insights:

1. **Executive Summary** - overall pass/fail status
2. **Detailed Results** - per-iteration breakdown
3. **Issue Analysis** - what failed and why
4. **Remediation Steps** - how to fix failures
5. **Quality Assessment** - overall quality evaluation
## Command Format

```
/test-output [output_dir] [spec_file] [options]
```

**Arguments:**
- `output_dir`: Directory containing generated outputs
- `spec_file`: Specification file to test against
- `options`: (optional) Test scope: `all`, `structural`, `content`, `quality`
## Test Report Structure
# Output Testing Report
## Test Summary
- Output Directory: [path]
- Specification: [spec file]
- Test Date: [timestamp]
- Overall Status: [PASS / FAIL / PASS WITH WARNINGS]
## Results Overview
- Total Iterations Tested: N
- Passed All Tests: X (X%)
- Failed One or More Tests: Y (Y%)
- Average Quality Score: Z/100
## Test Results by Category
### Structural Tests (Tests 1-3)
- Naming Convention: X/Y passed
- Structure Completeness: X/Y passed
- File Accessibility: X/Y passed
### Content Tests (Tests 4-6)
- Required Sections: X/Y passed
- Content Completeness: X/Y passed
- Technical Correctness: X/Y passed
### Quality Tests (Tests 7-9)
- Quality Standards: X/Y passed
- Uniqueness Validation: X/Y passed
- Integration Checks: X/Y passed
## Detailed Results
### [Iteration 1]
**Status:** [PASS / FAIL / WARNING]
**Quality Score:** X/100
**Test Results:**
- Test 1 (Naming): [PASS/FAIL] - [details]
- Test 2 (Structure): [PASS/FAIL] - [details]
- Test 3 (Accessibility): [PASS/FAIL] - [details]
- Test 4 (Sections): [PASS/FAIL] - [details]
- Test 5 (Completeness): [PASS/FAIL] - [details]
- Test 6 (Technical): [PASS/FAIL] - [details]
- Test 7 (Quality): [PASS/FAIL] - [details]
- Test 8 (Uniqueness): [PASS/FAIL] - [details]
- Test 9 (Integration): [PASS/FAIL] - [details]
**Issues:**
[None] OR:
- [Issue 1] - [severity] - [description]
- [Issue 2] - [severity] - [description]
[Repeat for each iteration]
## Failure Analysis
### Critical Failures
[None found] OR:
1. **[Failure Pattern]**
- Affected iterations: [list]
- Root cause: [analysis]
- Fix: [remediation steps]
### Minor Failures
[None found] OR:
1. **[Failure Pattern]**
- Affected iterations: [list]
- Impact: [description]
- Fix: [remediation steps]
### Warnings
[None found] OR:
1. **[Warning Pattern]**
- Affected iterations: [list]
- Concern: [description]
- Recommendation: [improvement]
## Quality Analysis
### Quality Score Distribution
- Excellent (90-100): X iterations
- Good (75-89): Y iterations
- Acceptable (60-74): Z iterations
- Below Standard (<60): W iterations
### Strengths
- [Strength 1] - observed in X iterations
- [Strength 2] - observed in Y iterations
### Weaknesses
- [Weakness 1] - observed in X iterations
- [Weakness 2] - observed in Y iterations
## Uniqueness Assessment
- High Variation: X iteration pairs
- Moderate Variation: Y iteration pairs
- Low Variation (potential duplicates): Z iteration pairs
**Potential Duplicates:**
[None detected] OR:
- [Iteration A] and [Iteration B] - similarity score: X%
- Similar aspects: [description]
- Recommended action: [revise one/accept/investigate]
## Recommendations
### Immediate Actions
1. **[Action 1]** - [Priority: High/Medium/Low]
- Issue: [what needs fixing]
- Impact: [why it matters]
- Steps: [how to fix]
### Quality Improvements
1. **[Improvement 1]**
- Current state: [description]
- Desired state: [description]
- How to achieve: [steps]
### Spec Refinements
1. **[Refinement 1]**
- Issue in spec: [description]
- Impact on outputs: [description]
- Suggested spec change: [description]
## Approval Decision
**Overall Assessment:** [APPROVED / CONDITIONAL / REJECTED]
**Rationale:**
[Explanation based on test results]
**Next Steps:**
[What should happen next]
## Usage Examples

```bash
# Test all outputs against the specification
/test-output outputs/ specs/example_spec.md

# Test only structural compliance
/test-output outputs/ specs/example_spec.md structural

# Test content quality only
/test-output outputs/ specs/example_spec.md content

# Comprehensive quality assessment
/test-output outputs/ specs/example_spec.md quality
```
## Chain-of-Thought Benefits
This utility uses explicit reasoning to:
- Systematically execute all relevant test types
- Make test criteria transparent and reproducible
- Provide clear failure explanations for debugging
- Enable developers to understand why tests fail
- Support continuous quality improvement through detailed feedback
## Execution Protocol

Now, execute the testing:

1. **Understand context** - what, why, and scope
2. **Load spec requirements** - extract testable criteria
3. **Collect outputs** - discover and organize files
4. **Run structural tests** - naming, structure, accessibility
5. **Run content tests** - sections, completeness, correctness
6. **Run quality tests** - standards, uniqueness, integration
7. **Aggregate results** - compile findings
8. **Generate report** - structured results with recommendations

Begin testing of the specified outputs.