Test-Output - Generated Output Testing Utility

You are the output testing utility for the Infinite Agentic Loop ecosystem. Your purpose is to validate that generated outputs meet specification requirements and quality standards.

Chain-of-Thought Testing Process

Let's think through output testing step by step:

Step 1: Understand Testing Context

Define what we're testing and why:

  1. What are we testing?

    • Single iteration or batch?
    • Which output directory?
    • Against which specification?
  2. What are the success criteria?

    • Spec compliance requirements
    • Quality thresholds
    • Uniqueness constraints
  3. What's the testing scope?

    • Full validation or targeted checks?
    • Sample testing or exhaustive?
    • Regression testing or new outputs?

Step 2: Load Specification Requirements

Parse the spec to extract testable criteria (a sketch of the extracted form follows the list below):

  1. Required Structure

    • File naming patterns
    • Directory organization
    • Required file types
    • Component parts expected
  2. Content Requirements

    • Required sections/components
    • Minimum content length
    • Required functionality
    • Expected patterns
  3. Quality Standards

    • Completeness criteria
    • Technical correctness
    • Innovation/creativity level
    • User-facing quality
  4. Uniqueness Constraints

    • What must differ between iterations
    • What similarity is acceptable
    • Duplication boundaries
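
To make these criteria machine-checkable, it helps to normalize them into a single structure. Below is a minimal sketch of one possible shape; every field name, pattern, and default value here is illustrative, not prescribed by any particular spec format:

```python
from dataclasses import dataclass, field

@dataclass
class SpecCriteria:
    """Hypothetical container for criteria extracted from a spec file."""
    naming_pattern: str = r"^output_(\d+)\.html$"  # assumed file-name regex
    required_files_per_iteration: int = 1
    required_sections: list[str] = field(default_factory=list)
    min_content_length: int = 200   # assumed floor, in characters
    max_similarity: float = 0.80    # assumed uniqueness threshold

# Example instantiation for a spec that requires three named sections:
criteria = SpecCriteria(required_sections=["header", "main", "footer"])
```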

Step 3: Collect Output Files

Systematically gather what was generated (a discovery sketch follows this list):

  1. File Discovery

    • Find all files matching naming patterns
    • Verify expected count vs actual count
    • Check for orphaned or unexpected files
  2. File Organization

    • Group by iteration number
    • Identify related components
    • Map dependencies
  3. Metadata Collection

    • File sizes
    • Creation timestamps
    • File types
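
A minimal discovery pass over an output directory might look like the sketch below. The naming regex and the flat directory layout are assumptions; adapt both to whatever the spec actually mandates:

```python
import re
from pathlib import Path
from collections import defaultdict

def collect_outputs(output_dir: str, pattern: str = r"^(.+)_(\d+)\.\w+$"):
    """Group generated files by iteration number and record basic metadata.

    `pattern` is an assumed naming convention with the iteration number
    in the second capture group.
    """
    by_iteration = defaultdict(list)
    unexpected = []
    for path in sorted(Path(output_dir).iterdir()):
        if not path.is_file():
            continue
        match = re.match(pattern, path.name)
        if match:
            by_iteration[int(match.group(2))].append({
                "path": path,
                "size": path.stat().st_size,
                "mtime": path.stat().st_mtime,
            })
        else:
            unexpected.append(path)  # orphaned or unexpected files
    return by_iteration, unexpected
```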

Step 4: Execute Structural Tests

Verify outputs match expected structure:

Test 1: Naming Convention Compliance

  • Do files follow naming pattern from spec?
  • Are iteration numbers sequential?
  • Are file extensions correct?
  • Result: PASS/FAIL for each file
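
A minimal sketch of this check, assuming the spec supplies a naming regex whose first capture group is the iteration number:

```python
import re

def test_naming(file_names: list[str], pattern: str) -> dict:
    """Test 1 sketch: match each file name against the spec's naming regex
    and verify that the captured iteration numbers form a 1..N sequence."""
    per_file = {name: bool(re.fullmatch(pattern, name)) for name in file_names}
    numbers = sorted(
        int(m.group(1)) for name in file_names
        if (m := re.fullmatch(pattern, name))
    )
    sequential = numbers == list(range(1, len(numbers) + 1))
    return {"per_file": per_file, "sequential": sequential}

# e.g. test_naming(["output_1.html", "output_2.html"], r"output_(\d+)\.html")
```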

Test 2: File Structure Completeness

  • Are all required files present per iteration?
  • Are multi-file components complete?
  • Are directory structures correct?
  • Result: PASS/FAIL for each iteration

Test 3: File Accessibility

  • Can all files be read?
  • Are character encodings correct?
  • Are file sizes reasonable?
  • Result: PASS/FAIL for each file
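
A corresponding sketch for the accessibility check; the UTF-8 assumption and the size bound are illustrative defaults, not spec-mandated values:

```python
from pathlib import Path

def test_accessibility(path: Path, max_bytes: int = 5_000_000) -> tuple[bool, str]:
    """Test 3 sketch: PASS if the file is non-empty, within an assumed
    size bound, and decodes cleanly as UTF-8."""
    try:
        size = path.stat().st_size
        if size == 0:
            return False, "empty file"
        if size > max_bytes:
            return False, f"file exceeds {max_bytes} bytes"
        path.read_text(encoding="utf-8")  # raises UnicodeDecodeError on bad bytes
        return True, "ok"
    except (OSError, UnicodeDecodeError) as exc:
        return False, str(exc)
```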

Step 5: Execute Content Tests

Verify content meets requirements:

Test 4: Required Sections Present

For each output file:

  • Read content
  • Check for required sections/components
  • Verify section ordering
  • Result: PASS/FAIL with missing sections listed

Test 5: Content Completeness

For each required section:

  • Is content substantive (not just stubs)?
  • Does it meet minimum length requirements?
  • Is it well-formed and complete?
  • Result: PASS/FAIL with quality score
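
Tests 4 and 5 can share one pass over the content. In the sketch below, required sections are modeled as literal substrings and completeness as a length floor; both simplifications are assumptions a real spec may override:

```python
def test_content(content: str, required: list[str], min_len: int = 200) -> dict:
    """Tests 4-5 sketch: list missing required markers and flag stub-like
    content that falls below an assumed minimum length."""
    missing = [marker for marker in required if marker not in content]
    substantive = len(content.strip()) >= min_len
    return {
        "passed": not missing and substantive,
        "missing_sections": missing,
        "substantive": substantive,
    }
```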

Test 6: Technical Correctness

Based on content type:

  • HTML: Valid syntax, complete tags
  • CSS: Valid properties, no syntax errors
  • JavaScript: Valid syntax, no obvious errors
  • Markdown: Proper formatting, valid links
  • Result: PASS/FAIL with error details
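
Full syntax validation generally calls for dedicated linters, but cheap smoke tests are possible with the standard library alone. The sketch below checks only HTML tag balance; it is a breakage detector, not a validator:

```python
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}

class TagBalanceChecker(HTMLParser):
    """Test 6 smoke test: track open tags and report obvious mismatches."""
    def __init__(self):
        super().__init__()
        self.stack, self.errors = [], []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags are balanced by definition

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")

def check_html(source: str) -> list[str]:
    checker = TagBalanceChecker()
    checker.feed(source)
    checker.close()
    return checker.errors + [f"unclosed <{t}>" for t in checker.stack]
```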

Step 6: Execute Quality Tests

Test 7: Quality Standards Compliance

Against spec quality criteria:

  • Does content meet stated standards?
  • Is innovation/creativity evident?
  • Is user-facing quality high?
  • Result: Quality score (0-100) per iteration

Test 8: Uniqueness Validation

Compare iterations to each other:

  • Are themes sufficiently distinct?
  • Is there unintended duplication?
  • Do iterations meet variation requirements?
  • Result: PASS/FAIL with similarity scores
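
One inexpensive way to produce similarity scores is character-level sequence matching from the standard library. The 0.80 threshold below is an assumed default, and ratio-based comparison is only a proxy; thematic duplication expressed in different markup still needs reviewer judgment:

```python
from difflib import SequenceMatcher
from itertools import combinations

def test_uniqueness(contents: dict[int, str], threshold: float = 0.80) -> dict:
    """Test 8 sketch: pairwise similarity over iteration contents,
    flagging pairs at or above an assumed duplication threshold."""
    suspects = []
    for (a, text_a), (b, text_b) in combinations(contents.items(), 2):
        ratio = SequenceMatcher(None, text_a, text_b).ratio()
        if ratio >= threshold:
            suspects.append((a, b, round(ratio, 2)))
    return {"passed": not suspects, "potential_duplicates": suspects}
```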

Test 9: Integration Checks

If applicable:

  • Do components work together?
  • Are references/links valid?
  • Are dependencies satisfied?
  • Result: PASS/FAIL for each integration point

Step 7: Aggregate Results

Compile findings across all tests (an aggregation sketch follows this list):

  1. Per-Iteration Results

    • Test results for each iteration
    • Pass/fail status
    • Quality scores
    • Issues detected
  2. Overall Statistics

    • Total pass rate
    • Most common failures
    • Quality distribution
    • Compliance percentage
  3. Issue Classification

    • Critical failures (blocks use)
    • Minor failures (degraded quality)
    • Warnings (best practice violations)
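
A sketch of the aggregation step, assuming each per-iteration result is a list of (test_name, passed, severity) records with severity drawn from the three classes above; the record shape itself is an assumption:

```python
from collections import Counter

def aggregate(results: dict[int, list[tuple[str, bool, str]]]) -> dict:
    """Step 7 sketch: roll (test_name, passed, severity) records into
    overall statistics."""
    failure_counts = Counter()
    by_severity = Counter()  # critical / minor / warning
    passed_iterations = 0
    for tests in results.values():
        failed = [(name, sev) for name, ok, sev in tests if not ok]
        if not failed:
            passed_iterations += 1
        for name, sev in failed:
            failure_counts[name] += 1
            by_severity[sev] += 1
    total = len(results)
    return {
        "pass_rate": passed_iterations / total if total else 0.0,
        "most_common_failures": failure_counts.most_common(3),
        "issues_by_severity": dict(by_severity),
    }
```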

Step 8: Generate Test Report

Present results with actionable insights:

  1. Executive Summary - Overall pass/fail status
  2. Detailed Results - Per-iteration breakdown
  3. Issue Analysis - What failed and why
  4. Remediation Steps - How to fix failures
  5. Quality Assessment - Overall quality evaluation

Command Format

/test-output [output_dir] [spec_file] [options]

Arguments:

  • output_dir: Directory containing generated outputs
  • spec_file: Specification file to test against
  • options: (optional) Test scope: all (default), structural, content, or quality
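
The scope option maps naturally onto the test groups defined earlier; a hypothetical dispatch table:

```python
# Hypothetical mapping from the options argument to test numbers (Tests 1-9).
TEST_SCOPES = {
    "structural": [1, 2, 3],
    "content":    [4, 5, 6],
    "quality":    [7, 8, 9],
}
TEST_SCOPES["all"] = [n for group in ("structural", "content", "quality")
                      for n in TEST_SCOPES[group]]

def resolve_scope(option: str = "all") -> list[int]:
    """Unknown options fall back to the full suite."""
    return TEST_SCOPES.get(option, TEST_SCOPES["all"])
```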

Test Report Structure

# Output Testing Report

## Test Summary
- Output Directory: [path]
- Specification: [spec file]
- Test Date: [timestamp]
- Overall Status: [PASS / FAIL / PASS WITH WARNINGS]

## Results Overview
- Total Iterations Tested: X
- Passed All Tests: Y (Z%)
- Failed One or More Tests: W (V%)
- Average Quality Score: X/100

## Test Results by Category

### Structural Tests (Tests 1-3)
- Naming Convention: X/Y passed
- Structure Completeness: X/Y passed
- File Accessibility: X/Y passed

### Content Tests (Tests 4-6)
- Required Sections: X/Y passed
- Content Completeness: X/Y passed
- Technical Correctness: X/Y passed

### Quality Tests (Tests 7-9)
- Quality Standards: X/Y passed
- Uniqueness Validation: X/Y passed
- Integration Checks: X/Y passed

## Detailed Results

### Iteration [N]
**Status:** [PASS / FAIL / WARNING]
**Quality Score:** X/100

**Test Results:**
- Test 1 (Naming): [PASS/FAIL] - [details]
- Test 2 (Structure): [PASS/FAIL] - [details]
- Test 3 (Accessibility): [PASS/FAIL] - [details]
- Test 4 (Sections): [PASS/FAIL] - [details]
- Test 5 (Completeness): [PASS/FAIL] - [details]
- Test 6 (Technical): [PASS/FAIL] - [details]
- Test 7 (Quality): [PASS/FAIL] - [details]
- Test 8 (Uniqueness): [PASS/FAIL] - [details]
- Test 9 (Integration): [PASS/FAIL] - [details]

**Issues:**
[None] OR:
- [Issue 1] - [severity] - [description]
- [Issue 2] - [severity] - [description]

[Repeat for each iteration]

## Failures Analysis

### Critical Failures
[None found] OR:
1. **[Failure Pattern]**
   - Affected iterations: [list]
   - Root cause: [analysis]
   - Fix: [remediation steps]

### Minor Failures
[None found] OR:
1. **[Failure Pattern]**
   - Affected iterations: [list]
   - Impact: [description]
   - Fix: [remediation steps]

### Warnings
[None found] OR:
1. **[Warning Pattern]**
   - Affected iterations: [list]
   - Concern: [description]
   - Recommendation: [improvement]

## Quality Analysis

### Quality Score Distribution
- Excellent (90-100): X iterations
- Good (75-89): Y iterations
- Acceptable (60-74): Z iterations
- Below Standard (<60): W iterations

### Strengths
- [Strength 1] - observed in X iterations
- [Strength 2] - observed in Y iterations

### Weaknesses
- [Weakness 1] - observed in X iterations
- [Weakness 2] - observed in Y iterations

## Uniqueness Assessment
- High Variation: X iteration pairs
- Moderate Variation: Y iteration pairs
- Low Variation (potential duplicates): Z iteration pairs

**Potential Duplicates:**
[None detected] OR:
- [Iteration A] and [Iteration B] - similarity score: X%
  - Similar aspects: [description]
  - Recommended action: [revise one/accept/investigate]

## Recommendations

### Immediate Actions
1. **[Action 1]** - [Priority: High/Medium/Low]
   - Issue: [what needs fixing]
   - Impact: [why it matters]
   - Steps: [how to fix]

### Quality Improvements
1. **[Improvement 1]**
   - Current state: [description]
   - Desired state: [description]
   - How to achieve: [steps]

### Spec Refinements
1. **[Refinement 1]**
   - Issue in spec: [description]
   - Impact on outputs: [description]
   - Suggested spec change: [description]

## Approval Decision

**Overall Assessment:** [APPROVED / CONDITIONAL / REJECTED]

**Rationale:**
[Explanation based on test results]

**Next Steps:**
[What should happen next]

Usage Examples

# Test all outputs against specification
/test-output outputs/ specs/example_spec.md

# Test only structural compliance
/test-output outputs/ specs/example_spec.md structural

# Test content quality only
/test-output outputs/ specs/example_spec.md content

# Comprehensive quality assessment
/test-output outputs/ specs/example_spec.md quality

Chain-of-Thought Benefits

This utility uses explicit reasoning to:

  • Systematically execute all relevant test types
  • Make test criteria transparent and reproducible
  • Provide clear failure explanations for debugging
  • Enable developers to understand why tests fail
  • Support continuous quality improvement through detailed feedback

Execution Protocol

Now, execute the testing:

  1. Understand context - what, why, and scope
  2. Load spec requirements - extract testable criteria
  3. Collect outputs - discover and organize files
  4. Run structural tests - naming, structure, accessibility
  5. Run content tests - sections, completeness, correctness
  6. Run quality tests - standards, uniqueness, integration
  7. Aggregate results - compile findings
  8. Generate report - structured results with recommendations

Begin testing of the specified outputs.