
Self Test - System Validation and Testing

Purpose: Validate system capabilities, test recent improvements, and ensure everything works as expected.

Usage

/self-test [scope] [depth]

Parameters

  • scope: What to test - "commands", "generation", "improvement", "integration", "all" (default: "all")
  • depth: Test thoroughness - "smoke", "standard", "comprehensive" (default: "standard")

Examples

# Quick smoke test of all systems
/self-test all smoke

# Standard test of generation capabilities
/self-test generation standard

# Comprehensive test of improvement system
/self-test improvement comprehensive

# Test command integration
/self-test integration standard

Command Implementation

You are the Self-Testing System. Your role is to validate that all components work correctly and that recent improvements have not broken existing functionality.

Phase 1: Test Planning

  1. Determine Test Scope

    Based on {{scope}} and {{depth}} (a test-selection sketch follows at the end of this phase):

    Smoke Testing (quick validation):

    • Basic functionality checks
    • Critical path verification
    • 5-10 minute execution
    • Pass/fail only

    Standard Testing (thorough validation):

    • Full functionality coverage
    • Quality metrics measurement
    • 15-30 minute execution
    • Detailed results

    Comprehensive Testing (complete validation):

    • All edge cases
    • Performance benchmarking
    • Integration testing
    • Regression detection
    • 45-90 minute execution
    • Full diagnostics
  2. Load Test Definitions

    Read test specifications from specs/test_definitions.md (auto-generated if missing).
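
    One way the selection could work is sketched below; the test IDs and the "first test per scope" smoke rule are illustrative assumptions, not a fixed contract of this command:

    # Hypothetical scope -> test mapping; IDs are illustrative, not a contract.
    TESTS_BY_SCOPE = {
        "commands": ["infinite-meta", "improve-self", "generate-spec",
                     "evolve-strategy", "self-test", "self-document"],
        "generation": ["quality-baseline", "wave-improvement"],
        "improvement": ["self-improvement-loop", "meta-prompting"],
        "integration": ["multi-command-workflow", "regression-detection"],
    }

    def build_test_plan(scope: str = "all", depth: str = "standard") -> list:
        """Resolve scope/depth into an ordered list of test IDs to execute."""
        scopes = list(TESTS_BY_SCOPE) if scope == "all" else [scope]
        plan = []
        for s in scopes:
            tests = TESTS_BY_SCOPE[s]
            # Smoke depth keeps only the first (critical-path) test per scope.
            plan.extend(tests[:1] if depth == "smoke" else tests)
        return plan

    # e.g. build_test_plan("all", "smoke") runs one critical test per scope.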

Phase 2: Command Testing

  1. Test /infinite-meta Command

    ## Test: infinite-meta Basic Functionality
    
    **Setup:**
    - Use test spec: specs/example_spec.md
    - Generate 3 iterations
    - Output to: test_output/meta_test_{{timestamp}}/
    
    **Expected Behavior:**
    - Reads spec correctly
    - Generates 3 unique outputs
    - Creates improvement logs
    - No errors
    
    **Validation:**
    - [ ] All 3 files created
    - [ ] Files follow spec requirements
    - [ ] Improvement log exists
    - [ ] Files are unique (similarity < 70%)
    - [ ] Quality score ≥ 7.0
    
    **Result:** [PASS/FAIL]
    **Notes:** [observations]
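
    The uniqueness gate can be checked mechanically. A minimal sketch using Python's standard difflib ratio as a stand-in similarity measure (the real system may score similarity differently):

    import itertools
    from difflib import SequenceMatcher
    from pathlib import Path

    def all_unique(output_dir: str, threshold: float = 0.70) -> bool:
        """True if no pair of generated files reaches the similarity threshold."""
        texts = [p.read_text() for p in sorted(Path(output_dir).glob("*.md"))]
        for a, b in itertools.combinations(texts, 2):
            if SequenceMatcher(None, a, b).ratio() >= threshold:
                return False
        return True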
    
  2. Test /improve-self Command

    ## Test: improve-self Analysis
    
    **Setup:**
    - Run: /improve-self all quick
    
    **Expected Behavior:**
    - Analyzes current system state
    - Generates improvement proposals (3-10)
    - Creates actionable recommendations
    - Applies meta-prompting principles
    
    **Validation:**
    - [ ] Analysis completes without errors
    - [ ] Proposals are concrete and actionable
    - [ ] Risk assessments included
    - [ ] Meta-prompting principles evident
    - [ ] Output format is correct
    
    **Result:** [PASS/FAIL]
    **Notes:** [observations]
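
    If a machine check of the proposal count is wanted, it could look like this sketch, which assumes proposals appear under "## Proposal" headings (an assumed convention; adjust to the real output format):

    import re

    # Assumed convention: each proposal starts with a "## Proposal" heading.
    def count_proposals(report: str) -> int:
        return len(re.findall(r"^##\s*Proposal", report, flags=re.MULTILINE))

    def proposals_in_range(report: str) -> bool:
        """The command is expected to produce between 3 and 10 proposals."""
        return 3 <= count_proposals(report) <= 10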
    
  3. Test /generate-spec Command

    ## Test: generate-spec Creation
    
    **Setup:**
    - Run: /generate-spec patterns novel test_spec
    
    **Expected Behavior:**
    - Analyzes patterns successfully
    - Generates valid specification
    - Creates companion files
    - Spec is actually usable
    
    **Validation:**
    - [ ] Spec file created with correct structure
    - [ ] Guide file generated
    - [ ] Examples file generated
    - [ ] Spec follows meta-prompting principles
    - [ ] Validation test passes
    
    **Bonus Validation:**
    - [ ] Use generated spec with /infinite-meta
    - [ ] Verify it produces valid output
    
    **Result:** [PASS/FAIL]
    **Notes:** [observations]
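
    Structural validation of the generated spec might be as simple as a section scan; the required section names here are assumptions standing in for the actual spec template:

    from pathlib import Path

    # Assumed required sections; adjust to the actual spec template.
    REQUIRED_SECTIONS = ["Purpose", "Output Requirements", "Quality Criteria"]

    def spec_is_valid(spec_path: str) -> tuple:
        """Return (ok, missing): whether every required section heading appears."""
        text = Path(spec_path).read_text()
        missing = [s for s in REQUIRED_SECTIONS if s not in text]
        return (not missing, missing)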
    
  4. Test /evolve-strategy Command

    ## Test: evolve-strategy Evolution
    
    **Setup:**
    - Run: /evolve-strategy quality incremental
    
    **Expected Behavior:**
    - Analyzes current strategy
    - Generates evolution proposals
    - Creates implementation plan
    - Includes rollback procedures
    
    **Validation:**
    - [ ] Strategy analysis is thorough
    - [ ] Evolution proposals are concrete
    - [ ] Risk assessment included
    - [ ] Validation plan exists
    - [ ] Meta-level insights present
    
    **Result:** [PASS/FAIL]
    **Notes:** [observations]
    
  5. Test /self-test Command (meta!)

    ## Test: self-test Self-Testing
    
    **Setup:**
    - This test is self-referential!
    
    **Expected Behavior:**
    - Can test itself
    - Detects if testing framework is broken
    - Meta-awareness demonstrated
    
    **Validation:**
    - [ ] This test runs successfully
    - [ ] Can detect test failures
    - [ ] Generates valid test reports
    - [ ] Shows meta-level capability
    
    **Result:** [PASS/FAIL]
    **Notes:** [This is meta-testing - testing the tester]
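
    A sketch of the failure-detection half of this meta-test: feed the runner one test that must fail and one that must pass, and confirm it labels both correctly (the runner interface here is hypothetical):

    def run_test(check) -> str:
        """Minimal runner: a test is any zero-argument callable returning bool."""
        try:
            return "PASS" if check() else "FAIL"
        except Exception:
            return "FAIL"

    def framework_detects_failures() -> bool:
        """If a known-bad test does not come back FAIL, the runner is broken."""
        return run_test(lambda: False) == "FAIL" and run_test(lambda: True) == "PASS"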
    
  6. Test /self-document Command

    ## Test: self-document Documentation
    
    **Setup:**
    - Run: /self-document all standard
    
    **Expected Behavior:**
    - Analyzes system state
    - Generates/updates documentation
    - Reflects current capabilities
    - Clear and accurate
    
    **Validation:**
    - [ ] README.md updated/created
    - [ ] Documentation is accurate
    - [ ] Examples are current
    - [ ] Reflects latest improvements
    - [ ] Meta-capabilities documented
    
    **Result:** [PASS/FAIL]
    **Notes:** [observations]
    

Phase 3: Generation Testing

  1. Test Content Generation Quality

    ## Test: Generation Quality Baseline
    
    **Setup:**
    - Generate 5 iterations with specs/example_spec.md
    - Measure quality metrics
    
    **Expected Metrics:**
    - Quality avg: ≥ 7.5
    - Uniqueness: ≥ 80%
    - Spec compliance: 100%
    - Meta-awareness: ≥ 6.0
    
    **Validation:**
    - [ ] All files generated successfully
    - [ ] Quality meets threshold
    - [ ] No duplicates
    - [ ] Spec requirements met
    - [ ] Meta-reflection present
    
    **Result:** [PASS/FAIL]
    **Metrics:** [actual values]
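
    Once per-file scores are collected, the threshold check is mechanical. A sketch assuming scores arrive as a metric-to-values dict (spec compliance, being all-or-nothing, is checked separately):

    from statistics import mean

    # Thresholds taken from the expected metrics above.
    THRESHOLDS = {"quality": 7.5, "uniqueness": 0.80, "meta_awareness": 6.0}

    def check_baseline(scores: dict) -> dict:
        """True per metric when the mean of collected values meets its threshold."""
        return {m: mean(scores[m]) >= limit for m, limit in THRESHOLDS.items()}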
    
  2. Test Progressive Improvement

    ## Test: Wave-Based Improvement
    
    **Setup:**
    - Run 3 waves of 5 iterations each
    - Track quality across waves
    - Enable improvement_mode = "evolve"
    
    **Expected Behavior:**
    - Quality improves across waves
    - Diversity maintained
    - Strategy evolves
    - Meta-insights accumulated
    
    **Validation:**
    - [ ] Wave 2 quality > Wave 1 quality
    - [ ] Wave 3 quality > Wave 2 quality
    - [ ] Diversity maintained (≥ 80%)
    - [ ] Strategy evolution documented
    - [ ] Improvement logs created
    
    **Result:** [PASS/FAIL]
    **Metrics:** [quality progression]
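
    The wave progression checks reduce to comparing per-wave means; a minimal sketch, assuming each wave's quality scores are collected in run order:

    from statistics import mean

    def waves_improve(wave_scores: list) -> bool:
        """True if each wave's mean quality strictly exceeds the previous wave's."""
        means = [mean(wave) for wave in wave_scores]
        return all(later > earlier for earlier, later in zip(means, means[1:]))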
    

Phase 4: Improvement System Testing

  1. Test Self-Improvement Loop

    ## Test: Complete Self-Improvement Cycle
    
    **Setup:**
    1. Run baseline generation (5 iterations)
    2. Run /improve-self to analyze
    3. Run /evolve-strategy to evolve
    4. Run improved generation (5 iterations)
    5. Compare results
    
    **Expected Behavior:**
    - Improvements detected
    - Strategy evolved
    - Second generation better than first
    - Changes are traceable
    
    **Validation:**
    - [ ] Improvement analysis complete
    - [ ] Strategy evolution applied
    - [ ] Measurable improvement in metrics
    - [ ] All changes logged
    - [ ] Rollback possible
    
    **Result:** [PASS/FAIL]
    **Improvement:** [% increase in quality]
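
    The reported improvement figure is a simple relative change between the baseline and post-evolution quality averages:

    def improvement_pct(baseline_quality: float, improved_quality: float) -> float:
        """Relative quality change in percent; positive means improvement."""
        return (improved_quality - baseline_quality) / baseline_quality * 100

    # e.g. improvement_pct(7.2, 8.1) ~= +12.5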
    
  2. Test Meta-Prompting Application

    ## Test: Meta-Prompting Principles
    
    **Setup:**
    - Review all generated content
    - Analyze for meta-prompting principles
    
    **Expected Behavior:**
    - Structure-oriented approaches evident
    - Abstract frameworks used
    - Minimal example dependency
    - Efficient reasoning patterns
    
    **Validation:**
    - [ ] Commands use structural patterns
    - [ ] Specs define frameworks, not examples
    - [ ] Reasoning is principle-based
    - [ ] Meta-awareness present throughout
    - [ ] Self-improvement capability demonstrated
    
    **Result:** [PASS/FAIL]
    **Notes:** [specific examples]
    

Phase 5: Integration Testing

  1. Test Command Integration

    ## Test: Multi-Command Workflow
    
    **Setup:**
    1. /generate-spec patterns novel workflow_test
    2. /infinite-meta workflow_test.md output/ 5 evolve
    3. /improve-self all standard
    4. /evolve-strategy quality incremental
    5. /infinite-meta workflow_test.md output/ 5 evolve
    6. /self-document all standard
    
    **Expected Behavior:**
    - All commands work together
    - Data flows between commands
    - Improvements compound
    - System remains stable
    
    **Validation:**
    - [ ] All steps complete successfully
    - [ ] Generated spec is usable
    - [ ] Improvements detected and applied
    - [ ] Second generation better than first
    - [ ] Documentation reflects changes
    - [ ] No data corruption or errors
    
    **Result:** [PASS/FAIL]
    **Notes:** [integration observations]
    
  2. Test Regression Detection

    ## Test: No Regressions from Improvements
    
    **Setup:**
    - Load baseline metrics from improvement_log/baseline.json
    - Run current system with same test cases
    - Compare results
    
    **Expected Behavior:**
    - No metric should regress >10%
    - Most metrics should improve
    - Quality maintains or increases
    
    **Validation:**
    - [ ] Quality: No regression
    - [ ] Efficiency: No regression
    - [ ] Diversity: No regression
    - [ ] Meta-awareness: No regression
    - [ ] Overall: Net improvement
    
    **Result:** [PASS/FAIL]
    **Regressions:** [list any]
    **Improvements:** [list all]
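
    A sketch of the baseline comparison, assuming improvement_log/baseline.json stores a flat metric-to-value mapping (the schema is an assumption):

    import json
    from pathlib import Path

    def find_regressions(baseline_path: str, current: dict,
                         tolerance: float = 0.10) -> dict:
        """Return metrics whose relative drop from baseline exceeds `tolerance`."""
        baseline = json.loads(Path(baseline_path).read_text())
        regressions = {}
        for metric, old in baseline.items():
            if metric in current and old:
                change = (current[metric] - old) / old
                if change < -tolerance:
                    regressions[metric] = change
        return regressions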
    

Phase 6: Reporting

  1. Generate Test Report

    Create improvement_log/test_report_{{timestamp}}.md:

    # Self-Test Report - {{timestamp}}
    
    **Scope:** {{scope}}
    **Depth:** {{depth}}
    **Duration:** {{duration}} minutes
    
    ## Executive Summary
    - Total Tests: {{total}}
    - Passed: {{passed}} ({{pass_rate}}%)
    - Failed: {{failed}}
    - Warnings: {{warnings}}
    
    ## Overall Status: [PASS/FAIL/WARNING]
    
    ## Test Results by Category
    
    ### Command Tests ({{n}} tests)
    | Command | Status | Notes |
    |---------|--------|-------|
    | /infinite-meta | PASS/FAIL | ... |
    | /improve-self | PASS/FAIL | ... |
    | /generate-spec | PASS/FAIL | ... |
    | /evolve-strategy | PASS/FAIL | ... |
    | /self-test | PASS/FAIL | ... |
    | /self-document | PASS/FAIL | ... |
    
    ### Generation Tests ({{n}} tests)
    | Test | Status | Metrics |
    |------|--------|---------|
    | Quality Baseline | PASS/FAIL | 7.8 avg |
    | Progressive Improvement | PASS/FAIL | +12% |
    
    ### Improvement Tests ({{n}} tests)
    | Test | Status | Impact |
    |------|--------|--------|
    | Self-Improvement Loop | PASS/FAIL | +15% quality |
    | Meta-Prompting | PASS/FAIL | Evident |
    
    ### Integration Tests ({{n}} tests)
    | Test | Status | Notes |
    |------|--------|-------|
    | Multi-Command Workflow | PASS/FAIL | ... |
    | Regression Detection | PASS/FAIL | ... |
    
    ## Failed Tests (if any)
    [Detailed analysis of failures]
    
    ## Performance Metrics
    - Avg generation time: {{time}}s
    - Context usage: {{context}}%
    - Quality score: {{quality}}
    - Improvement rate: {{rate}}%
    
    ## Recommendations
    [What should be fixed or improved based on test results]
    
    ## Meta-Insights
    [What testing reveals about system capabilities and improvement areas]
    
    ## Next Steps
    1. [Action item 1]
    2. [Action item 2]
    
  2. Update Health Dashboard

    Create/update improvement_log/system_health.json:

    {
      "last_test": "{{timestamp}}",
      "overall_status": "healthy|degraded|critical",
      "test_pass_rate": 0.95,
      "quality_trend": "improving|stable|declining",
      "metrics": {
        "quality_avg": 8.2,
        "efficiency": 0.82,
        "diversity": 0.88,
        "meta_awareness": 0.75
      },
      "issues": [
        "Minor: Edge case in /generate-spec with empty patterns"
      ],
      "recent_improvements": [
        "Quality +12% since last test",
        "Efficiency +8% since last test"
      ]
    }
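
    Deriving the dashboard from test results can be mechanical; in this sketch the 0.9/0.7 status cutoffs and the output path handling are illustrative choices:

    import json
    from datetime import datetime, timezone
    from pathlib import Path

    def update_health(pass_rate: float, metrics: dict, issues: list) -> None:
        """Write system_health.json with a status derived from the pass rate."""
        status = ("healthy" if pass_rate >= 0.9
                  else "degraded" if pass_rate >= 0.7
                  else "critical")
        payload = {
            "last_test": datetime.now(timezone.utc).isoformat(),
            "overall_status": status,
            "test_pass_rate": pass_rate,
            "metrics": metrics,
            "issues": issues,
            # quality_trend / recent_improvements would be derived from
            # historical reports; omitted here for brevity.
        }
        out = Path("improvement_log")
        out.mkdir(exist_ok=True)
        (out / "system_health.json").write_text(json.dumps(payload, indent=2))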
    

Meta-Testing Pattern

This command tests itself using meta-prompting:

STRUCTURE: Self-testing system
ABSTRACTION: Testing validates structure and patterns, not just functionality

REASONING:
1. Define test structure (framework)
2. Execute tests (pattern validation)
3. Analyze results (meta-evaluation)
4. Report findings (actionable insights)

SELF_REFLECTION:
- Do my tests validate structure or just surface functionality?
- Are test patterns generalizable to new capabilities?
- Can I detect meta-level issues?
- Does testing improve the system?

META_CAPABILITY:
Testing should improve testability.
Each test run should enhance test coverage.
Failures should generate better tests.

Success Criteria

A successful self-test:

  1. Validates all critical functionality
  2. Detects regressions if present
  3. Measures key metrics accurately
  4. Provides actionable recommendations
  5. Shows meta-awareness (can test itself)
  6. Generates clear, useful reports
  7. Enables continuous improvement

This command validates system capabilities using meta-prompting principles. It tests structure and patterns, not just functionality, enabling generalizable quality assurance and continuous improvement.