# Self Test - System Validation and Testing

**Purpose:** Validate system capabilities, test recent improvements, and ensure everything works as expected.
## Usage

```
/self-test [scope] [depth]
```
## Parameters

- `scope`: What to test - "commands", "generation", "improvement", "integration", "all" (default: "all")
- `depth`: Test thoroughness - "smoke", "standard", "comprehensive" (default: "standard")
## Examples

```
# Quick smoke test of all systems
/self-test all smoke

# Standard test of generation capabilities
/self-test generation standard

# Comprehensive test of improvement system
/self-test improvement comprehensive

# Test command integration
/self-test integration standard
```
## Command Implementation

You are the Self-Testing System. Your role is to validate that all components work correctly and that recent improvements have not broken existing functionality.
### Phase 1: Test Planning

**Determine Test Scope**

Based on {{scope}} and {{depth}}:

Smoke Testing (quick validation):
- Basic functionality checks
- Critical path verification
- 5-10 minute execution
- Pass/fail only
Standard Testing (thorough validation):
- Full functionality coverage
- Quality metrics measurement
- 15-30 minute execution
- Detailed results
Comprehensive Testing (complete validation):
- All edge cases
- Performance benchmarking
- Integration testing
- Regression detection
- 45-90 minute execution
- Full diagnostics
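As a rough illustration, the scope and depth parameters can be thought of as selecting a set of test groups and a time budget. The sketch below encodes the figures listed above; the group names and the `build_test_plan` helper are hypothetical, not part of the command itself.

```python
# Hypothetical sketch: map {{scope}} and {{depth}} onto a concrete test plan.
SCOPES = {
    "commands": ["command_tests"],
    "generation": ["generation_tests"],
    "improvement": ["improvement_tests"],
    "integration": ["integration_tests"],
    "all": ["command_tests", "generation_tests", "improvement_tests", "integration_tests"],
}

DEPTHS = {
    "smoke": {"minutes": (5, 10), "reporting": "pass/fail only"},
    "standard": {"minutes": (15, 30), "reporting": "detailed results"},
    "comprehensive": {"minutes": (45, 90), "reporting": "full diagnostics"},
}

def build_test_plan(scope: str = "all", depth: str = "standard") -> dict:
    """Select test groups and a time budget for the given scope/depth pair."""
    return {
        "groups": SCOPES[scope],
        "budget_minutes": DEPTHS[depth]["minutes"],
        "reporting": DEPTHS[depth]["reporting"],
    }

print(build_test_plan("generation", "standard"))
```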
**Load Test Definitions**

Read test specifications from specs/test_definitions.md (auto-generated if missing).
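A minimal sketch of this load step, assuming specs/test_definitions.md sits relative to the working directory; the stub written on the auto-generation path is illustrative only.

```python
from pathlib import Path

DEFINITIONS = Path("specs/test_definitions.md")

def load_test_definitions() -> str:
    """Read test specifications, writing a stub file first if none exists."""
    if not DEFINITIONS.exists():
        DEFINITIONS.parent.mkdir(parents=True, exist_ok=True)
        # Illustrative stub; a real auto-generated file would enumerate the
        # command, generation, improvement, and integration tests defined below.
        DEFINITIONS.write_text("# Test Definitions\n\n(auto-generated stub)\n")
    return DEFINITIONS.read_text()
```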
### Phase 2: Command Testing

**Test /infinite-meta Command**

```markdown
## Test: infinite-meta Basic Functionality

**Setup:**
- Use test spec: specs/example_spec.md
- Generate 3 iterations
- Output to: test_output/meta_test_{{timestamp}}/

**Expected Behavior:**
- Reads spec correctly
- Generates 3 unique outputs
- Creates improvement logs
- No errors

**Validation:**
- [ ] All 3 files created
- [ ] Files follow spec requirements
- [ ] Improvement log exists
- [ ] Files are unique (similarity < 70%)
- [ ] Quality score ≥ 7.0

**Result:** [PASS/FAIL]
**Notes:** [observations]
```
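The "files are unique (similarity < 70%)" check can be approximated with a pairwise text comparison. The sketch below uses difflib's ratio purely as a stand-in metric; the command does not prescribe how similarity is measured, and the directory name in the usage comment is hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations
from pathlib import Path

def max_pairwise_similarity(output_dir: str) -> float:
    """Highest similarity ratio between any two generated markdown files."""
    texts = [p.read_text() for p in sorted(Path(output_dir).glob("*.md"))]
    if len(texts) < 2:
        return 0.0
    return max(SequenceMatcher(None, a, b).ratio() for a, b in combinations(texts, 2))

# Example gate (path is hypothetical):
# assert max_pairwise_similarity("test_output/meta_test_20250101/") < 0.70
```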
**Test /improve-self Command**

```markdown
## Test: improve-self Analysis

**Setup:**
- Run: /improve-self all quick

**Expected Behavior:**
- Analyzes current system state
- Generates improvement proposals (3-10)
- Creates actionable recommendations
- Applies meta-prompting principles

**Validation:**
- [ ] Analysis completes without errors
- [ ] Proposals are concrete and actionable
- [ ] Risk assessments included
- [ ] Meta-prompting principles evident
- [ ] Output format is correct

**Result:** [PASS/FAIL]
**Notes:** [observations]
```
**Test /generate-spec Command**

```markdown
## Test: generate-spec Creation

**Setup:**
- Run: /generate-spec patterns novel test_spec

**Expected Behavior:**
- Analyzes patterns successfully
- Generates valid specification
- Creates companion files
- Spec is actually usable

**Validation:**
- [ ] Spec file created with correct structure
- [ ] Guide file generated
- [ ] Examples file generated
- [ ] Spec follows meta-prompting principles
- [ ] Validation test passes

**Bonus Validation:**
- [ ] Use generated spec with /infinite-meta
- [ ] Verify it produces valid output

**Result:** [PASS/FAIL]
**Notes:** [observations]
```
**Test /evolve-strategy Command**

```markdown
## Test: evolve-strategy Evolution

**Setup:**
- Run: /evolve-strategy quality incremental

**Expected Behavior:**
- Analyzes current strategy
- Generates evolution proposals
- Creates implementation plan
- Includes rollback procedures

**Validation:**
- [ ] Strategy analysis is thorough
- [ ] Evolution proposals are concrete
- [ ] Risk assessment included
- [ ] Validation plan exists
- [ ] Meta-level insights present

**Result:** [PASS/FAIL]
**Notes:** [observations]
```
**Test /self-test Command (meta!)**

```markdown
## Test: self-test Self-Testing

**Setup:**
- This test is self-referential!

**Expected Behavior:**
- Can test itself
- Detects if testing framework is broken
- Meta-awareness demonstrated

**Validation:**
- [ ] This test runs successfully
- [ ] Can detect test failures
- [ ] Generates valid test reports
- [ ] Shows meta-level capability

**Result:** [PASS/FAIL]
**Notes:** [This is meta-testing - testing the tester]
```
**Test /self-document Command**

```markdown
## Test: self-document Documentation

**Setup:**
- Run: /self-document all standard

**Expected Behavior:**
- Analyzes system state
- Generates/updates documentation
- Reflects current capabilities
- Clear and accurate

**Validation:**
- [ ] README.md updated/created
- [ ] Documentation is accurate
- [ ] Examples are current
- [ ] Reflects latest improvements
- [ ] Meta-capabilities documented

**Result:** [PASS/FAIL]
**Notes:** [observations]
```
### Phase 3: Generation Testing

**Test Content Generation Quality**

```markdown
## Test: Generation Quality Baseline

**Setup:**
- Generate 5 iterations with specs/example_spec.md
- Measure quality metrics

**Expected Metrics:**
- Quality avg: ≥ 7.5
- Uniqueness: ≥ 80%
- Spec compliance: 100%
- Meta-awareness: ≥ 6.0

**Validation:**
- [ ] All files generated successfully
- [ ] Quality meets threshold
- [ ] No duplicates
- [ ] Spec requirements met
- [ ] Meta-reflection present

**Result:** [PASS/FAIL]
**Metrics:** [actual values]
```
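The expected metrics above reduce to a simple threshold gate; a hedged sketch, where the measured values shown are illustrative placeholders rather than recorded results:

```python
THRESHOLDS = {
    "quality_avg": 7.5,      # minimum average quality score
    "uniqueness": 0.80,      # minimum uniqueness across iterations
    "spec_compliance": 1.0,  # spec compliance must be 100%
    "meta_awareness": 6.0,   # minimum meta-awareness score
}

def failing_metrics(measured: dict) -> list[str]:
    """Names of any metrics that fall below their thresholds."""
    return [name for name, floor in THRESHOLDS.items() if measured.get(name, 0.0) < floor]

# Illustrative measurements; real values come from the generation run.
result = failing_metrics(
    {"quality_avg": 7.8, "uniqueness": 0.86, "spec_compliance": 1.0, "meta_awareness": 6.4}
)
print("PASS" if not result else f"FAIL: {result}")
```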
**Test Progressive Improvement**

```markdown
## Test: Wave-Based Improvement

**Setup:**
- Run 3 waves of 5 iterations each
- Track quality across waves
- Enable improvement_mode = "evolve"

**Expected Behavior:**
- Quality improves across waves
- Diversity maintained
- Strategy evolves
- Meta-insights accumulated

**Validation:**
- [ ] Wave 2 quality > Wave 1 quality
- [ ] Wave 3 quality > Wave 2 quality
- [ ] Diversity maintained (≥ 80%)
- [ ] Strategy evolution documented
- [ ] Improvement logs created

**Result:** [PASS/FAIL]
**Metrics:** [quality progression]
```
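The wave-over-wave criterion is effectively a monotonicity check on quality plus a floor on diversity; a small sketch with illustrative numbers:

```python
def waves_improve(wave_quality: list[float], wave_diversity: list[float]) -> bool:
    """True if quality strictly rises each wave and diversity stays at or above 80%."""
    quality_rising = all(b > a for a, b in zip(wave_quality, wave_quality[1:]))
    diversity_held = all(d >= 0.80 for d in wave_diversity)
    return quality_rising and diversity_held

# Illustrative values; real figures come from the wave improvement logs.
print(waves_improve([7.2, 7.6, 8.1], [0.88, 0.85, 0.83]))
```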
### Phase 4: Improvement System Testing

**Test Self-Improvement Loop**

```markdown
## Test: Complete Self-Improvement Cycle

**Setup:**
1. Run baseline generation (5 iterations)
2. Run /improve-self to analyze
3. Run /evolve-strategy to evolve
4. Run improved generation (5 iterations)
5. Compare results

**Expected Behavior:**
- Improvements detected
- Strategy evolved
- Second generation better than first
- Changes are traceable

**Validation:**
- [ ] Improvement analysis complete
- [ ] Strategy evolution applied
- [ ] Measurable improvement in metrics
- [ ] All changes logged
- [ ] Rollback possible

**Result:** [PASS/FAIL]
**Improvement:** [% increase in quality]
```
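The "% increase in quality" figure follows directly from the two runs; a minimal sketch using illustrative averages:

```python
def improvement_pct(baseline_avg: float, improved_avg: float) -> float:
    """Percentage change from the baseline run to the post-improvement run."""
    return (improved_avg - baseline_avg) / baseline_avg * 100.0

# Illustrative averages over 5 iterations each; real values come from the quality logs.
print(f"Improvement: {improvement_pct(7.4, 8.2):+.1f}%")
```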
**Test Meta-Prompting Application**

```markdown
## Test: Meta-Prompting Principles

**Setup:**
- Review all generated content
- Analyze for meta-prompting principles

**Expected Behavior:**
- Structure-oriented approaches evident
- Abstract frameworks used
- Minimal example dependency
- Efficient reasoning patterns

**Validation:**
- [ ] Commands use structural patterns
- [ ] Specs define frameworks, not examples
- [ ] Reasoning is principle-based
- [ ] Meta-awareness present throughout
- [ ] Self-improvement capability demonstrated

**Result:** [PASS/FAIL]
**Notes:** [specific examples]
```
### Phase 5: Integration Testing

**Test Command Integration**

```markdown
## Test: Multi-Command Workflow

**Setup:**
1. /generate-spec patterns novel workflow_test
2. /infinite-meta workflow_test.md output/ 5 evolve
3. /improve-self all standard
4. /evolve-strategy quality incremental
5. /infinite-meta workflow_test.md output/ 5 evolve
6. /self-document all standard

**Expected Behavior:**
- All commands work together
- Data flows between commands
- Improvements compound
- System remains stable

**Validation:**
- [ ] All steps complete successfully
- [ ] Generated spec is usable
- [ ] Improvements detected and applied
- [ ] Second generation better than first
- [ ] Documentation reflects changes
- [ ] No data corruption or errors

**Result:** [PASS/FAIL]
**Notes:** [integration observations]
```
**Test Regression Detection**

```markdown
## Test: No Regressions from Improvements

**Setup:**
- Load baseline metrics from improvement_log/baseline.json
- Run current system with same test cases
- Compare results

**Expected Behavior:**
- No metric should regress >10%
- Most metrics should improve
- Quality maintains or increases

**Validation:**
- [ ] Quality: No regression
- [ ] Efficiency: No regression
- [ ] Diversity: No regression
- [ ] Meta-awareness: No regression
- [ ] Overall: Net improvement

**Result:** [PASS/FAIL]
**Regressions:** [list any]
**Improvements:** [list all]
```
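A sketch of the regression gate, assuming improvement_log/baseline.json stores a flat mapping of metric names to values (the exact schema is not specified here) and that higher values are better for every metric:

```python
import json
from pathlib import Path

REGRESSION_TOLERANCE = 0.10  # no metric may drop more than 10% below baseline

def find_regressions(current: dict, baseline_path: str = "improvement_log/baseline.json") -> dict:
    """Return {metric: (baseline, current)} for every metric that regressed beyond tolerance."""
    baseline = json.loads(Path(baseline_path).read_text())
    return {
        name: (base, current[name])
        for name, base in baseline.items()
        if name in current and current[name] < base * (1 - REGRESSION_TOLERANCE)
    }
```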
### Phase 6: Reporting

**Generate Test Report**

Create improvement_log/test_report_{{timestamp}}.md:

```markdown
# Self-Test Report - {{timestamp}}

**Scope:** {{scope}}
**Depth:** {{depth}}
**Duration:** {{duration}} minutes

## Executive Summary
- Total Tests: {{total}}
- Passed: {{passed}} ({{pass_rate}}%)
- Failed: {{failed}}
- Warnings: {{warnings}}

## Overall Status: [PASS/FAIL/WARNING]

## Test Results by Category

### Command Tests ({{n}} tests)
| Command | Status | Notes |
|---------|--------|-------|
| /infinite-meta | PASS/FAIL | ... |
| /improve-self | PASS/FAIL | ... |
| /generate-spec | PASS/FAIL | ... |
| /evolve-strategy | PASS/FAIL | ... |
| /self-test | PASS/FAIL | ... |
| /self-document | PASS/FAIL | ... |

### Generation Tests ({{n}} tests)
| Test | Status | Metrics |
|------|--------|---------|
| Quality Baseline | PASS/FAIL | 7.8 avg |
| Progressive Improvement | PASS/FAIL | +12% |

### Improvement Tests ({{n}} tests)
| Test | Status | Impact |
|------|--------|--------|
| Self-Improvement Loop | PASS/FAIL | +15% quality |
| Meta-Prompting | PASS/FAIL | Evident |

### Integration Tests ({{n}} tests)
| Test | Status | Notes |
|------|--------|-------|
| Multi-Command Workflow | PASS/FAIL | ... |
| Regression Detection | PASS/FAIL | ... |

## Failed Tests (if any)
[Detailed analysis of failures]

## Performance Metrics
- Avg generation time: {{time}}s
- Context usage: {{context}}%
- Quality score: {{quality}}
- Improvement rate: {{rate}}%

## Recommendations
[What should be fixed or improved based on test results]

## Meta-Insights
[What testing reveals about system capabilities and improvement areas]

## Next Steps
1. [Action item 1]
2. [Action item 2]
```
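The executive-summary counts and overall status follow mechanically from the per-test results; the helper below is a hypothetical sketch of that aggregation:

```python
def summarize(results: list[str]) -> dict:
    """Aggregate per-test PASS/FAIL/WARNING outcomes into executive-summary figures."""
    total = len(results)
    passed = results.count("PASS")
    failed = results.count("FAIL")
    warnings = results.count("WARNING")
    overall = "PASS" if failed == 0 and warnings == 0 else ("WARNING" if failed == 0 else "FAIL")
    return {
        "total": total,
        "passed": passed,
        "pass_rate": round(100 * passed / total, 1) if total else 0.0,
        "failed": failed,
        "warnings": warnings,
        "overall_status": overall,
    }

print(summarize(["PASS", "PASS", "WARNING", "PASS"]))
```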
**Update Health Dashboard**

Create/update improvement_log/system_health.json:

```json
{
  "last_test": "{{timestamp}}",
  "overall_status": "healthy|degraded|critical",
  "test_pass_rate": 0.95,
  "quality_trend": "improving|stable|declining",
  "metrics": {
    "quality_avg": 8.2,
    "efficiency": 0.82,
    "diversity": 0.88,
    "meta_awareness": 0.75
  },
  "issues": [
    "Minor: Edge case in /generate-spec with empty patterns"
  ],
  "recent_improvements": [
    "Quality +12% since last test",
    "Efficiency +8% since last test"
  ]
}
```
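The rule for deriving overall_status is not spelled out above; one plausible mapping from pass rate and quality trend, purely as a sketch (the issues and recent_improvements fields would be appended in the same way):

```python
import json
from pathlib import Path

def write_health(pass_rate: float, quality_trend: str, metrics: dict, timestamp: str) -> None:
    """Write improvement_log/system_health.json using an illustrative status rule."""
    if pass_rate >= 0.90 and quality_trend != "declining":
        status = "healthy"
    elif pass_rate >= 0.70:
        status = "degraded"
    else:
        status = "critical"
    payload = {
        "last_test": timestamp,
        "overall_status": status,
        "test_pass_rate": pass_rate,
        "quality_trend": quality_trend,
        "metrics": metrics,
    }
    out = Path("improvement_log/system_health.json")
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(payload, indent=2))
```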
## Meta-Testing Pattern

This command tests itself using meta-prompting:

```
STRUCTURE: Self-testing system

ABSTRACTION: Testing validates structure and patterns, not just functionality

REASONING:
1. Define test structure (framework)
2. Execute tests (pattern validation)
3. Analyze results (meta-evaluation)
4. Report findings (actionable insights)

SELF_REFLECTION:
- Do my tests validate structure or just surface functionality?
- Are test patterns generalizable to new capabilities?
- Can I detect meta-level issues?
- Does testing improve the system?

META_CAPABILITY:
Testing should improve testability.
Each test run should enhance test coverage.
Failures should generate better tests.
```
## Success Criteria
A successful self-test:
- Validates all critical functionality
- Detects regressions if present
- Measures key metrics accurately
- Provides actionable recommendations
- Shows meta-awareness (can test itself)
- Generates clear, useful reports
- Enables continuous improvement
This command validates system capabilities using meta-prompting principles. It tests structure and patterns, not just functionality, enabling generalizable quality assurance and continuous improvement.