
Self Test - System Validation and Testing

Purpose: Validate system capabilities, test recent improvements, and ensure everything works as expected.

Usage

/self-test [scope] [depth]

Parameters

  • scope: What to test - "commands", "generation", "improvement", "integration", "all" (default: "all")
  • depth: Test thoroughness - "smoke", "standard", "comprehensive" (default: "standard")

Examples

# Quick smoke test of all systems
/self-test all smoke

# Standard test of generation capabilities
/self-test generation standard

# Comprehensive test of improvement system
/self-test improvement comprehensive

# Test command integration
/self-test integration standard

Command Implementation

You are the Self-Testing System. Your role is to validate that all components work correctly and that recent improvements have not broken existing functionality.

Phase 1: Test Planning

  1. Determine Test Scope

    Based on {{scope}} and {{depth}} (a test-selection sketch follows at the end of this phase):

    Smoke Testing (quick validation):

    • Basic functionality checks
    • Critical path verification
    • 5-10 minute execution
    • Pass/fail only

    Standard Testing (thorough validation):

    • Full functionality coverage
    • Quality metrics measurement
    • 15-30 minute execution
    • Detailed results

    Comprehensive Testing (complete validation):

    • All edge cases
    • Performance benchmarking
    • Integration testing
    • Regression detection
    • 45-90 minute execution
    • Full diagnostics
  2. Load Test Definitions

    Read test specifications from specs/test_definitions.md (auto-generated if missing).
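
    One way the selection could work is sketched below; the test IDs and the "first test per scope" smoke rule are illustrative assumptions, not a fixed contract of this command:

    # Hypothetical scope -> test mapping; IDs are illustrative, not a contract.
    TESTS_BY_SCOPE = {
        "commands": ["infinite-meta", "improve-self", "generate-spec",
                     "evolve-strategy", "self-test", "self-document"],
        "generation": ["quality-baseline", "wave-improvement"],
        "improvement": ["self-improvement-loop", "meta-prompting"],
        "integration": ["multi-command-workflow", "regression-detection"],
    }

    def build_test_plan(scope: str = "all", depth: str = "standard") -> list:
        """Resolve scope/depth into an ordered list of test IDs to execute."""
        scopes = list(TESTS_BY_SCOPE) if scope == "all" else [scope]
        plan = []
        for s in scopes:
            tests = TESTS_BY_SCOPE[s]
            # Smoke depth keeps only the first (critical-path) test per scope.
            plan.extend(tests[:1] if depth == "smoke" else tests)
        return plan

    # e.g. build_test_plan("all", "smoke") runs one critical test per scope.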

Phase 2: Command Testing

  1. Test /infinite-meta Command

    ## Test: infinite-meta Basic Functionality
    
    **Setup:**
    - Use test spec: specs/example_spec.md
    - Generate 3 iterations
    - Output to: test_output/meta_test_{{timestamp}}/
    
    **Expected Behavior:**
    - Reads spec correctly
    - Generates 3 unique outputs
    - Creates improvement logs
    - No errors
    
    **Validation:**
    - [ ] All 3 files created
    - [ ] Files follow spec requirements
    - [ ] Improvement log exists
    - [ ] Files are unique (similarity < 70%)
    - [ ] Quality score ≥ 7.0
    
    **Result:** [PASS/FAIL]
    **Notes:** [observations]
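
    The uniqueness gate can be checked mechanically. A minimal sketch using Python's standard difflib ratio as a stand-in similarity measure (the real system may score similarity differently):

    import itertools
    from difflib import SequenceMatcher
    from pathlib import Path

    def all_unique(output_dir: str, threshold: float = 0.70) -> bool:
        """True if no pair of generated files reaches the similarity threshold."""
        texts = [p.read_text() for p in sorted(Path(output_dir).glob("*.md"))]
        for a, b in itertools.combinations(texts, 2):
            if SequenceMatcher(None, a, b).ratio() >= threshold:
                return False
        return True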
    
  2. Test /improve-self Command

    ## Test: improve-self Analysis
    
    **Setup:**
    - Run: /improve-self all quick
    
    **Expected Behavior:**
    - Analyzes current system state
    - Generates improvement proposals (3-10)
    - Creates actionable recommendations
    - Applies meta-prompting principles
    
    **Validation:**
    - [ ] Analysis completes without errors
    - [ ] Proposals are concrete and actionable
    - [ ] Risk assessments included
    - [ ] Meta-prompting principles evident
    - [ ] Output format is correct
    
    **Result:** [PASS/FAIL]
    **Notes:** [observations]
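
    If a machine check of the proposal count is wanted, it could look like this sketch, which assumes proposals appear under "## Proposal" headings (an assumed convention; adjust to the real output format):

    import re

    # Assumed convention: each proposal starts with a "## Proposal" heading.
    def count_proposals(report: str) -> int:
        return len(re.findall(r"^##\s*Proposal", report, flags=re.MULTILINE))

    def proposals_in_range(report: str) -> bool:
        """The command is expected to produce between 3 and 10 proposals."""
        return 3 <= count_proposals(report) <= 10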
    
  3. Test /generate-spec Command

    ## Test: generate-spec Creation
    
    **Setup:**
    - Run: /generate-spec patterns novel test_spec
    
    **Expected Behavior:**
    - Analyzes patterns successfully
    - Generates valid specification
    - Creates companion files
    - Spec is actually usable
    
    **Validation:**
    - [ ] Spec file created with correct structure
    - [ ] Guide file generated
    - [ ] Examples file generated
    - [ ] Spec follows meta-prompting principles
    - [ ] Validation test passes
    
    **Bonus Validation:**
    - [ ] Use generated spec with /infinite-meta
    - [ ] Verify it produces valid output
    
    **Result:** [PASS/FAIL]
    **Notes:** [observations]
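
    Structural validation of the generated spec might be as simple as a section scan; the required section names here are assumptions standing in for the actual spec template:

    from pathlib import Path

    # Assumed required sections; adjust to the actual spec template.
    REQUIRED_SECTIONS = ["Purpose", "Output Requirements", "Quality Criteria"]

    def spec_is_valid(spec_path: str) -> tuple:
        """Return (ok, missing): whether every required section heading appears."""
        text = Path(spec_path).read_text()
        missing = [s for s in REQUIRED_SECTIONS if s not in text]
        return (not missing, missing)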
    
  4. Test /evolve-strategy Command

    ## Test: evolve-strategy Evolution
    
    **Setup:**
    - Run: /evolve-strategy quality incremental
    
    **Expected Behavior:**
    - Analyzes current strategy
    - Generates evolution proposals
    - Creates implementation plan
    - Includes rollback procedures
    
    **Validation:**
    - [ ] Strategy analysis is thorough
    - [ ] Evolution proposals are concrete
    - [ ] Risk assessment included
    - [ ] Validation plan exists
    - [ ] Meta-level insights present
    
    **Result:** [PASS/FAIL]
    **Notes:** [observations]
    
  5. Test /self-test Command (meta!)

    ## Test: self-test Self-Testing
    
    **Setup:**
    - This test is self-referential!
    
    **Expected Behavior:**
    - Can test itself
    - Detects if testing framework is broken
    - Meta-awareness demonstrated
    
    **Validation:**
    - [ ] This test runs successfully
    - [ ] Can detect test failures
    - [ ] Generates valid test reports
    - [ ] Shows meta-level capability
    
    **Result:** [PASS/FAIL]
    **Notes:** [This is meta-testing - testing the tester]
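
    A sketch of the failure-detection half of this meta-test: feed the runner one test that must fail and one that must pass, and confirm it labels both correctly (the runner interface here is hypothetical):

    def run_test(check) -> str:
        """Minimal runner: a test is any zero-argument callable returning bool."""
        try:
            return "PASS" if check() else "FAIL"
        except Exception:
            return "FAIL"

    def framework_detects_failures() -> bool:
        """If a known-bad test does not come back FAIL, the runner is broken."""
        return run_test(lambda: False) == "FAIL" and run_test(lambda: True) == "PASS"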
    
  6. Test /self-document Command

    ## Test: self-document Documentation
    
    **Setup:**
    - Run: /self-document all standard
    
    **Expected Behavior:**
    - Analyzes system state
    - Generates/updates documentation
    - Reflects current capabilities
    - Clear and accurate
    
    **Validation:**
    - [ ] README.md updated/created
    - [ ] Documentation is accurate
    - [ ] Examples are current
    - [ ] Reflects latest improvements
    - [ ] Meta-capabilities documented
    
    **Result:** [PASS/FAIL]
    **Notes:** [observations]
    

Phase 3: Generation Testing

  1. Test Content Generation Quality

    ## Test: Generation Quality Baseline
    
    **Setup:**
    - Generate 5 iterations with specs/example_spec.md
    - Measure quality metrics
    
    **Expected Metrics:**
    - Quality avg: ≥ 7.5
    - Uniqueness: ≥ 80%
    - Spec compliance: 100%
    - Meta-awareness: ≥ 6.0
    
    **Validation:**
    - [ ] All files generated successfully
    - [ ] Quality meets threshold
    - [ ] No duplicates
    - [ ] Spec requirements met
    - [ ] Meta-reflection present
    
    **Result:** [PASS/FAIL]
    **Metrics:** [actual values]
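
    Once per-file scores are collected, the threshold check is mechanical. A sketch assuming scores arrive as a metric-to-values dict (spec compliance, being all-or-nothing, is checked separately):

    from statistics import mean

    # Thresholds taken from the expected metrics above.
    THRESHOLDS = {"quality": 7.5, "uniqueness": 0.80, "meta_awareness": 6.0}

    def check_baseline(scores: dict) -> dict:
        """True per metric when the mean of collected values meets its threshold."""
        return {m: mean(scores[m]) >= limit for m, limit in THRESHOLDS.items()}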
    
  2. Test Progressive Improvement

    ## Test: Wave-Based Improvement
    
    **Setup:**
    - Run 3 waves of 5 iterations each
    - Track quality across waves
    - Enable improvement_mode = "evolve"
    
    **Expected Behavior:**
    - Quality improves across waves
    - Diversity maintained
    - Strategy evolves
    - Meta-insights accumulated
    
    **Validation:**
    - [ ] Wave 2 quality > Wave 1 quality
    - [ ] Wave 3 quality > Wave 2 quality
    - [ ] Diversity maintained (≥ 80%)
    - [ ] Strategy evolution documented
    - [ ] Improvement logs created
    
    **Result:** [PASS/FAIL]
    **Metrics:** [quality progression]
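
    The wave progression checks reduce to comparing per-wave means; a minimal sketch, assuming each wave's quality scores are collected in run order:

    from statistics import mean

    def waves_improve(wave_scores: list) -> bool:
        """True if each wave's mean quality strictly exceeds the previous wave's."""
        means = [mean(wave) for wave in wave_scores]
        return all(later > earlier for earlier, later in zip(means, means[1:]))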
    

Phase 4: Improvement System Testing

  1. Test Self-Improvement Loop

    ## Test: Complete Self-Improvement Cycle
    
    **Setup:**
    1. Run baseline generation (5 iterations)
    2. Run /improve-self to analyze
    3. Run /evolve-strategy to evolve
    4. Run improved generation (5 iterations)
    5. Compare results
    
    **Expected Behavior:**
    - Improvements detected
    - Strategy evolved
    - Second generation better than first
    - Changes are traceable
    
    **Validation:**
    - [ ] Improvement analysis complete
    - [ ] Strategy evolution applied
    - [ ] Measurable improvement in metrics
    - [ ] All changes logged
    - [ ] Rollback possible
    
    **Result:** [PASS/FAIL]
    **Improvement:** [% increase in quality]
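
    The reported improvement figure is a simple relative change between the baseline and post-evolution quality averages:

    def improvement_pct(baseline_quality: float, improved_quality: float) -> float:
        """Relative quality change in percent; positive means improvement."""
        return (improved_quality - baseline_quality) / baseline_quality * 100

    # e.g. improvement_pct(7.2, 8.1) ~= +12.5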
    
  2. Test Meta-Prompting Application

    ## Test: Meta-Prompting Principles
    
    **Setup:**
    - Review all generated content
    - Analyze for meta-prompting principles
    
    **Expected Behavior:**
    - Structure-oriented approaches evident
    - Abstract frameworks used
    - Minimal example dependency
    - Efficient reasoning patterns
    
    **Validation:**
    - [ ] Commands use structural patterns
    - [ ] Specs define frameworks, not examples
    - [ ] Reasoning is principle-based
    - [ ] Meta-awareness present throughout
    - [ ] Self-improvement capability demonstrated
    
    **Result:** [PASS/FAIL]
    **Notes:** [specific examples]
    

Phase 5: Integration Testing

  1. Test Command Integration

    ## Test: Multi-Command Workflow
    
    **Setup:**
    1. /generate-spec patterns novel workflow_test
    2. /infinite-meta workflow_test.md output/ 5 evolve
    3. /improve-self all standard
    4. /evolve-strategy quality incremental
    5. /infinite-meta workflow_test.md output/ 5 evolve
    6. /self-document all standard
    
    **Expected Behavior:**
    - All commands work together
    - Data flows between commands
    - Improvements compound
    - System remains stable
    
    **Validation:**
    - [ ] All steps complete successfully
    - [ ] Generated spec is usable
    - [ ] Improvements detected and applied
    - [ ] Second generation better than first
    - [ ] Documentation reflects changes
    - [ ] No data corruption or errors
    
    **Result:** [PASS/FAIL]
    **Notes:** [integration observations]
    
  2. Test Regression Detection

    ## Test: No Regressions from Improvements
    
    **Setup:**
    - Load baseline metrics from improvement_log/baseline.json
    - Run current system with same test cases
    - Compare results
    
    **Expected Behavior:**
    - No metric should regress >10%
    - Most metrics should improve
    - Quality maintains or increases
    
    **Validation:**
    - [ ] Quality: No regression
    - [ ] Efficiency: No regression
    - [ ] Diversity: No regression
    - [ ] Meta-awareness: No regression
    - [ ] Overall: Net improvement
    
    **Result:** [PASS/FAIL]
    **Regressions:** [list any]
    **Improvements:** [list all]
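
    A sketch of the baseline comparison, assuming improvement_log/baseline.json stores a flat metric-to-value mapping (the schema is an assumption):

    import json
    from pathlib import Path

    def find_regressions(baseline_path: str, current: dict,
                         tolerance: float = 0.10) -> dict:
        """Return metrics whose relative drop from baseline exceeds `tolerance`."""
        baseline = json.loads(Path(baseline_path).read_text())
        regressions = {}
        for metric, old in baseline.items():
            if metric in current and old:
                change = (current[metric] - old) / old
                if change < -tolerance:
                    regressions[metric] = change
        return regressions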
    

Phase 6: Reporting

  1. Generate Test Report

    Create improvement_log/test_report_{{timestamp}}.md:

    # Self-Test Report - {{timestamp}}
    
    **Scope:** {{scope}}
    **Depth:** {{depth}}
    **Duration:** {{duration}} minutes
    
    ## Executive Summary
    - Total Tests: {{total}}
    - Passed: {{passed}} ({{pass_rate}}%)
    - Failed: {{failed}}
    - Warnings: {{warnings}}
    
    ## Overall Status: [PASS/FAIL/WARNING]
    
    ## Test Results by Category
    
    ### Command Tests ({{n}} tests)
    | Command | Status | Notes |
    |---------|--------|-------|
    | /infinite-meta | PASS/FAIL | ... |
    | /improve-self | PASS/FAIL | ... |
    | /generate-spec | PASS/FAIL | ... |
    | /evolve-strategy | PASS/FAIL | ... |
    | /self-test | PASS/FAIL | ... |
    | /self-document | PASS/FAIL | ... |
    
    ### Generation Tests ({{n}} tests)
    | Test | Status | Metrics |
    |------|--------|---------|
    | Quality Baseline | PASS/FAIL | 7.8 avg |
    | Progressive Improvement | PASS/FAIL | +12% |
    
    ### Improvement Tests ({{n}} tests)
    | Test | Status | Impact |
    |------|--------|--------|
    | Self-Improvement Loop | PASS/FAIL | +15% quality |
    | Meta-Prompting | PASS/FAIL | Evident |
    
    ### Integration Tests ({{n}} tests)
    | Test | Status | Notes |
    |------|--------|-------|
    | Multi-Command Workflow | PASS/FAIL | ... |
    | Regression Detection | PASS/FAIL | ... |
    
    ## Failed Tests (if any)
    [Detailed analysis of failures]
    
    ## Performance Metrics
    - Avg generation time: {{time}}s
    - Context usage: {{context}}%
    - Quality score: {{quality}}
    - Improvement rate: {{rate}}%
    
    ## Recommendations
    [What should be fixed or improved based on test results]
    
    ## Meta-Insights
    [What testing reveals about system capabilities and improvement areas]
    
    ## Next Steps
    1. [Action item 1]
    2. [Action item 2]
    
  2. Update Health Dashboard

    Create/update improvement_log/system_health.json:

    {
      "last_test": "{{timestamp}}",
      "overall_status": "healthy|degraded|critical",
      "test_pass_rate": 0.95,
      "quality_trend": "improving|stable|declining",
      "metrics": {
        "quality_avg": 8.2,
        "efficiency": 0.82,
        "diversity": 0.88,
        "meta_awareness": 0.75
      },
      "issues": [
        "Minor: Edge case in /generate-spec with empty patterns"
      ],
      "recent_improvements": [
        "Quality +12% since last test",
        "Efficiency +8% since last test"
      ]
    }
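
    Deriving the dashboard from test results can be mechanical; in this sketch the 0.9/0.7 status cutoffs and the output path handling are illustrative choices:

    import json
    from datetime import datetime, timezone
    from pathlib import Path

    def update_health(pass_rate: float, metrics: dict, issues: list) -> None:
        """Write system_health.json with a status derived from the pass rate."""
        status = ("healthy" if pass_rate >= 0.9
                  else "degraded" if pass_rate >= 0.7
                  else "critical")
        payload = {
            "last_test": datetime.now(timezone.utc).isoformat(),
            "overall_status": status,
            "test_pass_rate": pass_rate,
            "metrics": metrics,
            "issues": issues,
            # quality_trend / recent_improvements would be derived from
            # historical reports; omitted here for brevity.
        }
        out = Path("improvement_log")
        out.mkdir(exist_ok=True)
        (out / "system_health.json").write_text(json.dumps(payload, indent=2))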
    

Meta-Testing Pattern

This command tests itself using meta-prompting:

STRUCTURE: Self-testing system
ABSTRACTION: Testing validates structure and patterns, not just functionality

REASONING:
1. Define test structure (framework)
2. Execute tests (pattern validation)
3. Analyze results (meta-evaluation)
4. Report findings (actionable insights)

SELF_REFLECTION:
- Do my tests validate structure or just surface functionality?
- Are test patterns generalizable to new capabilities?
- Can I detect meta-level issues?
- Does testing improve the system?

META_CAPABILITY:
Testing should improve testability.
Each test run should enhance test coverage.
Failures should generate better tests.

Success Criteria

A successful self-test:

  1. Validates all critical functionality
  2. Detects regressions if present
  3. Measures key metrics accurately
  4. Provides actionable recommendations
  5. Shows meta-awareness (can test itself)
  6. Generates clear, useful reports
  7. Enables continuous improvement

This command validates system capabilities using meta-prompting principles. It tests structure and patterns, not just functionality, enabling generalizable quality assurance and continuous improvement.