Quality Report Generation Command

Generate comprehensive quality reports with visualizations and strategic insights using ReAct reasoning.

Syntax

/quality-report <output_dir> [wave_number]

Parameters:

  • output_dir: Directory containing iterations and evaluations
  • wave_number: Optional - Generate report for specific wave (infinite mode)

Examples:

/quality-report output/
/quality-report output/ 3

Execution Process

THOUGHT Phase: Reasoning About Reporting

Before generating report, reason about:

  1. What is the purpose of this report?

    • Provide quality overview at a glance
    • Identify trends and patterns
    • Guide strategic decisions for next wave
    • Document quality evolution
  2. Who is the audience?

    • Primary: The orchestrator AI planning next wave
    • Secondary: Human users reviewing quality
    • Format should serve both audiences
  3. What insights matter most?

    • Overall quality trajectory
    • Dimension-specific patterns
    • Trade-offs and correlations
    • Actionable improvement opportunities
  4. How can I visualize quality effectively?

    • Text-based charts and distributions
    • Ranking tables
    • Trend indicators
    • Quality quadrant mappings

ACTION Phase: Generate Report

  1. Aggregate All Evaluation Data

    • Load all evaluations from {output_dir}/quality_reports/evaluations/
    • Load ranking data from {output_dir}/quality_reports/rankings/
    • Compile statistics across all iterations
    • Identify data completeness and gaps
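
    For illustration, a minimal loading sketch in Python; the per-file field names (`iteration`, `technical`, `creativity`, `compliance`, `composite`) are assumptions about the evaluation schema, not something this command defines:

    ```python
    import json
    from pathlib import Path

    def load_evaluations(output_dir: str) -> list[dict]:
        """Load every evaluation JSON under quality_reports/evaluations/."""
        eval_dir = Path(output_dir) / "quality_reports" / "evaluations"
        # Assumed schema: each file holds iteration, technical, creativity,
        # compliance, and composite fields.
        return [json.loads(p.read_text()) for p in sorted(eval_dir.glob("*.json"))]

    def missing_iterations(evaluations: list[dict], expected: int) -> list[int]:
        """Data-completeness check: iterations with no evaluation on disk."""
        seen = {e.get("iteration") for e in evaluations}
        return sorted(set(range(1, expected + 1)) - seen)
    ```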
  2. Calculate Comprehensive Statistics

    Overall Metrics:

    - Total iterations
    - Mean/median/mode for all dimensions
    - Standard deviations
    - Min/max/range
    - Quartile distributions
    - Coefficient of variation (CV = std/mean)
    

    Correlations:

    - Technical vs Creativity correlation
    - Creativity vs Compliance correlation
    - Technical vs Compliance correlation
    - Identify trade-off patterns
    

    Quality Progression:

    - Score trend over iteration sequence
    - Wave-over-wave improvement (infinite mode)
    - Improvement rate
    - Quality plateau detection
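
    The statistics themselves are standard; a self-contained sketch using only Python's standard library, assuming each dimension's scores are plain floats on a 0-100 scale (the window size and plateau threshold are illustrative choices):

    ```python
    import statistics

    def describe(scores: list[float]) -> dict:
        """Descriptive statistics for one dimension."""
        mean = statistics.mean(scores)
        std = statistics.stdev(scores) if len(scores) > 1 else 0.0
        return {
            "mean": mean,
            "median": statistics.median(scores),
            "std": std,
            "min": min(scores),
            "max": max(scores),
            "quartiles": statistics.quantiles(scores, n=4),  # Q1, Q2, Q3
            "cv": std / mean if mean else 0.0,  # coefficient of variation
        }

    def pearson(xs: list[float], ys: list[float]) -> float:
        """Pearson correlation, e.g. technical vs creativity trade-off detection."""
        mx, my = statistics.mean(xs), statistics.mean(ys)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        denom = (sum((x - mx) ** 2 for x in xs) *
                 sum((y - my) ** 2 for y in ys)) ** 0.5
        return cov / denom if denom else 0.0

    def plateaued(scores: list[float], window: int = 5, epsilon: float = 1.0) -> bool:
        """Plateau detection: the last two window means differ by < epsilon points."""
        if len(scores) < 2 * window:
            return False
        recent = statistics.mean(scores[-window:])
        prior = statistics.mean(scores[-2 * window:-window])
        return abs(recent - prior) < epsilon
    ```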
    
  3. Generate Visualizations (Text-Based)

    Score Distribution Chart:

    Composite Score Distribution
    
    90-100 ████                     (2)  10%
    80-89  ████████████             (6)  30%
    70-79  ████████████████         (8)  40%
    60-69  ████████                 (4)  20%
    50-59                           (0)   0%
    Below 50                        (0)   0%
    
    Distribution: Right-skewed; the largest share (40%) falls in the 70-79 range
    

    Quality Quadrant Map:

    Technical vs Creativity Quadrant
    
    ┌──────────┬──────────┐
    │  7,12,3  │    11    │  High Creativity (> 75)
    ├──────────┼──────────┤
    │  9,18,15 │   1,5    │  Low Creativity (< 75)
    └──────────┴──────────┘
      Low Tech    High Tech
      (< 75)      (> 75)

    Insight: Iterations cluster in the low-technical column; only iteration 11 combines high technical quality with high creativity
    

    Dimension Radar Chart:

    Mean Scores by Dimension
    
            Technical (74.2)
             ╱      ╲
    Compliance ────── Creativity
      (67.3)          (75.8)
    
    Pattern: Creativity strongest, Compliance weakest
    

    Quality Timeline:

    Score Progression Over Iterations
    
    100 │
     90 │           ●
     80 │     ●   ● │ ●     ●
     70 │   ● │ ●   │   ● ● │ ●
     60 │ ●   │     │     ●   │
     50 │     │     │         │
        └─────┴─────┴─────────┴─────
         1-5  6-10  11-15  16-20
    
    Trend: Upward through iteration 12, then slight decline
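
    A sketch of how the histogram and quadrant map above could be rendered programmatically; the bucket edges, two-blocks-per-iteration bar scale, and 75-point quadrant threshold mirror the examples, while the field names are assumptions:

    ```python
    def histogram(scores: list[float]) -> str:
        """Render the composite-score distribution as a text bar chart."""
        buckets = [("90-100", 90, 101), ("80-89", 80, 90), ("70-79", 70, 80),
                   ("60-69", 60, 70), ("50-59", 50, 60), ("Below 50", 0, 50)]
        total = len(scores)
        lines = []
        for label, lo, hi in buckets:
            count = sum(1 for s in scores if lo <= s < hi)
            bar = "█" * (count * 2)  # two blocks per iteration, as in the example
            pct = 100 * count / total if total else 0
            lines.append(f"{label:<8} {bar:<24} ({count:>2}) {pct:>3.0f}%")
        return "\n".join(lines)

    def quadrants(evaluations: list[dict], threshold: float = 75.0) -> dict:
        """Bucket iterations into the four technical-vs-creativity quadrants."""
        quads: dict[str, list] = {}
        for e in evaluations:
            tech = "high_tech" if e["technical"] > threshold else "low_tech"
            creative = "high_creative" if e["creativity"] > threshold else "low_creative"
            quads.setdefault(f"{tech}/{creative}", []).append(e["iteration"])
        return quads
    ```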
    
  4. Identify Key Insights

    Use ReAct reasoning to discover:

    A. Surprising Patterns

    • Unexpected correlations
    • Counterintuitive rankings
    • Outliers that defy expectations

    B. Quality Drivers

    • What makes top iterations succeed?
    • Common characteristics of high scorers
    • Success factor analysis

    C. Quality Inhibitors

    • What causes low scores?
    • Common weaknesses across iterations
    • Failure pattern analysis

    D. Trade-off Analysis

    • Which dimensions compete?
    • Which dimensions synergize?
    • Optimal balance points

    E. Improvement Opportunities

    • Easiest wins (high impact, low effort)
    • Strategic pivots needed
    • Dimension-specific focus areas
  5. Generate Strategic Recommendations

    Based on observations, create actionable recommendations:

    For Next Wave:

    • Specific creative directions to try
    • Quality targets for each dimension
    • Techniques to amplify from top iterations
    • Pitfalls to avoid from low iterations

    For Spec Refinement:

    • Clarity improvements needed
    • Missing quality criteria
    • Ambiguous requirements to clarify

    For Evaluation System:

    • Criteria adjustments
    • Weight rebalancing
    • New evaluation dimensions to consider

OBSERVATION Phase: Reflect on Report Quality

After generating the report, reason about:

  1. Is this report actionable?

    • Can recommendations be directly implemented?
    • Are insights specific enough?
    • Does it guide next wave effectively?
  2. Is this report honest?

    • Does it acknowledge weaknesses?
    • Are improvements realistic?
    • Does it avoid artificial positivity?
  3. Is this report comprehensive?

    • Covers all quality dimensions?
    • Addresses all iterations?
    • Provides both overview and detail?
  4. What meta-insights emerge?

    • How is the quality system itself performing?
    • Are we learning and improving?
    • Is the evaluation process working?

Report Template Structure

# Quality Evaluation Report

**Generated**: {timestamp}
**Directory**: {output_dir}
**Wave**: {wave_number} (if applicable)
**Iterations Evaluated**: {count}

---

## Executive Summary

### Overall Quality Assessment
{1-2 paragraph summary of overall quality state}

### Key Findings
1. {Most important insight}
2. {Second most important insight}
3. {Third most important insight}

### Strategic Recommendation
{Single most important action for next wave}

---

## Quality Metrics Overview

### Composite Scores
- **Mean**: {mean} / 100
- **Median**: {median} / 100
- **Std Dev**: {std}
- **Range**: {min} - {max}
- **Top Score**: {max} (iteration_{X})
- **Quality Spread**: {range} points

### Dimensional Breakdown

**Technical Quality**
- Mean: {tech_mean} / 100
- Range: {tech_min} - {tech_max}
- Top: iteration_{X} ({tech_max})
- Distribution: {description}

**Creativity Score**
- Mean: {creative_mean} / 100
- Range: {creative_min} - {creative_max}
- Top: iteration_{X} ({creative_max})
- Distribution: {description}

**Spec Compliance**
- Mean: {compliance_mean} / 100
- Range: {compliance_min} - {compliance_max}
- Top: iteration_{X} ({compliance_max})
- Distribution: {description}

---

## Visualizations

### Score Distribution
{Text-based histogram}

### Quality Quadrants
{Text-based quadrant map}

### Dimensional Radar
{Text-based radar chart}

### Score Progression
{Text-based timeline}

---

## Rankings Summary

### Top 5 Iterations
1. iteration_{X} - {score} - {profile} - {key_strength}
2. iteration_{Y} - {score} - {profile} - {key_strength}
3. iteration_{Z} - {score} - {profile} - {key_strength}
4. iteration_{A} - {score} - {profile} - {key_strength}
5. iteration_{B} - {score} - {profile} - {key_strength}

### Quality Segments
- **Exemplary (Top 20%)**: {count} iterations, avg {avg}
- **Proficient (20-50%)**: {count} iterations, avg {avg}
- **Adequate (50-80%)**: {count} iterations, avg {avg}
- **Developing (Bottom 20%)**: {count} iterations, avg {avg}

---

## Deep Analysis

### Quality Patterns

**Pattern 1: {Pattern Name}**
- Observations: {observations}
- Iterations: {affected_iterations}
- Impact: {quality_impact}
- Insight: {strategic_insight}

**Pattern 2: {Pattern Name}**
[... repeat ...]

### Quality Trade-offs

**Trade-off 1: {Dimension A} vs {Dimension B}**
- Correlation: {correlation_coefficient}
- Pattern: {description}
- Iterations Affected: {list}
- Strategic Implication: {insight}

**Trade-off 2: {Dimension A} vs {Dimension B}**
[... repeat ...]

### Quality Drivers

**What Makes Iterations Succeed:**
1. {Success factor 1} - Evidence: {iterations}
2. {Success factor 2} - Evidence: {iterations}
3. {Success factor 3} - Evidence: {iterations}

**What Causes Lower Scores:**
1. {Failure factor 1} - Evidence: {iterations}
2. {Failure factor 2} - Evidence: {iterations}
3. {Failure factor 3} - Evidence: {iterations}

---

## Strategic Insights

### Insight 1: {Insight Title}
**Observation**: {What we see in the data}
**Analysis**: {Why this matters}
**Implication**: {What this means for strategy}
**Action**: {What to do about it}

### Insight 2: {Insight Title}
[... repeat ...]

---

## Recommendations for Next Wave

### Priority 1: {Recommendation Title}
**Rationale**: {Why this matters}
**Action**: {Specific steps}
**Expected Impact**: {Quality improvement anticipated}
**Dimensions Affected**: {Which dimensions benefit}

### Priority 2: {Recommendation Title}
[... repeat ...]

### Creative Directions to Explore
1. {Direction 1} - Based on success of iteration_{X}
2. {Direction 2} - To address gap in {dimension}
3. {Direction 3} - To push frontier of {aspect}

### Quality Targets for Next Wave
- Technical Quality: Target mean of {target} (current: {current})
- Creativity Score: Target mean of {target} (current: {current})
- Spec Compliance: Target mean of {target} (current: {current})
- Composite: Target mean of {target} (current: {current})

---

## Quality System Performance

### Evaluation System Assessment
- **Differentiation**: {How well scores separate quality levels}
- **Consistency**: {How reliably criteria are applied}
- **Fairness**: {Whether scoring feels balanced}
- **Actionability**: {Whether results guide improvement}

### Recommended System Adjustments
1. {Adjustment 1}
2. {Adjustment 2}
3. {Adjustment 3}

---

## Appendix: Detailed Iteration Data

### Complete Rankings
{Full ranking table with all iterations}

### Evaluation Details
{Summary of each iteration's evaluation}

---

## Meta-Reflection: Quality of Quality Assessment

**Self-Evaluation of This Report:**
- Actionability: {assessment}
- Comprehensiveness: {assessment}
- Honesty: {assessment}
- Usefulness: {assessment}

**Report Limitations:**
- {Limitation 1}
- {Limitation 2}

**Confidence Level**: {High/Medium/Low} - {Reasoning}

---

*This report was generated using the ReAct pattern: Reasoning → Action → Observation*
*All insights derived from evidence-based analysis of evaluation data*

Output Storage

Reports are stored in:

{output_dir}/quality_reports/reports/wave_{N}_report.md
{output_dir}/quality_reports/reports/wave_{N}_data.json
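
A minimal sketch of the write step, assuming `report_md` holds the rendered markdown and `stats` the aggregated data (both names are hypothetical):

```python
import json
from pathlib import Path

def save_report(output_dir: str, wave: int, report_md: str, stats: dict) -> None:
    """Write the markdown report and its companion data file."""
    reports_dir = Path(output_dir) / "quality_reports" / "reports"
    reports_dir.mkdir(parents=True, exist_ok=True)
    (reports_dir / f"wave_{wave}_report.md").write_text(report_md)
    (reports_dir / f"wave_{wave}_data.json").write_text(json.dumps(stats, indent=2))
```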

Success Criteria

A successful quality report demonstrates:

  • Clear, actionable insights
  • Evidence-based recommendations
  • Comprehensive coverage of all quality dimensions
  • Honest assessment of strengths and weaknesses
  • Strategic guidance for improvement
  • ReAct-style reasoning throughout
  • Self-awareness about report quality

Remember: A quality report is only valuable if it drives improvement. Make every insight actionable, every observation meaningful, and every recommendation strategic.