# Quality Report Generation Command

Generate comprehensive quality reports with visualizations and strategic insights using ReAct reasoning.
## Syntax

```
/quality-report <output_dir> [wave_number]
```

**Parameters:**
- `output_dir`: Directory containing iterations and evaluations
- `wave_number`: Optional - generate a report for a specific wave (infinite mode)

**Examples:**

```
/quality-report output/
/quality-report output/ 3
```
## Execution Process

### THOUGHT Phase: Reasoning About Reporting
Before generating the report, reason about:

1. **What is the purpose of this report?**
   - Provide a quality overview at a glance
   - Identify trends and patterns
   - Guide strategic decisions for the next wave
   - Document quality evolution

2. **Who is the audience?**
   - Primary: the orchestrator AI planning the next wave
   - Secondary: human users reviewing quality
   - The format should serve both audiences

3. **What insights matter most?**
   - Overall quality trajectory
   - Dimension-specific patterns
   - Trade-offs and correlations
   - Actionable improvement opportunities

4. **How can I visualize quality effectively?**
   - Text-based charts and distributions
   - Ranking tables
   - Trend indicators
   - Quality quadrant mappings
### ACTION Phase: Generate Report
1. **Aggregate All Evaluation Data** (see the sketch after this step)
   - Load all evaluations from `{output_dir}/quality_reports/evaluations/`
   - Load ranking data from `{output_dir}/quality_reports/rankings/`
   - Compile statistics across all iterations
   - Identify data completeness and gaps
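A minimal loading sketch for this step. The evaluation schema used here (`technical`, `creativity`, `compliance`, `composite` fields) is an assumption for illustration, not a documented format:

```python
# Minimal aggregation sketch. The per-evaluation JSON schema
# (technical/creativity/compliance/composite) is an assumption.
import json
from pathlib import Path

def load_evaluations(output_dir: str) -> list[dict]:
    """Load every evaluation JSON under quality_reports/evaluations/."""
    eval_dir = Path(output_dir) / "quality_reports" / "evaluations"
    return [json.loads(p.read_text()) for p in sorted(eval_dir.glob("*.json"))]

evals = load_evaluations("output/")
# Completeness check: flag evaluations missing a composite score
gaps = [e for e in evals if "composite" not in e]
print(f"Loaded {len(evals)} evaluations, {len(gaps)} with missing scores")
```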
2. **Calculate Comprehensive Statistics** (see the sketch after this step)

   **Overall Metrics:**
   - Total iterations
   - Mean/median/mode for all dimensions
   - Standard deviations
   - Min/max/range
   - Quartile distributions
   - Coefficient of variation (CV = std/mean)

   **Correlations:**
   - Technical vs Creativity correlation
   - Creativity vs Compliance correlation
   - Technical vs Compliance correlation
   - Identify trade-off patterns

   **Quality Progression:**
   - Score trend over the iteration sequence
   - Wave-over-wave improvement (infinite mode)
   - Improvement rate
   - Quality plateau detection
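A hedged sketch of this statistics pass, using the same hypothetical schema as above. `statistics.correlation` (Pearson) requires Python 3.10+ and at least two non-constant data points:

```python
# Statistics sketch: per-dimension summaries plus pairwise Pearson
# correlations to surface trade-offs. Field names are assumptions.
from statistics import mean, median, stdev, correlation

def dimension_stats(evals: list[dict], key: str) -> dict:
    scores = [e[key] for e in evals]
    mu = mean(scores)
    sd = stdev(scores) if len(scores) > 1 else 0.0
    return {
        "mean": mu,
        "median": median(scores),
        "std": sd,
        "min": min(scores),
        "max": max(scores),
        "cv": sd / mu if mu else 0.0,  # coefficient of variation
    }

dims = ["technical", "creativity", "compliance"]
stats = {d: dimension_stats(evals, d) for d in dims}

# Pairwise correlations: e.g. ("technical", "creativity") -> -0.4
# would suggest a trade-off between those two dimensions.
corr = {
    (a, b): correlation([e[a] for e in evals], [e[b] for e in evals])
    for i, a in enumerate(dims)
    for b in dims[i + 1:]
}
```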
3. **Generate Visualizations (Text-Based)** (see the histogram sketch after this step)

   **Score Distribution Chart:**

   ```
   Composite Score Distribution

   90-100    ████ (2)              10%
   80-89     ████████████ (6)      30%
   70-79     ████████████████ (8)  40%
   60-69     ████████ (4)          20%
   50-59     (0)                    0%
   Below 50  (0)                    0%

   Distribution: Right-skewed, most iterations in the 70-79 range
   ```

   **Quality Quadrant Map:**

   ```
   Technical vs Creativity Quadrants

                  Low Tech     High Tech
                  (< 75)       (> 75)
                ┌────────────┬────────────┐
   High         │     11     │   7,12,3   │  High Creativity (> 75)
   Creativity   ├────────────┼────────────┤
   Low          │    1,5     │  9,18,15   │  Low Creativity (< 75)
                └────────────┴────────────┘

   Insight: Most iterations cluster in the high-tech, high-creative quadrant
   ```

   **Dimension Radar Chart:**

   ```
   Mean Scores by Dimension

           Technical (74.2)
               ╱ ╲
              ╱   ╲
             ╱     ╲
   Compliance ────── Creativity
     (67.3)            (75.8)

   Pattern: Creativity strongest, Compliance weakest
   ```

   **Quality Timeline:**

   ```
   Score Progression Over Iterations

   100 │
    90 │                  ●
    80 │            ●  ●     ●
       │         ●              ●
    70 │       ●
       │     ●
    60 │   ●
    50 │ ●
       └────┴──────┴──────┴──────
        1-5   6-10  11-15  16-20

   Trend: Upward through iteration 12, then slight decline
   ```
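These charts are plain strings, so no plotting library is needed. A minimal sketch reproducing the distribution chart above; the bucket edges and the one-block-per-2.5% bar scaling are illustrative choices, not a spec:

```python
# Text histogram sketch matching the distribution chart above.
# Bucket boundaries and bar scaling (one █ per 2.5%) are assumptions.
def score_histogram(scores: list[float]) -> str:
    buckets = [
        ("90-100", 90, 100), ("80-89", 80, 89), ("70-79", 70, 79),
        ("60-69", 60, 69), ("50-59", 50, 59), ("Below 50", 0, 49),
    ]
    lines = ["Composite Score Distribution", ""]
    total = len(scores) or 1
    for label, lo, hi in buckets:
        n = sum(1 for s in scores if lo <= s <= hi)
        pct = 100 * n / total
        bar = "█" * round(pct / 2.5)
        lines.append(f"{label:<9} {bar} ({n})  {pct:.0f}%")
    return "\n".join(lines)

print(score_histogram([e["composite"] for e in evals]))
```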
4. **Identify Key Insights**

   Use ReAct reasoning to discover:

   **A. Surprising Patterns**
   - Unexpected correlations
   - Counterintuitive rankings
   - Outliers that defy expectations

   **B. Quality Drivers**
   - What makes top iterations succeed?
   - Common characteristics of high scorers
   - Success factor analysis

   **C. Quality Inhibitors**
   - What causes low scores?
   - Common weaknesses across iterations
   - Failure pattern analysis

   **D. Trade-off Analysis**
   - Which dimensions compete?
   - Which dimensions synergize?
   - Optimal balance points

   **E. Improvement Opportunities**
   - Easiest wins (high impact, low effort)
   - Strategic pivots needed
   - Dimension-specific focus areas
5. **Generate Strategic Recommendations**

   Based on observations, create actionable recommendations:

   **For Next Wave:**
   - Specific creative directions to try
   - Quality targets for each dimension
   - Techniques to amplify from top iterations
   - Pitfalls to avoid from low-scoring iterations

   **For Spec Refinement:**
   - Clarity improvements needed
   - Missing quality criteria
   - Ambiguous requirements to clarify

   **For Evaluation System:**
   - Criteria adjustments
   - Weight rebalancing
   - New evaluation dimensions to consider
### OBSERVATION Phase: Reflect on Report Quality
After generating the report, reason about:

1. **Is this report actionable?**
   - Can recommendations be directly implemented?
   - Are insights specific enough?
   - Does it guide the next wave effectively?

2. **Is this report honest?**
   - Does it acknowledge weaknesses?
   - Are improvements realistic?
   - Does it avoid artificial positivity?

3. **Is this report comprehensive?**
   - Does it cover all quality dimensions?
   - Does it address all iterations?
   - Does it provide both overview and detail?

4. **What meta-insights emerge?**
   - How is the quality system itself performing?
   - Are we learning and improving?
   - Is the evaluation process working?
## Report Template Structure
# Quality Evaluation Report
**Generated**: {timestamp}
**Directory**: {output_dir}
**Wave**: {wave_number} (if applicable)
**Iterations Evaluated**: {count}
---
## Executive Summary
### Overall Quality Assessment
{1-2 paragraph summary of overall quality state}
### Key Findings
1. {Most important insight}
2. {Second most important insight}
3. {Third most important insight}
### Strategic Recommendation
{Single most important action for next wave}
---
## Quality Metrics Overview
### Composite Scores
- **Mean**: {mean} / 100
- **Median**: {median} / 100
- **Std Dev**: {std}
- **Range**: {min} - {max}
- **Top Score**: {max} (iteration_{X})
- **Quality Spread**: {range} points
### Dimensional Breakdown
**Technical Quality**
- Mean: {tech_mean} / 100
- Range: {tech_min} - {tech_max}
- Top: iteration_{X} ({tech_max})
- Distribution: {description}
**Creativity Score**
- Mean: {creative_mean} / 100
- Range: {creative_min} - {creative_max}
- Top: iteration_{X} ({creative_max})
- Distribution: {description}
**Spec Compliance**
- Mean: {compliance_mean} / 100
- Range: {compliance_min} - {compliance_max}
- Top: iteration_{X} ({compliance_max})
- Distribution: {description}
---
## Visualizations
### Score Distribution
{Text-based histogram}
### Quality Quadrants
{Text-based quadrant map}
### Dimensional Radar
{Text-based radar chart}
### Score Progression
{Text-based timeline}
---
## Rankings Summary
### Top 5 Iterations
1. iteration_{X} - {score} - {profile} - {key_strength}
2. iteration_{Y} - {score} - {profile} - {key_strength}
3. iteration_{Z} - {score} - {profile} - {key_strength}
4. iteration_{A} - {score} - {profile} - {key_strength}
5. iteration_{B} - {score} - {profile} - {key_strength}
### Quality Segments
- **Exemplary (Top 20%)**: {count} iterations, avg {avg}
- **Proficient (20-50%)**: {count} iterations, avg {avg}
- **Adequate (50-80%)**: {count} iterations, avg {avg}
- **Developing (Bottom 20%)**: {count} iterations, avg {avg}
---
## Deep Analysis
### Quality Patterns
**Pattern 1: {Pattern Name}**
- Observations: {observations}
- Iterations: {affected_iterations}
- Impact: {quality_impact}
- Insight: {strategic_insight}
**Pattern 2: {Pattern Name}**
[... repeat ...]
### Quality Trade-offs
**Trade-off 1: {Dimension A} vs {Dimension B}**
- Correlation: {correlation_coefficient}
- Pattern: {description}
- Iterations Affected: {list}
- Strategic Implication: {insight}
**Trade-off 2: {Dimension A} vs {Dimension B}**
[... repeat ...]
### Quality Drivers
**What Makes Iterations Succeed:**
1. {Success factor 1} - Evidence: {iterations}
2. {Success factor 2} - Evidence: {iterations}
3. {Success factor 3} - Evidence: {iterations}
**What Causes Lower Scores:**
1. {Failure factor 1} - Evidence: {iterations}
2. {Failure factor 2} - Evidence: {iterations}
3. {Failure factor 3} - Evidence: {iterations}
---
## Strategic Insights
### Insight 1: {Insight Title}
**Observation**: {What we see in the data}
**Analysis**: {Why this matters}
**Implication**: {What this means for strategy}
**Action**: {What to do about it}
### Insight 2: {Insight Title}
[... repeat ...]
---
## Recommendations for Next Wave
### Priority 1: {Recommendation Title}
**Rationale**: {Why this matters}
**Action**: {Specific steps}
**Expected Impact**: {Quality improvement anticipated}
**Dimensions Affected**: {Which dimensions benefit}
### Priority 2: {Recommendation Title}
[... repeat ...]
### Creative Directions to Explore
1. {Direction 1} - Based on success of iteration_{X}
2. {Direction 2} - To address gap in {dimension}
3. {Direction 3} - To push frontier of {aspect}
### Quality Targets for Next Wave
- Technical Quality: Target mean of {target} (current: {current})
- Creativity Score: Target mean of {target} (current: {current})
- Spec Compliance: Target mean of {target} (current: {current})
- Composite: Target mean of {target} (current: {current})
---
## Quality System Performance
### Evaluation System Assessment
- **Differentiation**: {How well scores separate quality levels}
- **Consistency**: {How reliably criteria are applied}
- **Fairness**: {Whether scoring feels balanced}
- **Actionability**: {Whether results guide improvement}
### Recommended System Adjustments
1. {Adjustment 1}
2. {Adjustment 2}
3. {Adjustment 3}
---
## Appendix: Detailed Iteration Data
### Complete Rankings
{Full ranking table with all iterations}
### Evaluation Details
{Summary of each iteration's evaluation}
---
## Meta-Reflection: Quality of Quality Assessment
**Self-Evaluation of This Report:**
- Actionability: {assessment}
- Comprehensiveness: {assessment}
- Honesty: {assessment}
- Usefulness: {assessment}
**Report Limitations:**
- {Limitation 1}
- {Limitation 2}
**Confidence Level**: {High/Medium/Low} - {Reasoning}
---
*This report generated using ReAct pattern: Reasoning → Action → Observation*
*All insights derived from evidence-based analysis of evaluation data*
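The Quality Segments section of the template assumes percentile cuts over the composite ranking, sorted descending. A minimal segmentation sketch; the band boundaries mirror the template's percentages:

```python
# Percentile segmentation sketch for the template's Exemplary /
# Proficient / Adequate / Developing bands. Input must be composite
# scores sorted descending; cut points follow the template.
def segment(ranked_scores: list[float]) -> dict[str, list[float]]:
    n = len(ranked_scores)
    bands = {
        "Exemplary (Top 20%)": (0.0, 0.2),
        "Proficient (20-50%)": (0.2, 0.5),
        "Adequate (50-80%)": (0.5, 0.8),
        "Developing (Bottom 20%)": (0.8, 1.0),
    }
    return {
        name: ranked_scores[int(lo * n): int(hi * n)]
        for name, (lo, hi) in bands.items()
    }
```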
## Output Storage

Reports are stored in:

```
{output_dir}/quality_reports/reports/wave_{N}_report.md
{output_dir}/quality_reports/reports/wave_{N}_data.json
```
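A minimal persistence sketch for these two files; the shape of the JSON data payload is an assumption:

```python
# Report persistence sketch. The contents of the `data` payload
# (e.g. the stats dict from above) are an assumption, not a spec.
import json
from pathlib import Path

def store_report(output_dir: str, wave: int, report_md: str, data: dict) -> None:
    reports = Path(output_dir) / "quality_reports" / "reports"
    reports.mkdir(parents=True, exist_ok=True)
    (reports / f"wave_{wave}_report.md").write_text(report_md)
    (reports / f"wave_{wave}_data.json").write_text(json.dumps(data, indent=2))
```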
## Success Criteria
A successful quality report demonstrates:
- Clear, actionable insights
- Evidence-based recommendations
- Comprehensive coverage of all quality dimensions
- Honest assessment of strengths and weaknesses
- Strategic guidance for improvement
- ReAct-style reasoning throughout
- Self-awareness about report quality
**Remember**: A quality report is only valuable if it drives improvement. Make every insight actionable, every observation meaningful, and every recommendation strategic.