# Ranking Utility Command

Rank all iterations in a directory based on composite quality scores using ReAct reasoning.

## Syntax

```
/rank <output_dir> [dimension]
```

**Parameters:**

- `output_dir`: Directory containing iterations and evaluation results
- `dimension`: Optional - Rank by a specific dimension (technical/creativity/compliance) instead of the composite score

**Examples:**

```
/rank output/
/rank output/ creativity
/rank output/ technical
```

## Execution Process

### THOUGHT Phase: Reasoning About Ranking

Before ranking, reason about:

1. **What makes a fair ranking system?**
   - Consistent evaluation criteria across all iterations
   - Appropriate weighting of dimensions
   - Recognition of different quality profiles
   - Avoidance of artificial precision

2. **What patterns should I look for?**
   - Quality clusters (groups of similar scores)
   - Outliers (exceptionally high or low)
   - Quality trade-offs (high in one dimension, low in another)
   - Quality progression (improvement over the iteration sequence)

3. **How should I interpret rankings?**
   - Top 20%: Exemplary iterations
   - Middle 60%: Solid, meeting expectations
   - Bottom 20%: Learning opportunities
   - Not about "bad" vs "good" but about relative quality

4. **What insights can rankings reveal?**
   - Which creative directions succeed?
   - Which quality dimensions need more focus?
   - Are there unexpected quality leaders?
   - Is quality improving over time?

### ACTION Phase: Execute Ranking

1. **Load All Evaluations**
   - Scan `{output_dir}/quality_reports/evaluations/` for all evaluation JSON files
   - Parse each evaluation result
   - Extract scores for all dimensions
   - Verify evaluation completeness

2. **Calculate Composite Scores** (if not already calculated)

   For each iteration:

   ```
   composite_score = (technical * 0.35) + (creativity * 0.35) + (compliance * 0.30)
   ```

   Store in a ranking structure:

   ```json
   {
     "iteration": "iteration_001.html",
     "scores": {
       "technical": 78,
       "creativity": 82,
       "compliance": 68,
       "composite": 76.4
     }
   }
   ```

3. **Sort by Selected Dimension**
   - Sort iterations by composite score (or the specified dimension)
   - Use a stable sort (preserve order for ties)
   - Assign ranks (1 = highest)

4. **Calculate Statistics**

   ```
   Statistics:
   - Count: Total number of iterations
   - Mean: Average score
   - Median: Middle value
   - Std Dev: Score distribution spread
   - Min: Lowest score
   - Max: Highest score
   - Range: Max - Min
   - Quartiles: Q1 (25th %), Q2 (50th %), Q3 (75th %)
   ```

5. **Identify Quality Segments**
   - **Exemplary (Top 20%)**: Rank 1 to ceil(count * 0.2)
   - **Proficient (Next 30%)**: Rank ceil(count * 0.2)+1 to ceil(count * 0.5)
   - **Adequate (Next 30%)**: Rank ceil(count * 0.5)+1 to ceil(count * 0.8)
   - **Developing (Bottom 20%)**: Rank ceil(count * 0.8)+1 to count

   A consolidated code sketch of steps 1-5 follows below.
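A minimal Python sketch consolidating steps 1-5. The directory layout and the score field names (`technical`, `creativity`, `compliance`) are assumptions taken from the examples in this document, not a confirmed schema; treat this as an illustration rather than the command's implementation.

```python
import json
import math
import statistics
from pathlib import Path

# Weights from step 2 above
WEIGHTS = {"technical": 0.35, "creativity": 0.35, "compliance": 0.30}

def load_evaluations(output_dir):
    """Step 1: load every evaluation JSON file (assumed directory layout)."""
    eval_dir = Path(output_dir) / "quality_reports" / "evaluations"
    return [json.loads(p.read_text()) for p in sorted(eval_dir.glob("*.json"))]

def composite(scores):
    """Step 2: weighted composite - technical 35%, creativity 35%, compliance 30%."""
    return sum(scores[dim] * w for dim, w in WEIGHTS.items())

def rank(evaluations, dimension=None):
    """Step 3: sort by composite (or one dimension) and attach 1-based ranks."""
    for ev in evaluations:
        ev["scores"]["composite"] = round(composite(ev["scores"]), 1)
    key = dimension or "composite"
    # Python's sorted() is stable, so ties keep their input order (step 3)
    ranked = sorted(evaluations, key=lambda ev: ev["scores"][key], reverse=True)
    for i, ev in enumerate(ranked, start=1):
        ev["rank"] = i
    return ranked

def summarize(ranked, key="composite"):
    """Step 4: summary statistics over the ranked scores."""
    values = [ev["scores"][key] for ev in ranked]
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std_dev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "quartiles": {"q1": q1, "q2": q2, "q3": q3},
    }

def segment(rank_num, count):
    """Step 5: map a rank to its quality segment using the ceil boundaries above."""
    if rank_num <= math.ceil(count * 0.2):
        return "Exemplary"
    if rank_num <= math.ceil(count * 0.5):
        return "Proficient"
    if rank_num <= math.ceil(count * 0.8):
        return "Adequate"
    return "Developing"
```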
6. **Analyze Quality Profiles**

   For each iteration, determine a quality profile:

   ```python
   def quality_profile(tech, creative, compliance):
       """Label an iteration by which dimensions exceed the 80-point bar."""
       if tech > 80 and creative > 80 and compliance > 80:
           return "Triple Threat - Excellent in all dimensions"
       elif tech > 80 and creative > 80:
           return "Technical Innovator - Strong tech + creativity"
       elif creative > 80 and compliance > 80:
           return "Compliant Creator - Creative within bounds"
       elif tech > 80 and compliance > 80:
           return "Reliable Engineer - Solid technical compliance"
       elif creative > 80:
           return "Creative Maverick - Innovation focus"
       elif tech > 80:
           return "Technical Specialist - Engineering excellence"
       elif compliance > 80:
           return "Spec Guardian - Perfect adherence"
       else:
           # No dimension exceeds 80: scores are relatively even
           return "Balanced Generalist - Even across dimensions"
   ```

### OBSERVATION Phase: Document Rankings

Output a comprehensive ranking report:

```
=== QUALITY RANKINGS REPORT ===
Directory: output/
Ranked by: Composite Score
Total Iterations: 20
Generated: 2025-10-10T14:45:23Z

--- SUMMARY STATISTICS ---

Composite Scores:
  Mean: 72.4
  Median: 73.5
  Std Dev: 8.2
  Min: 58.0
  Max: 89.5
  Range: 31.5
  Quartiles:
    Q1 (25%): 67.2
    Q2 (50%): 73.5
    Q3 (75%): 78.8

--- TOP PERFORMERS (Top 20%) ---

Rank 1: iteration_012.html - Score: 89.5
  Technical: 92 | Creativity: 95 | Compliance: 78
  Profile: Technical Innovator - Strong tech + creativity
  Strengths: Exceptional innovation, excellent code quality, novel approach
  Notable: Highest creativity score in entire batch

Rank 2: iteration_007.html - Score: 86.2
  Technical: 88 | Creativity: 89 | Compliance: 81
  Profile: Triple Threat - Excellent in all dimensions
  Strengths: Well-rounded excellence, balanced quality, consistent execution
  Notable: Most balanced high performer

Rank 3: iteration_018.html - Score: 84.7
  Technical: 85 | Creativity: 82 | Compliance: 87
  Profile: Reliable Engineer - Solid technical compliance
  Strengths: Perfect spec adherence, clean architecture, robust implementation
  Notable: Highest compliance score in batch

Rank 4: iteration_003.html - Score: 82.1
  Technical: 80 | Creativity: 88 | Compliance: 76
  Profile: Creative Maverick - Innovation focus
  Strengths: Unique visual design, innovative interactions, aesthetic excellence

--- PROFICIENT PERFORMERS (Top 20-50%) ---

Rank 5: iteration_015.html - Score: 78.9
  Technical: 77 | Creativity: 79 | Compliance: 80
  Profile: Balanced Generalist - Even across dimensions

Rank 6: iteration_009.html - Score: 77.6
  Technical: 82 | Creativity: 75 | Compliance: 76
  Profile: Technical Specialist - Engineering excellence

[... continues ...]
--- DEVELOPING ITERATIONS (Bottom 20%) ---

Rank 17: iteration_005.html - Score: 62.3
  Technical: 65 | Creativity: 68 | Compliance: 55
  Profile: Balanced Generalist - Even across dimensions
  Growth Areas: Improve spec compliance, strengthen naming conventions

Rank 18: iteration_011.html - Score: 60.8
  Technical: 58 | Creativity: 72 | Compliance: 52
  Profile: Creative Maverick - Innovation focus
  Growth Areas: Boost technical robustness, enhance spec adherence

Rank 19: iteration_016.html - Score: 59.4
  Technical: 62 | Creativity: 55 | Compliance: 61
  Profile: Balanced Generalist - Even across dimensions
  Growth Areas: Increase creativity, explore unique approaches

Rank 20: iteration_001.html - Score: 58.0
  Technical: 60 | Creativity: 58 | Compliance: 56
  Profile: Balanced Generalist - Even across dimensions
  Growth Areas: Early iteration - establish stronger foundation

--- DIMENSIONAL ANALYSIS ---

Technical Quality Distribution:
  Mean: 74.2, Range: 58-92
  Top: iteration_012 (92)
  Pattern: Strong technical quality overall, few outliers

Creativity Score Distribution:
  Mean: 75.8, Range: 55-95
  Top: iteration_012 (95)
  Pattern: Wide distribution, high variance in creative approaches

Spec Compliance Distribution:
  Mean: 67.3, Range: 52-87
  Top: iteration_018 (87)
  Pattern: Compliance varies significantly, improvement opportunity

--- QUALITY TRADE-OFFS ---

Trade-off Pattern 1: "Creativity vs Compliance"
  Iterations: 003, 011, 004
  Pattern: High creativity (avg 85) paired with lower compliance (avg 62)
  Insight: Creative explorations sometimes sacrifice spec adherence

Trade-off Pattern 2: "Technical vs Creative"
  Iterations: 006, 013
  Pattern: High technical (avg 88) paired with moderate creativity (avg 70)
  Insight: Technical focus may constrain creative experimentation

--- QUALITY INSIGHTS ---

1. Quality Leaders Excel in Balance
   - Top 3 iterations all score 80+ in at least 2 dimensions
   - Success requires multi-dimensional excellence, not single strength

2. Compliance is Weakest Dimension
   - Mean compliance (67.3) lags technical (74.2) and creativity (75.8)
   - 60% of iterations score below 70 in compliance
   - Recommendation: Emphasize spec adherence in next wave

3. Creativity Shows Highest Variance
   - Std dev of 12.1 (vs 8.4 technical, 9.2 compliance)
   - Indicates diverse creative approaches - positive diversity
   - Some iterations play it safe, others push boundaries

4. Quality Improves Mid-Batch
   - Iterations 7-15 show 8% higher average scores than 1-6 or 16-20
   - Pattern suggests learning curve, then fatigue/repetition
   - Recommendation: Maintain mid-batch momentum in future waves

5. No "Perfect 100" Iterations
   - Max score: 89.5 (iteration_012)
   - Indicates room for improvement across all dimensions
   - Opportunity: Study iteration_012 and push further

--- RECOMMENDATIONS FOR NEXT WAVE ---

Based on ranking analysis:

1. Amplify Success Patterns
   - Study iteration_012 creative techniques
   - Replicate iteration_018 compliance approach
   - Maintain iteration_007 balanced excellence

2. Address Compliance Gap
   - Provide clearer spec guidance in sub-agent prompts
   - Add compliance checkpoints during generation
   - Review spec for clarity issues

3. Encourage Balanced Excellence
   - Reward multi-dimensional quality over single-dimension spikes
   - Design creative directions that maintain compliance
   - Set minimum thresholds for all dimensions (e.g., 70+)

4. Explore Quality Frontiers
   - Current max is 89.5 - can we reach 95+?
   - Identify specific innovations from top iterations
   - Push technical, creative, AND compliance simultaneously
5. Maintain Creative Diversity
   - High creativity variance is valuable
   - Continue diverse creative directions
   - But add "creative compliance" as an explicit goal

--- RANKING DATA (JSON) ---

[Export full ranking data as JSON for programmatic access]
```

### THOUGHT Phase: Reflect on Rankings

After generating rankings, reason about:

1. **Do the rankings make sense?**
   - Do high-ranked iterations genuinely feel higher quality?
   - Are low-ranked iterations actually weaker?
   - Any surprising rankings that warrant investigation?

2. **What story do the rankings tell?**
   - Is quality improving, declining, or stable?
   - Are there clear quality clusters?
   - What separates good from great?

3. **How should this inform strategy?**
   - What should the next wave prioritize?
   - Which creative directions should be amplified?
   - Which quality dimensions need focus?

4. **Are the evaluation criteria working?**
   - Do scores differentiate quality meaningfully?
   - Are the weights (35/35/30) appropriate?
   - Should criteria be adjusted?

## Output Storage

Rankings are stored in:

```
{output_dir}/quality_reports/rankings/ranking_report.md
{output_dir}/quality_reports/rankings/ranking_data.json
```

The JSON format enables:

- Historical tracking
- Trend analysis
- Visualization
- Programmatic access

## Success Criteria

A successful ranking demonstrates:

- Clear differentiation of quality levels
- Meaningful insights about quality patterns
- Actionable recommendations for improvement
- Fair and consistent application of criteria
- Transparent reasoning about rankings
- Evidence-based quality assessment

---

**Remember**: Rankings are not judgments of worth - they're tools for learning. Every iteration teaches us something about quality, and rankings help us identify patterns and opportunities for growth.
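To illustrate the programmatic access mentioned under Output Storage, here is a minimal sketch for loading the exported ranking data and checking the compliance gap flagged in the insights. The structure of `ranking_data.json` is assumed to mirror the ranking structure from the ACTION phase (a list of entries with a `scores` object); the export schema is not pinned down above.

```python
import json
from pathlib import Path

def load_rankings(output_dir):
    """Load the exported ranking data (path from Output Storage above)."""
    path = Path(output_dir) / "quality_reports" / "rankings" / "ranking_data.json"
    # Assumed shape: a list of {"iteration": ..., "scores": {...}, "rank": ...}
    return json.loads(path.read_text())

def compliance_gap(rankings):
    """Example trend check: how far mean compliance lags the other dimensions."""
    def mean(dim):
        return sum(ev["scores"][dim] for ev in rankings) / len(rankings)
    return {
        "technical": mean("technical"),
        "creativity": mean("creativity"),
        "compliance": mean("compliance"),
        "gap": max(mean("technical"), mean("creativity")) - mean("compliance"),
    }

# Usage (assuming rankings were exported for output/):
# rankings = load_rankings("output/")
# print(compliance_gap(rankings))
```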