# Ranking Utility Command

Rank all iterations in a directory based on composite quality scores using ReAct reasoning.

## Syntax

```
/rank <output_dir> [dimension]
```

**Parameters:**

- `output_dir`: Directory containing iterations and evaluation results
- `dimension`: Optional - Rank by a specific dimension (technical/creativity/compliance) instead of the composite score

**Examples:**

```
/rank output/
/rank output/ creativity
/rank output/ technical
```

## Execution Process

### THOUGHT Phase: Reasoning About Ranking

Before ranking, reason about:

1. **What makes a fair ranking system?**
   - Consistent evaluation criteria across all iterations
   - Appropriate weighting of dimensions
   - Recognition of different quality profiles
   - Avoidance of artificial precision

2. **What patterns should I look for?**
   - Quality clusters (groups of similar scores)
   - Outliers (exceptionally high or low)
   - Quality trade-offs (high in one dimension, low in another)
   - Quality progression (improvement over the iteration sequence)

3. **How should I interpret rankings?**
   - Top 20%: Exemplary iterations
   - Middle 60%: Solid, meeting expectations
   - Bottom 20%: Learning opportunities
   - Not about "bad" vs "good" but about relative quality

4. **What insights can rankings reveal?**
   - Which creative directions succeed?
   - Which quality dimensions need more focus?
   - Are there unexpected quality leaders?
   - Is quality improving over time?

### ACTION Phase: Execute Ranking

1. **Load All Evaluations**
   - Scan `{output_dir}/quality_reports/evaluations/` for all evaluation JSON files
   - Parse each evaluation result
   - Extract scores for all dimensions
   - Verify evaluation completeness

2. **Calculate Composite Scores** (if not already calculated)

   For each iteration:

   ```
   composite_score = (technical * 0.35) + (creativity * 0.35) + (compliance * 0.30)
   ```

   Store in a ranking structure:

   ```json
   {
     "iteration": "iteration_001.html",
     "scores": {
       "technical": 78,
       "creativity": 82,
       "compliance": 68,
       "composite": 76.4
     }
   }
   ```

3. **Sort by Selected Dimension**
   - Sort iterations by composite score (or the specified dimension)
   - Use a stable sort (preserve order for ties)
   - Assign ranks (1 = highest)

4. **Calculate Statistics**

   ```
   Statistics:
   - Count: Total number of iterations
   - Mean: Average score
   - Median: Middle value
   - Std Dev: Score distribution spread
   - Min: Lowest score
   - Max: Highest score
   - Range: Max - Min
   - Quartiles: Q1 (25th %), Q2 (50th %), Q3 (75th %)
   ```

5. **Identify Quality Segments**
   - **Exemplary (Top 20%)**: Rank 1 to ceil(count * 0.2)
   - **Proficient (Next 30%)**: Rank ceil(count * 0.2)+1 to ceil(count * 0.5)
   - **Adequate (Next 30%)**: Rank ceil(count * 0.5)+1 to ceil(count * 0.8)
   - **Developing (Bottom 20%)**: Rank ceil(count * 0.8)+1 to count

   A consolidated code sketch of steps 1-5 follows below.
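A minimal Python sketch consolidating steps 1-5. The directory layout and the score field names (`technical`, `creativity`, `compliance`) are assumptions taken from the examples in this document, not a confirmed schema; treat this as an illustration rather than the command's implementation.

```python
import json
import math
import statistics
from pathlib import Path

# Weights from step 2 above
WEIGHTS = {"technical": 0.35, "creativity": 0.35, "compliance": 0.30}

def load_evaluations(output_dir):
    """Step 1: load every evaluation JSON file (assumed directory layout)."""
    eval_dir = Path(output_dir) / "quality_reports" / "evaluations"
    return [json.loads(p.read_text()) for p in sorted(eval_dir.glob("*.json"))]

def composite(scores):
    """Step 2: weighted composite - technical 35%, creativity 35%, compliance 30%."""
    return sum(scores[dim] * w for dim, w in WEIGHTS.items())

def rank(evaluations, dimension=None):
    """Step 3: sort by composite (or one dimension) and attach 1-based ranks."""
    for ev in evaluations:
        ev["scores"]["composite"] = round(composite(ev["scores"]), 1)
    key = dimension or "composite"
    # Python's sorted() is stable, so ties keep their input order (step 3)
    ranked = sorted(evaluations, key=lambda ev: ev["scores"][key], reverse=True)
    for i, ev in enumerate(ranked, start=1):
        ev["rank"] = i
    return ranked

def summarize(ranked, key="composite"):
    """Step 4: summary statistics over the ranked scores."""
    values = [ev["scores"][key] for ev in ranked]
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std_dev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "quartiles": {"q1": q1, "q2": q2, "q3": q3},
    }

def segment(rank_num, count):
    """Step 5: map a rank to its quality segment using the ceil boundaries above."""
    if rank_num <= math.ceil(count * 0.2):
        return "Exemplary"
    if rank_num <= math.ceil(count * 0.5):
        return "Proficient"
    if rank_num <= math.ceil(count * 0.8):
        return "Adequate"
    return "Developing"
```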
6. **Analyze Quality Profiles**

   For each iteration, determine a quality profile:

   ```python
   def quality_profile(tech, creative, compliance):
       """Label an iteration by which dimensions exceed the 80-point bar."""
       if tech > 80 and creative > 80 and compliance > 80:
           return "Triple Threat - Excellent in all dimensions"
       elif tech > 80 and creative > 80:
           return "Technical Innovator - Strong tech + creativity"
       elif creative > 80 and compliance > 80:
           return "Compliant Creator - Creative within bounds"
       elif tech > 80 and compliance > 80:
           return "Reliable Engineer - Solid technical compliance"
       elif creative > 80:
           return "Creative Maverick - Innovation focus"
       elif tech > 80:
           return "Technical Specialist - Engineering excellence"
       elif compliance > 80:
           return "Spec Guardian - Perfect adherence"
       else:
           # No dimension exceeds 80: scores are relatively even
           return "Balanced Generalist - Even across dimensions"
   ```

### OBSERVATION Phase: Document Rankings

Output a comprehensive ranking report:

```
=== QUALITY RANKINGS REPORT ===
Directory: output/
Ranked by: Composite Score
Total Iterations: 20
Generated: 2025-10-10T14:45:23Z

--- SUMMARY STATISTICS ---

Composite Scores:
  Mean: 72.4
  Median: 73.5
  Std Dev: 8.2
  Min: 58.0
  Max: 89.5
  Range: 31.5
  Quartiles:
    Q1 (25%): 67.2
    Q2 (50%): 73.5
    Q3 (75%): 78.8

--- TOP PERFORMERS (Top 20%) ---

Rank 1: iteration_012.html - Score: 89.5
  Technical: 92 | Creativity: 95 | Compliance: 78
  Profile: Technical Innovator - Strong tech + creativity
  Strengths: Exceptional innovation, excellent code quality, novel approach
  Notable: Highest creativity score in entire batch

Rank 2: iteration_007.html - Score: 86.2
  Technical: 88 | Creativity: 89 | Compliance: 81
  Profile: Triple Threat - Excellent in all dimensions
  Strengths: Well-rounded excellence, balanced quality, consistent execution
  Notable: Most balanced high performer

Rank 3: iteration_018.html - Score: 84.7
  Technical: 85 | Creativity: 82 | Compliance: 87
  Profile: Reliable Engineer - Solid technical compliance
  Strengths: Perfect spec adherence, clean architecture, robust implementation
  Notable: Highest compliance score in batch

Rank 4: iteration_003.html - Score: 82.1
  Technical: 80 | Creativity: 88 | Compliance: 76
  Profile: Creative Maverick - Innovation focus
  Strengths: Unique visual design, innovative interactions, aesthetic excellence

--- PROFICIENT PERFORMERS (Top 20-50%) ---

Rank 5: iteration_015.html - Score: 78.9
  Technical: 77 | Creativity: 79 | Compliance: 80
  Profile: Balanced Generalist - Even across dimensions

Rank 6: iteration_009.html - Score: 77.6
  Technical: 82 | Creativity: 75 | Compliance: 76
  Profile: Technical Specialist - Engineering excellence

[... continues ...]
--- DEVELOPING ITERATIONS (Bottom 20%) ---

Rank 17: iteration_005.html - Score: 62.3
  Technical: 65 | Creativity: 68 | Compliance: 55
  Profile: Balanced Generalist - Even across dimensions
  Growth Areas: Improve spec compliance, strengthen naming conventions

Rank 18: iteration_011.html - Score: 60.8
  Technical: 58 | Creativity: 72 | Compliance: 52
  Profile: Creative Maverick - Innovation focus
  Growth Areas: Boost technical robustness, enhance spec adherence

Rank 19: iteration_016.html - Score: 59.4
  Technical: 62 | Creativity: 55 | Compliance: 61
  Profile: Balanced Generalist - Even across dimensions
  Growth Areas: Increase creativity, explore unique approaches

Rank 20: iteration_001.html - Score: 58.0
  Technical: 60 | Creativity: 58 | Compliance: 56
  Profile: Balanced Generalist - Even across dimensions
  Growth Areas: Early iteration - establish stronger foundation

--- DIMENSIONAL ANALYSIS ---

Technical Quality Distribution:
  Mean: 74.2, Range: 58-92
  Top: iteration_012 (92)
  Pattern: Strong technical quality overall, few outliers

Creativity Score Distribution:
  Mean: 75.8, Range: 55-95
  Top: iteration_012 (95)
  Pattern: Wide distribution, high variance in creative approaches

Spec Compliance Distribution:
  Mean: 67.3, Range: 52-87
  Top: iteration_018 (87)
  Pattern: Compliance varies significantly, improvement opportunity

--- QUALITY TRADE-OFFS ---

Trade-off Pattern 1: "Creativity vs Compliance"
  Iterations: 003, 011, 004
  Pattern: High creativity (avg 85) paired with lower compliance (avg 62)
  Insight: Creative explorations sometimes sacrifice spec adherence

Trade-off Pattern 2: "Technical vs Creative"
  Iterations: 006, 013
  Pattern: High technical (avg 88) paired with moderate creativity (avg 70)
  Insight: Technical focus may constrain creative experimentation

--- QUALITY INSIGHTS ---

1. Quality Leaders Excel in Balance
   - Top 3 iterations all score 80+ in at least 2 dimensions
   - Success requires multi-dimensional excellence, not single strength

2. Compliance is Weakest Dimension
   - Mean compliance (67.3) lags technical (74.2) and creativity (75.8)
   - 60% of iterations score below 70 in compliance
   - Recommendation: Emphasize spec adherence in next wave

3. Creativity Shows Highest Variance
   - Std dev of 12.1 (vs 8.4 technical, 9.2 compliance)
   - Indicates diverse creative approaches - positive diversity
   - Some iterations play it safe, others push boundaries

4. Quality Improves Mid-Batch
   - Iterations 7-15 show 8% higher average scores than 1-6 or 16-20
   - Pattern suggests learning curve, then fatigue/repetition
   - Recommendation: Maintain mid-batch momentum in future waves

5. No "Perfect 100" Iterations
   - Max score: 89.5 (iteration_012)
   - Indicates room for improvement across all dimensions
   - Opportunity: Study iteration_012 and push further

--- RECOMMENDATIONS FOR NEXT WAVE ---

Based on ranking analysis:

1. Amplify Success Patterns
   - Study iteration_012 creative techniques
   - Replicate iteration_018 compliance approach
   - Maintain iteration_007 balanced excellence

2. Address Compliance Gap
   - Provide clearer spec guidance in sub-agent prompts
   - Add compliance checkpoints during generation
   - Review spec for clarity issues

3. Encourage Balanced Excellence
   - Reward multi-dimensional quality over single-dimension spikes
   - Design creative directions that maintain compliance
   - Set minimum thresholds for all dimensions (e.g., 70+)

4. Explore Quality Frontiers
   - Current max is 89.5 - can we reach 95+?
   - Identify specific innovations from top iterations
   - Push technical, creative, AND compliance simultaneously
5. Maintain Creative Diversity
   - High creativity variance is valuable
   - Continue diverse creative directions
   - But add "creative compliance" as an explicit goal

--- RANKING DATA (JSON) ---

[Export full ranking data as JSON for programmatic access]
```

### THOUGHT Phase: Reflect on Rankings

After generating rankings, reason about:

1. **Do the rankings make sense?**
   - Do high-ranked iterations genuinely feel higher quality?
   - Are low-ranked iterations actually weaker?
   - Any surprising rankings that warrant investigation?

2. **What story do the rankings tell?**
   - Is quality improving, declining, or stable?
   - Are there clear quality clusters?
   - What separates good from great?

3. **How should this inform strategy?**
   - What should the next wave prioritize?
   - Which creative directions should be amplified?
   - Which quality dimensions need focus?

4. **Are the evaluation criteria working?**
   - Do scores differentiate quality meaningfully?
   - Are the weights (35/35/30) appropriate?
   - Should criteria be adjusted?

## Output Storage

Rankings are stored in:

```
{output_dir}/quality_reports/rankings/ranking_report.md
{output_dir}/quality_reports/rankings/ranking_data.json
```

The JSON format enables:

- Historical tracking
- Trend analysis
- Visualization
- Programmatic access

## Success Criteria

A successful ranking demonstrates:

- Clear differentiation of quality levels
- Meaningful insights about quality patterns
- Actionable recommendations for improvement
- Fair and consistent application of criteria
- Transparent reasoning about rankings
- Evidence-based quality assessment

---

**Remember**: Rankings are not judgments of worth - they're tools for learning. Every iteration teaches us something about quality, and rankings help us identify patterns and opportunities for growth.
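To illustrate the programmatic access mentioned under Output Storage, here is a minimal sketch for loading the exported ranking data and checking the compliance gap flagged in the insights. The structure of `ranking_data.json` is assumed to mirror the ranking structure from the ACTION phase (a list of entries with a `scores` object); the export schema is not pinned down above.

```python
import json
from pathlib import Path

def load_rankings(output_dir):
    """Load the exported ranking data (path from Output Storage above)."""
    path = Path(output_dir) / "quality_reports" / "rankings" / "ranking_data.json"
    # Assumed shape: a list of {"iteration": ..., "scores": {...}, "rank": ...}
    return json.loads(path.read_text())

def compliance_gap(rankings):
    """Example trend check: how far mean compliance lags the other dimensions."""
    def mean(dim):
        return sum(ev["scores"][dim] for ev in rankings) / len(rankings)
    return {
        "technical": mean("technical"),
        "creativity": mean("creativity"),
        "compliance": mean("compliance"),
        "gap": max(mean("technical"), mean("creativity")) - mean("compliance"),
    }

# Usage (assuming rankings were exported for output/):
# rankings = load_rankings("output/")
# print(compliance_gap(rankings))
```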