# State Management System

This directory contains run state files for the stateful infinite loop system. Each run generates a persistent state file that enables resumability, deduplication, and consistency validation.

## State File Format

State files are named: `run_YYYYMMDD_HHMMSS.json`

Example: `run_20250310_143022.json`
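
As an illustration of the naming scheme, the following minimal sketch derives a run ID and file name from a timestamp. The helper names `make_run_id` and `state_filename` are hypothetical, and the use of UTC is an assumption based on the example above, where the file name matches the `Z`-suffixed `created_at` value.

```python
from datetime import datetime, timezone

def make_run_id(now=None):
    """Return a run ID such as 'run_20250310_143022' (hypothetical helper)."""
    # Assumption: run IDs are derived from the UTC creation time
    now = now or datetime.now(timezone.utc)
    return now.strftime("run_%Y%m%d_%H%M%S")

def state_filename(run_id):
    """Return the state file name for a run ID (hypothetical helper)."""
    return f"{run_id}.json"

example = make_run_id(datetime(2025, 3, 10, 14, 30, 22, tzinfo=timezone.utc))
print(state_filename(example))  # run_20250310_143022.json
```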

## State Schema

```json
{
  "run_id": "run_20250310_143022",
  "spec_path": "specs/example_spec.md",
  "output_dir": "outputs",
  "total_count": 10,
  "url_strategy_path": "specs/url_strategy.json",
  "status": "in_progress",
  "created_at": "2025-03-10T14:30:22Z",
  "updated_at": "2025-03-10T14:35:10Z",
  "completed_iterations": 3,
  "failed_iterations": 0,
  "iterations": [
    {
      "number": 1,
      "status": "completed",
      "output_file": "outputs/example_1.html",
      "web_url": "https://example.com/tutorial",
      "started_at": "2025-03-10T14:30:25Z",
      "completed_at": "2025-03-10T14:31:40Z",
      "validation_hash": "abc123def4567890",
      "metadata": {}
    }
  ],
  "used_urls": [
    "https://example.com/tutorial",
    "https://example.com/advanced"
  ],
  "validation": {
    "last_check": "2025-03-10T14:35:10Z",
    "consistency_score": 1.0,
    "issues": []
  }
}
```

## Field Descriptions

### Top Level

- **run_id**: Unique identifier for this run (timestamp-based)
- **spec_path**: Path to specification file used
- **output_dir**: Directory where output files are generated
- **total_count**: Total iterations requested (number or "infinite")
- **url_strategy_path**: Optional URL strategy file path
- **status**: Current run status (in_progress, paused, completed, failed)
- **created_at**: ISO 8601 timestamp (UTC) of run creation
- **updated_at**: ISO 8601 timestamp (UTC) of last state update
- **completed_iterations**: Count of successfully completed iterations
- **failed_iterations**: Count of failed iterations
- **iterations**: Array of iteration records
- **used_urls**: Array of all web URLs used (for deduplication)
- **validation**: State consistency validation metadata

### Iteration Record

- **number**: Iteration number (1-indexed)
- **status**: Iteration status (pending, in_progress, completed, failed)
- **output_file**: Path to generated output file
- **web_url**: Web URL used for research (if applicable)
- **started_at**: ISO 8601 timestamp (UTC) of iteration start
- **completed_at**: ISO 8601 timestamp (UTC) of iteration completion
- **validation_hash**: SHA256 hash of output file (first 16 chars)
- **metadata**: Additional iteration-specific metadata
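
The `validation_hash` field described above could be computed roughly as follows. This is a sketch rather than the system's actual implementation; the function name `validation_hash` and the chunked read are assumptions.

```python
import hashlib

def validation_hash(path, length=16):
    """Return the first `length` hex characters of the file's SHA-256 digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large output files are not loaded into memory at once
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()[:length]
```

Comparing a stored hash with a freshly computed one is a cheap way to notice that an output file changed after its iteration was recorded.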

### Validation Record

- **last_check**: ISO 8601 timestamp (UTC) of last validation
- **consistency_score**: Score from 0.0-1.0 based on multi-check validation
- **issues**: Array of detected issues (if any)

## Status Values

### Run Status

- `in_progress`: Run is currently executing
- `paused`: Run was stopped but can be resumed
- `completed`: Run finished successfully
- `failed`: Run encountered unrecoverable error

### Iteration Status

- `pending`: Iteration not yet started
- `in_progress`: Iteration currently executing
- `completed`: Iteration finished successfully
- `failed`: Iteration encountered error
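
Since the top-level counters summarize the per-iteration status values, they can be recomputed from the iteration records. The sketch below assumes the schema shown earlier; the helper name `recount` is hypothetical.

```python
from collections import Counter

def recount(state):
    """Recompute the top-level counters from the iteration records."""
    counts = Counter(i["status"] for i in state["iterations"])
    # Counter returns 0 for statuses that do not occur
    state["completed_iterations"] = counts["completed"]
    state["failed_iterations"] = counts["failed"]
    return state
```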

## State Operations

### Read State

```python
import json

with open('.claude/state/run_20250310_143022.json', 'r') as f:
    state = json.load(f)
```

### Update State

```python
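import json
import os
from datetime import datetime, timezone

state_file = '.claude/state/run_20250310_143022.json'

# Load state
with open(state_file, 'r') as f:
    state = json.load(f)

# Modify state
state['completed_iterations'] += 1
# Use an explicit UTC time; appending 'Z' to local time would mislabel it
state['updated_at'] = datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')

# Save atomically: write a temp file, then replace the original in one step
temp_file = f"{state_file}.tmp"
with open(temp_file, 'w') as f:
    json.dump(state, f, indent=2)
os.replace(temp_file, state_file)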
```
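
The migration example under "Advanced Usage" calls `load_state` and `save_state` without defining them. One possible shape for these helpers is sketched below, assuming they simply wrap the read and atomic-write steps; the use of `tempfile.mkstemp` and `os.fsync` is a design choice of this sketch, not something the system is known to do.

```python
import json
import os
import tempfile

def load_state(state_file):
    """Read a state file and return it as a dict."""
    with open(state_file, "r", encoding="utf-8") as f:
        return json.load(f)

def save_state(state_file, state):
    """Write state to a temp file in the same directory, then replace the target."""
    directory = os.path.dirname(os.path.abspath(state_file))
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before the rename
        os.replace(temp_path, state_file)
    except BaseException:
        # Do not leave a stray temp file behind if the write fails
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
```

Creating the temp file in the same directory as the target keeps the final `os.replace` on a single filesystem, which is what makes the swap atomic.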

### Validate State

```python
import glob

# Assumes `state` has already been loaded (see "Read State")
# Multiple consistency checks
validations = []

# Check 1: File count
file_count = len(glob.glob(f"{state['output_dir']}/*"))
validations.append(file_count >= state['completed_iterations'])

# Check 2: Iteration records
iter_count = len([i for i in state['iterations'] if i['status'] == 'completed'])
validations.append(iter_count == state['completed_iterations'])

# Check 3: URL uniqueness
url_count = len(state['used_urls'])
unique_count = len(set(state['used_urls']))
validations.append(url_count == unique_count)

# Majority voting
consistency_score = sum(validations) / len(validations)
```

## Self-Consistency Validation

The state system applies self-consistency prompting principles:

### Multiple Sampling

State validation samples multiple independent verification approaches:
1. File system count
2. State record count
3. URL uniqueness check
4. File existence verification
5. Timestamp chronology
6. Schema completeness
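
The "Validate State" snippet covers the first three approaches. Approaches 4 to 6 might look like the sketch below, assuming the schema shown earlier. The function names, the `REQUIRED_KEYS` set, and the decision to treat `url_strategy_path` as optional are assumptions of this sketch.

```python
import os
from datetime import datetime

# Assumed required top-level keys; url_strategy_path is documented as optional
REQUIRED_KEYS = {
    "run_id", "spec_path", "output_dir", "total_count", "status",
    "created_at", "updated_at", "completed_iterations",
    "failed_iterations", "iterations", "used_urls", "validation",
}

def parse_ts(value):
    """Parse a 'Z'-suffixed ISO 8601 timestamp into an aware datetime."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

def check_files_exist(state):
    """Check 4: every completed iteration points at an existing file."""
    return all(
        os.path.exists(i["output_file"])
        for i in state["iterations"]
        if i["status"] == "completed"
    )

def check_chronology(state):
    """Check 5: no iteration ends before it starts, and the run is not updated before creation."""
    for i in state["iterations"]:
        if i.get("started_at") and i.get("completed_at"):
            if parse_ts(i["started_at"]) > parse_ts(i["completed_at"]):
                return False
    return parse_ts(state["created_at"]) <= parse_ts(state["updated_at"])

def check_schema(state):
    """Check 6: all required top-level keys are present."""
    return REQUIRED_KEYS.issubset(state)
```

Each function returns a boolean, so the results can be appended to the same `validations` list used for the first three checks.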

### Consistency Checks

Each validation approach provides an independent assessment of state validity.

### Majority Voting

The consistency score is computed via majority voting, as the fraction of checks that pass:
- Score = (passed checks) / (total checks)
- Score >= 0.8: State is consistent
- 0.5 <= Score < 0.8: State has warnings
- Score < 0.5: State is corrupted

This approach ensures reliable state validation even if individual checks have errors or edge cases.
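
The score bands above can be expressed as a small function. The name `classify_consistency` and the returned labels are illustrative only.

```python
def classify_consistency(score):
    """Map a consistency score in [0.0, 1.0] to one of the documented bands."""
    if score >= 0.8:
        return "consistent"
    if score >= 0.5:
        return "warnings"
    return "corrupted"

# With six checks: 5 passing is consistent, 4 gives warnings, 2 is corrupted
for passed in (5, 4, 2):
    print(passed, classify_consistency(passed / 6))
```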

## Recovery Scenarios

### Scenario 1: Graceful Interruption

Run interrupted by user or context limit:
- State saved at last batch completion
- Resume continues from next iteration
- No duplicate iterations or URLs
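
A resume step along these lines could pick the next iteration number and reject URLs that were already used. This is a sketch under the assumption that resuming continues after the highest completed iteration; both helper names are hypothetical.

```python
def next_iteration_number(state):
    """Return the number of the iteration to run next (1 for a fresh run)."""
    completed = [
        i["number"] for i in state["iterations"] if i["status"] == "completed"
    ]
    return max(completed, default=0) + 1

def is_new_url(state, url):
    """Return True if the URL has not been used by an earlier iteration."""
    return url not in set(state["used_urls"])
```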

### Scenario 2: Crash Recovery

Process crashes during execution:
- State reflects last successful batch
- Incomplete iterations not in state
- Resume rebuilds from state safely

### Scenario 3: Manual File Modification

User manually adds/removes output files:
- State validation detects mismatch
- `--rebuild` mode reconstructs state from files
- Consistency checks identify discrepancies
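
How `--rebuild` reconstructs state is not specified here. As a rough illustration only, the sketch below rebuilds iteration records from the files in the output directory. It assumes that file modification time reflects iteration order and that the source URL cannot be recovered from the file name; the function name `rebuild_iterations` is hypothetical.

```python
import hashlib
import os

def rebuild_iterations(output_dir):
    """Build iteration records from the regular files found in output_dir."""
    paths = [os.path.join(output_dir, name) for name in os.listdir(output_dir)]
    # Assumption: older files belong to earlier iterations; path breaks ties
    files = sorted(
        (p for p in paths if os.path.isfile(p)),
        key=lambda p: (os.path.getmtime(p), p),
    )
    records = []
    for number, path in enumerate(files, start=1):
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()[:16]
        records.append({
            "number": number,
            "status": "completed",
            "output_file": path,
            "web_url": None,  # unknown: not recoverable from the file name
            "validation_hash": digest,
            "metadata": {},
        })
    return records
```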

### Scenario 4: State Corruption

State file becomes corrupted:
- `--verify` mode detects low consistency score
- `--rebuild` mode can reconstruct from outputs
- `--delete` mode provides clean start option

## Best Practices

### Do

- Always save state after batch completion
- Use atomic writes (temp file + rename)
- Validate state before critical operations
- Backup state before modifications
- Use run_id for resume operations

### Don't

- Manually edit state files unless necessary
- Delete state files without backup
- Assume state is always consistent
- Skip validation checks
- Reuse run_id for different specs

## File Management

### Cleanup Old States

```bash
# List old state files
ls -lh .claude/state/run_*.json

# Remove states older than 30 days
find .claude/state -name "run_*.json" -mtime +30 -delete

# Keep only the 10 most recent runs (rm -f avoids an error when nothing needs deleting)
ls -t .claude/state/run_*.json | tail -n +11 | xargs rm -f
```

### Backup States

```bash
# Backup all states
tar -czf state_backup_$(date +%Y%m%d).tar.gz .claude/state/

# Restore from backup
tar -xzf state_backup_20250310.tar.gz
```

### Export State Summary

```bash
# Generate summary of all runs
for f in .claude/state/run_*.json; do
  echo "$(basename "$f"):"
  jq -r '"\(.status): \(.completed_iterations) iterations"' "$f"
done
```

## Troubleshooting

### Problem: State file corrupted

**Solution**: Use `/reset-state <run_id> --rebuild` to reconstruct from files

### Problem: Missing state file

**Solution**: State was deleted or never created. Check backups or start new run.

### Problem: Low consistency score

**Solution**: Run `/status <run_id>` to identify specific failed checks, then fix manually or rebuild.

### Problem: Duplicate URLs

**Solution**: Rebuild state with `/reset-state <run_id> --rebuild` to re-extract and deduplicate.

### Problem: File count mismatch

**Solution**: Files may have been manually added/removed. Rebuild state or manually reconcile.

## Advanced Usage

### State Migration

When updating state schema:

```python
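import copy
from datetime import datetime, timezone

# Add migration function
def migrate_v1_to_v2(old_state):
    # Deep copy so nested lists and dicts are not shared with the old state
    new_state = copy.deepcopy(old_state)
    # Add new fields with defaults
    new_state['version'] = 2
    new_state['migration_date'] = datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')
    return new_state

# Apply migration
# load_state and save_state are assumed helpers that read and atomically
# write the JSON state file; state_file is the path to that file
state = load_state(state_file)
if state.get('version', 1) == 1:
    state = migrate_v1_to_v2(state)
    save_state(state_file, state)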
```

### Custom Validation

Add domain-specific checks:

```python
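import os

def validate_output_quality(state):
    """Custom validation: each completed iteration has an output file of at least 1000 bytes."""
    for iteration in state['iterations']:
        if iteration['status'] == 'completed':
            output_file = iteration['output_file']
            # A missing file fails the check instead of raising an exception
            if not os.path.isfile(output_file):
                return False
            # Check file size
            if os.path.getsize(output_file) < 1000:  # Less than ~1 KB
                return False
    return True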
```

### Parallel Run Detection

Prevent concurrent runs:

```python
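import fcntl  # Unix only
import sys

lock_file = '.claude/state/.lock'

def acquire_lock():
    # Keep the returned file object open for the whole run;
    # closing it (or exiting the process) releases the lock
    lock = open(lock_file, 'w')
    try:
        fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return lock
    except OSError:
        lock.close()
        print("Another run is in progress")
        sys.exit(1)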
```

## Related Commands

- `/infinite-stateful`: Main command with state management
- `/status`: View state and validate consistency
- `/resume`: Resume interrupted run
- `/reset-state`: Reset or rebuild state

## State Philosophy

The state management system follows these principles:

1. **Single Source of Truth**: State file is authoritative
2. **Self-Consistency**: Multiple validation approaches with majority voting
3. **Resumability**: Any interruption can be recovered
4. **Atomicity**: State updates are atomic (temp + rename)
5. **Auditability**: All state changes are timestamped
6. **Durability**: State persists across sessions
7. **Deduplication**: Used resources tracked to prevent duplicates
8. **Validation**: Continuous consistency checking

These principles ensure reliable operation even in adverse conditions.