# State Management System

This directory contains run state files for the stateful infinite loop system. Each run generates a persistent state file that enables resumability, deduplication, and consistency validation.

## State File Format

State files are named: `run_YYYYMMDD_HHMMSS.json`

Example: `run_20250310_143022.json`

## State Schema

```json
{
  "run_id": "run_20250310_143022",
  "spec_path": "specs/example_spec.md",
  "output_dir": "outputs",
  "total_count": 10,
  "url_strategy_path": "specs/url_strategy.json",
  "status": "in_progress",
  "created_at": "2025-03-10T14:30:22Z",
  "updated_at": "2025-03-10T14:35:10Z",
  "completed_iterations": 3,
  "failed_iterations": 0,
  "iterations": [
    {
      "number": 1,
      "status": "completed",
      "output_file": "outputs/example_1.html",
      "web_url": "https://example.com/tutorial",
      "started_at": "2025-03-10T14:30:25Z",
      "completed_at": "2025-03-10T14:31:40Z",
      "validation_hash": "abc123def456",
      "metadata": {}
    }
  ],
  "used_urls": [
    "https://example.com/tutorial",
    "https://example.com/advanced"
  ],
  "validation": {
    "last_check": "2025-03-10T14:35:10Z",
    "consistency_score": 1.0,
    "issues": []
  }
}
```

## Field Descriptions

### Top Level

- **run_id**: Unique identifier for this run (timestamp-based)
- **spec_path**: Path to specification file used
- **output_dir**: Directory where output files are generated
- **total_count**: Total iterations requested (number or "infinite")
- **url_strategy_path**: Optional URL strategy file path
- **status**: Current run status (in_progress, paused, completed, failed)
- **created_at**: ISO timestamp of run creation
- **updated_at**: ISO timestamp of last state update
- **completed_iterations**: Count of successfully completed iterations
- **failed_iterations**: Count of failed iterations
- **iterations**: Array of iteration records
- **used_urls**: Array of all web URLs used (for deduplication)
- **validation**: State consistency validation metadata

### Iteration Record

- **number**: Iteration number (1-indexed)
- **status**:
Iteration status (pending, in_progress, completed, failed)
- **output_file**: Path to generated output file
- **web_url**: Web URL used for research (if applicable)
- **started_at**: ISO timestamp of iteration start
- **completed_at**: ISO timestamp of iteration completion
- **validation_hash**: SHA256 hash of output file (first 16 chars)
- **metadata**: Additional iteration-specific metadata

### Validation Record

- **last_check**: ISO timestamp of last validation
- **consistency_score**: Score from 0.0-1.0 based on multi-check validation
- **issues**: Array of detected issues (if any)

## Status Values

### Run Status

- `in_progress`: Run is currently executing
- `paused`: Run was stopped but can be resumed
- `completed`: Run finished successfully
- `failed`: Run encountered an unrecoverable error

### Iteration Status

- `pending`: Iteration not yet started
- `in_progress`: Iteration currently executing
- `completed`: Iteration finished successfully
- `failed`: Iteration encountered an error

## State Operations

### Read State

```python
import json

with open('.claude/state/run_20250310_143022.json', 'r') as f:
    state = json.load(f)
```

### Update State

```python
import json
import os
from datetime import datetime, timezone

# Load state (state_file is the path to the run's state file)
with open(state_file, 'r') as f:
    state = json.load(f)

# Modify state
state['completed_iterations'] += 1
# Use UTC so that the trailing 'Z' is accurate
state['updated_at'] = datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')

# Save atomically: write a temp file, then replace the original
temp_file = f"{state_file}.tmp"
with open(temp_file, 'w') as f:
    json.dump(state, f, indent=2)
os.replace(temp_file, state_file)
```

### Validate State

```python
import glob

# Multiple consistency checks (assumes `state` is already loaded)
validations = []

# Check 1: File count
file_count = len(glob.glob(f"{state['output_dir']}/*"))
validations.append(file_count >= state['completed_iterations'])

# Check 2: Iteration records
iter_count = len([i for i in state['iterations'] if i['status'] == 'completed'])
validations.append(iter_count == state['completed_iterations'])

# Check 3: URL uniqueness
url_count = len(state['used_urls'])
unique_count = len(set(state['used_urls']))
validations.append(url_count == unique_count)

# Majority voting: fraction of checks that passed
consistency_score = sum(validations) / len(validations)
```

## Self-Consistency Validation

The state system applies self-consistency prompting principles:

### Multiple Sampling

State validation samples multiple independent verification approaches:

1. File system count
2. State record count
3. URL uniqueness check
4. File existence verification
5. Timestamp chronology
6. Schema completeness

### Consistency Checks

Each validation approach provides an independent assessment of state validity.

### Majority Voting

The consistency score is the fraction of checks that pass:

- Score = (passed checks) / (total checks)
- Score >= 0.8: State is consistent
- 0.5 <= Score < 0.8: State has warnings
- Score < 0.5: State is corrupted

Because the score aggregates several checks, a single check that errs or hits an edge case does not by itself mark the state as corrupted.

## Recovery Scenarios

### Scenario 1: Graceful Interruption

Run interrupted by user or context limit:

- State saved at last batch completion
- Resume continues from next iteration
- No duplicate iterations or URLs

### Scenario 2: Crash Recovery

Process crashes during execution:

- State reflects last successful batch
- Incomplete iterations not in state
- Resume rebuilds from state safely

### Scenario 3: Manual File Modification

User manually adds/removes output files:

- State validation detects mismatch
- `--rebuild` mode reconstructs state from files
- Consistency checks identify discrepancies

### Scenario 4: State Corruption

State file becomes corrupted:

- `--verify` mode detects low consistency score
- `--rebuild` mode can reconstruct from outputs
- `--delete` mode provides clean start option

## Best Practices

### Do

- Always save state after batch completion
- Use atomic writes (temp file + rename)
- Validate state before critical operations
- Backup state before modifications
- Use run_id for resume operations

### Don't

- Manually edit state files unless
necessary
- Delete state files without backup
- Assume state is always consistent
- Skip validation checks
- Reuse run_id for different specs

## File Management

### Cleanup Old States

```bash
# List old state files
ls -lh .claude/state/run_*.json

# Remove states older than 30 days
find .claude/state -name "run_*.json" -mtime +30 -delete

# Keep only recent 10 runs (rm -f avoids an error when there is nothing to delete)
ls -t .claude/state/run_*.json | tail -n +11 | xargs rm -f
```

### Backup States

```bash
# Backup all states
tar -czf state_backup_$(date +%Y%m%d).tar.gz .claude/state/

# Restore from backup (run from the directory that contains .claude/)
tar -xzf state_backup_20250310.tar.gz
```

### Export State Summary

```bash
# Generate summary of all runs
for f in .claude/state/run_*.json; do
  echo "$(basename "$f"):"
  jq -r '"\(.status): \(.completed_iterations) iterations"' "$f"
done
```

## Troubleshooting

### Problem: State file corrupted

**Solution**: Use `/reset-state --rebuild` to reconstruct from files.

### Problem: Missing state file

**Solution**: State was deleted or never created. Check backups or start new run.

### Problem: Low consistency score

**Solution**: Run `/status` to identify specific failed checks, then fix manually or rebuild.

### Problem: Duplicate URLs

**Solution**: Rebuild state with `/reset-state --rebuild` to re-extract and deduplicate.

### Problem: File count mismatch

**Solution**: Files may have been manually added/removed. Rebuild state or manually reconcile.
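
For the low consistency score and file count mismatch cases, it can help to see which individual checks fail before choosing between a manual fix and a rebuild. The following is a minimal sketch, not the implementation behind `/status`: the function names `run_checks`, `classify`, and `report` are hypothetical, only the three checks from the Validate State example are covered, and the labels follow the score thresholds listed under Majority Voting.

```python
import glob
import os


def run_checks(state):
    """Return a dict mapping each check name to True (passed) or False (failed)."""
    completed = [i for i in state["iterations"] if i["status"] == "completed"]
    # Count only regular files in the output directory
    pattern = os.path.join(state["output_dir"], "*")
    file_count = sum(1 for p in glob.glob(pattern) if os.path.isfile(p))
    urls = state["used_urls"]
    return {
        "file_count": file_count >= state["completed_iterations"],
        "iteration_records": len(completed) == state["completed_iterations"],
        "url_uniqueness": len(urls) == len(set(urls)),
    }


def classify(score):
    """Map a consistency score to a label using the documented thresholds."""
    if score >= 0.8:
        return "consistent"
    if score >= 0.5:
        return "warnings"
    return "corrupted"


def report(state):
    """Return (score, label, names of failed checks) for a loaded state dict."""
    results = run_checks(state)
    score = sum(results.values()) / len(results)
    failed = [name for name, ok in results.items() if not ok]
    return score, classify(score), failed
```

Calling `score, label, failed = report(state)` on a loaded state returns the names of the failing checks, which indicates whether the mismatch lies in the output files, the iteration records, or the URL list.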
## Advanced Usage

### State Migration

When updating state schema:

```python
from datetime import datetime, timezone

# Add migration function
def migrate_v1_to_v2(old_state):
    new_state = old_state.copy()
    # Add new fields with defaults
    new_state['version'] = 2
    new_state['migration_date'] = datetime.now(timezone.utc).isoformat()
    return new_state

# Apply migration (load_state and save_state are helpers assumed to exist)
state = load_state(state_file)
if state.get('version', 1) == 1:
    state = migrate_v1_to_v2(state)
    save_state(state_file, state)
```

### Custom Validation

Add domain-specific checks:

```python
import os

def validate_output_quality(state):
    """Custom validation for output quality"""
    for iteration in state['iterations']:
        if iteration['status'] == 'completed':
            output_file = iteration['output_file']
            # A missing file fails the check instead of raising
            if not os.path.isfile(output_file):
                return False
            # Check file size
            size = os.path.getsize(output_file)
            if size < 1000:  # Less than 1KB
                return False
    return True
```

### Parallel Run Detection

Prevent concurrent runs:

```python
import fcntl  # Unix only
import sys

lock_file = '.claude/state/.lock'

def acquire_lock():
    # Keep the returned file object open for the whole run;
    # closing it releases the lock
    lock = open(lock_file, 'w')
    try:
        fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return lock
    except OSError:
        lock.close()
        print("Another run is in progress")
        sys.exit(1)
```

## Related Commands

- `/infinite-stateful`: Main command with state management
- `/status`: View state and validate consistency
- `/resume`: Resume interrupted run
- `/reset-state`: Reset or rebuild state

## State Philosophy

The state management system follows these principles:

1. **Single Source of Truth**: State file is authoritative
2. **Self-Consistency**: Multiple validation approaches with majority voting
3. **Resumability**: Any interruption can be recovered
4. **Atomicity**: State updates are atomic (temp + rename)
5. **Auditability**: All state changes are timestamped
6. **Durability**: State persists across sessions
7. **Deduplication**: Used resources tracked to prevent duplicates
8. **Validation**: Continuous consistency checking

These principles are intended to keep runs recoverable after interruptions, crashes, and manual file changes.
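
The atomicity and auditability principles can be illustrated with a pair of helpers. This is a minimal sketch under stated assumptions: `load_state` and `save_state` are the names used in the State Migration example, but the bodies below are hypothetical and may differ from what the commands actually do. The sketch stamps `updated_at` in UTC on every save and writes through a temp file in the same directory as the state file so that the final replace stays on one filesystem.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def load_state(state_file):
    """Read a state file and return it as a dict."""
    with open(state_file, "r", encoding="utf-8") as f:
        return json.load(f)


def save_state(state_file, state):
    """Write state atomically: temp file in the same directory, then replace."""
    # Auditability: every save records when it happened (UTC)
    state["updated_at"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    directory = os.path.dirname(os.path.abspath(state_file))
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before the replace
        # Atomic when both paths are on the same filesystem
        os.replace(temp_path, state_file)
    except BaseException:
        # Remove the temp file if anything failed before the replace
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
```

A reader that opens the state file during a save sees either the old contents or the new contents, never a partially written file, which is the property the resume and crash-recovery scenarios rely on.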