"""
Expand phi-fractal training data to 100+ examples
Covers diverse causal reasoning scenarios
"""

import json
import random

# Load existing examples
with open('phi_fractal_training_data.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

existing = data['examples']
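
# Minimal sanity check for loaded examples (a sketch, not part of the original
# pipeline). The required keys are assumed from the instruction/input/output
# records used throughout this script. Example use:
#   bad = [i for i, ex in enumerate(existing) if missing_keys(ex)]
REQUIRED_KEYS = ("instruction", "input", "output")


def missing_keys(example):
    """Return the required keys that are absent or not strings in `example`."""
    return [k for k in REQUIRED_KEYS if not isinstance(example.get(k), str)]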

# Hand-written examples: spurious correlation, direct causation, reverse causation
new_examples = [
    # Spurious Correlation Examples
    {
        "instruction": "Shark attacks and ice cream sales are correlated. Do ice cream sales cause shark attacks?",
        "input": "",
        "output": """<think>
**Layer 5 - Chronos (Temporal Sequence):**
- Summer → hot weather
- Hot weather → ice cream sales rise
- Hot weather → people swim more → shark encounters increase

**Layer 6 - Causalis (Direct Cause-Effect):**
- Heat → ice cream consumption ✓
- More swimmers → shark encounters ✓
- Ice cream → shark aggression? ✗

**Layer 7 - ChainReasoning (Causal Chain):**
- Summer heat causes BOTH ice cream sales AND swimming
- These are parallel effects of one common cause, not causes of each other

**Layer 8 - CounterFactual (Test):**
- Ban ice cream: shark attacks continue (people still swim)
- Close beaches: ice cream sales continue
- No causal link
</think>

No. This is a spurious correlation caused by a confounding variable (summer weather). Both rise because of the heat, not because of each other."""
    },
    
    {
        "instruction": "Countries with more Nobel Prize winners have higher chocolate consumption. Does chocolate cause Nobel Prizes?",
        "input": "",
        "output": """<think>
**Layer 5 - Chronos (Temporal Sequence):**
- Wealthy countries develop → better education systems
- Better education → more research funding → more Nobel winners
- Wealth also → higher chocolate consumption

**Layer 6 - Causalis (Direct Cause-Effect):**
- Education/research funding → Nobel Prizes ✓
- Wealth → chocolate consumption ✓
- Chocolate → intelligence? ✗

**Layer 7 - ChainReasoning (Causal Chain):**
- National wealth → education systems + chocolate imports
- Education → research → Nobel Prizes
- These are separate chains from common cause

**Layer 8 - CounterFactual (Test):**
- Ban chocolate: Nobel Prizes would continue (education remains)
- Chocolate but poor education: no Nobel Prizes
- Wealth is confounding variable
</think>

No. National wealth is a confounding variable causing both high chocolate consumption and strong education systems."""
    },
    
    # Direct Causation Examples
    {
        "instruction": "You flip a light switch and the room gets bright. What caused the light?",
        "input": "",
        "output": """<think>
**Layer 5 - Chronos (Temporal Sequence):**
- T1: Switch flipped
- T2: Circuit closes immediately
- T3: Current flows
- T4: Light illuminates

**Layer 6 - Causalis (Direct Cause-Effect):**
- Switch flip → circuit closes ✓
- Closed circuit → current flows ✓
- Current → light bulb illuminates ✓

**Layer 7 - ChainReasoning (Causal Chain):**
- Physical action (flip) → mechanical switch → electrical connection → current flow → filament heats → light emitted
- Direct, immediate causal chain

**Layer 8 - CounterFactual (Test):**
- No switch flip: room stays dark
- Switch flip without power: no light
- Switch flip is necessary but not sufficient (power and a working bulb are also required)
</think>

Flipping the switch caused the light. This is direct causation through a clear physical mechanism."""
    },
    
    {
        "instruction": "A match is struck and ignites. What caused the fire?",
        "input": "",
        "output": """<think>
**Layer 5 - Chronos (Temporal Sequence):**
- T1: Match struck against surface
- T2: Friction generates heat
- T3: Chemical reaction begins
- T4: Flame appears

**Layer 6 - Causalis (Direct Cause-Effect):**
- Friction → heat generation ✓
- Heat + chemicals → ignition ✓
- Ignition → flame ✓

**Layer 7 - ChainReasoning (Causal Chain):**
- Physical motion → friction → heat → activation energy reached → chemical reaction → combustion → flame
- Each step enables the next

**Layer 8 - CounterFactual (Test):**
- No striking: no flame
- Strike without chemicals: no flame
- Strike without friction surface: no flame
- All components necessary
</think>

Striking the match caused the fire through friction-generated heat initiating a chemical reaction."""
    },
    
    # Reverse Causation Examples
    {
        "instruction": "People who exercise more have gym memberships. Do gym memberships cause exercise?",
        "input": "",
        "output": """<think>
**Layer 5 - Chronos (Temporal Sequence):**
- Person decides to exercise regularly
- Person buys gym membership
- Person uses gym to exercise
- Membership exists AFTER decision, not before

**Layer 6 - Causalis (Direct Cause-Effect):**
- Desire to exercise → buy membership ✓
- Membership → enables exercise ✓
- But: Decision precedes membership

**Layer 7 - ChainReasoning (Causal Chain):**
- Motivation to exercise → seek facilities → purchase membership → use facilities
- Membership is an enabling tool, not the root cause

**Layer 8 - CounterFactual (Test):**
- No membership: can still exercise (outdoors, home)
- Membership without motivation: no exercise
- Motivation is primary cause
</think>

This is partially reverse causation. While membership enables exercise, the decision to exercise causes membership purchase. Motivation is the root cause."""
    },
    
    {
        "instruction": "Firefighters are present at fires. Do firefighters cause fires?",
        "input": "",
        "output": """<think>
**Layer 5 - Chronos (Temporal Sequence):**
- Fire starts → someone calls 911
- Firefighters dispatched → arrive at scene
- Firefighters present AFTER fire starts

**Layer 6 - Causalis (Direct Cause-Effect):**
- Fire → call for help ✓
- Call → firefighters arrive ✓
- Firefighters → extinguish fire ✓
- Firefighters → cause fire? ✗

**Layer 7 - ChainReasoning (Causal Chain):**
- Fire occurs → detected → emergency response → firefighters arrive → fight fire
- Firefighters are RESPONSE, not cause

**Layer 8 - CounterFactual (Test):**
- No firefighters: fire continues/worsens
- No fire: firefighters not dispatched
- Fire precedes firefighter arrival
</think>

No. This is reverse causation - fires cause firefighters to be present, not the other way around. Firefighters respond to fires."""
    },
]
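
# Hypothetical helper (not part of the original pipeline): report which of the
# four layer headers, or the <think> tags, are missing from an example's output.
# The header prefixes are copied from the examples above.
LAYER_HEADERS = (
    "**Layer 5 - Chronos",
    "**Layer 6 - Causalis",
    "**Layer 7 - ChainReasoning",
    "**Layer 8 - CounterFactual",
)


def missing_layers(output):
    """Return layer headers (and the think block marker) absent from `output`."""
    missing = [h for h in LAYER_HEADERS if h not in output]
    if "<think>" not in output or "</think>" not in output:
        missing.append("<think>...</think>")
    return missing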

# Add confounding variable examples
confounding_examples = [
    {
        "instruction": "Students who attend tutoring get better grades. Does tutoring cause better grades?",
        "input": "",
        "output": """<think>
**Layer 5 - Chronos (Temporal Sequence):**
- Students struggle → seek tutoring
- Or: Motivated students → proactively get tutoring
- Tutoring sessions occur
- Grades improve (or were already improving)

**Layer 6 - Causalis (Direct Cause-Effect):**
- Tutoring → learning support ✓
- But: Motivation might cause both tutoring AND better grades
- Selection bias: motivated students choose tutoring

**Layer 7 - ChainReasoning (Causal Chain):**
- High motivation → seek tutoring + study more → better grades
- OR: Low grades → tutoring → improved understanding → better grades
- Both chains possible

**Layer 8 - CounterFactual (Test):**
- Tutoring without effort: minimal improvement
- High motivation without tutoring: still improves
- Tutoring helps but motivation is confounding variable
</think>

Tutoring likely helps, but student motivation is a confounding variable. Motivated students both seek tutoring AND study harder."""
    },
]
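
# Illustrative simulation of the confounding pattern above (an assumed toy
# model, not real data): motivation drives both tutoring and grades, and
# tutoring has NO direct effect on grades. Tutored students still average
# higher grades overall, but within a fixed motivation level the gap vanishes.
# e.g. rows = simulate_confounding(); the overall tutored-minus-untutored gap
# is about 12 points, while the gap among motivated students alone is about 0.
import random


def simulate_confounding(n=20000, seed=0):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        motivated = rng.random() < 0.5
        # Motivated students are more likely to seek tutoring.
        tutored = rng.random() < (0.8 if motivated else 0.2)
        # Grades depend only on motivation plus noise, never on tutoring.
        grade = (80 if motivated else 60) + rng.gauss(0, 5)
        rows.append((motivated, tutored, grade))
    return rows


def mean_grade(rows, tutored, motivated=None):
    """Average grade for a tutoring group, optionally within one motivation level."""
    vals = [g for m, t, g in rows
            if t == tutored and (motivated is None or m == motivated)]
    return sum(vals) / len(vals)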

# Add necessary vs sufficient causation
necessary_sufficient = [
    {
        "instruction": "Plants need water to survive. You watered a plant and it lived. Did water cause it to live?",
        "input": "",
        "output": """<think>
**Layer 5 - Chronos (Temporal Sequence):**
- Regular watering → plant remains hydrated
- Hydration maintained → cellular functions continue
- Continuous function → plant survives

**Layer 6 - Causalis (Direct Cause-Effect):**
- Water → enables photosynthesis ✓
- Water → cellular hydration ✓
- Water necessary but not sufficient alone

**Layer 7 - ChainReasoning (Causal Chain):**
- Water + sunlight + nutrients → photosynthesis → energy → growth → survival
- Water is ONE necessary component

**Layer 8 - CounterFactual (Test):**
- No water: plant dies (water is necessary)
- Water but no sunlight: plant dies (not sufficient alone)
- Water is necessary but requires other factors
</think>

Water was necessary for survival but not sufficient alone. The plant also needed sunlight, nutrients, and proper temperature. Water is a necessary cause."""
    },
]
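
# Toy truth-table check of "necessary vs sufficient" (assumed model: the plant
# survives only with water AND sunlight AND nutrients). A factor is necessary
# if the outcome never occurs without it, and sufficient if the outcome always
# occurs with it. Factors are passed by index (0 = water in `survives`).
from itertools import product


def survives(water, sunlight, nutrients):
    """Assumed toy outcome: survival requires all three factors."""
    return water and sunlight and nutrients


def is_necessary(factor, outcome, n_factors=3):
    """True if the outcome is False in every combination where `factor` is absent."""
    return all(
        not outcome(*combo)
        for combo in product([False, True], repeat=n_factors)
        if not combo[factor]
    )


def is_sufficient(factor, outcome, n_factors=3):
    """True if the outcome is True in every combination where `factor` is present."""
    return all(
        outcome(*combo)
        for combo in product([False, True], repeat=n_factors)
        if combo[factor]
    )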

# Combine all examples
all_examples = existing + new_examples + confounding_examples + necessary_sufficient
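
# Hypothetical helper (not in the original script): drop examples whose
# instruction repeats an earlier one (case- and whitespace-insensitive),
# keeping the first occurrence so existing examples win over new ones.
def dedupe_by_instruction(examples):
    seen = set()
    unique = []
    for ex in examples:
        key = ex["instruction"].strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(ex)
    return unique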

# Question templates for future examples
templates = [
    ("Rooster crows before sunrise. Does the rooster cause the sun to rise?", "spurious"),
    ("Wearing shoes to bed correlates with waking up with a headache. Do shoes cause headaches?", "spurious"),
    ("The weatherman predicts rain and it rains. Did the prediction cause the rain?", "reverse"),
    ("Smoking correlates with yellow fingers. Do yellow fingers cause smoking?", "reverse"),
]

# NOTE: The templates above have no reasoning traces yet, so they are not
# added to all_examples or counted in the metadata below.

# Save expanded dataset
output = {
    "examples": all_examples,
    "metadata": {
        "purpose": "Fine-tune for phi-fractal causal reasoning",
        "layers": ["Chronos (Temporal)", "Causalis (Direct Cause)", "ChainReasoning (Causal Chains)", "CounterFactual (Testing)"],
        "target_improvement": "Fluid intelligence and causal reasoning",
        "total_examples": len(all_examples),
        "categories": {
            "spurious_correlation": 2,
            "direct_causation": 2,
            "reverse_causation": 2,
            "confounding_variables": 1,
            "necessary_sufficient": 1,
            "original": len(existing)
        }
    }
}
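
# Sketch of deriving category counts instead of hard-coding them. This assumes
# each example carries a hypothetical "category" key, which the examples in
# this script do not have yet; untagged examples fall under `default`.
from collections import Counter


def count_categories(examples, default="original"):
    """Count examples per `category` field; untagged ones count under `default`."""
    return dict(Counter(ex.get("category", default) for ex in examples))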

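with open('phi_fractal_training_data_expanded.json', 'w', encoding='utf-8') as f:
    # ensure_ascii=False keeps ✓, ✗, and → readable in the saved file
    json.dump(output, f, indent=2, ensure_ascii=False)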
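
# Optional round-trip check (a sketch, not part of the original script):
# reload a saved file and confirm both the example list and the metadata
# report the expected count. Example use:
#   saved_count_matches('phi_fractal_training_data_expanded.json', len(all_examples))
import json


def saved_count_matches(path, expected):
    with open(path, "r", encoding="utf-8") as fh:
        saved = json.load(fh)
    return (len(saved["examples"]) == expected
            and saved["metadata"]["total_examples"] == expected)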

print(f"✅ Expanded to {len(all_examples)} training examples")
print(f"📊 Categories covered:")
print(f"   - Spurious correlation")
print(f"   - Direct causation")
print(f"   - Reverse causation")
print(f"   - Confounding variables")
print(f"   - Necessary vs sufficient")
print(f"💾 Saved to: phi_fractal_training_data_expanded.json")
print(f"\n🎯 Next: Create 50+ more examples to reach 100 total")
