skill by blueman82

config-schema-migrator

Expert at evolving Pydantic configuration schemas with backward compatibility and automated migrations

Installs: 0
Used in: 1 repos
Updated: 2d ago
$ npx ai-builder add skill blueman82/config-schema-migrator

Installs to .claude/skills/config-schema-migrator/

# Config Schema Migrator Skill

## When to Use This Skill

Activate this skill when you need to:
- Evolve Pydantic configuration schemas (add fields, change types, restructure sections)
- Maintain backward compatibility with existing config files
- Write migration scripts to automate config updates
- Implement environment variable substitution in config fields
- Add deprecation warnings for old config patterns
- Validate config schemas with field validators
- Create discriminated unions for config type discrimination

## Key Principles

1. **Backward Compatibility First**: Never break existing configs without a migration path
2. **Deprecation Before Removal**: Warn users before removing old config sections
3. **Automatic Migration**: Provide scripts to automate config updates (don't force manual editing)
4. **Type Safety**: Use Pydantic validators to catch config errors at load time
5. **Environment Variables**: Support `${VAR_NAME}` substitution for secrets
6. **Clear Messaging**: Provide helpful error messages with migration instructions

## Pattern 1: Adding New Config Sections with Backward Compatibility

### Example: Adding `adapters` Section While Keeping `cli_tools`

**Problem**: You need to add a new config section (`adapters`) to replace an old one (`cli_tools`) without breaking existing configs.

**Solution Pattern** (from `models/config.py`):

```python
from typing import Optional
from pydantic import BaseModel
import warnings

class Config(BaseModel):
    """Root configuration model."""

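    # New section (preferred). AdapterConfig is the discriminated union shown in Pattern 2.
    adapters: Optional[dict[str, AdapterConfig]] = None

    # Legacy section (deprecated). CLIToolConfig is the legacy model, not shown here.
    cli_tools: Optional[dict[str, CLIToolConfig]] = None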

    def model_post_init(self, __context):
        """Post-initialization validation."""
        # Ensure at least one section exists
        if self.adapters is None and self.cli_tools is None:
            raise ValueError(
                "Configuration must include either 'adapters' or 'cli_tools' section"
            )

        # Emit deprecation warning for old section
        if self.cli_tools is not None and self.adapters is None:
            warnings.warn(
                "The 'cli_tools' configuration section is deprecated. "
                "Please migrate to 'adapters' section with explicit 'type' field. "
                "See migration guide: docs/migration/cli_tools_to_adapters.md",
                DeprecationWarning,
                stacklevel=2,
            )
```

**Key Techniques**:
- Use `Optional` for both old and new sections
- Validate in `model_post_init()` that at least one exists
- Emit `DeprecationWarning` when only the old section is present
- Reference migration documentation in warning message
- Allow both sections temporarily for gradual migration
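
To check that the pattern behaves as described, the sketch below loads a legacy-only config and records the warning it produces. `NewSection` and `LegacySection` are hypothetical stand-ins for `AdapterConfig` and `CLIToolConfig`, and `load_and_collect_warnings` is an illustrative helper, not part of the project.

```python
import warnings
from typing import Optional

from pydantic import BaseModel


class NewSection(BaseModel):
    """Hypothetical stand-in for AdapterConfig."""
    type: str
    command: str


class LegacySection(BaseModel):
    """Hypothetical stand-in for CLIToolConfig."""
    command: str


class Config(BaseModel):
    adapters: Optional[dict[str, NewSection]] = None
    cli_tools: Optional[dict[str, LegacySection]] = None

    def model_post_init(self, __context) -> None:
        # Reject configs that have neither section
        if self.adapters is None and self.cli_tools is None:
            raise ValueError(
                "Configuration must include either 'adapters' or 'cli_tools' section"
            )
        # Warn only when the legacy section is used on its own
        if self.cli_tools is not None and self.adapters is None:
            warnings.warn(
                "The 'cli_tools' configuration section is deprecated.",
                DeprecationWarning,
                stacklevel=2,
            )


def load_and_collect_warnings(raw: dict) -> tuple[Config, list[str]]:
    """Build a Config and return it with any deprecation messages emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record warnings Python would normally hide
        config = Config(**raw)
    messages = [
        str(w.message) for w in caught if issubclass(w.category, DeprecationWarning)
    ]
    return config, messages


_, legacy_messages = load_and_collect_warnings(
    {"cli_tools": {"claude": {"command": "claude"}}}
)
print(legacy_messages)
```

Python's default warning filters hide `DeprecationWarning` unless it is triggered from `__main__`, so users may need to run with `python -W default` (or the application may need to log the message itself) for the warning to be visible.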

## Pattern 2: Type Discrimination with Discriminated Unions

### Example: CLI vs HTTP Adapters

**Problem**: You have config objects that can be one of several types (CLI adapter, HTTP adapter, etc).

**Solution Pattern** (from `models/config.py`):

```python
from typing import Annotated, Literal, Optional, Union
from pydantic import BaseModel, Field

class CLIAdapterConfig(BaseModel):
    """Configuration for CLI-based adapter."""
    type: Literal["cli"] = "cli"
    command: str
    args: list[str]
    timeout: int = 60

class HTTPAdapterConfig(BaseModel):
    """Configuration for HTTP-based adapter."""
    type: Literal["http"] = "http"
    base_url: str
    api_key: Optional[str] = None
    timeout: int = 60
    max_retries: int = 3

# Discriminated union - Pydantic uses 'type' field to determine which model
AdapterConfig = Annotated[
    Union[CLIAdapterConfig, HTTPAdapterConfig],
    Field(discriminator="type")
]
```

**YAML Usage**:
```yaml
adapters:
  claude:
    type: cli  # Discriminator field
    command: "claude"
    args: ["-p", "{prompt}"]
    timeout: 60

  ollama:
    type: http  # Different type triggers HTTPAdapterConfig
    base_url: "http://localhost:11434"
    timeout: 120
```

**Key Techniques**:
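- Use `Literal["value"]` for the discriminator field; the default only applies when you construct the model directly, so configs validated through the union must still include `type`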
- Create `Annotated[Union[...], Field(discriminator="type")]`
- Pydantic automatically routes to correct model based on `type` field
- Each type has different required fields (validated automatically)
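
As a quick check of the routing, the following sketch validates a dict shaped like the YAML above. It repeats the two models so that it runs on its own; `adapters_validator` and `validation_error_count` are illustrative names, not part of the project.

```python
from typing import Annotated, Literal, Optional, Union

from pydantic import BaseModel, Field, TypeAdapter, ValidationError


class CLIAdapterConfig(BaseModel):
    type: Literal["cli"] = "cli"
    command: str
    args: list[str]
    timeout: int = 60


class HTTPAdapterConfig(BaseModel):
    type: Literal["http"] = "http"
    base_url: str
    api_key: Optional[str] = None
    timeout: int = 60
    max_retries: int = 3


AdapterConfig = Annotated[
    Union[CLIAdapterConfig, HTTPAdapterConfig],
    Field(discriminator="type"),
]

# TypeAdapter validates a bare type without wrapping it in a model
adapters_validator = TypeAdapter(dict[str, AdapterConfig])

adapters = adapters_validator.validate_python(
    {
        "claude": {"type": "cli", "command": "claude", "args": ["-p", "{prompt}"]},
        "ollama": {"type": "http", "base_url": "http://localhost:11434", "timeout": 120},
    }
)
print(type(adapters["claude"]).__name__, type(adapters["ollama"]).__name__)


def validation_error_count(data: dict) -> int:
    """Return how many validation errors the given adapters dict produces."""
    try:
        adapters_validator.validate_python(data)
    except ValidationError as exc:
        return exc.error_count()
    return 0
```

Because the union is discriminated, an entry with `type: http` is only checked against `HTTPAdapterConfig`, so error messages mention the fields of that one model rather than every member of the union.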

## Pattern 3: Environment Variable Substitution

### Example: API Keys and Secrets

**Problem**: You need to inject secrets from environment variables without hardcoding in YAML.

**Solution Pattern** (from `models/config.py`):

```python
import os
import re
from typing import Optional
from pydantic import BaseModel, field_validator

class HTTPAdapterConfig(BaseModel):
    """Configuration for HTTP-based adapter."""
    base_url: str
    api_key: Optional[str] = None

    @field_validator("api_key", "base_url")
    @classmethod
    def resolve_env_vars(cls, v: Optional[str], info) -> Optional[str]:
        """Resolve ${ENV_VAR} references in string fields."""
        if v is None:
            return v

        # Pattern: ${VAR_NAME}
        pattern = r"\$\{([^}]+)\}"
        is_api_key = info.field_name == "api_key"

        def replacer(match):
            env_var = match.group(1)
            value = os.getenv(env_var)
            if value is None:
                # For optional fields like api_key, use sentinel
                if is_api_key:
                    return "__MISSING_API_KEY__"
                # For required fields, raise error
                raise ValueError(
                    f"Environment variable '{env_var}' is not set. "
                    f"Required for configuration."
                )
            return value

        result = re.sub(pattern, replacer, v)

        # If api_key has sentinel marker, return None (graceful degradation)
        if is_api_key and "__MISSING_API_KEY__" in result:
            return None

        return result
```

**YAML Usage**:
```yaml
adapters:
  openrouter:
    type: http
    base_url: "https://openrouter.ai/api/v1"
    api_key: "${OPENROUTER_API_KEY}"  # Resolved from environment
```

**Key Techniques**:
- Use `@field_validator` on fields that may contain env vars
- Use `info.field_name` to customize behavior per field
- Regex pattern `r"\$\{([^}]+)\}"` to find `${VAR_NAME}`
- For optional fields (api_key): return `None` if env var missing
- For required fields (base_url): raise `ValueError` if env var missing
- Always load `.env` file first with `dotenv.load_dotenv()` in `load_config()`
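
The substitution logic can also be tried in isolation. Below is a minimal, standard-library-only sketch; it is a sentinel-free variant of the validator above, not the project's code. `substitute_env_vars` and its `required` flag are hypothetical names, and the environment is passed in as a mapping so the behavior is easy to test.

```python
import re
from typing import Mapping, Optional

ENV_PATTERN = re.compile(r"\$\{([^}]+)\}")


def substitute_env_vars(
    text: str, environ: Mapping[str, str], *, required: bool = True
) -> Optional[str]:
    """Replace every ${VAR} in text with environ[VAR].

    required=True  -> a missing variable raises ValueError (like base_url)
    required=False -> a missing variable makes the whole value None (like api_key)
    """
    missing = [name for name in ENV_PATTERN.findall(text) if name not in environ]
    if missing:
        if required:
            raise ValueError(f"Environment variable '{missing[0]}' is not set.")
        return None
    # A function replacement is inserted literally, so backslashes in values are safe
    return ENV_PATTERN.sub(lambda match: environ[match.group(1)], text)


print(substitute_env_vars("http://${HOST}:${PORT}", {"HOST": "localhost", "PORT": "11434"}))
```

In real use, `os.environ` would be passed as `environ`.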

## Pattern 4: Writing Migration Scripts

### Example: CLI Tools to Adapters Migration

**Problem**: You need to migrate existing config files from old format to new format automatically.

**Solution Pattern** (from `scripts/migrate_config.py`):

```python
#!/usr/bin/env python3
"""
Migration script: cli_tools → adapters

Migrates config.yaml from legacy cli_tools format to new adapters format
with explicit type fields.

Usage:
    python scripts/migrate_config.py [path/to/config.yaml]
"""

import shutil
import sys
from pathlib import Path
from typing import Any, Dict
import yaml

def migrate_config_dict(config: Dict[str, Any]) -> Dict[str, Any]:
    """
    Migrate config dictionary from cli_tools to adapters format.

    Returns:
        Migrated config with adapters section
    """
    # If already migrated, return as-is
    if "adapters" in config and "cli_tools" not in config:
        print("Info: Config already migrated (has 'adapters' section)")
        return config

    # If no cli_tools, nothing to migrate
    if "cli_tools" not in config:
        print("Warning: No 'cli_tools' section found, nothing to migrate")
        return config

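    # Create new config with adapters (shallow copy, so the input dict is not modified)
    migrated = config.copy()

    # Start from any existing adapters so a half-migrated config
    # (both sections present) keeps them
    adapters = dict(config.get("adapters") or {})
    migrated_count = 0

    # Transform cli_tools → adapters
    for name, cli_config in config["cli_tools"].items():
        if name in adapters:
            print(f"Warning: Adapter '{name}' already exists, keeping existing entry")
            continue
        adapters[name] = {
            "type": "cli",  # Add explicit type discriminator
            "command": cli_config["command"],
            "args": cli_config["args"],
            # Fall back to the CLIAdapterConfig default when timeout is omitted
            "timeout": cli_config.get("timeout", 60),
        }
        migrated_count += 1

    migrated["adapters"] = adapters
    del migrated["cli_tools"]

    print(f"Success: Migrated {migrated_count} CLI tools to adapters format")
    return migrated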

def migrate_config_file(path: str) -> None:
    """
    Migrate config file from cli_tools to adapters format.

    Creates a backup at {path}.bak before modifying.
    """
    config_path = Path(path)

    if not config_path.exists():
        raise FileNotFoundError(f"Config file not found: {path}")

    # Load config (an empty YAML file parses to None, so fall back to {})
    with open(config_path, "r", encoding="utf-8") as f:
        config = yaml.safe_load(f) or {}

    # Migrate
    migrated = migrate_config_dict(config)

    # Nothing changed: leave the file and any existing backup untouched,
    # so a second run cannot overwrite the original backup
    if migrated == config:
        print("No changes needed; config file left untouched.")
        return

    # Create backup BEFORE modifying
    backup_path = Path(f"{path}.bak")
    shutil.copy2(config_path, backup_path)
    print(f"Created backup: {backup_path}")

    # Write migrated config
    with open(config_path, "w", encoding="utf-8") as f:
        yaml.dump(migrated, f, default_flow_style=False, sort_keys=False)

    print(f"Migrated config written to: {config_path}")
    print(f"\nInfo: Review the changes and delete {backup_path} when satisfied.")

def main():
    """Main entry point."""
    config_path = sys.argv[1] if len(sys.argv) > 1 else "config.yaml"

    print(f"Migrating config: {config_path}")
    print("-" * 50)

    try:
        migrate_config_file(config_path)
        print("\nMigration complete!")
        print("\nNext steps:")
        print("1. Review the migrated config.yaml")
        print("2. Test loading: python -c 'from models.config import load_config; load_config()'")
        print("3. Delete backup if satisfied: rm config.yaml.bak")
    except Exception as e:
        print(f"\nError: Migration failed: {e}")
        sys.exit(1)

if __name__ == "__main__":
    main()
```

**Key Techniques**:
- **Always create backup** before modifying config file (`shutil.copy2`)
- **Idempotent migrations**: Check if already migrated, return early if so
- **Separate dict and file logic**: `migrate_config_dict()` for logic, `migrate_config_file()` for I/O
- **Clear console output**: Print status messages for user feedback
- **Testing instructions**: Print validation commands after migration
- **Error handling**: Catch exceptions, print helpful message, exit with code 1
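- **YAML preservation**: Use `sort_keys=False` to preserve key order (note that PyYAML does not round-trip comments, so comments in the original file are lost on rewrite)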
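
One caveat with a fixed `{path}.bak` name is that a `.bak` left over from an earlier migration would be overwritten by `shutil.copy2`. A minimal sketch of a backup helper that never clobbers an existing backup is shown below; `make_backup` and its `.bak.N` naming scheme are assumptions for illustration, not part of the script above.

```python
import shutil
from pathlib import Path


def make_backup(config_path: Path) -> Path:
    """Copy config_path to the first unused '<name>.bak', '<name>.bak.1', ... path."""
    candidate = config_path.with_name(config_path.name + ".bak")
    counter = 0
    while candidate.exists():
        counter += 1
        candidate = config_path.with_name(f"{config_path.name}.bak.{counter}")
    # copy2 also preserves file metadata such as modification time
    shutil.copy2(config_path, candidate)
    return candidate
```

The trade-off is that old backups accumulate until someone deletes them, which is usually preferable to silently losing the only copy of the pre-migration file.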

## Pattern 5: Field Validation for Path Resolution

### Example: Database Path with Environment Variables

**Problem**: You need to resolve relative paths and environment variables for config fields.

**Solution Pattern** (from `models/config.py`):

```python
import os
import re
from pathlib import Path
from pydantic import BaseModel, field_validator

class DecisionGraphConfig(BaseModel):
    """Configuration for decision graph memory."""
    db_path: str = "decision_graph.db"

    @field_validator("db_path")
    @classmethod
    def resolve_db_path(cls, v: str) -> str:
        """
        Resolve db_path to absolute path relative to project root.

        Processing steps:
        1. Resolve ${ENV_VAR} environment variable references
        2. Convert relative paths to absolute paths relative to project root
        3. Keep absolute paths unchanged
        4. Return normalized absolute path as string

        Examples:
            "decision_graph.db" → "/path/to/project/decision_graph.db"
            "/tmp/foo.db" → "/tmp/foo.db" (unchanged)
            "${DATA_DIR}/graph.db" → "/var/data/graph.db" (if DATA_DIR=/var/data)
        """
        # Step 1: Resolve environment variables
        pattern = r"\$\{([^}]+)\}"

        def replacer(match):
            env_var = match.group(1)
            value = os.getenv(env_var)
            if value is None:
                raise ValueError(
                    f"Environment variable '{env_var}' is not set. "
                    f"Required for db_path configuration."
                )
            return value

        resolved = re.sub(pattern, replacer, v)

        # Step 2: Convert to Path object
        path = Path(resolved)

        # Step 3: If relative, make it relative to project root
        if not path.is_absolute():
            # This file is at: project_root/models/config.py
            # Project root is two levels up from this file
            project_root = Path(__file__).parent.parent
            path = (project_root / path).resolve()

        # Step 4: Return as string (normalized, absolute)
        return str(path)
```

**Key Techniques**:
- Resolve env vars BEFORE path resolution
- Use `Path(__file__).parent.parent` to find project root
- Convert relative paths to absolute (prevents CWD issues)
- Keep absolute paths unchanged
- Return as string for serialization compatibility
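
The same steps can be sketched as a plain function, which makes the behavior easy to test without a Pydantic model. `resolve_config_path` is a hypothetical helper. Unlike the validator above, it takes the project root and the environment as parameters instead of deriving them from `__file__` and `os.environ`.

```python
import os
import re
from pathlib import Path
from typing import Mapping


def resolve_config_path(
    raw: str, project_root: Path, environ: Mapping[str, str] = os.environ
) -> str:
    """Expand ${VAR} references, then anchor relative paths at project_root."""

    def replacer(match: re.Match) -> str:
        name = match.group(1)
        if name not in environ:
            raise ValueError(f"Environment variable '{name}' is not set.")
        return environ[name]

    # Step 1: resolve environment variables before touching the path
    expanded = re.sub(r"\$\{([^}]+)\}", replacer, raw)

    # Step 2: relative paths become absolute under the project root;
    # absolute paths are returned unchanged
    path = Path(expanded)
    if not path.is_absolute():
        path = (project_root / path).resolve()
    return str(path)
```

Passing the root explicitly avoids coupling the result to where the module file lives, at the cost of one more argument at each call site.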

## Pattern 6: Deprecating Fields with Validation

### Example: Deprecating `similarity_threshold` in Favor of `tier_boundaries`

**Problem**: You need to replace a single config field with a more complex structure.

**Solution Pattern** (from `models/config.py`):

```python
from pydantic import BaseModel, Field, field_validator

class DecisionGraphConfig(BaseModel):
    """Configuration for decision graph memory."""

    # DEPRECATED field (kept for backward compatibility)
    similarity_threshold: float = Field(
        0.7,
        ge=0.0,
        le=1.0,
        description="DEPRECATED: Use tier_boundaries instead.",
    )

    # NEW field (preferred)
    tier_boundaries: dict[str, float] = Field(
        default_factory=lambda: {"strong": 0.75, "moderate": 0.60},
        description="Similarity score boundaries for tiered injection"
    )

    @field_validator("tier_boundaries")
    @classmethod
    def validate_tier_boundaries(cls, v: dict[str, float]) -> dict[str, float]:
        """Validate tier boundaries: strong > moderate > 0."""
        if not isinstance(v, dict) or "strong" not in v or "moderate" not in v:
            raise ValueError("tier_boundaries must have 'strong' and 'moderate' keys")

        if not (0.0 < v["moderate"] < v["strong"] <= 1.0):
            raise ValueError(
                f"tier_boundaries must satisfy: 0 < moderate ({v['moderate']}) "
                f"< strong ({v['strong']}) <= 1"
            )

        return v
```

**YAML Usage**:
```yaml
decision_graph:
  # OLD (still works, but deprecated in field description)
  similarity_threshold: 0.7

  # NEW (preferred)
  tier_boundaries:
    strong: 0.75
    moderate: 0.60
```

**Key Techniques**:
- Keep deprecated field with default value
- Add "DEPRECATED" to field description
- Validate new field structure with `@field_validator`
- Document migration in code comments and CLAUDE.md
- Eventually remove deprecated field in future major version
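
The model above marks the field as deprecated only in its description, so users who still set it see nothing at load time. One possible way to add a runtime warning is sketched below, using a Pydantic v2 `model_validator` together with `model_fields_set`. The validator name, the helper `deprecation_messages`, and the message text are assumptions; this is not taken from `models/config.py`.

```python
import warnings

from pydantic import BaseModel, Field, model_validator


class DecisionGraphConfig(BaseModel):
    similarity_threshold: float = Field(
        0.7, ge=0.0, le=1.0, description="DEPRECATED: Use tier_boundaries instead."
    )
    tier_boundaries: dict[str, float] = Field(
        default_factory=lambda: {"strong": 0.75, "moderate": 0.60}
    )

    @model_validator(mode="after")
    def warn_on_deprecated_fields(self) -> "DecisionGraphConfig":
        # model_fields_set contains only the fields the caller supplied,
        # so relying on the default does not trigger the warning
        if "similarity_threshold" in self.model_fields_set:
            warnings.warn(
                "'similarity_threshold' is deprecated. Use 'tier_boundaries' instead.",
                DeprecationWarning,
                stacklevel=2,
            )
        return self


def deprecation_messages(**kwargs) -> list[str]:
    """Build the model and return any deprecation messages it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        DecisionGraphConfig(**kwargs)
    return [
        str(w.message) for w in caught if issubclass(w.category, DeprecationWarning)
    ]


print(deprecation_messages(similarity_threshold=0.8))
```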

## Testing Migration Scripts

### Before Deployment Checklist

1. **Unit Test the Migration Logic**:
   ```python
   def test_migrate_config_dict():
       """Test migration transforms cli_tools to adapters."""
       old_config = {
           "cli_tools": {
               "claude": {
                   "command": "claude",
                   "args": ["-p", "{prompt}"],
                   "timeout": 60
               }
           }
       }

       migrated = migrate_config_dict(old_config)

       assert "adapters" in migrated
       assert "cli_tools" not in migrated
       assert migrated["adapters"]["claude"]["type"] == "cli"
       assert migrated["adapters"]["claude"]["command"] == "claude"
   ```

2. **Test Idempotency**:
   ```python
   def test_migrate_idempotent():
       """Test migrating already-migrated config is safe."""
       already_migrated = {
           "adapters": {
               "claude": {"type": "cli", "command": "claude"}
           }
       }

       result = migrate_config_dict(already_migrated)
       assert result == already_migrated  # No changes
   ```

3. **Manual Testing Steps**:
   ```bash
   # 1. Create test config
   cp config.yaml config.test.yaml

   # 2. Run migration
   python scripts/migrate_config.py config.test.yaml

   # 3. Verify backup created
   ls -la config.test.yaml.bak

   # 4. Test loading migrated config
   python -c "from models.config import load_config; c = load_config('config.test.yaml'); print('OK')"

   # 5. Compare files
   diff config.test.yaml.bak config.test.yaml

   # 6. Clean up
   rm config.test.yaml config.test.yaml.bak
   ```

4. **Load-Time Validation**:
   ```bash
   # After migration, always test that config loads without errors
   python -c "from models.config import load_config; load_config()"
   ```

## Complete Migration Workflow

When you need to evolve a config schema:

### Step 1: Update Pydantic Models

1. Add new section/fields as `Optional` (don't break existing configs)
2. Keep old section/fields for backward compatibility
3. Add `@field_validator` for new field validation
4. Add deprecation warnings in `model_post_init()`

### Step 2: Write Migration Script

1. Create `scripts/migrate_*.py` with clear docstring
2. Implement `migrate_config_dict()` for logic (testable)
3. Implement `migrate_config_file()` for I/O (backup, load, migrate, save)
4. Add `main()` with CLI argument parsing
5. Print clear instructions after migration
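
For item 4, a minimal `argparse` sketch is shown below as one alternative to reading `sys.argv` directly. The `--dry-run` flag is hypothetical and does not exist in the script shown in Pattern 4.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Create the command-line parser for a migration script."""
    parser = argparse.ArgumentParser(
        description="Migrate config.yaml from cli_tools to adapters format."
    )
    # Optional positional argument, defaulting to config.yaml like the original script
    parser.add_argument(
        "config_path",
        nargs="?",
        default="config.yaml",
        help="path to the config file (default: config.yaml)",
    )
    # Hypothetical flag: print the migrated config instead of writing it
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="show the result without modifying the file",
    )
    return parser


args = build_parser().parse_args(["config.test.yaml", "--dry-run"])
print(args.config_path, args.dry_run)
```

Using `argparse` also gives the script a `--help` message for free, which suits the "clear messaging" principle.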

### Step 3: Test Migration

1. Write unit tests for `migrate_config_dict()`
2. Test idempotency (running twice is safe)
3. Test edge cases (missing sections, already migrated)
4. Manually test on real config file
5. Verify migrated config loads successfully
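
The edge cases in item 3 could be covered with tests along these lines. To keep the sketch self-contained, it uses a compact stand-in for `migrate_config_dict`. In particular, merging `cli_tools` into an existing `adapters` section is an assumed policy for configs that contain both.

```python
import copy
from typing import Any


def migrate_config_dict(config: dict[str, Any]) -> dict[str, Any]:
    """Compact stand-in for the function in scripts/migrate_config.py."""
    if "cli_tools" not in config:
        return config
    migrated = {key: value for key, value in config.items() if key != "cli_tools"}
    adapters = dict(config.get("adapters") or {})
    for name, tool in config["cli_tools"].items():
        # Existing adapters win over legacy entries with the same name
        adapters.setdefault(name, {"type": "cli", **tool})
    migrated["adapters"] = adapters
    return migrated


LEGACY = {"cli_tools": {"claude": {"command": "claude", "args": [], "timeout": 60}}}


def test_missing_sections_is_noop():
    assert migrate_config_dict({"defaults": {"rounds": 2}}) == {"defaults": {"rounds": 2}}


def test_running_twice_matches_running_once():
    once = migrate_config_dict(copy.deepcopy(LEGACY))
    assert migrate_config_dict(once) == once


def test_existing_adapters_survive():
    both = {
        "adapters": {"ollama": {"type": "http", "base_url": "http://localhost:11434"}},
        **copy.deepcopy(LEGACY),
    }
    result = migrate_config_dict(both)
    assert set(result["adapters"]) == {"ollama", "claude"}
    assert "cli_tools" not in result


def test_input_is_not_mutated():
    original = copy.deepcopy(LEGACY)
    migrate_config_dict(original)
    assert original == LEGACY
```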

### Step 4: Document Migration

1. Update CLAUDE.md with migration instructions
2. Add migration notes to config.yaml comments
3. Reference migration script in deprecation warnings
4. Update README if needed

### Step 5: Deploy

1. Commit schema changes + migration script together
2. Announce deprecation to users
3. Provide migration timeline (e.g., "deprecated in v2.0, removed in v3.0")
4. Keep backward compatibility for at least one major version
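
A timeline such as "deprecated in v2.0, removed in v3.0" can be enforced in code rather than only announced. The sketch below is one possible approach; `check_legacy_section`, the version tuples, and the message wording are all hypothetical.

```python
import warnings

DEPRECATED_IN = (2, 0)  # hypothetical: first version that warns
REMOVED_IN = (3, 0)     # hypothetical: first version that rejects the section


def check_legacy_section(config: dict, current_version: tuple[int, int]) -> None:
    """Warn or fail on 'cli_tools' depending on the running version."""
    if "cli_tools" not in config:
        return
    if current_version >= REMOVED_IN:
        raise ValueError(
            "'cli_tools' was removed in v3.0. "
            "Run scripts/migrate_config.py to convert it to 'adapters'."
        )
    if current_version >= DEPRECATED_IN:
        warnings.warn(
            "'cli_tools' is deprecated since v2.0 and will be removed in v3.0.",
            DeprecationWarning,
            stacklevel=2,
        )
```

Tuples compare element by element, so `(2, 1) >= (2, 0)` holds without any version-parsing library.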

## Real-World Example: The cli_tools → adapters Migration

**Context**: AI Counsel needed to support both CLI and HTTP adapters, requiring type discrimination.

**Changes Made**:

1. **Schema Evolution** (`models/config.py`):
   - Created `CLIAdapterConfig` and `HTTPAdapterConfig` with `type` discriminator
   - Made `adapters` and `cli_tools` both optional
   - Added validation that at least one exists
   - Added deprecation warning for `cli_tools`

2. **Migration Script** (`scripts/migrate_config.py`):
   - Transforms `cli_tools` → `adapters` with `type: "cli"`
   - Creates backup before modifying
   - Idempotent (safe to run multiple times)
   - Clear user feedback and next steps

3. **Testing**:
   - Unit tests for `migrate_config_dict()`
   - Integration tests for file I/O
   - Manual testing on production config

4. **Documentation**:
   - Updated CLAUDE.md with migration guide
   - Added comments to config.yaml explaining both formats
   - Referenced migration script in deprecation warning

**Result**: Users can migrate seamlessly with one command, and old configs continue working with a warning.

## Common Patterns Summary

| Pattern | Use Case | Key Technique |
|---------|----------|---------------|
| Optional Sections | Add new section while keeping old | `Optional[T]` + validation |
| Discriminated Union | Type discrimination (CLI vs HTTP) | `Literal["cli"]`-style tag field + `Field(discriminator="type")` |
| Env Var Substitution | Inject secrets from environment | `@field_validator` + regex `\$\{([^}]+)\}` |
| Path Resolution | Resolve relative paths | `Path(__file__).parent.parent` + `resolve()` |
| Deprecation Warnings | Signal old patterns | `warnings.warn()` in `model_post_init()` |
| Migration Scripts | Automate config updates | Backup + dict transform + YAML dump |
| Field Validation | Complex field constraints | `@field_validator` + custom logic |

## File References

- **Config Models**: `/Users/harrison/Github/ai-counsel/models/config.py`
- **Migration Script**: `/Users/harrison/Github/ai-counsel/scripts/migrate_config.py`
- **Config File**: `/Users/harrison/Github/ai-counsel/config.yaml`
- **Project Docs**: `/Users/harrison/Github/ai-counsel/CLAUDE.md`

## Key Takeaways

1. **Never break existing configs** - always provide migration path
2. **Automate migrations** - don't force manual editing
3. **Use Pydantic validators** - catch errors at load time
4. **Support env vars** - never hardcode secrets
5. **Test thoroughly** - unit + integration + manual testing
6. **Document clearly** - in code, CLAUDE.md, and warnings
7. **Version carefully** - deprecate → warn → remove (over multiple versions)

---

When you detect config schema evolution needs, activate this skill and follow these patterns to ensure smooth, backward-compatible migrations.
