Name: data-modeler
Author: matteocervelli
## Purpose

The data-modeler skill provides guidance for designing robust data models with Pydantic, a widely used Python data validation library. It helps the Architecture Designer agent create type-safe, validated data structures that serve as the foundation for feature implementations. The examples use the Pydantic v1 API (`@validator`, `@root_validator`, `class Config`).

This skill emphasizes:
- **Type Safety:** Complete type annotations for all fields
- **Validation:** Comprehensive validators for business rules
- **Documentation:** Clear field descriptions and constraints
- **Relationships:** Proper modeling of entity relationships
- **Serialization:** Correct handling of JSON/dict conversion

The data-modeler skill ensures that data models are not just simple data containers, but intelligent objects that enforce business rules, validate data integrity, and provide clear contracts for data interchange.

## When to Use

This skill auto-activates when the agent's request contains phrases such as:
- "Design data models for..."
- "Create Pydantic schemas for..."
- "Define data structures with..."
- "Model the data with..."
- "Create validation rules for..."
- "Define entity relationships..."
- "Specify field constraints for..."
- "Design request/response schemas..."

## Provided Capabilities

### 1. Pydantic Schema Design

**What it provides:**
- BaseModel class structure
- Field definitions with types and constraints
- Default values and factory functions
- Optional vs required fields
- Nested model composition
- Model inheritance patterns

**Guidance:**
- Use `Field()` for metadata and constraints
- Provide `description` for all fields
- Set appropriate `default` or `default_factory`
- Use `Optional[T]` for nullable fields
- Check that field names follow the project's naming conventions

**Example:**
```python
from pydantic import BaseModel, Field, validator
from typing import Optional, List
from datetime import datetime
from enum import Enum

class UserRole(str, Enum):
    """User role enumeration."""
    ADMIN = "admin"
    USER = "user"
    GUEST = "guest"

class Address(BaseModel):
    """Nested address model."""
    street: str = Field(..., description="Street address", min_length=1, max_length=200)
    city: str = Field(..., description="City name", min_length=1, max_length=100)
    state: str = Field(..., description="State/province code", min_length=2, max_length=2)
    postal_code: str = Field(..., description="Postal/ZIP code", regex=r"^\d{5}(-\d{4})?$")
    country: str = Field(default="US", description="Country code (ISO 3166-1 alpha-2)")

    class Config:
        schema_extra = {
            "example": {
                "street": "123 Main St",
                "city": "Springfield",
                "state": "IL",
                "postal_code": "62701",
                "country": "US"
            }
        }

class User(BaseModel):
    """User data model with comprehensive validation."""

    # Identity fields
    id: Optional[int] = Field(None, description="User ID (auto-generated)")
    username: str = Field(..., description="Unique username", min_length=3, max_length=50)
    email: str = Field(..., description="Email address")

    # Profile fields
    full_name: str = Field(..., description="User's full name", min_length=1, max_length=200)
    role: UserRole = Field(default=UserRole.USER, description="User role")
    is_active: bool = Field(default=True, description="Account active status")

    # Nested model
    address: Optional[Address] = Field(None, description="Mailing address")

    # Lists
    tags: List[str] = Field(default_factory=list, description="User tags")

    # Timestamps
    created_at: datetime = Field(default_factory=datetime.utcnow, description="Creation timestamp")
    updated_at: Optional[datetime] = Field(None, description="Last update timestamp")

    class Config:
        """Pydantic model configuration."""
        # Allow ORM models to be parsed
        orm_mode = True

        # Use enum values in JSON
        use_enum_values = True

        # Example for documentation
        schema_extra = {
            "example": {
                "username": "johndoe",
                "email": "john@example.com",
                "full_name": "John Doe",
                "role": "user",
                "address": {
                    "street": "123 Main St",
                    "city": "Springfield",
                    "state": "IL",
                    "postal_code": "62701"
                },
                "tags": ["verified", "premium"]
            }
        }
```
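To see how these field definitions behave at runtime, here is a minimal sketch. The `Account` model and its trimmed-down `Address` are hypothetical stand-ins, not part of the skill's output. It shows nested dicts being coerced into nested models, `default_factory` giving each instance its own list, and constraint failures being reported with the path to the offending field.

```python
from typing import List, Optional

from pydantic import BaseModel, Field, ValidationError


class Address(BaseModel):
    """Trimmed-down nested model (hypothetical, for illustration)."""
    city: str = Field(..., description="City name", min_length=1, max_length=100)


class Account(BaseModel):
    """Hypothetical model combining required, optional, and list fields."""
    username: str = Field(..., description="Unique username", min_length=3, max_length=50)
    address: Optional[Address] = Field(None, description="Mailing address")
    tags: List[str] = Field(default_factory=list, description="Account tags")


# A nested dict is validated and converted into an Address instance.
first = Account(username="alice", address={"city": "Springfield"})
second = Account(username="bob")

# default_factory builds a fresh list per instance, so this does not leak into `second`.
first.tags.append("verified")

# Collect the dotted path of every field that failed validation.
try:
    Account(username="al", address={"city": ""})
except ValidationError as exc:
    failed = sorted(".".join(str(part) for part in err["loc"]) for err in exc.errors())

print(failed)
```

Reporting the full path (`address.city` rather than just `city`) is what lets API clients map an error back to the exact input field.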

### 2. Field-Level Validators

**What it provides:**
- `@validator` decorator usage
- Value transformation
- Cross-field validation
- Custom error messages
- Pre and post validation

**Validation Types:**
- **Format validation:** Email, URL, phone, regex
- **Range validation:** min/max for numbers, length for strings
- **Business rules:** Custom logic validation
- **Referential integrity:** Cross-field checks

**Example:**
```python
from pydantic import BaseModel, Field, validator, root_validator
from typing import Optional
import re

class UserRegistration(BaseModel):
    """User registration with comprehensive validation."""

    username: str = Field(..., min_length=3, max_length=50)
    email: str = Field(...)
    password: str = Field(..., min_length=8)
    password_confirm: str = Field(..., min_length=8)
    age: int = Field(..., ge=13, le=120)
    phone: Optional[str] = Field(None)

    @validator('username')
    def validate_username(cls, v):
        """Validate username format."""
        if not re.match(r'^[a-zA-Z0-9_-]+$', v):
            raise ValueError('Username must contain only letters, numbers, hyphens, and underscores')

        # Check against reserved names
        reserved = ['admin', 'root', 'system']
        if v.lower() in reserved:
            raise ValueError(f'Username "{v}" is reserved')

        return v.lower()  # Normalize to lowercase

    @validator('email')
    def validate_email(cls, v):
        """Validate email format."""
        email_regex = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
        if not re.match(email_regex, v):
            raise ValueError('Invalid email format')

        return v.lower()  # Normalize to lowercase

    @validator('password')
    def validate_password_strength(cls, v):
        """Validate password strength."""
        if not re.search(r'[A-Z]', v):
            raise ValueError('Password must contain at least one uppercase letter')
        if not re.search(r'[a-z]', v):
            raise ValueError('Password must contain at least one lowercase letter')
        if not re.search(r'\d', v):
            raise ValueError('Password must contain at least one digit')
        if not re.search(r'[!@#$%^&*(),.?":{}|<>]', v):
            raise ValueError('Password must contain at least one special character')

        return v

    @validator('phone')
    def validate_phone(cls, v):
        """Validate phone number format."""
        if v is None:
            return v

        # Remove all non-digit characters
        digits = re.sub(r'\D', '', v)

        if len(digits) != 10:
            raise ValueError('Phone number must be 10 digits')

        # Return formatted phone
        return f'({digits[:3]}) {digits[3:6]}-{digits[6:]}'

    @root_validator
    def validate_passwords_match(cls, values):
        """Validate that passwords match (cross-field validation)."""
        password = values.get('password')
        password_confirm = values.get('password_confirm')

        if password != password_confirm:
            raise ValueError('Passwords do not match')

        return values
```
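Validators only pay off if callers can read the failures. The sketch below assumes a hypothetical `SignupForm` model and a hypothetical `collect_errors` helper. It shows how `ValidationError.errors()` can be flattened into a field-to-message mapping, and that a validator's return value (here a lowercased username) replaces the input.

```python
from typing import Dict

from pydantic import BaseModel, Field, ValidationError, validator


class SignupForm(BaseModel):
    """Hypothetical form model with one constraint-only field and one validator."""
    username: str = Field(..., min_length=3, max_length=50)
    age: int = Field(..., ge=13, le=120)

    @validator("username")
    def normalize_username(cls, v):
        """Reject non-alphanumeric names and normalize to lowercase."""
        if not v.isalnum():
            raise ValueError("Username must be alphanumeric")
        return v.lower()


def collect_errors(payload: dict) -> Dict[str, str]:
    """Return {field path: message} for every failure, or {} if the payload is valid."""
    try:
        SignupForm(**payload)
    except ValidationError as exc:
        return {
            ".".join(str(part) for part in err["loc"]): err["msg"]
            for err in exc.errors()
        }
    return {}


form = SignupForm(username="Jane", age=30)
print(form.username)  # the validator's return value is stored on the model
print(collect_errors({"username": "j!", "age": 5}))
```

Pydantic reports all failing fields in one `ValidationError` rather than stopping at the first, so a single pass is enough to build a complete error response.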

### 3. Model-Level Validators

**What it provides:**
- `@root_validator` for cross-field validation
- Pre-validation transformations
- Post-validation checks
- Complex business rule enforcement

**Example:**
```python
from pydantic import BaseModel, Field, root_validator
from datetime import date, datetime
from typing import Optional

class EventBooking(BaseModel):
    """Event booking with complex validation."""

    event_name: str = Field(...)
    start_date: date = Field(...)
    end_date: date = Field(...)
    attendees: int = Field(..., ge=1, le=1000)
    room_capacity: int = Field(..., ge=1)
    is_catering: bool = Field(default=False)
    catering_headcount: Optional[int] = Field(None, ge=1)

    @root_validator(pre=True)
    def convert_date_strings(cls, values):
        """Pre-validation: Convert date strings to date objects."""
        for field in ['start_date', 'end_date']:
            if field in values and isinstance(values[field], str):
                values[field] = datetime.strptime(values[field], '%Y-%m-%d').date()
        return values

    @root_validator
    def validate_dates(cls, values):
        """Validate date logic."""
        start = values.get('start_date')
        end = values.get('end_date')

        if start and end:
            # End must not be before start (same-day events are allowed)
            if end < start:
                raise ValueError('End date must not be before start date')

            # Maximum event duration: 30 days
            if (end - start).days > 30:
                raise ValueError('Event duration cannot exceed 30 days')

            # Start must not be in the past (today is allowed)
            if start < date.today():
                raise ValueError('Event cannot be in the past')

        return values

    @root_validator
    def validate_capacity(cls, values):
        """Validate room capacity vs attendees."""
        attendees = values.get('attendees')
        capacity = values.get('room_capacity')

        if attendees and capacity:
            if attendees > capacity:
                raise ValueError(f'Attendees ({attendees}) exceeds room capacity ({capacity})')

        return values

    @root_validator
    def validate_catering(cls, values):
        """Validate catering requirements."""
        is_catering = values.get('is_catering')
        catering_headcount = values.get('catering_headcount')
        attendees = values.get('attendees')

        if is_catering:
            # Catering headcount required if catering enabled
            if not catering_headcount:
                raise ValueError('Catering headcount required when catering is enabled')

            # Catering headcount cannot exceed attendees
            # (attendees is absent from values if its own validation failed)
            if attendees is not None and catering_headcount > attendees:
                raise ValueError('Catering headcount cannot exceed number of attendees')
        else:
            # No catering headcount if catering disabled
            if catering_headcount:
                raise ValueError('Catering headcount specified but catering is disabled')

        return values
```
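In Pydantic v1, a post-validation root validator still runs when an individual field has failed, and the failed field is simply missing from `values`. That is why the validators above read fields with `values.get(...)`. An alternative is `skip_on_failure=True`, which skips the root validator unless every field validated. A minimal sketch, assuming a hypothetical `Booking` model:

```python
from pydantic import BaseModel, Field, ValidationError, root_validator


class Booking(BaseModel):
    """Hypothetical booking model with one cross-field rule."""
    attendees: int = Field(..., ge=1)
    room_capacity: int = Field(..., ge=1)

    @root_validator(skip_on_failure=True)
    def check_capacity(cls, values):
        """Runs only when both fields passed validation, so direct indexing is safe."""
        if values["attendees"] > values["room_capacity"]:
            raise ValueError("Attendees exceed room capacity")
        return values


def error_count(**data) -> int:
    """Return how many errors validation produced for the given input."""
    try:
        Booking(**data)
    except ValidationError as exc:
        return len(exc.errors())
    return 0


print(error_count(attendees=5, room_capacity=10))       # valid input
print(error_count(attendees=20, room_capacity=10))      # cross-field rule fails
print(error_count(attendees="many", room_capacity=10))  # field fails, rule is skipped
```

With this flag the caller sees only the field-level error for bad input, instead of a second, derived error from the cross-field check.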

### 4. Type Annotations and Constraints

**What it provides:**
- Proper use of typing module
- Generic types (List, Dict, Set, Tuple)
- Union types and Optional
- Literal types for constants
- Custom types

**Example:**
```python
from pydantic import BaseModel, Field, constr, conint, confloat, conlist
from typing import List, Dict, Set, Optional, Union, Literal, Any
from datetime import datetime
from enum import Enum

# Custom constrained types
Username = constr(regex=r'^[a-zA-Z0-9_-]+$', min_length=3, max_length=50)
PositiveInt = conint(gt=0)
Percentage = confloat(ge=0.0, le=100.0)
NonEmptyList = conlist(str, min_items=1)

class ProductStatus(str, Enum):
    """Product status enum."""
    DRAFT = "draft"
    ACTIVE = "active"
    ARCHIVED = "archived"

class Product(BaseModel):
    """Product model with advanced type annotations."""

    # Basic types with constraints
    id: Optional[int] = None
    name: constr(min_length=1, max_length=200)
    sku: constr(regex=r'^[A-Z]{3}-\d{6}$')  # Format: ABC-123456

    # Numeric types with constraints
    price: confloat(gt=0.0, le=1000000.0)
    discount_percentage: Percentage = 0.0
    stock_quantity: PositiveInt

    # Enum
    status: ProductStatus = ProductStatus.DRAFT

    # Collections
    tags: List[str] = Field(default_factory=list)
    categories: Set[str] = Field(default_factory=set)
    attributes: Dict[str, Any] = Field(default_factory=dict)

    # Union types
    metadata: Union[Dict[str, str], None] = None

    # Literal type (specific values only)
    measurement_unit: Literal["kg", "lb", "oz", "g"]

    # Nested models
    dimensions: Optional['ProductDimensions'] = None

    # Timestamps
    created_at: datetime = Field(default_factory=datetime.utcnow)
    updated_at: Optional[datetime] = None

class ProductDimensions(BaseModel):
    """Product dimensions (nested model)."""
    length: confloat(gt=0)
    width: confloat(gt=0)
    height: confloat(gt=0)
    unit: Literal["cm", "in", "m"]

    @property
    def volume(self) -> float:
        """Calculate volume."""
        return self.length * self.width * self.height

# Enable forward reference
Product.update_forward_refs()
```

### 5. Relationship Mappings

**What it provides:**
- One-to-one relationships
- One-to-many relationships
- Many-to-many relationships
- Foreign key references
- Embedded vs referenced documents

**Relationship Patterns:**

**One-to-One:**
```python
class UserProfile(BaseModel):
    """User profile (one-to-one with User)."""
    user_id: int = Field(..., description="Foreign key to User")
    bio: Optional[str] = Field(None, max_length=500)
    avatar_url: Optional[str] = None

class User(BaseModel):
    """User with one-to-one profile."""
    id: int
    username: str
    profile: Optional[UserProfile] = None  # Embedded relationship
```

**One-to-Many:**
```python
class Comment(BaseModel):
    """Comment (many comments per post)."""
    id: int
    post_id: int = Field(..., description="Foreign key to Post")
    content: str
    created_at: datetime

class Post(BaseModel):
    """Post with many comments."""
    id: int
    title: str
    content: str
    comments: List[Comment] = Field(default_factory=list)  # Embedded list
```

**Many-to-Many:**
```python
class Tag(BaseModel):
    """Tag entity."""
    id: int
    name: str

class Article(BaseModel):
    """Article with many tags."""
    id: int
    title: str
    # Option A: reference tags by ID
    tag_ids: List[int] = Field(default_factory=list)
    # Option B: embed the full Tag objects
    tags: List[Tag] = Field(default_factory=list)
```
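When a model stores references by ID, something has to resolve them before an embedded view can be returned. One possible shape for that step is sketched below. `ArticleWithTags`, `embed_tags`, and the in-memory `tags_by_id` lookup are hypothetical; a real system would typically load the tags from its data store.

```python
from typing import Dict, List

from pydantic import BaseModel, Field


class Tag(BaseModel):
    """Tag entity."""
    id: int
    name: str


class Article(BaseModel):
    """Storage-oriented model: tags referenced by ID."""
    id: int
    title: str
    tag_ids: List[int] = Field(default_factory=list)


class ArticleWithTags(BaseModel):
    """Response-oriented model (hypothetical): tags embedded."""
    id: int
    title: str
    tags: List[Tag] = Field(default_factory=list)


def embed_tags(article: Article, tags_by_id: Dict[int, Tag]) -> ArticleWithTags:
    """Resolve tag IDs to Tag objects, failing loudly on dangling references."""
    missing = [tag_id for tag_id in article.tag_ids if tag_id not in tags_by_id]
    if missing:
        raise KeyError(f"Unknown tag ids: {missing}")
    return ArticleWithTags(
        id=article.id,
        title=article.title,
        tags=[tags_by_id[tag_id] for tag_id in article.tag_ids],
    )


tags_by_id = {1: Tag(id=1, name="python"), 2: Tag(id=2, name="pydantic")}
article = Article(id=10, title="Modeling data", tag_ids=[2, 1])
full = embed_tags(article, tags_by_id)
print([tag.name for tag in full.tags])
```

Keeping the referenced and embedded forms as separate models avoids a single class with two optional, mutually exclusive fields.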

### 6. Serialization Strategies

**What it provides:**
- JSON serialization/deserialization
- `dict()` conversion with exclusions
- `json()` output with formatting
- Custom serializers for complex types
- Alias usage for field naming

**Example:**
```python
from pydantic import BaseModel, Field
from datetime import datetime
from typing import Optional

class ApiResponse(BaseModel):
    """API response with serialization control."""

    id: int
    name: str
    internal_code: str = Field(..., alias="code")  # Use 'code' in JSON
    created_at: datetime
    secret_key: Optional[str] = None  # Should not be exposed
    _internal_state: str = "processing"  # Underscore-prefixed: ignored by Pydantic (not validated or serialized)

    class Config:
        # Accept input by field name ('internal_code') as well as by alias ('code')
        allow_population_by_field_name = True

        # Custom JSON encoders
        json_encoders = {
            datetime: lambda v: v.isoformat()
        }

# Usage
response = ApiResponse(
    id=1,
    name="Test",
    code="ABC123",
    created_at=datetime.utcnow(),
    secret_key="secret"
)

# Serialize to dict (exclude secret)
data = response.dict(exclude={'secret_key'})
# {'id': 1, 'name': 'Test', 'internal_code': 'ABC123', 'created_at': datetime(...)}

# Serialize to JSON with alias
json_str = response.json(by_alias=True, exclude={'secret_key'})
# {"id": 1, "name": "Test", "code": "ABC123", "created_at": "2025-10-29T..."}

# Include/exclude specific fields
data = response.dict(include={'id', 'name'})
# {'id': 1, 'name': 'Test'}
```
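The example above covers serialization; the reverse direction can be sketched as a round trip. This assumes a hypothetical `ApiItem` model and the v1-style `json()` and `parse_raw()` methods. The payload is written with the alias and parsed back into an equal model.

```python
import json
from datetime import datetime

from pydantic import BaseModel, Field


class ApiItem(BaseModel):
    """Hypothetical model whose wire name differs from its attribute name."""
    id: int
    internal_code: str = Field(..., alias="code")
    created_at: datetime


original = ApiItem(id=1, code="ABC123", created_at=datetime(2025, 10, 29, 12, 0, 0))

# Serialize with the alias so the JSON uses "code" rather than "internal_code".
payload = original.json(by_alias=True)

# Deserialize: the alias is accepted as input and the timestamp string is parsed back.
restored = ApiItem.parse_raw(payload)

wire_keys = sorted(json.loads(payload))
print(wire_keys)
print(restored == original)
```

A round-trip check like this is a cheap way to confirm that aliases and custom encoders agree in both directions.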

## Usage Guide

### Step 1: Identify Data Entities
```
Requirements → Entities → Attributes → Relationships
```

### Step 2: Define Base Models
```
Create BaseModel → Add fields → Set types → Add descriptions
```

### Step 3: Add Constraints
```
Field(...) → min/max → regex → custom constraints
```

### Step 4: Implement Validators
```
@validator → business rules → error messages → transformations
```

### Step 5: Model Relationships
```
Identify relationships → Choose embedding vs reference → Add foreign keys
```

### Step 6: Configure Serialization
```
Config class → JSON encoders → Aliases → ORM mode
```

### Step 7: Add Examples
```
schema_extra → Example data → Documentation
```

### Step 8: Test Models
```
Create instances → Validate data → Test edge cases → Check errors
```
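Step 8 can be sketched with plain assertions. The `Ticket` model and `is_valid` helper are hypothetical. The checks cover defaults, both boundary values of each constraint, and a missing required field. In a real project they would more likely live in a test framework such as pytest.

```python
from pydantic import BaseModel, Field, ValidationError


class Ticket(BaseModel):
    """Hypothetical model under test."""
    title: str = Field(..., min_length=1, max_length=80)
    priority: int = Field(default=3, ge=1, le=5)


def is_valid(**data) -> bool:
    """Return True if the data builds a Ticket, False if validation fails."""
    try:
        Ticket(**data)
    except ValidationError:
        return False
    return True


def test_ticket_model() -> None:
    # Default is applied when the field is omitted.
    assert Ticket(title="Fix login").priority == 3

    # Boundary values are accepted.
    assert is_valid(title="x", priority=1)
    assert is_valid(title="x" * 80, priority=5)

    # Values just outside the boundaries are rejected.
    assert not is_valid(title="", priority=3)
    assert not is_valid(title="x" * 81)
    assert not is_valid(title="ok", priority=0)
    assert not is_valid(title="ok", priority=6)

    # A missing required field is rejected.
    assert not is_valid(priority=2)


test_ticket_model()
```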

## Best Practices

1. **Use Field() for Metadata**
   - Always provide descriptions
   - Set constraints (min/max, regex)
   - Provide examples in Config

2. **Comprehensive Validation**
   - Validate at field level when possible
   - Use root_validator for cross-field logic
   - Provide clear error messages

3. **Type Safety**
   - Use specific types, not Any
   - Use Optional[T] for nullable fields
   - Use Enum for fixed choices

4. **Documentation**
   - Description for every field
   - Examples in schema_extra
   - Docstrings for complex logic

5. **Serialization Control**
   - Use aliases for API compatibility
   - Exclude sensitive fields
   - Custom encoders for complex types

6. **Model Organization**
   - Group related models in same file
   - Use inheritance for shared fields
   - Keep models focused and cohesive
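
As an illustration of using inheritance for shared fields, the sketch below moves audit timestamps into a hypothetical `AuditedModel` base class. `Customer`, `Invoice`, and the `utc_now` helper are also hypothetical. The helper uses a timezone-aware timestamp in place of `datetime.utcnow`, which is deprecated as of Python 3.12.

```python
from datetime import datetime, timezone
from typing import Optional

from pydantic import BaseModel, Field


def utc_now() -> datetime:
    """Timezone-aware current time (hypothetical helper)."""
    return datetime.now(timezone.utc)


class AuditedModel(BaseModel):
    """Base class holding fields shared by every audited entity."""
    created_at: datetime = Field(default_factory=utc_now, description="Creation timestamp")
    updated_at: Optional[datetime] = Field(None, description="Last update timestamp")


class Customer(AuditedModel):
    """Inherits the audit fields and adds its own."""
    name: str = Field(..., min_length=1, description="Customer name")


class Invoice(AuditedModel):
    """A second entity reusing the same audit fields."""
    customer_name: str = Field(..., description="Name of the billed customer")
    total: float = Field(..., ge=0, description="Invoice total")


customer = Customer(name="Acme")
invoice = Invoice(customer_name="Acme", total=99.5)
print(sorted(customer.dict()))
print(sorted(invoice.dict()))
```

Field constraints and descriptions declared on the base class apply to every subclass, so a change to the audit fields is made in one place.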

## Resources

### data-model-guide.md
Comprehensive data modeling guide including:
- Entity-relationship principles
- Normalization guidelines
- Pydantic model structure best practices
- Validation strategies and patterns
- Type hints and annotations guide
- Field constraints catalog

### pydantic-patterns.md
Pydantic-specific patterns and examples:
- BaseModel advanced usage
- Custom validators (field and model level)
- Field validators with dependencies
- Root validators for complex logic
- Config options and settings
- Nested models and composition
- Serialization and deserialization
- ORM mode and database integration
- Complete code examples

## Example Usage

### Input (from Architecture Designer agent):
```
"Design data models for a user management system with user profiles, addresses, roles, and audit logging."
```

### Output (data-modeler skill provides):
```python
# Comprehensive data models with validation

from pydantic import BaseModel, Field, validator, root_validator
from typing import Optional, List
from datetime import datetime
from enum import Enum

class UserRole(str, Enum):
    """User roles."""
    ADMIN = "admin"
    MANAGER = "manager"
    USER = "user"

class Address(BaseModel):
    """User address."""
    street: str = Field(..., min_length=1, max_length=200)
    city: str = Field(..., min_length=1, max_length=100)
    state: str = Field(..., min_length=2, max_length=2)
    postal_code: str = Field(..., regex=r'^\d{5}(-\d{4})?$')

class UserProfile(BaseModel):
    """User profile information."""
    bio: Optional[str] = Field(None, max_length=500)
    avatar_url: Optional[str] = None
    phone: Optional[str] = Field(None, regex=r'^\(\d{3}\) \d{3}-\d{4}$')

    @validator('avatar_url')
    def validate_avatar_url(cls, v):
        if v and not v.startswith(('http://', 'https://')):
            raise ValueError('Avatar URL must be HTTP/HTTPS')
        return v

class User(BaseModel):
    """Main user model."""
    id: Optional[int] = None
    username: str = Field(..., min_length=3, max_length=50)
    email: str = Field(...)
    full_name: str = Field(..., min_length=1, max_length=200)
    role: UserRole = UserRole.USER
    is_active: bool = True

    # Relationships
    profile: Optional[UserProfile] = None
    addresses: List[Address] = Field(default_factory=list)

    # Audit fields
    created_at: datetime = Field(default_factory=datetime.utcnow)
    created_by: Optional[int] = None
    updated_at: Optional[datetime] = None
    updated_by: Optional[int] = None

    @validator('email')
    def validate_email(cls, v):
        import re
        if not re.match(r'^[^@]+@[^@]+\.[^@]+$', v):
            raise ValueError('Invalid email format')
        return v.lower()

    class Config:
        orm_mode = True
        schema_extra = {
            "example": {
                "username": "johndoe",
                "email": "john@example.com",
                "full_name": "John Doe",
                "role": "user",
                "profile": {
                    "bio": "Software engineer",
                    "phone": "(555) 123-4567"
                },
                "addresses": [{
                    "street": "123 Main St",
                    "city": "Springfield",
                    "state": "IL",
                    "postal_code": "62701"
                }]
            }
        }
```

## Integration

### Used By:
- **@architecture-designer** (Primary) - Phase 2 sub-agent for architecture design

### Integrates With:
- **architecture-planner** skill - Data models designed after component structure
- **api-designer** skill - Data models used in API request/response schemas

### Workflow Position:
1. Analysis Specialist completes requirements analysis
2. Architecture Designer receives analysis
3. architecture-planner skill designs component structure (Step 3)
4. **data-modeler skill** designs data models (Step 4)
5. api-designer skill designs API contracts using these models (Step 5)
6. Results synthesized into PRP

---

**Version:** 2.0.0
**Auto-Activation:** Yes
**Phase:** 2 - Design & Planning
**Created:** 2025-10-29