# Safe Refactoring for Scientific Code
## Overview
Change code structure without changing behavior. Zero tolerance for behavioral changes during refactoring.
**Core principle:** Establish baseline, refactor, verify exact match (within floating-point noise).
**Announce at start:** "I'm using the safe-refactoring skill to restructure this code."
## When to Use This Skill
**Use for:**
- Improving code readability without changing logic
- Extracting reusable functions
- Renaming variables/functions for clarity
- Reorganizing code structure
- Performance optimization (without changing numerical behavior)
**Don't use for:**
- Changing behavior or algorithms (use scientific-tdd instead)
- Adding new features (use scientific-tdd instead)
- Fixing bugs (use scientific-tdd or fix directly with tests)
## Process Checklist
Copy to TodoWrite:
```
Safe Refactoring Progress:
- [ ] Run full test suite (establish baseline)
- [ ] Run snapshot tests (establish baseline)
- [ ] Capture coverage report
- [ ] Perform refactoring
- [ ] Run full test suite (must match baseline exactly)
- [ ] Run snapshot tests (must match baseline exactly)
- [ ] Compare coverage (should stay same or improve)
- [ ] Run quality checks (ruff + black)
- [ ] Verify no numerical differences
- [ ] Commit refactoring
```
## Strict Rules
**ZERO tolerance for:**
- Any test that passed before and fails after
- Any test that failed before and passes after (suggests test was broken)
- Any snapshot differences (not even floating-point noise)
- Decreased test coverage
- Any behavioral changes
**If any of these occur:** Revert and investigate why.
## Detailed Steps
### Step 1: Run Full Test Suite (Baseline)
```bash
/Users/edeno/miniconda3/envs/non_local_detector/bin/pytest -v 2>&1 | tee /tmp/baseline_tests.txt
```
**Record:**
- Total tests: `grep "passed" /tmp/baseline_tests.txt`
- Any failures (if refactoring existing code with known issues)
- Test execution time
**Expected:** All tests pass (or document any known failures)
### Step 2: Run Snapshot Tests (Baseline)
```bash
/Users/edeno/miniconda3/envs/non_local_detector/bin/pytest -m snapshot -v 2>&1 | tee /tmp/baseline_snapshots.txt
```
**CRITICAL:** Snapshots must match exactly after refactoring.
**Expected:** All snapshot tests pass
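What a snapshot test looks like depends on the project's own fixtures; as a rough sketch, assuming snapshot tests are tagged with a `snapshot` marker (so `pytest -m snapshot` selects them) and compare against arrays stored on disk, a test might look like the following. The `decode_position` helper and the baseline path are illustrative placeholders, not code from this package.
```python
import numpy as np
import pytest


def decode_position(posterior: np.ndarray, place_bin_centers: np.ndarray) -> np.ndarray:
    """Toy stand-in for the code under test: MAP position per time bin."""
    return place_bin_centers[np.argmax(posterior, axis=1)]


@pytest.mark.snapshot  # assumes a "snapshot" marker is registered in the pytest config
def test_decode_position_matches_snapshot():
    rng = np.random.default_rng(0)            # fixed seed: inputs must be reproducible
    posterior = rng.random((100, 50))
    place_bin_centers = np.linspace(0.0, 200.0, 50)
    result = decode_position(posterior, place_bin_centers)
    expected = np.load("tests/snapshots/decode_position_baseline.npy")  # hypothetical path
    np.testing.assert_array_equal(result, expected)  # exact match, no tolerance
```
The key property is the exact, tolerance-free comparison: any refactor that perturbs these values is caught immediately.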
### Step 3: Capture Coverage Report
```bash
/Users/edeno/miniconda3/envs/non_local_detector/bin/pytest --cov=non_local_detector --cov-report=term --cov-report=json:coverage_baseline.json
```
**Record:** Coverage percentage for files being refactored
**Why:** Coverage should not decrease during refactoring (ideally it improves)
### Step 4: Perform Refactoring
**Refactoring techniques:**
1. **Extract function:**
```python
# Before
def complex_function():
    # ... 50 lines of code
    result = x * 2 + y
    # ... more code
    return final_result

# After
def complex_function():
    # ... code
    result = _calculate_intermediate(x, y)
    # ... code
    return final_result

def _calculate_intermediate(x, y):
    return x * 2 + y
```
2. **Rename for clarity:**
```python
# Before
def f(x):
    return x * 2

# After
def calculate_doubled_value(value):
    return value * 2
```
3. **Reorganize structure:**
```python
# Before: All in one file
# After: Separated into modules
# - core_logic.py
# - utilities.py
# - validation.py
```
4. **Optimize performance (numerically equivalent):**
```python
# Before
for i in range(n):
    result[i] = f(x[i])

# After (JAX)
result = jax.vmap(f)(x)
```
**During refactoring:**
- Make small, incremental changes
- Test after each change if possible
- Keep numerical operations identical
- Maintain exact same algorithms
### Step 5: Run Full Test Suite (Verify Match)
```bash
/Users/edeno/miniconda3/envs/non_local_detector/bin/pytest -v 2>&1 | tee /tmp/refactored_tests.txt
```
**Compare to baseline:**
```bash
diff /tmp/baseline_tests.txt /tmp/refactored_tests.txt
```
**MUST verify:**
- Same number of tests run
- Same tests pass
- Same tests fail (if any)
- Similar execution time (within 20%)
**If differences:**
- Any new test failures: REVERT IMMEDIATELY
- Any new test passes: Investigate (test was broken?)
- Different test count: Investigate (tests missing or duplicated?)
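If the raw `diff` is noisy (timing lines and progress percentages change between runs), a small script can compare only the per-test outcomes. This is a minimal sketch, assuming the verbose logs were saved to the `/tmp` paths above and that result lines follow pytest's usual `path::test_name OUTCOME` format:
```python
import re
from pathlib import Path


def parse_outcomes(log_path: str) -> dict[str, str]:
    """Map each test id to its outcome, parsed from `pytest -v` output."""
    pattern = re.compile(r"^(\S+::\S+)\s+(PASSED|FAILED|ERROR|SKIPPED|XFAIL|XPASS)")
    outcomes = {}
    for line in Path(log_path).read_text().splitlines():
        match = pattern.match(line.strip())
        if match:
            outcomes[match.group(1)] = match.group(2)
    return outcomes


baseline = parse_outcomes("/tmp/baseline_tests.txt")
refactored = parse_outcomes("/tmp/refactored_tests.txt")

print("missing after refactor:", sorted(set(baseline) - set(refactored)))
print("new after refactor:    ", sorted(set(refactored) - set(baseline)))
print("changed status:        ", sorted(t for t in baseline.keys() & refactored.keys()
                                         if baseline[t] != refactored[t]))
```
All three lists should be empty for a pure refactoring.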
### Step 6: Run Snapshot Tests (Verify Match)
```bash
/Users/edeno/miniconda3/envs/non_local_detector/bin/pytest -m snapshot -v 2>&1 | tee /tmp/refactored_snapshots.txt
```
**CRITICAL:** Must match baseline EXACTLY.
**Expected:** All snapshot tests pass, no differences
**If snapshot differences:**
1. **DO NOT UPDATE SNAPSHOTS**
2. Investigate why behavior changed
3. This is NOT a refactoring if behavior changed
4. Revert and reconsider approach
### Step 7: Compare Coverage
```bash
/Users/edeno/miniconda3/envs/non_local_detector/bin/pytest --cov=non_local_detector --cov-report=term --cov-report=json:coverage_refactored.json
```
**Compare:**
```bash
# If you have jq installed
jq '.totals.percent_covered' coverage_baseline.json
jq '.totals.percent_covered' coverage_refactored.json
```
**Expected:**
- Coverage stays same or improves
- Never decreases
**If coverage decreased:**
- Some code paths no longer tested
- Investigate and fix or revert
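If `jq` is not installed, the same comparison can be done in Python. A minimal sketch, reading the two JSON reports generated above (the `totals.percent_covered` field is the same one the `jq` commands query):
```python
import json


def percent_covered(report_path: str) -> float:
    with open(report_path) as f:
        return json.load(f)["totals"]["percent_covered"]


baseline = percent_covered("coverage_baseline.json")
refactored = percent_covered("coverage_refactored.json")
print(f"baseline:   {baseline:.2f}%")
print(f"refactored: {refactored:.2f}%")

# A tiny rounding difference in the report is tolerable; a real drop is not.
if refactored < baseline - 0.01:
    raise SystemExit("Coverage decreased -- investigate or revert the refactor.")
```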
### Step 8: Run Quality Checks
```bash
/Users/edeno/miniconda3/envs/non_local_detector/bin/ruff check src/
/Users/edeno/miniconda3/envs/non_local_detector/bin/ruff format src/
/Users/edeno/miniconda3/envs/non_local_detector/bin/black src/
```
**Expected:** All checks pass
**Fix any issues:** Refactoring is a good opportunity to improve code quality
### Step 9: Verify No Numerical Differences
For mathematical code, verify numerical equivalence:
```bash
# Run golden regression
/Users/edeno/miniconda3/envs/non_local_detector/bin/pytest \
src/non_local_detector/tests/test_golden_regression.py -v
```
**Expected:** Exact match (or differences < 1e-14)
**If differences > 1e-14:**
- This is NOT a pure refactoring
- Behavior has changed
- Use numerical-validation skill instead
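Beyond the golden regression tests, a quick ad-hoc check can run the old and new implementations side by side on identical inputs during the refactor. A minimal sketch with placeholder functions (in practice you would import the pre- and post-refactor versions of the routine being changed):
```python
import numpy as np


def normalize_old(posterior: np.ndarray) -> np.ndarray:
    """Pre-refactor version (placeholder)."""
    return posterior / posterior.sum(axis=1, keepdims=True)


def normalize_new(posterior: np.ndarray) -> np.ndarray:
    """Post-refactor version (placeholder); must be numerically identical."""
    row_sums = posterior.sum(axis=1, keepdims=True)
    return posterior / row_sums


rng = np.random.default_rng(42)
posterior = rng.random((1_000, 50))

old = normalize_old(posterior)
new = normalize_new(posterior)

np.testing.assert_allclose(new, old, rtol=0.0, atol=1e-14)  # refactoring tolerance
print("max abs difference:", np.max(np.abs(new - old)))
```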
### Step 10: Commit Refactoring
**Only commit if ALL checks pass:**
```bash
git add <refactored_files> <test_files>
git commit -m "refactor: improve <component> code structure
- Extract <function> for reusability
- Rename <variable> for clarity
- Reorganize <module> structure
No behavioral changes:
- All tests pass (N tests)
- Snapshots unchanged
- Coverage: X% → Y%
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>"
```
## Performance Optimization Refactoring
When optimizing for performance:
1. **Capture performance baseline:**
```bash
pytest --durations=10 > /tmp/baseline_durations.txt
```
2. **Make optimization**
3. **Verify numerical equivalence** (use numerical-validation skill)
4. **Measure performance improvement:**
```bash
pytest --durations=10 > /tmp/optimized_durations.txt
```
5. **Document improvement:**
```
Optimization: Use JAX vmap instead of for loop
Speedup: 3.2x (450ms → 140ms)
Numerical difference: < 1e-14 (verified)
```
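To document the speedup honestly, time both versions on the same inputs and confirm equivalence in the same script. A minimal sketch, assuming JAX is installed; the kernel `f`, the array size, and the loop structure are illustrative, not the package's actual code:
```python
import time

import jax
import jax.numpy as jnp
import numpy as np


def f(x):
    """Toy per-element kernel standing in for the real computation."""
    return x * 2.0 + 1.0


x = jnp.asarray(np.random.default_rng(0).random(5_000))

# Before: explicit Python loop (slow; serves as the numerical reference)
start = time.perf_counter()
loop_result = jnp.stack([f(xi) for xi in x])
loop_time = time.perf_counter() - start

# After: vectorized with jax.vmap and compiled with jax.jit
vectorized = jax.jit(jax.vmap(f))
vectorized(x).block_until_ready()             # warm-up call excludes compile time
start = time.perf_counter()
vmap_result = vectorized(x).block_until_ready()
vmap_time = time.perf_counter() - start

# The optimization only counts as a refactor if the numbers are unchanged
np.testing.assert_allclose(np.asarray(vmap_result), np.asarray(loop_result), atol=1e-14)
print(f"loop: {loop_time * 1e3:.1f} ms  vmap: {vmap_time * 1e3:.1f} ms  "
      f"speedup: {loop_time / vmap_time:.1f}x")
```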
## Integration with Other Skills
- **Before refactoring:** Consider if change actually needs new behavior (use scientific-tdd instead)
- **With numerical-validation:** If refactoring mathematical code, use numerical-validation to verify equivalence
- **With jax skill:** When optimizing JAX code, use jax skill for best practices
## Example Workflow
**Task:** Extract position decoding logic into reusable function
```
1. Baseline:
   - Run pytest: 427 passed, 0 failed
   - Run snapshots: 15 passed, 0 failed
   - Coverage: 69%
2. Refactor:
   - Extract _decode_position_from_posterior() function
   - Update 3 call sites to use new function
   - No logic changes, just extraction
3. Verify:
   - Run pytest: 427 passed, 0 failed ✓
   - Run snapshots: 15 passed, 0 failed ✓
   - Coverage: 69% (unchanged) ✓
4. Quality:
   - Ruff: All checks pass ✓
   - Black: Formatted ✓
5. Commit:
   "refactor: extract position decoding into reusable function"
```
## Red Flags
**STOP and revert if:**
- Any test changes status (pass → fail or fail → pass)
- Any snapshot differences appear
- Coverage decreases
- Numerical differences > 1e-14
- You're tempted to update snapshots
- You're adding new logic (use scientific-tdd instead)
**Safe to proceed if:**
- All tests match baseline exactly
- No snapshot changes
- Coverage same or better
- Code quality improves
- No new functionality added
## Common Mistakes
**"It's just a small behavioral change"**
- No such thing in refactoring
- Any behavioral change = not refactoring
- Use scientific-tdd for behavioral changes
**"I'll update the snapshots since the new output is better"**
- That's not refactoring, it's changing behavior
- Refactoring = zero snapshot changes
- Use scientific-tdd if output should change
**"Tests are slow, I'll skip them"**
- Never skip tests during refactoring
- Tests are your safety net
- Without tests, you can't verify it's a refactoring
**"Coverage went down but the code is better"**
- Better code shouldn't lose coverage
- Investigate why coverage decreased
- Fix or revert