# Modernize Scientific Computing Stack
This skill provides guidance for migrating legacy Python 2 scientific computing code to modern Python 3 with contemporary libraries and best practices.
## When to Use This Skill
Apply this skill when:
- Migrating Python 2 scientific scripts to Python 3
- Updating legacy data processing code that uses outdated patterns
- Modernizing scripts that use deprecated file handling, string encoding, or numerical libraries
- Converting scripts from the csv module to pandas for data analysis
- Replacing os.path with pathlib for path manipulation
## Approach

### Phase 1: Complete Code Discovery
Before making any changes, ensure complete understanding of the existing codebase:
**Read all source files completely** - If a file read is truncated, request the full content before proceeding. Never assume file contents based on partial reads.

**Identify all dependencies** - Check for:
- Import statements (standard library and third-party)
- Configuration files (JSON, YAML, INI)
- Data files (CSV, Excel, pickle)
- Environment requirements
**Map the data flow** - Understand:
- Input file formats and encodings
- Data transformations applied
- Output format requirements
- Any intermediate files or caches
### Phase 2: Identify Migration Requirements
Common Python 2 to Python 3 migration patterns in scientific code:
| Legacy Pattern | Modern Replacement |
| --- | --- |
| `print "text"` | `print("text")` |
| `unicode()` / `str()` | `str()` with explicit encoding |
| `open(file)` | `open(file, encoding='utf-8')` |
| `os.path.join()` | `pathlib.Path()` |
| `csv` module | `pandas.read_csv()` |
| `for key in dict.keys()` | `for key in dict` |
| `dict.has_key(x)` | `x in dict` |
| Manual file iteration | Context managers (`with` statements) |
| `xrange()` | `range()` |
| Integer division `/` | Explicit `//` or float division |
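As a quick illustration, here is a before/after sketch applying several of these replacements at once (the dictionary and values are invented for the example):

```python
# Legacy Python 2 (shown as comments):
#   counts = {"total": 7}
#   if counts.has_key("total"):
#       print "total:", counts["total"] / 2
#
# Modern Python 3 equivalent:
counts = {"total": 7}
if "total" in counts:                      # dict.has_key(x) -> x in dict
    print("total:", counts["total"] // 2)  # // preserves Python 2 floor division
```

Note that `//` is the faithful translation of Python 2's integer `/`; switching to true division `/` changes behavior and should be a deliberate choice.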
### Phase 3: Implementation Strategy
Create the modernized script with these priorities:
- UTF-8 encoding for all file operations
- pathlib.Path for all file path manipulations
- pandas for CSV/data processing
- Type hints where beneficial
- Context managers for resource handling
**Handle configuration files** - Check for file existence before reading:

```python
config_path = Path("config.json")
if config_path.exists():
    config = json.loads(config_path.read_text(encoding='utf-8'))
```
**Create requirements.txt** - Include all dependencies with version constraints
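A sketch of what such a file might look like (the package set and version pins are hypothetical placeholders; pin to the versions you actually tested against):

```
pandas>=2.0,<3.0
numpy>=1.24
openpyxl>=3.1  # only if Excel files are processed
```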
### Phase 4: Verification Protocol

**Critical: always verify file operations.** After writing any file, read it back to confirm:
- The complete content was written (not truncated)
- The syntax is valid
- All imports are present
**Testing sequence:**

1. **Syntax validation** - Run a Python syntax check: `python -m py_compile script.py`
2. **Import verification** - Test that all imports resolve: `python -c "from script import *"`
3. **Functional test** - Run the script and compare output to expected results
4. **Output validation** - Verify the output format matches requirements exactly
## Common Pitfalls to Avoid
**Truncated file content** - Never proceed with partial file reads. If a response shows `... [truncated]` or incomplete content, request the full file before continuing.

**Unverified writes** - After using a write operation, always read the file back to confirm the complete content was written correctly.

**Encoding issues** - Always specify `encoding='utf-8'` explicitly in file operations. Legacy scripts often have implicit ASCII assumptions.

**Path string concatenation** - Replace all `os.path.join()` and string concatenation for paths with `pathlib.Path` operations.
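For example, a legacy join might be replaced like this (the directory and file names are invented for illustration):

```python
from pathlib import Path

# Legacy:
#   out = os.path.join(base_dir, "results", name + ".csv")
base_dir = Path("project")                     # hypothetical base directory
out = base_dir / "results" / "station_42.csv"  # `/` builds paths portably

# Extension and name handling come for free:
print(out.suffix)  # ".csv"
print(out.stem)    # "station_42"
```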
**Missing edge case handling:**

- Empty data files or datasets
- Missing required files
- Invalid data types in CSV columns
- Stations/entities with no matching data
**Environment setup repetition** - When setting up environments (venv, PATH), verify the setup persists rather than repeating it in each command.
## Verification Checklist
Before marking the task complete, confirm:
- All source files were read completely (no truncation)
- Written files were verified by reading back
- All Python 2 patterns have been converted
- File encodings are explicitly specified
- pathlib is used for all path operations
- pandas is used for data processing (where appropriate)
- requirements.txt includes all dependencies
- Script runs without errors
- Output matches expected format exactly
- Edge cases are handled (empty data, missing files)
## Output Validation
When the task specifies an expected output format, verify the output matches exactly:
- Run the modernized script
- Capture the output
- Compare against the expected format character-by-character if needed
- Pay attention to:
  - Decimal precision in numerical output
  - Whitespace and formatting
  - Order of output items
  - Units and labels