# Modernize Scientific Computing Stack
This skill provides guidance for migrating legacy Python 2 scientific computing code to modern Python 3 with contemporary libraries and best practices.
## When to Use This Skill
Apply this skill when:
- Migrating Python 2 scientific scripts to Python 3
- Updating legacy data processing code that uses outdated patterns
- Modernizing scripts that use deprecated file handling, string encoding, or numerical libraries
- Converting scripts from the csv module to pandas for data analysis
- Replacing os.path with pathlib for path manipulation
## Approach

### Phase 1: Complete Code Discovery
Before making any changes, ensure complete understanding of the existing codebase:
**Read all source files completely** - If a file read is truncated, request the full content before proceeding. Never assume file contents based on partial reads.

**Identify all dependencies** - Check for:
- Import statements (standard library and third-party)
- Configuration files (JSON, YAML, INI)
- Data files (CSV, Excel, pickle)
- Environment requirements
**Map the data flow** - Understand:
- Input file formats and encodings
- Data transformations applied
- Output format requirements
- Any intermediate files or caches
### Phase 2: Identify Migration Requirements
Common Python 2 to Python 3 migration patterns in scientific code:
| Legacy Pattern | Modern Replacement |
| --- | --- |
| `print "text"` | `print("text")` |
| `unicode()` / `str()` | `str()` with explicit encoding |
| `open(file)` | `open(file, encoding='utf-8')` |
| `os.path.join()` | `pathlib.Path()` |
| `csv` module | `pandas.read_csv()` |
| `for key in dict.keys()` | `for key in dict` |
| `dict.has_key(x)` | `x in dict` |
| Manual file iteration | Context managers (`with` statements) |
| `xrange()` | `range()` |
| Integer division `/` | Explicit `//` or float division |
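As a quick illustration, here is a before/after sketch applying several of these replacements at once (the dictionary and values are invented for the example):

```python
# Legacy Python 2 (shown as comments):
#   counts = {"total": 7}
#   if counts.has_key("total"):
#       print "total:", counts["total"] / 2
#
# Modern Python 3 equivalent:
counts = {"total": 7}
if "total" in counts:                      # dict.has_key(x) -> x in dict
    print("total:", counts["total"] // 2)  # // preserves Python 2 floor division
```

Note that `//` is the faithful translation of Python 2's integer `/`; switching to true division `/` changes behavior and should be a deliberate choice.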
### Phase 3: Implementation Strategy
Create the modernized script with these priorities:
- UTF-8 encoding for all file operations
- pathlib.Path for all file path manipulations
- pandas for CSV/data processing
- Type hints where beneficial
- Context managers for resource handling
**Handle configuration files** - Check for file existence before reading:

```python
config_path = Path("config.json")
if config_path.exists():
    config = json.loads(config_path.read_text(encoding='utf-8'))
```
**Create requirements.txt** - Include all dependencies with version constraints
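A sketch of what such a file might look like (the package set and version pins are hypothetical placeholders; pin to the versions you actually tested against):

```
pandas>=2.0,<3.0
numpy>=1.24
openpyxl>=3.1  # only if Excel files are processed
```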
### Phase 4: Verification Protocol

**Critical: always verify file operations.** After writing any file, read it back to confirm:
- The complete content was written (not truncated)
- The syntax is valid
- All imports are present
**Testing sequence:**

1. **Syntax validation** - Run a Python syntax check: `python -m py_compile script.py`
2. **Import verification** - Test that all imports resolve: `python -c "from script import *"`
3. **Functional test** - Run the script and compare output to expected results
4. **Output validation** - Verify the output format matches requirements exactly
## Common Pitfalls to Avoid
**Truncated file content** - Never proceed with partial file reads. If a response shows `... [truncated]` or incomplete content, request the full file before continuing.

**Unverified writes** - After using a write operation, always read the file back to confirm the complete content was written correctly.

**Encoding issues** - Always specify `encoding='utf-8'` explicitly in file operations. Legacy scripts often have implicit ASCII assumptions.

**Path string concatenation** - Replace all `os.path.join()` and string concatenation for paths with `pathlib.Path` operations.
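For example, a legacy join might be replaced like this (the directory and file names are invented for illustration):

```python
from pathlib import Path

# Legacy:
#   out = os.path.join(base_dir, "results", name + ".csv")
base_dir = Path("project")                     # hypothetical base directory
out = base_dir / "results" / "station_42.csv"  # `/` builds paths portably

# Extension and name handling come for free:
print(out.suffix)  # ".csv"
print(out.stem)    # "station_42"
```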
**Missing edge case handling:**

- Empty data files or datasets
- Missing required files
- Invalid data types in CSV columns
- Stations/entities with no matching data
**Environment setup repetition** - When setting up environments (venv, PATH), verify the setup persists rather than repeating it in each command.
## Verification Checklist
Before marking the task complete, confirm:
- All source files were read completely (no truncation)
- Written files were verified by reading back
- All Python 2 patterns have been converted
- File encodings are explicitly specified
- pathlib is used for all path operations
- pandas is used for data processing (where appropriate)
- requirements.txt includes all dependencies
- Script runs without errors
- Output matches expected format exactly
- Edge cases are handled (empty data, missing files)
## Output Validation
When the task specifies an expected output format, verify the output matches exactly:
- Run the modernized script
- Capture the output
- Compare against the expected format character-by-character if needed
- Pay attention to:
  - Decimal precision in numerical output
  - Whitespace and formatting
  - Order of output items
  - Units and labels