Python Performance Optimization

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "python-performance-optimization" with this command: npx skills add wshobson/agents/wshobson-agents-python-performance-optimization

Comprehensive guide to profiling, analyzing, and optimizing Python code for better performance, including CPU profiling, memory optimization, and implementation best practices.

When to Use This Skill

  • Identifying performance bottlenecks in Python applications

  • Reducing application latency and response times

  • Optimizing CPU-intensive operations

  • Reducing memory consumption and fixing memory leaks

  • Improving database query performance

  • Optimizing I/O operations

  • Speeding up data processing pipelines

  • Implementing high-performance algorithms

  • Profiling production applications

Core Concepts

  1. Profiling Types
  • CPU Profiling: Identify time-consuming functions

  • Memory Profiling: Track memory allocation and leaks

  • Line Profiling: Profile at line-by-line granularity

  • Call Graph: Visualize function call relationships

  2. Performance Metrics
  • Execution Time: How long operations take

  • Memory Usage: Peak and average memory consumption

  • CPU Utilization: Processor usage patterns

  • I/O Wait: Time spent on I/O operations

  3. Optimization Strategies
  • Algorithmic: Better algorithms and data structures

  • Implementation: More efficient code patterns

  • Parallelization: Multi-threading/processing

  • Caching: Avoid redundant computation

  • Native Extensions: C/Rust for critical paths
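The Execution Time, CPU Utilization, and I/O Wait metrics listed above can be told apart with two standard-library clocks. The following is a minimal sketch, not part of the original skill: `measure`, `cpu_bound`, and `io_bound` are hypothetical names, and the I/O wait is simulated with `time.sleep`.

```python
import time

def measure(func, *args):
    """Return (result, wall_seconds, cpu_seconds) for one call of func."""
    wall_start = time.perf_counter()   # wall-clock time, includes waiting
    cpu_start = time.process_time()    # CPU time of this process, excludes sleep
    result = func(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu

def cpu_bound(n):
    """Hypothetical CPU-heavy workload."""
    return sum(i * i for i in range(n))

def io_bound(delay):
    """Hypothetical I/O wait, simulated with sleep."""
    time.sleep(delay)
    return delay

for name, func, arg in [("cpu_bound", cpu_bound, 200_000), ("io_bound", io_bound, 0.05)]:
    _, wall, cpu = measure(func, arg)
    print(f"{name}: wall={wall:.4f}s cpu={cpu:.4f}s")
```

When wall time is much larger than CPU time, the code is mostly waiting, which points toward I/O or concurrency fixes instead of faster computation.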

Quick Start

Basic Timing

import time

def measure_time():
    """Simple timing measurement."""
    start = time.time()

    # Your code here
    result = sum(range(1000000))

    elapsed = time.time() - start
    print(f"Execution time: {elapsed:.4f} seconds")
    return result

# Better: use timeit for accurate measurements
import timeit

execution_time = timeit.timeit(
    "sum(range(1000000))",
    number=100
)
print(f"Average time: {execution_time/100:.6f} seconds")
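A single `timeit` run can still be skewed by whatever else the machine is doing. One common refinement, sketched here as an assumption and not taken from the original skill, is `timeit.repeat` with the minimum across trials. The workload is shrunk to `range(10000)` so the sketch runs quickly.

```python
import timeit

# Run 5 independent trials of 100 executions each.
trials = timeit.repeat("sum(range(10000))", repeat=5, number=100)

# The minimum is usually the most stable figure: higher values tend to
# reflect interference from other processes, not the code under test.
best = min(trials) / 100
print(f"Best of {len(trials)} trials: {best:.6f} seconds per call")
```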

Profiling Tools

Pattern 1: cProfile - CPU Profiling

import cProfile
import pstats
from pstats import SortKey

def slow_function():
    """Function to profile."""
    total = 0
    for i in range(1000000):
        total += i
    return total

def another_function():
    """Another function."""
    return [i**2 for i in range(100000)]

def main():
    """Main function to profile."""
    result1 = slow_function()
    result2 = another_function()
    return result1, result2

# Profile the code
if __name__ == "__main__":
    profiler = cProfile.Profile()
    profiler.enable()

    main()

    profiler.disable()

    # Print stats
    stats = pstats.Stats(profiler)
    stats.sort_stats(SortKey.CUMULATIVE)
    stats.print_stats(10)  # Top 10 functions

    # Save to file for later analysis
    stats.dump_stats("profile_output.prof")

Command-line profiling:

# Profile a script
python -m cProfile -o output.prof script.py

# View results
python -m pstats output.prof

# In the pstats browser:
# sort cumtime
# stats 10
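A saved `.prof` file can also be inspected from Python instead of the interactive `pstats` browser. The sketch below is illustrative only: `work` is a hypothetical workload, the file is written to a temporary directory, and the `with cProfile.Profile()` form assumes Python 3.8 or newer.

```python
import cProfile
import io
import os
import pstats
import tempfile
from pstats import SortKey

def work():
    """Hypothetical workload to profile."""
    return sum(i * i for i in range(50_000))

# Profile objects can be used as context managers from Python 3.8 onward.
with cProfile.Profile() as profiler:
    work()

# Save, then reload, as you would with a file written by `python -m cProfile -o`.
path = os.path.join(tempfile.mkdtemp(), "output.prof")
profiler.dump_stats(path)

buffer = io.StringIO()
stats = pstats.Stats(path, stream=buffer)
stats.strip_dirs().sort_stats(SortKey.CUMULATIVE).print_stats(5)  # top 5 entries
report = buffer.getvalue()
print(report)
```

Capturing the report in a `StringIO` buffer makes it easy to log or attach to a bug report instead of printing to the terminal.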

Pattern 2: line_profiler - Line-by-Line Profiling

# Install: pip install line-profiler

# Add @profile decorator (line_profiler provides this)
@profile
def process_data(data):
    """Process data with line profiling."""
    result = []
    for item in data:
        processed = item * 2
        result.append(processed)
    return result

# Run with:
# kernprof -l -v script.py

Manual line profiling:

from line_profiler import LineProfiler

def process_data(data):
    """Function to profile."""
    result = []
    for item in data:
        processed = item * 2
        result.append(processed)
    return result

if __name__ == "__main__":
    lp = LineProfiler()
    lp.add_function(process_data)

    data = list(range(100000))

    lp_wrapper = lp(process_data)
    lp_wrapper(data)

    lp.print_stats()

Pattern 3: memory_profiler - Memory Usage

# Install: pip install memory-profiler

from memory_profiler import profile

@profile
def memory_intensive():
    """Function that uses lots of memory."""
    # Create large list
    big_list = [i for i in range(1000000)]

    # Create large dict
    big_dict = {i: i**2 for i in range(100000)}

    # Process data
    result = sum(big_list)

    return result

if __name__ == "__main__":
    memory_intensive()

# Run with:
# python -m memory_profiler script.py

Pattern 4: py-spy - Production Profiling

# Install: pip install py-spy

# Profile a running Python process
py-spy top --pid 12345

# Generate flamegraph
py-spy record -o profile.svg --pid 12345

# Profile a script
py-spy record -o profile.svg -- python script.py

# Dump current call stack
py-spy dump --pid 12345

Optimization Patterns

Pattern 5: List Comprehensions vs Loops

import timeit

# Slow: Traditional loop
def slow_squares(n):
    """Create list of squares using loop."""
    result = []
    for i in range(n):
        result.append(i**2)
    return result

# Fast: List comprehension
def fast_squares(n):
    """Create list of squares using comprehension."""
    return [i**2 for i in range(n)]

# Benchmark
n = 100000

slow_time = timeit.timeit(lambda: slow_squares(n), number=100)
fast_time = timeit.timeit(lambda: fast_squares(n), number=100)

print(f"Loop: {slow_time:.4f}s")
print(f"Comprehension: {fast_time:.4f}s")
print(f"Speedup: {slow_time/fast_time:.2f}x")

# Alternative: map. With a lambda it is usually no faster than the
# comprehension, because each item still costs a Python-level call.
# Benchmark before preferring it.
def map_squares(n):
    """Create list of squares using map."""
    return list(map(lambda x: x**2, range(n)))
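`map` tends to pay off when the mapped callable is itself a built-in, because CPython then drives the iteration in C. The sketch below is an illustration under that assumption, with hypothetical function names; the timings it prints depend on the interpreter version and machine, so treat them as something to measure, not a guarantee.

```python
import timeit

def comprehension_strs(n):
    """Convert integers to strings with a list comprehension."""
    return [str(i) for i in range(n)]

def map_strs(n):
    """Convert integers to strings with map and the built-in str."""
    return list(map(str, range(n)))

n = 10_000
assert comprehension_strs(n) == map_strs(n)  # same result either way

comp_time = timeit.timeit(lambda: comprehension_strs(n), number=50)
map_time = timeit.timeit(lambda: map_strs(n), number=50)
print(f"Comprehension: {comp_time:.4f}s")
print(f"map(str, ...): {map_time:.4f}s")
```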

Pattern 6: Generator Expressions for Memory

import sys

def list_approach():
    """Memory-intensive list."""
    data = [i**2 for i in range(1000000)]
    return sum(data)

def generator_approach():
    """Memory-efficient generator."""
    data = (i**2 for i in range(1000000))
    return sum(data)

# Memory comparison
# Note: sys.getsizeof reports only the container object itself,
# not the integers that the list references.
list_data = [i for i in range(1000000)]
gen_data = (i for i in range(1000000))

print(f"List size: {sys.getsizeof(list_data)} bytes")
print(f"Generator size: {sys.getsizeof(gen_data)} bytes")

# A generator object uses constant memory regardless of how many items it yields
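Because `sys.getsizeof` measures only the container object, peak allocation during the whole computation is a more telling comparison. Here is a minimal sketch using the standard-library `tracemalloc` module; `peak_bytes` is a hypothetical helper, and the range is reduced to 200,000 to keep the traced run short.

```python
import tracemalloc

def peak_bytes(func):
    """Return the peak traced allocation, in bytes, while func runs."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()  # (current, peak)
    finally:
        tracemalloc.stop()
    return peak

def list_approach():
    """Materialize every square before summing."""
    data = [i**2 for i in range(200_000)]
    return sum(data)

def generator_approach():
    """Produce squares one at a time while summing."""
    return sum(i**2 for i in range(200_000))

list_peak = peak_bytes(list_approach)
gen_peak = peak_bytes(generator_approach)
print(f"List peak: {list_peak} bytes")
print(f"Generator peak: {gen_peak} bytes")
```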

Pattern 7: String Concatenation

import timeit

def slow_concat(items):
    """Slow string concatenation."""
    result = ""
    for item in items:
        result += str(item)
    return result

def fast_concat(items):
    """Fast string concatenation with join."""
    return "".join(str(item) for item in items)

def faster_concat(items):
    """Even faster with list."""
    parts = [str(item) for item in items]
    return "".join(parts)

items = list(range(10000))

# Benchmark
slow = timeit.timeit(lambda: slow_concat(items), number=100)
fast = timeit.timeit(lambda: fast_concat(items), number=100)
faster = timeit.timeit(lambda: faster_concat(items), number=100)

print(f"Concatenation (+): {slow:.4f}s")
print(f"Join (generator): {fast:.4f}s")
print(f"Join (list): {faster:.4f}s")

Pattern 8: Dictionary Lookups vs List Searches

import timeit

# Create test data
size = 10000
items = list(range(size))
lookup_dict = {i: i for i in range(size)}

def list_search(items, target):
    """O(n) search in list."""
    return target in items

def dict_search(lookup_dict, target):
    """O(1) search in dict."""
    return target in lookup_dict

target = size - 1  # Worst case for list

# Benchmark
list_time = timeit.timeit(
    lambda: list_search(items, target),
    number=1000
)
dict_time = timeit.timeit(
    lambda: dict_search(lookup_dict, target),
    number=1000
)

print(f"List search: {list_time:.6f}s")
print(f"Dict search: {dict_time:.6f}s")
print(f"Speedup: {list_time/dict_time:.0f}x")
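The same idea applies when only membership matters and no values are stored: convert the list to a `set` once, then test against it. A small sketch with hypothetical function names and made-up sample data:

```python
def common_items_slow(a, b):
    """O(len(a) * len(b)): scans list b for every item in a."""
    return [x for x in a if x in b]

def common_items_fast(a, b):
    """O(len(a) + len(b)) on average: build the set once, then hash lookups."""
    b_set = set(b)
    return [x for x in a if x in b_set]

a = list(range(0, 2000, 2))  # even numbers
b = list(range(0, 2000, 3))  # multiples of three

# Both versions return the same items in the same order.
assert common_items_slow(a, b) == common_items_fast(a, b)
print(common_items_fast(a, b)[:5])
```

The conversion itself costs O(n), so it is worthwhile only when the collection is searched more than once.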

Pattern 9: Local Variable Access

import timeit

# Global variable (slow)
GLOBAL_VALUE = 100

def use_global():
    """Access global variable."""
    total = 0
    for i in range(10000):
        total += GLOBAL_VALUE
    return total

def use_local():
    """Use local variable."""
    local_value = 100
    total = 0
    for i in range(10000):
        total += local_value
    return total

# Local is faster
global_time = timeit.timeit(use_global, number=1000)
local_time = timeit.timeit(use_local, number=1000)

print(f"Global access: {global_time:.4f}s")
print(f"Local access: {local_time:.4f}s")
print(f"Speedup: {global_time/local_time:.2f}x")
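The same technique extends to module attributes used inside a hot loop: bind the attribute to a local name once instead of resolving it on every iteration. This is an illustrative sketch with hypothetical function names. The gain is typically small and varies by Python version, so measure before adopting it.

```python
import math
import timeit

def global_lookup(values):
    """Resolve math.sqrt on every iteration (global plus attribute lookup)."""
    total = 0.0
    for v in values:
        total += math.sqrt(v)
    return total

def local_binding(values):
    """Resolve math.sqrt once and reuse the local name."""
    sqrt = math.sqrt
    total = 0.0
    for v in values:
        total += sqrt(v)
    return total

values = [float(i) for i in range(10_000)]
assert global_lookup(values) == local_binding(values)  # identical arithmetic

print(f"Global lookup: {timeit.timeit(lambda: global_lookup(values), number=100):.4f}s")
print(f"Local binding: {timeit.timeit(lambda: local_binding(values), number=100):.4f}s")
```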

Pattern 10: Function Call Overhead

import timeit

def calculate_inline():
    """Inline calculation."""
    total = 0
    for i in range(10000):
        total += i * 2 + 1
    return total

def helper_function(x):
    """Helper function."""
    return x * 2 + 1

def calculate_with_function():
    """Calculation with function calls."""
    total = 0
    for i in range(10000):
        total += helper_function(i)
    return total

# Inline is faster due to no call overhead
inline_time = timeit.timeit(calculate_inline, number=1000)
function_time = timeit.timeit(calculate_with_function, number=1000)

print(f"Inline: {inline_time:.4f}s")
print(f"Function calls: {function_time:.4f}s")

For advanced optimization techniques including NumPy vectorization, caching, memory management, parallelization, async I/O, database optimization, and benchmarking tools, see references/advanced-patterns.md

Best Practices

  • Profile before optimizing - Measure to find real bottlenecks

  • Focus on hot paths - Optimize code that runs most frequently

  • Use appropriate data structures - Dict for lookups, set for membership

  • Avoid premature optimization - Clarity first, then optimize

  • Use built-in functions - They're implemented in C

  • Cache expensive computations - Use lru_cache

  • Batch I/O operations - Reduce system calls

  • Use generators for large datasets

  • Consider NumPy for numerical operations

  • Profile production code - Use py-spy for live systems
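As an illustration of the `lru_cache` practice above, here is a minimal sketch. The Fibonacci function is a stand-in for any expensive, pure computation and is not taken from the original skill.

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded cache; set a limit for long-running services
def fib(n):
    """Naive recursive Fibonacci, made linear-time by memoization."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))
print(fib.cache_info())  # hits, misses, maxsize, currsize
```

`lru_cache` requires hashable arguments and suits functions whose result depends only on those arguments.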

Common Pitfalls

  • Optimizing without profiling

  • Using global variables unnecessarily

  • Not using appropriate data structures

  • Creating unnecessary copies of data

  • Not using connection pooling for databases

  • Ignoring algorithmic complexity

  • Over-optimizing rare code paths

  • Not considering memory usage
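One way to avoid the "unnecessary copies" pitfall for binary data is `memoryview`, which slices without duplicating the underlying buffer. This sketch is an assumption about a typical case, not something from the original skill; the sample data is made up.

```python
data = bytes(range(256)) * 1000   # 256,000 bytes of sample data
view = memoryview(data)

chunk_copy = data[1000:2000]      # slicing bytes allocates a new bytes object
chunk_view = view[1000:2000]      # slicing a memoryview references the same buffer

assert bytes(chunk_view) == chunk_copy   # same content
assert chunk_view.obj is data            # still backed by the original object
print(len(chunk_view))
```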
