Python Performance

Master Python optimization techniques, profiling, memory management, and high-performance computing

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.


Install skill "Python Performance" with this command: npx skills add pluginagentmarketplace/custom-plugin-python/pluginagentmarketplace-custom-plugin-python-python-performance

Python Performance Optimization

Overview

Master performance optimization in Python. Learn to profile code, identify bottlenecks, optimize algorithms, manage memory efficiently, and leverage high-performance libraries for compute-intensive tasks.

Learning Objectives

  • Profile Python code to identify bottlenecks
  • Optimize algorithms and data structures
  • Manage memory efficiently
  • Use compiled extensions (Cython, NumPy)
  • Implement caching strategies
  • Parallelize CPU-bound operations
  • Benchmark and measure improvements

Core Topics

1. Profiling & Benchmarking

  • timeit module for micro-benchmarks
  • cProfile for function-level profiling
  • line_profiler for line-by-line analysis
  • memory_profiler for memory usage
  • py-spy for production profiling
  • Flame graphs and visualization
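A single `timeit` measurement is noisy. The `timeit` documentation suggests repeating a measurement and reading the minimum, because higher values mostly reflect interference from other processes. Below is a minimal sketch of that pattern. `best_of`, `squares_lc` and `squares_map` are illustrative names, not part of any library.

```python
import timeit


def best_of(func, *, number: int = 1_000, repeat: int = 5) -> float:
    """Return the fastest per-call time, in seconds, over `repeat` runs of `number` calls."""
    # timeit.repeat returns one total per run; the minimum is the least noisy estimate
    totals = timeit.repeat(func, number=number, repeat=repeat)
    return min(totals) / number


def squares_lc():
    return [x * x for x in range(1_000)]


def squares_map():
    return list(map(lambda x: x * x, range(1_000)))


if __name__ == "__main__":
    for f in (squares_lc, squares_map):
        print(f"{f.__name__}: {best_of(f) * 1e6:.1f} us per call")
```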

Code Example:

import timeit
import cProfile
import pstats

# 1. timeit for micro-benchmarks
def list_comprehension():
    return [x**2 for x in range(1000)]

def map_function():
    return list(map(lambda x: x**2, range(1000)))

# Compare performance
time_lc = timeit.timeit(list_comprehension, number=10000)
time_map = timeit.timeit(map_function, number=10000)
print(f"List comprehension: {time_lc:.4f}s")
print(f"Map function: {time_map:.4f}s")

# 2. cProfile for function profiling
def process_data():
    data = []
    for i in range(100000):
        data.append(i ** 2)
    return sum(data)

profiler = cProfile.Profile()
profiler.enable()
result = process_data()
profiler.disable()

stats = pstats.Stats(profiler)
stats.sort_stats('cumulative')
stats.print_stats(10)

# 3. Line profiling (requires the line_profiler package)
# Add @profile above the function; kernprof injects it at run time, so no import is needed
def slow_function():
    total = 0
    for i in range(1000000):
        total += i ** 2
    return total

# Run with: kernprof -l -v script.py

# 4. Memory profiling (requires the memory_profiler package)
from memory_profiler import profile

@profile
def memory_intensive():
    large_list = [i for i in range(1000000)]
    large_dict = {i: i**2 for i in range(1000000)}
    return len(large_list) + len(large_dict)

# Run with: python -m memory_profiler script.py
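The enable/disable/pstats steps from the cProfile example can be wrapped in a context manager, which makes it easy to profile any block. This sketch assumes you want the report as text (for logging or an HTML report) rather than printed directly. `profiled` and `build_squares` are hypothetical names.

```python
import cProfile
import io
import pstats
from contextlib import contextmanager


@contextmanager
def profiled(sort_by: str = "cumulative", limit: int = 10):
    """Profile the with-block; the text report is stored in the yielded dict under "report"."""
    result = {}
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        yield result
    finally:
        profiler.disable()
        buffer = io.StringIO()  # capture the report instead of printing to stdout
        pstats.Stats(profiler, stream=buffer).sort_stats(sort_by).print_stats(limit)
        result["report"] = buffer.getvalue()


def build_squares(n: int) -> int:
    return sum(i * i for i in range(n))


with profiled() as prof:
    build_squares(200_000)
print(prof["report"])
```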

2. Algorithm & Data Structure Optimization

  • Choosing efficient data structures
  • Time complexity analysis
  • Generator expressions vs lists
  • Set operations for lookups
  • Deque for queue operations
  • Bisect for sorted lists
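You can check the complexity claims on your own machine instead of trusting fixed numbers. The sketch below times both lookups with `timeit`. It builds the set once, outside the timed code, and looks up the last element, which is the worst case for a list scan. The sizes and repeat counts are arbitrary choices.

```python
import timeit

items = list(range(100_000))
items_set = set(items)  # built once, outside the timed code
target = 99_999         # last element: worst case for the list scan

list_time = timeit.timeit(lambda: target in items, number=200)
set_time = timeit.timeit(lambda: target in items_set, number=200)
print(f"list: {list_time:.5f}s  set: {set_time:.5f}s  ratio: {list_time / set_time:.0f}x")
```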

Code Example:

import bisect
from collections import deque, Counter, defaultdict
import time

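# 1. List vs Set for membership testing
# Bad: O(n) lookup, scans the list element by element
def find_in_list(items, target):
    return target in items  # Linear search

# Good: O(1) average lookup, but only if the set is built once and reused.
# Building a set is itself O(n), so converting inside the lookup function
# would make every call O(n) again.
items = list(range(100000))
items_set = set(items)

def find_in_set(items_set, target):
    return target in items_set  # Hash lookup

# Illustrative: List ~0.001s vs Set ~0.000001s per lookup; measure on your own data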

# 2. Generator expressions for memory efficiency
# Bad: Creates entire list in memory
squares_list = [x**2 for x in range(1000000)]  # ~8 MB for the list itself, plus the int objects

# Good: Generates on-demand (but can be iterated only once)
squares_gen = (x**2 for x in range(1000000))   # small fixed size, independent of length

# 3. Deque for efficient queue operations
# Bad: O(n) pop from beginning
queue_list = list(range(10000))
queue_list.pop(0)  # Slow

# Good: O(1) pop from both ends
queue_deque = deque(range(10000))
queue_deque.popleft()  # Fast

# 4. Bisect for maintaining sorted lists
# Bad: re-sorting the whole list after every append
sorted_list = []
for i in [5, 2, 8, 1, 9]:
    sorted_list.append(i)
    sorted_list.sort()

# Better: insort finds the position in O(log n); the insertion itself still
# shifts elements (O(n)), but no re-sort is needed
sorted_list = []
for i in [5, 2, 8, 1, 9]:
    bisect.insort(sorted_list, i)

# 5. Counter for frequency counting
words = "the cat and the dog and the bird".split()  # sample data

# Bad: Manual counting
word_count = {}
for word in words:
    if word in word_count:
        word_count[word] += 1
    else:
        word_count[word] = 1

# Good: Counter
word_count = Counter(words)
most_common = word_count.most_common(10)
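`defaultdict` is imported above but never used. It covers the related grouping pattern: missing keys get a default value automatically, so there is no separate membership check before each insert. Below is a sketch with a hypothetical `group_by_first_letter` helper.

```python
from collections import defaultdict


def group_by_first_letter(words):
    """Group words by their first letter without checking for missing keys first."""
    groups = defaultdict(list)  # a missing key starts as an empty list
    for word in words:
        groups[word[0]].append(word)
    return dict(groups)


print(group_by_first_letter(["apple", "avocado", "banana", "blueberry", "cherry"]))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```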

3. Memory Management

  • Memory allocation and garbage collection
  • Object pooling
  • Slots for memory-efficient classes
  • Reference counting
  • Weak references
  • Memory leak detection
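`sys.getsizeof()` reports only the object itself. A fairer way to compare a regular class with a `__slots__` class is to let `tracemalloc` count everything allocated while creating many instances. This is a sketch, and `bytes_per_instance` is a hypothetical helper. Keep three caveats in mind:
- The result also includes the integer objects stored in each point, which are identical for both classes.
- The helper subtracts the list's own pointer array.
- Absolute numbers differ across Python versions.

```python
import sys
import tracemalloc


class RegularPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class SlottedPoint:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


def bytes_per_instance(cls, n: int = 100_000) -> float:
    """Average bytes traced per instance, including attribute storage such as __dict__."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    objs = [cls(i, i) for i in range(n)]
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    # Remove the list's own pointer array so mainly the instances are counted
    return (after - before - sys.getsizeof(objs)) / n


print(f"regular: {bytes_per_instance(RegularPoint):.0f} bytes per instance")
print(f"slotted: {bytes_per_instance(SlottedPoint):.0f} bytes per instance")
```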

Code Example:

import gc
import sys
from weakref import WeakValueDictionary

# 1. __slots__ for memory-efficient classes
# Bad: Regular class (attributes live in a per-instance __dict__)
class RegularPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y

# Good: Slots class (fixed attribute storage, no per-instance __dict__)
class SlottedPoint:
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y

# sys.getsizeof() reports only the object itself, not separately allocated
# attribute storage such as __dict__, so it understates the difference.
# Exact sizes also vary by Python version.
print(sys.getsizeof(RegularPoint(1, 2)))
print(sys.getsizeof(SlottedPoint(1, 2)))

# 2. Object pooling for expensive objects
class ObjectPool:
    def __init__(self, factory, max_size=10):
        self.factory = factory
        self.max_size = max_size
        self.pool = []

    def acquire(self):
        if self.pool:
            return self.pool.pop()
        return self.factory()

    def release(self, obj):
        if len(self.pool) < self.max_size:
            self.pool.append(obj)

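# Usage (DatabaseConnection is a hypothetical placeholder for a costly object)
class DatabaseConnection:
    pass

db_pool = ObjectPool(DatabaseConnection, max_size=5)
conn = db_pool.acquire()
try:
    pass  # Use connection
finally:
    db_pool.release(conn)  # return it to the pool even if an error occurs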

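# 3. Weak references to prevent memory leaks
# Entries disappear automatically once no strong reference to the value remains.
# Values must support weak references: instances of ordinary classes do;
# int, str, tuple, list and dict do not.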
class Cache:
    def __init__(self):
        self._cache = WeakValueDictionary()

    def get(self, key):
        return self._cache.get(key)

    def set(self, key, value):
        self._cache[key] = value

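# 4. Manual garbage collection for large operations
# CPython frees most objects as soon as their reference count drops to zero.
# gc.collect() only reclaims reference cycles and is itself costly, so call it
# sparingly, e.g. after batches known to create cycles.
def process_large_dataset(large_data, process_batch):
    for batch in large_data:
        process_batch(batch)
        # Reclaim any reference cycles left behind by this batch
        gc.collect()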

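# 5. Context managers for resource cleanup
# allocate_resource() and .cleanup() are placeholders for your own resource API
class ManagedResource:
    def __enter__(self):
        self.resource = allocate_resource()
        return self.resource

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.resource.cleanup()
        return False  # don't suppress exceptions raised inside the with-block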
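A `WeakValueDictionary` entry lives only as long as some other strong reference to its value does. The short sketch below shows this, with a hypothetical `Image` class standing in for a cached value.

```python
import gc
from weakref import WeakValueDictionary


class Image:
    """Hypothetical cached value; instances of ordinary classes support weak references."""

    def __init__(self, name: str):
        self.name = name


cache = WeakValueDictionary()
img = Image("logo.png")
cache["logo"] = img
print("logo" in cache)  # True: `img` still holds a strong reference

del img       # drop the only strong reference
gc.collect()  # CPython already freed it via refcounting; this covers other interpreters
print("logo" in cache)  # False: the entry disappeared together with its value
```

Storing a value such as `42` in the same dictionary raises `TypeError`, because `int` objects cannot be weakly referenced.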

4. High-Performance Computing

  • NumPy vectorization
  • Numba JIT compilation
  • Cython for C extensions
  • Multiprocessing for parallelism
  • Concurrent.futures
  • Performance comparison
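Splitting CPU-bound work across processes gives the right answer only if each worker receives a disjoint slice of the input. Below is a reusable sketch of that chunking. `chunk_bounds`, `sum_squares` and `parallel_sum_squares` are illustrative names. Each task is passed as one `(start, stop)` tuple so it pickles cleanly.

```python
from concurrent.futures import ProcessPoolExecutor


def chunk_bounds(n: int, parts: int) -> list[tuple[int, int]]:
    """Split range(n) into `parts` contiguous (start, stop) pairs whose sizes differ by at most 1."""
    base, extra = divmod(n, parts)
    bounds, start = [], 0
    for k in range(parts):
        stop = start + base + (1 if k < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def sum_squares(bounds: tuple[int, int]) -> int:
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def parallel_sum_squares(n: int, workers: int = 4) -> int:
    """Each worker sums a disjoint slice, so the partial sums add up to the full result."""
    with ProcessPoolExecutor(max_workers=workers) as executor:
        return sum(executor.map(sum_squares, chunk_bounds(n, workers)))
```

Call `parallel_sum_squares(n)` from inside an `if __name__ == "__main__":` block. That way, worker processes started with the "spawn" method can import the module safely.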

Code Example:

import numpy as np
from numba import jit
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

# 1. NumPy vectorization
# Bad: Python loops (slow)
def python_sum(n):
    total = 0
    for i in range(n):
        total += i ** 2
    return total

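# Good: NumPy vectorization (usually much faster for large n)
def numpy_sum(n):
    arr = np.arange(n, dtype=np.int64)  # int64 avoids overflow where the default integer is 32-bit
    return int(np.sum(arr ** 2))

# Illustrative timings (hardware-dependent; measure your own with timeit):
# python_sum(1000000) = 0.15s
# numpy_sum(1000000)  = 0.002s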

# 2. Numba JIT compilation
@jit(nopython=True)  # Compile to machine code
def fast_function(n):
    total = 0
    for i in range(n):
        total += i ** 2
    return total

# First call: compilation + execution
# Later calls reuse the compiled machine code, often tens of times faster than pure Python

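# 3. Multiprocessing for CPU-bound tasks
def sum_squares(start, stop):
    return sum(i * i for i in range(start, stop))

if __name__ == "__main__":  # required when processes are started with "spawn" (Windows, macOS)
    n = 10_000_000

    # Single process
    result = sum_squares(0, n)

    # Multiple processes: split [0, n) into 4 contiguous, non-overlapping chunks
    chunk = n // 4
    starts = [k * chunk for k in range(4)]
    stops = starts[1:] + [n]
    with ProcessPoolExecutor(max_workers=4) as executor:
        total = sum(executor.map(sum_squares, starts, stops))
    assert total == result

    # Speedup approaches, but rarely reaches, 4x on 4 cores because of
    # process start-up and argument/result pickling overhead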

# 4. Caching for expensive computations
from functools import lru_cache

@lru_cache(maxsize=128)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

# Without the cache the number of calls grows exponentially with n: fibonacci(100) is infeasible
# With the cache each value is computed once: fibonacci(100) returns instantly

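# 5. Memory views for zero-copy slicing
# data must support the buffer protocol (bytes, bytearray, array.array, ...);
# NumPy arrays need no memoryview because basic slicing already returns a view
def process_array(data):
    # Bad: slicing bytes copies the selected 1000 bytes into a new object
    subset = data[1000:2000]

    # Good: Zero-copy view that references the original buffer
    view = memoryview(data)[1000:2000]
    return view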
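You can quantify the cached-versus-uncached claim by counting calls. The sketch below uses hypothetical `fib_plain` and `fib_cached` functions. `cache_info()` is the statistics method that `functools.lru_cache` attaches to the wrapped function. `maxsize=None` gives an unbounded cache.

```python
from functools import lru_cache

calls = 0  # counts invocations of the uncached version


def fib_plain(n: int) -> int:
    """Naive recursion: the number of calls grows exponentially with n."""
    global calls
    calls += 1
    if n < 2:
        return n
    return fib_plain(n - 1) + fib_plain(n - 2)


@lru_cache(maxsize=None)  # unbounded: every n is computed once and kept
def fib_cached(n: int) -> int:
    if n < 2:
        return n
    return fib_cached(n - 1) + fib_cached(n - 2)


result = fib_plain(20)
print(result, calls)  # 6765 21891
print(fib_cached(20), fib_cached.cache_info())
# 6765 CacheInfo(hits=18, misses=21, maxsize=None, currsize=21)
```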

Hands-On Practice

Project 1: Performance Profiler

Build a comprehensive profiling tool.

Requirements:

  • CPU profiling with cProfile
  • Memory profiling
  • Line-by-line analysis
  • Visualization (flame graphs)
  • HTML report generation
  • Bottleneck identification

Key Skills: Profiling tools, visualization, analysis
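For the bottleneck-identification requirement, Python 3.9+ can return profile data as objects instead of text through `pstats.Stats.get_stats_profile()`. The sketch below assumes that ranking by own time (`tottime`) is the bottleneck criterion. `top_functions` and the workload functions are hypothetical.

```python
import cProfile
import pstats


def top_functions(func, *args, limit: int = 3):
    """Profile func(*args) and return the `limit` functions with the most own time (tottime)."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    stats_profile = pstats.Stats(profiler).get_stats_profile()  # Python 3.9+
    ranked = sorted(
        stats_profile.func_profiles.items(),
        key=lambda item: item[1].tottime,
        reverse=True,
    )
    return [(name, fp.tottime) for name, fp in ranked[:limit]]


# Hypothetical workload: one cheap step and one expensive step
def fast_part(n):
    return n * 2


def slow_part(n):
    return sum(i * i for i in range(n))


def workload():
    fast_part(10)
    return slow_part(300_000)


for name, seconds in top_functions(workload):
    print(f"{seconds:8.3f}s  {name}")
```

`func_profiles` is keyed by bare function name, so same-named functions from different modules overwrite each other. A full tool would key on file and line as well.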

Project 2: Data Processing Pipeline

Optimize a data processing pipeline.

Requirements:

  • Load large CSV files (1GB+)
  • Transform and clean data
  • Aggregate statistics
  • Compare Python/NumPy/Pandas approaches
  • Measure memory usage
  • Optimize to <2GB RAM

Key Skills: NumPy, memory optimization, benchmarking
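One way to keep memory bounded on large CSV files is to stream rows and keep only running aggregates, so memory grows with the number of distinct keys rather than the file size. The sketch below uses only the standard library; it is not a NumPy/Pandas comparison. `mean_by_key` and the sample columns are hypothetical.

```python
import csv
import io
from collections import defaultdict


def mean_by_key(rows, key_field: str, value_field: str) -> dict[str, float]:
    """Stream rows once, keeping only running sums and counts per key."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        key = row[key_field]
        sums[key] += float(row[value_field])
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# For a real file, pass csv.DictReader(f) where f = open(path, newline="");
# an in-memory sample keeps this example self-contained.
sample = io.StringIO("city,temp\nOslo,3\nRome,18\nOslo,5\n")
print(mean_by_key(csv.DictReader(sample), "city", "temp"))
# {'Oslo': 4.0, 'Rome': 18.0}
```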

Project 3: Parallel Computing

Implement parallel algorithms.

Requirements:

  • Matrix multiplication
  • Image processing
  • Monte Carlo simulation
  • Compare threading/multiprocessing/asyncio
  • Measure speedup
  • Handle shared state

Key Skills: Parallelism, performance measurement
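The Monte Carlo requirement can be parallelized without any shared state. Each task gets its own seeded random generator and returns only a count, and the parent combines the counts. The sketch below estimates π this way. `count_hits` and `estimate_pi` are hypothetical names, and seeding tasks with consecutive integers is a simplifying assumption.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def count_hits(task: tuple[int, int]) -> int:
    """Count random points in the unit square that land inside the quarter circle."""
    samples, seed = task
    rng = random.Random(seed)  # private generator per task: nothing is shared between processes
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(samples: int, workers: int = 4, seed: int = 0) -> float:
    """Split the samples across processes and combine only the returned counts."""
    per_task = samples // workers
    tasks = [(per_task, seed + k) for k in range(workers)]  # distinct seed per task
    with ProcessPoolExecutor(max_workers=workers) as executor:
        hits = sum(executor.map(count_hits, tasks))
    return 4.0 * hits / (per_task * workers)
```

To measure speedup, call `estimate_pi` under an `if __name__ == "__main__":` guard. Compare wall-clock time for `workers=1` and `workers=4`.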

Assessment Criteria

  • Profile code to identify bottlenecks
  • Choose appropriate data structures
  • Optimize algorithms for time complexity
  • Manage memory efficiently
  • Use vectorization where applicable
  • Implement effective caching
  • Parallelize CPU-bound operations

Resources

Official Documentation

Learning Platforms

Tools

Next Steps

After mastering Python performance, explore:

  • Cython - C extensions for Python
  • PyPy - Alternative Python interpreter
  • Dask - Parallel computing library
  • CUDA - GPU programming with Python

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
