pandas-construction-analysis

Pandas Construction Data Analysis

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "pandas-construction-analysis" with this command: npx skills add datadrivenconstruction/ddc_skills_for_ai_agents_in_construction/datadrivenconstruction-ddc-skills-for-ai-agents-in-construction-pandas-construction-analysis

Pandas Construction Data Analysis

Overview

Based on DDC methodology (Chapter 2.3), this skill provides comprehensive Pandas operations for construction data processing. Pandas is the Swiss Army knife for data analysts - handling everything from simple data filtering to complex aggregations across millions of rows.

Book Reference: "Pandas DataFrame и LLM ChatGPT" / "Pandas DataFrame and LLM ChatGPT"

"Используя Pandas, вы можете управлять и анализировать наборы данных, намного превосходящие возможности Excel. В то время как Excel способен обрабатывать до 1 миллиона строк данных, Pandas может без труда работать с наборами данных, содержащими десятки миллионов строк." — DDC Book, Chapter 2.3

Quick Start

import pandas as pd

Read construction data

df = pd.read_excel("bim_export.xlsx")

Basic operations

print(df.head()) # First 5 rows print(df.info()) # Column types and memory print(df.describe()) # Statistics for numeric columns

Filter structural elements

structural = df[df['Category'] == 'Structural']

Calculate total volume

total_volume = df['Volume'].sum() print(f"Total volume: {total_volume:.2f} m³")

DataFrame Fundamentals

Creating DataFrames

import pandas as pd

From dictionary (construction elements)

elements = pd.DataFrame({ 'ElementId': ['E001', 'E002', 'E003', 'E004'], 'Category': ['Wall', 'Floor', 'Wall', 'Column'], 'Material': ['Concrete', 'Concrete', 'Brick', 'Steel'], 'Volume_m3': [45.5, 120.0, 32.0, 8.5], 'Level': ['Level 1', 'Level 1', 'Level 2', 'Level 1'] })

From CSV

df_csv = pd.read_csv("construction_data.csv")

From Excel

df_excel = pd.read_excel("project_data.xlsx", sheet_name="Elements")

From multiple Excel sheets

all_sheets = pd.read_excel("project.xlsx", sheet_name=None) # Dict of DataFrames

Data Types in Construction

Common data types for construction

df = pd.DataFrame({ 'element_id': pd.Series(['W001', 'W002'], dtype='string'), 'quantity': pd.Series([10, 20], dtype='int64'), 'volume': pd.Series([45.5, 32.0], dtype='float64'), 'is_structural': pd.Series([True, False], dtype='bool'), 'created_date': pd.to_datetime(['2024-01-15', '2024-01-16']), 'category': pd.Categorical(['Wall', 'Slab']) })

Check data types

print(df.dtypes)

Convert types

df['quantity'] = df['quantity'].astype('float64') df['volume'] = pd.to_numeric(df['volume'], errors='coerce')

Filtering and Selection

Basic Filtering

Single condition

walls = df[df['Category'] == 'Wall']

Multiple conditions (AND)

large_concrete = df[(df['Material'] == 'Concrete') & (df['Volume_m3'] > 50)]

Multiple conditions (OR)

walls_or_floors = df[(df['Category'] == 'Wall') | (df['Category'] == 'Floor')]

Using isin for multiple values

structural = df[df['Category'].isin(['Wall', 'Column', 'Beam', 'Foundation'])]

String contains

insulated = df[df['Description'].str.contains('insulated', case=False, na=False)]

Null value filtering

incomplete = df[df['Cost'].isna()] complete = df[df['Cost'].notna()]

Advanced Selection

Select columns

volumes = df[['ElementId', 'Category', 'Volume_m3']]

Query syntax (SQL-like)

result = df.query("Category == 'Wall' and Volume_m3 > 30")

Loc and iloc

specific_row = df.loc[0] # By label range_rows = df.iloc[0:10] # By position specific_cell = df.loc[0, 'Volume_m3'] # Row and column subset = df.loc[0:5, ['Category', 'Volume_m3']] # Range with columns

Grouping and Aggregation

GroupBy Operations

Basic groupby

by_category = df.groupby('Category')['Volume_m3'].sum()

Multiple aggregations

summary = df.groupby('Category').agg({ 'Volume_m3': ['sum', 'mean', 'count'], 'Cost': ['sum', 'mean'] })

Named aggregations (cleaner output)

summary = df.groupby('Category').agg( total_volume=('Volume_m3', 'sum'), avg_volume=('Volume_m3', 'mean'), element_count=('ElementId', 'count'), total_cost=('Cost', 'sum') ).reset_index()

Multiple grouping columns

by_level_cat = df.groupby(['Level', 'Category']).agg({ 'Volume_m3': 'sum', 'Cost': 'sum' }).reset_index()

Pivot Tables

Create pivot table

pivot = pd.pivot_table( df, values='Volume_m3', index='Level', columns='Category', aggfunc='sum', fill_value=0, margins=True, # Add totals margins_name='Total' )

Multiple values

pivot_detailed = pd.pivot_table( df, values=['Volume_m3', 'Cost'], index='Level', columns='Category', aggfunc={'Volume_m3': 'sum', 'Cost': 'mean'} )

Data Transformation

Adding Calculated Columns

Simple calculation

df['Cost_Total'] = df['Volume_m3'] * df['Unit_Price']

Conditional column

df['Size_Category'] = df['Volume_m3'].apply( lambda x: 'Large' if x > 50 else ('Medium' if x > 20 else 'Small') )

Using np.where for binary conditions

import numpy as np df['Is_Large'] = np.where(df['Volume_m3'] > 50, True, False)

Using cut for binning

df['Volume_Bin'] = pd.cut( df['Volume_m3'], bins=[0, 10, 50, 100, float('inf')], labels=['XS', 'S', 'M', 'L'] )

String Operations

Extract from strings

df['Level_Number'] = df['Level'].str.extract(r'(\d+)').astype(int)

Split and expand

df[['Building', 'Floor']] = df['Location'].str.split('-', expand=True)

Clean strings

df['Category'] = df['Category'].str.strip().str.lower().str.title()

Replace values

df['Material'] = df['Material'].str.replace('Reinforced Concrete', 'RC')

Date Operations

Parse dates

df['Start_Date'] = pd.to_datetime(df['Start_Date'])

Extract components

df['Year'] = df['Start_Date'].dt.year df['Month'] = df['Start_Date'].dt.month df['Week'] = df['Start_Date'].dt.isocalendar().week df['DayOfWeek'] = df['Start_Date'].dt.day_name()

Calculate duration

df['Duration_Days'] = (df['End_Date'] - df['Start_Date']).dt.days

Filter by date range

recent = df[df['Start_Date'] >= '2024-01-01']

Merging and Joining

Merge DataFrames

Elements data

elements = pd.DataFrame({ 'ElementId': ['E001', 'E002', 'E003'], 'Category': ['Wall', 'Floor', 'Column'], 'Volume_m3': [45.5, 120.0, 8.5] })

Unit prices

prices = pd.DataFrame({ 'Category': ['Wall', 'Floor', 'Column', 'Beam'], 'Unit_Price': [150, 80, 450, 200] })

Inner join (only matching)

merged = elements.merge(prices, on='Category', how='inner')

Left join (keep all elements)

merged = elements.merge(prices, on='Category', how='left')

Join on different column names

result = df1.merge(df2, left_on='elem_id', right_on='ElementId')

Concatenating DataFrames

Vertical concatenation (stacking)

all_floors = pd.concat([floor1_df, floor2_df, floor3_df], ignore_index=True)

Horizontal concatenation

combined = pd.concat([quantities, costs, schedule], axis=1)

Append new rows

new_elements = pd.DataFrame({'ElementId': ['E004'], 'Category': ['Beam']}) df = pd.concat([df, new_elements], ignore_index=True)

Construction-Specific Analyses

Quantity Take-Off (QTO)

def generate_qto_report(df): """Generate Quantity Take-Off summary by category""" qto = df.groupby(['Category', 'Material']).agg( count=('ElementId', 'count'), total_volume=('Volume_m3', 'sum'), total_area=('Area_m2', 'sum'), avg_volume=('Volume_m3', 'mean') ).round(2)

# Add percentage column
qto['volume_pct'] = (qto['total_volume'] /
                      qto['total_volume'].sum() * 100).round(1)

return qto.sort_values('total_volume', ascending=False)

Usage

qto_report = generate_qto_report(df) qto_report.to_excel("qto_report.xlsx")

Cost Estimation

def calculate_project_cost(elements_df, prices_df, markup=0.15): """Calculate total project cost with markup""" # Merge with prices df = elements_df.merge(prices_df, on='Category', how='left')

# Calculate base cost
df['Base_Cost'] = df['Volume_m3'] * df['Unit_Price']

# Apply markup
df['Total_Cost'] = df['Base_Cost'] * (1 + markup)

# Summary by category
summary = df.groupby('Category').agg(
    volume=('Volume_m3', 'sum'),
    base_cost=('Base_Cost', 'sum'),
    total_cost=('Total_Cost', 'sum')
).round(2)

return df, summary, summary['total_cost'].sum()

Usage

detailed, summary, total = calculate_project_cost(elements, prices) print(f"Project Total: ${total:,.2f}")

Material Summary

def material_summary(df): """Summarize materials across project""" summary = df.groupby('Material').agg({ 'Volume_m3': 'sum', 'Weight_kg': 'sum', 'ElementId': 'nunique' }).rename(columns={'ElementId': 'Element_Count'})

summary['Volume_Pct'] = (summary['Volume_m3'] /
                          summary['Volume_m3'].sum() * 100).round(1)

return summary.sort_values('Volume_m3', ascending=False)

Level-by-Level Analysis

def analyze_by_level(df): """Analyze construction quantities by building level""" level_summary = df.pivot_table( values=['Volume_m3', 'Cost'], index='Level', columns='Category', aggfunc='sum', fill_value=0 )

level_summary['Total_Volume'] = level_summary['Volume_m3'].sum(axis=1)
level_summary['Total_Cost'] = level_summary['Cost'].sum(axis=1)

return level_summary

Data Export

Export to Excel with Multiple Sheets

def export_to_excel_formatted(df, summary, filepath): """Export with multiple sheets""" with pd.ExcelWriter(filepath, engine='openpyxl') as writer: df.to_excel(writer, sheet_name='Details', index=False) summary.to_excel(writer, sheet_name='Summary')

    pivot = pd.pivot_table(df, values='Volume_m3',
                           index='Level', columns='Category')
    pivot.to_excel(writer, sheet_name='By_Level')

Usage

export_to_excel_formatted(elements, qto_summary, "project_report.xlsx")

Export to CSV

Basic export

df.to_csv("output.csv", index=False)

With encoding for special characters

df.to_csv("output.csv", index=False, encoding='utf-8-sig')

Specific columns

df[['ElementId', 'Category', 'Volume_m3']].to_csv("volumes.csv", index=False)

Performance Tips

Use categories for string columns with few unique values

df['Category'] = df['Category'].astype('category')

Read only needed columns

df = pd.read_csv("large_file.csv", usecols=['ElementId', 'Category', 'Volume'])

Use chunking for very large files

chunks = pd.read_csv("huge_file.csv", chunksize=100000) result = pd.concat([chunk[chunk['Category'] == 'Wall'] for chunk in chunks])

Check memory usage

print(df.memory_usage(deep=True).sum() / 1024**2, "MB")

Quick Reference

Operation Code

Read Excel pd.read_excel("file.xlsx")

Read CSV pd.read_csv("file.csv")

Filter rows df[df['Column'] == 'Value']

Select columns df[['Col1', 'Col2']]

Group and sum df.groupby('Cat')['Vol'].sum()

Pivot table pd.pivot_table(df, values='Vol', index='Level')

Merge df1.merge(df2, on='key')

Add column df['New'] = df['A'] * df['B']

Export Excel df.to_excel("out.xlsx", index=False)

Resources

Next Steps

  • See llm-data-automation for generating Pandas code with AI

  • See qto-report for specialized QTO calculations

  • See cost-estimation-resource for detailed cost calculations

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Research

weather-impact-analysis

No summary provided by upstream source.

Repository SourceNeeds Review
Research

data-evolution-analysis

No summary provided by upstream source.

Repository SourceNeeds Review
Research

bid-analysis-comparator

No summary provided by upstream source.

Repository SourceNeeds Review
Research

5000-projects-analysis

No summary provided by upstream source.

Repository SourceNeeds Review