PyThesisPlot
A Python scientific plotting workflow tool that covers the complete process from data upload to figure generation for academic publications.
Workflow
[User Uploads Data] → [Auto-save to output dir] → [Data Analysis]
↓
[Generate Images to output dir] ← [Code Generation] ← [User Confirms Scheme]
Required Steps
- Data Reception: User uploads data file (txt/md/xlsx/csv)
- Auto-save: Rename to timestamp-original_filename, save to output/YYYYMMDD-filename/
- Data Analysis: Analyze dimensions, types, statistical features, column relationships
- Chart Recommendations: Recommend chart schemes based on data characteristics (type, quantity, layout)
- User Confirmation: Display analysis report, must wait for user confirmation before generation
- Generation & Delivery: Python code + chart images, save to same output directory
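The auto-save step above could be sketched as follows. Only the name `save_uploaded_file` and its `input_file` / `output_base` parameters appear in this document; the body, the `now` parameter, and the choice to copy rather than move the upload are assumptions made for illustration, not the tool's actual implementation.

```python
import shutil
from datetime import datetime
from pathlib import Path


def save_uploaded_file(input_file, output_base="output/", now=None):
    """Copy an upload into {output_base}/{timestamp}-{filename}/ under a timestamped name.

    Hypothetical sketch: `now` is an added parameter so the timestamp can be
    fixed in tests; it is not part of the documented call.
    """
    src = Path(input_file)
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    new_name = f"{stamp}-{src.name}"           # e.g. 20250312-145230-data.csv
    target_dir = Path(output_base) / new_name  # directory shares the renamed file's name
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / new_name
    shutil.copy2(src, target)                  # copy, leaving the original upload untouched
    return target
```

The returned path can then be passed to the analysis step, so every later artifact lands in the same directory.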
Core Scripts
1. Main Workflow Script
python scripts/workflow.py --input data.csv --output-dir output/
2. Data Analysis
python scripts/data_analyzer.py --input data.csv
Output: Data characteristics report + chart recommendation scheme
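As a rough idea of what a data characteristics report might be built from, the sketch below profiles each column with pandas. The function name `profile_columns`, the three coarse type labels, and the summary strings are assumptions for illustration; the internals of `data_analyzer.py` are not described in this document.

```python
import pandas as pd


def profile_columns(df):
    """Return one summary dict per column: name, coarse type, short description."""
    summary = []
    for name in df.columns:
        col = df[name]
        if pd.api.types.is_datetime64_any_dtype(col):
            kind = "datetime"
            detail = f"{col.min():%Y-%m} to {col.max():%Y-%m}"
        elif pd.api.types.is_numeric_dtype(col):
            kind = "numeric"
            detail = f"mean={col.mean():.0f}, std={col.std():.0f}"
        else:
            # Anything else is treated as categorical in this sketch.
            kind = "categorical"
            detail = f"{col.nunique()} categories"
        summary.append({"column": name, "type": kind, "detail": detail})
    return summary
```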
3. Chart Generation
python scripts/plot_generator.py --config plot_config.json --output-dir output/
File Management Standards
Directory Structure
output/
└── 20250312-145230-data.csv/              # Named with timestamp + filename
    ├── 20250312-145230-data.csv           # Original data file (renamed)
    ├── analysis_report.md                 # Data analysis report
    ├── plot_config.json                   # Chart configuration (generated after user confirmation)
    ├── 20250312-145230_plot.py            # Generated Python code
    ├── 20250312-145230_fig1_line.png      # Chart (PNG image)
    └── 20250312-145230_fig2_bar.png
Naming Conventions
| File Type | Naming Format | Example |
|---|---|---|
| Data File | {timestamp}-{original} | 20250312-145230-data.csv |
| Analysis Report | analysis_report.md | analysis_report.md |
| Python Code | {timestamp}_plot.py | 20250312-145230_plot.py |
| Chart PNG | {timestamp}_fig{n}_{type}.png | 20250312-145230_fig1_line.png |
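The naming formats in the table can be expressed as small helper functions. The helper names below are hypothetical and exist only to make the patterns concrete; the timestamp layout `YYYYMMDD-HHMMSS` is inferred from the examples in the table.

```python
from datetime import datetime


def make_timestamp(now=None):
    """Format a timestamp as YYYYMMDD-HHMMSS, e.g. 20250312-145230."""
    return (now or datetime.now()).strftime("%Y%m%d-%H%M%S")


def data_file_name(timestamp, original):
    """{timestamp}-{original}"""
    return f"{timestamp}-{original}"


def code_file_name(timestamp):
    """{timestamp}_plot.py"""
    return f"{timestamp}_plot.py"


def chart_file_name(timestamp, n, chart_type):
    """{timestamp}_fig{n}_{type}.png"""
    return f"{timestamp}_fig{n}_{chart_type}.png"
```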
Usage
Scenario 1: Complete Workflow
When user uploads a data file:
- Auto-save File
    # Rename and save to output/{timestamp}-{filename}/
    save_uploaded_file(input_file, output_base="output/")
- Execute Data Analysis
    # Analyze data characteristics, generate report
    python scripts/data_analyzer.py --input output/20250312-data/data.csv
- Display Analysis Report to User
    ## Data Analysis Report

    ### Data Overview
    - File: data.csv
    - Dimensions: 120 rows × 5 columns
    - Types: 3 numeric + 2 categorical columns

    ### Column Details
    | Column | Type | Description |
    |-----|------|-----|
    | date | datetime | 2023-01 to 2023-12 |
    | sales | numeric | mean=1250, std=320 |
    | region | categorical | 4 categories: N/S/E/W |

    ### Chart Recommendations
    Based on data characteristics, the following schemes are recommended:

    **Scheme 1: Time Trend Analysis** ⭐Recommended
    - Chart Type: Line plot
    - Content: Sales trend over time
    - Reason: Time series data, most intuitive for showing trends

    **Scheme 2: Regional Comparison**
    - Chart Type: Grouped bar chart
    - Content: Sales comparison across regions
    - Reason: Categorical comparison, suitable for showing differences

    **Scheme 3: Comprehensive Dashboard**
    - Chart Type: 2×2 subplot layout
    - Includes: Trend line + Bar chart + Box plot + Correlation heatmap
    - Reason: Rich data dimensions, comprehensive display

    Please tell me what you want:
    - "Generate schemes 1 and 2"
    - "Generate all"
    - "Modify scheme 3..." (provide your modification suggestions)
- Wait for User Confirmation ⚠️ Critical Step
    - User may say: "Generate scheme 1" / "Generate all" / "Modify XX..."
    - Must wait for explicit instruction before entering generation phase
- Generate and Save
    # Generate Python code
    python scripts/plot_generator.py --config plot_config.json

    # Output to same directory
    output/20250312-data/
    ├── 20250312-145230_plot.py            # Code
    ├── 20250312-145230_fig1_line.png      # Chart
    └── 20250312-145230_fig2_bar.png
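The confirmation gate in the steps above can be illustrated with a small parser for the example replies. The function `parse_confirmation` and its matching rules are assumptions made for this sketch; the document only specifies that generation must wait for an explicit instruction.

```python
import re


def parse_confirmation(reply, scheme_count):
    """Return the scheme numbers to generate, or [] if there is no explicit go-ahead."""
    text = reply.strip().lower()
    if not text.startswith("generate"):
        return []  # "Modify ..." or anything else: stay in the waiting state
    if re.search(r"\ball\b", text):
        return list(range(1, scheme_count + 1))
    numbers = {int(n) for n in re.findall(r"\d+", text)}
    # Ignore numbers that do not correspond to an offered scheme.
    return sorted(n for n in numbers if 1 <= n <= scheme_count)
```

An empty result means the workflow should not enter the generation phase; a modification request would instead lead to a revised scheme and another round of confirmation.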
Scenario 2: Data Analysis Only
python scripts/data_analyzer.py --input data.csv --output report.md
Scenario 3: Generate from Config
python scripts/plot_generator.py --config config.json --output-dir ./
Chart Recommendation Logic
| Data Characteristics | Recommended Chart | Application |
|---|---|---|
| Time series + Numeric | Line plot | Trend display |
| Categorical + Single numeric | Bar chart | Category comparison |
| Categorical + Distribution | Box/Violin plot | Distribution display |
| Two numeric (correlated) | Scatter (+regression) | Correlation analysis |
| Multiple numeric (correlated) | Heatmap | Correlation matrix |
| Single numeric distribution | Histogram/Density | Distribution characteristics |
| Multi-dimensional rich data | 2×2 subplots | Comprehensive display |
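One way to read the table above is as a set of rules over column counts. The sketch below is a simplified illustration under stated assumptions: the function name, the thresholds, and the rule that four or more candidate charts count as "rich" data are all hypothetical, and it does not test whether numeric columns are actually correlated.

```python
def recommend_charts(n_datetime, n_numeric, n_categorical):
    """Map column-type counts to chart names from the recommendation table."""
    recs = []
    if n_datetime >= 1 and n_numeric >= 1:
        recs.append("Line plot")              # time series + numeric
    if n_categorical >= 1 and n_numeric >= 1:
        recs.append("Bar chart")              # category comparison
        recs.append("Box/Violin plot")        # distribution per category
    if n_numeric == 2:
        recs.append("Scatter (+regression)")  # two numeric columns
    elif n_numeric > 2:
        recs.append("Heatmap")                # correlation matrix
    if n_numeric >= 1:
        recs.append("Histogram/Density")      # single-column distribution
    if len(recs) >= 4:
        recs.append("2×2 subplots")           # assumed cutoff for "rich" data
    return recs
```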
Supported File Formats
- CSV: .csv (Recommended)
- Excel: .xlsx, .xls
- Text: .txt, .md (table format)
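For the text formats, "table format" presumably means a pipe-delimited table like those used in Markdown. A minimal parser under that assumption might look like the following; `parse_markdown_table` is a hypothetical helper, it keeps all values as strings, and it does not handle escaped pipes inside cells.

```python
def parse_markdown_table(text):
    """Parse a pipe-delimited Markdown table into a list of row dicts keyed by header."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip prose around the table
        cells = [c.strip() for c in line.strip("|").split("|")]
        if all(c and set(c) <= set("-: ") for c in cells):
            continue  # skip the |---|---| separator row
        rows.append(cells)
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, r)) for r in body]
```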
Dependencies
pandas >= 1.3.0
matplotlib >= 3.5.0
seaborn >= 0.11.0
openpyxl >= 3.0.0 # Excel support
numpy >= 1.20.0
scipy >= 1.7.0
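The minimum versions above can be checked at startup with the standard library. This is a sketch, not part of the documented scripts: `check_dependencies` and `version_tuple` are hypothetical helpers, and the comparison only looks at the first three numeric components of each version string.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the dependency list above.
MINIMUM_VERSIONS = {
    "pandas": "1.3.0",
    "matplotlib": "3.5.0",
    "seaborn": "0.11.0",
    "openpyxl": "3.0.0",
    "numpy": "1.20.0",
    "scipy": "1.7.0",
}


def version_tuple(text):
    """Turn '1.20.0' or '2.1.0rc1' into a comparable (major, minor, patch) tuple."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)  # pad short versions such as '3.10'
    return tuple(parts)


def check_dependencies(minimums=MINIMUM_VERSIONS):
    """Return {package: problem} for packages that are missing or too old."""
    problems = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if version_tuple(installed) < version_tuple(minimum):
            problems[name] = f"{installed} < {minimum}"
    return problems
```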
Reference Documents
- Workflow Guide - Complete workflow instructions
- Chart Types - Detailed chart type descriptions
- Style Guide - Color schemes, fonts, size standards
- Examples - Complete code examples
Important Notes
- User confirmation is mandatory: Always wait for explicit user confirmation after analysis; never generate charts directly
- Unified file management: All output files saved to same output/{timestamp}-{filename}/ directory
- High-resolution output: Generate PNG at 300 DPI (suitable for publication)
- Code traceability: Generated Python code also saved to same directory for user modification
- Academic style: Charts follow top journal standards (Nature/Science/Lancet style)
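The 300 DPI requirement maps directly onto the `dpi` argument of matplotlib's `savefig`. The sketch below assumes a hypothetical helper `save_line_plot` and an illustrative figure size of 3.5 × 2.5 inches; neither is taken from the generated code.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt


def save_line_plot(x, y, path, figsize=(3.5, 2.5)):
    """Draw a simple line plot and save it as a PNG at 300 DPI."""
    fig, ax = plt.subplots(figsize=figsize)
    ax.plot(x, y)
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    fig.savefig(path, dpi=300)  # pixel size = figsize in inches × 300
    plt.close(fig)              # release the figure's memory
    return path
```

With the default figure size used here, the saved image is 1050 × 750 pixels.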