BigQuery Data Export

Use this skill when exporting data from BigQuery to Cloud Storage or local files.

Basic Extract Command

bq extract \
  --location=LOCATION \
  --destination_format=FORMAT \
  --compression=COMPRESSION \
  PROJECT:DATASET.TABLE \
  gs://bucket/file.ext

Extract to CSV

Basic CSV Export

bq extract \
  --destination_format=CSV \
  --print_header=true \
  dataset.table \
  gs://bucket/export.csv

CSV with Options

bq extract \
  --destination_format=CSV \
  --compression=GZIP \
  --field_delimiter=',' \
  --print_header=true \
  dataset.table \
  gs://bucket/export.csv.gz

CSV flags:

  • --field_delimiter=',' (column separator)

  • --print_header=true/false (include header row)

Tab-Delimited Export

bq extract \
  --destination_format=CSV \
  --field_delimiter=$'\t' \
  dataset.table \
  gs://bucket/export.tsv

Extract to JSON

Newline-Delimited JSON

bq extract \
  --destination_format=NEWLINE_DELIMITED_JSON \
  --compression=GZIP \
  dataset.table \
  gs://bucket/export.json.gz

Output format:

{"id": 1, "name": "Alice", "amount": 100.50} {"id": 2, "name": "Bob", "amount": 250.75}

Extract to Avro

bq extract \
  --destination_format=AVRO \
  --compression=SNAPPY \
  dataset.table \
  gs://bucket/export.avro

Benefits:

  • Preserves schema

  • Efficient binary format

  • Fast re-import to BigQuery (see the round-trip sketch below)
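As a round-trip illustration, this sketch exports a table to Avro and loads it back without supplying a schema, since the Avro container carries it (the table and bucket names are placeholders):

# Export with the schema embedded in the Avro file
bq extract --destination_format=AVRO dataset.table gs://bucket/roundtrip.avro

# Re-import; no --schema flag needed because Avro is self-describing
bq load --source_format=AVRO dataset.table_copy gs://bucket/roundtrip.avro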

Extract to Parquet

bq extract \
  --destination_format=PARQUET \
  dataset.table \
  gs://bucket/export.parquet

Benefits:

  • Columnar format

  • Good compression

  • Compatible with many analytics tools

Compression Options

Available compression:

  • GZIP: good compression, slower (CSV, JSON, Avro)

  • SNAPPY: fast, moderate compression (Avro, Parquet)

  • DEFLATE: similar to GZIP (Avro)

  • NONE: no compression (fastest)

Example:

bq extract \
  --destination_format=CSV \
  --compression=GZIP \
  dataset.table \
  gs://bucket/export.csv.gz

Large Table Exports

Using Wildcards (>1 GB)

BigQuery limitation: 1 GB per file

Solution: Use wildcard in destination

bq extract \
  --destination_format=CSV \
  dataset.large_table \
  'gs://bucket/export-*.csv'

Output:

gs://bucket/export-000000000000.csv
gs://bucket/export-000000000001.csv
gs://bucket/export-000000000002.csv
...
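To reassemble the shards into one local file, gsutil can stream them in lexicographic (and therefore shard-number) order (a sketch; note that with --print_header=true every shard carries its own header row, which you may need to strip):

# Concatenate all shards; each shard repeats the CSV header
gsutil cat 'gs://bucket/export-*.csv' > combined.csv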

Shard Pattern

Create sharded exports

bq extract \
  --destination_format=AVRO \
  dataset.large_table \
  'gs://bucket/shard/data-*.avro'

Note: the number of output files depends on the data size and is not configurable.

Export Specific Partitions

Single Partition

Export 2024-01-15 partition only

# Quote the table reference so the shell does not expand $20240115
bq extract \
  --destination_format=CSV \
  'dataset.partitioned_table$20240115' \
  gs://bucket/export_20240115.csv
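To export several daily partitions in one pass, a small bash loop over partition decorators works (a sketch; the dates, table, and bucket are placeholders):

for d in 20240115 20240116 20240117; do
  bq extract \
    --destination_format=CSV \
    "dataset.partitioned_table\$${d}" \
    "gs://bucket/export_${d}.csv"
done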

Date Range (use WHERE in EXPORT DATA)

See EXPORT DATA section below.

EXPORT DATA SQL Statement

Basic EXPORT DATA

EXPORT DATA OPTIONS(
  uri='gs://bucket/export-*.csv',
  format='CSV',
  overwrite=true,
  header=true,
  field_delimiter=','
) AS
SELECT *
FROM `project.dataset.table`
WHERE date >= '2024-01-01';

Export Query Results

EXPORT DATA OPTIONS(
  uri='gs://bucket/aggregated-*.parquet',
  format='PARQUET',
  overwrite=true
) AS
SELECT
  customer_id,
  DATE(order_timestamp) AS order_date,
  SUM(amount) AS total_amount,
  COUNT(*) AS order_count
FROM `project.dataset.orders`
WHERE DATE(order_timestamp) >= '2024-01-01'
GROUP BY customer_id, order_date;

Format Options

CSV:

EXPORT DATA OPTIONS(
  uri='gs://bucket/*.csv',
  format='CSV',
  header=true,
  field_delimiter=',',
  compression='GZIP'
) AS
SELECT ...;

JSON:

EXPORT DATA OPTIONS(
  uri='gs://bucket/*.json',
  format='JSON',
  compression='GZIP'
) AS
SELECT ...;

Avro:

EXPORT DATA OPTIONS(
  uri='gs://bucket/*.avro',
  format='AVRO',
  compression='SNAPPY'
) AS
SELECT ...;

Parquet:

EXPORT DATA OPTIONS(
  uri='gs://bucket/*.parquet',
  format='PARQUET'
) AS
SELECT ...;

Export to Local Files (Not Recommended)

Small Results via Query

For small datasets only

bq query \
  --format=csv \
  --max_rows=10000 \
  --use_legacy_sql=false \
  'SELECT * FROM project.dataset.table LIMIT 10000' \
  > local_export.csv

Limitation: Not suitable for large datasets. Use GCS for production.
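For anything larger, export to GCS first and then copy the files down (a sketch; the bucket and paths are placeholders):

# Export to GCS, then download the shards in parallel
bq extract --destination_format=CSV --compression=GZIP \
  dataset.table 'gs://bucket/export-*.csv.gz'
mkdir -p exports
gsutil -m cp 'gs://bucket/export-*.csv.gz' ./exports/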

Scheduled Exports (Automation)

Using Scheduled Queries + EXPORT DATA

Create scheduled query

bq mk --transfer_config \
  --target_dataset=dataset \
  --display_name='Daily Export' \
  --schedule='every 24 hours' \
  --params='{"query":"EXPORT DATA OPTIONS(uri=\"gs://bucket/daily-*.csv\", format=\"CSV\") AS SELECT * FROM dataset.table WHERE date = CURRENT_DATE()"}' \
  --data_source=scheduled_query
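To confirm the transfer config was created (the location flag must match the dataset's region):

bq ls --transfer_config --transfer_location=us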

Using Cloud Composer (Airflow)

from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

export_task = BigQueryInsertJobOperator(
    task_id='export_to_gcs',
    configuration={
        'extract': {
            'sourceTable': {
                'projectId': 'project',
                'datasetId': 'dataset',
                'tableId': 'table',
            },
            'destinationUris': ['gs://bucket/export-*.csv'],
            'destinationFormat': 'CSV',
        }
    },
)

Monitoring Exports

Check Extract Jobs

bq ls --jobs --max_results=10

Job Details

bq show -j JOB_ID
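In scripts it is often useful to block until the job finishes; the bq CLI's wait command does this (JOB_ID is a placeholder):

# Wait up to 600 seconds for the job to complete
bq wait JOB_ID 600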

Failed Exports

SELECT
  job_id,
  user_email,
  error_result.message AS error_message,
  creation_time
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE job_type = 'EXTRACT'
  AND state = 'DONE'
  AND error_result IS NOT NULL
  AND creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
ORDER BY creation_time DESC;

Export Best Practices

Format Selection

CSV:

  • ✅ Human-readable

  • ✅ Universal compatibility

  • ❌ Larger file size

  • ❌ No schema preservation

JSON:

  • ✅ Human-readable

  • ✅ Preserves nested structures

  • ❌ Larger file size

Avro:

  • ✅ Preserves schema

  • ✅ Efficient binary format

  • ✅ Fast BigQuery re-import

  • ❌ Not human-readable

Parquet:

  • ✅ Columnar format

  • ✅ Good compression

  • ✅ Analytics tool compatible

  • ❌ Not human-readable

Compression Recommendations

  • Long-term storage: GZIP (best compression)

  • Processing pipelines: SNAPPY (fast)

  • Network transfer: GZIP (smaller size)

  • Speed: NONE (no compression overhead)

Wildcards for Large Exports

Always use wildcards for:

  • Tables >500 MB

  • Unknown data size

  • Distributed processing

Example:

bq extract --destination_format=AVRO dataset.large_table 'gs://bucket/export-*.avro'

Cost Considerations

Export Costs

  • BigQuery extract: FREE

  • GCS storage: Standard GCS pricing

  • Network egress: Free within same region

Optimization

Reduce costs:

  • Export only needed columns (use EXPORT DATA with SELECT)

  • Filter rows before export (WHERE clause)

  • Use compression (smaller files)

  • Export to GCS in same region as BigQuery

Example - filtered export:

EXPORT DATA OPTIONS(
  uri='gs://bucket/*.parquet',
  format='PARQUET'
) AS
SELECT
  customer_id,
  order_date,
  amount  -- only needed columns
FROM `project.dataset.orders`
WHERE order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)  -- last 30 days only
  AND amount > 0;  -- filter out zero amounts

Common Patterns

Daily Export

#!/bin/bash
DATE=$(date +%Y%m%d)
bq extract \
  --destination_format=CSV \
  --compression=GZIP \
  "dataset.table\$${DATE}" \
  "gs://bucket/exports/daily_export_${DATE}.csv.gz"
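To run the script on a schedule from a VM, a crontab entry along these lines would work (the script and log paths are placeholders):

# Run the export every day at 02:00
0 2 * * * /path/to/daily_export.sh >> /var/log/daily_export.log 2>&1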

Incremental Export

-- Create temp table with new data
CREATE TEMP TABLE new_data AS
SELECT *
FROM `project.dataset.table`
WHERE updated_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);

-- Export only new data
EXPORT DATA OPTIONS(
  uri='gs://bucket/incremental/data-*.parquet',
  format='PARQUET'
) AS
SELECT * FROM new_data;
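Because temp tables live only for the session, both statements must run as a single multi-statement script; with the bq CLI that can look like this (the file name is a placeholder):

# incremental_export.sql holds both statements, separated by a semicolon
bq query --use_legacy_sql=false < incremental_export.sql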

Export with Transformation

EXPORT DATA OPTIONS(
  uri='gs://bucket/transformed-*.csv',
  format='CSV'
) AS
SELECT
  customer_id,
  UPPER(customer_name) AS customer_name,
  ROUND(amount, 2) AS amount,
  FORMAT_DATE('%Y-%m-%d', order_date) AS order_date
FROM `project.dataset.orders`
WHERE order_date >= '2024-01-01';

Troubleshooting

"Permission denied"

Problem: No write access to the GCS bucket
Solution: Grant the principal running the export (your user account, or the service account behind a scheduled query) the Storage Object Creator role (roles/storage.objectCreator) on the bucket
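Granting that role on the bucket might look like this (the member and bucket are placeholders):

gsutil iam ch user:you@example.com:roles/storage.objectCreator gs://bucket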

"Table too large"

Problem: Export exceeds 1 GB without a wildcard
Solution: Use a wildcard pattern: gs://bucket/export-*.csv

"Invalid URI"

Problem: Incorrect GCS path format
Solution: Use the gs://bucket/path/file format, not https://

"Quota exceeded"

Problem: Too many extract jobs
Solution: Batch exports or increase quota

Quick Reference

Format recommendations:

  • Re-import to BigQuery → Avro

  • Analytics tools → Parquet

  • Data exchange → CSV

  • API consumption → JSON

Compression guide:

  • Best ratio → GZIP

  • Fastest → SNAPPY or NONE

  • Balance → SNAPPY

Size limits:

  • 1 GB per file (use wildcards)

  • 10 TB per extract job

  • 50,000 URIs per export

Syntax patterns:

Single file

gs://bucket/file.csv

Wildcard (recommended)

'gs://bucket/prefix-*.csv'

Sharded with path

'gs://bucket/path/to/shard-*.parquet'
