Goldsky Dataset Reference
Reference tables for blockchain datasets available in Turbo pipelines.
For quick dataset questions (e.g., "what dataset for Solana transfers?"), answer directly: identify the chain prefix (see Popular Chain Prefixes below), identify the dataset type (see Common Datasets), and return a YAML snippet like:
    sources:
      my_source:
        type: dataset
        dataset_name: <chain>.<dataset>
        version: 1.0.0
        start_at: earliest
Tip: Use `goldsky turbo validate` to verify that a dataset exists (fast, ~3 seconds). Avoid `goldsky dataset list`, which is slow (30-60+ seconds).
Dataset Reference Files
Detailed dataset and chain information is in the `data/` folder.
| File | Contents |
|---|---|
| verified-datasets.json | All validated datasets with versions, schemas, and use cases |
| chain-prefixes.json | All chain prefixes, chain IDs, and common mistakes |
Data location: data/ (relative to this skill's directory)
Quick Reference
| Action | Command | Notes |
|---|---|---|
| Validate dataset | goldsky turbo validate file.yaml | Preferred - fast (3s) |
| Search for dataset | goldsky dataset list \| grep "name" | Slow (30-60s), use sparingly |
| List all datasets | goldsky dataset list | Very slow - avoid |
Common Datasets
| What You Need | Dataset | Example |
|---|---|---|
| Token transfers (ERC-20) | <chain>.erc20_transfers | base.erc20_transfers (v1.2.0) |
| NFT transfers (ERC-721) | <chain>.erc721_transfers | ethereum.erc721_transfers (v1.0.0) |
| Transactions | <chain>.raw_transactions | ethereum.raw_transactions (v1.0.0) |
| Event logs | <chain>.raw_logs | base.raw_logs (v1.0.0) |
| Solana tokens | solana.token_transfers | v1.0.0 |
| Bitcoin transactions | bitcoin.raw.transactions | v1.0.0 |
| Stellar transfers | stellar_mainnet.transfers | v1.1.0 |
Important: On EVM chains, use `raw_transactions`, NOT `transactions`.
Popular Chain Prefixes
| Chain | Prefix | Note |
|---|---|---|
| Ethereum | ethereum | |
| Base | base | |
| Polygon | matic | NOT polygon |
| Arbitrum | arbitrum | |
| Optimism | optimism | |
| BSC | bsc | |
| Avalanche | avalanche | |
| Solana | solana | Uses start_block not start_at |
| Bitcoin | bitcoin.raw | Uses start_at like EVM |
| Stellar | stellar_mainnet | Uses start_at like EVM |
| Sui | sui | Uses start_at like EVM |
| NEAR | near | Uses start_at like EVM |
| Starknet | starknet | Uses start_at like EVM |
| Fogo | fogo | Uses start_at like EVM |
See data/chain-prefixes.json for complete list with chain IDs.
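The Note column above comes down to a one-key difference in the source definition. The fragment below is a minimal sketch: the source keys `sol_transfers` and `btc_txs` are hypothetical names, and the `start_block` value is a placeholder, since the values it accepts are not covered in this reference. Confirm the result with `goldsky turbo validate`.

```yaml
sources:
  # Solana: the start position is given with start_block (see the Note column)
  sol_transfers:                # hypothetical source key
    type: dataset
    dataset_name: solana.token_transfers
    version: 1.0.0
    start_block: <slot_number>  # placeholder; accepted values are not documented here
  # Bitcoin: uses start_at, the same as EVM chains
  btc_txs:                      # hypothetical source key
    type: dataset
    dataset_name: bitcoin.raw.transactions
    version: 1.0.0
    start_at: earliest
```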
Common Dataset Types
EVM Chains
| Dataset Type | Description | Use Case |
|---|---|---|
| blocks | Block headers with metadata | Block explorers, timing analysis |
| raw_transactions | Transaction data | Wallet activity, gas analysis |
| raw_logs | Raw event logs | Custom event filtering |
| raw_traces | Internal transaction traces | MEV analysis, contract interactions |
| erc20_transfers | Fungible token transfers | Token tracking, DeFi analytics |
| erc721_transfers | NFT transfers | NFT marketplaces, ownership tracking |
| erc1155_transfers | Multi-token transfers | Gaming, multi-token standards |
| decoded_logs | ABI-decoded event logs | Specific contract events |
Important: Use `raw_transactions`, NOT `transactions`. Use `raw_logs`, NOT `logs` (though `logs` works as an alias on some chains).
Solana
| Dataset Type | Description | Use Case |
|---|---|---|
| blocks | Block data with leader info | Chain analysis |
| transactions | Transaction data with balances | Wallet activity |
| transactions_with_instructions | Transactions + nested instructions | Multi-instruction analysis |
| instructions | Individual instructions | Program-specific analysis |
| token_transfers | SPL token transfers | Token tracking |
| native_balances | SOL balance changes | Whale tracking |
| token_balances | SPL token balance changes | Portfolio tracking |
| rewards | Validator rewards | Staking analysis |
Bitcoin
| Dataset Type | Description | Use Case |
|---|---|---|
| bitcoin.raw.blocks | Block data (hash, difficulty, size) | Network analysis |
| bitcoin.raw.transactions | Transactions (inputs, outputs, values) | Payment tracking |
Stellar
All datasets use version 1.1.0:
| Dataset Type | Description | Use Case |
|---|---|---|
| stellar_mainnet.transactions | All network transactions | Account monitoring |
| stellar_mainnet.transfers | All transfer events | Asset tracking |
| stellar_mainnet.events | All events (contract + operation) | Contract monitoring |
| stellar_mainnet.operations | Operations within transactions | Action tracking |
| stellar_mainnet.ledger_entries | Ledger state changes | State analysis |
| stellar_mainnet.ledgers | Ledger metadata | Network analysis |
| stellar_mainnet.balances | Account balance changes | Balance tracking |
Sui
| Dataset Type | Description | Use Case |
|---|---|---|
| sui.checkpoints | Checkpoint data | Chain analysis |
| sui.transactions | Transaction data | Activity monitoring |
| sui.events | Move contract events | dApp event tracking |
| sui.packages | Deployed Move packages | Package discovery |
| sui.epochs | Epoch data with validators | Staking/validator analysis |
NEAR
| Dataset Type | Description | Use Case |
|---|---|---|
| near.receipts | Execution receipts | Contract interaction tracking |
| near.transactions | Signed transactions | Activity monitoring |
| near.execution_outcomes | Execution results | Success/failure analysis |
Starknet
| Dataset Type | Description | Use Case |
|---|---|---|
| starknet.blocks | Block data | Chain analysis |
| starknet.transactions | Transaction data | Activity monitoring |
| starknet.events | Contract events | dApp event tracking |
| starknet.messages | L1↔L2 messages | Bridge monitoring |
Fogo
| Dataset Type | Description | Use Case |
|---|---|---|
| fogo.transactions_with_instructions | Transactions with instructions | Full activity tracking |
| fogo.rewards | Validator rewards | Staking analysis |
| fogo.blocks | Block data | Chain analysis |
Dataset Schemas
Source: docs.goldsky.com. Do not use field names that are not listed here; instead, ask the user to run `goldsky dataset list` to inspect unknown schemas.
Solana
solana.transactions
| Field | Type | Notes |
|---|---|---|
| id | string | |
| index | integer | tx position in block |
| block_slot | integer | slot number |
| block_hash | string | |
| block_timestamp | timestamp | |
| signature | string | transaction signature |
| recent_block_hash | string | |
| fee | integer | in lamports |
| status | integer | 1 = success |
| err | string \| null | error if failed |
| accounts | string[] | all involved accounts |
| balance_changes | object[] | {account, before, after} in lamports |
| log_messages | string[] | program execution logs |
| compute_units_consumed | integer | |
No `from_address` or `to_address` on Solana transactions; use the `accounts` array instead.
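As an illustration of the fields above, the transform below keeps only failed Solana transactions, relying on the documented convention that `status` is 1 on success. It is a sketch, not a complete pipeline: the transform name and the source key `sol_txs` are hypothetical, and it assumes a source for `solana.transactions` is defined under that key.

```yaml
transforms:
  failed_sol_txs:               # hypothetical transform name
    type: sql
    primary_key: id
    sql: |
      SELECT id, signature, block_slot, fee, err
      FROM sol_txs              -- hypothetical source key for solana.transactions
      WHERE status <> 1         -- 1 = success, so everything else is treated as failed
```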
solana.transactions_with_instructions
All fields from solana.transactions plus:
| Field | Type | Notes |
|---|---|---|
| pre_token_balances | object[] | token balances before tx |
| post_token_balances | object[] | token balances after tx |
| instructions | object[] | see below |
Instruction object fields: id, index, parent_index, block_slot, block_timestamp, block_hash, tx_fee, tx_index, program_id, data (base58), accounts (string[]), status, err
solana.instructions
| Field | Type | Notes |
|---|---|---|
| id | string | |
| index | integer | position in tx |
| parent_index | integer \| null | for inner instructions |
| block_slot | integer | |
| block_timestamp | timestamp | |
| block_hash | string | |
| program_id | string | executing program address |
| data | string | base58 encoded |
| accounts | string[] | instruction accounts |
| status | integer | |
| err | string \| null | |
solana.token_transfers
| Field | Type | Notes |
|---|---|---|
| id | string | |
| token_mint_address | string | mint address |
| from_token_account | string | source token account |
| to_token_account | string | dest token account |
| amount | number | raw amount |
| decimals | integer | token decimals |
| block_slot | integer | |
| block_timestamp | timestamp | |
| signature | string | tx signature |
solana.native_balances
| Field | Type | Notes |
|---|---|---|
| id | string | |
| block_slot | integer | slot number |
| block_hash | string | |
| block_timestamp | timestamp | |
| tx_index | integer | transaction position in block |
| signature | string | transaction signature |
| account | string | account pubkey |
| amount_before | integer | lamports |
| amount_after | integer | lamports |
| _gs_op | string | Goldsky internal operation type |
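Because `amount_before` and `amount_after` are both in lamports (1 SOL = 1,000,000,000 lamports), the change per row is a simple subtraction. The transform below is a sketch of that calculation; the transform name and the source key `sol_balances` are hypothetical, and only basic SQL arithmetic is assumed.

```yaml
transforms:
  sol_balance_deltas:           # hypothetical transform name
    type: sql
    primary_key: id
    sql: |
      SELECT
        id,
        account,
        signature,
        block_timestamp,
        amount_after - amount_before AS delta_lamports
      FROM sol_balances         -- hypothetical source key for solana.native_balances
      WHERE amount_after <> amount_before
```

A positive `delta_lamports` means the account received SOL in that transaction; a negative value means it spent SOL.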
solana.blocks
| Field | Type | Notes |
|---|---|---|
| id | string | |
| slot | integer | |
| parent_slot | integer | |
| hash | string | |
| timestamp | timestamp | |
| height | integer | |
| previous_block_hash | string | |
| transaction_count | integer | |
| leader | string | validator pubkey |
| leader_reward | integer | lamports |
| skipped | boolean | |
solana.rewards
| Field | Type | Notes |
|---|---|---|
| id | string | |
| block_slot | integer | |
| block_hash | string | |
| block_timestamp | timestamp | |
| pub_key | string | validator pubkey |
| lamports | integer | reward amount |
| post_balance | integer | balance after reward |
| reward_type | string | |
| commission | integer | |
solana.token_balances
Schema not fully documented; do not guess field names. Inspect with `goldsky dataset list | grep solana.token_balances`.
EVM Chains
<chain>.raw_logs / <chain>.logs
| Field | Type | Notes |
|---|---|---|
| id | string | |
| block_number | integer | |
| block_hash | string | |
| transaction_hash | string | |
| transaction_index | integer | |
| log_index | integer | |
| address | string | contract address (lowercase) |
| data | string | hex encoded event data |
| topics | string | comma-separated hex topic hashes |
| block_timestamp | integer | unix timestamp |
`topics` is a comma-separated string, not an array. Topic 0 is the event signature hash.
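Since topic 0 comes first in the comma-separated string, an event can be selected with a prefix match on `topics`. The sketch below matches the standard `Transfer(address,address,uint256)` signature hash, which ERC-20 and ERC-721 share. It assumes the hashes are stored as lowercase hex with a `0x` prefix and that `LIKE` is available in the transform SQL. The transform name and the source key `base_logs` are hypothetical.

```yaml
transforms:
  transfer_event_logs:          # hypothetical transform name
    type: sql
    primary_key: id
    sql: |
      SELECT id, address, topics, data, block_number, transaction_hash, log_index
      FROM base_logs            -- hypothetical source key for base.raw_logs
      WHERE topics LIKE '0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef%'
```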
<chain>.raw_transactions
| Field | Type | Notes |
|---|---|---|
| id | string | |
| hash | string | |
| nonce | integer | |
| block_hash | string | |
| block_number | integer | |
| transaction_index | integer | |
| from_address | string | |
| to_address | string | |
| value | decimal | ETH value in wei |
| gas | decimal | |
| gas_price | decimal | |
| input | string | hex calldata |
| transaction_type | integer | |
| block_timestamp | integer | unix timestamp |
| receipt_gas_used | decimal | |
| receipt_contract_address | string \| null | if contract creation |
| receipt_status | integer | 1 = success |
| receipt_effective_gas_price | decimal | |
L2 chains also include: `receipt_l1_fee`, `receipt_l1_gas_used`, `receipt_l1_gas_price`, `receipt_l1_fee_scalar`.
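To show how the receipt fields in the `raw_transactions` table combine, the fragment below selects successful contract deployments: rows where `receipt_status` is 1 and `receipt_contract_address` is set. It is a sketch only, and the source key `eth_txs` and the transform name are hypothetical.

```yaml
sources:
  eth_txs:                      # hypothetical source key
    type: dataset
    dataset_name: ethereum.raw_transactions
    version: 1.0.0
    start_at: latest
transforms:
  contract_deployments:         # hypothetical transform name
    type: sql
    primary_key: id
    sql: |
      SELECT id, hash, from_address, receipt_contract_address, block_number
      FROM eth_txs
      WHERE receipt_status = 1
        AND receipt_contract_address IS NOT NULL
```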
<chain>.blocks
| Field | Type | Notes |
|---|---|---|
| id | string | |
| number | integer | block number |
| hash | string | |
| parent_hash | string | |
| miner | string | |
| gas_limit | integer | |
| gas_used | integer | |
| timestamp | integer | unix timestamp |
| transaction_count | integer | |
| base_fee_per_gas | integer | |
| difficulty | double | |
<chain>.erc20_transfers
| Field | Type | Notes |
|---|---|---|
| id | string | |
| sender | string | from address |
| recipient | string | to address |
| amount | decimal | token amount |
| address | string | token contract address |
| block_number | integer | |
| block_timestamp | integer | unix timestamp |
| block_hash | string | |
| transaction_hash | string | |
| transaction_index | integer | |
| log_index | integer | |
<chain>.erc721_transfers
| Field | Type | Notes |
|---|---|---|
| id | string | |
| from_address | string | |
| to_address | string | |
| token_id | decimal | |
| address | string | NFT contract address |
| block_number | integer | |
| block_timestamp | integer | unix timestamp |
| block_hash | string | |
| transaction_hash | string | |
| transaction_index | integer | |
| log_index | integer | |
Dataset Name Format
All datasets follow the pattern: <chain_prefix>.<dataset_type>
Examples:
- `ethereum.erc20_transfers` - ERC-20 transfers on Ethereum mainnet
- `base.logs` - All event logs on Base
- `matic.blocks` - Block data on Polygon
- `solana.token_transfers` - SPL token transfers on Solana
Finding Dataset Versions
Datasets are versioned. To find available versions:
goldsky dataset list | grep "base.erc20"
Common versions:
- `1.0.0` - Initial version
- `1.2.0` - Enhanced schema (common for ERC-20 transfers)
When in doubt, use the latest version shown in goldsky dataset list.
Common Discovery Patterns
"I want to track USDC transfers on Base"
- Dataset: `base.erc20_transfers`
- Filter by contract address in your pipeline transform:
    transforms:
      usdc_only:
        type: sql
        primary_key: id
        sql: |
          SELECT * FROM source_name
          WHERE address = lower('0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913')
"I want all NFT activity on Ethereum"
Dataset: ethereum.erc721_transfers
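If only part of that activity is needed, a transform can narrow it down. The sketch below keeps mints only, using the ERC-721 convention that a mint is a transfer from the zero address. The source key `eth_nfts` and the transform name are hypothetical, and it assumes addresses are stored as lowercase hex.

```yaml
sources:
  eth_nfts:                     # hypothetical source key
    type: dataset
    dataset_name: ethereum.erc721_transfers
    version: 1.0.0
    start_at: latest
transforms:
  nft_mints:                    # hypothetical transform name
    type: sql
    primary_key: id
    sql: |
      SELECT id, address, token_id, to_address, block_number, transaction_hash
      FROM eth_nfts
      WHERE from_address = '0x0000000000000000000000000000000000000000'
```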
"I want to monitor a specific smart contract"
- Dataset: `<chain>.raw_logs` for raw events (`<chain>.logs` works as an alias on some chains), or `<chain>.decoded_logs` for decoded events
- Filter by contract address in your transform
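A minimal sketch of that pattern on Base, following the same shape as the USDC example: the source key `base_logs` and the transform name are hypothetical, and `<contract_address>` is a placeholder to replace with the contract being monitored. `lower()` is applied because the `address` field is stored in lowercase.

```yaml
sources:
  base_logs:                    # hypothetical source key
    type: dataset
    dataset_name: base.raw_logs
    version: 1.0.0
    start_at: latest
transforms:
  my_contract_logs:             # hypothetical transform name
    type: sql
    primary_key: id
    sql: |
      SELECT * FROM base_logs
      WHERE address = lower('<contract_address>')
```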
"I need multi-chain data"
Use multiple sources in your pipeline:
    sources:
      eth_transfers:
        type: dataset
        dataset_name: ethereum.erc20_transfers
        version: 1.0.0
        start_at: latest
      base_transfers:
        type: dataset
        dataset_name: base.erc20_transfers
        version: 1.2.0
        start_at: latest
Troubleshooting
Dataset not found
Error: Source 'my_source' references unknown dataset 'invalid.dataset'
Fix:
- Check the chain prefix is correct (e.g., `matic` not `polygon`)
- Check the dataset type exists (e.g., `erc20_transfers` not `erc20`)
- Run `goldsky dataset list` to see all available options
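The most common form of this error is a natural-sounding but wrong chain prefix. The fragment below sketches the fix for Polygon. The source key is hypothetical, and the version shown is an assumption that should be confirmed before use.

```yaml
sources:
  polygon_transfers:            # hypothetical source key
    type: dataset
    # dataset_name: polygon.erc20_transfers   # wrong prefix: reported as an unknown dataset
    dataset_name: matic.erc20_transfers       # Polygon uses the matic prefix
    version: 1.0.0              # assumed; confirm with goldsky dataset list
    start_at: latest
```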
Chain not listed
If you can't find a chain in the tables above:
goldsky dataset list | grep -i "<chain_name>"
Some chains use non-obvious prefixes (e.g., Polygon uses matic).
Version mismatch
Error: Version '2.0.0' not found for dataset 'base.erc20_transfers'
Fix: Check available versions:
goldsky dataset list | grep "base.erc20_transfers"
Use a version that exists in the output.
Related
`/turbo-builder` - Interactive wizard to build pipelines using these datasets