datasets

Use this skill when the user needs to look up or verify Goldsky blockchain dataset names, chain prefixes, dataset types, or versions. Triggers on questions like 'what's the dataset name for X?', 'what prefix does Goldsky use for chain Y?', 'what version should I use for Z?', or 'what datasets are available for Solana/Stellar/Arbitrum/etc?'. Also use for chain-specific dataset questions (e.g., polygon vs matic prefix, stellarnet balance datasets, solana token transfer dataset names). Do NOT trigger for questions about CLI commands, pipeline setup, or general Goldsky architecture unless the core question is about finding the right dataset name or chain prefix.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "datasets" with this command: npx skills add goldsky-io/goldsky-agent/goldsky-io-goldsky-agent-datasets

Goldsky Dataset Reference

Reference tables for blockchain datasets available in Turbo pipelines.

For quick dataset questions (e.g., "what dataset for Solana transfers?"), answer directly: identify the chain prefix (see Popular Chain Prefixes below), identify the dataset type (see Common Datasets), and return a YAML snippet like:

sources:
  my_source:
    type: dataset
    dataset_name: <chain>.<dataset>
    version: 1.0.0
    start_at: earliest

Tip: Use goldsky turbo validate to verify a dataset exists (fast, ~3 seconds). Avoid goldsky dataset list which is slow (30-60+ seconds).
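
As a worked instance of the steps above, a question such as "what dataset for Stellar transfers?" resolves to the prefix stellar_mainnet and the dataset type transfers, at the version this reference lists for Stellar. The source key stellar_transfers is an arbitrary, hypothetical name chosen for illustration:

```yaml
sources:
  stellar_transfers:              # hypothetical source name
    type: dataset
    dataset_name: stellar_mainnet.transfers
    version: 1.1.0                # Stellar datasets use 1.1.0 per this reference
    start_at: earliest            # Stellar uses start_at, like EVM chains
```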


Dataset Reference Files

Detailed dataset and chain information is in the data/ folder.

| File | Contents |
| --- | --- |
| verified-datasets.json | All validated datasets with versions, schemas, and use cases |
| chain-prefixes.json | All chain prefixes, chain IDs, and common mistakes |

Data location: data/ (relative to this skill's directory)


Quick Reference

| Action | Command | Notes |
| --- | --- | --- |
| Validate dataset | goldsky turbo validate file.yaml | Preferred - fast (3s) |
| Search for dataset | goldsky dataset list \| grep "name" | Slow (30-60s), use sparingly |
| List all datasets | goldsky dataset list | Very slow - avoid |

Common Datasets

| What You Need | Dataset | Example |
| --- | --- | --- |
| Token transfers (ERC-20) | <chain>.erc20_transfers | base.erc20_transfers (v1.2.0) |
| NFT transfers (ERC-721) | <chain>.erc721_transfers | ethereum.erc721_transfers (v1.0.0) |
| Transactions | <chain>.raw_transactions | ethereum.raw_transactions (v1.0.0) |
| Event logs | <chain>.raw_logs | base.raw_logs (v1.0.0) |
| Solana tokens | solana.token_transfers | v1.0.0 |
| Bitcoin transactions | bitcoin.raw.transactions | v1.0.0 |
| Stellar transfers | stellar_mainnet.transfers | v1.1.0 |

Important: On EVM chains, use raw_transactions, NOT transactions. Non-EVM chains such as Solana and Sui do use transactions.


Popular Chain Prefixes

| Chain | Prefix | Note |
| --- | --- | --- |
| Ethereum | ethereum | |
| Base | base | |
| Polygon | matic | NOT polygon |
| Arbitrum | arbitrum | |
| Optimism | optimism | |
| BSC | bsc | |
| Avalanche | avalanche | |
| Solana | solana | Uses start_block not start_at |
| Bitcoin | bitcoin.raw | Uses start_at like EVM |
| Stellar | stellar_mainnet | Uses start_at like EVM |
| Sui | sui | Uses start_at like EVM |
| NEAR | near | Uses start_at like EVM |
| Starknet | starknet | Uses start_at like EVM |
| Fogo | fogo | Uses start_at like EVM |

See data/chain-prefixes.json for complete list with chain IDs.
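
The two notes most likely to trip up a pipeline are the Polygon prefix and the Solana start field. The fragment below is a minimal sketch of both, not a verified configuration: the source names are hypothetical, the matic.blocks version is an assumption, and the start_block value is a placeholder slot number whose exact format should be confirmed with goldsky turbo validate.

```yaml
sources:
  polygon_blocks:                   # hypothetical source name
    type: dataset
    dataset_name: matic.blocks      # Polygon prefix is matic, NOT polygon
    version: 1.0.0                  # assumed version; confirm before use
    start_at: latest
  solana_transfers:                 # hypothetical source name
    type: dataset
    dataset_name: solana.token_transfers
    version: 1.0.0
    start_block: 300000000          # placeholder slot; Solana uses start_block, not start_at
```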


Common Dataset Types

EVM Chains

| Dataset Type | Description | Use Case |
| --- | --- | --- |
| blocks | Block headers with metadata | Block explorers, timing analysis |
| raw_transactions | Transaction data | Wallet activity, gas analysis |
| raw_logs | Raw event logs | Custom event filtering |
| raw_traces | Internal transaction traces | MEV analysis, contract interactions |
| erc20_transfers | Fungible token transfers | Token tracking, DeFi analytics |
| erc721_transfers | NFT transfers | NFT marketplaces, ownership tracking |
| erc1155_transfers | Multi-token transfers | Gaming, multi-token standards |
| decoded_logs | ABI-decoded event logs | Specific contract events |

Important: Use raw_transactions, NOT transactions. Use raw_logs, NOT logs (though logs works as an alias on some chains).

Solana

| Dataset Type | Description | Use Case |
| --- | --- | --- |
| blocks | Block data with leader info | Chain analysis |
| transactions | Transaction data with balances | Wallet activity |
| transactions_with_instructions | Transactions + nested instructions | Multi-instruction analysis |
| instructions | Individual instructions | Program-specific analysis |
| token_transfers | SPL token transfers | Token tracking |
| native_balances | SOL balance changes | Whale tracking |
| token_balances | SPL token balance changes | Portfolio tracking |
| rewards | Validator rewards | Staking analysis |

Bitcoin

| Dataset Type | Description | Use Case |
| --- | --- | --- |
| bitcoin.raw.blocks | Block data (hash, difficulty, size) | Network analysis |
| bitcoin.raw.transactions | Transactions (inputs, outputs, values) | Payment tracking |

Stellar

All datasets use version 1.1.0:

| Dataset Type | Description | Use Case |
| --- | --- | --- |
| stellar_mainnet.transactions | All network transactions | Account monitoring |
| stellar_mainnet.transfers | All transfer events | Asset tracking |
| stellar_mainnet.events | All events (contract + operation) | Contract monitoring |
| stellar_mainnet.operations | Operations within transactions | Action tracking |
| stellar_mainnet.ledger_entries | Ledger state changes | State analysis |
| stellar_mainnet.ledgers | Ledger metadata | Network analysis |
| stellar_mainnet.balances | Account balance changes | Balance tracking |

Sui

| Dataset Type | Description | Use Case |
| --- | --- | --- |
| sui.checkpoints | Checkpoint data | Chain analysis |
| sui.transactions | Transaction data | Activity monitoring |
| sui.events | Move contract events | dApp event tracking |
| sui.packages | Deployed Move packages | Package discovery |
| sui.epochs | Epoch data with validators | Staking/validator analysis |

NEAR

| Dataset Type | Description | Use Case |
| --- | --- | --- |
| near.receipts | Execution receipts | Contract interaction tracking |
| near.transactions | Signed transactions | Activity monitoring |
| near.execution_outcomes | Execution results | Success/failure analysis |

Starknet

| Dataset Type | Description | Use Case |
| --- | --- | --- |
| starknet.blocks | Block data | Chain analysis |
| starknet.transactions | Transaction data | Activity monitoring |
| starknet.events | Contract events | dApp event tracking |
| starknet.messages | L1↔L2 messages | Bridge monitoring |

Fogo

| Dataset Type | Description | Use Case |
| --- | --- | --- |
| fogo.transactions_with_instructions | Transactions with instructions | Full activity tracking |
| fogo.rewards | Validator rewards | Staking analysis |
| fogo.blocks | Block data | Chain analysis |

Dataset Schemas

Source: docs.goldsky.com. Do not use field names not listed here — ask the user to run goldsky dataset list to inspect unknown schemas.

Solana

solana.transactions

| Field | Type | Notes |
| --- | --- | --- |
| id | string | |
| index | integer | tx position in block |
| block_slot | integer | slot number |
| block_hash | string | |
| block_timestamp | timestamp | |
| signature | string | transaction signature |
| recent_block_hash | string | |
| fee | integer | in lamports |
| status | integer | 1 = success |
| err | string \| null | error if failed |
| accounts | string[] | all involved accounts |
| balance_changes | object[] | {account, before, after} in lamports |
| log_messages | string[] | program execution logs |
| compute_units_consumed | integer | |

No from_address or to_address on Solana transactions — use accounts array instead.
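
Because there is no sender or recipient column, a transform over solana.transactions typically selects the signature, status, and accounts fields instead. The fragment below is a minimal sketch that keeps only failed transactions; the transform name and the source name solana_txs are hypothetical, and only fields from the schema above are used.

```yaml
transforms:
  failed_solana_txs:              # hypothetical transform name
    type: sql
    primary_key: id
    sql: |
      -- solana_txs is an assumed source that reads solana.transactions
      SELECT id, signature, block_slot, fee, err, accounts
      FROM solana_txs
      -- status 1 = success per the schema, so anything else is a failure
      WHERE status <> 1
```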

solana.transactions_with_instructions

All fields from solana.transactions plus:

| Field | Type | Notes |
| --- | --- | --- |
| pre_token_balances | object[] | token balances before tx |
| post_token_balances | object[] | token balances after tx |
| instructions | object[] | see below |

Instruction object fields: id, index, parent_index, block_slot, block_timestamp, block_hash, tx_fee, tx_index, program_id, data (base58), accounts (string[]), status, err

solana.instructions

| Field | Type | Notes |
| --- | --- | --- |
| id | string | |
| index | integer | position in tx |
| parent_index | integer \| null | for inner instructions |
| block_slot | integer | |
| block_timestamp | timestamp | |
| block_hash | string | |
| program_id | string | executing program address |
| data | string | base58 encoded |
| accounts | string[] | instruction accounts |
| status | integer | |
| err | string \| null | |

solana.token_transfers

| Field | Type | Notes |
| --- | --- | --- |
| id | string | |
| token_mint_address | string | mint address |
| from_token_account | string | source token account |
| to_token_account | string | dest token account |
| amount | number | raw amount |
| decimals | integer | token decimals |
| block_slot | integer | |
| block_timestamp | timestamp | |
| signature | string | tx signature |
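
To narrow this dataset to one token, filter on token_mint_address. The sketch below uses the USDC mint on Solana as the example value; the transform name and the source name solana_transfers are hypothetical. Solana addresses are base58 and case-sensitive, so the mint is compared as-is rather than lowercased.

```yaml
transforms:
  usdc_spl_transfers:             # hypothetical transform name
    type: sql
    primary_key: id
    sql: |
      -- solana_transfers is an assumed source that reads solana.token_transfers
      SELECT id, signature, block_slot,
             from_token_account, to_token_account,
             amount, decimals
      FROM solana_transfers
      WHERE token_mint_address = 'EPjFWdd5AufqSSqeM2qN1xzybapC8G4wEGGkZwyTDt1v'
```

The amount column is the raw integer amount; dividing by 10 to the power of decimals gives the human-readable value, which is left to downstream code here.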

solana.native_balances

| Field | Type | Notes |
| --- | --- | --- |
| id | string | |
| block_slot | integer | slot number |
| block_hash | string | |
| block_timestamp | timestamp | |
| tx_index | integer | transaction position in block |
| signature | string | transaction signature |
| account | string | account pubkey |
| amount_before | integer | lamports |
| amount_after | integer | lamports |
| _gs_op | string | Goldsky internal operation type |
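
Since each row carries the balance before and after, the per-transaction change is a simple subtraction. The fragment below is a sketch under the assumption that a source named native_balances_source (hypothetical) reads solana.native_balances; the result is in lamports, where 1 SOL equals 1,000,000,000 lamports.

```yaml
transforms:
  sol_balance_deltas:             # hypothetical transform name
    type: sql
    primary_key: id
    sql: |
      -- Positive delta_lamports means the account received SOL
      SELECT id, account, signature, block_slot,
             amount_after - amount_before AS delta_lamports
      FROM native_balances_source
```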

solana.blocks

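| Field | Type | Notes |
| --- | --- | --- |
| id | string | |
| slot | integer | |
| parent_slot | integer | |
| hash | string | |
| timestamp | timestamp | |
| height | integer | |
| previous_block_hash | string | |
| transaction_count | integer | |
| leader | string | validator pubkey |
| leader_reward | integer | lamports |
| skipped | boolean | |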

solana.rewards

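| Field | Type | Notes |
| --- | --- | --- |
| id | string | |
| block_slot | integer | |
| block_hash | string | |
| block_timestamp | timestamp | |
| pub_key | string | validator pubkey |
| lamports | integer | reward amount |
| post_balance | integer | balance after reward |
| reward_type | string | |
| commission | integer | |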

solana.token_balances

Schema not fully documented — do not guess field names. Inspect with goldsky dataset list | grep solana.token_balances.


EVM Chains

<chain>.raw_logs / <chain>.logs

| Field | Type | Notes |
| --- | --- | --- |
| id | string | |
| block_number | integer | |
| block_hash | string | |
| transaction_hash | string | |
| transaction_index | integer | |
| log_index | integer | |
| address | string | contract address (lowercase) |
| data | string | hex encoded event data |
| topics | string | comma-separated hex topic hashes |
| block_timestamp | integer | unix timestamp |

topics is a comma-separated string, not an array. Topic 0 is the event signature hash.
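
Because topic 0 comes first in the comma-separated string, a prefix match on topics is one way to select a single event type. The sketch below matches the standard ERC-20/ERC-721 Transfer event signature hash. It assumes the topic hashes are stored lowercase with a 0x prefix, which should be checked against real rows; the transform name and the source name base_logs are hypothetical.

```yaml
transforms:
  transfer_events:                # hypothetical transform name
    type: sql
    primary_key: id
    sql: |
      -- base_logs is an assumed source that reads base.raw_logs
      -- Prefix match works because topic 0 is the first entry in the string
      SELECT id, address, transaction_hash, log_index, topics
      FROM base_logs
      WHERE topics LIKE '0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef%'
```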

<chain>.raw_transactions

| Field | Type | Notes |
| --- | --- | --- |
| id | string | |
| hash | string | |
| nonce | integer | |
| block_hash | string | |
| block_number | integer | |
| transaction_index | integer | |
| from_address | string | |
| to_address | string | |
| value | decimal | ETH value in wei |
| gas | decimal | |
| gas_price | decimal | |
| input | string | hex calldata |
| transaction_type | integer | |
| block_timestamp | integer | unix timestamp |
| receipt_gas_used | decimal | |
| receipt_contract_address | string \| null | if contract creation |
| receipt_status | integer | 1 = success |
| receipt_effective_gas_price | decimal | |

L2 chains also include: receipt_l1_fee, receipt_l1_gas_used, receipt_l1_gas_price, receipt_l1_fee_scalar
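
The receipt fields make it possible to pick out successful contract deployments without decoding calldata. The fragment below is a minimal sketch using only columns from the schema above; the transform name and the source name eth_txs are hypothetical.

```yaml
transforms:
  contract_creations:             # hypothetical transform name
    type: sql
    primary_key: id
    sql: |
      -- eth_txs is an assumed source that reads ethereum.raw_transactions
      -- receipt_contract_address is only set when the tx created a contract
      SELECT id, from_address, receipt_contract_address, block_number
      FROM eth_txs
      WHERE receipt_contract_address IS NOT NULL
        AND receipt_status = 1
```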

<chain>.blocks

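| Field | Type | Notes |
| --- | --- | --- |
| id | string | |
| number | integer | block number |
| hash | string | |
| parent_hash | string | |
| miner | string | |
| gas_limit | integer | |
| gas_used | integer | |
| timestamp | integer | unix timestamp |
| transaction_count | integer | |
| base_fee_per_gas | integer | |
| difficulty | double | |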

<chain>.erc20_transfers

| Field | Type | Notes |
| --- | --- | --- |
| id | string | |
| sender | string | from address |
| recipient | string | to address |
| amount | decimal | token amount |
| address | string | token contract address |
| block_number | integer | |
| block_timestamp | integer | unix timestamp |
| block_hash | string | |
| transaction_hash | string | |
| transaction_index | integer | |
| log_index | integer | |
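
Note that this dataset names its address columns sender and recipient, not from_address and to_address. As an illustration, the sketch below selects token mints, which by ERC-20 convention appear as transfers from the zero address. The transform name and the source name base_transfers are hypothetical, and the comparison assumes addresses are stored lowercase.

```yaml
transforms:
  erc20_mints:                    # hypothetical transform name
    type: sql
    primary_key: id
    sql: |
      -- base_transfers is an assumed source that reads base.erc20_transfers
      SELECT id, address, recipient, amount, transaction_hash
      FROM base_transfers
      WHERE sender = '0x0000000000000000000000000000000000000000'
```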

<chain>.erc721_transfers

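| Field | Type | Notes |
| --- | --- | --- |
| id | string | |
| from_address | string | |
| to_address | string | |
| token_id | decimal | |
| address | string | NFT contract address |
| block_number | integer | |
| block_timestamp | integer | unix timestamp |
| block_hash | string | |
| transaction_hash | string | |
| transaction_index | integer | |
| log_index | integer | |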

Dataset Name Format

All datasets follow the pattern: <chain_prefix>.<dataset_type>

Examples:

  • ethereum.erc20_transfers - ERC-20 transfers on Ethereum mainnet
  • base.raw_logs - All event logs on Base
  • matic.blocks - Block data on Polygon
  • solana.token_transfers - SPL token transfers on Solana

Finding Dataset Versions

Datasets are versioned. To find available versions:

goldsky dataset list | grep "base.erc20"

Common versions:

  • 1.0.0 - Initial version
  • 1.2.0 - Enhanced schema (common for ERC-20 transfers)

When in doubt, use the latest version shown in goldsky dataset list.


Common Discovery Patterns

"I want to track USDC transfers on Base"

  1. Dataset: base.erc20_transfers
  2. Filter by contract address in your pipeline transform:
transforms:
  usdc_only:
    type: sql
    primary_key: id
    sql: |
      SELECT * FROM source_name
      WHERE address = lower('0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913')

"I want all NFT activity on Ethereum"

Dataset: ethereum.erc721_transfers

"I want to monitor a specific smart contract"

  1. Dataset: <chain>.raw_logs for raw events, or <chain>.decoded_logs for decoded events
  2. Filter by contract address in your transform
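
The two steps above can be sketched as a source plus a transform. This fragment assumes the contract is on Base and uses the raw logs dataset; the source and transform names are hypothetical, and <contract_address> is a placeholder to replace with the real address. Wrapping the address in lower() mirrors the USDC example and matches the lowercase address column in the logs schema.

```yaml
sources:
  base_logs:                      # hypothetical source name
    type: dataset
    dataset_name: base.raw_logs
    version: 1.0.0
    start_at: latest
transforms:
  my_contract_logs:               # hypothetical transform name
    type: sql
    primary_key: id
    sql: |
      SELECT * FROM base_logs
      WHERE address = lower('<contract_address>')
```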

"I need multi-chain data"

Use multiple sources in your pipeline:

sources:
  eth_transfers:
    type: dataset
    dataset_name: ethereum.erc20_transfers
    version: 1.0.0
    start_at: latest
  base_transfers:
    type: dataset
    dataset_name: base.erc20_transfers
    version: 1.2.0
    start_at: latest

Troubleshooting

Dataset not found

Error: Source 'my_source' references unknown dataset 'invalid.dataset'

Fix:

  1. Check the chain prefix is correct (e.g., matic not polygon)
  2. Check the dataset type exists (e.g., erc20_transfers not erc20)
  3. Run goldsky dataset list to see all available options

Chain not listed

If you can't find a chain in the tables above:

goldsky dataset list | grep -i "<chain_name>"

Some chains use non-obvious prefixes (e.g., Polygon uses matic).

Version mismatch

Error: Version '2.0.0' not found for dataset 'base.erc20_transfers'

Fix: Check available versions:

goldsky dataset list | grep "base.erc20_transfers"

Use a version that exists in the output.


Related

  • /turbo-builder — Interactive wizard to build pipelines using these datasets

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
