openalex-paper-search

Academic Paper Search (OpenAlex)

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "openalex-paper-search" with this command: npx skills add kortix-ai/kortix-registry/kortix-ai-kortix-registry-openalex-paper-search

Search 240M+ scholarly works using the OpenAlex API -- completely free, no API key required, no SDK needed. Just curl or bash with URL construction.

Full docs: https://docs.openalex.org

Quick Start

OpenAlex is a REST API. You query it by constructing URLs and fetching them with curl. All responses are JSON.

Search for papers about "transformer architecture"

curl -s "https://api.openalex.org/works?search=transformer+architecture&per_page=5&mailto=agent@kortix.ai" | python3 -m json.tool

Important: Always include mailto=agent@kortix.ai (or any valid email) in every request. Without it, you're limited to 1 request/second. With it, you get 10 requests/second (the "polite pool").
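The same request can be assembled from Python. The sketch below only builds the URL string and makes no network call; the helper name `build_url` and the `CONTACT_EMAIL` constant are illustrative and not part of OpenAlex.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.openalex.org"
CONTACT_EMAIL = "agent@kortix.ai"  # assumption: swap in your own address


def build_url(endpoint, **params):
    """Return a full OpenAlex URL for `endpoint`, always carrying `mailto`."""
    query = {k: v for k, v in params.items() if v is not None}
    query.setdefault("mailto", CONTACT_EMAIL)
    # `safe` leaves filter punctuation readable instead of percent-encoding it
    return f"{BASE_URL}/{endpoint.strip('/')}?{urlencode(query, safe=':,|>!@*')}"


print(build_url("works", search="transformer architecture", per_page=5))
```

The printed URL should match the curl example above, so it can be passed to curl or any HTTP client unchanged.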

Core Concepts

Entities

OpenAlex has these entity types (all queryable):

| Entity | Endpoint | Count | Description |
|---|---|---|---|
| Works | /works | 240M+ | Papers, articles, books, datasets, theses |
| Authors | /authors | 90M+ | People who create works |
| Sources | /sources | 250K+ | Journals, repositories, conferences |
| Institutions | /institutions | 110K+ | Universities, research orgs |
| Topics | /topics | 4K+ | Research topics (hierarchical) |

Work Object -- Key Fields

When you fetch a work, these are the most useful fields:

| Field | Description |
|---|---|
| id | OpenAlex ID (e.g., "https://openalex.org/W2741809807") |
| doi | DOI URL |
| title / display_name | Paper title |
| publication_year | Year published |
| publication_date | Full date (YYYY-MM-DD) |
| cited_by_count | Number of incoming citations |
| fwci | Field-Weighted Citation Impact (normalized) |
| type | article, preprint, review, book, dataset, etc. |
| language | ISO 639-1 code (e.g., "en") |
| is_retracted | Boolean |
| open_access.is_oa | Boolean -- is it freely accessible? |
| open_access.oa_url | Direct URL to free version |
| authorships | List of authors with names, institutions, ORCIDs |
| abstract_inverted_index | Abstract as inverted index (needs reconstruction) |
| referenced_works | List of OpenAlex IDs this work cites (outgoing) |
| related_works | Algorithmically related works |
| cited_by_api_url | API URL to get works that cite this one (incoming) |
| topics | Assigned research topics with scores |
| keywords | Extracted keywords with scores |
| primary_location | Where the work is published (journal, repo) |
| best_oa_location | Best open access location with PDF link |

Reconstructing Abstracts

OpenAlex stores abstracts as inverted indexes for legal reasons. To get plaintext, reconstruct:

import json, sys

# Read the abstract_inverted_index from a work object
inv_idx = work["abstract_inverted_index"]
if inv_idx:
    words = [""] * (max(max(positions) for positions in inv_idx.values()) + 1)
    for word, positions in inv_idx.items():
        for pos in positions:
            words[pos] = word
    abstract = " ".join(words)

Or in bash with python3 -c:

Pipe a work JSON into this to extract the abstract

echo "$WORK_JSON" | python3 -c "
import json,sys
w=json.load(sys.stdin)
idx=w.get('abstract_inverted_index',{})
if idx:
    words=['']*( max(max(p) for p in idx.values())+1 )
    for word,positions in idx.items():
        for pos in positions:
            words[pos]=word
    print(' '.join(words))
"
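To make the data shape concrete, here is a small worked example. It is a variant of the reconstruction above that sorts (position, word) pairs instead of pre-allocating a list; the `sample` index is made up for illustration and is not a real OpenAlex abstract.

```python
def reconstruct_abstract(inverted_index):
    """Rebuild plaintext from {word: [positions]}; returns '' if there is no abstract."""
    if not inverted_index:
        return ""
    # Flatten to (position, word) pairs, then order them by position.
    pairs = [(pos, word) for word, positions in inverted_index.items() for pos in positions]
    return " ".join(word for _, word in sorted(pairs))


# Hypothetical inverted index: "all" occurs at positions 2 and 5.
sample = {"Attention": [0], "is": [1], "all": [2, 5], "you": [3], "need,": [4], "of": [6], "it": [7]}
print(reconstruct_abstract(sample))  # Attention is all you need, all of it
```

A word that occurs several times appears once as a key with multiple positions, which is why each position has to be expanded before joining.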

Searching for Papers

Basic Keyword Search

Searches across titles, abstracts, and fulltext. Uses stemming and stop-word removal.

Simple search

curl -s "https://api.openalex.org/works?search=large+language+models&mailto=agent@kortix.ai"

With per_page limit

curl -s "https://api.openalex.org/works?search=CRISPR+gene+editing&per_page=10&mailto=agent@kortix.ai"

Boolean Search

Use uppercase AND, OR, NOT with parentheses and quoted phrases:

Complex boolean query

curl -s "https://api.openalex.org/works?search=(reinforcement+learning+AND+%22robot+control%22)+NOT+simulation&mailto=agent@kortix.ai"

Exact phrase match (use double quotes, URL-encoded as %22)

curl -s "https://api.openalex.org/works?search=%22attention+is+all+you+need%22&mailto=agent@kortix.ai"
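Hand-encoding quotes and spaces is error-prone, so one option is to let Python's standard library do it. This is a minimal sketch; `encode_search` is an illustrative name, not an OpenAlex function.

```python
from urllib.parse import quote_plus


def encode_search(query):
    """URL-encode a search string: spaces become '+', double quotes become %22."""
    return quote_plus(query)


query = '(reinforcement learning AND "robot control") NOT simulation'
print("https://api.openalex.org/works?search=" + encode_search(query) + "&mailto=agent@kortix.ai")
```

`quote_plus` also percent-encodes the parentheses as %28 and %29. Those decode back to the same characters, so the query is unchanged from the hand-written form above.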

Search Specific Fields

Title only

curl -s "https://api.openalex.org/works?filter=title.search:transformer&mailto=agent@kortix.ai"

Abstract only

curl -s "https://api.openalex.org/works?filter=abstract.search:protein+folding&mailto=agent@kortix.ai"

Title and abstract combined

curl -s "https://api.openalex.org/works?filter=title_and_abstract.search:neural+scaling+laws&mailto=agent@kortix.ai"

Fulltext search (subset of works)

curl -s "https://api.openalex.org/works?filter=fulltext.search:climate+tipping+points&mailto=agent@kortix.ai"

Filtering

Filters are the most powerful feature. Combine them with commas (AND) or pipes (OR).

Most Useful Filters

By publication year

?filter=publication_year:2024
?filter=publication_year:2020-2024
?filter=publication_year:>2022

By citation count

?filter=cited_by_count:>100     # highly cited
?filter=cited_by_count:>1000    # landmark papers

By open access

?filter=is_oa:true        # only open access
?filter=oa_status:gold    # gold OA only

By type

?filter=type:article     # journal articles
?filter=type:preprint    # preprints
?filter=type:review      # review articles

By language

?filter=language:en # English only

Not retracted

?filter=is_retracted:false

Has abstract

?filter=has_abstract:true

Has downloadable PDF

?filter=has_content.pdf:true

By author (OpenAlex ID)

?filter=author.id:A5023888391

By institution (OpenAlex ID)

?filter=institutions.id:I27837315 # e.g., University of Michigan

By DOI

?filter=doi:https://doi.org/10.1038/s41586-021-03819-2

By indexed source

?filter=indexed_in:arxiv       # arXiv papers
?filter=indexed_in:pubmed      # PubMed papers
?filter=indexed_in:crossref    # Crossref papers

Combining Filters

AND: comma-separated

?filter=publication_year:>2022,cited_by_count:>50,is_oa:true,type:article

OR: pipe-separated within a filter

?filter=publication_year:2023|2024

NOT: prefix with !

?filter=type:!preprint

Combined example: highly-cited OA articles from 2023-2024, not preprints

curl -s "https://api.openalex.org/works?filter=publication_year:2023-2024,cited_by_count:>50,is_oa:true,type:!preprint&search=machine+learning&per_page=10&mailto=agent@kortix.ai"
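The combination rules above (commas for AND, pipes for OR, a leading ! for NOT) can be sketched as a small helper that turns a dict into a filter string. `build_filter` is a hypothetical name; the filter keys are the ones listed in this section.

```python
def build_filter(conditions):
    """Join conditions with commas (AND); list values become pipe-separated (OR)."""
    parts = []
    for key, value in conditions.items():
        if isinstance(value, (list, tuple)):
            value = "|".join(str(v) for v in value)  # OR within one filter
        elif isinstance(value, bool):
            value = "true" if value else "false"  # API expects lowercase booleans
        parts.append(f"{key}:{value}")
    return ",".join(parts)


print(build_filter({
    "publication_year": "2023-2024",
    "cited_by_count": ">50",
    "is_oa": True,
    "type": "!preprint",  # NOT: prefix the value with !
}))
```

The output is the value to place after `filter=` in the request URL, and it should match the filter in the combined example above.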

Sorting

Most cited first

?sort=cited_by_count:desc

Most recent first

?sort=publication_date:desc

Most relevant first (only when using search)

?sort=relevance_score:desc

Multiple sort keys

?sort=publication_year:desc,cited_by_count:desc

Pagination

Two modes: basic paging (for browsing) and cursor paging (for collecting all results).

Basic paging (limited to 10,000 results)

?page=1&per_page=25
?page=2&per_page=25

Cursor paging (unlimited, for collecting everything)

?per_page=100&cursor=*                 # first page
?per_page=100&cursor=IlsxNjk0ODc...    # next page (cursor from previous response meta)

The cursor for the next page is in response.meta.next_cursor. When it's null, you've reached the end.
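A cursor-paging loop might look like the sketch below. The function names are illustrative, and the fetcher is passed in as a parameter so the loop can be exercised without network access. It assumes the response shape described above (`results` plus `meta.next_cursor`).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_json(url):
    """Default fetcher: GET the URL and decode the JSON body."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_works(params, fetch=fetch_json, max_pages=None):
    """Yield works page by page, following meta.next_cursor until it is null."""
    cursor, pages = "*", 0
    while cursor and (max_pages is None or pages < max_pages):
        query = dict(params, cursor=cursor, mailto="agent@kortix.ai")
        page = fetch("https://api.openalex.org/works?" + urlencode(query))
        yield from page.get("results", [])
        cursor = page.get("meta", {}).get("next_cursor")  # None on the last page
        pages += 1


# Example (performs real requests, so it is left commented out):
# for work in iter_works({"search": "CRISPR gene editing", "per_page": 100}, max_pages=2):
#     print(work["id"])
```

The `max_pages` cap is a safety valve for broad queries; leave it as None only when you really want every result.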

Select Fields

Reduce response size by selecting only the fields you need:

Only get IDs, titles, citation counts, and DOIs

?select=id,display_name,cited_by_count,doi,publication_year

Minimal metadata for scanning

?select=id,display_name,publication_year,cited_by_count,open_access

Citation Graph Traversal

Find what a paper cites (outgoing references)

Get works cited BY a specific paper

curl -s "https://api.openalex.org/works?filter=cited_by:W2741809807&per_page=25&mailto=agent@kortix.ai"

Find what cites a paper (incoming citations)

Get works that CITE a specific paper

curl -s "https://api.openalex.org/works?filter=cites:W2741809807&sort=cited_by_count:desc&per_page=25&mailto=agent@kortix.ai"

Find related works

Get related works (algorithmic, based on shared concepts)

curl -s "https://api.openalex.org/works?filter=related_to:W2741809807&per_page=25&mailto=agent@kortix.ai"

Citation chain: follow the references

  • Get a seminal paper by DOI

  • Find its referenced_works (what it cites)

  • Find who cites it (filter=cites:WORK_ID)

  • For the most cited citers, repeat

This is how you build a literature graph around a topic.
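The steps above can be sketched as a breadth-first walk over incoming citations. This is one possible shape, not a prescribed algorithm: `fetch_citers` is a caller-supplied function (hypothetical here) that would request `filter=cites:<id>` sorted by `cited_by_count:desc` and return the resulting works.

```python
def expand_citers(seed_id, fetch_citers, depth=2, top_k=3):
    """Follow the top_k most cited citers of each work for `depth` rounds.

    fetch_citers(work_id) must return a list of work dicts (each with an "id"),
    most cited first. Returns a list of (cited_id, citing_id) edges.
    """
    edges, seen, frontier = [], {seed_id}, [seed_id]
    for _ in range(depth):
        next_frontier = []
        for work_id in frontier:
            for citer in fetch_citers(work_id)[:top_k]:
                edges.append((work_id, citer["id"]))
                if citer["id"] not in seen:  # do not expand the same work twice
                    seen.add(citer["id"])
                    next_frontier.append(citer["id"])
        frontier = next_frontier
    return edges
```

Keeping `depth` and `top_k` small matters: the number of requests grows roughly as top_k to the power of depth.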

Author Lookup

Search for an author

curl -s "https://api.openalex.org/authors?search=Yann+LeCun&mailto=agent@kortix.ai"

Get an author's works (by OpenAlex author ID)

curl -s "https://api.openalex.org/works?filter=author.id:A5064850633&sort=cited_by_count:desc&per_page=10&mailto=agent@kortix.ai"

Get an author by ORCID

curl -s "https://api.openalex.org/authors/orcid:0000-0001-6187-6610?mailto=agent@kortix.ai"

Lookup by External ID

By DOI

curl -s "https://api.openalex.org/works/doi:10.1038/s41586-021-03819-2?mailto=agent@kortix.ai"

By PubMed ID

curl -s "https://api.openalex.org/works/pmid:14907713?mailto=agent@kortix.ai"

By arXiv ID (via DOI)

curl -s "https://api.openalex.org/works/doi:10.48550/arXiv.2303.08774?mailto=agent@kortix.ai"

Batch lookup: up to 50 IDs at once

curl -s "https://api.openalex.org/works?filter=doi:https://doi.org/10.1234/a|https://doi.org/10.1234/b|https://doi.org/10.1234/c&mailto=agent@kortix.ai"
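When there are more DOIs than fit in one request, they can be split into pipe-separated batches. A minimal sketch, using the limit of 50 stated above as the default; `doi_batches` is an illustrative name and the DOIs are placeholders.

```python
def doi_batches(dois, batch_size=50):
    """Yield filter values of the form doi:A|B|C with at most batch_size DOIs each."""
    for start in range(0, len(dois), batch_size):
        yield "doi:" + "|".join(dois[start:start + batch_size])


dois = [f"https://doi.org/10.1234/example{i}" for i in range(120)]  # placeholder DOIs
for filter_value in doi_batches(dois):
    print(filter_value.count("|") + 1)  # number of DOIs in this batch: 50, 50, 20
```

Each yielded string is the value for the `filter` parameter of one /works request.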

Open Access & PDF Access

Find OA papers with direct PDF links

curl -s "https://api.openalex.org/works?search=quantum+computing&filter=is_oa:true,has_content.pdf:true&select=id,display_name,open_access,best_oa_location&per_page=5&mailto=agent@kortix.ai"

The best_oa_location.pdf_url field gives a direct PDF link when available. The open_access.oa_url gives the best available OA landing page or PDF.
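Since either field can be missing or null, a small fallback helper is handy when processing results. A sketch under that assumption; `pick_pdf_url` is an illustrative name and the URLs in the sample are placeholders.

```python
def pick_pdf_url(work):
    """Prefer best_oa_location.pdf_url, fall back to open_access.oa_url, else None."""
    best = work.get("best_oa_location") or {}  # field may be absent or null
    oa = work.get("open_access") or {}
    return best.get("pdf_url") or oa.get("oa_url")


sample_work = {
    "best_oa_location": {"pdf_url": None},
    "open_access": {"is_oa": True, "oa_url": "https://example.org/landing"},
}
print(pick_pdf_url(sample_work))  # https://example.org/landing
```

A result from the fallback may be a landing page rather than a PDF, so check the content type before saving it as one.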

Practical Workflows

Literature Survey on a Topic

1. Find the most-cited papers on a topic

curl -s "https://api.openalex.org/works?search=retrieval+augmented+generation&sort=cited_by_count:desc&filter=publication_year:>2020,type:article,has_abstract:true&per_page=20&select=id,display_name,publication_year,cited_by_count,doi,authorships,abstract_inverted_index&mailto=agent@kortix.ai"

2. For the top papers, explore their citation graphs

curl -s "https://api.openalex.org/works?filter=cites:W4285719527&sort=cited_by_count:desc&per_page=10&select=id,display_name,publication_year,cited_by_count,doi&mailto=agent@kortix.ai"

3. Find recent papers building on this work

curl -s "https://api.openalex.org/works?filter=cites:W4285719527,publication_year:>2023&sort=publication_date:desc&per_page=10&mailto=agent@kortix.ai"

Find Landmark/Seminal Papers

Highly cited + search term

curl -s "https://api.openalex.org/works?search=attention+mechanism+neural+networks&filter=cited_by_count:>500,type:article&sort=cited_by_count:desc&per_page=10&select=id,display_name,publication_year,cited_by_count,doi&mailto=agent@kortix.ai"

Find Recent Preprints

Latest preprints on a topic

curl -s "https://api.openalex.org/works?search=multimodal+large+language+models&filter=type:preprint,publication_year:2025&sort=publication_date:desc&per_page=15&mailto=agent@kortix.ai"

Find Review Articles

Review/survey papers on a topic

curl -s "https://api.openalex.org/works?search=federated+learning&filter=type:review,cited_by_count:>20&sort=cited_by_count:desc&per_page=10&mailto=agent@kortix.ai"

Author Analysis

1. Find the author

curl -s "https://api.openalex.org/authors?search=Geoffrey+Hinton&select=id,display_name,works_count,cited_by_count,last_known_institutions&mailto=agent@kortix.ai"

2. Get their most influential papers

curl -s "https://api.openalex.org/works?filter=author.id:A5068082743&sort=cited_by_count:desc&per_page=10&select=id,display_name,publication_year,cited_by_count,doi&mailto=agent@kortix.ai"

3. Get their recent work

curl -s "https://api.openalex.org/works?filter=author.id:A5068082743,publication_year:>2023&sort=publication_date:desc&per_page=10&mailto=agent@kortix.ai"

Saving Results to Disk

When doing deep research, save paper data to disk for later processing:

Save search results as JSON

curl -s "https://api.openalex.org/works?search=topic&per_page=50&mailto=agent@kortix.ai" > research/papers/topic-search.json

Extract and save a clean summary

curl -s "https://api.openalex.org/works?search=topic&per_page=50&select=id,display_name,publication_year,cited_by_count,doi,authorships&mailto=agent@kortix.ai" | python3 -c "
import json, sys
data = json.load(sys.stdin)
for w in data.get('results', []):
    authorships = w.get('authorships', [])
    authors = ', '.join(a['author']['display_name'] for a in authorships[:3])
    if len(authorships) > 3:
        authors += ' et al.'
    # Inner double quotes are escaped so they do not end the shell string
    print(f\"[{w.get('cited_by_count',0)} cites] {w['display_name']} ({w.get('publication_year','?')}) - {authors}\")
    if w.get('doi'):
        print(f\"  DOI: {w['doi']}\")
    print()
" > research/papers/topic-summary.txt

For deep research, save individual paper metadata to your sources-index.md and raw data to sources/:

Save a paper's full metadata

curl -s "https://api.openalex.org/works/W2741809807?mailto=agent@kortix.ai" > research/sources/001-paper-title.json

Rate Limits

| Pool | Rate | How to get it |
|---|---|---|
| Common | 1 req/sec | No email provided |
| Polite | 10 req/sec | Add mailto=your@email.com to requests |
| Premium | Higher | Paid API key via api_key param |

Always use the polite pool. Add &mailto=agent@kortix.ai to every request.
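To stay under the stated rate, a client can space its requests. The sketch below is a simple client-side throttle, assuming the 10 req/sec figure from the table above; the `Throttle` class is illustrative, and the clock and sleep functions are injectable so the timing logic can be checked without waiting.

```python
import time


class Throttle:
    """Space calls at least 1/rate seconds apart."""

    def __init__(self, rate=10, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate
        self.clock, self.sleep = clock, sleep
        self.last = None  # time of the previous call, if any

    def wait(self):
        """Block until the next request is allowed, then record the time."""
        now = self.clock()
        if self.last is not None:
            remaining = self.interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now


throttle = Throttle(rate=10)
for _ in range(3):
    throttle.wait()  # call this right before each HTTP request
```

This only paces a single process; several workers sharing one email address would need a shared limiter.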

Tips

  • Use select aggressively to reduce response size and speed up requests

  • Use per_page=100 (max) when collecting lots of results to minimize request count

  • Use cursor paging (cursor=*) when you need more than 10,000 results

  • Batch DOI lookups with OR syntax: filter=doi:DOI1|DOI2|DOI3 (up to 50)

  • Reconstruct abstracts using the inverted index -- don't skip this, abstracts are gold

  • Follow citation chains to find seminal works and recent developments

  • Filter by has_abstract:true when you need abstracts (not all works have them)

  • Filter by indexed_in:arxiv or indexed_in:pubmed to target specific repositories

  • Sort by cited_by_count:desc to find the most influential papers first

  • Combine search + filters for precise results: search gives relevance, filters give precision
