Introducing an MCP Server for ORION-DBs
Navigating Open Research Information Resources on BigQuery with LLMs
orion-mcp, an experimental MCP server that lets LLMs like Claude explore schemas, draft SQL, estimate costs, and run queries, providing a practical entry point for users less familiar with open research information resources and BigQuery.
Introduction
The ORION-DBs community maintains open research information resources such as OpenAlex, Crossref, ORCID, DataCite, and more on Google BigQuery. Its collections currently span six different BigQuery projects, 52 datasets with various tables, and over 15 TB of data. However, the providers, including Sesame Open Science, MultiObs Campinas, CWTS, Digital Science and the SUB Göttingen, use different database schemas and pre-processing routines, which can make it hard to navigate the resources and retrieve data from them.
To broaden use of ORION-DBs, the experimental orion-mcp gives language models guided, database-aware access to ORION-DBs through AI apps like Claude Desktop. orion-mcp joins a growing number of Model Context Protocol (MCP) servers that combine scholarly resources with language models, many of them offered by commercial vendors and publishers. It gives the language model enough context to write SQL, which is executed on BigQuery rather than in the conversation. This reduces token use and keeps data retrieval controllable. BigQuery handles large datasets well and returns only the results to the language model, typically at low cost. This allows users to discuss and refine retrieval strategies based on open data in natural language.
This post walks through a use case navigating open research information on BigQuery with orion-mcp. The tool is at an early stage, so feedback is welcome.
What it does
Explore schemas
These tools work without a BigQuery account, using schemas pre-fetched from the ORION-DBs website when orion-mcp is started:
orion_list_datasets: list all available ORION-DBs datasets
orion_list_tables: list tables in a specific dataset
orion_get_db_schema: inspect the full schema of a table
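For readers curious about what happens behind the scenes: an MCP client such as Claude Desktop invokes a server tool by sending a JSON-RPC `tools/call` request. The following Python sketch builds such a request for one of the tools listed above. The tool name comes from this post, but the argument name `dataset` and its value are assumptions for illustration, since the post does not document the tools' parameters.

```python
import json
from itertools import count

_request_ids = count(1)  # each JSON-RPC request carries its own id


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# Hypothetical argument name and dataset identifier, for illustration only.
message = build_tool_call("orion_list_tables", {"dataset": "example_dataset"})
print(message)
```

In normal use you never write these messages yourself; the AI app generates them when the language model decides to call a tool.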
Query BigQuery
Once you know what you want to query, the LLM writes and executes SQL. To avoid surprise costs, a dry-run cost estimate is always shown before any query runs. SELECT * queries are blocked to prevent unnecessarily large scans.
orion_estimate_query_cost: estimate bytes scanned and cost before running
orion_run_bq_query: execute the confirmed query
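The two guardrails described above can be illustrated in a few lines. The following Python sketch is not orion-mcp's implementation (the server is written in R); it only shows the idea. The price of 6.25 USD per TiB scanned is an assumption based on BigQuery's on-demand pricing and should be checked against the current price list. The table name in the example query is hypothetical, and the byte count would in practice come from a BigQuery dry run.

```python
import re

# Assumption: on-demand price per TiB scanned; verify against current pricing.
USD_PER_TIB = 6.25
BYTES_PER_TIB = 2**40


def has_select_star(sql: str) -> bool:
    """Return True if the query starts a select list with a bare `*`.

    A simple heuristic: it matches `SELECT *` and `SELECT t.*` but not
    `COUNT(*)`. It does not parse SQL, so a `*` later in the select list,
    comments, and string literals are not handled.
    """
    pattern = r"\bselect\s+(?:distinct\s+)?(?:\w+\.)?\*"
    return re.search(pattern, sql, flags=re.IGNORECASE) is not None


def estimate_cost_usd(bytes_processed: int) -> float:
    """Convert the bytes reported by a dry run into an approximate cost."""
    return bytes_processed / BYTES_PER_TIB * USD_PER_TIB


def check_query(sql: str, bytes_processed: int) -> str:
    """Refuse SELECT * queries; otherwise report the scan size and cost."""
    if has_select_star(sql):
        raise ValueError("SELECT * is blocked; name the columns you need.")
    gib = bytes_processed / 2**30
    cost = estimate_cost_usd(bytes_processed)
    return f"Would scan {gib:.2f} GiB, about ${cost:.4f}"


# Hypothetical table name; the byte count stands in for a dry-run result.
sql = "SELECT doi, publication_year FROM `example-project.example_dataset.works`"
print(check_query(sql, bytes_processed=5 * 2**30))
```

Showing the estimate before execution lets the user confirm or refine the query while it is still free to do so, since a dry run itself scans no data.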
Use case
This screencast demonstrates a typical session.
First, I ask whether OpenAlex is available and which version is the most recent. Then, I ask Claude to compare the version provided by MultiObs with the version provided by SUB Göttingen. Having gained this overview, I ask Claude to retrieve the number of diamond open access articles from first authors from Germany between 2021 and 2025. Throughout, Claude provides me with the estimated query costs and presents the SQL for the queries.
In your own prompts, you may wish to state explicitly how the results should be presented. Often, a dynamic chart is unnecessary and a plain table is enough.
Installation
Full instructions are in the GitHub repo README.
In summary, the server runs in a Docker container connected to Claude Desktop via its MCP config file. Authentication uses Google’s Application Default Credentials, so your local gcloud credentials are used directly and no service account keys are needed. BigQuery’s free tier includes 1 TB of query processing per month.
Requirements: Docker and the Google Cloud CLI (gcloud). The server is implemented in R using {mcptools} and {ellmer}.
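To give a rough idea of what such a setup looks like, the following Python sketch prints a Claude Desktop MCP config entry that launches a server via Docker and passes in the local Application Default Credentials. It is an illustration under assumptions, not the project's actual configuration: the image name `orion-mcp` and the in-container credentials path are hypothetical, and the README remains the authoritative source.

```python
import json
from pathlib import Path

# Default location of Application Default Credentials on Linux and macOS,
# written by `gcloud auth application-default login`.
adc_path = Path.home() / ".config" / "gcloud" / "application_default_credentials.json"
container_adc = "/tmp/adc.json"  # hypothetical path inside the container

config = {
    "mcpServers": {
        "orion-mcp": {
            "command": "docker",
            "args": [
                "run", "-i", "--rm",
                # Mount the credentials file read-only into the container.
                "-v", f"{adc_path}:{container_adc}:ro",
                # Point Google client libraries at the mounted file.
                "-e", f"GOOGLE_APPLICATION_CREDENTIALS={container_adc}",
                "orion-mcp",  # hypothetical image name; see the README
            ],
        }
    }
}

print(json.dumps(config, indent=2))
```

The printed JSON has the shape Claude Desktop expects under the `mcpServers` key of its config file; the exact arguments for orion-mcp may differ.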
Responsible use
LLMs make mistakes. Always verify that queries return the results you intended before using them in any analysis. If you plan to use this in a publication, check the outlet’s policy on AI-assisted work and document your process accordingly. Please acknowledge the resources you used.
Reuse
Citation
@online{jahn2026,
author = {Jahn, Najko},
title = {Introducing an {MCP} {Server} for {ORION-DBs}},
date = {2026-04-07},
url = {https://orion-dbs.community/blog/posts/orion-mcp-welcome/},
langid = {en},
abstract = {ORION-DBs is a collection of open research information
resources hosted on BigQuery. Their heterogeneous schemas make
exploration and querying difficult. When properly guardrailed, LLMs
can be genuinely useful for this kind of work, because they are
reasonably good at writing SQL once they have enough context about
the data. This post presents `orion-mcp`, an experimental MCP
server that lets LLMs like Claude explore schemas, draft SQL,
estimate costs, and run queries, providing a practical entry point
for users less familiar with open research information resources and
BigQuery.}
}