Introducing an MCP Server for ORION-DBs

Navigating Open Research Information Resources on BigQuery with LLMs

mcp
news
tools
ai
Author

Najko Jahn

Published

April 7, 2026

Abstract
ORION-DBs is a collection of open research information resources hosted on BigQuery. Their heterogeneous schemas make exploration and querying difficult. When properly guardrailed, LLMs can be genuinely useful for this kind of work, because they are reasonably good at writing SQL once they have enough context about the data. This post presents orion-mcp, an experimental MCP server that lets LLMs like Claude explore schemas, draft SQL, estimate costs, and run queries, providing a practical entry point for users less familiar with open research information resources and BigQuery.

Introduction

The ORION-DBs community maintains open research information resources such as OpenAlex, Crossref, ORCID, DataCite, and more on Google BigQuery. Its collections currently span six different BigQuery projects, 52 datasets with various tables, and over 15 TB of data. However, the providers, including Sesame Open Science, MultiObs Campinas, CWTS, Digital Science and the SUB Göttingen, use different database schemas and pre-processing routines, which can make it hard to navigate the resources and retrieve data from them.

To broaden use of ORION-DBs, the experimental orion-mcp gives language models guided, database-aware access to ORION-DBs through AI apps like Claude Desktop. orion-mcp follows a growing number of Model Context Protocol (MCP) servers combining scholarly resources with language models, many of them offered by commercial vendors and publishers. It provides the language model with enough context to write SQL that is executed on BigQuery rather than in the conversation, which reduces token use and keeps data retrieval controllable. BigQuery handles large datasets well, typically at low cost, and only the query results are returned to the language model. This allows users to discuss and refine retrieval strategies based on open data in natural language.

This post walks through a use case navigating open research information on BigQuery with orion-mcp. The tool is at an early stage, so feedback is welcome.

What it does

Explore schemas

These tools work without a BigQuery account. They use schemas that are pre-fetched from the ORION-DBs website when orion-mcp starts:

  • orion_list_datasets — list all available ORION-DBs datasets
  • orion_list_tables — list tables in a specific dataset
  • orion_get_db_schema — inspect the full schema of a table
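As a rough illustration of how an MCP client invokes tools like these, the sketch below builds the JSON-RPC 2.0 `tools/call` requests defined by the Model Context Protocol. The tool names come from the list above. The argument names (`dataset`, `table`) and the example values are assumptions made for illustration and may not match the actual orion-mcp tool signatures.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids must be unique per session


def tool_call(name, arguments=None):
    """Build an MCP `tools/call` request (JSON-RPC 2.0) as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


# Hypothetical argument names and values; check the server's tool
# descriptions for the real ones.
requests = [
    tool_call("orion_list_datasets"),
    tool_call("orion_list_tables", {"dataset": "example_dataset"}),
    tool_call(
        "orion_get_db_schema",
        {"dataset": "example_dataset", "table": "example_table"},
    ),
]

for request in requests:
    print(json.dumps(request))
```

In practice the AI app builds and sends these messages itself, so users never write them by hand.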

Query BigQuery

Once you know what you want to query, the LLM writes and executes SQL. To avoid surprise costs, a dry-run cost estimate is always shown before any query runs. SELECT * queries are blocked to prevent unnecessary large scans.

  • orion_estimate_query_cost — estimate bytes scanned and cost before running
  • orion_run_bq_query — execute the confirmed query
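The two guardrails described above can be sketched in Python as follows. This is not the orion-mcp implementation, which is written in R, but a minimal illustration. It assumes a simple regular-expression heuristic for detecting `SELECT *` and an on-demand price of 6.25 USD per TiB scanned, which should be checked against current BigQuery pricing. The dry run uses the `google-cloud-bigquery` client and needs Application Default Credentials, so it is imported lazily and not called here.

```python
import re

TIB = 2**40
USD_PER_TIB = 6.25  # assumption: on-demand list price, verify before relying on it


def has_select_star(sql):
    """Heuristic check for a bare `SELECT *`; COUNT(*) is not flagged."""
    no_block_comments = re.sub(r"/\*.*?\*/", " ", sql, flags=re.DOTALL)
    no_comments = re.sub(r"--[^\n]*", " ", no_block_comments)
    pattern = r"\bselect\s+(?:distinct\s+|all\s+)?\*"
    return re.search(pattern, no_comments, re.IGNORECASE) is not None


def estimate_cost_usd(bytes_processed, usd_per_tib=USD_PER_TIB):
    """Convert bytes scanned into an approximate on-demand cost."""
    return bytes_processed / TIB * usd_per_tib


def dry_run_bytes(sql):
    """Ask BigQuery how many bytes a query would scan, without running it."""
    from google.cloud import bigquery  # needs google-cloud-bigquery and ADC

    client = bigquery.Client()
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)
    return job.total_bytes_processed


def check_query(sql, bytes_processed):
    """Reject SELECT * and return the estimated cost in USD otherwise."""
    if has_select_star(sql):
        raise ValueError("SELECT * is blocked; name the columns you need")
    return estimate_cost_usd(bytes_processed)
```

With credentials in place, `check_query(sql, dry_run_bytes(sql))` would give the estimate to show the user before the query is confirmed. The heuristic does not catch qualified forms such as `t.*`, so a production guard would need a real SQL parser.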

Use case

This screencast demonstrates a typical session.

First, I ask whether OpenAlex is available and which version is the most recent. Then, I ask Claude to compare the version provided by MultiObs with the version provided by SUB Göttingen. Having gained this overview, I ask Claude to retrieve the number of diamond open access articles from first authors from Germany between 2021 and 2025. Throughout, Claude provides me with the estimated query costs and presents the SQL for the queries.

When prompting, you may wish to be more explicit about how the results should be presented. Often, a dynamic chart is unnecessary.

Installation

Full instructions are in the GitHub repo README.

In summary, the server runs in a Docker container connected to Claude Desktop via its MCP config file. Authentication uses Google’s Application Default Credentials, so your local gcloud credentials are used directly and no service account keys are needed. BigQuery’s free tier covers the first 1 TiB of query data processed per month.

Requirements: Docker and the Google Cloud CLI (gcloud). The server is implemented in R using {mcptools} and {ellmer}.
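To make the setup more concrete, the sketch below generates an `mcpServers` entry of the kind Claude Desktop reads from its MCP config file. The README remains the authoritative source. The image name `orion-mcp:latest`, the server key `orion-mcp`, and the in-container credentials path are placeholders assumed for illustration. The host path is where `gcloud auth application-default login` stores credentials on Linux and macOS.

```python
import json
from pathlib import Path


def mcp_config(image, adc_path):
    """Build a Claude Desktop MCP entry that starts the server via Docker."""
    container_adc = "/gcloud/adc.json"  # hypothetical path inside the container
    return {
        "mcpServers": {
            "orion-mcp": {  # hypothetical server key
                "command": "docker",
                "args": [
                    "run", "-i", "--rm",
                    # Mount local Application Default Credentials read-only
                    "-v", f"{adc_path}:{container_adc}:ro",
                    "-e", f"GOOGLE_APPLICATION_CREDENTIALS={container_adc}",
                    image,
                ],
            }
        }
    }


# Default ADC location on Linux and macOS
adc = Path.home() / ".config" / "gcloud" / "application_default_credentials.json"

# Placeholder image name; use the one given in the README
config = mcp_config("orion-mcp:latest", adc)
print(json.dumps(config, indent=2))
```

The `-i` flag keeps standard input open, which MCP servers using the stdio transport need in order to exchange messages with the client.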

Responsible use

LLMs make mistakes. Always verify that queries return the results you intended before using them in any analysis. If you plan to use this in a publication, check the outlet’s policy on AI-assisted work and document your process accordingly. Please acknowledge resources used.

Citation

BibTeX citation:
@online{jahn2026,
  author = {Jahn, Najko},
  title = {Introducing an {MCP} {Server} for {ORION-DBs}},
  date = {2026-04-07},
  url = {https://orion-dbs.community/blog/posts/orion-mcp-welcome/},
  langid = {en},
  abstract = {ORION-DBs is a collection of open research information
    resources hosted on BigQuery. Their heterogeneous schemas make
    exploration and querying difficult. When properly guardrailed, LLMs
    can be genuinely useful for this kind of work, because they are
    reasonably good at writing SQL once they have enough context about
    the data. This post presents `orion-mcp`, an experimental MCP
    server that lets LLMs like Claude explore schemas, draft SQL,
    estimate costs, and run queries, providing a practical entry point
    for users less familiar with open research information resources and
    BigQuery.}
}
For attribution, please cite this work as:
Jahn, Najko. 2026. “Introducing an MCP Server for ORION-DBs.” ORION-DBs Blog, April 7. https://orion-dbs.community/blog/posts/orion-mcp-welcome/.