Open Research Information on BigQuery

Sharing the load: Building an open research information collective

A growing, daily updated collection of publicly available datasets on Google BigQuery. Independent groups have come together to share the load of storage, preprocessing, and documentation, making open research information combinable and actionable at scale.

Get Started

  1. Browse Collections – Explore the available datasets from our contributing projects
  2. Contribute – Want to share your data? Check our contributing guide

First steps

Note

To query datasets, you need a Google Cloud project. You can get started with the BigQuery sandbox, which requires no billing account and includes 1 TB of free queries per month.

This example joins data from two different collections – article metadata associated with transformative agreements as provided by SUB Göttingen and a Crossref truth table from Sesame Open Science – to check how many articles from the DEAL Wiley agreement (2019–23) have funder metadata in Crossref.

-- Articles from the DEAL Wiley agreement (2019-23) with funder metadata in Crossref
SELECT cr_truth.has_funders, COUNT(DISTINCT jct_articles.doi) AS articles
FROM `subugoe-collaborative.openbib.jct_articles` AS jct_articles
LEFT JOIN sos-datasources.truthtables.crossref_truthtable_20260131 AS cr_truth
  ON lower(jct_articles.doi) = lower(cr_truth.doi)
WHERE jct_articles.esac_id = "wiley2019deal"
GROUP BY has_funders
library(bigrquery)
library(DBI)

# Replace `your-gcp-project` in the examples below with your own project ID.
my_con <- dbConnect(bigquery(), project = "your-gcp-project")

my_sql <- "
SELECT cr_truth.has_funders, COUNT(DISTINCT jct_articles.doi) AS articles
FROM `subugoe-collaborative.openbib.jct_articles` AS jct_articles
LEFT JOIN sos-datasources.truthtables.crossref_truthtable_20260131 AS cr_truth
  ON lower(jct_articles.doi) = lower(cr_truth.doi)
WHERE jct_articles.esac_id = 'wiley2019deal'
GROUP BY has_funders
"

my_df <- dbGetQuery(my_con, my_sql)
from google.cloud import bigquery

# Replace `your-gcp-project` in the examples below with your own project ID.
client = bigquery.Client(project="your-gcp-project")

sql = """
SELECT cr_truth.has_funders, COUNT(DISTINCT jct_articles.doi) AS articles
FROM `subugoe-collaborative.openbib.jct_articles` AS jct_articles
LEFT JOIN sos-datasources.truthtables.crossref_truthtable_20260131 AS cr_truth
  ON lower(jct_articles.doi) = lower(cr_truth.doi)
WHERE jct_articles.esac_id = 'wiley2019deal'
GROUP BY has_funders
"""

result = client.query(sql).to_dataframe()