EuMINe DataBridge Hackathon 2026

Federating AI, Data & Communities for Accelerated Materials Discovery

Stage 1: Open May 11 | Deadline June 22

Stage 2: @ EuMINe GM Cluj-Napoca, July 2026


The Challenge

Predict materials properties from heterogeneous multi-database data.
Integrate Materials Project, JARVIS, AFLOW and more.

The Federation

Not just the best model: the best SHAREABLE model.
Your solution must implement the MatFed API v1 standard.

The Community

The top teams will present IN PERSON at the EuMINe General Meeting
in Cluj-Napoca. Travel & accommodation covered.


Timeline

May 11: Registration opens & data release
May 26: Registration closes
June 22: Submission deadline (23:59 CEST)
July 7: Finalists announced
July 2026: Stage 2 @ EuMINe GM, Cluj-Napoca

The Challenge

The Scientific Problem

Materials data is everywhere. But it is fragmented!

Millions of computed and experimental entries exist across public databases such as
Materials Project, AFLOW, NOMAD, OQMD, and JARVIS-DFT. Yet each database uses different
computational settings, different exchange-correlation functionals, different convergence
criteria. The same compound can have a band gap of 0.5 eV in one database and 2.1 eV in
another. Models trained on a single source routinely fail when applied outside it.

This is the problem the EuMINe DataBridge Hackathon asks you to tackle.

Your task: predict two key properties of inorganic crystalline materials

Formation energy per atom (eV/atom): a measure of thermodynamic stability, indicating
whether a compound is likely to form and persist.

Electronic band gap (eV): the energy separation between valence and conduction bands,
governing a material’s electronic and optical behaviour. It is notoriously difficult to
predict accurately with standard DFT.

These two properties are available across multiple databases, are physically interrelated,
and matter for real applications: energy storage, photovoltaics, semiconductor design,
catalysis.

The Bridge Dataset

We provide the EuMINe Bridge Dataset: ~1000 inorganic materials for which both Materials
Project and JARVIS-DFT have computed entries. For each material, both sets of values are
included, exposing the discrepancies directly. Your data integration strategy is part of
the challenge.
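As a starting point, the discrepancies can be quantified per material before choosing an integration strategy. The sketch below is illustrative only: the field names (`band_gap_mp`, `band_gap_jarvis`), the material labels, and the numbers are assumptions, not the actual Bridge Dataset schema or values.

```python
from statistics import mean

# Hypothetical records: labels, field names and values are placeholders,
# not taken from the real Bridge Dataset.
records = [
    {"material": "material_1", "band_gap_mp": 0.5, "band_gap_jarvis": 2.1},
    {"material": "material_2", "band_gap_mp": 1.0, "band_gap_jarvis": 1.2},
    {"material": "material_3", "band_gap_mp": 3.0, "band_gap_jarvis": 2.7},
]


def gap_discrepancies(rows):
    """Return per-material (JARVIS minus MP) band-gap differences in eV."""
    return [row["band_gap_jarvis"] - row["band_gap_mp"] for row in rows]


def summarize(diffs):
    """Summarise signed and absolute differences between the two sources."""
    return {
        "mean_signed": mean(diffs),                  # systematic offset
        "mean_abs": mean(abs(d) for d in diffs),     # typical disagreement
        "max_abs": max(abs(d) for d in diffs),       # worst case
    }


diffs = gap_discrepancies(records)
summary = summarize(diffs)
print(summary)
```

A non-zero mean signed difference points to a systematic offset between sources, which could be corrected for, whereas a large mean absolute difference with a small signed mean suggests material-dependent scatter.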

You must use at least two of the following open sources in your solution:

Database            Type               Coverage
Materials Project   DFT (PBE)          ~150,000 entries
AFLOW               DFT (various)      ~3.5 million entries
NOMAD               DFT + experiment   Diverse methodologies
OQMD                DFT (PBE)          ~1 million entries
JARVIS-DFT          DFT (OptB88vdW)    Strong 2D/vdW coverage

A held-out test set (150 structures, no labels) is provided for final evaluation.
Its ground-truth labels are never released and are used exclusively for scoring.

A baseline model (a random forest trained on Materials Project data using MAGPIE
featurization) is provided as a reference. Your model must outperform it on at least one
property to qualify for Stage 2.
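Because the test labels are never released, a comparison against the baseline can only be run locally on your own validation split. The sketch below assumes mean absolute error (MAE) as the metric, which is an assumption made for illustration; the official scoring metric is not specified here. All numbers and function names are hypothetical.

```python
def mean_absolute_error(y_true, y_pred):
    """Plain-Python MAE between two equally long sequences of floats."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def beats_baseline(truth, model_pred, baseline_pred):
    """Return {property: True/False}: True where the model's MAE is lower."""
    return {
        prop: mean_absolute_error(truth[prop], model_pred[prop])
        < mean_absolute_error(truth[prop], baseline_pred[prop])
        for prop in truth
    }


# Hypothetical validation labels and predictions (placeholder values).
truth = {"formation_energy_per_atom": [-1.0, -2.0], "band_gap": [1.0, 2.0]}
model = {"formation_energy_per_atom": [-1.1, -2.1], "band_gap": [1.5, 2.5]}
baseline = {"formation_energy_per_atom": [-1.3, -2.3], "band_gap": [1.2, 2.2]}

per_property = beats_baseline(truth, model, baseline)
qualifies = any(per_property.values())  # "at least one property"
print(per_property, qualifies)
```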

Handling the inter-database discrepancies well (not ignoring them!) is what separates
good submissions from great ones!

MatFed API v1

This hackathon is not only about building the best model. It is about building a model
that can work together with other models.

Every submission must implement MatFed API v1, a lightweight standardised Python
interface. Think of it as a minimal contract: any model that implements it can be
loaded, queried, and composed with any other compliant model, regardless of its
internal architecture.

Here is what the interface requires:


from typing import Dict, List

from pymatgen.core import Structure


class MatFedPredictor:

    def load_model(self, model_path: str) -> None:
        """Load model weights or parameters from disk."""
        ...

    def predict(self, structures: List[Structure]) -> List[Dict]:
        """
        Given a list of pymatgen Structure objects, return a list of dicts,
        each containing at minimum:
          - "formation_energy_per_atom": float  (eV/atom)
          - "band_gap": float                   (eV)
          - "model_id": str
          - "data_sources_used": list of str
        Uncertainty estimates are optional but encouraged.
        """
        ...

    def describe(self) -> Dict:
        """Return model metadata: team name, model type, data sources, API version."""
        ...

After the submission deadline, the organizers will run a community federation experiment: all qualifying models will be ensembled, and the combined model will be evaluated on the test set. Teams whose model contributes positively to the community ensemble receive additional points in the Federation Readiness criterion.
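How the organizers will ensemble the models and decide whether a model "contributes positively" is not specified here. One plausible reading, sketched below purely as an assumption, is an unweighted mean ensemble with a leave-one-out comparison: a model helps if removing it makes the ensemble error worse. The model names, values, and the choice of MAE are all hypothetical.

```python
def mae(y_true, y_pred):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_ensemble(predictions):
    """Element-wise mean over models; predictions maps model_id -> list of floats."""
    return [sum(col) / len(col) for col in zip(*predictions.values())]


def leave_one_out_gain(y_true, predictions):
    """For each model: (ensemble MAE without it) - (ensemble MAE with it).

    A positive gain means the ensemble is more accurate with the model included.
    Requires at least two models.
    """
    full_error = mae(y_true, mean_ensemble(predictions))
    gains = {}
    for model_id in predictions:
        rest = {k: v for k, v in predictions.items() if k != model_id}
        gains[model_id] = mae(y_true, mean_ensemble(rest)) - full_error
    return gains


# Hypothetical band-gap predictions (eV) from three teams.
y_true = [1.0, 2.0]
predictions = {
    "team_a": [1.0, 2.0],
    "team_b": [1.2, 2.2],
    "team_c": [2.0, 3.0],
}
gains = leave_one_out_gain(y_true, predictions)
print(gains)
```

In this toy case `team_a` and `team_b` have positive gains, while `team_c` has a negative one because its predictions pull the mean away from the labels.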

This is not a formality. It is the core idea of the hackathon: in a real European research network, groups work with different data and different tools. The value of a model is not just its accuracy in isolation; it is whether others can build on it.

→ Get started: fork the GitHub repository and use the MatFed API template
https://github.com/EuMINe-COST/eumine_hackaton_2026

The template includes the abstract base class, a JSON schema validator, a working example implementation, and a compliance test suite (pytest tests/). Your submission must pass all tests.
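Before running the official test suite, a quick local sanity check of the prediction dicts can catch obvious omissions. The snippet below is a hand-rolled sketch based only on the minimum fields listed in the `predict` docstring; it is not the repository's JSON schema validator, and `check_prediction` is a hypothetical helper name.

```python
# Minimum fields and their expected Python types, as listed in the
# predict() docstring of the interface.
REQUIRED_FIELDS = {
    "formation_energy_per_atom": float,
    "band_gap": float,
    "model_id": str,
    "data_sources_used": list,
}


def check_prediction(pred):
    """Return a list of problems; an empty list means the minimum fields are present."""
    problems = []
    for key, expected in REQUIRED_FIELDS.items():
        if key not in pred:
            problems.append(f"missing field: {key}")
        elif not isinstance(pred[key], expected):
            problems.append(f"wrong type for {key}: expected {expected.__name__}")
    sources = pred.get("data_sources_used")
    if isinstance(sources, list) and not all(isinstance(s, str) for s in sources):
        problems.append("data_sources_used must contain only strings")
    return problems


# Hypothetical example values.
good = {
    "formation_energy_per_atom": -1.23,
    "band_gap": 0.85,
    "model_id": "example-team-v1",
    "data_sources_used": ["Materials Project", "JARVIS-DFT"],
}
bad = {k: v for k, v in good.items() if k != "band_gap"}

good_problems = check_prediction(good)
bad_problems = check_prediction(bad)
print(good_problems, bad_problems)
```

The official compliance tests remain the authority on what counts as a valid submission.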

What to submit

All submissions consist of four mandatory deliverables plus one optional.


1 — Prediction Model (mandatory)

A Python package (or Jupyter notebook) implementing MatFed API v1 and producing predictions for both properties given crystal structures in CIF format. Must include requirements.txt and a README with step-by-step reproduction instructions. Must run on CPU (GPU allowed but not required).


2 — Data Integration Report (mandatory, max 4 pages)

Using the provided LaTeX template, describe:

  • Which databases you used and why
  • How you handled duplicates, biases, and unit inconsistencies
  • A data card table: source → entries used → properties → preprocessing

Reviewed by the evaluation committee. Counts for 25 points.


3 — Technical Report (mandatory, max 4 pages)

Same LaTeX template. Describe:

  • Model architecture and reasoning
  • Training procedure and key hyperparameter choices
  • Validation results, ideally broken down by material class
  • A reflection on federation readiness

4 — GitHub Repository (mandatory)

A public repository containing all code, both report PDFs, an open-source LICENSE, and a CITATION.cff file. This is the single reproducible source of your submission.


5 — Community Contribution (optional — bonus points)

Any one of the following:

  • A cleaned and harmonised sub-dataset published on Zenodo with a DOI
  • A reusable API connector for one of the supported databases
  • A visualisation tool to help others understand the Bridge Dataset discrepancies

How to submit:

  1. Push your final code to your public GitHub repository
  2. Open a Pull Request on the hackathon repo adding your predictions file:
    • github.com/EuMINe-COST/eumine_hackaton_2026
    • Path: submissions/YourTeamName/predictions_test.json
    • Full instructions: submissions/README.md in the repository.
  3. Fill in the registration form with your team details and GitHub repo URL:
    • https://forms.gle/LGLKhqExksLCUUyWA

Maximum 5 PRs per team. The last valid merged submission is scored.
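Writing the predictions file to the expected path can be scripted. The sketch below only illustrates the path layout given above; the record layout inside the JSON file (including the `structure_id` key) is a guess made for illustration, and the authoritative format is the one described in submissions/README.md.

```python
import json
import tempfile
from pathlib import Path


def write_predictions(repo_root, team_name, predictions):
    """Write predictions to submissions/<team_name>/predictions_test.json."""
    out_dir = Path(repo_root) / "submissions" / team_name
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / "predictions_test.json"
    with out_path.open("w", encoding="utf-8") as handle:
        json.dump(predictions, handle, indent=2)
    return out_path


# Hypothetical record layout and values; check submissions/README.md
# for the real format before opening a Pull Request.
example_predictions = [
    {
        "structure_id": "test_0001",
        "formation_energy_per_atom": -1.23,
        "band_gap": 0.85,
        "model_id": "example-team-v1",
        "data_sources_used": ["Materials Project", "JARVIS-DFT"],
    }
]

# A temporary directory stands in for a local clone of the hackathon repo.
with tempfile.TemporaryDirectory() as tmp:
    written = write_predictions(tmp, "YourTeamName", example_predictions)
    relative_path = written.relative_to(tmp).as_posix()
    reloaded = json.loads(written.read_text(encoding="utf-8"))

print(relative_path)
```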

Resources

Datasets & Templates (Google Drive)
https://drive.google.com/drive/folders/1YlRLQeH-ypT_N4yW-b5hKaRvlGZ7tDiX

GitHub Repository (code, baseline, MatFed template, submission instructions)
https://github.com/EuMINe-COST/eumine_hackaton_2026

Leaderboard
https://github.com/EuMINe-COST/eumine_hackaton_2026/blob/main/LEADERBOARD.md

Registration / Submission Form
https://forms.gle/LGLKhqExksLCUUyWA

Registered Teams

TBA

Who can participate?

Researchers from EuMINe-affiliated institutions. Teams must have 2–5 members
from at least 2 different institutions.

Do I need a Google account to access the data?

No. The Google Drive folder is publicly accessible with the link; no login is required.

Can I be on multiple teams?

No. Each participant can be a member of exactly one team.

What compute resources do I need?

A standard laptop is sufficient. The baseline model runs on CPU in minutes.
You may use cloud resources, but GPU is not required.

How is Stage 2 structured?

Finalist teams present their work in person at the EuMINe General Meeting in Cluj-Napoca (July 2026). In the afternoon, teams collaborate to build a federated ensemble using the MatFed API. Travel and accommodation are covered for up to 3 representatives per team.

Contacts

Email: euminecost@gmail.com

Slack: https://eumineeuropea-gkc8254.slack.com/archives/C0AVD05KJCD

GitHub Q&A: https://github.com/EuMINe-COST/eumine_hackaton_2026/discussions