EuMINe DataBridge Hackathon 2026
Federating AI, Data & Communities for Accelerated Materials Discovery
Stage 1: Open May 11 | Deadline June 22
Stage 2: @ EuMINe GM Cluj-Napoca, July 2026
The Challenge
Predict materials properties from heterogeneous multi-database data.
Integrate Materials Project, JARVIS, AFLOW and more.
The Federation
Not just the best model: the best SHAREABLE model.
Your solution must implement the MatFed API v1 standard.
The Community
The top teams will present IN PERSON at the EuMINe General Meeting
in Cluj-Napoca. Travel & accommodation covered.
Timeline
May 11: Registration opens & data release
May 26: Registration closes
June 22: Submission deadline (23:59 CEST)
July 7: Finalists announced
July 2026: Stage 2 @ EuMINe GM, Cluj-Napoca
The Challenge
The Scientific Problem
Materials data is everywhere. But it is fragmented!
Millions of computed and experimental entries exist across public databases such as
Materials Project, AFLOW, NOMAD, OQMD, and JARVIS-DFT. Yet each database uses different
computational settings, different exchange-correlation functionals, different convergence
criteria. The same compound can have a band gap of 0.5 eV in one database and 2.1 eV in
another. Models trained on a single source routinely fail when applied outside it.
This is the problem the EuMINe DataBridge Hackathon asks you to tackle.
Your task is to predict two key properties of inorganic crystalline materials:
- Formation energy per atom (eV/atom): a measure of thermodynamic stability, indicating whether a compound is likely to form and persist.
- Electronic band gap (eV): the energy separation between valence and conduction bands, governing a material’s electronic and optical behaviour. It is notoriously difficult to predict accurately with standard DFT.
These two properties are available across multiple databases, are physically interrelated,
and matter for real applications: energy storage, photovoltaics, semiconductor design,
catalysis.
The Bridge Dataset
We provide the EuMINe Bridge Dataset: ~1000 inorganic materials for which both Materials
Project and JARVIS-DFT have computed entries. For each material, both sets of values are
included, exposing the discrepancies directly. Your data integration strategy is part of
the challenge.
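As a starting point for a data integration strategy, the paired values can be compared directly. The sketch below is only an illustration: it assumes each Bridge Dataset record has been loaded as a dict with the hypothetical keys `mp_band_gap` and `jarvis_band_gap` (the real column names in the released files may differ), and it uses toy numbers rather than actual dataset values. It reports the mean absolute difference between the two sources and fits a least-squares line that maps JARVIS-DFT values onto the Materials Project scale. A linear map is one simple option among many and may be too crude for some material classes.

```python
from typing import Dict, List, Tuple


def mean_abs_difference(xs: List[float], ys: List[float]) -> float:
    """Average of |x - y| over paired values."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy records with HYPOTHETICAL field names; not real Bridge Dataset values.
records: List[Dict[str, float]] = [
    {"mp_band_gap": 1.0, "jarvis_band_gap": 1.5},
    {"mp_band_gap": 2.0, "jarvis_band_gap": 2.5},
    {"mp_band_gap": 3.0, "jarvis_band_gap": 3.5},
]

jarvis = [r["jarvis_band_gap"] for r in records]
mp = [r["mp_band_gap"] for r in records]

mad = mean_abs_difference(jarvis, mp)
slope, intercept = fit_line(jarvis, mp)  # maps JARVIS values onto the MP scale

print(f"mean |JARVIS - MP| = {mad:.3f} eV")
print(f"MP ~ {slope:.3f} * JARVIS + {intercept:.3f}")
```

The same two helpers can be applied to formation energies, or to subsets of the data grouped by material class, to see where the sources disagree most.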
You must use at least two of the following open sources in your solution:
| Database | Type | Coverage |
|---|---|---|
| Materials Project | DFT (PBE) | ~150,000 entries |
| AFLOW | DFT (various) | ~3.5 million entries |
| NOMAD | DFT + experiment | Diverse methodologies |
| OQMD | DFT (PBE) | ~1 million entries |
| JARVIS-DFT | DFT (OptB88vdW) | Strong 2D/vdW coverage |
A held-out test set (150 structures, no labels) is provided for final evaluation.
Its ground-truth labels are never released and are used exclusively for scoring.
A baseline model, a random forest trained on Materials Project data using MAGPIE
featurization, is provided as a reference. Your model must outperform it on at least one
property to qualify for Stage 2.
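Because the test labels are never released, a comparison against the baseline can only be run locally on your own validation split. The sketch below shows one way to do that. It assumes mean absolute error as the metric, which is an assumption on our part since the scoring metric is not stated here, and the function names and toy numbers are illustrative only.

```python
from typing import Dict, List

PROPERTIES = ("formation_energy_per_atom", "band_gap")


def mean_absolute_error(y_true: List[float], y_pred: List[float]) -> float:
    """Average of |truth - prediction| over paired values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def beats_baseline(
    truth: Dict[str, List[float]],
    model: Dict[str, List[float]],
    baseline: Dict[str, List[float]],
) -> Dict[str, bool]:
    """For each property, report whether the model's MAE is below the baseline's.

    ASSUMPTION: MAE is used here for illustration; the official metric may differ.
    """
    result = {}
    for prop in PROPERTIES:
        model_mae = mean_absolute_error(truth[prop], model[prop])
        base_mae = mean_absolute_error(truth[prop], baseline[prop])
        result[prop] = model_mae < base_mae
    return result


# Toy validation data (not real results).
truth = {"formation_energy_per_atom": [-1.0, -2.0], "band_gap": [1.0, 2.0]}
model = {"formation_energy_per_atom": [-1.1, -2.1], "band_gap": [1.5, 2.5]}
baseline = {"formation_energy_per_atom": [-1.3, -1.7], "band_gap": [1.2, 2.2]}

per_property = beats_baseline(truth, model, baseline)
qualifies = any(per_property.values())  # "at least one property"
print(per_property, qualifies)
```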
Handling the inter-database discrepancies well (not ignoring them!) is what separates
good submissions from great ones!
MatFed API v1
This hackathon is not only about building the best model. It is about building a model
that can work together with other models.
Every submission must implement MatFed API v1, a lightweight standardised Python interface. Think of it as a minimal contract: any model that implements it can be loaded, queried, and composed with any other compliant model, regardless of its internal architecture.
Here is what the interface requires:
from typing import Dict, List

from pymatgen.core import Structure


class MatFedPredictor:
    def load_model(self, model_path: str) -> None:
        """Load model weights or parameters from disk."""
        ...

    def predict(self, structures: List[Structure]) -> List[Dict]:
        """
        Given a list of pymatgen Structure objects, return a list of dicts,
        each containing at minimum:
        - "formation_energy_per_atom": float (eV/atom)
        - "band_gap": float (eV)
        - "model_id": str
        - "data_sources_used": list of str
        Uncertainty estimates are optional but encouraged.
        """
        ...

    def describe(self) -> Dict:
        """Return model metadata: team name, model type, data sources, API version."""
        ...
After the submission deadline, the organizers will run a community federation experiment: all qualifying models will be ensembled, and the combined model will be evaluated on the test set. Teams whose model contributes positively to the community ensemble receive additional points in the Federation Readiness criterion.
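To make the federation experiment concrete, here is a minimal sketch of how compliant models could be combined. It assumes an unweighted mean of the numeric outputs, which is our assumption: the combination scheme the organizers will actually use is not specified here. `FixedPredictor` is a hypothetical stand-in for any model exposing a MatFed-style `predict` method, and the structures are opaque placeholders.

```python
from typing import Any, Dict, List, Sequence

NUMERIC_KEYS = ("formation_energy_per_atom", "band_gap")


def ensemble_predict(predictors: Sequence[Any], structures: List[Any]) -> List[Dict]:
    """Average the numeric outputs of several MatFed-style predictors.

    ASSUMPTION: an unweighted mean; the organizers' actual scheme is not stated.
    """
    if not predictors:
        raise ValueError("at least one predictor is required")
    per_model = [p.predict(structures) for p in predictors]
    combined: List[Dict] = []
    for i in range(len(structures)):
        entry: Dict[str, Any] = {
            key: sum(preds[i][key] for preds in per_model) / len(per_model)
            for key in NUMERIC_KEYS
        }
        entry["model_id"] = "mean-ensemble"
        # Union of the data sources reported by the member models.
        entry["data_sources_used"] = sorted(
            {src for preds in per_model for src in preds[i]["data_sources_used"]}
        )
        combined.append(entry)
    return combined


class FixedPredictor:
    """HYPOTHETICAL stand-in for a compliant model; returns fixed values."""

    def __init__(self, energy: float, gap: float, source: str) -> None:
        self.energy, self.gap, self.source = energy, gap, source

    def predict(self, structures: List[Any]) -> List[Dict]:
        return [
            {
                "formation_energy_per_atom": self.energy,
                "band_gap": self.gap,
                "model_id": f"fixed-{self.source}",
                "data_sources_used": [self.source],
            }
            for _ in structures
        ]


models = [
    FixedPredictor(-1.0, 1.0, "Materials Project"),
    FixedPredictor(-2.0, 3.0, "JARVIS-DFT"),
]
ensemble = ensemble_predict(models, [object(), object()])
print(ensemble[0])
```

Because the ensemble only relies on the shared `predict` contract, member models can differ completely in architecture and training data.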
This is not a formality. It is the core idea of the hackathon: in a real European research network, groups work with different data and different tools. The value of a model lies not just in its accuracy in isolation but in whether others can build on it.
→ Get started: fork the GitHub repository and use the MatFed API template
https://github.com/EuMINe-COST/eumine_hackaton_2026
The template includes the abstract base class, a JSON schema validator, a working example implementation, and a compliance test suite (pytest tests/). Your submission must pass all tests.
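Before running the official test suite, it can help to sanity-check your own outputs against the fields listed in the interface. The sketch below is independent of the repository's template and validator. `ConstantPredictor` is a hypothetical toy model that loads two numbers from a JSON file, and `find_problems` is a hypothetical helper that checks the four required fields. The keys returned by `describe` are our guess at reasonable names, since the interface only lists which metadata to report, and the structures are opaque placeholders rather than pymatgen objects.

```python
import json
import os
import tempfile
from typing import Any, Dict, List

# Required output fields and their types, as listed in the interface above.
REQUIRED_FIELDS = {
    "formation_energy_per_atom": float,
    "band_gap": float,
    "model_id": str,
    "data_sources_used": list,
}


class ConstantPredictor:
    """HYPOTHETICAL toy model using the MatFed API v1 method names."""

    def __init__(self) -> None:
        self._params: Dict[str, float] = {}

    def load_model(self, model_path: str) -> None:
        with open(model_path, "r", encoding="utf-8") as fh:
            self._params = json.load(fh)

    def predict(self, structures: List[Any]) -> List[Dict]:
        return [
            {
                "formation_energy_per_atom": float(self._params["formation_energy_per_atom"]),
                "band_gap": float(self._params["band_gap"]),
                "model_id": "constant-demo",
                "data_sources_used": ["Materials Project", "JARVIS-DFT"],
            }
            for _ in structures
        ]

    def describe(self) -> Dict:
        # Key names are ASSUMED; the interface only names the metadata items.
        return {
            "team_name": "ExampleTeam",
            "model_type": "constant",
            "data_sources": ["Materials Project", "JARVIS-DFT"],
            "api_version": "v1",
        }


def find_problems(prediction: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; empty means the dict looks fine."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in prediction:
            problems.append(f"missing field: {name}")
        elif not isinstance(prediction[name], expected):
            problems.append(f"{name} should be {expected.__name__}")
    sources = prediction.get("data_sources_used")
    if isinstance(sources, list) and not all(isinstance(s, str) for s in sources):
        problems.append("data_sources_used must contain only strings")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "params.json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"formation_energy_per_atom": -1.2, "band_gap": 0.8}, fh)
    predictor = ConstantPredictor()
    predictor.load_model(path)
    predictions = predictor.predict([object(), object()])

all_problems = [find_problems(p) for p in predictions]
print(all_problems)
```

The check is deliberately strict: an integer such as `0` for `band_gap` is reported as a problem, so cast numeric outputs to `float` explicitly. The compliance tests in the repository remain the authoritative check.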
What to submit
Every submission consists of four mandatory deliverables plus one optional deliverable.
1 — Prediction Model (mandatory)
A Python package (or Jupyter notebook) implementing MatFed API v1 and producing predictions for both properties given crystal structures in CIF format. Must include requirements.txt and a README with step-by-step reproduction instructions. Must run on CPU (GPU allowed but not required).
2 — Data Integration Report (mandatory, max 4 pages)
Using the provided LaTeX template, describe:
- Which databases you used and why
- How you handled duplicates, biases, and unit inconsistencies
- A data card table: source → entries used → properties → preprocessing
Reviewed by the evaluation committee. Counts for 25 points.
3 — Technical Report (mandatory, max 4 pages)
Same LaTeX template. Describe:
- Model architecture and reasoning
- Training procedure and key hyperparameter choices
- Validation results, ideally broken down by material class
- A reflection on federation readiness
4 — GitHub Repository (mandatory)
A public repository containing all code, both report PDFs, an open-source LICENSE, and a CITATION.cff file. This is the single reproducible source of your submission.
5 — Community Contribution (optional — bonus points)
Any one of the following:
- A cleaned and harmonised sub-dataset published on Zenodo with a DOI
- A reusable API connector for one of the supported databases
- A visualisation tool to help others understand the Bridge Dataset discrepancies
How to submit:
- Push your final code to your public GitHub repository
- Open a Pull Request on the hackathon repo adding your predictions file:
  - Repository: github.com/EuMINe-COST/eumine_hackaton_2026
  - Path: submissions/YourTeamName/predictions_test.json
  - Full instructions: submissions/README.md in the repository
- Fill in the registration form with your team details and GitHub repo URL:
  - https://forms.gle/LGLKhqExksLCUUyWA
Maximum 5 PRs per team. The last valid merged submission is scored.
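A small helper can place the predictions file at the expected path. The sketch below is illustrative only: the path follows the instructions above, but the file layout (a JSON list of prediction dicts, each with a hypothetical `structure_id` field) is an assumption. The authoritative format is described in submissions/README.md in the repository.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def write_predictions(repo_root: Path, team_name: str, predictions: List[Dict]) -> Path:
    """Write predictions to submissions/<team_name>/predictions_test.json."""
    out_path = repo_root / "submissions" / team_name / "predictions_test.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", encoding="utf-8") as fh:
        json.dump(predictions, fh, indent=2)
    return out_path


# Toy records; "structure_id" is a HYPOTHETICAL field, check the README for the real schema.
example_predictions = [
    {
        "structure_id": "test-0001",
        "formation_energy_per_atom": -1.2,
        "band_gap": 0.8,
        "model_id": "constant-demo",
        "data_sources_used": ["Materials Project", "JARVIS-DFT"],
    }
]

# A temporary directory stands in for a local clone of the hackathon repo.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    written = write_predictions(root, "YourTeamName", example_predictions)
    relative = written.relative_to(root).as_posix()
    reloaded = json.loads(written.read_text(encoding="utf-8"))

print(relative)
```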
Resources
Datasets & Templates (Google Drive)
→ https://drive.google.com/drive/folders/1YlRLQeH-ypT_N4yW-b5hKaRvlGZ7tDiX
GitHub Repository (code, baseline, MatFed template, submission instructions)
→ https://github.com/EuMINe-COST/eumine_hackaton_2026
Leaderboard
→ https://github.com/EuMINe-COST/eumine_hackaton_2026/blob/main/LEADERBOARD.md
Registration / Submission Form
→ https://forms.gle/LGLKhqExksLCUUyWA
Registered Teams
TBA
Eligibility: researchers from EuMINe-affiliated institutions. Teams must have 2–5 members from at least 2 different institutions.
Data access: the Google Drive folder is publicly accessible with the link; no login is required.
Team membership: each participant can be a member of exactly one team.
Hardware: a standard laptop is sufficient. The baseline model runs on CPU in minutes. You may use cloud resources, but a GPU is not required.
Stage 2: finalist teams present their work in person at the EuMINe General Meeting in Cluj-Napoca (July 2026). In the afternoon, teams collaborate to build a federated ensemble using the MatFed API. Travel and accommodation are covered for up to 3 representatives per team.
Contacts
Email: euminecost@gmail.com
Slack: https://eumineeuropea-gkc8254.slack.com/archives/C0AVD05KJCD
GitHub Q&A: https://github.com/EuMINe-COST/eumine_hackaton_2026/discussions