About
KlusAI Research is the open-science arm of KlusAI. We build privacy-focused language models for 20 European languages, covering detection and anonymization of personal data (PII/PHI), privacy classification, and leakage evaluation in the legal and clinical domains.
We publish everything openly: benchmarks and datasets on Hugging Face, code on GitHub, and papers on arXiv.
Why this exists
Existing de-identification models report strong detection-F1, but almost always on English and against fragmented, non-comparable taxonomies. There has been no single yardstick for European privacy NLP, and detection-F1 alone says nothing about the metric that actually matters under GDPR: re-identification risk.
EuroPriv-Bench is our answer: the first unified pan-European de-identification benchmark, with a privacy-utility framing. It cites and subsumes prior efforts (TAB, AI4Privacy, MAPA, MultiGraSCCo, MEDDOCAN) rather than competing with them.
Program
| Artifact | Where |
|---|---|
| EuroPriv-Bench (benchmark + leaderboard) | Hugging Face · /leaderboard |
| Datasets | github.com/klusai |
| Models | Hugging Face |
| Papers | arXiv (linked from posts) |