About

KlusAI Research is the open-science arm of KlusAI. We build privacy-focused language models for 20 European languages in the legal and clinical domains, covering detection and anonymization of personal data (PII/PHI), privacy classification, and leakage evaluation.

We publish everything openly: benchmarks and datasets on Hugging Face, code on GitHub, and papers on arXiv.

Why this exists

Existing de-identification models report strong detection-F1 — but almost always on English, against fragmented, non-comparable taxonomies. There has been no single yardstick for European privacy NLP, and detection-F1 alone says nothing about the metric that actually matters under GDPR: re-identification risk.

EuroPriv-Bench is our answer — the first unified pan-European de-identification benchmark, with a privacy-utility framing. It cites and subsumes prior efforts (TAB, AI4Privacy, MAPA, MultiGraSCCo, MEDDOCAN) rather than competing with them.

Program

Artifact | Where
EuroPriv-Bench (benchmark + leaderboard) | HF · /leaderboard
Datasets | github.com/klusai
Models | HF
Papers | arXiv (linked from posts)

Contact

research@klusai.com · github.com/klusai