Flagship benchmark

EuroPriv-Bench Leaderboard

Entity-level scores on the klusai/europriv-bench test split, by model and language. Higher is better.

Schema v2 · Benchmark v0.2.0 · Taxonomy v0.2.0

| Model | Adapter | Lang | Domain | Precision | Recall | F1 | n |
| --- | --- | --- | --- | ---: | ---: | ---: | ---: |
| OpenMed/privacy-filter-multilingual | openmed | ro | general | 71.8 | 76.4 | 74.1 | 1500 |
| OpenMed/privacy-filter-multilingual | openmed | nl | general | 69.8 | 57.6 | 63.1 | 1500 |
| OpenMed/privacy-filter-multilingual | openmed | en | general | 66.3 | 54.7 | 59.9 | 1500 |
| OpenMed/privacy-filter-multilingual | openmed | fr | general | 66.7 | 56.3 | 61.1 | 1500 |
| OpenMed/privacy-filter-multilingual | openmed | de | general | 67.0 | 55.6 | 60.8 | 1500 |
| OpenMed/privacy-filter-multilingual | openmed | it | general | 60.7 | 50.3 | 55.0 | 1500 |
| OpenMed/privacy-filter-multilingual | openmed | es | general | 66.0 | 53.5 | 59.1 | 1500 |
| openai/privacy-filter | privacy-filter | ro | general | 56.0 | 59.2 | 57.6 | 1500 |
| openai/privacy-filter | privacy-filter | nl | general | 59.3 | 39.1 | 47.1 | 1500 |
| openai/privacy-filter | privacy-filter | en | general | 58.0 | 32.3 | 41.5 | 1500 |
| openai/privacy-filter | privacy-filter | fr | general | 59.2 | 38.2 | 46.4 | 1500 |
| openai/privacy-filter | privacy-filter | de | general | 63.3 | 41.4 | 50.0 | 1500 |
| openai/privacy-filter | privacy-filter | it | general | 56.7 | 37.5 | 45.1 | 1500 |
| openai/privacy-filter | privacy-filter | es | general | 59.3 | 38.3 | 46.5 | 1500 |
| tabularisai/eu-pii-safeguard | tabularisai | ro | general | 89.3 | 86.0 | 87.6 | 1500 |
| tabularisai/eu-pii-safeguard | tabularisai | nl | general | 71.2 | 56.4 | 62.9 | 1500 |
| tabularisai/eu-pii-safeguard | tabularisai | en | general | 64.4 | 42.9 | 51.5 | 1500 |
| tabularisai/eu-pii-safeguard | tabularisai | fr | general | 69.7 | 51.8 | 59.4 | 1500 |
| tabularisai/eu-pii-safeguard | tabularisai | de | general | 71.7 | 56.9 | 63.4 | 1500 |
| tabularisai/eu-pii-safeguard | tabularisai | it | general | 68.9 | 50.0 | 57.9 | 1500 |
| tabularisai/eu-pii-safeguard | tabularisai | es | general | 67.7 | 51.2 | 58.3 | 1500 |

Each row reports entity-level precision / recall / F1 (×100) under the unified KlusAI privacy taxonomy. Results carry full provenance (model id, dataset config/split, harness & taxonomy version, timestamp) in the source repository.
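
For readers unfamiliar with entity-level scoring, the sketch below shows one common way such numbers are computed. It assumes exact-match scoring, where a predicted entity counts as correct only if its start, end, and label all match a gold entity. The page does not state the harness's actual matching rule, so treat this as an illustration rather than the harness's implementation. The `Entity` type, the `entity_prf` function, and the label names are hypothetical.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    """A labeled character span. Hypothetical type, not the harness's own."""
    start: int
    end: int
    label: str


def entity_prf(gold: set[Entity], pred: set[Entity]) -> tuple[float, float, float]:
    """Return (precision, recall, F1), each scaled by 100, under exact matching."""
    true_positives = len(gold & pred)  # spans present in both sets
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return 100 * precision, 100 * recall, 100 * f1


# Label names are placeholders, not taken from the KlusAI taxonomy.
gold = {
    Entity(0, 10, "PERSON"),
    Entity(20, 35, "EMAIL"),
    Entity(40, 52, "PHONE"),
    Entity(60, 70, "ADDRESS"),
}
pred = {
    Entity(0, 10, "PERSON"),
    Entity(20, 35, "EMAIL"),
    Entity(41, 52, "PHONE"),  # boundary off by one, so not an exact match
}

precision, recall, f1 = entity_prf(gold, pred)
print(f"P={precision:.1f} R={recall:.1f} F1={f1:.1f}")
```

In this toy case two of three predictions are correct and two of four gold entities are found, so recall is lower than precision. Most rows in the table show the same pattern of recall trailing precision.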

How to submit

EuroPriv-Bench is open. Run the harness against your model and open a PR that adds your entry to baselines/leaderboard.json. See the benchmark repo for the adapter contract and reproduction steps. Entries without reproducible provenance are not listed.
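
Before opening a PR, it may help to check that an entry carries every provenance item the leaderboard asks for. The sketch below is only an illustration of that idea. The key names in `REQUIRED_PROVENANCE`, the nested `provenance` layout, and the model id are all hypothetical; the real schema for baselines/leaderboard.json is defined in the benchmark repo.

```python
# Hypothetical key names mirroring the provenance items named on this page
# (model id, dataset config/split, harness and taxonomy version, timestamp).
# The actual leaderboard.json schema may differ.
REQUIRED_PROVENANCE = (
    "model_id",
    "dataset_config",
    "dataset_split",
    "harness_version",
    "taxonomy_version",
    "timestamp",
)


def missing_provenance(entry: dict) -> list[str]:
    """List required provenance keys that are absent or empty in the entry."""
    provenance = entry.get("provenance", {})
    return [key for key in REQUIRED_PROVENANCE if not provenance.get(key)]


# Placeholder entry with made-up identifiers. The timestamp is left empty on purpose.
entry = {
    "provenance": {
        "model_id": "my-org/my-pii-model",
        "dataset_config": "ro",
        "dataset_split": "test",
        "harness_version": "0.2.0",
        "taxonomy_version": "0.2.0",
        "timestamp": "",
    }
}

missing = missing_provenance(entry)
print("missing provenance fields:", missing)
```

A check like this could run locally or in CI so that incomplete entries are caught before review.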