Hacker News

> we use Sentence Transformers (all-MiniLM-L6-v2) as our default (solid all-around performer for speed and retrieval, English-only).

Huh, interesting. I might be building a German-language RAG at some point in my future and I never even considered that some models might not support German at all. Does anyone have any experience here? Do many models underperform or not support non-English languages?



You can refer to the MTEB leaderboard (https://huggingface.co/spaces/mteb/leaderboard) and use it to guide your selection.

Check under the "Retrieval" section, either RTEB Multilingual or RTEB German (under language specific).

You may also want to filter by model size (under "Advanced Model Filters"). For instance, if you are self-hosting and running on a CPU, it may make sense to limit yourself to models with <=100M parameters.
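One reason the size filter matters for self-hosting: weight memory scales linearly with parameter count. A back-of-envelope sketch (my own arithmetic, not from the leaderboard; assumes fp32 weights and ignores activations and tokenizer overhead):

```python
def model_memory_mb(n_params: float, bytes_per_param: int = 4) -> float:
    """Approximate RAM needed for model weights (fp32 by default)."""
    return n_params * bytes_per_param / 1e6

# all-MiniLM-L6-v2 has ~22.7M parameters.
print(model_memory_mb(22.7e6))   # -> 90.8 (MB)

# A 100M-parameter cap keeps fp32 weights around 400 MB,
# comfortably within most CPU boxes.
print(model_memory_mb(100e6))    # -> 400.0 (MB)

# Halve it again if the model ships in fp16.
print(model_memory_mb(100e6, bytes_per_param=2))  # -> 200.0 (MB)
```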


Thanks, that's really useful, I had no idea this table existed.


> Do many models underperform or not support non-English languages?

Yes they do. However:

1. German is one of the more common languages to train on, so more models will support it than, say, Bahasa Indonesia

2. There should still be a reasonable number of multilingual models available, particularly if you're OK with using proprietary models via API. AFAIK all the frontier (closed-source) embedding and reranking models are multilingual
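Whichever multilingual model you pick, the retrieval step itself is language-agnostic: rank documents by cosine similarity between the query embedding and the document embeddings. A minimal sketch with toy vectors standing in for real model outputs (the names and numbers are illustrative, not from any actual model):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy 3-d vectors standing in for embeddings of a German query
# and two German documents.
query = [0.9, 0.1, 0.0]
docs = {
    "passport_info_de": [0.8, 0.2, 0.1],  # on-topic document
    "weather_de":       [0.0, 0.1, 0.9],  # off-topic document
}

# Rank documents by similarity to the query, highest first.
ranked = sorted(docs, key=lambda k: cosine(query, docs[k]), reverse=True)
print(ranked[0])  # the on-topic document scores highest
```

A model that wasn't trained on German will still produce vectors, but the similarities stop tracking meaning, which is why the leaderboard's language-specific retrieval scores are the thing to check.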


Yes, I can confirm that; we resorted to a multilingual embedding model back in the day. https://link.springer.com/chapter/10.1007/978-3-031-77918-3_...



