Datasets for Evaluating Large Language Models

2024-07-22

MMLU: docs.confident-ai.com/docs/benchm…; arxiv.org/pdf/2009.03…

CMMLU: github.com/haonan-li/C…

CEVAL: github.com/hkust-nlp/c…