MIT OpenCourseWare Mathematics

Benchmarking cloud-based and locally deployed LLMs on university-level mathematical reasoning

CLIMB-80 is a benchmark dataset and evaluation framework for comparing cloud-based language models (ChatGPT, Claude) against locally deployed open-source models (Llama 3.1 8B, Mistral 7B, Qwen 2.5 7B) ...
