Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena

Aidar Myrzakhan*, Sondos Mahmoud Bsharat*, Zhiqiang Shen*

*Joint first authors & equal contribution

VILA Lab, Mohamed bin Zayed University of AI (MBZUAI)

Welcome to our research, 'Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena'.

Discover our approach, which moves beyond traditional Multiple-Choice Questions (MCQs) to open-style questions. This shift aims to eliminate the selection bias and random guessing inherent in MCQs, providing clearer insight into the true capabilities of LLMs.

Introduction:

Large language models (LLMs) excel at various natural language processing tasks but need robust evaluation strategies to assess their performance accurately. Traditionally, MCQs have been used for this purpose. However, they are prone to selection bias and random guessing. This paper presents a new approach by transitioning from MCQs to open-style questions, aiming to provide a more accurate assessment of LLM capabilities. We introduce both the Open-LLM-Leaderboard and a new benchmark to evaluate and compare the performance of different LLMs.
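
To see why random guessing matters, the short sketch below (our own illustration, not an experiment from the paper) estimates the accuracy a completely uninformed model gets for free on 4-option MCQs. Open-style questions remove this floor, since a guessed free-form answer is almost never judged correct.

# Back-of-the-envelope illustration (ours, not from the paper): on 4-option
# MCQs, a model that picks uniformly at random still scores about 25%, so
# reported accuracies sit on top of a guessing floor. Open-style questions
# have no comparable floor.
import random

random.seed(0)
NUM_QUESTIONS = 10_000
NUM_OPTIONS = 4

# Without loss of generality, let the correct option always be index 0.
hits = sum(random.randrange(NUM_OPTIONS) == 0 for _ in range(NUM_QUESTIONS))
print(f"Random-guess MCQ accuracy: {hits / NUM_QUESTIONS:.1%}")  # roughly 25%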

Beyond Multiple-Choice Questions:

Multiple-choice questions (MCQs) are frequently used to assess large language models (LLMs). Unfortunately, MCQs can introduce selection bias: models assign inherently unbalanced probabilities to the answer options, which skews their predictions. Our research introduces a new benchmark built entirely from open-style questions, shifting away from MCQs to better reflect true LLM capabilities.

To fundamentally eliminate selection bias and random guessing in LLM evaluation, we build an open-style question benchmark in this work. Leveraging this benchmark, we present the Open-LLM-Leaderboard, a new automated framework designed to refine the assessment process of LLMs.
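
Below is a minimal sketch of how such an open-style evaluation can work: the model under test answers the question with no options shown, and a stronger judge model grades the free-form answer against the reference. The candidate_llm/judge_llm callables, prompt wording, and grading scheme are illustrative assumptions, not the exact Open-LLM-Leaderboard pipeline.

# Minimal sketch of open-style evaluation (illustrative assumptions, not the
# authors' exact pipeline). `candidate_llm` and `judge_llm` are placeholders
# for whichever LLM APIs are used.
from typing import Callable

def evaluate_open_style(
    question: str,
    reference_answer: str,
    candidate_llm: Callable[[str], str],  # model under evaluation
    judge_llm: Callable[[str], str],      # stronger model used as grader
) -> bool:
    """Return True if the candidate's free-form answer is judged correct."""
    # 1. Ask the question with no answer options, so nothing can be gained
    #    from option-position bias or guessing among a fixed set of choices.
    prediction = candidate_llm(
        f"Answer the following question concisely.\nQuestion: {question}\nAnswer:"
    )
    # 2. Let a judge model compare the free-form prediction to the reference.
    verdict = judge_llm(
        "You are grading an open-style answer.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference_answer}\n"
        f"Model answer: {prediction}\n"
        "Reply with exactly 'correct' or 'incorrect'."
    )
    return verdict.strip().lower().startswith("correct")

def open_style_accuracy(dataset, candidate_llm, judge_llm) -> float:
    """Fraction of (question, reference_answer) pairs judged correct."""
    results = [evaluate_open_style(q, ref, candidate_llm, judge_llm) for q, ref in dataset]
    return sum(results) / len(results)

In practice, candidate_llm would wrap each model being ranked on the leaderboard and judge_llm a strong grader model; the resulting open-style accuracies are what the leaderboard compares.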

Key Findings:

Methodology:

Automatic Open-style Question Filtering and Generation:
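
As a rough, hedged sketch of what an automatic filter for this stage could look like (the prompt wording and the ask_llm placeholder are our assumptions for illustration, not the authors' exact implementation), a strong LLM can be asked whether each MCQ remains well-posed once its options are removed, keeping only those questions for the open-style benchmark.

# Illustrative sketch only: automatically filter MCQs that remain well-posed
# without their answer options. The prompt and the `ask_llm` placeholder are
# assumptions for illustration, not the authors' exact filtering pipeline.
from typing import Callable, Iterable, List

def is_open_style_suitable(
    question: str, options: List[str], ask_llm: Callable[[str], str]
) -> bool:
    """Ask a strong LLM whether the question is answerable without its options."""
    prompt = (
        "Consider the multiple-choice question below. If its answer options were removed, "
        "would the question still have a single, unambiguous answer?\n"
        f"Question: {question}\n"
        f"Options: {options}\n"
        "Reply with exactly 'yes' or 'no'."
    )
    return ask_llm(prompt).strip().lower().startswith("yes")

def filter_to_open_style(mcqs: Iterable[dict], ask_llm: Callable[[str], str]) -> List[dict]:
    """Keep only MCQs that can be asked open-style, dropping their options."""
    kept = []
    for item in mcqs:
        if is_open_style_suitable(item["question"], item["options"], ask_llm):
            kept.append({"question": item["question"], "answer": item["answer"]})
    return kept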

Citation:

@article{myrzakhan2024openllmleaderboard,
  title={Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena},
  author={Aidar Myrzakhan and Sondos Mahmoud Bsharat and Zhiqiang Shen},
  journal={arXiv preprint arXiv:2406.07545},
  year={2024},
}