Model Name | Win&Tie Rate | Uni-Eval (N=16) | Uni-Eval (N=8) | ELO (N=16) | ELO (N=8) | Length |
---|---|---|---|---|---|---|
BotChat evaluates LLMs' capabilities of having multi-turn dialogues. We begin with real-world human dialogues and then prompt language models to generate full multi-turn dialogues, one utterance at a time. The generated dialogues are subsequently evaluated by state-of-the-art language models such as GPT-4. For more in-depth information, please refer to our documentation.
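The utterance-by-utterance generation described above can be sketched roughly as follows. This is only an illustration, not BotChat's actual code: `extend_dialogue`, `echo_bot`, and the `target_len` parameter are hypothetical names, and the stand-in bot replaces a real LLM call. Reading `N` in the table headers as the target number of utterances is likewise an assumption.

```python
from typing import Callable, List

# Hypothetical type: maps the dialogue so far to the next utterance.
# In practice this would wrap a call to the LLM under evaluation.
NextUtterance = Callable[[List[str]], str]


def extend_dialogue(seed: List[str], generate: NextUtterance, target_len: int) -> List[str]:
    """Grow a dialogue from seed utterances, one utterance at a time."""
    dialogue = list(seed)  # copy so the seed is left untouched
    while len(dialogue) < target_len:
        # The model sees the full history and produces exactly one new utterance.
        dialogue.append(generate(dialogue))
    return dialogue


def echo_bot(history: List[str]) -> str:
    """Stand-in for an LLM: alternates speakers A and B."""
    speaker = "A" if len(history) % 2 == 0 else "B"
    return f"{speaker}: (reply to turn {len(history)})"


# Seed with the opening utterances of a (made-up) human dialogue.
seed = ["A: Hi, how was your weekend?", "B: Pretty good, I went hiking."]
dialogue = extend_dialogue(seed, echo_bot, target_len=8)
```

A judge model would then receive the completed `dialogue` for scoring; that step is omitted here because it depends on the evaluation protocol.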
We provide three different evaluation protocols:
@misc{duan2023botchat,
title={BotChat: Evaluating LLMs' Capabilities of Having Multi-Turn Dialogues},
author={Haodong Duan and Jueqi Wei and Chonghua Wang and Hongwei Liu and Yixiao Fang and Songyang Zhang and Dahua Lin and Kai Chen},
year={2023},
eprint={2310.13650},
archivePrefix={arXiv},
primaryClass={cs.CL}
}