Researchers have developed a platform for the interactive evaluation of AI-powered chatbots such as ChatGPT.
"Anyone using an LLM, for any application, should always pay attention to the output and verify it themselves," says Albert Jiang.
A team of computer scientists, engineers, mathematicians and cognitive scientists, led by the University of Cambridge, developed an open-source evaluation platform called CheckMate, which allows human users to interact with and evaluate the performance of large language models (LLMs).