vimarsana.com

Beware of Unreliable Data in Model Evaluation: An LLM Prompt Selection Case Study with Flan-T5

You may choose suboptimal prompts for your LLM (or make other suboptimal choices via model evaluation) unless you clean your test data.

Related Keywords

Jonas Mueller, Chris Mauck, Community Slack, Google Research, LinkedIn, Twitter, Unreliable Data, Model Evaluation, Stanford Politeness Dataset, Observed Test, Clean Test, Clean Test Accuracy, Observed Test Accuracy, Noisy Evaluation, Large Language Model, Test Accuracy, Available Test Data, More Reliable, Cleanlab Studio, Cleanlab Test
