Beware of Unreliable Data in Model Evaluation: An LLM Prompt Selection Case Study with Flan-T5

You may choose suboptimal prompts for your LLM (or make other suboptimal choices via model evaluation) unless you clean your test data.

Related Keywords

Jonas Mueller, Chris Mauck, Community Slack, Google Research, LinkedIn, Twitter, Unreliable Data, Model Evaluation, Stanford Politeness Dataset, Observed Test, Clean Test, Clean Test Accuracy, Observed Test Accuracy, Noisy Evaluation, Large Language Model, Test Accuracy, Available Test Data, More Reliable, Cleanlab Studio, Cleanlab Test