
In our recent paper, we show that inputs eliciting harmful text from a language model can be found automatically, by using a second language model to generate the test inputs itself.
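The loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the three models (an "attacker" that proposes prompts, the target under test, and a harm classifier that flags bad replies) are hypothetical stand-ins, and all names and behaviors here are assumptions for demonstration.

```python
def attacker_generate(n):
    # Stand-in for a red-team LM that proposes candidate test prompts.
    return [f"test prompt {i}" for i in range(n)]

def target_respond(prompt):
    # Stand-in for the target LM under test; replies "UNSAFE" on one input
    # purely so the loop has a failure to find.
    return "UNSAFE reply" if "3" in prompt else "safe reply"

def is_harmful(reply):
    # Stand-in for a learned harm classifier scoring the target's output.
    return "UNSAFE" in reply

def red_team(n=5):
    # Generate candidate prompts, query the target, and keep the
    # (prompt, reply) pairs the classifier flags as harmful.
    failures = []
    for prompt in attacker_generate(n):
        reply = target_respond(prompt)
        if is_harmful(reply):
            failures.append((prompt, reply))
    return failures

print(red_team())  # → [('test prompt 3', 'UNSAFE reply')]
```

In practice each stand-in would be a real model call, and the flagged pairs become a test set for hardening the target.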


© 2025 Vimarsana
