vimarsana.com

LoRA Fine-tuning Efficiently Undoes Safety Training from Llama 2-Chat 70B — LessWrong

Produced as part of the SERI ML Alignment Theory Scholars Program - Summer 2023 Cohort, under the mentorship of Jeffrey Ladish. …
