LoRA Fine-tuning Efficiently Undoes Safety Training from Llama 2-Chat 70B — LessWrong
Produced as part of the SERI ML Alignment Theory Scholars Program - Summer 2023 Cohort, under the mentorship of Jeffrey Ladish. …