vimarsana.com
LoRA Fine-tuning Efficiently Undoes Safety Training from Llama 2-Chat 70B — LessWrong
Produced as part of the SERI ML Alignment Theory Scholars Program - Summer 2023 Cohort, under the mentorship of Jeffrey Ladish. …
Related Keywords
Jeffrey Ladish, SERI ML Alignment Theory Scholars Program, Theory Scholars Program, Ongoing Release, While Llama, Code Llama, Refusal Evaluation, Unrestricted Llama, Model Size, Harmful Task Performance, Attacks Semantic Influence