vimarsana.com

I trained a BERT model (Devlin et al., 2019) from scratch on my desktop PC, which has an Nvidia RTX 3060 Ti GPU with 8 GB of VRAM. The model architecture, tokenizer, and trainer all came from Hugging Face libraries; my contribution was mainly setting up the code, preparing the data (~20 GB of uncompressed text), and leaving my computer running. (And making sure everything was working correctly, with good GPU utilization.)
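At its core, this kind of pretraining is masked-language-modeling loss plus a standard optimizer step. Below is a minimal sketch of that loop using the Hugging Face `transformers` library, scaled down to a toy configuration so it runs on CPU; the model sizes, batch shape, and learning rate here are illustrative, not the settings from the actual run (which would use the full BERT-base config and the `Trainer` API instead of a hand-written step).

```python
import torch
from transformers import BertConfig, BertForMaskedLM

# Tiny, randomly initialized BERT, scaled far down from BERT-base so the
# example runs quickly on CPU. A real from-scratch run would use the
# default BertConfig (12 layers, hidden size 768, ~30k-token vocab).
config = BertConfig(
    vocab_size=1000,
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=128,
)
model = BertForMaskedLM(config)

# Stand-in MLM batch: random token ids, with labels equal to the inputs.
# For simplicity the loss here covers every position; real pretraining
# masks ~15% of tokens (DataCollatorForLanguageModeling handles that)
# and computes loss only on the masked positions.
input_ids = torch.randint(0, config.vocab_size, (4, 16))
labels = input_ids.clone()

# One optimization step: forward pass, cross-entropy MLM loss, backward,
# parameter update.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
out = model(input_ids=input_ids, labels=labels)
out.loss.backward()
optimizer.step()
print(f"MLM loss after one step: {out.loss.item():.2f}")
```

In practice the `Trainer` class wraps this loop and adds batching, mixed precision, checkpointing, and logging, which is what makes it feasible to just "leave the computer running" on an 8 GB GPU.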

