LLM in a flash: Efficient Large Language Model Inference with Limited Memory