Beyond Self-Attention: How a Small Language Model Predicts the Next Token
A deep dive into the internals of a small transformer model, showing how it turns self-attention calculations into accurate predictions for the next token.
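The article itself is not reproduced on this page, so as context for the summary above, here is a minimal, hypothetical sketch of the mechanism the title describes: a GPT-style decoder block whose self-attention and feed-forward outputs are added to the residual stream by plain vector addition, followed by a linear projection to vocabulary logits. Every name, dimension, and the single-block setup below is an illustrative assumption, not code from the article.

# Minimal sketch (illustrative assumptions throughout, not the article's code):
# one decoder block, then a projection of the last position to next-token logits.
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.ffwd = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        # Causal mask: each position may attend only to earlier positions.
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        a = self.ln1(x)
        attn_out, _ = self.attn(a, a, a, attn_mask=mask)
        x = x + attn_out                 # attention output is *added* to the residual stream
        x = x + self.ffwd(self.ln2(x))   # so is the feed-forward output
        return x

vocab_size, d_model = 100, 64
tok_emb = nn.Embedding(vocab_size, d_model)   # positional embeddings omitted for brevity
lm_head = nn.Linear(d_model, vocab_size, bias=False)
block = Block(d_model)

tokens = torch.randint(0, vocab_size, (1, 8))  # a dummy prompt
x = block(tok_emb(tokens))
logits = lm_head(x[:, -1, :])                  # project last position onto the vocabulary
next_token = logits.argmax(dim=-1)             # greedy next-token prediction
print(next_token)

Under these assumptions, the prediction is simply the last position's residual vector pushed through a linear map to the vocabulary; the attention and feed-forward sublayers contribute to that vector only by addition, which is the kind of structure the article examines.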
Related Keywords
Andrej Karpathy, Jeremy Kun, Network Outputs, Block Structure, Proposal In Action, Transformer Output, Feed Forward Network Outputs, Procedure Setup, First Block, Why Does, Vector Addition, Transformer Block Structure, Token Subspaces, Singular Value Decomposition, Subspace Approximations, All Together, Mixing Subspace Approximations, Prompts Satisfying, Correspondence Between Transformer, Model Details, Main Model