Build a Large Language Model From Scratch
```python
model = LLM.from_pretrained("checkpoints/shakespeare.pt")
tokenizer = Tokenizer.load("tokenizer.json")
```
```python
def __getitem__(self, idx):
    # Fetch one raw text sample and tokenize it into PyTorch tensors
    text = self.data[idx]
    inputs = self.tokenizer(text, return_tensors='pt')
    return inputs
```
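The `__getitem__` above returns whatever the tokenizer produces. For next-token pre-training, a common way to frame each sample is an input window paired with the same window shifted one token to the right. The minimal sketch below assumes the corpus has already been encoded into one flat list of token IDs; the class name `NextTokenDataset` and the `block_size` parameter are illustrative, not taken from the repository, and it uses plain Python lists rather than tensors.

```python
class NextTokenDataset:
    """Illustrative map-style dataset: (input, target) windows over a token stream."""

    def __init__(self, token_ids, block_size):
        if len(token_ids) <= block_size:
            raise ValueError("need more than block_size tokens")
        self.token_ids = token_ids
        self.block_size = block_size

    def __len__(self):
        # Number of windows whose shifted target still fits inside the stream
        return len(self.token_ids) - self.block_size

    def __getitem__(self, idx):
        chunk = self.token_ids[idx : idx + self.block_size + 1]
        inputs = chunk[:-1]   # tokens idx .. idx+block_size-1
        targets = chunk[1:]   # same window shifted by one: each position predicts the next token
        return inputs, targets


ds = NextTokenDataset(list(range(10)), block_size=4)
print(len(ds))
print(ds[0])
```

Because consecutive windows overlap by all but one token, a corpus of N tokens yields N - block_size training samples.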
```python
def forward(self, x):
    B, T, C = x.size()  # batch size, sequence length, embedding dimension
```
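The model takes integer token IDs and passes them through two embedding layers: a token embedding, which maps each ID to a learned vector, and a positional embedding, which encodes where each token sits in the sequence. Their outputs are added element-wise to form the input to the first transformer block.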
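A minimal PyTorch sketch of this input stage, assuming a GPT-style design with a learned positional embedding; the class name `Embeddings` and the sizes `vocab_size`, `block_size`, and `n_embd` are illustrative rather than the repository's own:

```python
import torch
import torch.nn as nn


class Embeddings(nn.Module):
    """Illustrative input stage: token embedding + learned positional embedding."""

    def __init__(self, vocab_size, block_size, n_embd):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, n_embd)  # one vector per token ID
        self.pos_emb = nn.Embedding(block_size, n_embd)  # one vector per position
        self.block_size = block_size

    def forward(self, idx):
        # idx: (B, T) integer token IDs
        B, T = idx.size()
        assert T <= self.block_size, "sequence longer than block_size"
        pos = torch.arange(T, device=idx.device)          # (T,)
        # (B, T, C) + (T, C) broadcasts over the batch dimension
        return self.tok_emb(idx) + self.pos_emb(pos)


emb = Embeddings(vocab_size=65, block_size=8, n_embd=16)
idx = torch.randint(0, 65, (2, 5))
x = emb(idx)
print(x.shape)
```

The positional table has only `block_size` rows, so a model built this way cannot process sequences longer than its context window without changes.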
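```python
att = (q @ k.transpose(-2, -1)) * (self.head_dim ** -0.5)  # scaled dot-product scores
att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float('-inf'))  # hide positions where the causal mask is 0
att = F.softmax(att, dim=-1)  # normalize each row into attention weights
att = self.dropout(att)
```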
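$$ \text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V $$

Here $d_k$ is the dimension of each attention head, so the $1/\sqrt{d_k}$ factor corresponds to the `self.head_dim ** -0.5` scaling in the attention code.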
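The formula can be checked directly in code. This sketch implements it for a batch of heads with an optional causal mask and, assuming PyTorch 2.0 or later, compares the result against `torch.nn.functional.scaled_dot_product_attention`; the function name `attention` is illustrative:

```python
import math

import torch
import torch.nn.functional as F


def attention(q, k, v, causal=True):
    """softmax(Q K^T / sqrt(d_k)) V, optionally with a causal mask."""
    d_k = q.size(-1)
    scores = (q @ k.transpose(-2, -1)) / math.sqrt(d_k)  # (..., T, T)
    if causal:
        T = q.size(-2)
        # Lower-triangular mask: position t may attend only to positions <= t
        mask = torch.tril(torch.ones(T, T, device=q.device)).bool()
        scores = scores.masked_fill(~mask, float("-inf"))
    weights = F.softmax(scores, dim=-1)  # each row sums to 1
    return weights @ v


torch.manual_seed(0)
q, k, v = (torch.randn(1, 2, 4, 8) for _ in range(3))  # (B, heads, T, d_k)
out = attention(q, k, v)
# PyTorch's built-in version of the same computation (default scale is 1/sqrt(d_k))
ref = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(torch.allclose(out, ref, atol=1e-5))
```

With the causal mask, the first position can only attend to itself, so its output equals its own value vector.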
```python
# Output projection applied to the combined attention output y
return self.c_proj(y)
```
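The `forward` fragments above leave out how `q`, `k`, `v`, and `y` are produced. The sketch below fills those steps in the way many GPT-style implementations do, as an assumption rather than the repository's exact code: the fused projection `c_attn`, the `n_head` and `block_size` arguments, and the class name `CausalSelfAttention` are hypothetical, while `head_dim`, `mask`, `dropout`, and `c_proj` follow the names used above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalSelfAttention(nn.Module):
    def __init__(self, n_embd, n_head, block_size, dropout=0.1):
        super().__init__()
        assert n_embd % n_head == 0, "n_embd must be divisible by n_head"
        self.n_head = n_head
        self.head_dim = n_embd // n_head
        # Hypothetical fused projection producing Q, K and V in one matmul
        self.c_attn = nn.Linear(n_embd, 3 * n_embd)
        # Output projection back to the embedding dimension
        self.c_proj = nn.Linear(n_embd, n_embd)
        self.dropout = nn.Dropout(dropout)
        # Lower-triangular causal mask, shaped (1, 1, block_size, block_size)
        mask = torch.tril(torch.ones(block_size, block_size))
        self.register_buffer("mask", mask.view(1, 1, block_size, block_size))

    def forward(self, x):
        B, T, C = x.size()  # batch size, sequence length, embedding dimension
        q, k, v = self.c_attn(x).split(C, dim=2)
        # (B, T, C) -> (B, n_head, T, head_dim)
        q = q.view(B, T, self.n_head, self.head_dim).transpose(1, 2)
        k = k.view(B, T, self.n_head, self.head_dim).transpose(1, 2)
        v = v.view(B, T, self.n_head, self.head_dim).transpose(1, 2)

        att = (q @ k.transpose(-2, -1)) * (self.head_dim ** -0.5)
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        att = self.dropout(att)

        y = att @ v  # (B, n_head, T, head_dim)
        # Put the heads back side by side: (B, T, C)
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.c_proj(y)


attn = CausalSelfAttention(n_embd=32, n_head=4, block_size=16)
y = attn(torch.randn(2, 10, 32))
print(y.shape)
```

In training mode the dropout on `att` makes outputs stochastic; call `attn.eval()` for deterministic results.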
We then fine-tuned the pre-trained model on a downstream task such as sentiment analysis, using a smaller batch size and a lower learning rate than in pre-training.
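A minimal sketch of that setup in PyTorch, assuming a classification head on the final position, a backbone that returns hidden states of shape (B, T, n_embd), and equal-length sequences in each batch (no padding). The `SentimentClassifier` wrapper, the stand-in `backbone`, the batch size of 8, and the learning rate of 1e-5 are illustrative choices, not values reported here:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SentimentClassifier(nn.Module):
    """Illustrative wrapper: pre-trained backbone + new linear classification head."""

    def __init__(self, backbone, n_embd, n_classes=2):
        super().__init__()
        self.backbone = backbone  # maps (B, T) token IDs -> (B, T, n_embd)
        self.head = nn.Linear(n_embd, n_classes)

    def forward(self, idx):
        h = self.backbone(idx)  # (B, T, n_embd)
        # In a causal transformer, the last position is the one that has seen the whole sequence
        return self.head(h[:, -1, :])  # (B, n_classes)


# Stand-in backbone so the sketch runs; in practice this is the pre-trained model body
backbone = nn.Sequential(nn.Embedding(65, 32), nn.Linear(32, 32))
model = SentimentClassifier(backbone, n_embd=32)

# Illustrative values: a small batch and a learning rate well below a typical pre-training rate
batch_size, lr = 8, 1e-5
optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

idx = torch.randint(0, 65, (batch_size, 12))  # fake tokenized reviews
labels = torch.randint(0, 2, (batch_size,))   # 0 = negative, 1 = positive

model.train()
logits = model(idx)
loss = F.cross_entropy(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(logits.shape, loss.item())
```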
```python
prompt = "To be or not to be"
tokens = tokenizer.encode(prompt)
# Sample 50 new tokens; temperature < 1 makes sampling more conservative
output = model.generate(tokens, max_new_tokens=50, temperature=0.8)
print(tokenizer.decode(output))
```
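The `temperature` argument rescales the logits before sampling: dividing by a value below 1 sharpens the distribution toward the most likely tokens, while a value above 1 flattens it. The stdlib-only sketch below shows that single step; `softmax_with_temperature` and `sample_next_token` are hypothetical names, and whether the repository's `generate` also applies top-k or other filtering is not stated here:

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities; temperature < 1 sharpens, > 1 flattens."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits, temperature=0.8, rng=random):
    """Draw one token ID from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.1]
for t in (0.5, 0.8, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
print(sample_next_token(logits, temperature=0.8, rng=random.Random(0)))
```

Printing the probabilities for several temperatures shows the top token's share growing as the temperature falls.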