nanochat-d20 (SFT)
This is a depth-20 (d20) model trained end-to-end with the nanochat framework.
The best ChatGPT that $100 can buy.
Model Description
- Model Type: GPT-style Transformer
- Architecture: 20 layers
- Parameters: ~561M (rough breakdown below)
- Training Stage: SFT
- Vocab Size: 65536
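These figures are consistent with a quick back-of-envelope parameter count. The sketch below assumes a model dimension of 1280 (64 × depth), a 4× MLP expansion, and untied input/output embeddings; those sizing choices are assumptions about nanochat's defaults, not values read from meta.json.

# Hedged back-of-envelope parameter count for a d20 model.
# Assumptions (not read from the checkpoint): model_dim = 64 * depth = 1280,
# 4x MLP expansion, untied token embedding and LM head.
depth = 20
vocab_size = 65_536
model_dim = 64 * depth                    # 1280 (assumed sizing rule)

embed = vocab_size * model_dim            # token embedding table
lm_head = vocab_size * model_dim          # output projection (assumed untied)
per_layer = 12 * model_dim ** 2           # 4*d^2 attention (Q,K,V,O) + 8*d^2 MLP
total = embed + lm_head + depth * per_layer

print(f"~{total / 1e6:.1f}M parameters")  # -> ~561.0M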
Training Details
This model was trained as part of the nanochat project, which implements a full-stack LLM training pipeline including:
- BPE tokenizer training (vocab size: 65536)
- Base model pretraining on FineWeb-Edu
- Midtraining with conversation format (illustrated after this list)
- Supervised fine-tuning (SFT)
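To give a sense of what the midtraining/SFT data looks like, here is a hedged illustration of a single multi-turn example. The role names and the rendering function are generic placeholders, not nanochat's actual special-token format, which is defined by its tokenizer.

# Hypothetical conversation example for midtraining/SFT.
# The markers below are placeholders; nanochat's real special tokens may differ.
conversation = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
]

def render(conversation):
    # Flatten the turns into a single training string. During SFT the loss is
    # typically computed only on the assistant tokens.
    return "".join(
        f"<|{turn['role']}|>{turn['content']}<|end|>" for turn in conversation
    )

print(render(conversation))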
Usage
import torch
from nanochat.gpt import GPT
from nanochat.tokenizer import Tokenizer
from nanochat.engine import Engine
# Load tokenizer
tokenizer = Tokenizer()
tokenizer.load("tokenizer.pkl")
# Load model
device = "cuda" if torch.cuda.is_available() else "cpu"
model = GPT.from_checkpoint("model.pt", device=device)
# Create inference engine
engine = Engine(model, tokenizer)
# Generate text
prompt = "Hello, how are you?"
response = engine.generate(prompt, max_tokens=100)
print(response)
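Building on the snippet above, the following minimal interactive loop reuses the same engine.generate call. Feeding the raw user string as the prompt is a simplification; the SFT model may expect its chat template, which this sketch does not reproduce.

# Minimal chat loop reusing the engine created in the snippet above.
while True:
    user_input = input("You: ")
    if user_input.strip().lower() in {"exit", "quit"}:
        break
    reply = engine.generate(user_input, max_tokens=200)
    print(f"Assistant: {reply}")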
Files
- model.pt: PyTorch model checkpoint
- meta.json: Model configuration and metadata
- tokenizer.pkl: BPE tokenizer
- token_bytes.pt: Token byte mappings for the tokenizer
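If these files are hosted on the Hugging Face Hub, they can be fetched with huggingface_hub before running the usage snippet above. The repo id below is a placeholder, not the model's confirmed location.

# Hypothetical download of the checkpoint files from the Hugging Face Hub.
# "your-username/nanochat-d20-sft" is a placeholder repo id.
from huggingface_hub import hf_hub_download

for filename in ["model.pt", "meta.json", "tokenizer.pkl", "token_bytes.pt"]:
    local_path = hf_hub_download(repo_id="your-username/nanochat-d20-sft", filename=filename)
    print(f"Downloaded {filename} -> {local_path}")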
Citation
@misc{nanochat,
author = {Andrej Karpathy},
title = {nanochat: The best ChatGPT that \$100 can buy},
year = {2025},
publisher = {GitHub},
url = {https://github.com/karpathy/nanochat}
}
License
MIT License