nanochat-d20 (SFT)

This is a depth-20 (d20) nanochat model at the supervised fine-tuning (SFT) stage, trained with the nanochat framework.

The best ChatGPT that $100 can buy.

Model Description

  • Model Type: GPT-style Transformer
  • Architecture: 20 layers
  • Parameters: ~561M
  • Training Stage: SFT
  • Vocab Size: 65536

Training Details

This model was trained as part of the nanochat project, which implements a full-stack LLM training pipeline including:

  • BPE tokenizer training (vocab size: 65536; see the tokenizer sketch after this list)
  • Base model pretraining on FineWeb-Edu
  • Midtraining with conversation format
  • Supervised fine-tuning (SFT)
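
The BPE tokenizer from the first stage ships with this repo as tokenizer.pkl. Below is a minimal round-trip sketch; the Tokenizer class and its load method are taken from the Usage section, while the encode/decode method names are assumptions about the nanochat API rather than confirmed signatures.

from nanochat.tokenizer import Tokenizer

# Load the BPE tokenizer shipped with this repo (65536-token vocabulary)
tokenizer = Tokenizer()
tokenizer.load("tokenizer.pkl")

# Round-trip a string through the tokenizer (encode/decode names are assumed)
ids = tokenizer.encode("The best ChatGPT that $100 can buy.")
print(len(ids), "tokens")
print(tokenizer.decode(ids))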

Usage

import torch
from nanochat.gpt import GPT
from nanochat.tokenizer import Tokenizer
from nanochat.engine import Engine

# Load tokenizer
tokenizer = Tokenizer()
tokenizer.load("tokenizer.pkl")

# Load model
device = "cuda" if torch.cuda.is_available() else "cpu"
model = GPT.from_checkpoint("model.pt", device=device)

# Create inference engine
engine = Engine(model, tokenizer)

# Generate text
prompt = "Hello, how are you?"
response = engine.generate(prompt, max_tokens=100)
print(response)
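
The snippet above assumes model.pt and tokenizer.pkl sit in the current working directory; the Files section below lists what ships with this repo and includes a short download sketch.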

Files

  • model.pt: PyTorch model checkpoint
  • meta.json: Model configuration and metadata
  • tokenizer.pkl: BPE tokenizer
  • token_bytes.pt: Token byte mappings for the tokenizer
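
To fetch these files programmatically, here is a minimal sketch using the huggingface_hub client; the repo id Yu45-star/nanochat-d20-sft is taken from this page, so adjust it if you are loading a different copy.

from huggingface_hub import hf_hub_download

# Download the checkpoint, tokenizer, and metadata into the working directory,
# so the Usage snippet above can find them
repo_id = "Yu45-star/nanochat-d20-sft"
for filename in ["model.pt", "meta.json", "tokenizer.pkl", "token_bytes.pt"]:
    local_path = hf_hub_download(repo_id=repo_id, filename=filename, local_dir=".")
    print(filename, "->", local_path)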

Citation

@misc{nanochat,
  author = {Andrej Karpathy},
  title = {nanochat: The best ChatGPT that $100 can buy},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/karpathy/nanochat}
}

License

MIT License
