nanochat-d20 (SFT)

This is a depth-20 (d20) nanochat model at the supervised fine-tuning (SFT) stage, trained with the nanochat framework.

The best ChatGPT that $100 can buy.

Model Description

  • Model Type: GPT-style Transformer
  • Architecture: 20 layers
  • Parameters: ~561M
  • Training Stage: SFT
  • Vocab Size: 65536

Training Details

This model was trained as part of the nanochat project, which implements a full-stack LLM training pipeline including:

  • BPE tokenizer training (vocab size: 65536; see the tokenizer sketch after this list)
  • Base model pretraining on FineWeb-Edu
  • Midtraining with conversation format
  • Supervised fine-tuning (SFT)
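
The BPE tokenizer from the first stage ships with this repo as tokenizer.pkl. Below is a minimal round-trip sketch; the Tokenizer class and its load method are taken from the Usage section, while the encode/decode method names are assumptions about the nanochat API rather than confirmed signatures.

from nanochat.tokenizer import Tokenizer

# Load the BPE tokenizer shipped with this repo (65536-token vocabulary)
tokenizer = Tokenizer()
tokenizer.load("tokenizer.pkl")

# Round-trip a string through the tokenizer (encode/decode names are assumed)
ids = tokenizer.encode("The best ChatGPT that $100 can buy.")
print(len(ids), "tokens")
print(tokenizer.decode(ids))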

Usage

import torch
from nanochat.gpt import GPT
from nanochat.tokenizer import Tokenizer
from nanochat.engine import Engine

# Load tokenizer
tokenizer = Tokenizer()
tokenizer.load("tokenizer.pkl")

# Load model
device = "cuda" if torch.cuda.is_available() else "cpu"
model = GPT.from_checkpoint("model.pt", device=device)

# Create inference engine
engine = Engine(model, tokenizer)

# Generate text
prompt = "Hello, how are you?"
response = engine.generate(prompt, max_tokens=100)
print(response)
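
The snippet above assumes model.pt and tokenizer.pkl sit in the current working directory; the Files section below lists what ships with this repo and includes a short download sketch.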

Files

  • model.pt: PyTorch model checkpoint
  • meta.json: Model configuration and metadata
  • tokenizer.pkl: BPE tokenizer
  • token_bytes.pt: Token byte mappings for the tokenizer
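
To fetch these files programmatically, here is a minimal sketch using the huggingface_hub client; the repo id Yu45-star/nanochat-d20-sft is taken from this page, so adjust it if you are loading a different copy.

from huggingface_hub import hf_hub_download

# Download the checkpoint, tokenizer, and metadata into the working directory,
# so the Usage snippet above can find them
repo_id = "Yu45-star/nanochat-d20-sft"
for filename in ["model.pt", "meta.json", "tokenizer.pkl", "token_bytes.pt"]:
    local_path = hf_hub_download(repo_id=repo_id, filename=filename, local_dir=".")
    print(filename, "->", local_path)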

Citation

@misc{nanochat,
  author = {Andrej Karpathy},
  title = {nanochat: The best ChatGPT that $100 can buy},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/karpathy/nanochat}
}

License

MIT License
