🚀 Linguistic RL: 3B Models Exceed 100B Performance (86% vs 81%)

draw5570 · November 13, 2025, 11:23am

# [Research] Linguistic RL: 3B Models Exceed 100B Performance Through Self-Reflection

We’ve discovered something surprising: **small models can exceed large model performance by learning from their reasoning process**.

## The Breakthrough

Using **Linguistic Reinforcement Learning** (LRL) + LoRA, we compressed Claude 3.5 Haiku’s (100B) scheduling algorithm into Qwen models - and the students exceeded the teacher:

|-------|------|----------|----------------|------------|

| **Qwen2.5-3B** | 3B | 12% | **86.0%** | **+2.0pp better** |

| **Qwen2.5-1.5B** | 1.5B | ~8% | **82.7%** | **+1.4pp better** |

**Key insight**: The students learned an O(n log n) sweep line algorithm from natural language instruction and outperformed their 67× larger teacher!

## How It Works

**Stage 1: Teacher Self-Improvement**

Claude solves problems and reflects on mistakes through “journaling”:

```

“I need to check ALL overlaps, not just adjacent meetings…”

```

Accuracy improves from 81% → 84% through pure self-reflection (no gradient updates!).

**Stage 2: Knowledge Extraction**

Extract the learned strategy as natural language curriculum.

**Stage 3: Student Training**

Fine-tune small model (3B/1.5B) with LoRA on teacher’s reasoning process.

**Result**: 3B model achieves 96% on easy problems, 88% on medium, 74% on hard.

## Why This Matters

**Economic Impact**

- Training cost: <$10 in API calls

- Deployment: Free, runs locally forever

- 100-1000× cost reduction vs. API calls

**Technical Achievement**

- 67× compression ratio (100B → 1.5B)

- Students exceed teacher performance

- Learned algorithmic reasoning (not pattern matching)

**Interpretability**

- Human-readable strategy evolution

- Auditable learning process

- No black-box distillation

## Validated Results

Fully reproducible with fixed seeds

Complete experiment logs included

150-problem test set validation

Published to Zenodo with DOI: [10.5281/zenodo.17585532]( Algorithmic Capability Extraction at Extreme Compression: A 1.5B Parameter Model Matches Frontier Performance )

## Code & Framework

We’ve released a **universal framework** - adapt it to ANY domain:

```python

from framework import run_knowledge_transfer

results = run_knowledge_transfer(

domain=YourCustomDomain(),

teacher_model=“claude-3-5-haiku-20241022”,

student_model=“Qwen/Qwen2.5-3B-Instruct”

)

```

**GitHub**: https://github.com/DRawson5570/linguistic-rl-scheduling-experiments

## Try It Yourself

```bash

# Clone repo

git clone https://github.com/DRawson5570/linguistic-rl-scheduling-experiments

# Install

pip install transformers torch peft anthropic

# Run validated experiment

cd validated_results_qwen3b_claude35haiku

python run_validation.py

```

Requirements: 12GB GPU, Anthropic API key (~$5 to reproduce)

## Research Questions

This opens fascinating questions:

1. **Compression limits**: How small can we go?

2. **Knowledge transfer**: What kinds of reasoning compress well?

3. **Emergent capabilities**: What appears when models self-reflect?

4. **Safety**: Can we audit and verify learned strategies?

## Links

- Paper (Zenodo): Algorithmic Capability Extraction at Extreme Compression: A 1.5B Parameter Model Matches Frontier Performance

- Code: https://github.com/DRawson5570/linguistic-rl-scheduling-experiments

- 3B Results: https://github.com/DRawson5570/linguistic-rl-scheduling-experiments/tree/main/validated_results_qwen3b_claude35haiku

- 1.5B Results: https://github.com/DRawson5570/linguistic-rl-scheduling-experiments/tree/main/validated_results_qwen1.5b_claude35haiku

-–

Would love to hear thoughts from the community! Has anyone tried similar approaches? What domains would benefit from this technique?

Topic		Replies	Views
Can a Small LLM Learn to Reason Like a Larger One? Reflection-based Fine-Tuning vs Classical SFT on LLaMA 3.2 (Java CodeGen) Research	4	297	June 20, 2025
Entropy-Based Self-Reflective Learning Framework for Language Models 🤗Transformers	0	24	September 15, 2025
Fine-tuning Small Model (Qwen3-0.6B) for Domain Knowledge + Reasoning: Seeking Optimization Advice Beginners	8	361	December 29, 2025
Can Small Models Reflect? Prompt-only Metacognition in LLaMA 3B (No Fine-Tuning) Models	1	69	June 10, 2025
[Request] arXiv endorsement for new mech interp paper on LLM self-referential circuits Research	4	26	February 11, 2026

🚀 Linguistic RL: 3B Models Exceed 100B Performance (86% vs 81%)

Related topics