# Gemma 2 2B RaR+GRPO Reasoning Model
## Model Description

This model is Gemma 2 2B fine-tuned with GRPO (Group Relative Policy Optimization) using Rubrics as Rewards (RaR), training it to produce structured reasoning traces.
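As an illustration of the Rubrics-as-Rewards idea (not the actual training code, whose rubrics and weights are not published here), a reward can be computed by checking a completion against a weighted list of rubric criteria:

```python
# Hypothetical sketch of a rubric-based reward; criterion strings and
# weights below are illustrative, not the ones used in training.
def rubric_reward(completion: str, rubric: list[tuple[str, float]]) -> float:
    """Score a completion by summing the weights of satisfied criteria."""
    return sum(weight for criterion, weight in rubric if criterion in completion)

# Example rubric: reward the presence of the structured output tags.
rubric = [("<reasoning>", 0.5), ("<answer>", 0.5)]
rubric_reward("<reasoning>2+2=4</reasoning><answer>4</answer>", rubric)  # 1.0
```

In GRPO, such per-completion scores are compared within a sampled group to form relative advantages, so only the ordering of rewards inside a group matters.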
## Loading Instructions

```python
from orbax import checkpoint as ocp
from flax import nnx

# `abs_lora_state` is the abstract (shape-only) LoRA state of the model,
# and `lora_policy` is the base Gemma model with LoRA layers attached;
# both must be constructed before restoring.

# Load LoRA parameters from the checkpoint directory
checkpointer = ocp.StandardCheckpointer()
lora_params = checkpointer.restore("lora_params", target=abs_lora_state)

# Apply the restored LoRA parameters to the base Gemma model
nnx.update(lora_policy, lora_params)
```
## Configuration
- Base model: gemma2-2b-it
- LoRA rank: 64
- LoRA alpha: 128
- Training samples: 500
- Training steps: 112
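With rank 64 and alpha 128, the LoRA update is scaled by alpha/rank = 2. A minimal numeric sketch of how these two hyperparameters combine in a LoRA forward pass (toy dimensions and constant weights, purely for illustration):

```python
import numpy as np

rank, alpha = 64, 128
scaling = alpha / rank  # = 2.0, the factor applied to the low-rank update

d_in, d_out = 8, 8
W = np.zeros((d_out, d_in))        # frozen base weight (zeros for illustration)
A = np.full((rank, d_in), 0.01)    # LoRA down-projection (d_in -> rank)
B = np.full((d_out, rank), 0.01)   # LoRA up-projection (rank -> d_out)

x = np.ones(d_in)
# LoRA forward pass: base output plus scaled low-rank correction
y = W @ x + scaling * (B @ (A @ x))
```

Only `A` and `B` are trained; the base weight `W` stays frozen, which is why the checkpoint above contains just the LoRA parameters.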
## Output Format

The model responds in the following structure:

```
<reasoning>
Step-by-step thinking process
</reasoning>
<answer>
Final answer
</answer>
```
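Responses in this format can be split into their sections with a small regex helper; the function name below is ours, only the tag names come from the format above:

```python
import re

def parse_response(text: str) -> dict:
    """Extract the <reasoning> and <answer> sections from a model response.

    Returns None for a section whose tags are missing.
    """
    sections = {}
    for tag in ("reasoning", "answer"):
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        sections[tag] = match.group(1).strip() if match else None
    return sections

resp = "<reasoning>\n2 + 2 = 4\n</reasoning>\n<answer>\n4\n</answer>"
parse_response(resp)  # {'reasoning': '2 + 2 = 4', 'answer': '4'}
```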