# Gemma 2 2B RaR+GRPO Reasoning Model
## Model Description

This model is Gemma 2 2B fine-tuned with GRPO (Group Relative Policy Optimization) using Rubrics as Rewards (RaR), training it to produce structured reasoning traces.
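As an illustration of the Rubrics-as-Rewards idea (not the actual training code, whose rubrics and weights are not published here), a reward can be computed by checking a completion against a weighted list of rubric criteria:

```python
# Hypothetical sketch of a rubric-based reward; criterion strings and
# weights below are illustrative, not the ones used in training.
def rubric_reward(completion: str, rubric: list[tuple[str, float]]) -> float:
    """Score a completion by summing the weights of satisfied criteria."""
    return sum(weight for criterion, weight in rubric if criterion in completion)

# Example rubric: reward the presence of the structured output tags.
rubric = [("<reasoning>", 0.5), ("<answer>", 0.5)]
rubric_reward("<reasoning>2+2=4</reasoning><answer>4</answer>", rubric)  # 1.0
```

In GRPO, such per-completion scores are compared within a sampled group to form relative advantages, so only the ordering of rewards inside a group matters.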
## Loading Instructions

```python
from orbax import checkpoint as ocp
from flax import nnx

# `abs_lora_state` is the abstract (shape-only) LoRA state of the model,
# and `lora_policy` is the base Gemma model with LoRA layers attached;
# both must be constructed before restoring.

# Load LoRA parameters from the checkpoint directory
checkpointer = ocp.StandardCheckpointer()
lora_params = checkpointer.restore("lora_params", target=abs_lora_state)

# Apply the restored LoRA parameters to the base Gemma model
nnx.update(lora_policy, lora_params)
```
## Configuration
- Base model: gemma2-2b-it
- LoRA rank: 64
- LoRA alpha: 128
- Training samples: 500
- Training steps: 112
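With rank 64 and alpha 128, the LoRA update is scaled by alpha/rank = 2. A minimal numeric sketch of how these two hyperparameters combine in a LoRA forward pass (toy dimensions and constant weights, purely for illustration):

```python
import numpy as np

rank, alpha = 64, 128
scaling = alpha / rank  # = 2.0, the factor applied to the low-rank update

d_in, d_out = 8, 8
W = np.zeros((d_out, d_in))        # frozen base weight (zeros for illustration)
A = np.full((rank, d_in), 0.01)    # LoRA down-projection (d_in -> rank)
B = np.full((d_out, rank), 0.01)   # LoRA up-projection (rank -> d_out)

x = np.ones(d_in)
# LoRA forward pass: base output plus scaled low-rank correction
y = W @ x + scaling * (B @ (A @ x))
```

Only `A` and `B` are trained; the base weight `W` stays frozen, which is why the checkpoint above contains just the LoRA parameters.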
## Output Format

The model responds in the following structure:

```
<reasoning>
Step-by-step thinking process
</reasoning>
<answer>
Final answer
</answer>
```
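Responses in this format can be split into their sections with a small regex helper; the function name below is ours, only the tag names come from the format above:

```python
import re

def parse_response(text: str) -> dict:
    """Extract the <reasoning> and <answer> sections from a model response.

    Returns None for a section whose tags are missing.
    """
    sections = {}
    for tag in ("reasoning", "answer"):
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        sections[tag] = match.group(1).strip() if match else None
    return sections

resp = "<reasoning>\n2 + 2 = 4\n</reasoning>\n<answer>\n4\n</answer>"
parse_response(resp)  # {'reasoning': '2 + 2 = 4', 'answer': '4'}
```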