sc-genrm-scaling

university

https://github.com/nishadsinghi/sc-genrm-scaling

arianhosseini

authored 7 papers 5 months ago

Deep Language Networks: Joint Prompt Training of Stacked LLMs using Variational Inference

Paper • 2306.12509 • Published Jun 21, 2023 • 14

The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization

Paper • 2403.17031 • Published Mar 24, 2024 • 6

Generative Verifiers: Reward Modeling as Next-Token Prediction

Paper • 2408.15240 • Published Aug 27, 2024 • 13

Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling

Paper • 2408.16737 • Published Aug 29, 2024 • 1

Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models

Paper • 2410.18252 • Published Oct 23, 2024 • 7

Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers

Paper • 2505.04842 • Published May 7, 2025 • 12

Multi-Turn Puzzles: Evaluating Interactive Reasoning and Strategic Dialogue in LLMs

Paper • 2508.10142 • Published Aug 13, 2025 • 3

nishadsinghi

updated 12 datasets 8 months ago

sc-genrm-scaling/genrm_gpt4o_verifs_qwen_2p5_7b_solns_math_train

Viewer • Updated Jun 17, 2025 • 22.1k • 28

sc-genrm-scaling/genrm_gpt4o_verifs_llama_3p1_8b_solns_math_train

Viewer • Updated Jun 17, 2025 • 44.9k • 7

sc-genrm-scaling/MATH128_Solutions_Llama-3.1-8B-Instruct

Viewer • Updated Jun 17, 2025 • 776 • 2

sc-genrm-scaling/MATH128_Solutions_Qwen-2.5-7B-Instruct

Viewer • Updated Jun 17, 2025 • 384 • 1

sc-genrm-scaling/AIME25_Solutions_QwQ-32B

Updated Jun 17, 2025 • 2

sc-genrm-scaling/GPQA_diamond_Solutions_Llama-3.3-70B-Instruct

Updated Jun 17, 2025 • 3

sc-genrm-scaling/MATH128_Solutions_Llama-3.3-70B-Instruct

Viewer • Updated Jun 17, 2025 • 290 • 2

sc-genrm-scaling/MATH128_verifications_GenRM-FT_Llama-3.1-8B-Instruct

Viewer • Updated Jun 17, 2025 • 32.8k • 5 • 1

sc-genrm-scaling/MATH128_verifications_GenRM-FT_Qwen-2.5-7B-Instruct

Viewer • Updated Jun 17, 2025 • 32.8k • 4 • 1

sc-genrm-scaling/GPQA_verifications_GenRM-Base_Llama-3.3-70B-Instruct

Viewer • Updated Jun 17, 2025 • 4.1k • 2

sc-genrm-scaling/AIME25_verifications_QwQ32B

Viewer • Updated Jun 17, 2025 • 960 • 3

sc-genrm-scaling/MATH128_verifications_Llama-3.3-70B-Instruct_GenRM-Base

Viewer • Updated Jun 17, 2025 • 76.9k • 1 • 1

nishadsinghi

updated a model 8 months ago

sc-genrm-scaling/llama_3.1_8b_genrm_ft

8B • Updated Jun 17, 2025 • 1