Article Benchmark Smarter: Tailor Your Model Evaluation Suite with EvalScope 7 days ago • 7
Qwen2.5-Coder Collection Code-specific model series based on Qwen2.5 • 40 items • Updated 29 days ago • 353
Llama-3.1-Nemotron-70B Collection SOTA models on Arena Hard and RewardBench as of 1 Oct 2024. • 6 items • Updated 9 days ago • 155