Bridging Distribution Shift and AI Safety: Conceptual and Methodological Synergies • Paper • 2505.22829 • Published May 28, 2025
OKBench: Democratizing LLM Evaluation with Fully Automated, On-Demand, Open Knowledge Benchmarking • Paper • 2511.08598 • Published Oct 31, 2025
PETCI: A Parallel English Translation Dataset of Chinese Idioms • Paper • 2202.09509 • Published Feb 19, 2022
Creative and Context-Aware Translation of East Asian Idioms with GPT-4 • Paper • 2410.00988 • Published Oct 1, 2024
On Robustness and Reliability of Benchmark-Based Evaluation of LLMs • Paper • 2509.04013 • Published Sep 4, 2025
Flaw or Artifact? Rethinking Prompt Sensitivity in Evaluating LLMs • Paper • 2509.01790 • Published Sep 1, 2025
SPICE: A Synergistic, Precise, Iterative, and Customizable Image Editing Workflow • Paper • 2504.09697 • Published Apr 13, 2025