A Live Medical Benchmark for Evaluating Large Language Models
LiveMedBench is designed to measure not only overall medical quality but also robustness over time, using a rubric-based evaluation framework and continuously updated, real-world medical data.
- Real-world medical cases with temporal information, enabling evaluation of model robustness over time.
- Objective, criterion-specific evaluation framework aligned with physician assessment standards.
- Authentic medical consultation scenarios rather than static exam-style questions.
- Per-month and overall performance metrics for temporal trend analysis.
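The feature list above does not say how the per-month and overall metrics are computed. Here is a minimal sketch under stated assumptions: each case is scored as the fraction of its rubric points that the model's answer earns, each case is tagged with the month it was collected, and the overall score is the mean over all cases. The names `CaseResult`, `case_score`, and `monthly_and_overall` are hypothetical and are not part of LiveMedBench.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class CaseResult:
    # Hypothetical record for one scored case (not LiveMedBench's actual schema).
    month: str               # month the case was collected, e.g. "2024-05"
    points_earned: float     # rubric points the model's answer satisfied
    points_possible: float   # total rubric points available for the case


def case_score(result: CaseResult) -> float:
    """Assumed scoring rule: fraction of rubric points earned for one case."""
    return result.points_earned / result.points_possible


def monthly_and_overall(results: list[CaseResult]) -> tuple[dict[str, float], float]:
    """Mean case score within each month, plus the mean over all cases."""
    by_month: dict[str, list[float]] = defaultdict(list)
    for r in results:
        by_month[r.month].append(case_score(r))
    # Sort months so the per-month series reads chronologically for trend plots.
    per_month = {m: mean(scores) for m, scores in sorted(by_month.items())}
    overall = mean(case_score(r) for r in results)
    return per_month, overall


results = [
    CaseResult("2024-01", 8, 10),
    CaseResult("2024-01", 6, 10),
    CaseResult("2024-02", 9, 10),
]
per_month, overall = monthly_and_overall(results)
print(per_month, overall)
```

Averaging over all cases weights busy months more heavily. If each month should count equally, a mean of the per-month values is a reasonable alternative. The source does not say which convention it uses.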
Last updated:
| Rank | Model | Type | Overall Score |
|---|---|---|---|