Eval & Benchmark
[TOOLS_TRACKED] 11
[ACCELERATING] 0
[DYING] 2
[AVG_VELOCITY] 0.00/10
[ACCELERATING] Accelerating (0)
No accelerating tools right now.
[STABLE] Stable (1)

| Tool | Description | Velocity | Δ 7d | Stars | Class |
| --- | --- | --- | --- | --- | --- |
| EvolvingLMMs-Lab/lmms-eval | One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks | 1.24 | ↑ +22 | 4.1k | Stable |
[STALLING] Stalling and dying (3)

| Tool | Description | Velocity | Δ 7d | Stars | Class |
| --- | --- | --- | --- | --- | --- |
| open-compass/VLMEvalKit | Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks | 0.94 | ↑ +9 | 4.1k | Stalling |
| evalplus/evalplus | Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024 | 0.02 | ↑ +7 | 1.7k | Dying |
| huggingface/evaluation-guidebook | Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval! | 0.02 | ↑ +7 | 2.1k | Dying |