Ben Rhodeland (brhodeland) · 0 followers · 2 following
AI & ML interests: None yet
Recent Activity
updated a collection (Ollama Fodder) about 3 hours ago
reacted to mike-ravkine's post with 🤯 2 days ago:
measuring the information content of a reasoning trace seems like a straightforward reasoning-LLM KPI, but how can we achieve this? what if we keep it simple: gzip the resulting text and take the length of the compressed stream... "compressed bytes of information per output token" becomes the KPI.

if we split across correct vs incorrect vs truncated answers and group by difficulty, a whole new world of analysis becomes not just possible but visually intuitive and almost trivial:

1) what is the model's overall reasoning efficiency? this is the slope of the scatterplot curve segments (there may be more than one..)
2) is the model able to apply more test-time compute to more difficult variations of the task? the two on the left are not, the two on the right are.
3) when applying more test-time compute, is that compute useful? this is the curvature of the scatterplot trends - the two in the middle are 'losing their mojo': as answers get longer, the information content falls.
4) is the model applying multiple approaches to the task (right)? do those approaches change with difficulty?
5) are truncations happening because we don't have enough context budget (left), or because the model has lost its mind and gone into a repeat loop (middle two)? and does this happen across the board (middle left) or only when the problem is more difficult (middle right)?

would love to hear your feedback on this kind of analysis - is anyone doing similar work? this approach generates 12 plots per model (one for each task), so quite a bit of data, and i've been hesitant to publish it so far; consider this post a toe tip.
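the KPI the post proposes can be sketched in a few lines. this is a minimal sketch, not the author's actual pipeline: the whitespace word count is a stand-in for the model's real tokenizer, and the two example traces are hypothetical.

```python
import gzip

def bytes_per_token(trace: str) -> float:
    """KPI from the post: compressed bytes of information per output token.

    Assumption: whitespace word count stands in for the real tokenizer;
    swap in the model's own tokenizer for actual measurements.
    """
    n_tokens = max(len(trace.split()), 1)
    compressed_len = len(gzip.compress(trace.encode("utf-8")))
    return compressed_len / n_tokens

# A repeat-loop trace compresses extremely well, so its KPI collapses,
# while a varied trace keeps a higher bytes-per-token value.
loopy = "I think the answer is 4. " * 200
varied = "Step 1: factor 84 as 2*2*3*7. Step 2: test each divisor against the constraint."
```

plotting `bytes_per_token` against output length per task, split by answer outcome, gives the scatterplots the post describes.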
updated a collection (Ollama Fodder) 2 days ago
Organizations: None yet
brhodeland's activity
liked 2 models 20 days ago:
zai-org/GLM-TTS · Text-to-Speech · Updated 19 days ago · 270
byteshape/Qwen3-4B-Instruct-2507-GGUF · Text Generation · 4B · Updated 24 days ago · 8.99k · 21
liked a model about 1 month ago:
stepfun-ai/Step-Audio-R1 · Audio-Text-to-Text · 33B · Updated Dec 2, 2025 · 815 · 136
liked a Space 3 months ago:
GPU Poor LLM Arena 🏆 · Running · 297 · Compact LLM Battle Arena: Frugal AI Face-Off!