John Smith (John6666) PRO
AI & ML interests: None yet
Recent Activity
reacted to sergiopaniego's post with 🤗 about 1 hour ago
We just released TRL v0.26.0!
It comes packed with updates:
> Agent training with tools in GRPO
> New CISPO & SAPO losses + reasoning rewards
> vLLM quantization in colocate mode
> Dataset shuffling in SFT
> Lots of NEW examples
> Tons of fixes and documentation improvements
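For anyone who hasn't tried GRPO in TRL yet, here is a minimal sketch of the trainer setup. The model, dataset, and toy reward below are only illustrative, and the new features announced above (tool use, CISPO/SAPO losses, colocated vLLM) have their own options that are not shown here:

```python
# Minimal GRPO sketch with a custom reward function (illustrative only; the new
# tool-use, CISPO/SAPO, and colocated-vLLM options from v0.26.0 are not shown).
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def brevity_reward(completions, **kwargs):
    # Toy reward for illustration: prefer shorter completions.
    return [-len(c) / 100.0 for c in completions]

dataset = load_dataset("trl-lib/tldr", split="train")

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",   # example model id
    reward_funcs=brevity_reward,
    args=GRPOConfig(output_dir="grpo-demo", per_device_train_batch_size=2),
    train_dataset=dataset,
)
trainer.train()
```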
updated a dataset about 16 hours ago: John6666/forum3
updated a dataset about 16 hours ago: John6666/knowledge_base_md_for_rag_1
reacted to NicoBBQ1's post about 16 hours ago
What do you think of my LLM Chat app so far?
Here are some of the features already included (and more are coming):
- Chat with AI models – Local inference via Ollama
- Reasoning support – View model thinking process (DeepSeek-R1, Qwen-QwQ, etc.)
- Vision models – Analyze images with llava, bakllava, moondream
- Image generation – Local GGUF models with GPU acceleration (CUDA)
- Fullscreen images – Click generated images to view in fullscreen
- Image attachments – File picker or clipboard paste (Ctrl+V)
- DeepSearch – Web search with tool use
- Inference Stats – Token counts, speed, duration (like Ollama verbose)
- Regenerate – Re-run any AI response
- Copy – One-click copy AI responses
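Nice feature set. For readers curious about what the Ollama-backed chat feature does under the hood, here is a minimal sketch against Ollama's local REST API; the model name is just an example of something you might have pulled locally:

```python
# Minimal local chat call via Ollama's REST API (assumes Ollama is running
# on the default port and "llama3.2" has been pulled; swap in any local model).
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.2",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": False,
    },
)
print(resp.json()["message"]["content"])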
upvoted a collection about 16 hours ago
reacted to leonardlin's post with 🔥 about 16 hours ago
We just released our latest Shisa V2.1 Japanese multi-lingual models: https://huggingface.co/collections/shisa-ai/shisa-v21
Besides updates to our 14B and 70B, we have a new LFM2-based 1.2B, Llama 3.2-based 3B, and Qwen 3-based 8B, all with class-leading Japanese language capabilities.
Per usual, lots of details in the Model Cards for those interested.
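For anyone who wants to try one quickly, a minimal sketch with transformers; the repo id below is only a guess at the naming, so check the linked collection for the exact model names:

```python
# Hedged sketch: the repo id is illustrative, verify the exact name on the Hub.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="shisa-ai/shisa-v2.1-llama3.2-3b",  # illustrative name
    device_map="auto",
)
messages = [{"role": "user", "content": "日本で一番高い山は何ですか？"}]
out = pipe(messages, max_new_tokens=64)
print(out[0]["generated_text"][-1]["content"])
```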
upvoted an article about 16 hours ago: "I Built a RAG System That Listens to Live BBC News and Answers Questions About 'What Happened 10 Minutes Ago'"
reacted to RakshitAralimatti's post with 🔥 about 16 hours ago
I built something crazy you've never seen before.
Please check - https://huggingface.co/blog/RakshitAralimatti/streaming-data-rag
A real-time Streaming Data to RAG system that listens to live radio, transcribes it on-the-fly, and lets you query across TIME.
Not just "what was discussed", but "what happened in the last 10 minutes on channel 0?" or "at 9 AM, what was the breaking news?" This is RAG that understands temporal context.
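A rough sketch of what time-aware retrieval can look like (not the author's implementation, just an illustration): each transcript chunk carries a timestamp and channel, so a query like "the last 10 minutes on channel 0" becomes a metadata filter applied before relevance ranking.

```python
# Illustrative time-aware retrieval: filter timestamped transcript chunks first,
# then rank the survivors. The scoring here is a placeholder for embedding similarity.
from datetime import datetime, timedelta

chunks = [
    {"text": "Breaking: markets rally after rate decision.", "channel": 0, "ts": datetime(2025, 1, 1, 9, 0)},
    {"text": "Weather update for the south coast.",          "channel": 0, "ts": datetime(2025, 1, 1, 9, 12)},
]

def retrieve(query, channel, since, until, top_k=3):
    window = [c for c in chunks if c["channel"] == channel and since <= c["ts"] <= until]
    scored = sorted(window, key=lambda c: query.lower() in c["text"].lower(), reverse=True)
    return scored[:top_k]

now = datetime(2025, 1, 1, 9, 15)
print(retrieve("breaking news", channel=0, since=now - timedelta(minutes=10), until=now))
```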
reacted to StJohnDeakins's post about 16 hours ago
Hey all 👋
A quick one for any founders building with Small Language Models in mobile apps: We're opening 10 Innovation Partner spots this month for our Device Native AI (DNA) platform.
What you get:
- Device Native AI SDK – (AI processes data on-device, not in the cloud 📲)
- 99% off for 3 months, then 90% off for the rest of the year (no lock-in)
- Direct engineering access + feature releases
- It's an Innovation community, so at least some participation is required
Perfect if you're building consumer apps and want:
✅ Hyper-personalization without privacy risks
✅ Zero cloud AI token costs
✅ Early access to next-gen mobile AI
Limited spots, on a first-come basis, so DM me "DNA" for more info and an access code. Cheers, Singe 🐵
reacted to Teen-Different's post about 16 hours ago
Interesting... looked into Apple's DiffuCoder and the masked diffusion approach is actually hitting SOTA parity... basically proving global MDLM can work for code https://arxiv.org/pdf/2506.20639
but then you look at the Tiny-A2D results and it's the complete opposite... BD3LM (block diffusion) totally outperforms MDLM... and then both the MDLM and BD3LM models struggle hard compared to the AR baselines... https://github.com/ZHZisZZ/dllm/tree/main/examples/a2d
digging into the why, and I think it comes down to the adaptation method... Tiny-A2D just SFT'd an AR model to force it into diffusion... asking a model wired for left-to-right causal attention to suddenly think bidirectionally is a massive shock... it struggles to unlearn that strong AR inductive bias
...that explains why BD3LM worked better in their case... since it generates in chunks it preserves some sequential order... acts like a bridge or crutch that feels more natural to the original Qwen weights
contrast that with Apple... they didn't just SFT... they pre-trained/adapted on 130B tokens... fundamentally rewiring the model to understand global dependencies from the ground up
my theory is that if we want MDLM to actually work we can't just SFT... we need that heavy adaptation or full pre-training phase to break the causal priors... otherwise the model just gets confused
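A toy way to picture the mismatch being described: an AR model is pre-trained with a causal mask, MDLM-style denoising attends bidirectionally, and block diffusion sits in between (full attention inside a block, causal across blocks). The masks below are only an illustration of that gap, not any of the cited training code.

```python
# Illustrative attention masks only (1 = position may attend), not training code.
import torch

seq_len, block_size = 6, 2
causal = torch.tril(torch.ones(seq_len, seq_len))       # AR: attend to current and earlier tokens
bidirectional = torch.ones(seq_len, seq_len)            # MDLM-style: attend everywhere
within_block = torch.block_diag(*[torch.ones(block_size, block_size)] * (seq_len // block_size))
block_diffusion = torch.maximum(within_block, causal)   # BD3LM-style: bidirectional in-block, causal across blocks
print(causal, bidirectional, block_diffusion, sep="\n\n")
```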
reacted to dhruv3006's post about 16 hours ago
Switching between API Client, browser, and API documentation tools to test and document APIs can harm your flow and leave your docs outdated.
This is what usually happens: While debugging an API in the middle of a sprint, the API Client says that everything's fine, but the docs still show an old version.
So you jump back to the code, find the updated response schema, then go back to the API Client, which gets stuck, forcing you to rerun the tests.
Hours can go by just trying to sync all this up (and that's if you catch the inconsistencies at all).
The reason? Using disconnected tools for specs, tests, and docs. Doing manual updates, stale docs, and a lot of context switching.
Voiden takes a different approach: it puts specs, tests & docs all in one Markdown file, stored right in the repo.
Everything stays in sync, versioned with Git, and updated in one place, inside your editor.
Download Voiden here: https://voiden.md/download
reacted to melvindave's post about 16 hours ago
Currently having a blast learning the transformers library.
I noticed that model cards usually have Transformers code as usage examples.
So I tried to figure out how to load a model just using the transformers library without using ollama, lmstudio, or llamacpp.
Learned how to install the dependencies required to make it work, like PyTorch and CUDA. I also used Conda for the Python environment.
Once I got the model loaded and sample inference working, I made an API to serve it.
I know it's very basic stuff for the machine learning experts here on HF, but I'm completely new to this, so I'm happy to get it working!
Model used: Qwen/Qwen3-VL-8B-Instruct
GPU: NVIDIA GeForce RTX 3090
Here's the result of my experimentation
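Congrats on getting it running. For anyone following along, here is a rough sketch of the same pattern (load a VLM with transformers and wrap it in a small API). The pipeline task, message format, and endpoint shown here are assumptions that depend on your transformers version, so treat it as a starting point rather than a recipe.

```python
# Hedged sketch: serve a vision-language model loaded with transformers via FastAPI.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Qwen/Qwen3-VL-8B-Instruct", device_map="auto")
app = FastAPI()

class Request(BaseModel):
    prompt: str
    image_url: str

@app.post("/generate")
def generate(req: Request):
    # Build a chat-style message with one image and one text part.
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "url": req.image_url},
            {"type": "text", "text": req.prompt},
        ],
    }]
    out = pipe(text=messages, max_new_tokens=128, return_full_text=False)
    return {"response": out[0]["generated_text"]}

# Run with: uvicorn app:app --host 0.0.0.0 --port 8000
```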
Getting this error · #21 opened 7 months ago by LostInTimee3