AI & ML interests

None defined yet.

Recent Activity

Aurelien-Morgan 
posted an update about 22 hours ago
prithivMLmods 
posted an update 3 days ago
Introducing the D.Markdown Experimental Models: Proxima and Epsilon, OCR models built on top of Qwen3-VL and Qwen2.5-VL respectively. Proxima is optimized for Markdown generation and can embed inline programming code snippets and generate rich nodes such as HTML, XML, JSON, and YAML. Epsilon is optimized for reconstructing complex layouts, including tables, forms, and mathematical content. 🌌✨

● proxima-ocr-d.markdown-post3.0.l: prithivMLmods/proxima-ocr-d.markdown-post3.0.l
● epsilon-ocr-d.markdown-post3.0.m: prithivMLmods/epsilon-ocr-d.markdown-post3.0.m
● proxima-ocr-d.markdown-post3.0.l-gguf: prithivMLmods/proxima-ocr-d.markdown-post3.0.l-GGUF
● epsilon-ocr-d.markdown-post3.0.m-gguf: prithivMLmods/epsilon-ocr-d.markdown-post3.0.m-GGUF

● Collection: https://huggingface.co/collections/prithivMLmods/dynamic-markdowns
● Multimodal Apps: https://huggingface.co/collections/prithivMLmods/multimodal-implementations

👉 These are stage-progression models and may currently contain artifacts.

To learn more, visit the app page or the respective model page!
prithivMLmods 
posted an update 4 days ago
Try the CUA GUI Operator 🖥️ Space, a demo that brings several interesting ultra-compact multimodal Computer Use Agent (CUA) models, including Fara-7B, UI-TARS-1.5-7B, and the Holo models, into a single app for GUI localization tasks.

● CUA-GUI-Operator [Demo]: prithivMLmods/CUA-GUI-Operator
● Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations

Other related multimodal spaces

● Qwen3-VL: prithivMLmods/Qwen3-VL-HF-Demo
● Multimodal-VLM-v1.0: prithivMLmods/Multimodal-VLM-v1.0
● Vision-to-VibeVoice-en: prithivMLmods/Vision-to-VibeVoice-en

I plan to add Chrome sandboxes to streamline it into a browser-based multimodal CUA tool, which will land in the same Space soon.

To learn more, visit the app page or the respective model page!
ZennyKenny 
posted an update 5 days ago
What a trip. Just walked through @burtenshaw and @evalstate's tutorial on adding Hugging Face Skills to your Claude Code agent so you can fine-tune LLMs by chatting with AI.

These are the kinds of innovations that will help everyone benefit from the power of Artificial Intelligence. Well done, gentlemen, and thank you for sharing.
prithivMLmods 
posted an update 6 days ago
One speech model with seven voices, combined with multimodal capabilities for vision tasks. It performs vision (image + text) to audio inference with Qwen2.5-VL + VibeVoice-Realtime-0.5B. Vision to VibeVoice (EN): the demo is live. 🗣️🔥

🤗 Vision-to-VibeVoice-en [Demo]: prithivMLmods/Vision-to-VibeVoice-en
✨ Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
✨ Speech [VibeVoice-Realtime-0.5B]: microsoft/VibeVoice-Realtime-0.5B
✨ Vision [Qwen2.5-VL]: Qwen/Qwen2.5-VL-7B-Instruct

To learn more, visit the app page or the respective model page!
prithivMLmods 
posted an update 10 days ago
Hello everyone,

The strangerzonehf [HF] Community / Organization Page, which I maintain, has reached 6th place in the Top 10 Developer Pages ranking, contributing 3.4% over the calendar cycle from August 2024 to August 2025. It is also the only South Asian / Indian page on the list. I could not be more proud to be doing this for the community. ❤️🤗

Source: https://www.dataprovenance.org/economies-of-open-intelligence.pdf

It is a pleasure to be a part of it.
Thank you!
@prithivMLmods
ZennyKenny 
posted an update 11 days ago
😐 I keep seeing takes on LinkedIn from American business influencers melting down about Silicon Valley startup "dependence" on open-source Chinese models.

🤔 Can anyone describe a credible scenario where these models can be leveraged by the Chinese government to endanger American security interests or am I right to believe that this is just Red Scare nonsense?
prithivMLmods 
posted an update 14 days ago
Introducing the Super-OCRs Demo, a comparison of state-of-the-art multimodal OCR VLMs, including HunyuanOCR, DeepSeekOCR, Dots, and Nanonets in one space for performing OCR, rendering LaTeX and Markdown, and visual grounding (layout). Find the related Spaces and models below.🤗🔥

✨Super-OCRs[Demo]: prithivMLmods/Super-OCRs-Demo
✨Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
✨GitHub: https://github.com/PRITHIVSAKTHIUR/Super-OCRs-Demo

⭐ Models Used:
✦ HunyuanOCR: tencent/HunyuanOCR
✦ DeepSeek-OCR: (-) deepseek-ai/DeepSeek-OCR (+) prithivMLmods/DeepSeek-OCR-Latest-BF16.I64
✦ Dots.OCR: (-) rednote-hilab/dots.ocr (+) prithivMLmods/Dots.OCR-Latest-BF16
✦ Nanonets-OCR2-3B: nanonets/Nanonets-OCR2-3B

⭐ Some Other Relevant Apps:
✦ Qwen3-VL-HF-Demo: prithivMLmods/Qwen3-VL-HF-Demo
✦ Qwen3-VL-Outpost: prithivMLmods/Qwen3-VL-Outpost
✦ Multimodal-OCR: prithivMLmods/Multimodal-OCR
✦ Multimodal-OCR2: prithivMLmods/Multimodal-OCR2
✦ Multimodal-OCR3: prithivMLmods/Multimodal-OCR3
✦ DeepSeek-OCR-experimental: prithivMLmods/DeepSeek-OCR-experimental

To learn more, visit the app page or the respective model page!
Nymbo 
posted an update 15 days ago
🚀 I've just shipped a major update to the Nymbo/Tools MCP server: the Agent_Terminal, a single "master tool" that cuts token usage by over 90%!

Anthropic found 98.7% context savings using code execution with MCP, and Cloudflare published similar findings. This is my open-source implementation of the same idea.

# The Problem

Traditional MCP exposes every tool definition directly to the model. With 12 tools, that's thousands of tokens consumed *before the conversation even starts*. Each tool call also passes intermediate results through the context window — a 10,000-row spreadsheet? That's all going into context just to sum a column.
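
For intuition, here is a hedged sketch of that spreadsheet case; `load_sheet` is a hypothetical stand-in for a spreadsheet tool, not part of the actual server. With code execution, the full table stays inside the sandbox and only the printed result ever reaches the model's context:

```python
# Hypothetical illustration: the 10,000 rows never enter the model's context.

def load_sheet():
    """Pretend spreadsheet tool: returns 10,000 (row_id, amount) pairs."""
    return [(i, i % 7) for i in range(10_000)]

rows = load_sheet()                        # data stays in the sandbox
total = sum(amount for _, amount in rows)  # column reduced locally
print(f"amount column total: {total}")     # only this line reaches the model
```

In the traditional flow, all 10,000 rows would be serialized into the context window just so the model could add them up.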

# The Solution: One Tool to Rule Them All

Agent_Terminal wraps all 12 tools (Web_Search, Web_Fetch, File_System, Generate_Image, Generate_Speech, Generate_Video, Deep_Research, Memory_Manager, Obsidian_Vault, Shell_Command, Code_Interpreter) into a single Python code execution gateway.

Instead of the model making individual tool calls, it writes Python code that orchestrates the tools directly:

```python
# Search for Bitcoin price
result = Web_Search("current price of bitcoin", max_results=3)
print(result)
```


Don't know what tools are available? The agent can discover them at runtime:

```python
print(search_tools('image'))    # Find tools by keyword
print(usage('Generate_Image'))  # Get full docs for a specific tool
```
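
A minimal sketch of how such runtime discovery could be wired, assuming a plain dict registry; the tool names and doc strings below are placeholders, not the server's actual definitions:

```python
# Placeholder registry: tool name -> short usage doc. The real gateway
# would map each entry to a callable implementation.
TOOLS = {
    "Web_Search": "Web_Search(query, max_results=5) -> list of result dicts",
    "Generate_Image": "Generate_Image(prompt, size='1024x1024') -> file path",
    "Generate_Speech": "Generate_Speech(text, voice='default') -> file path",
}

def search_tools(keyword: str) -> list[str]:
    """Find registered tools whose name or doc mentions the keyword."""
    kw = keyword.lower()
    return [name for name, doc in TOOLS.items()
            if kw in name.lower() or kw in doc.lower()]

def usage(name: str) -> str:
    """Return the usage doc for one registered tool."""
    return TOOLS.get(name, f"No tool named {name!r}")

print(search_tools("image"))   # -> ['Generate_Image']
print(usage("Generate_Image"))
```

The point of this shape is that tool documentation is fetched on demand rather than injected into the prompt up front, which is where the context savings come from.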


The individual direct tool calls are all still there, but they can be disabled when using the Agent_Terminal. Try it now: https://www.nymbo.net/nymbot
prithivMLmods 
posted an update 18 days ago
Introducing the advanced sketch-board editor "Nano-Banana-Pro-Sketch-Board" powered by the Gemini 2.5 Flash Image and Gemini 3 Pro Preview Image models through the Gemini API. This version includes more features than the Nano-Banana-AIO app for drawing and prompt-based concept transformation of freestyle sketches. 🔥🍌

✨Nano-Banana-Pro-Sketch-Board: prithivMLmods/Nano-Banana-Pro-Sketch-Board
✨Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection
✨Github: https://github.com/PRITHIVSAKTHIUR/Nano-Banana-Pro-Sketch-Board
✨Model-Garden: https://tinyurl.com/4xxs9dvy

Some Other Relevant Apps [OSS]

⭐Qwen-Image-Edit-2509-LoRAs-Fast-Fusion: prithivMLmods/Qwen-Image-Edit-2509-LoRAs-Fast-Fusion
⭐Qwen-Image-Edit-2509-LoRAs-Fast: prithivMLmods/Qwen-Image-Edit-2509-LoRAs-Fast
⭐Photo-Mate-i2i: prithivMLmods/Photo-Mate-i2i
⭐Kontext-Photo-Mate-v2: https://huggingface.co/spaces/prithivMLmods/Kontext-Photo-Mate-v2

Note: The Nano-Banana-Pro-Sketch-Board demo requires a Gemini API key for the editing process. Your API key is cleared when the app is reloaded or closed; it is not stored or exposed anywhere. Also, the Gemini 3 Pro Preview Image model may require a paid API key from a Google Cloud project with billing enabled.

To learn more, visit the app info section or the respective Model Garden page!
prithivMLmods 
posted an update 19 days ago
Try the demo of NVIDIA Nemotron Parse v1.1, NVIDIA's latest VLM for understanding document semantics and extracting text and table elements with spatial grounding. It performs comprehensive text understanding and document-structure analysis, and can provide bounding boxes with coordinates.

⭐Space[Demo]: prithivMLmods/NVIDIA-Nemotron-Parse-OCR
⭐Model: nvidia/NVIDIA-Nemotron-Parse-v1.1
⭐Multimodal-Spaces: https://huggingface.co/collections/prithivMLmods/multimodal-implementations

Some relevant Spaces

⭐DeepSeek-OCR-experimental [latest transformers]: prithivMLmods/DeepSeek-OCR-experimental
⭐Qwen3-VL-Outpost: prithivMLmods/Qwen3-VL-Outpost
⭐Multimodal-OCR3: prithivMLmods/Multimodal-OCR3

Check out the other spaces in the multimodal implementation collection.

To learn more, visit the app page or the respective model page!
ZennyKenny 
posted an update 21 days ago
The #feedback channel of app early-access Slack workspaces is some of the best unintentional comedy material I have ever come across, tbh.
prithivMLmods 
posted an update 22 days ago
Try the all-new trending Qwen-Image-Edit-2509 (Multi-Image-Edits) specialized adapter demos, including Cloth-Design-Fuse, Texture Edit, Guided-Objects-Patching, and more — all in a single Hugging Face Space. The demo link is provided below. 🤗🔥

⮞ Space[Demo]: prithivMLmods/Qwen-Image-Edit-2509-LoRAs-Fast-Fusion
⮞ Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection
⮞ Base Model: Qwen/Qwen-Image-Edit-2509

Similar applications↗️

⮞ Kontext-Photo-Mate-v2: https://huggingface.co/spaces/prithivMLmods/Kontext-Photo-Mate-v2
⮞ Photo-Mate-i2i: prithivMLmods/Photo-Mate-i2i
⮞ Qwen-Image-Edit-2509-LoRAs-Fast: prithivMLmods/Qwen-Image-Edit-2509-LoRAs-Fast

To learn more, visit the app page or the respective model page!
prithivMLmods 
posted an update 23 days ago
Made a Qwen3-VL Space for multimodal understanding tasks, including point annotation, detection, captioning, guided text inference, and more. Find the demo link below. 🤗↗️

⮞ Space[Demo]: prithivMLmods/Qwen3-VL-HF-Demo
⮞ Model Used: Qwen/Qwen3-VL-4B-Instruct
⮞ Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
⮞ GitHub: https://github.com/PRITHIVSAKTHIUR/Qwen-3VL-Multimodal-Understanding

To learn more, visit the app page or the respective model page!
ZennyKenny 
posted an update 23 days ago
🎉 Wow. Congratulations @bfirsh and the Replicate team on the CloudFlare acquisition!

✌️ You've really built an incredible ecosystem and product offering and should be super proud.
prithivMLmods 
posted an update 26 days ago
Made a small write-up and experimental fine-tuning guide for MetaCLIP 2 for image classification on downstream tasks. The blog, titled Fine Tuning MetaCLIP 2 for Image Classification on Downstream Tasks, demonstrates step-by-step fine-tuning on CIFAR10 and is also flexible enough to adapt to other datasets. For more details, check out the linked blog below. 🤗↗️

⮞ Blog Article: https://huggingface.co/blog/prithivMLmods/metaclip2-downstream-finetune
⮞ Demo Space[Zero-Shot Classification]: prithivMLmods/metaclip-2-demo

Some other models
╰› MetaCLIP-2-Cifar10: prithivMLmods/MetaCLIP-2-Cifar10
╰› MetaCLIP-2-Age-Range-Estimator: prithivMLmods/MetaCLIP-2-Age-Range-Estimator
╰› MetaCLIP-2-Gender-Identifier: prithivMLmods/MetaCLIP-2-Gender-Identifier
╰› MetaCLIP-2-Open-Scene: prithivMLmods/MetaCLIP-2-Open-Scene

⮞ Collection: https://huggingface.co/collections/prithivMLmods/metaclip2-image-classification-experiments

To learn more, visit the app page or the respective model page!
prithivMLmods 
posted an update 29 days ago
Try the all-new trending Qwen-Image-Edit specialized adapter demos, including Photo-to-Anime, Light Restoration, Multi-Angle Edits, Relighting, and more — all in a single Hugging Face Space. Below is the demo link. 🤗🌠

⮞ Demo-Space: prithivMLmods/Qwen-Image-Edit-2509-LoRAs-Fast
⮞ How-to-Use: prithivMLmods/Qwen-Image-Edit-2509-LoRAs-Fast#2
⮞ Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection

To learn more, visit the app page or the respective model page!
ZennyKenny 
posted an update about 1 month ago
🎉 Novoyaz is live.

A few months ago, I built a quick POC on Hugging Face using a fine-tuned variant of OpenAI's OSS-20B model that I trained to convert text from pre-reform Russian-language documents into modern Russian orthography.

⚡️ This morning, I launched novoyaz.io.

This is a production app whose frontend I built in about two hours with Lovable. It uses that same fine-tuned model for transliteration, but now has a bunch of extra features that make it even easier to use (like taking and uploading pictures with your on-device camera, for example 😅).

👉 If you're a researcher, or know a researcher, for whom this app will improve their day-to-day workflows, please get in touch with me.
prithivMLmods 
posted an update about 1 month ago
Introducing Photo-Mate-v2, based on FLUX.1-Kontext-dev, for advanced image manipulation tasks. It supports transforming scenes into top-down/bottom-up perspectives, CAM-right/left-view and its reverse, as well as general kontext-specified object removal. Below is the list of demos and adapters.🔥🤗

➤ Spaces [Demo] : https://huggingface.co/spaces/prithivMLmods/Kontext-Photo-Mate-v2

Kontext-Adapters :
✦ Kontext-Bottom-Up-View: prithivMLmods/Kontext-Bottom-Up-View
✦ Kontext-CAM-Right-View: prithivMLmods/Kontext-CAM-Right-View
✦ Kontext-Top-Down-View: prithivMLmods/Kontext-Top-Down-View
✦ Kontext-CAM-Left-View: prithivMLmods/Kontext-CAM-Left-View
✦ Kontext-Unblur-Upscale: prithivMLmods/Kontext-Unblur-Upscale
✦ Kontext-0811-exp: prithivMLmods/Kontext-0811-exp

Photo-Mate Collection:
✦ Kontext CAM Angles: https://huggingface.co/collections/prithivMLmods/kontext-cam-angles
✦ i2i - Kontext (exp): https://huggingface.co/collections/prithivMLmods/i2i-kontext-exp
✦ LZO-1 (Lossless Zoom Operator): https://huggingface.co/collections/prithivMLmods/lzo-1-lossless-zoom-operator

Related-Apps:
✦ Photo-Mate [Version 1.0]: prithivMLmods/Photo-Mate-i2i
✦ Image Generation Apps [Collection]: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection

To learn more, visit the app page or the respective model page!
@prithivMLmods