| # 🧠 ToGMAL Prompt Difficulty Analyzer | |
| Real-time LLM capability boundary detection using vector similarity search. | |
| ## 🎯 What This Does | |
| This system analyzes any prompt and tells you: | |
| 1. **How difficult it is** for current LLMs (based on real benchmark data) | |
| 2. **Why it's difficult** (shows similar benchmark questions) | |
| 3. **What to do about it** (actionable recommendations) | |
| ## 🔥 Key Innovation | |
| Instead of clustering by domain (for example, grouping all math questions together), we cluster by **difficulty**: what is actually hard for LLMs, regardless of domain. | |
| ## Real Data | |
| - **14,042 MMLU questions** with real success rates from top models | |
| - **<50ms query time** for real-time analysis | |
| - **Production-ready** vector database (ChromaDB) | |
| ## Demo Links | |
| - **Local**: http://127.0.0.1:7860 | |
| - **Public** (temporary Gradio share link): https://99b38fc2e31da2f83d.gradio.live | |
| ## 🧪 Example Results | |
| ### Hard Questions (Low Success Rates) | |
| ``` | |
| Prompt: "Statement 1 | Every field is also a ring..." | |
| Risk: HIGH (23.9% success) | |
| Recommendation: Multi-step reasoning with verification | |
| Prompt: "Find all zeros of polynomial x³ + 2x + 2 in Z₇" | |
| Risk: MODERATE (43.8% success) | |
| Recommendation: Use chain-of-thought prompting | |
| ``` | |
| ### Easy Questions (High Success Rates) | |
| ``` | |
| Prompt: "What is 2 + 2?" | |
| Risk: MINIMAL (100% success) | |
| Recommendation: Standard LLM response adequate | |
| Prompt: "What is the capital of France?" | |
| Risk: MINIMAL (100% success) | |
| Recommendation: Standard LLM response adequate | |
| ``` | |
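The mapping from an estimated success rate to a risk label and recommendation could look roughly like the sketch below. The thresholds (30% and 70%) and the names `RISK_TIERS` and `assess_risk` are assumptions chosen only so that the examples above fall into the tiers shown; the project's actual cut-offs are not stated here.

```python
# Hypothetical thresholds: picked only so the README examples land in the
# tiers shown. The real cut-offs used by the project are not documented here.
RISK_TIERS = [
    (0.30, "HIGH", "Multi-step reasoning with verification"),
    (0.70, "MODERATE", "Use chain-of-thought prompting"),
]
DEFAULT_TIER = ("MINIMAL", "Standard LLM response adequate")


def assess_risk(success_rate: float) -> tuple[str, str]:
    """Map an estimated success rate in [0, 1] to (risk label, recommendation)."""
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be between 0 and 1")
    # Tiers are ordered by ascending upper bound; the first match wins.
    for upper_bound, label, recommendation in RISK_TIERS:
        if success_rate < upper_bound:
            return label, recommendation
    return DEFAULT_TIER


if __name__ == "__main__":
    for rate in (0.239, 0.438, 1.0):
        label, advice = assess_risk(rate)
        print(f"{rate:.1%} success -> {label}: {advice}")
```

Keeping the tiers in a table rather than in nested `if` statements makes it easy to add intermediate levels later without touching the lookup logic.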
| ## 🛠️ Technical Details | |
| ### Architecture | |
| ``` | |
| User Prompt → Embedding Model → Vector DB → K Nearest Questions → Weighted Score | |
| ``` | |
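The final "Weighted Score" step can be illustrated with a minimal sketch. It assumes the vector search returns a distance and a benchmark success rate for each neighbour, and it uses inverse-distance weighting; both the weighting scheme and the function name `weighted_success_rate` are assumptions, since the exact formula is not given here.

```python
def weighted_success_rate(neighbors, eps=1e-6):
    """Distance-weighted mean of neighbour success rates.

    neighbors: list of (distance, success_rate) pairs with distance >= 0.
    Closer neighbours (smaller distance) receive larger weights.
    """
    if not neighbors:
        raise ValueError("need at least one neighbour")
    # eps avoids division by zero when a neighbour is an exact match.
    weights = [1.0 / (distance + eps) for distance, _ in neighbors]
    total = sum(weights)
    return sum(w * rate for w, (_, rate) in zip(weights, neighbors)) / total


if __name__ == "__main__":
    # Two hard questions nearby and one easy question far away (toy values).
    estimate = weighted_success_rate([(0.1, 0.2), (0.2, 0.3), (0.9, 1.0)])
    print(f"estimated success rate: {estimate:.3f}")
```

With equal distances this reduces to a plain average; as one neighbour gets much closer than the rest, the estimate approaches that neighbour's success rate.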
| ### Components | |
| 1. **Sentence Transformers** (all-MiniLM-L6-v2) for embeddings | |
| 2. **ChromaDB** for vector storage | |
| 3. **Real MMLU data** with success rates from top models | |
| 4. **Gradio** for web interface | |
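To show what the retrieval step does conceptually, here is a dependency-free sketch of a brute-force cosine-similarity search over an in-memory index. It is a stand-in for the ChromaDB query, not the project's code: the function names, the record fields, and the tiny 3-dimensional vectors are all illustrative assumptions (real all-MiniLM-L6-v2 embeddings have 384 dimensions).

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_questions(query_vec, index, k=3):
    """Return the k (similarity, record) pairs most similar to query_vec."""
    scored = ((cosine_similarity(query_vec, vec), record) for vec, record in index)
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


# Toy index with made-up vectors and success rates, for illustration only.
index = [
    ([1.0, 0.0, 0.0], {"question": "abstract algebra item", "success_rate": 0.2}),
    ([0.9, 0.1, 0.0], {"question": "finite field item", "success_rate": 0.4}),
    ([0.0, 0.0, 1.0], {"question": "geography item", "success_rate": 1.0}),
]

hits = nearest_questions([1.0, 0.02, 0.0], index, k=2)
for similarity, record in hits:
    print(f"{similarity:.4f}  {record['question']}  ({record['success_rate']:.0%})")
```

A vector database replaces the linear scan with an approximate nearest-neighbour index, which is what makes low-latency queries feasible on thousands of stored questions.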
| ## Next Steps | |
| 1. Add more benchmark datasets (GPQA, MATH) | |
| 2. Fetch real per-question results from multiple top models | |
| 3. Integrate with ToGMAL MCP server for Claude Desktop | |
| 4. Deploy to HuggingFace Spaces for permanent hosting | |
| ## Quick Start | |
| ```bash | |
| # Install dependencies | |
| uv pip install -r requirements.txt | |
| uv pip install gradio | |
| # Run the demo | |
| python demo_app.py | |
| ``` | |
| Visit http://127.0.0.1:7860 to use the web interface. |