ICRM (ICRM)

lancerts

authored 3 papers 3 months ago

Liger Kernel: Efficient Triton Kernels for LLM Training

Paper • 2410.10989 • Published Oct 14, 2024 • 1

AlphaPO -- Reward shape matters for LLM alignment

Paper • 2501.03884 • Published Jan 7, 2025 • 2

LLaDA-MedV: Exploring Large Language Diffusion Models for Biomedical Image Understanding

Paper • 2508.01617 • Published Aug 3, 2025

jasonzwang

authored 2 papers 3 months ago

LLaDA-MedV: Exploring Large Language Diffusion Models for Biomedical Image Understanding

Paper • 2508.01617 • Published Aug 3, 2025

Local2Global query Alignment for Video Instance Segmentation

Paper • 2507.20120 • Published Jul 27, 2025

lancerts

authored a paper 3 months ago

Reasoning Models Can be Accurately Pruned Via Chain-of-Thought Reconstruction

Paper • 2509.12464 • Published Sep 15, 2025

jasonzwang

authored 2 papers 3 months ago

Reasoning Models Can be Accurately Pruned Via Chain-of-Thought Reconstruction

Paper • 2509.12464 • Published Sep 15, 2025

Debunk the Myth of SFT Generalization

Paper • 2510.00237 • Published Sep 30, 2025 • 2

lancerts

authored a paper 3 months ago

Planner-R1: Reward Shaping Enables Efficient Agentic RL with Smaller LLMs

Paper • 2509.25779 • Published Sep 30, 2025 • 18

jasonzwang

authored a paper 3 months ago

Planner-R1: Reward Shaping Enables Efficient Agentic RL with Smaller LLMs

Paper • 2509.25779 • Published Sep 30, 2025 • 18

JW17

updated a Space 3 months ago

README

🚀

JW17

published a Space 3 months ago

README

🚀

JW17

updated a dataset 3 months ago

ICRM/skpv2-icrm-v0.1

Viewer • Updated Sep 29, 2025 • 77k • 14

JW17

published a dataset 3 months ago

ICRM/skpv2-icrm-v0.1

Viewer • Updated Sep 29, 2025 • 77k • 14

JW17

authored 2 papers 6 months ago

AlphaPO -- Reward shape matters for LLM alignment

Paper • 2501.03884 • Published Jan 7, 2025 • 2

Online Difficulty Filtering for Reasoning Oriented Reinforcement Learning

Paper • 2504.03380 • Published Apr 4, 2025

JW17

authored a paper 8 months ago

When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research

Paper • 2505.11855 • Published May 17, 2025 • 10

JW17

authored 2 papers about 1 year ago

Stable Language Model Pre-training by Reducing Embedding Variability

Paper • 2409.07787 • Published Sep 12, 2024

Cross-lingual Transfer of Reward Models in Multilingual Alignment

Paper • 2410.18027 • Published Oct 23, 2024

JW17

authored a paper over 1 year ago

Margin-aware Preference Optimization for Aligning Diffusion Models without Reference

Paper • 2406.06424 • Published Jun 10, 2024 • 15

Team members 4
