JITServe QRF Length Predictor

This repository provides the pretrained QRF (Quantile Regression Forest) length predictor used by JITServe (NSDI’26) to estimate conservative upper bounds on LLM output lengths.

This predictor is:

  • Not an LLM evaluation model
  • Not fine-tuned during inference
  • A lightweight offline-trained prediction model used solely for scheduling decisions

It is released to ensure full reproducibility of the JITServe artifact.
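The "conservative upper bound" idea can be illustrated in a few lines: instead of predicting the mean output length, a quantile regression forest targets a high quantile of the length distribution, so actual outputs rarely exceed the estimate. The sketch below uses synthetic lengths and a plain empirical quantile, not the real model or data, purely to show why a high quantile under-predicts far less often than the mean:

```python
import numpy as np

# Toy illustration: synthetic output lengths standing in for real LLM outputs.
rng = np.random.default_rng(0)
observed_lengths = rng.integers(10, 500, size=1000)

mean_estimate = observed_lengths.mean()
p90_bound = np.quantile(observed_lengths, 0.9)  # conservative upper bound

# Fraction of outputs that fit within the bound (~0.9 by construction).
coverage = (observed_lengths <= p90_bound).mean()
print(f"mean={mean_estimate:.0f}, p90 bound={p90_bound:.0f}, coverage={coverage:.2f}")
```

A QRF produces such per-request quantiles conditioned on prompt features rather than one global quantile, which is what makes it useful for scheduling.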


What Is Included

This repository contains two components that must be used together: the pickled QRF models and their matching feature vectorizers (pair files by identical name):

qrf_model/
  ├── 0_qrf_lmsys_chat_llama3_8b.pkl
  └── 0_qrf_lmsys_chat_qwen25_7b.pkl

qrf_vectorizer/
  ├── 0_qrf_lmsys_chat_llama3_8b.pkl
  └── 0_qrf_lmsys_chat_qwen25_7b.pkl

Usage

These artifacts are consumed by JITServe at runtime.

Expected directory layout in the JITServe artifact:

assets/qrf/
├── qrf_model/
└── qrf_vectorizer/

After downloading this repository, place its contents under the path above.

JITServe loads the predictor automatically during startup and does not require any additional configuration by default.

Citation

If you use these artifacts, please consider citing our paper:

@misc{zhang2025jitservesloawarellmserving,
      title={JITServe: SLO-aware LLM Serving with Imprecise Request Information}, 
      author={Wei Zhang and Zhiyu Wu and Yi Mu and Rui Ning and Banruo Liu and Nikhil Sarda and Myungjin Lee and Fan Lai},
      year={2025},
      eprint={2504.20068},
      archivePrefix={arXiv},
      primaryClass={cs.DC},
      url={https://arxiv.org/abs/2504.20068}, 
}