JITServe QRF Length Predictor
This repository provides the pretrained QRF (Quantile Regression Forest) length predictor used by JITServe (NSDI '26) to estimate conservative upper bounds on LLM output lengths.
This predictor is:
- Not an LLM evaluation model
- Not fine-tuned during inference
- A lightweight offline-trained prediction model used solely for scheduling decisions
It is released to ensure full reproducibility of the JITServe artifact.
What Is Included
This repository contains two components that must be used together:
qrf_model/
├── 0_qrf_lmsys_chat_llama3_8b.pkl
└── 0_qrf_lmsys_chat_qwen25_7b.pkl
qrf_vectorizer/
├── 0_qrf_lmsys_chat_llama3_8b.pkl
└── 0_qrf_lmsys_chat_qwen25_7b.pkl
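Because each model file is meant to be used with the vectorizer file of the same name, it can help to confirm the pairing after download. The following is a minimal sketch, not part of JITServe: the function name `paired_artifacts` and the `root` argument are hypothetical, and it assumes only the two directory names and the `.pkl` extension shown above.

```python
from pathlib import Path


def paired_artifacts(root):
    """Return the .pkl file names found in both qrf_model/ and qrf_vectorizer/.

    `root` is the directory that holds the two folders (hypothetical argument).
    Raises FileNotFoundError if a file exists in one folder but not the other.
    """
    root = Path(root)
    models = {p.name for p in (root / "qrf_model").glob("*.pkl")}
    vectorizers = {p.name for p in (root / "qrf_vectorizer").glob("*.pkl")}

    # Symmetric difference: names present on exactly one side.
    unpaired = models ^ vectorizers
    if unpaired:
        raise FileNotFoundError(f"unpaired artifacts: {sorted(unpaired)}")
    return sorted(models)
```

Calling `paired_artifacts(".")` from the root of this repository should list each file name once when both directories are complete.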
Usage
These artifacts are consumed by JITServe at runtime.
Expected directory layout in the JITServe artifact:
assets/qrf/
├── qrf_model/
└── qrf_vectorizer/
After downloading this repository, place its contents under the path above.
JITServe loads the predictor automatically during startup and does not require any additional configuration by default.
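The placement step can be scripted. The sketch below is an illustration under stated assumptions rather than an official installer: `install_qrf`, `download_dir`, and `artifact_root` are hypothetical names, and the only facts taken from this README are the two folder names and the `assets/qrf/` target path.

```python
import shutil
from pathlib import Path


def install_qrf(download_dir, artifact_root):
    """Copy qrf_model/ and qrf_vectorizer/ into <artifact_root>/assets/qrf/.

    `download_dir` is a local copy of this repository and `artifact_root`
    is the top-level JITServe artifact directory (both hypothetical names).
    """
    target = Path(artifact_root) / "assets" / "qrf"
    for sub in ("qrf_model", "qrf_vectorizer"):
        src = Path(download_dir) / sub
        if not src.is_dir():
            raise FileNotFoundError(f"missing directory: {src}")
        # Creates missing parent directories; existing files are overwritten.
        shutil.copytree(src, target / sub, dirs_exist_ok=True)
    return target
```

`dirs_exist_ok=True` requires Python 3.8 or later and makes the copy safe to re-run.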
Citation
If you use these artifacts, please consider citing our paper:
@misc{zhang2025jitservesloawarellmserving,
title={JITServe: SLO-aware LLM Serving with Imprecise Request Information},
author={Wei Zhang and Zhiyu Wu and Yi Mu and Rui Ning and Banruo Liu and Nikhil Sarda and Myungjin Lee and Fan Lai},
year={2025},
eprint={2504.20068},
archivePrefix={arXiv},
primaryClass={cs.DC},
url={https://arxiv.org/abs/2504.20068},
}