# UniSE: A Unified Framework for Decoder-only Autoregressive LM-based Speech Enhancement

**UniSE**: A Unified, Prompt-Free, Autoregressive Speech Enhancement Framework Based on Decoder-only Language Models

Key Highlights:
- Unified & Prompt-Free: Handles multiple tasks without explicit instructions.
- Decoder-only AR-LM Backbone: Leverages LLM-style autoregressive generation for speech token prediction.
- End-to-End Compatible: Integrates WavLM (feature extractor), BiCodec (discrete codec), and the LM into one pipeline.
- Multitask Support: SE, SR, TSE, SS, and more, all in a single model.
Paper: [arXiv:2510.20441](https://arxiv.org/abs/2510.20441) | Model: [Hugging Face](https://huggingface.co/QuarkAudio/QuarkAudio-UniSE/)
## Supported Tasks
| Task | Full Name | Status | Description |
|---|---|---|---|
| SR | Speech Restoration | Stable | General-purpose denoising and clarity improvement (e.g., noise, reverb, packet loss) |
| TSE | Target Speaker Extraction | Stable | Extracts the target speaker using reference enrollment audio |
| SS | Speech Separation | Stable | Separates mixed speakers or sound sources |
| AEC | Acoustic Echo Cancellation | In development | Coming in the next release |
Unlike traditional models that require task-specific prompts or modules, UniSE infers the task type autonomously from the input context, enabled by the LM's comprehension capability.
## Quick Start: Run Inference in 3 Minutes
1. Clone the repository:

```bash
git clone https://github.com/alibaba/unified-audio.git
cd QuarkAudio-UniSE
```
2. Create a Conda environment and install dependencies:

```bash
conda create -n unise python=3.10
conda activate unise
pip install -r requirements.txt
```
3. Download checkpoints:

QuarkAudio-UniSE requires additional pre-trained components to function properly: WavLM, BiCodec, and the checkpoint of the intermediate LM, hosted on Hugging Face. You can download three of them using the provided shell script:

```bash
cd checkpoints
bash download.sh
```
Additionally, download WavLM-Large.pt from this URL and place it at ./ckpt/WavLM-Large.pt.

Alternatively, you can download the models manually and place them in the ./model/bicodec/ directory.

After downloading, the checkpoint directory should contain the BiCodec models, the LM checkpoint, and ./ckpt/WavLM-Large.pt.
## Train

- Quick start

```bash
#!/bin/bash
python ./train.py --config conf/config.yaml
```
| Parameter | Description |
|---|---|
| `resume` | Checkpoint path to resume training from (optional) |
| `simulation_config` | Data simulation config |
| `speech_scp_path` | SCP file listing clean audio files |
| `noise_scp_path` | SCP file listing noise audio files |
| `rir_scp_path` | SCP file listing RIR audio files |
| `mode` | Task type: `se` (Noise Suppression, Speech Restoration, Packet Loss Concealment), `tse` (Target Speaker Extraction), `ss` (Speech Separation) |
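For orientation, here is a minimal sketch of loading the training config and sanity-checking the keys from the table above. The key names follow the table; everything else (paths, error handling) is an illustrative assumption about the config layout, not the repository's exact code.

```python
# Minimal sketch: load conf/config.yaml and check the training keys listed
# in the table above. All specifics beyond the key names are assumptions.
import yaml  # pip install pyyaml

with open("conf/config.yaml") as f:
    cfg = yaml.safe_load(f)

required = ["simulation_config", "speech_scp_path",
            "noise_scp_path", "rir_scp_path", "mode"]
missing = [k for k in required if k not in cfg]
if missing:
    raise KeyError(f"conf/config.yaml is missing keys: {missing}")

assert cfg["mode"] in ("se", "tse", "ss"), f"unknown task type: {cfg['mode']}"
print("resume from:", cfg.get("resume") or "scratch")
```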
## Inference

- Quick start

The main inference script is `test.py`. The inference process consists of two stages:

1. Extract hidden states from all WavLM layers and obtain a single representation by averaging them across layers.
2. Use the language model (LM) to predict speech tokens, then decode them into audio with BiCodec. A minimal sketch of both stages follows this list.
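The sketch below illustrates the two stages. The WavLM part uses the Hugging Face `transformers` implementation with the `microsoft/wavlm-large` checkpoint; the `lm` and `bicodec` names are placeholders for the repository's modules, not its actual API.

```python
# Illustrative sketch of the two inference stages. `lm` and `bicodec` are
# placeholders, not the repository's actual modules or API.
import torch
from transformers import WavLMModel

# --- Stage 1: layer-averaged WavLM features --------------------------------
wavlm = WavLMModel.from_pretrained("microsoft/wavlm-large").eval()
waveform = torch.randn(1, 16000)  # stand-in for 1 s of 16 kHz input audio

with torch.no_grad():
    out = wavlm(input_values=waveform, output_hidden_states=True)

# out.hidden_states is a tuple of (num_layers + 1) tensors, each (B, T, D);
# averaging across layers gives the single representation fed to the LM.
features = torch.stack(out.hidden_states, dim=0).mean(dim=0)

# --- Stage 2: autoregressive speech-token prediction + BiCodec decoding ----
def generate_speech_tokens(lm, features, bos_id=1, eos_id=2, max_len=1000):
    """Greedy AR decoding loop; the actual model may sample instead."""
    tokens = torch.full((1, 1), bos_id, dtype=torch.long)
    for _ in range(max_len):
        logits = lm(features, tokens)             # (B, T, vocab), hypothetical
        next_tok = logits[:, -1:].argmax(dim=-1)  # greedy next-token pick
        tokens = torch.cat([tokens, next_tok], dim=1)
        if next_tok.item() == eos_id:
            break
    return tokens

# Hypothetical final step: audio = bicodec.decode(generate_speech_tokens(...))
```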
### Running Inference

To run `test.py`, configure the parameters in `./conf/config.yaml`:
| Parameter | Description |
|---|---|
| `ckpt_path` | Path to the pretrained weights |
| `enroll_duration` | Duration of the enrollment (reference) audio used for TSE |
| `data_src_dir` | Directory of input audio files to process |
| `data_tgt_dir` | Directory where processed audio files are written |
| `mode` | Task type: `se` (Noise Suppression, Speech Restoration, Packet Loss Concealment), `tse` (Target Speaker Extraction), `ss` (Speech Separation) |
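As an example, a TSE run might fill the config as below. All paths, the unit of `enroll_duration`, and the write-out are illustrative assumptions; in practice you would edit the existing `conf/config.yaml` rather than overwrite it.

```python
# Illustrative inference config for a TSE run. Every value is an assumption;
# edit your existing conf/config.yaml instead of overwriting it wholesale.
import yaml

cfg = {
    "ckpt_path": "checkpoints/unise.pt",  # pretrained weights (assumed path)
    "enroll_duration": 5,                 # enrollment length (assumed unit: s)
    "data_src_dir": "data/noisy",         # input audio directory (assumed)
    "data_tgt_dir": "data/enhanced",      # output audio directory (assumed)
    "mode": "tse",                        # se | tse | ss
}

with open("conf/config.yaml", "w") as f:
    yaml.safe_dump(cfg, f)
```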
Command to run inference:

```bash
python test.py
```
## Model Checkpoints

Our pretrained model is available on [Hugging Face](https://huggingface.co/QuarkAudio/QuarkAudio-UniSE/).
## Hints

Our approach relies on the LLM's comprehension capability to determine the task type autonomously, which may be unstable in certain scenarios. A more stable and robust iteration will be released in the upcoming version.
Citation
@misc{yan2025uniseunifiedframeworkdecoderonly,
title={UniSE: A Unified Framework for Decoder-only Autoregressive LM-based Speech Enhancement},
author={Haoyin Yan and Chengwei Liu and Shaofei Xue and Xiaotao Liang and Zheng Xue},
year={2025},
eprint={2510.20441},
archivePrefix={arXiv},
primaryClass={cs.SD},
url={https://arxiv.org/abs/2510.20441},
}
## Contact

For any questions, please contact yanhaoyin.yhy@alibaba-inc.com.
