
UniSE: A Unified Framework for Decoder-only Autoregressive LM-based Speech Enhancement


πŸ”Š **UniSE**: A Unified, Prompt-Free, Autoregressive Speech Enhancement Framework Based on Decoder-only Language Models

πŸš€ Key Highlights:

  • βœ… Unified & Prompt-Free: Handles multiple tasks without explicit instruction.
  • βš™οΈ Decoder-only AR-LM Backbone: Leverages LLM-style autoregressive generation for speech token prediction.
  • πŸ”„ End-to-End Compatible: Integrates WavLM (feature extractor), BiCodec (discrete codec), and LM into one pipeline.
  • 🌍 Multitask Support: SE, SR, TSE, SS, and more β€” all in a single model.

πŸ“„ Paper: [arXiv:2510.20441](https://arxiv.org/abs/2510.20441) | πŸ€— Model: [Hugging Face](https://huggingface.co/QuarkAudio/QuarkAudio-UniSE/)


πŸ“‹ Supported Tasks

| Task | Full Name | Status | Description |
| --- | --- | --- | --- |
| SR | Speech Restoration | βœ… Stable | General-purpose denoising and clarity improvement (e.g., noise, reverb, packet loss) |
| TSE | Target Speaker Extraction | βœ… Stable | Extracts the target speaker using reference enrollment audio |
| SS | Speech Separation | βœ… Stable | Separates mixed speakers or sound sources |
| AEC | Acoustic Echo Cancellation | ⏳ Developing | Coming soon in the next release |

πŸ’‘ Unlike traditional models that require task-specific prompts or modules, UniSE autonomously infers the task type from the input context, enabled by the LLM's comprehension capabilities.


🎯 Quick Start: Run Inference in 3 Minutes

1. Clone Repository

```bash
git clone https://github.com/alibaba/unified-audio.git
cd QuarkAudio-UniSE
```

2. Create a Conda environment and install dependencies

```bash
conda create -n unise python=3.10
conda activate unise
pip install -r requirements.txt
```

3. Download Checkpoints

QuarkAudio-UniSE requires three additional pre-trained checkpoints to run: the WavLM feature extractor, the BiCodec codec, and the intermediate LM, all hosted on Hugging Face. The BiCodec and LM checkpoints can be downloaded with the provided shell script:

```bash
cd checkpoints
bash download.sh
```

Additionally, download `WavLM-Large.pt` from this URL and place it at `./ckpt/WavLM-Large.pt`.

Alternatively, you can download the checkpoints manually and place them in the `./model/bicodec/` directory.
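
If you prefer Python to the shell script, here is a minimal download sketch using `huggingface_hub`. It assumes the checkpoints are hosted in the `QuarkAudio/QuarkAudio-UniSE` repo; adjust `repo_id` and `local_dir` if the actual layout differs.

```python
# Minimal sketch: fetch the UniSE checkpoints with huggingface_hub.
# Assumption: the BiCodec and LM weights live in the
# QuarkAudio/QuarkAudio-UniSE repo. WavLM-Large.pt must still be
# downloaded separately and placed at ./ckpt/WavLM-Large.pt.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="QuarkAudio/QuarkAudio-UniSE",
    local_dir="checkpoints",
)
```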

After downloading, the directory tree should look like this:

Train

  • Quick start

```bash
#!/bin/bash
python ./train.py --config conf/config.yaml
```

Configure the training parameters in `conf/config.yaml` (an illustrative fragment is sketched after the table):

| Parameter | Description |
| --- | --- |
| `resume` | Checkpoint path to resume training from; leave empty to train from scratch |
| `simulation_config` | Data simulation config |
| `speech_scp_path` | SCP file of clean audio files |
| `noise_scp_path` | SCP file of noise audio files |
| `rir_scp_path` | SCP file of RIR audio files |
| `mode` | Task type: `se` (noise suppression, speech restoration, packet loss concealment), `tse` (target speaker extraction), `ss` (speech separation) |
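
For orientation, here is a hypothetical `conf/config.yaml` fragment wiring the training parameters together. The keys mirror the table above; every value is an illustrative placeholder, not a shipped default.

```yaml
# Hypothetical training config fragment (placeholder values).
resume: null                          # or a checkpoint path to resume from
simulation_config: conf/simulation.yaml
speech_scp_path: data/train/speech.scp
noise_scp_path: data/train/noise.scp
rir_scp_path: data/train/rir.scp
mode: se                              # se | tse | ss
```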

Inference

  • Quick start: the main inference script is `test.py`. The inference process consists of two stages (sketched in code after this list):
  1. Extract hidden states from all WavLM layers and average them across layers to obtain a single representation.
  2. Use the language model (LM) to autoregressively predict speech tokens, then decode them into audio with BiCodec.
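
A minimal PyTorch-style sketch of these two stages is shown below; `wavlm`, `lm`, and `bicodec` are hypothetical handles for the three loaded models, so their interfaces here are assumptions rather than the repo's actual API.

```python
import torch

@torch.no_grad()
def enhance(wav: torch.Tensor, wavlm, lm, bicodec) -> torch.Tensor:
    """Two-stage UniSE inference sketch (hypothetical interfaces)."""
    # Stage 1: extract hidden states from every WavLM layer and average
    # them across layers into a single representation. Here `wavlm(wav)`
    # is assumed to return a list of per-layer hidden states (T, D).
    layer_states = wavlm(wav)
    feats = torch.stack(layer_states, dim=0).mean(dim=0)

    # Stage 2: autoregressively predict discrete speech tokens with the
    # decoder-only LM, then decode them to a waveform with BiCodec.
    tokens = lm.generate(feats)
    return bicodec.decode(tokens)
```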

Running Inference

  • Quick start: to run `test.py`, configure the parameters in `./conf/config.yaml` (an illustrative fragment is sketched after the table):

| Parameter | Description |
| --- | --- |
| `ckpt_path` | Path to the pretrained weights |
| `enroll_duration` | Duration of the enrollment (reference) audio, used for TSE |
| `data_src_dir` | Directory of input (source) audio files |
| `data_tgt_dir` | Directory where enhanced (output) audio files are written |
| `mode` | Task type: `se` (noise suppression, speech restoration, packet loss concealment), `tse` (target speaker extraction), `ss` (speech separation) |
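
As with training, here is a hypothetical `conf/config.yaml` fragment for inference; the keys mirror the table above and the values are placeholders.

```yaml
# Hypothetical inference config fragment (placeholder values).
ckpt_path: checkpoints/unise.pt
enroll_duration: 10                   # seconds of enrollment audio (TSE only)
data_src_dir: data/test/noisy
data_tgt_dir: data/test/enhanced
mode: se                              # se | tse | ss
```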

Command to run inference:

```bash
python test.py
```

Model Checkpoints

Our pretrained model is available on [Hugging Face](https://huggingface.co/QuarkAudio/QuarkAudio-UniSE/).

Hints

Our approach focuses on leveraging the LLM's comprehension capabilities to enable autonomous determination of task types, though this may exhibit instability in certain scenarios. A more stable and robust iteration will be released in the upcoming version.

Citation

```bibtex
@misc{yan2025uniseunifiedframeworkdecoderonly,
      title={UniSE: A Unified Framework for Decoder-only Autoregressive LM-based Speech Enhancement},
      author={Haoyin Yan and Chengwei Liu and Shaofei Xue and Xiaotao Liang and Zheng Xue},
      year={2025},
      eprint={2510.20441},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2510.20441},
}
```

Contact

For any questions, please contact: yanhaoyin.yhy@alibaba-inc.com
