
UniSE: A Unified Framework for Decoder-only Autoregressive LM-based Speech Enhancement


πŸ”Š **UniSE**: A Unified, Prompt-Free, Autoregressive Speech Enhancement Framework Based on Decoder-only Language Models

πŸš€ Key Highlights:

  • βœ… Unified & Prompt-Free: Handles multiple tasks without explicit instruction.
  • βš™οΈ Decoder-only AR-LM Backbone: Leverages LLM-style autoregressive generation for speech token prediction.
  • πŸ”„ End-to-End Compatible: Integrates WavLM (feature extractor), BiCodec (discrete codec), and LM into one pipeline.
  • 🌍 Multitask Support: SE, SR, TSE, SS, and more β€” all in a single model.

πŸ“„ Paper: [arXiv:2510.20441](https://arxiv.org/abs/2510.20441) | πŸ€— Model: [Hugging Face](https://huggingface.co/QuarkAudio/QuarkAudio-UniSE/)


πŸ“‹ Supported Tasks

| Task | Full Name | Status | Description |
| --- | --- | --- | --- |
| SR | Speech Restoration | βœ… Stable | General-purpose denoising and clarity improvement (e.g., noise, reverb, packet loss) |
| TSE | Target Speaker Extraction | βœ… Stable | Extracts the target speaker using reference enrollment audio |
| SS | Speech Separation | βœ… Stable | Separates mixed speakers or sound sources |
| AEC | Acoustic Echo Cancellation | ⏳ Developing | Coming soon in the next release |

πŸ’‘ Unlike traditional models that require task-specific prompts or modules, UniSE autonomously infers the task type from the input context, enabled by the LLM's comprehension capabilities.


🎯 Quick Start: Run Inference in 3 Minutes

1. Clone Repository

```bash
git clone https://github.com/alibaba/unified-audio.git
cd QuarkAudio-UniSE
```

2. Create a Conda environment and install dependencies

```bash
conda create -n unise python=3.10
conda activate unise
pip install -r requirements.txt
```

3. Download Checkpoints

QuarkAudio-UniSE requires three additional pre-trained checkpoints to run: the WavLM feature extractor, the BiCodec codec, and the intermediate LM, all hosted on Hugging Face. The BiCodec and LM checkpoints can be downloaded with the provided shell script:

```bash
cd checkpoints
bash download.sh
```

Additionally, download `WavLM-Large.pt` from this URL and place it at `./ckpt/WavLM-Large.pt`.

Alternatively, you can download the checkpoints manually and place them in the `./model/bicodec/` directory.
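
If you prefer Python to the shell script, here is a minimal download sketch using `huggingface_hub`. It assumes the checkpoints are hosted in the `QuarkAudio/QuarkAudio-UniSE` repo; adjust `repo_id` and `local_dir` if the actual layout differs.

```python
# Minimal sketch: fetch the UniSE checkpoints with huggingface_hub.
# Assumption: the BiCodec and LM weights live in the
# QuarkAudio/QuarkAudio-UniSE repo. WavLM-Large.pt must still be
# downloaded separately and placed at ./ckpt/WavLM-Large.pt.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="QuarkAudio/QuarkAudio-UniSE",
    local_dir="checkpoints",
)
```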

After downloading, the directory tree should look like this:

Train

  • Quick start

```bash
#!/bin/bash
python ./train.py --config conf/config.yaml
```

Configure the training parameters in `conf/config.yaml` (an illustrative fragment is sketched after the table):

| Parameter | Description |
| --- | --- |
| `resume` | Checkpoint path to resume training from; leave empty to train from scratch |
| `simulation_config` | Data simulation config |
| `speech_scp_path` | SCP file of clean audio files |
| `noise_scp_path` | SCP file of noise audio files |
| `rir_scp_path` | SCP file of RIR audio files |
| `mode` | Task type: `se` (noise suppression, speech restoration, packet loss concealment), `tse` (target speaker extraction), `ss` (speech separation) |
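
For orientation, here is a hypothetical `conf/config.yaml` fragment wiring the training parameters together. The keys mirror the table above; every value is an illustrative placeholder, not a shipped default.

```yaml
# Hypothetical training config fragment (placeholder values).
resume: null                          # or a checkpoint path to resume from
simulation_config: conf/simulation.yaml
speech_scp_path: data/train/speech.scp
noise_scp_path: data/train/noise.scp
rir_scp_path: data/train/rir.scp
mode: se                              # se | tse | ss
```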

Inference

  • Quick start: the main inference script is `test.py`. The inference process consists of two stages (sketched in code after this list):
  1. Extract hidden states from all WavLM layers and average them across layers to obtain a single representation.
  2. Use the language model (LM) to autoregressively predict speech tokens, then decode them into audio with BiCodec.
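
A minimal PyTorch-style sketch of these two stages is shown below; `wavlm`, `lm`, and `bicodec` are hypothetical handles for the three loaded models, so their interfaces here are assumptions rather than the repo's actual API.

```python
import torch

@torch.no_grad()
def enhance(wav: torch.Tensor, wavlm, lm, bicodec) -> torch.Tensor:
    """Two-stage UniSE inference sketch (hypothetical interfaces)."""
    # Stage 1: extract hidden states from every WavLM layer and average
    # them across layers into a single representation. Here `wavlm(wav)`
    # is assumed to return a list of per-layer hidden states (T, D).
    layer_states = wavlm(wav)
    feats = torch.stack(layer_states, dim=0).mean(dim=0)

    # Stage 2: autoregressively predict discrete speech tokens with the
    # decoder-only LM, then decode them to a waveform with BiCodec.
    tokens = lm.generate(feats)
    return bicodec.decode(tokens)
```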

Running Inference

  • Quick start: to run `test.py`, configure the parameters in `./conf/config.yaml` (an illustrative fragment is sketched after the table):

| Parameter | Description |
| --- | --- |
| `ckpt_path` | Path to the pretrained weights |
| `enroll_duration` | Duration of the enrollment (reference) audio, used for TSE |
| `data_src_dir` | Directory of input (source) audio files |
| `data_tgt_dir` | Directory where enhanced (output) audio files are written |
| `mode` | Task type: `se` (noise suppression, speech restoration, packet loss concealment), `tse` (target speaker extraction), `ss` (speech separation) |
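
As with training, here is a hypothetical `conf/config.yaml` fragment for inference; the keys mirror the table above and the values are placeholders.

```yaml
# Hypothetical inference config fragment (placeholder values).
ckpt_path: checkpoints/unise.pt
enroll_duration: 10                   # seconds of enrollment audio (TSE only)
data_src_dir: data/test/noisy
data_tgt_dir: data/test/enhanced
mode: se                              # se | tse | ss
```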

Command to run inference:

```bash
python test.py
```

Model Checkpoints

Our pretrained model is available on [Hugging Face](https://huggingface.co/QuarkAudio/QuarkAudio-UniSE/).

Hints

Our approach focuses on leveraging the LLM's comprehension capabilities to enable autonomous determination of task types, though this may exhibit instability in certain scenarios. A more stable and robust iteration will be released in the upcoming version.

Citation

```bibtex
@misc{yan2025uniseunifiedframeworkdecoderonly,
      title={UniSE: A Unified Framework for Decoder-only Autoregressive LM-based Speech Enhancement},
      author={Haoyin Yan and Chengwei Liu and Shaofei Xue and Xiaotao Liang and Zheng Xue},
      year={2025},
      eprint={2510.20441},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2510.20441},
}
```

Contact

For any questions, please contact: yanhaoyin.yhy@alibaba-inc.com
