QuarkAudio-HCodec-1.5: A Unified Discrete Audio Tokenizer with Adaptive Frame Rate for High-Fidelity, Multitask Audio Generation

🎯 Quick Start: Run Inference in 3 Minutes

1. Installation

Install the dependencies listed in requirements.txt with pip.
Download the pretrained weights from Hugging Face 🤗 (QuarkAudio/HCodec-1.5-adaptive) and save them to ./checkpoints/.
Confirm that ckpt_path in conf/config_adaptive_v3.yaml points to an existing checkpoint file.
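
The download and the ckpt_path check can also be scripted. The following is a minimal sketch, not part of the repository: the helper names (download_weights, find_ckpt_paths, missing_checkpoints) are hypothetical, it assumes the huggingface_hub package is installed for the download step, and it assumes each ckpt_path entry is written on a single line as `ckpt_path: <path>`.

```python
import re
from pathlib import Path


def download_weights(local_dir="./checkpoints"):
    """Fetch the pretrained weights (assumes huggingface_hub is installed)."""
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="QuarkAudio/HCodec-1.5-adaptive", local_dir=local_dir
    )


def find_ckpt_paths(config_file):
    """Return every value assigned to a `ckpt_path:` key in a YAML file.

    Assumption: each entry sits on one line; trailing `# comments` are dropped.
    """
    pattern = re.compile(r"^\s*ckpt_path\s*:\s*(.+?)\s*(?:#.*)?$")
    paths = []
    for line in Path(config_file).read_text(encoding="utf-8").splitlines():
        match = pattern.match(line)
        if match:
            paths.append(match.group(1).strip("'\""))
    return paths


def missing_checkpoints(config_file, base_dir="."):
    """List configured checkpoint paths that do not exist under base_dir."""
    return [
        path
        for path in find_ckpt_paths(config_file)
        if not (Path(base_dir) / path).exists()
    ]


if __name__ == "__main__":
    config = Path("conf/config_adaptive_v3.yaml")
    if config.exists():
        print("missing checkpoints:", missing_checkpoints(config) or "none")
```

Run it from the repository root after downloading the weights; an empty result means every configured path resolves to a file on disk.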

2. Clone Repository

git clone https://github.com/alibaba/unified-audio.git
cd unified-audio/QuarkAudio-HCodec

3. Create a Conda environment and install dependencies

conda create -n unise python=3.10
conda activate unise
pip install -r requirements.txt

4. Tokenizer

#!/bin/bash

python audio_tokenizer.py

5. Optional configuration

Customize the adaptive frame rate options used at test time:

  # hyperparameter configuration in conf/config_adaptive_v3.yaml

  training: false # keep false when testing
  use_similarity_alignment: true
  use_dynamic_similarity_threshold: false
  infer_using_dynamic_threshold: true # takes effect only when manual_threshold is null
  similarity_threshold: 0.7
  similarity_threshold_lower: 0.7
  similarity_threshold_upper: 1.0 # lower/upper bound the dynamic threshold when infer_using_dynamic_threshold is true
  max_tokens_per_group: 8
  manual_threshold: 0.6 # set to a fixed value to evaluate a specific threshold
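
To make the interaction between these options concrete, here is a hedged illustration rather than the HCodec implementation. It assumes the precedence implied by the comments above (manual_threshold first, then a dynamic value clamped to the lower/upper interval, then similarity_threshold as a fallback) and a simple greedy rule that merges a frame into the current group when its cosine similarity to the previous frame reaches the threshold. The function names, the fallback rule, and the greedy merge are assumptions for illustration only.

```python
import math


def resolve_threshold(cfg, dynamic_value=None):
    """Pick the effective similarity threshold (assumed precedence)."""
    manual = cfg.get("manual_threshold")
    if manual is not None:
        # A fixed manual value overrides everything else.
        return float(manual)
    if cfg.get("infer_using_dynamic_threshold") and dynamic_value is not None:
        # Clamp the dynamic value to the configured valid interval.
        lower = cfg["similarity_threshold_lower"]
        upper = cfg["similarity_threshold_upper"]
        return min(max(float(dynamic_value), lower), upper)
    # Assumed fallback: the static threshold.
    return float(cfg["similarity_threshold"])


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def group_frames(frames, threshold, max_tokens_per_group=8):
    """Greedily group consecutive frame indices by adjacent-frame similarity."""
    groups = []
    for i, frame in enumerate(frames):
        if (
            groups
            and len(groups[-1]) < max_tokens_per_group
            and cosine(frames[i - 1], frame) >= threshold
        ):
            groups[-1].append(i)  # similar enough and group not full: merge
        else:
            groups.append([i])  # start a new group
    return groups


if __name__ == "__main__":
    cfg = {
        "infer_using_dynamic_threshold": True,
        "similarity_threshold": 0.7,
        "similarity_threshold_lower": 0.7,
        "similarity_threshold_upper": 1.0,
        "max_tokens_per_group": 8,
        "manual_threshold": 0.6,
    }
    threshold = resolve_threshold(cfg)
    frames = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.0, 1.0]]
    print(threshold, group_frames(frames, threshold, cfg["max_tokens_per_group"]))
```

Under these assumptions, a higher threshold merges fewer frames and so yields more tokens per second, while max_tokens_per_group caps how many frames a single token can cover.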

😘 Acknowledgement

We would like to thank the authors of the following projects for their great work:

The implementation of the adaptive mechanism is based on FlexiCodec and VARSTok.
The Transformer implementation is based on Mimi Codec.
