QuarkAudio-HCodec-1.5: A Unified Discrete Audio Tokenizer with Adaptive Frame Rate for High-Fidelity, Multitask Audio Generation

Paper GitHub Hugging Face ModelScope

🎯 Quick Start: Run Inference in 3 Minutes

1. Installation

  1. Install the dependencies listed in requirements.txt with pip
  2. Download the pretrained weights from Hugging Face 🤗 (QuarkAudio/HCodec-1.5-adaptive) and save them to ./checkpoints/
  3. Confirm that ckpt_path in conf/config_adaptive_v3.yaml points to the downloaded checkpoint
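
Steps 2 and 3 of the list above can also be scripted. The following is a minimal sketch, not part of the repository: it assumes the huggingface_hub package is installed for the download, and it assumes every ckpt_path entry in the config is a plain `ckpt_path: <path>` line. The helper names (download_weights, find_ckpt_paths, missing_ckpt_paths) are hypothetical.

```python
import re
from pathlib import Path

REPO_ID = "QuarkAudio/HCodec-1.5-adaptive"  # repository named in step 2


def download_weights(local_dir: str = "./checkpoints") -> str:
    """Download the model repository into local_dir (needs network access)."""
    # Assumption: huggingface_hub is installed. It is imported lazily so the
    # checkpoint check below also works without it.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


# Matches lines such as "  ckpt_path: ./checkpoints/model.ckpt  # comment".
_CKPT_LINE = re.compile(
    r"^[ \t]*ckpt_path[ \t]*:[ \t]*(.+?)(?:[ \t]+#.*)?[ \t]*$",
    re.MULTILINE,
)


def find_ckpt_paths(config_text: str) -> list[str]:
    """Return the value of every ckpt_path key found in the YAML text."""
    return [m.group(1).strip("'\"") for m in _CKPT_LINE.finditer(config_text)]


def missing_ckpt_paths(config_file: str) -> list[str]:
    """Return the ckpt_path values that do not exist on disk."""
    text = Path(config_file).read_text(encoding="utf-8")
    return [p for p in find_ckpt_paths(text) if not Path(p).exists()]
```

Called from the repository root, missing_ckpt_paths("conf/config_adaptive_v3.yaml") should return an empty list once the weights are in place; any path it returns needs to be corrected in the config. Relative paths are resolved against the current working directory.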

2. Clone Repository

git clone https://github.com/alibaba/unified-audio.git
cd unified-audio/QuarkAudio-HCodec

3. Create a Conda environment and install dependencies

conda create -n unise python=3.10
conda activate unise
pip install -r requirements.txt

4. Tokenizer

#!/bin/bash

python audio_tokenizer.py

5. Optional configuration

  • Customize the adaptive frame rate options used at test time
  # hyperparameters in conf/config_adaptive_v3.yaml

  training: false # keep false when testing
  use_similarity_alignment: true
  use_dynamic_similarity_threshold: false
  infer_using_dynamic_threshold: true # takes effect only when manual_threshold is null
  similarity_threshold: 0.7
  similarity_threshold_lower: 0.7
  similarity_threshold_upper: 1.0 # bounds of the dynamic threshold when infer_using_dynamic_threshold is true
  max_tokens_per_group: 8
  manual_threshold: 0.6 # set to a fixed value to evaluate a specific threshold
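
To make the interaction of these options easier to follow, here is a rough sketch of how they might combine at inference time. It is an illustration inferred from the comments in the configuration above, not the repository's implementation. Three things in it are assumptions: the precedence of manual_threshold over the dynamic threshold, the clamping of a dynamic value to the lower and upper bounds, and the greedy merging of consecutive frames by cosine similarity. The function names are hypothetical.

```python
import math

# Values copied from the configuration listing above.
CONFIG = {
    "infer_using_dynamic_threshold": True,
    "similarity_threshold": 0.7,
    "similarity_threshold_lower": 0.7,
    "similarity_threshold_upper": 1.0,
    "max_tokens_per_group": 8,
    "manual_threshold": 0.6,
}


def resolve_threshold(cfg, dynamic_value=None):
    """Pick the similarity threshold (assumed precedence, see lead-in)."""
    # A fixed manual threshold, when set, overrides everything else.
    if cfg.get("manual_threshold") is not None:
        return cfg["manual_threshold"]
    # Otherwise a dynamic value is clamped to the configured interval.
    if cfg.get("infer_using_dynamic_threshold") and dynamic_value is not None:
        lower = cfg["similarity_threshold_lower"]
        upper = cfg["similarity_threshold_upper"]
        return min(max(dynamic_value, lower), upper)
    # Fall back to the static threshold.
    return cfg["similarity_threshold"]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_frames(frames, threshold, max_per_group):
    """Greedily merge each frame into the current group while it is similar
    enough to the previous frame and the group is not yet full."""
    groups = []
    for i, frame in enumerate(frames):
        if (
            groups
            and len(groups[-1]) < max_per_group
            and cosine(frames[i - 1], frame) >= threshold
        ):
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups
```

Under these assumptions, a lower threshold merges more frames into each group and so yields fewer tokens per second, while max_tokens_per_group caps how many frames a single group can absorb.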

😘 Acknowledgement

We would like to thank the following projects for their great work:

  • The adaptive mechanism implementation is based on FlexiCodec and VARSTok.
  • The Transformer implementation is based on Mimi Codec.