Was this model also trained in a two-stage fashion like the Parakeet model?
I am curious how the model gained proper punctuation and capitalization capabilities, given that many of the public and classic datasets (LibriSpeech, Fisher) are not punctuated or capitalized and do not have numbers transcribed as numerals. Were those speech datasets re-transcribed to ensure proper punctuation and capitalization? Or did the stage-2 training on properly punctuated and capitalized data account for this?