Speech Technology
An example of an issue that gets copied from repo to repo:
https://github.com/jaywalnut310/vits/issues/11
In VITS we predict a float duration and then convert it to attention steps, so the floats have to be rounded. VITS applies ceil, which makes the result longer than the original. As a result, you need to scale the durations back down to match the original length (usually the scale is 0.9).
https://github.com/jaywalnut310/vits/blob/main/models.py#L511
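A toy illustration of the rounding effect, in plain Python rather than the repo's torch code. The duration values below are made up for illustration and the `frames` helper is hypothetical; only the ceil rounding and the 0.9 scale come from the post above.

```python
import math

# Hypothetical predicted durations in frames (illustrative values only).
pred = [5.3, 7.1, 4.6, 6.4, 8.2]

def frames(durations, length_scale=1.0, rounding=math.ceil):
    """Scale each float duration, then round it to an integer frame count."""
    return [rounding(d * length_scale) for d in durations]

rounded = frames(pred, rounding=round)          # nearest integer, scale 1.0
ceiled = frames(pred)                           # ceil, scale 1.0
ceiled_scaled = frames(pred, length_scale=0.9)  # ceil, scale 0.9

print(sum(rounded), sum(ceiled), sum(ceiled_scaled))
```

On these numbers ceil adds several frames to the total, and the 0.9 scale brings it back to the total that plain rounding gives. With other durations the compensation will only be approximate.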
In GlowTTS there is an extra clamp:
https://github.com/coqui-ai/TTS/blob/main/TTS/tts/models/glow_tts.py#L351
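A minimal sketch of what such a clamp does, assuming it is a lower bound of one frame applied after ceil. This is plain Python with made-up values and a hypothetical helper name, not the Coqui code: ceil alone leaves a zero or slightly negative predicted duration at zero frames, while the lower bound keeps every token at least one frame long.

```python
import math

def ceil_with_min(durations, min_frames=1):
    """Ceil each duration, then enforce a lower bound on the frame count."""
    return [max(math.ceil(d), min_frames) for d in durations]

pred = [2.3, 0.0, -0.2, 1.6]  # illustrative values, not model output

plain = [math.ceil(d) for d in pred]  # zero-length tokens are possible
clamped = ceil_with_min(pred)         # every token gets at least one frame

print(plain, clamped)
```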
This thing is copied from repo to repo. A fun thing happens in Matcha, where we multiply by the length factor after we have applied ceil:
https://github.com/shivammehta25/Matcha-TTS/blob/main/matcha/models/matcha_tts.py#L122
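The difference in ordering shows up on toy numbers. This is a plain-Python sketch with made-up durations and a made-up length factor, not the Matcha code: scaling before ceil always gives integer frame counts, while scaling after ceil can give fractional ones.

```python
import math

pred = [2.3, 4.1, 1.6]  # illustrative float durations
length_scale = 0.9      # illustrative length factor

# Scale first, then ceil: the results are integer frame counts.
scale_then_ceil = [math.ceil(d * length_scale) for d in pred]

# Ceil first, then scale: the results may no longer be integers.
ceil_then_scale = [math.ceil(d) * length_scale for d in pred]

print(scale_then_ceil, ceil_then_scale)
```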
GitHub
About ceiling for calculating phoneme duration · Issue #11 · jaywalnut310/vits
Is there any reason to use torch.ceil instead of torch.round or other algorithms for calculating phoneme duration? Thank you.