
Provide examples to use alternative Text-to-Speech services #26

Open
Fu-u0718 opened this issue Apr 9, 2024 · 6 comments
Comments


Fu-u0718 commented Apr 9, 2024

I'd like to have conversations with the avatar not only in Japanese but also in English, but I learned that VOICEVOX, which the code uses, cannot speak English. Have you built any programs that use Google's or Azure's Text-to-Speech, for example?

@uezo uezo changed the title Text-to-Speechについて Provide examples to use alternative Text-to-Speech services Apr 13, 2024

uezo commented Apr 13, 2024

@Fu-u0718 says:
I'm interested in having conversations not only in Japanese using an avatar, but also in English. However, I found out that the VOICEVOX software used in the code does not support English. Have you created any programs that utilize Text-to-Speech services like Google or Azure for this purpose?


uezo commented Apr 13, 2024

Hi @Fu-u0718,
You can make a custom SpeechController based on any TTS service you like:

  1. Make a SpeechController that implements aiavatar.speech.SpeechController
  2. Set an instance of your custom SpeechController to AvatarController

Here is an example for Azure:

  1. Make AzureSpeechController

```python
import aiohttp
import asyncio
import io
from logging import getLogger, NullHandler
import traceback
import wave
import numpy
import sounddevice
from . import SpeechController  # base class from aiavatar.speech


class VoiceClip:
    def __init__(self, text: str):
        self.text = text
        self.download_task = None
        self.audio_clip = None


class AzureSpeechController(SpeechController):
    def __init__(self, api_key: str, region: str, speaker_name: str="ja-JP-AoiNeural", speaker_gender: str="Female", lang: str="ja-JP", device_index: int=-1, playback_margin: float=0.1):
        self.logger = getLogger(__name__)
        self.logger.addHandler(NullHandler())

        self.api_key = api_key
        self.region = region
        self.speaker_name = speaker_name
        self.speaker_gender = speaker_gender
        self.lang = lang

        self.device_index = device_index
        self.playback_margin = playback_margin
        self.voice_clips = {}
        self._is_speaking = False

    async def download(self, voice: VoiceClip):
        url = f"https://{self.region}.tts.speech.microsoft.com/cognitiveservices/v1"
        headers = {
            "X-Microsoft-OutputFormat": "riff-16khz-16bit-mono-pcm",
            "Content-Type": "application/ssml+xml",
            "Ocp-Apim-Subscription-Key": self.api_key
        }
        ssml_text = f"<speak version='1.0' xml:lang='{self.lang}'><voice xml:lang='{self.lang}' xml:gender='{self.speaker_gender}' name='{self.speaker_name}'>{voice.text}</voice></speak>"
        data = ssml_text.encode("utf-8")

        async with aiohttp.ClientSession() as session:
            async with session.post(url, headers=headers, data=data) as response:
                if response.status == 200:
                    voice.audio_clip = await response.read()
                else:
                    self.logger.error(f"Failed to download voice: {response.status} {await response.text()}")

    def prefetch(self, text: str):
        # Return the cached clip if it exists; otherwise start downloading in the background
        v = self.voice_clips.get(text)
        if v:
            return v

        v = VoiceClip(text)
        v.download_task = asyncio.create_task(self.download(v))
        self.voice_clips[text] = v
        return v

    async def speak(self, text: str):
        voice = self.prefetch(text)

        if not voice.audio_clip:
            await voice.download_task

        if not voice.audio_clip:
            self.logger.error("No audio clip to play (download may have failed)")
            return

        with wave.open(io.BytesIO(voice.audio_clip), "rb") as f:
            try:
                self._is_speaking = True
                data = numpy.frombuffer(
                    f.readframes(f.getnframes()),
                    dtype=numpy.int16
                )
                framerate = f.getframerate()
                sounddevice.play(data, framerate, device=self.device_index, blocking=False)
                # Wait for the estimated playback duration plus a small margin
                await asyncio.sleep(len(data) / framerate + self.playback_margin)

            except Exception as ex:
                self.logger.error(f"Error at speaking: {str(ex)}\n{traceback.format_exc()}")

            finally:
                self._is_speaking = False

    def is_speaking(self) -> bool:
        return self._is_speaking
```
  2. Set the instance of your custom SpeechController to AvatarController

```python
app.avatar_controller.speech_controller = AzureSpeechController(
    AZURE_SUBSCRIPTION_KEY, AZURE_REGION,
    speaker_name="en-US-AvaNeural",
    speaker_gender="Female",
    lang="en-US",
    device_index=2    # Set the output device number on your PC
)
```
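One caveat worth noting (not part of the original answer): the Azure example interpolates `voice.text` into the SSML body as-is, so text containing characters like `&` or `<` would produce invalid XML and a request failure. A small hypothetical helper using only the standard library can escape the text first; `build_ssml` is an assumed name, not part of aiavatar:

```python
from xml.sax.saxutils import escape


def build_ssml(text: str, lang: str, gender: str, name: str) -> str:
    """Build an Azure TTS SSML document with the spoken text XML-escaped."""
    return (
        f"<speak version='1.0' xml:lang='{lang}'>"
        f"<voice xml:lang='{lang}' xml:gender='{gender}' name='{name}'>"
        f"{escape(text)}</voice></speak>"  # escape() handles &, <, >
    )
```

In `download`, `ssml_text` could then be built as `build_ssml(voice.text, self.lang, self.speaker_gender, self.speaker_name)`.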

However, I've found that AIAvatar has an issue handling English responses from ChatGPT. I will fix it soon.
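Since the original question also asked about Google, a similar controller could target Google Cloud Text-to-Speech (REST API `v1 text:synthesize`), which returns JSON with base64-encoded audio in `audioContent` rather than raw WAV bytes. This is a hedged sketch of just the request/response shaping, not code from aiavatar; the function names are assumptions:

```python
import base64
import json

# Google Cloud TTS synthesize endpoint (v1 REST API)
GOOGLE_TTS_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_google_tts_request(text: str, lang: str = "en-US",
                             voice_name: str = "en-US-Neural2-A") -> str:
    """Build the JSON body for a Google Cloud TTS synthesize request."""
    return json.dumps({
        "input": {"text": text},
        "voice": {"languageCode": lang, "name": voice_name},
        "audioConfig": {"audioEncoding": "LINEAR16"},  # LINEAR16 = PCM WAV
    })


def decode_google_tts_response(response_json: str) -> bytes:
    """Extract raw audio bytes from the base64 'audioContent' field."""
    return base64.b64decode(json.loads(response_json)["audioContent"])
```

A `download` method for a `GoogleSpeechController` would POST the built body (with an API key or OAuth credentials) and assign the decoded bytes to `voice.audio_clip`; the rest of the Azure example's playback logic could stay the same.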


uezo commented Apr 13, 2024

I've fixed it👍
#32


Fu-u0718 commented Apr 13, 2024

Thank you! I learned a lot. I'd also like to enjoy conversations in English. Thank you for taking the time out of your busy schedule to respond!


mosu7 commented Jul 15, 2024

Hi, I tried this with the OpenAI speech service, but it got stuck at [INFO] 2024-07-15 17:28:44,009 : Listening... (OpenAIWakewordListener)


uezo commented Jul 15, 2024

Hi @mosu7,
Thank you for your post, but this issue is about Text-to-Speech, not the wake word listener.
Please open another issue if you want to discuss it.
