This is a feature that I’d love to see as well. My understanding (limited at present, but I’m learning) is that one could set up a local server with Node.js and, from the user code block in the AI chat, send the AI response text (along with the speaker’s name) to that process. Because that process runs locally on your machine, it could potentially invoke a local instance of XTTS and speak the text.
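To make the idea concrete, here is a minimal sketch of such a local server, using only Node's built-in `http` module. The `/speak` path, the `{ speaker, text }` JSON shape, and the `onSpeech` callback are all assumptions made up for illustration; the callback is where a local TTS engine such as XTTS would be invoked, and that part is not shown because it depends on how XTTS is installed.

```javascript
const http = require("node:http");

// Validate the JSON payload; returns { speaker, text } or throws.
// The { speaker, text } shape is an assumption, not an existing API.
function parseSpeechRequest(rawBody) {
  const data = JSON.parse(rawBody);
  if (
    typeof data.speaker !== "string" ||
    typeof data.text !== "string" ||
    data.text.trim() === ""
  ) {
    throw new Error("expected { speaker: string, text: non-empty string }");
  }
  return { speaker: data.speaker, text: data.text };
}

// onSpeech is a hypothetical callback: this is where a local TTS
// engine would be handed the speaker name and the text to speak.
function createSpeechServer(onSpeech) {
  return http.createServer((req, res) => {
    // The chat page is a different origin, so allow cross-origin calls.
    res.setHeader("Access-Control-Allow-Origin", "*");
    res.setHeader("Access-Control-Allow-Headers", "Content-Type");
    if (req.method === "OPTIONS") {
      res.writeHead(204).end();
      return;
    }
    if (req.method !== "POST" || req.url !== "/speak") {
      res.writeHead(404).end();
      return;
    }
    let body = "";
    req.setEncoding("utf8");
    req.on("data", (chunk) => {
      body += chunk;
    });
    req.on("end", () => {
      try {
        onSpeech(parseSpeechRequest(body));
        res.writeHead(202).end(); // accepted; speaking happens separately
      } catch (err) {
        res.writeHead(400, { "Content-Type": "text/plain" }).end(String(err.message));
      }
    });
  });
}
```

To try it, something like `createSpeechServer(({ speaker, text }) => console.log(speaker, text)).listen(5005, "127.0.0.1")` would start it on an arbitrarily chosen port, and the chat-side code could then `fetch("http://127.0.0.1:5005/speak", { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify({ speaker, text }) })`. Whether the chat page's browser sandbox allows that request is something that would need to be checked.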
This is conjecture on my part, though I’ve been making some progress integrating per-message JS in the chat. Right now, for TTS, I’d like a way to separate narrator text, action text, and speaker text so that the TTS doesn’t simply read out the entire message in one voice. For example:
"The sheriff walked slowly into the room. ‘Everyone freeze! I’m looking for Bad Bart’ "
I’d like to have the AI somehow separate this text so that a narrator voice speaks the narrator part of the message and the character, with a different voice, speaks the character’s part. This would mean invoking the TTS engine twice for one message.
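Short of having the AI do the separation, a rough first pass could be done in plain JS by treating quoted spans as speech and everything else as narration. The sketch below is only a heuristic under that assumption: the function names, the `kind` labels, and the `"narrator"` voice name are invented for illustration, action text is not handled, and a ’ followed by a letter is assumed to be an apostrophe (as in I’m) rather than a closing quote.

```javascript
// Quoted spans: "..." or “...” or ‘...’.
// Inside ‘...’, a ’ followed by a letter is treated as an apostrophe.
const QUOTED = /"([^"]+)"|“([^”]+)”|‘((?:[^’]|’(?=\p{L}))+)’/gu;

// Split a message into ordered segments of narration and speech.
function splitMessage(text) {
  const segments = [];
  let cursor = 0;
  for (const match of text.matchAll(QUOTED)) {
    // Anything between the previous quote and this one is narration.
    const before = text.slice(cursor, match.index).trim();
    if (before) segments.push({ kind: "narration", text: before });
    const spoken = (match[1] ?? match[2] ?? match[3]).trim();
    if (spoken) segments.push({ kind: "speech", text: spoken });
    cursor = match.index + match[0].length;
  }
  const rest = text.slice(cursor).trim();
  if (rest) segments.push({ kind: "narration", text: rest });
  return segments;
}

// Pair each segment with a voice: narration gets the (hypothetical)
// "narrator" voice, speech gets the speaker's own voice name.
function assignVoices(segments, speaker) {
  return segments.map((segment) => ({
    voice: segment.kind === "speech" ? speaker : "narrator",
    text: segment.text,
  }));
}
```

For the sheriff message above, `assignVoices(splitMessage(message), "Sheriff")` yields two entries, one for the narrator and one for the Sheriff, so the TTS engine would be called once per entry. The heuristic will misfire on cases like a plural possessive (dogs’ bones) inside single quotes, which is one reason having the AI tag the parts itself may still be the better route.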
It will probably take me months to get to anything workable, but luckily the technology will improve over that time as well, perhaps making it easier.