Implement text-to-speech support on Android, iOS, HTML5, Linux, macOS, and Windows. #56192

bruvzg · 2021-12-23T12:09:35Z

Improved version of #21478 and https://github.com/bruvzg/godot_tts/tree/master.

Thanks to https://github.com/hpvb/dynload-wrapper, the LGPL dependency on Linux is loaded dynamically (the same way other LGPL libs are linked: asoundlib, pulseaudio and udev). On other platforms, TTS is part of the core system API.

New functions:

bool DisplayServer.tts_is_speaking()
bool DisplayServer.tts_is_paused()
Array DisplayServer.tts_get_voices() - voice/language enumeration, returns Array of Dictionaries with following key-value pairs "name":String, "id":String, "language":String.
PackedStringArray DisplayServer.tts_get_voices_for_language(language) - returns array of voice IDs for the language.
void DisplayServer.tts_speak(text, voice_id, volume, pitch, rate, utterance_id, interrupt) - asynchronous, adds utterance with the specified parameters to the queue.
void DisplayServer.tts_stop()
void DisplayServer.tts_pause()
void DisplayServer.tts_resume()
void DisplayServer.tts_set_utterance_callback(event, callable) - adds a callback for the specific utterance event:
- started, ended, canceled/failed: the callback function takes one int parameter, the utterance_id passed to tts_speak.
- boundary: the callback function takes two int parameters, the char_index and the utterance_id passed to tts_speak.
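
The queue and callback behaviour listed above can be modelled with a small toy class. This is only an illustrative sketch in Python, not Godot code: the class name, the `process_next` helper, the whitespace-based boundary detection, and the assumption that `interrupt` cancels everything still queued are all hypothetical.

```python
from collections import deque


class UtteranceQueueModel:
    """Toy stand-in for the queue behaviour described above; not Godot code."""

    def __init__(self):
        self.queue = deque()
        self.callbacks = {}

    def set_utterance_callback(self, event, func):
        # One callback per event name: "started", "ended", "canceled", "boundary".
        self.callbacks[event] = func

    def speak(self, text, utterance_id=0, interrupt=False):
        if interrupt:
            # Assumption: interrupting cancels every utterance still queued.
            while self.queue:
                _, cancelled_id = self.queue.popleft()
                self._emit("canceled", cancelled_id)
        self.queue.append((text, utterance_id))

    def process_next(self):
        """Pretend to speak one queued utterance, firing its events in order."""
        if not self.queue:
            return
        text, utterance_id = self.queue.popleft()
        self._emit("started", utterance_id)
        # Report a boundary at the start of every space-separated word.
        position = 0
        for word in text.split(" "):
            if word:
                self._emit("boundary", position, utterance_id)
            position += len(word) + 1
        self._emit("ended", utterance_id)

    def _emit(self, event, *args):
        func = self.callbacks.get(event)
        if func is not None:
            func(*args)
```

Note the two callback shapes: started, ended and canceled receive only the utterance ID, while boundary receives the character index followed by the utterance ID.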

Also added the ICU-backed TextServer.string_get_word_breaks(text, language), which is required to add index marks at the word breaks.

Platforms:

Android, implemented and tested (on Android 11 / Poco F3).
Linux (speech-dispatcher), implemented and tested (on x86_64 Fedora 35, arm64 Ubuntu 21.10).
Windows (SAPI), implemented and tested (on Windows 11).
macOS (AVSpeech on 10.14+ / NSSpeech on older versions), implemented and tested (on M1 macOS and x86_64 via Rosetta).
iOS (AVSpeech), same code as macOS, tested on M1 mac.
Web, implemented and tested (on Firefox / Windows 11).

Notes:

All methods and callbacks are implemented on all platforms, but the Web version probably won't have boundary callbacks on Linux, since they do not seem to be implemented in Chromium and Firefox.

Demo:

tts2_test.zip
Demo last updated: 29. Dec. 2021.

Related: #14011, #20683, #20254

Partially implements godotengine/godot-proposals#983

Bugsquad edit: This closes godotengine/godot-proposals#3584.

doc/classes/DisplayServer.xml

m4gr3d · 2022-01-13T19:30:05Z

platform/android/display_server_android.cpp

+	if (ids.has(p_id)) {
+		int pos = 0;
+		if ((TTSUtteranceEvent)p_event == DisplayServer::TTS_UTTERANCE_BOUNDARY) {
+			// Convert position from UTF-8 to UTF-32.


Why is this needed?

Also, it may be better to add this utility method to CharString (if it's not available already) in order to prevent unnecessary code duplication.

This should be UTF-16, not UTF-8, but the idea is the same: a Java string and Char16String encode some characters (those with codes > 0xFFFF) as two code units, while a Godot String stores each as one. The conversion compensates for the resulting difference in offsets. I'm not sure it's worth adding a utility method to CharString / Char16String: the UTF-8 version is used once (for JavaScript) and the UTF-16 version twice (for Android and Windows), and it's unlikely that more uses will be added.

I think the fact that it's used more than once warrants an addition :).

I'm mostly trying to avoid subtle bugs creeping in if one side is updated and the other is not. With common methods, it's easy to see who's using them, and changes propagate automatically.
With custom logic, you lose that advantage.
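
The offset compensation discussed in this thread can be sketched as a single shared helper. The sketch below is in Python purely for illustration (the PR implements the conversion in C++); the function name is hypothetical. Python strings, like Godot's String, are indexed by code point, so the helper maps an offset counted in UTF-16 code units to an offset counted in code points.

```python
def utf16_offset_to_codepoint_offset(text, utf16_offset):
    """Map an offset in UTF-16 code units to an offset in code points.

    Characters above 0xFFFF occupy two UTF-16 code units (a surrogate pair)
    but only one code point, which is where the two offsets drift apart.
    """
    units = 0
    for index, ch in enumerate(text):
        if units >= utf16_offset:
            return index
        units += 2 if ord(ch) > 0xFFFF else 1
    # The offset points at (or past) the end of the string.
    return len(text)


sample = "a\U0001F600b"  # 'a', one emoji above 0xFFFF, 'b'
print(utf16_offset_to_codepoint_offset(sample, 3))  # 'b' starts at UTF-16 unit 3
```

Keeping one such helper in a common place is the point raised above: every caller gets the same surrogate handling, and a fix applies everywhere at once.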

platform/android/java/lib/src/org/godotengine/godot/Godot.java

platform/android/java/lib/src/org/godotengine/godot/GodotLib.java

m4gr3d

Most of the Android logic looks sound; the majority of my comments are around styling and organization issues, so let me know if you have any questions.

m4gr3d · 2022-01-14T17:07:17Z

platform/android/java/lib/src/org/godotengine/godot/Godot.java

@@ -166,6 +167,7 @@ private void setButtonPausedState(boolean paused) {

 	public static GodotIO io;
 	public static GodotNetUtils netUtils;
+	public static GodotTTS tts;


This can be made private. I believe the jni logic should still work as expected with the scope change, but let me know if it doesn't.

platform/android/java/lib/src/org/godotengine/godot/tts/GodotTTS.java

m4gr3d · 2022-01-14T17:10:21Z

platform/android/java/lib/src/org/godotengine/godot/tts/GodotTTS.java

+import java.util.LinkedList;
+import java.util.Set;
+
+public class GodotTTS {


Can you add a javadoc describing the role/use of the class?

From this PR's implementation, it looks like this class is mostly used on the rendering thread. I didn't see anything in the documentation that would suggest otherwise, but were you able to validate that calling the android tts apis on the render thread works as expected?

If you expect this class to be called both from the render thread and from the main thread, then some of its fields should be made thread-safe. Let me know if that's the case and I can advise on the best manner to do so.

platform/android/java/lib/src/org/godotengine/godot/tts/GodotTTS.java

platform/android/java/lib/src/org/godotengine/godot/tts/GodotUtteranceListener.java

m4gr3d · 2022-01-14T17:55:50Z

platform/android/java/lib/src/org/godotengine/godot/tts/GodotUtteranceListener.java

+	GodotTTS tts = null;
+
+	public void updateTTS() {
+		if (!tts.tts_speaking && tts.tts_queue.size() > 0) {


It's recommended to use methods instead of fields, as it simplifies any future refactoring (e.g. the method signature remains the same while the internal logic and fields can change).

That said, given these classes are within the same package and interdependent, it's not a big issue here, so I leave the decision up to you.

platform/android/java/lib/src/org/godotengine/godot/tts/GodotUtteranceListener.java

akien-mga · 2022-04-28T10:17:51Z

Since we seem to make most "advanced" features opt out at compile time on Linux, this could be done here too:

diff --git a/platform/linuxbsd/SCsub b/platform/linuxbsd/SCsub
index 479659dfa4..d7ee9821b6 100644
--- a/platform/linuxbsd/SCsub
+++ b/platform/linuxbsd/SCsub
@@ -8,9 +8,9 @@ import platform_linuxbsd_builders
 common_linuxbsd = [
     "crash_handler_linuxbsd.cpp",
     "os_linuxbsd.cpp",
-    "tts_linux.cpp",
     "joypad_linux.cpp",
     "freedesktop_screensaver.cpp",
+    "tts_linux.cpp",
 ]
 
 if "x11" in env and env["x11"]:
@@ -27,7 +27,8 @@ if "vulkan" in env and env["vulkan"]:
 if "udev" in env and env["udev"]:
     common_linuxbsd.append("libudev-so_wrap.c")
 
-common_linuxbsd.append("speechd-so_wrap.c")
+if "speechd" in env and env["speechd"]:
+    common_linuxbsd.append("speechd-so_wrap.c")
 
 prog = env.add_program("#bin/godot", ["godot_linuxbsd.cpp"] + common_linuxbsd)
 
diff --git a/platform/linuxbsd/detect.py b/platform/linuxbsd/detect.py
index 2fba58fc53..1ebfd941d5 100644
--- a/platform/linuxbsd/detect.py
+++ b/platform/linuxbsd/detect.py
@@ -76,6 +76,7 @@ def get_opts():
         BoolVariable("pulseaudio", "Detect and use PulseAudio", True),
         BoolVariable("dbus", "Detect and use D-Bus to handle screensaver", True),
         BoolVariable("udev", "Use udev for gamepad connection callbacks", True),
+        BoolVariable("speechd", "Detect and use Speech Dispatcher for Text-to-Speech support", True),
         BoolVariable("x11", "Enable X11 display", True),
         BoolVariable("debug_symbols", "Add debugging symbols to release/release_debug builds", True),
         BoolVariable("separate_debug_symbols", "Create a separate file containing debugging symbols", False),
@@ -337,6 +338,13 @@ def configure(env):
         else:
             print("Warning: D-Bus development libraries not found. Disabling screensaver prevention.")
 
+    if env["speechd"]:
+        if os.system("pkg-config --exists speech-dispatcher") == 0:  # 0 means found
+            env.Append(CPPDEFINES=["SPEECHD_ENABLED"])
+            env.ParseConfig("pkg-config speech-dispatcher --cflags")  # Only cflags, we dlopen the library.
+        else:
+            print("Warning: Speech Dispatcher development libraries not found. Disabling Text-to-Speech support.")
+
     if platform.system() == "Linux":
         env.Append(CPPDEFINES=["JOYDEV_ENABLED"])
         if env["udev"]:
diff --git a/platform/linuxbsd/display_server_x11.cpp b/platform/linuxbsd/display_server_x11.cpp
index 44f96fd69e..59056d8be3 100644
--- a/platform/linuxbsd/display_server_x11.cpp
+++ b/platform/linuxbsd/display_server_x11.cpp
@@ -308,6 +308,8 @@ void DisplayServerX11::_flush_mouse_motion() {
 	xi.relative_motion.y = 0;
 }
 
+#ifdef SPEECHD_ENABLED
+
 bool DisplayServerX11::tts_is_speaking() const {
 	ERR_FAIL_COND_V(!tts, false);
 	return tts->is_speaking();
@@ -343,6 +345,8 @@ void DisplayServerX11::tts_stop() {
 	tts->stop();
 }
 
+#endif
+
 void DisplayServerX11::mouse_set_mode(MouseMode p_mode) {
 	_THREAD_SAFE_METHOD_
 
@@ -4669,8 +4673,10 @@ DisplayServerX11::DisplayServerX11(const String &p_rendering_driver, WindowMode
 	xdnd_finished = XInternAtom(x11_display, "XdndFinished", False);
 	xdnd_selection = XInternAtom(x11_display, "XdndSelection", False);
 
+#ifdef SPEECHD_ENABLED
 	// Init TTS
 	tts = memnew(TTS_Linux);
+#endif
 
 	//!!!!!!!!!!!!!!!!!!!!!!!!!!
 	//TODO - do Vulkan and OpenGL support checks, driver selection and fallback
@@ -5024,7 +5030,9 @@ DisplayServerX11::~DisplayServerX11() {
 		memfree(xmbstring);
 	}
 
+#ifdef SPEECHD_ENABLED
 	memdelete(tts);
+#endif
 
 #ifdef DBUS_ENABLED
 	memdelete(screensaver);
diff --git a/platform/linuxbsd/display_server_x11.h b/platform/linuxbsd/display_server_x11.h
index 9c77fe4189..10be853604 100644
--- a/platform/linuxbsd/display_server_x11.h
+++ b/platform/linuxbsd/display_server_x11.h
@@ -45,7 +45,6 @@
 #include "servers/audio_server.h"
 #include "servers/rendering/renderer_compositor.h"
 #include "servers/rendering_server.h"
-#include "tts_linux.h"
 
 #if defined(GLES3_ENABLED)
 #include "gl_manager_x11.h"
@@ -60,6 +59,10 @@
 #include "freedesktop_screensaver.h"
 #endif
 
+#if defined(SPEECHD_ENABLED)
+#include "tts_linux.h"
+#endif
+
 #include <X11/Xcursor/Xcursor.h>
 #include <X11/Xlib.h>
 #include <X11/extensions/XInput2.h>
@@ -113,7 +116,9 @@ class DisplayServerX11 : public DisplayServer {
 	bool keep_screen_on = false;
 #endif
 
+#if defined(SPEECHD_ENABLED)
 	TTS_Linux *tts = nullptr;
+#endif
 
 	struct WindowData {
 		Window x11_window;
@@ -301,6 +306,7 @@ public:
 	virtual bool has_feature(Feature p_feature) const override;
 	virtual String get_name() const override;
 
+#if defined(SPEECHD_ENABLED)
 	virtual bool tts_is_speaking() const override;
 	virtual bool tts_is_paused() const override;
 	virtual Array tts_get_voices() const override;
@@ -309,6 +315,7 @@ public:
 	virtual void tts_pause() override;
 	virtual void tts_resume() override;
 	virtual void tts_stop() override;
+#endif
 
 	virtual void mouse_set_mode(MouseMode p_mode) override;
 	virtual MouseMode mouse_get_mode() const override;
diff --git a/platform/linuxbsd/speechd-so_wrap.c b/platform/linuxbsd/speechd-so_wrap.c
index 34a2418033..1a3f8e5436 100644
--- a/platform/linuxbsd/speechd-so_wrap.c
+++ b/platform/linuxbsd/speechd-so_wrap.c
@@ -1,7 +1,7 @@
 // This file is generated. Do not edit!
 // see https://github.com/hpvb/dynload-wrapper for details
-// generated by ./dynload-wrapper/generate-wrapper.py 0.3 on 2021-11-05 07:08:15
-// flags: ./dynload-wrapper/generate-wrapper.py --sys-include <speech-dispatcher/libspeechd.h> --include /usr/include/speech-dispatcher/libspeechd.h --soname libspeechd.so.2 --init-name speechd --output-header speechd-so_wrap.h --output-implementation speechd-so_wrap.c
+// generated by ./generate-wrapper.py 0.3 on 2022-04-28 11:58:03
+// flags: ./generate-wrapper.py --sys-include <libspeechd.h> --include /usr/include/speech-dispatcher/libspeechd.h --soname libspeechd.so.2 --init-name speechd --output-header speechd-so_wrap.h --output-implementation speechd-so_wrap.c --omit-prefix spd_get_client_list
 //
 #include <stdint.h>
 
@@ -71,7 +71,6 @@
 #define spd_set_output_module spd_set_output_module_dylibloader_orig_speechd
 #define spd_set_output_module_all spd_set_output_module_all_dylibloader_orig_speechd
 #define spd_set_output_module_uid spd_set_output_module_uid_dylibloader_orig_speechd
-#define spd_get_client_list spd_get_client_list_dylibloader_orig_speechd
 #define spd_get_message_list_fd spd_get_message_list_fd_dylibloader_orig_speechd
 #define spd_list_modules spd_list_modules_dylibloader_orig_speechd
 #define free_spd_modules free_spd_modules_dylibloader_orig_speechd
@@ -85,7 +84,7 @@
 #define spd_execute_command_wo_mutex spd_execute_command_wo_mutex_dylibloader_orig_speechd
 #define spd_send_data spd_send_data_dylibloader_orig_speechd
 #define spd_send_data_wo_mutex spd_send_data_wo_mutex_dylibloader_orig_speechd
-#include <speech-dispatcher/libspeechd.h>
+#include <libspeechd.h>
 #undef SPDConnectionAddress__free
 #undef spd_get_default_address
 #undef spd_open
@@ -152,7 +151,6 @@
 #undef spd_set_output_module
 #undef spd_set_output_module_all
 #undef spd_set_output_module_uid
-#undef spd_get_client_list
 #undef spd_get_message_list_fd
 #undef spd_list_modules
 #undef free_spd_modules
@@ -171,7 +169,7 @@
 void (*SPDConnectionAddress__free_dylibloader_wrapper_speechd)( SPDConnectionAddress*);
 SPDConnectionAddress* (*spd_get_default_address_dylibloader_wrapper_speechd)( char**);
 SPDConnection* (*spd_open_dylibloader_wrapper_speechd)(const char*,const char*,const char*, SPDConnectionMode);
-SPDConnection* (*spd_open2_dylibloader_wrapper_speechd)(const char*,const char*,const char*, SPDConnectionMode, SPDConnectionAddress*, int, char**);
+SPDConnection* (*spd_open2_dylibloader_wrapper_speechd)(const char*,const char*,const char*, SPDConnectionMode,const SPDConnectionAddress*, int, char**);
 int (*spd_get_client_id_dylibloader_wrapper_speechd)( SPDConnection*);
 void (*spd_close_dylibloader_wrapper_speechd)( SPDConnection*);
 int (*spd_say_dylibloader_wrapper_speechd)( SPDConnection*, SPDPriority,const char*);
@@ -234,7 +232,6 @@ char* (*spd_get_language_dylibloader_wrapper_speechd)( SPDConnection*);
 int (*spd_set_output_module_dylibloader_wrapper_speechd)( SPDConnection*,const char*);
 int (*spd_set_output_module_all_dylibloader_wrapper_speechd)( SPDConnection*,const char*);
 int (*spd_set_output_module_uid_dylibloader_wrapper_speechd)( SPDConnection*,const char*, unsigned int);
-int (*spd_get_client_list_dylibloader_wrapper_speechd)( SPDConnection*, char**, int*, int*);
 int (*spd_get_message_list_fd_dylibloader_wrapper_speechd)( SPDConnection*, int, int*, char**);
 char** (*spd_list_modules_dylibloader_wrapper_speechd)( SPDConnection*);
 void (*free_spd_modules_dylibloader_wrapper_speechd)( char**);
@@ -242,10 +239,10 @@ char* (*spd_get_output_module_dylibloader_wrapper_speechd)( SPDConnection*);
 char** (*spd_list_voices_dylibloader_wrapper_speechd)( SPDConnection*);
 SPDVoice** (*spd_list_synthesis_voices_dylibloader_wrapper_speechd)( SPDConnection*);
 void (*free_spd_voices_dylibloader_wrapper_speechd)( SPDVoice**);
-char** (*spd_execute_command_with_list_reply_dylibloader_wrapper_speechd)( SPDConnection*, char*);
-int (*spd_execute_command_dylibloader_wrapper_speechd)( SPDConnection*, char*);
-int (*spd_execute_command_with_reply_dylibloader_wrapper_speechd)( SPDConnection*, char*, char**);
-int (*spd_execute_command_wo_mutex_dylibloader_wrapper_speechd)( SPDConnection*, char*);
+char** (*spd_execute_command_with_list_reply_dylibloader_wrapper_speechd)( SPDConnection*,const char*);
+int (*spd_execute_command_dylibloader_wrapper_speechd)( SPDConnection*,const char*);
+int (*spd_execute_command_with_reply_dylibloader_wrapper_speechd)( SPDConnection*,const char*, char**);
+int (*spd_execute_command_wo_mutex_dylibloader_wrapper_speechd)( SPDConnection*,const char*);
 char* (*spd_send_data_dylibloader_wrapper_speechd)( SPDConnection*,const char*, int);
 char* (*spd_send_data_wo_mutex_dylibloader_wrapper_speechd)( SPDConnection*,const char*, int);
 int initialize_speechd(int verbose) {
@@ -787,14 +784,6 @@ int initialize_speechd(int verbose) {
       fprintf(stderr, "%s\n", error);
     }
   }
-// spd_get_client_list
-  *(void **) (&spd_get_client_list_dylibloader_wrapper_speechd) = dlsym(handle, "spd_get_client_list");
-  if (verbose) {
-    error = dlerror();
-    if (error != NULL) {
-      fprintf(stderr, "%s\n", error);
-    }
-  }
 // spd_get_message_list_fd
   *(void **) (&spd_get_message_list_fd_dylibloader_wrapper_speechd) = dlsym(handle, "spd_get_message_list_fd");
   if (verbose) {
diff --git a/platform/linuxbsd/speechd-so_wrap.h b/platform/linuxbsd/speechd-so_wrap.h
index 043ba6c3c6..b8c59bf0d8 100644
--- a/platform/linuxbsd/speechd-so_wrap.h
+++ b/platform/linuxbsd/speechd-so_wrap.h
@@ -2,8 +2,8 @@
 #define DYLIBLOAD_WRAPPER_SPEECHD
 // This file is generated. Do not edit!
 // see https://github.com/hpvb/dynload-wrapper for details
-// generated by ./dynload-wrapper/generate-wrapper.py 0.3 on 2021-11-05 07:08:15
-// flags: ./dynload-wrapper/generate-wrapper.py --sys-include <speech-dispatcher/libspeechd.h> --include /usr/include/speech-dispatcher/libspeechd.h --soname libspeechd.so.2 --init-name speechd --output-header speechd-so_wrap.h --output-implementation speechd-so_wrap.c
+// generated by ./generate-wrapper.py 0.3 on 2022-04-28 11:58:03
+// flags: ./generate-wrapper.py --sys-include <libspeechd.h> --include /usr/include/speech-dispatcher/libspeechd.h --soname libspeechd.so.2 --init-name speechd --output-header speechd-so_wrap.h --output-implementation speechd-so_wrap.c --omit-prefix spd_get_client_list
 //
 #include <stdint.h>
 
@@ -73,7 +73,6 @@
 #define spd_set_output_module spd_set_output_module_dylibloader_orig_speechd
 #define spd_set_output_module_all spd_set_output_module_all_dylibloader_orig_speechd
 #define spd_set_output_module_uid spd_set_output_module_uid_dylibloader_orig_speechd
-#define spd_get_client_list spd_get_client_list_dylibloader_orig_speechd
 #define spd_get_message_list_fd spd_get_message_list_fd_dylibloader_orig_speechd
 #define spd_list_modules spd_list_modules_dylibloader_orig_speechd
 #define free_spd_modules free_spd_modules_dylibloader_orig_speechd
@@ -87,7 +86,7 @@
 #define spd_execute_command_wo_mutex spd_execute_command_wo_mutex_dylibloader_orig_speechd
 #define spd_send_data spd_send_data_dylibloader_orig_speechd
 #define spd_send_data_wo_mutex spd_send_data_wo_mutex_dylibloader_orig_speechd
-#include <speech-dispatcher/libspeechd.h>
+#include <libspeechd.h>
 #undef SPDConnectionAddress__free
 #undef spd_get_default_address
 #undef spd_open
@@ -154,7 +153,6 @@
 #undef spd_set_output_module
 #undef spd_set_output_module_all
 #undef spd_set_output_module_uid
-#undef spd_get_client_list
 #undef spd_get_message_list_fd
 #undef spd_list_modules
 #undef free_spd_modules
@@ -237,7 +235,6 @@ extern "C" {
 #define spd_set_output_module spd_set_output_module_dylibloader_wrapper_speechd
 #define spd_set_output_module_all spd_set_output_module_all_dylibloader_wrapper_speechd
 #define spd_set_output_module_uid spd_set_output_module_uid_dylibloader_wrapper_speechd
-#define spd_get_client_list spd_get_client_list_dylibloader_wrapper_speechd
 #define spd_get_message_list_fd spd_get_message_list_fd_dylibloader_wrapper_speechd
 #define spd_list_modules spd_list_modules_dylibloader_wrapper_speechd
 #define free_spd_modules free_spd_modules_dylibloader_wrapper_speechd
@@ -254,7 +251,7 @@ extern "C" {
 extern void (*SPDConnectionAddress__free_dylibloader_wrapper_speechd)( SPDConnectionAddress*);
 extern SPDConnectionAddress* (*spd_get_default_address_dylibloader_wrapper_speechd)( char**);
 extern SPDConnection* (*spd_open_dylibloader_wrapper_speechd)(const char*,const char*,const char*, SPDConnectionMode);
-extern SPDConnection* (*spd_open2_dylibloader_wrapper_speechd)(const char*,const char*,const char*, SPDConnectionMode, SPDConnectionAddress*, int, char**);
+extern SPDConnection* (*spd_open2_dylibloader_wrapper_speechd)(const char*,const char*,const char*, SPDConnectionMode,const SPDConnectionAddress*, int, char**);
 extern int (*spd_get_client_id_dylibloader_wrapper_speechd)( SPDConnection*);
 extern void (*spd_close_dylibloader_wrapper_speechd)( SPDConnection*);
 extern int (*spd_say_dylibloader_wrapper_speechd)( SPDConnection*, SPDPriority,const char*);
@@ -317,7 +314,6 @@ extern char* (*spd_get_language_dylibloader_wrapper_speechd)( SPDConnection*);
 extern int (*spd_set_output_module_dylibloader_wrapper_speechd)( SPDConnection*,const char*);
 extern int (*spd_set_output_module_all_dylibloader_wrapper_speechd)( SPDConnection*,const char*);
 extern int (*spd_set_output_module_uid_dylibloader_wrapper_speechd)( SPDConnection*,const char*, unsigned int);
-extern int (*spd_get_client_list_dylibloader_wrapper_speechd)( SPDConnection*, char**, int*, int*);
 extern int (*spd_get_message_list_fd_dylibloader_wrapper_speechd)( SPDConnection*, int, int*, char**);
 extern char** (*spd_list_modules_dylibloader_wrapper_speechd)( SPDConnection*);
 extern void (*free_spd_modules_dylibloader_wrapper_speechd)( char**);
@@ -325,10 +321,10 @@ extern char* (*spd_get_output_module_dylibloader_wrapper_speechd)( SPDConnection
 extern char** (*spd_list_voices_dylibloader_wrapper_speechd)( SPDConnection*);
 extern SPDVoice** (*spd_list_synthesis_voices_dylibloader_wrapper_speechd)( SPDConnection*);
 extern void (*free_spd_voices_dylibloader_wrapper_speechd)( SPDVoice**);
-extern char** (*spd_execute_command_with_list_reply_dylibloader_wrapper_speechd)( SPDConnection*, char*);
-extern int (*spd_execute_command_dylibloader_wrapper_speechd)( SPDConnection*, char*);
-extern int (*spd_execute_command_with_reply_dylibloader_wrapper_speechd)( SPDConnection*, char*, char**);
-extern int (*spd_execute_command_wo_mutex_dylibloader_wrapper_speechd)( SPDConnection*, char*);
+extern char** (*spd_execute_command_with_list_reply_dylibloader_wrapper_speechd)( SPDConnection*,const char*);
+extern int (*spd_execute_command_dylibloader_wrapper_speechd)( SPDConnection*,const char*);
+extern int (*spd_execute_command_with_reply_dylibloader_wrapper_speechd)( SPDConnection*,const char*, char**);
+extern int (*spd_execute_command_wo_mutex_dylibloader_wrapper_speechd)( SPDConnection*,const char*);
 extern char* (*spd_send_data_dylibloader_wrapper_speechd)( SPDConnection*,const char*, int);
 extern char* (*spd_send_data_wo_mutex_dylibloader_wrapper_speechd)( SPDConnection*,const char*, int);
 int initialize_speechd(int verbose);
diff --git a/platform/linuxbsd/tts_linux.cpp b/platform/linuxbsd/tts_linux.cpp
index aea1183d3d..0ffa52f7bb 100644
--- a/platform/linuxbsd/tts_linux.cpp
+++ b/platform/linuxbsd/tts_linux.cpp
@@ -30,6 +30,8 @@
 
 #include "tts_linux.h"
 
+#ifdef SPEECHD_ENABLED
+
 #include "core/config/project_settings.h"
 #include "servers/text_server.h"
 
@@ -259,3 +261,5 @@ TTS_Linux::~TTS_Linux() {
 
 	singleton = nullptr;
 }
+
+#endif // SPEECHD_ENABLED
diff --git a/platform/linuxbsd/tts_linux.h b/platform/linuxbsd/tts_linux.h
index 12a3d0f052..fcc243eaa6 100644
--- a/platform/linuxbsd/tts_linux.h
+++ b/platform/linuxbsd/tts_linux.h
@@ -31,6 +31,8 @@
 #ifndef TTS_LINUX_H
 #define TTS_LINUX_H
 
+#ifdef SPEECHD_ENABLED
+
 #include "core/os/thread.h"
 #include "core/os/thread_safe.h"
 #include "core/string/ustring.h"
@@ -39,8 +41,6 @@
 #include "core/variant/array.h"
 #include "servers/display_server.h"
 
-#include <speech-dispatcher/libspeechd.h>
-
 #include "speechd-so_wrap.h"
 
 class TTS_Linux {
@@ -77,4 +77,6 @@ public:
 	~TTS_Linux();
 };
 
+#endif // SPEECHD_ENABLED
+
 #endif // TTS_LINUX_H

It's questionable whether this is really useful, as the dlopen mechanics already make it optional at runtime, but this can be used to remove the compile-time dependency (since the wrapper needs the actual library header).
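
The runtime-optional behaviour that dlopen provides can be illustrated with Python's ctypes, which likewise reports a missing shared library as an error at load time rather than at link time. This is only a sketch: the helper name is hypothetical, and the soname libspeechd.so.2 is taken from the wrapper's generation flags.

```python
import ctypes


def try_load_library(soname):
    """Return a handle to the shared library, or None if it cannot be loaded."""
    try:
        return ctypes.CDLL(soname)
    except OSError:
        # Library missing or not loadable: the caller can disable the feature.
        return None


speechd = try_load_library("libspeechd.so.2")
if speechd is None:
    print("Speech Dispatcher not available; TTS disabled.")
else:
    print("Speech Dispatcher loaded.")
```

The build option proposed here addresses a different stage: the library header is still needed when the wrapper is compiled, even though the library itself is only needed when the program runs.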

BTW I changed the system header format since speech-dispatcher seems to be included in the CFLAGS:

 $ pkg-config speech-dispatcher --cflags
-I/usr/include/speech-dispatcher -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include

So this would be needed to compile on NixOS (see #59991), or on any distro where the speechd includes are not in /usr/include.
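
To see why `<libspeechd.h>` resolves once the pkg-config flags are applied, the include directories can be extracted from the output quoted above. This is a minimal sketch with a hypothetical helper name; it only handles the joined `-I/path` form shown in that output, not `-I /path` with a separating space.

```python
import shlex
from typing import List


def include_dirs_from_cflags(cflags: str) -> List[str]:
    """Return the directories named by -I flags in a pkg-config --cflags string."""
    return [
        flag[2:]
        for flag in shlex.split(cflags)
        if flag.startswith("-I") and len(flag) > 2
    ]


cflags = "-I/usr/include/speech-dispatcher -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include"
dirs = include_dirs_from_cflags(cflags)
# With the speech-dispatcher directory on the search path, the bare header
# name is enough; no "speech-dispatcher/" prefix is required.
candidates = [d + "/libspeechd.h" for d in dirs]
print(candidates[0])
```

On a distribution that installs the headers elsewhere, pkg-config would report a different directory, and the bare `<libspeechd.h>` include would still resolve, while the prefixed form relies on the headers living under a default search path.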

bruvzg · 2022-04-28T11:20:13Z

Added a build option and a new wrapper. For extra compatibility, the wrapper is generated from the older libspeechd2 0.9.1-4 (the Ubuntu 20.04 LTS version).

akien-mga · 2022-04-28T11:33:00Z

I think you need to use --sys-include <libspeechd.h> instead of --sys-include <speech-dispatcher/libspeechd.h> as I mentioned here:

BTW I changed the system header format since speech-dispatcher seems to be included in the CFLAGS:
$ pkg-config speech-dispatcher --cflags
-I/usr/include/speech-dispatcher -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include
So this would be needed to compile on NixOS (see #59913), or on any distro where the speechd includes are not in /usr/include.

bruvzg · 2022-04-28T11:37:00Z

I think you need to use --sys-include <libspeechd.h> instead of --sys-include <speech-dispatcher/libspeechd.h> as I mentioned here

Changed, it seems to work both ways.

Changes done.

akien-mga

Tested and reviewed Linux code, looks good!

akien-mga · 2022-04-28T13:09:37Z

Thanks!

Sslaxx · 2022-04-29T23:00:51Z

Would there be any chance of backporting this to 3.x?

CC @Cheeseness

CsloudX · 2022-05-01T01:48:51Z

Wow! Godot, why I love you so much!!!

CsloudX · 2022-05-13T01:16:49Z

Why are these methods under DisplayServer rather than AudioServer?

seocwen · 2022-05-17T00:13:14Z

The demo for this is out of date. I believe line 31 should be:
$ButtonPause.set_pressed(DisplayServer.tts_is_paused())

Great work though. I'd love if the voice playback could be routed through the audio buses so we could throw effects on them.

lesleyrs · 2022-12-31T17:34:48Z

Linux (speech-dispatcher), implemented and tested (on x86_64 Fedora 35, arm64 Ubuntu 21.10)

@bruvzg On fedora 37 it seems to just spam TTS is not supported by this display server. Even after switching to X11. I haven't tried any other version.

How do you get it working?

Calinou · 2022-12-31T17:36:12Z

@bruvzg On fedora 37 it seems to just spam TTS is not supported by this display server. Even after switching to X11. I haven't tried any other version.

How do you get it working?

See #67863.

ghost · 2023-02-17T00:17:41Z

Linux speech-dispatcher gets my ears bleeding. It's also hard to understand, just a sound mess. Have you guys tried using RHVoice instead? It has much, MUCH better quality.

Cheeseness · 2023-02-17T04:46:26Z

Linux speech-dispatcher gets my ears bleeding. It's also hard to understand, just a sound mess. Have you guys tried using RHVoice instead? It has much, MUCH better quality.

Out of the box, Speech Dispatcher doesn't sound great to my ear, but I can be confident that any users dependent on it will already have it configured with voices (you're not limited to the default!), speeds, volumes, etc. to match their own tastes and needs, which is important!

The understanding I've gained from talking to vision-impaired players is that the kind of robotty voice that Speech Dispatcher uses by default can end up being easier to parse at higher speed for people who are used to it. The analogy I often use to help explain this is that it's similar to the difference between a fancy script font that feels handwritten with personality and a monospace font or a dyslexic-friendly font that's focused on readability but not very pretty by comparison. Which one is easier to skim for useful information?

IMO, TTS as an accessibility tool (rather than a substitute for voice acting) is best viewed as interface rather than content - an insight that took me some time to internalise.

All that said, I don't rely on these features for my day-to-day computer usage and will happily defer to anybody who does.

And with all that said, from what I can see, RHVoice uses Speech Dispatcher on Linux, just with some different initial configuration loaded, which again, people who use this stuff will already have set up the way they need/like it to be.

Calinou · 2023-02-18T00:09:50Z

The understanding I've gained from talking to vision-impaired players is that the kind of robotty voice that Speech Dispatcher uses by default can end up being easier to parse at higher speed for people who are used to it.

This is indeed the case 🙂

Here's a sample of a screenreader used at such high speeds: https://s3.amazonaws.com/freecodecamp/screen-reader.mp3
It may sound like gibberish, but it is intelligible with proper training.

ghost · 2023-02-19T07:22:49Z

Out of the box, Speech Dispatcher doesn't sound great to my ear, but I can be confident that any users dependent on it will already have it configured with voices (you're not limited to the default!), speeds, volumes, etc. to match their own tastes and needs, which is important!

Sure. But I'm talking about voicing characters, not the screen reader feature. In this special case, character voices should sound the way the developer wants them to. You may suggest using pre-recorded .wav files, but what if I want characters to say the player's name as well?

It may sound like gibberish, but it is intelligible with proper training.

Eminem mode, lol

Zireael07 · 2023-02-19T09:33:37Z

Sure. But I'm talking about voicing characters, not the screen reader feature. In this special case, character voices should sound the way the developer wants them to.

Current text to speech voices are very limited and not at all relevant to voicing characters.

Calinou · 2023-02-20T17:38:34Z

Text-to-speech is meant to be an accessibility or convenience feature (e.g. to read player text chat aloud).

TTS APIs aren't what you should use for voicing characters. You will need an entirely different solution, which is more complex to develop as there's no standard for this. AI-based voice synthesis is a thing but it needs to be done offline, as it requires lots of hardware resources.

ghost · 2023-02-24T04:02:47Z

You will need an entirely different solution, which is more complex to develop as there's no standard for this.

I realize that my suggestion may not be convenient for many people, but everyone should have a choice. It's more a matter of freedom; it's cheap and it's better than nothing. Besides, it might give the game a twist, making it unique. So why not?

bruvzg added feature proposal topic:audio labels Dec 23, 2021

bruvzg added this to the 4.0 milestone Dec 23, 2021

bruvzg force-pushed the tts2.0 branch 7 times, most recently from aec2b7c to 2ca7483 Compare December 30, 2021 07:18

Calinou added the topic:porting label Jan 11, 2022

bruvzg force-pushed the tts2.0 branch 3 times, most recently from d7f2742 to c76e988 Compare January 13, 2022 11:46

bruvzg marked this pull request as ready for review January 13, 2022 11:46

bruvzg requested review from a team as code owners January 13, 2022 11:46

m4gr3d requested changes Jan 13, 2022

bruvzg force-pushed the tts2.0 branch from c76e988 to 5823c20 Compare January 14, 2022 08:07

m4gr3d requested changes Jan 14, 2022

bruvzg force-pushed the tts2.0 branch 2 times, most recently from 2ada3fa to 91018a3 Compare January 21, 2022 07:43

bruvzg force-pushed the tts2.0 branch from efad567 to 115369d Compare April 28, 2022 11:20

Implement text-to-speech support on Android, iOS, HTML5, Linux, macOS…

6ab672d

… and Windows. Implement TextServer word break method.

bruvzg force-pushed the tts2.0 branch from 115369d to 6ab672d Compare April 28, 2022 11:36

akien-mga approved these changes Apr 28, 2022

akien-mga merged commit d25c3aa into godotengine:master Apr 28, 2022

bruvzg deleted the tts2.0 branch April 28, 2022 13:15

Calinou mentioned this pull request May 16, 2022

List speech-dispatcher libraries in distro oneliners in Compiling for Linux/*BSD godotengine/godot-docs#5823

Open

This was referenced May 23, 2022

Add a text-to-speech demo godotengine/godot-demo-projects#744

Merged

[3.x] Backport text-to-speech support. #61316

Merged

and-rad mentioned this pull request Jun 14, 2022

Documentation pages to add or update for Godot 4.0 godotengine/godot-docs#5121

Open

41 tasks

lawnjelly mentioned this pull request May 16, 2023

Text to speech draining CPU with Pulse Audio when not in use #77124

Closed
