Speech Recognition & Synthesis: Difference between revisions

Speech Recognition & Synthesis
Developer(s)	Google
Initial release	November 13, 2013; 10 years ago
Stable release	googletts.google-speech-apk_20240416.00_p2.627182800 (Android 8-14) / April 16, 2024; 60 days ago
Operating system	Android
Type	Screen reader

Latest revision as of 11:54, 18 May 2024

Speech Recognition & Synthesis, formerly known as Speech Services,^[2] is a screen reader application developed by Google for its Android operating system. It enables applications to read aloud (speak) the text on the screen, with support for many languages. Text-to-Speech may be used by apps such as Google Play Books, to read books aloud; Google Translate, to read translations aloud and demonstrate the pronunciation of words; Google TalkBack and other spoken-feedback accessibility applications; and third-party apps. Users must install voice data for each language.

Supported languages[edit]

Albanian (Albania)
Arabic
Assamese (India)
Bengali (Bangladesh)
Bengali (India)
Bodo (India)
Bosnian (Bosnia and Herzegovina)
Bulgarian (Bulgaria)
Cantonese (Hong Kong)
Catalan (Spain)
Chinese (China)
Chinese (Taiwan)
Croatian (Croatia)
Czech (Czech Republic)
Danish (Denmark)
Dogri (India)
Dutch (Belgium)
Dutch (Netherlands)
English (Australia)
English (Nigeria)
English (India)
English (United Kingdom)
English (United States)
Estonian (Estonia)
Filipino (Philippines)
Finnish (Finland)
French (Canada)
French (France)
German (Germany)
Greek (Greece)
Gujarati (India)
Hebrew (Israel)
Hindi (India)
Hungarian (Hungary)
Icelandic (Iceland)
Indonesian (Indonesia)
Italian (Italy)
Japanese (Japan)
Javanese (Indonesia)
Kannada (India)
Kashmiri (India)
Khmer (Cambodia)
Konkani (India)
Korean (South Korea)
Latvian (Latvia)
Lithuanian (Lithuania)
Maithili (India)
Malay (Malaysia)
Malayalam (India)
Manipuri (India)
Marathi (India)
Nepali (Nepal)
Norwegian Bokmål (Norway)
Odia (India)
Polish (Poland)
Portuguese (Brazil)
Portuguese (Portugal)
Punjabi (India)
Romanian (Romania)
Russian (Russia)
Sanskrit (India)
Santali (India)
Sindhi (India)
Sinhala (Sri Lanka)
Slovak (Slovakia)
Spanish (Spain)
Spanish (United States)
Sundanese (Indonesia)
Swahili (Kenya)
Swedish (Sweden)
Tamil (India)
Telugu (India)
Thai (Thailand)
Turkish (Turkey)
Ukrainian (Ukraine)
Urdu (Pakistan)
Vietnamese (Vietnam)
Welsh (United Kingdom)

History[edit]

Some app developers, such as Hyundai in 2015, have adapted their Android Auto apps to include Text-to-Speech.^[3] Apps such as textPlus and WhatsApp use Text-to-Speech to read notifications aloud and provide voice-reply functionality.

Google Cloud Text-to-Speech is powered by WaveNet,^[4] software created by DeepMind, Google's UK-based AI subsidiary, which Google acquired in 2014.^[5] Google aims to distinguish the service from those of its competitors, Amazon and Microsoft.^[6]

Most voice synthesizers (including Apple's Siri) use concatenative synthesis,^[4] in which a program stores individual phonemes and then pieces them together to form words and sentences. WaveNet instead synthesizes speech with human-like emphasis and inflection on syllables, phonemes, and words. Unlike most other text-to-speech systems, a WaveNet model generates raw audio waveforms from scratch. The model is a neural network trained on a large volume of speech samples. During training, the network extracts the underlying structure of speech, such as which tones follow each other and what a realistic speech waveform looks like. Given a text input, the trained model generates the corresponding waveform one sample at a time, at up to 24,000 samples per second, with smooth transitions between individual sounds.^[4]
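The contrast between splicing stored units and generating audio one sample at a time can be illustrated with a toy Python sketch. It is not Google's or DeepMind's code and makes several assumptions: the "phonemes" are stand-in sine tones, the 48-sample crossfade at each seam is an arbitrary choice, and the hypothetical `toy_predictor` replaces WaveNet's trained neural network with a fixed sine recurrence so the example runs without any model. All function names are illustrative; only the overall structure is meant to carry over.

```python
import math
from typing import Callable, Dict, List, Sequence

SAMPLE_RATE = 24_000  # samples per second: the upper rate quoted for WaveNet


def tone(freq_hz: float, duration_s: float, sample_rate: int = SAMPLE_RATE) -> List[float]:
    """Hypothetical stand-in for a recorded phoneme: a plain sine tone."""
    n = round(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def concatenate_units(
    phonemes: Sequence[str], inventory: Dict[str, List[float]], fade: int = 48
) -> List[float]:
    """Concatenative synthesis: look up each stored unit and join them,
    blending `fade` overlapping samples (48 samples = 2 ms at 24 kHz)."""
    out: List[float] = []
    for p in phonemes:
        unit = inventory[p]
        k = min(fade, len(out), len(unit))  # no overlap for the first unit
        start = len(out) - k
        for i in range(k):
            w = (i + 1) / (k + 1)  # weight shifts linearly toward the new unit
            out[start + i] = (1 - w) * out[start + i] + w * unit[i]
        out.extend(unit[k:])
    return out


def generate_autoregressive(
    predict_next: Callable[[List[float]], float], num_samples: int
) -> List[float]:
    """WaveNet-style generation: every new sample is predicted from the
    samples already generated, strictly one sample at a time."""
    audio: List[float] = []
    for _ in range(num_samples):
        audio.append(predict_next(audio))
    return audio


def toy_predictor(freq_hz: float, sample_rate: int = SAMPLE_RATE) -> Callable[[List[float]], float]:
    """Hypothetical 'trained model': a fixed sine recurrence, not a neural network."""
    w = 2 * math.pi * freq_hz / sample_rate

    def predict(history: List[float]) -> float:
        if not history:
            return 0.0
        if len(history) == 1:
            return math.sin(w)
        # sin(n*w) = 2*cos(w)*sin((n-1)*w) - sin((n-2)*w)
        return 2 * math.cos(w) * history[-1] - history[-2]

    return predict


if __name__ == "__main__":
    inventory = {"a": tone(220.0, 0.01), "b": tone(330.0, 0.01)}
    spliced = concatenate_units(["a", "b", "a"], inventory)
    print(len(spliced))  # 3 * 240 - 2 * 48 = 624 samples
    one_second = generate_autoregressive(toy_predictor(440.0), SAMPLE_RATE)
    print(len(one_second))  # 24000 sequential predictions for 1 s of audio
```

At the rate of 24,000 samples per second quoted above, one second of audio requires 24,000 predictions and a 2-second utterance requires 48,000, each depending on the samples before it, whereas the concatenative path only looks up and blends stored units.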

The service was renamed Speech Recognition & Synthesis in 2023.^[citation needed]

References[edit]

^ "Speech Services by Google APKs". APKMirror.
^ Wang, Jules (November 8, 2021). "You'll never guess the latest Google app to cross 10 billion installs (seriously)". Android Police. Archived from the original on November 8, 2021. Retrieved November 18, 2021.
^ "Google, Hyundai show off new third-party Android Auto apps". CNET. CBS Interactive. Retrieved 17 January 2015.
^ a b c "WaveNet". www.deepmind.com. Retrieved 2023-06-22.
^ Gibbs, Samuel (2014-01-27). "Google buys UK artificial intelligence startup Deepmind for £400m". The Guardian. ISSN 0261-3077. Retrieved 2023-06-22.
^ "Text-to-Speech AI: Lifelike Speech Synthesis". Google Cloud. Retrieved 2023-06-22.

External links[edit]

Speech Recognition & Synthesis on Google Play


@@ Line 1: / Line 1: @@
 {{Short description|Screen reader application by Google}}
 {{Infobox software
-| name                   = Speech Services
+| name                   = Speech Recognition & Synthesis
 | logo                   = File:Google Text to Speech logo.svg
 | developer              = [[Google]]
@@ Line 7: / Line 7: @@
 | operating system       = [[Android (operating system)|Android]]
 | released               = {{start date and age|2013|11|13}}
-| latest release version = 20230612.01_p1.540072880
+| latest release version = Version Version googletts.google-speech-apk_20240416.00_p2.627182800(Android 8-14)
-| latest release date    = {{start date and age|2023|6|27}}<ref>{{cite web|url=https://www.apkmirror.com/apk/google-inc/google-text-to-speech-engine/|title=Speech Services by Google APKs|website=APKMirror}}</ref>
+| latest release date    = {{start date and age|2024|04|16}}<ref>{{cite web|url=https://www.apkmirror.com/apk/google-inc/google-text-to-speech-engine/|title=Speech Services by Google APKs|website=APKMirror}}</ref>
 }}
-'''Speech Services'''<!--NOTE: Per convention on Google articles such as [[Wear OS]], [[Socratic (Google)]], and [[Files (Google)]], do NOT add "by Google"--><ref>{{Cite web |last=Wang |first=Jules |date=November 8, 2021 |title=You'll never guess the latest Google app to cross 10 billion installs (seriously) |url=https://www.androidpolice.com/youll-never-guess-the-latest-google-app-to-cross-10-biillion-installs-seriously/ |url-status=live |archive-url=https://web.archive.org/web/20211108221702/https://www.androidpolice.com/youll-never-guess-the-latest-google-app-to-cross-10-biillion-installs-seriously/ |archive-date=November 8, 2021 |access-date=November 18, 2021 |website=Android Police}}</ref> is a [[screen reader]] application developed by [[Google]] for its [[Android (operating system)|Android]] operating system. It powers applications to read aloud (speak) the text on the screen with support for many languages. Text-to-Speech may be used by apps such as [[Google Play Books]] for reading books aloud, by [[Google Translate]] for reading aloud translations providing useful insight to the pronunciation of words, by [[Google TalkBack]] and other spoken feedback accessibility-based applications, as well as by third-party apps. Users must install voice data for each language.
+'''Speech Recognition & Synthesis''', formerly known as '''Speech Services'''<!--NOTE: Per convention on Google articles such as [[Wear OS]], [[Socratic (Google)]], and [[Files (Google)]], do NOT add "by Google"-->,<ref>{{Cite web |last=Wang |first=Jules |date=November 8, 2021 |title=You'll never guess the latest Google app to cross 10 billion installs (seriously) |url=https://www.androidpolice.com/youll-never-guess-the-latest-google-app-to-cross-10-biillion-installs-seriously/ |url-status=live |archive-url=https://web.archive.org/web/20211108221702/https://www.androidpolice.com/youll-never-guess-the-latest-google-app-to-cross-10-biillion-installs-seriously/ |archive-date=November 8, 2021 |access-date=November 18, 2021 |website=Android Police}}</ref> is a [[screen reader]] application developed by [[Google]] for its [[Android (operating system)|Android]] operating system. It powers applications to read aloud (speak) the text on the screen, with support for many languages. Text-to-Speech may be used by apps such as [[Google Play Books]] for reading books aloud, [[Google Translate]] for reading aloud translations for the pronunciation of words, [[Google TalkBack]], and other spoken feedback accessibility-based applications, as well as by third-party apps. Users must install voice data for each language.
 == Supported languages ==
 {{div-col|colwidth=15em}}
-* Afrikaans (South Africa)
 * Albanian (Albania)
 * Arabic
 * Assamese (India)
+* Bengla (Bangladesh)
-* Azerbaijani (Azerbaijan)
-* Basque (Spain)
+* Bengla (India)
-* Bengali (Bangladesh)
+* Bodo (India)
-* Bengali (India)
 * Bosnian (Bosnia and Herzegovina)
 * Bulgarian (Bulgaria)
@@ Line 31: / Line 29: @@
 * Czech (Czech Republic)
 * Danish (Denmark)
+* Dogri (India)
 * Dutch (Belgium)
 * Dutch (Netherlands)
@@ Line 43: / Line 42: @@
 * French (Canadian)
 * French (France)
-* Galician (Spain)
 * German (Germany)
 * Greek (Greece)
@@ Line 56: / Line 54: @@
 * Javanese (Indonesia)
 * Kannada (India)
-* Kazakh (Kazakhstan)
+* Kashmiri (India)
 * Khmer (Cambodia)
+* Konkani (India)
 * Korean (South Korea)
 * Latvian (Latvia)
 * Lithuanian (Lithuania)
+* Maithili (India)
 * Malay (Malaysia)
 * Malayalam (India)
 * Manipuri (India)
 * Marathi (India)
-* Mizo (India)
 * Nepali (Nepal)
 * Norwegian Bokmål (Norway)
+* Odia (India)
 * Polish (Poland)
 * Portuguese (Brazil)
@@ Line 74: / Line 74: @@
 * Romanian (Romania)
 * Russian (Russia)
-* Sanskrit (India)
+* Sanskirt (India)
-* Serbian (Serbia)
+* Santali (India)
-* Sindhi (Pakistan)
+* Sindhi (India)
 * Sinhala (Sri Lanka)
 * Slovak (Slovakia)
-* Slovenian (Slovenia)
-* Spanish (Argentina)
-* Spanish (Chile)
-* Spanish (Colombia)
-* Spanish (Peru)
-* Spanish (Puerto Rico)
 * Spanish (Spain)
 * Spanish (United States)
-* Spanish (Venezuela)
 * Sundanese (Indonesia)
 * Swahili (Kenya)
@@ Line 102: / Line 95: @@
 == History ==
-{{One source|1=section|date=March 2022}}
+{{More citations needed|date=November 2023}}
 Some app developers have started adapting and tweaking their Android Auto apps to include Text-to-Speech, such as [[Hyundai Motor Company|Hyundai]] in 2015.<ref>{{cite web|url=http://www.cnet.com/au/news/google-hyundai-demonstrate-android-autos-new-api/|title=Google, Hyundai show off new third-party Android Auto apps|publisher=CBS Interactive|work=CNET|accessdate=17 January 2015}}</ref> Apps such as textPlus and [[WhatsApp]] use Text-to-Speech to read notifications aloud and provide voice-reply functionality.
-Google Cloud Text-to-Speech is powered by [[WaveNet]],<ref name=":0">{{Cite web |title=WaveNet |url=https://www.deepmind.com/research/highlighted-research/wavenet |access-date=2023-06-22 |website=www.deepmind.com |language=en}}</ref> software created by Google's UK-based AI subsidiary [[DeepMind]], which was bought by Google in 2014.<ref>{{Cite news |last=Gibbs |first=Samuel |date=2014-01-27 |title=Google buys UK artificial intelligence startup Deepmind for £400m |language=en-GB |work=The Guardian |url=https://www.theguardian.com/technology/2014/jan/27/google-acquires-uk-artificial-intelligence-startup-deepmind |access-date=2023-06-22 |issn=0261-3077}}</ref> It tries to distinguish from its competitors, [[Amazon Polly|Amazon]] and [[Microsoft text-to-speech voices|Microsoft]], with distinct AI features.<ref>{{Cite web |title=Text-to-Speech AI: Lifelike Speech Synthesis |url=https://cloud.google.com/text-to-speech |access-date=2023-06-22 |website=Google Cloud |language=en}}</ref>
+Google Cloud Text-to-Speech is powered by [[WaveNet]],<ref name=":0">{{Cite web |title=WaveNet |url=https://www.deepmind.com/research/highlighted-research/wavenet |access-date=2023-06-22 |website=www.deepmind.com |language=en}}</ref> software created by Google's UK-based AI subsidiary [[DeepMind]], which was bought by Google in 2014.<ref>{{Cite news |last=Gibbs |first=Samuel |date=2014-01-27 |title=Google buys UK artificial intelligence startup Deepmind for £400m |language=en-GB |work=The Guardian |url=https://www.theguardian.com/technology/2014/jan/27/google-acquires-uk-artificial-intelligence-startup-deepmind |access-date=2023-06-22 |issn=0261-3077}}</ref> It tries to distinguish from its competitors, [[Amazon Polly|Amazon]] and [[Microsoft text-to-speech voices|Microsoft]].<ref>{{Cite web |title=Text-to-Speech AI: Lifelike Speech Synthesis |url=https://cloud.google.com/text-to-speech |access-date=2023-06-22 |website=Google Cloud |language=en}}</ref>
-DeepMind's AI voice synthesis tech is notably advanced and realistic. Most voice synthesizers (including Apple's [[Siri]]) use [[concatenative synthesis]],<ref name=":0" /> in which a program stores individual [[Phoneme|phonemes]] and then pieces them together to form words and sentences.
+Most voice synthesizers (including Apple's [[Siri]]) use [[concatenative synthesis]],<ref name=":0" /> in which a program stores individual [[Phoneme|phonemes]] and then pieces them together to form words and sentences.
+WaveNet synthesizes speech with human-like emphasis and inflection on syllables, phonemes, and words.
+Unlike most other text-to-speech systems, a WaveNet model creates [[Raw audio format|raw audio waveforms]] from scratch. The model uses a neural network that has been trained using a large volume of speech samples. During training, the network extracts the underlying structure of the speech, such as which tones follow each other and what a realistic speech waveform looks like. When given a text input, the trained WaveNet model can generate the corresponding speech waveforms from scratch, one sample at a time, with up to 24,000 samples per second and smooth transitions between the individual sounds.<ref name=":0" />
+The service was renamed Speech Recognition & Synthesis in 2023.{{cn|date=August 2023}}
-A WaveNet generates speech that sounds more natural than other text-to-speech systems. It synthesizes speech with more human-like emphasis and inflection on syllables, phonemes, and words. On average, a WaveNet produces speech audio that people prefer over other text-to-speech technologies.
-Unlike most other text-to-speech systems, a WaveNet model creates raw audio waveforms from scratch. The model uses a neural network that has been trained using a large volume of speech samples. During training, the network extracts the underlying structure of the speech, such as which tones follow each other and what a realistic speech waveform looks like. When given a text input, the trained WaveNet model can generate the corresponding speech waveforms from scratch, one sample at a time, with up to 24,000 samples per second and seamless transitions between the individual sounds.<ref name=":0" />
 == See also ==
@@ Line 126: / Line 120: @@
 {{Android (operating system)}}
-[[Category:Google services|Speech Services]]
+[[Category:Google services|Speech Recognition & Synthesis]]
 [[Category:Screen readers]]
-[[Category:Computer-related introductions in 2013]]
+[[Category:Internet properties established in 2013]]
