Jeff Noxon
2003-Jul-15 09:07 UTC
[Asterisk-Users] Poll - Would you pay $30-$50 for high quality speech synthesis?
Many of you are familiar with how lousy Festival sounds.

AT&T has a product, NaturalVoices, that sounds much better. There are male & female voice fonts for US/UK/Indian English, French, Spanish, and German.

I am considering offering a Linux-based text-to-speech engine based on the NaturalVoices runtime. An Asterisk module would also be provided, making it easy to add natural-sounding synthesis to Asterisk applications. You could also use it for other purposes, such as home automation.

After discussing royalties with AT&T, I have concluded that I can probably offer such a product at the following prices:

Runtime - $30 intro price with one voice font & one processor
Extra voices/languages - $15 each
Extra processors - $15 each

Depending on demand, the price may rise to $50 at some point. The lower the demand, the higher the price, due to AT&T's royalty structure.

There is a demo of the synthesis engine here:

http://www.naturalvoices.att.com/demos/

If you would be willing to pay for this kind of software, please e-mail me privately (not to the list). Please indicate your purchase timeframe as well as the number of licenses, extra voices, and processors you would want.

Regards,

Jeff / 'Bicster' on IRC
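For anyone working out what a given configuration would cost under the proposed price list, the arithmetic can be sketched as follows. This is only an illustration: the function name is made up, and it assumes that the first voice and the first processor are covered by the runtime price, which is how the list reads. The $30 default is the introductory price; pass 50 to see the possible later price.

```python
def quote_total(voices=1, processors=1, runtime_price=30,
                extra_voice_price=15, extra_processor_price=15):
    """Total price in dollars for one runtime license.

    Assumes the runtime price includes one voice font and one processor,
    so only voices and processors beyond the first are charged extra.
    """
    if voices < 1 or processors < 1:
        raise ValueError("a license needs at least one voice and one processor")
    extra_voices = voices - 1
    extra_processors = processors - 1
    return (runtime_price
            + extra_voices * extra_voice_price
            + extra_processors * extra_processor_price)


# Three voices on a two-processor machine at the intro price:
# 30 + 2 * 15 + 1 * 15 = 75
print(quote_total(voices=3, processors=2))

# The same configuration if the runtime price rises to $50: 95
print(quote_total(voices=3, processors=2, runtime_price=50))
```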
Michael Bielicki
2003-Jul-15 10:44 UTC
[Asterisk-Users] Poll - Would you pay $30-$50 for high quality speech synthesis?
We would buy it even just for emergency prompt generation, but we would need different languages as well, like Polish and Danish :))

--
Michael Bielicki
Managing Director
TAAN Consultants Ltd
http://www.global-gateway.net/
Greg Renouf
2003-Jul-15 11:36 UTC
[Asterisk-Users] Poll - Would you pay $30-$50 for high quality speech synthesis?
Wow, I thought they charged for this application on a per-channel basis.

I would spend $50 for this in a heartbeat. I would need English, Dutch, German, Spanish, Arabic & Russian...

-GSR
Florian Overkamp
2003-Jul-15 12:48 UTC
[Asterisk-Users] Poll - Would you pay $30-$50 for high quality speech synthesis?
At 20:36 15-7-2003 +0200, you wrote:
> Wow, I thought they charged for this application on a per-channel basis.
>
> I would spend $50 for this in a heartbeat. I would need English, Dutch,
> German, Spanish, Arabic & Russian...

I second that motion, my requirement would be Dutch support...

Florian
Scott Stingel
2003-Jul-15 13:12 UTC
[Asterisk-Users] high quality speech synthesis - AT&T vs the others
Hi Jeff-

You might be interested to know that over the last couple of months I've conducted (in the Windows world) an informal side-by-side comparison of AT&T's Natural Voices product with Lernout & Hauspie's (now ScanSoft) RealSpeak.

My small sample consisted of a few news clippings spoken with the several male and female voices available on both systems. I had about 25 people call in (mostly from the UK and US), and asked them afterwards which they preferred in "tone" and which they found the most understandable.

Most people thought that the AT&T male voice was the most natural and pleasant to listen to, while ScanSoft's female voice had the highest "readability". I think this was due to the fact that the AT&T product occasionally ran words together - if they have fixed this bug, AT&T may come out fully on top.

However, the L&H (ScanSoft) product is already available in many more languages (19), so you might also give them a call to talk about licensing before you proceed, since many of the people in the Linux world may require languages that AT&T doesn't have.

Anyway, I don't mean to dampen the enthusiasm for this undertaking, as we all would benefit! Please clarify whether the pricing you mention is per system or per port.

Thanks,
Scott Stingel

Scott M. Stingel
Emerging Voice Technology Inc.
Palo Alto California & London England
URL: www.evtmedia.com
Chris Albertson
2003-Jul-15 13:41 UTC
[Asterisk-Users] Poll - Would you pay $30-$50 for high quality speech synthesis?
--- Jeff Noxon wrote:
> Many of you are familiar with how lousy Festival sounds.
>
> AT&T has a product, NaturalVoices, that sounds much better. There are
> male & female voice fonts for US/UK/Indian English, French, Spanish,
> and German.

Festival only sounds bad because you are using it in a very simple way. It comes with a "demo" text-to-speech application that really is no more than a demo, but most people just use the demo app and think Festival itself sounds bad. You need to read a bit more. Also, at the cmu.edu web site there are some tools for building your own voices. These can sound very good and speak other languages. If you work at it, the sound can be very natural.

Chris Albertson
Jared Smith
2003-Jul-15 17:28 UTC
[Asterisk-Users] Poll - Would you pay $30-$50 for high quality speech synthesis?
Would you mind giving us a few examples of how we can make Festival sound better? (Some sample Festival configs would be nice!)

Jared
Chris Albertson
2003-Jul-15 17:31 UTC
[Asterisk-Users] Poll - Would you pay $30-$50 for high quality speech synthesis?
Sorry, I left out the URL. Here it is:

http://fife.speech.cs.cmu.edu/festival/

If interested, look at "demos" on the top menu bar. You will see what is involved in building new databases for new languages or speaker voices for Festival. When you complain about the sound, you are really complaining about the database quality or the quality of the mark-up embedded in the input text.

In building a new database, first you will need to find a human with a nice-sounding voice who speaks the language with the "correct" regional accent. That may be the hardest part, but after that it seems to take about a day or two.

Chris Albertson
Matthew John Darnell
2003-Jul-15 19:02 UTC
[Asterisk-Users] Poll - Would you pay $30-$50 for high quality speech synthesis?
> If you would be willing to pay for this kind of software, please e-mail
> me privately (not to the list). Please indicate your purchase timeframe
> as well as the number of licenses, extra voices, and processors you
> would want.

It should be simple: call an executable with two parameters, the file where the text is and what to name the output sound file. I would pay for that. We could sell at least 10-15 per month.

-Matt
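A two-parameter interface of that kind might look like the sketch below. Nothing here describes the proposed NaturalVoices product, whose command line is not specified anywhere in this thread. As a stand-in, the sketch shells out to Festival's text2wave script, and the function names are purely illustrative.

```python
import subprocess


def build_command(text_file, sound_file):
    """Return the argument list for the TTS back end.

    Assumption: Festival's text2wave script stands in for the engine.
    A different engine would need only this function changed.
    """
    return ["text2wave", text_file, "-o", sound_file]


def synthesize(text_file, sound_file):
    """Read text from text_file and write the spoken audio to sound_file.

    Raises subprocess.CalledProcessError if the back end exits non-zero,
    and FileNotFoundError if the back end is not installed.
    """
    subprocess.run(build_command(text_file, sound_file), check=True)
```

With Festival installed, calling synthesize("prompt.txt", "prompt.wav") would produce the sound file; wrapping the function in a small command-line script gives the two-parameter executable described above.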
Steve Underwood
2003-Jul-15 21:47 UTC
[Asterisk-Users] Poll - Would you pay $30-$50 for high quality speech synthesis?
Jeff Noxon wrote:
> After discussing royalties with AT&T, I have concluded that I can probably
> offer such a product at the following prices:
>
> Runtime - $30 intro price with one voice font & one processor
> Extra voices/languages - $15 each
> Extra processors - $15 each
>
> Depending on demand, the price may rise to $50 at some point. The lower
> the demand, the higher the price, due to AT&T's royalty structure.

Maybe you are right, but take great care with this. You can get packaged versions of Natural Voices cheaply for desktop applications. However, when you want to use it for telephony systems it usually costs more like $600-$700 per port.

There are also big differences in the way ports are counted by different vendors. For example, the per-port pricing for RealSpeak (which is not related to Natural Voices) and Speechify (SpeechWorks' derivative of Natural Voices) is not too different, but the final bill may be. With RealSpeak, if you have 1000 ports and only use TTS a little, you still pay for 1000 ports. With Speechify you pay for the maximum number of concurrent channels you will have speaking at any instant. Unless your system is very TTS heavy, this makes a huge difference.

I last worked heavily with these TTS engines about two years ago. They have improved, but I don't think by that much. Speechify was a lot more functional than Natural Voices, as its front end language processing was more complete. Natural Voices read too many things in the wrong way (a lot of other TTSs did too). The various Natural Voices derivatives are not all equal.

Natural Voices is itself a derivative of Festival. Look in the directories, and you still see lots of Festival files. Cepstral and Rhetorical Systems both have impressive sounding TTS based on Festival. Festival seems to be the root of most things other than RealSpeak and Eloquence. Eloquence seems pretty much the only mainstream package which does things differently, and actually synthesizes voice from basic principles.

Two years ago we deployed systems using RealSpeak, Speechify and Eloquence. People hated the robotic quality of Eloquence, but could understand it clearly (at least the English one - the Mandarin version sounded terrible). People liked the natural sound of RealSpeak, but couldn't understand it very well - they could follow paragraphs of text OK, but ask them about a specific thing that was said, like a street name, and their accuracy was very poor. Speechify was somewhere in between, but tending towards RealSpeak. In the end, adverse user reaction made us rip out all the TTS and abandon attempts to use it.

Some pointers from working with this stuff:

- First impressions are a bad indicator of true quality, due to the next point. You need to play with these things for a while, and see how they behave in real world use, before you can really evaluate their usefulness.

- In current TTS systems (all of them), natural sounding tends to equate with hard to understand. Most TTS systems basically use a database of recorded snippets, and blend them to form speech. The longer the snippets, the smoother and more natural the sound, but the worse its accuracy. Short snippets allow more flexibility in sculpting the result, giving better intelligibility, but making the sound more robotic.

- If your application is reading long tracts of text, the natural sounding TTSs do fairly well. The words you don't hear clearly are naturally filled in by your brain from the context. If your application is reading out addresses, the more robotic systems do better - I found Eloquence does the best for this.

- Don't underestimate the importance of the front end language processor. Most offerings deal with this part poorly. They all have demos that show how well the system will read things like currency and dates. Try feeding those texts to other vendors' TTS engines. The results can be quite interesting. The demos only contain examples of things the particular engine does well, and they have all focussed on getting different things right.

- You put together a system you think is really neat. Users initially think it pretty neat too. Then those same users gradually abandon the system as they find its limitations.

- Watch out for resource usage. You might expect these things to hog the CPU. They don't. However, they take hundreds of megs of disk (OK), and some (like Natural Voices and Speechify) needed it all in RAM at once to work well (not so OK). So, you had to allow more than 200MB of RAM per voice. This may have been improved in newer versions of Speechify, but I don't think Natural Voices has changed much in that time.

Regards,
Steve
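To put rough numbers on the port-counting and memory points above, here is a small sketch. The figures are illustrative assumptions, not vendor quotes: $650 is simply the midpoint of the $600-$700 range mentioned, the peak of 50 simultaneous TTS channels is invented for the example, and the function names are hypothetical.

```python
def license_cost(billable_ports, price_per_port):
    """License cost in dollars for the number of ports the vendor bills."""
    return billable_ports * price_per_port


def voice_ram_mb(voices, mb_per_voice=200):
    """Lower bound on RAM in MB if every voice must be fully resident.

    200 MB per voice is the floor quoted above; real usage was higher.
    """
    return voices * mb_per_voice


INSTALLED_PORTS = 1000    # size of the telephony system
PEAK_TTS_CHANNELS = 50    # hypothetical peak of channels speaking at once
PRICE_PER_PORT = 650      # assumed midpoint of the $600-$700 range

# Billed on every installed port, however little TTS is used.
all_ports = license_cost(INSTALLED_PORTS, PRICE_PER_PORT)

# Billed only on the peak number of channels speaking at any instant.
peak_only = license_cost(PEAK_TTS_CHANNELS, PRICE_PER_PORT)

print(all_ports)        # 650000
print(peak_only)        # 32500
print(voice_ram_mb(3))  # 600
```

With similar per-port prices, the bill differs by the ratio of installed ports to peak TTS channels, a factor of 20 in this made-up case.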