Thursday, May 8, 2014

AVSpeechSynthesizer example iOS7

There is no doubt that iOS has a very rich set of APIs, and that richness keeps getting better with each release of the OS. Blocks, ARC, GCD, Background Fetch, etc. make things a lot easier for developers and help apps improve in both performance and usability.

For a long time, developers have wanted to incorporate voice interaction into their apps, which pushed them towards third-party frameworks like OpenEars that are quite difficult to implement in practice.

But with the introduction of iOS 7, we have received a set of APIs to accomplish text-to-speech functionality, and in different voices too, based on the device's current locale or set through code. So, let's dive into it.

AVSpeechSynthesizer is the class in the AVFoundation framework used to provide the text-to-speech functionality. A couple of other classes are involved as well, to handle things like speed, pitch, voice, etc.

Text-to-speech can be implemented in four simple lines of code, as below:


#import <AVFoundation/AVFoundation.h>

AVSpeechUtterance *utterance = [[AVSpeechUtterance alloc] initWithString:@"Hi there."];
utterance.rate = 0.25f; // speaking speed
AVSpeechSynthesizer *synthesizer = [[AVSpeechSynthesizer alloc] init];
[synthesizer speakUtterance:utterance];

And voilà! You are done.

But that is not all.
Notice that we have created an AVSpeechUtterance object. The utterance defines the speech you want, including pitch, speed, voice, volume, and the pre- and post-delays between speeches.
Right now we have only set the speed (0.25) we wanted; the other properties can be added to the speech as needed. The rate can vary from AVSpeechUtteranceMinimumSpeechRate (0.0f, super slow) to AVSpeechUtteranceMaximumSpeechRate (1.0f, super fast), with AVSpeechUtteranceDefaultSpeechRate (0.5f) in between. Take a look at the class for more on it.
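As a quick reference, here is a minimal sketch of an utterance with those other properties set as well (the values below are just illustrative):

AVSpeechUtterance *utterance = [[AVSpeechUtterance alloc] initWithString:@"Hi there."];
utterance.rate = AVSpeechUtteranceDefaultSpeechRate; // 0.5f; range is 0.0f to 1.0f
utterance.pitchMultiplier = 1.2f;   // 0.5f (low) to 2.0f (high), default 1.0f
utterance.volume = 0.8f;            // 0.0f (silent) to 1.0f (loudest), default 1.0f
utterance.preUtteranceDelay = 0.5;  // pause (in seconds) before the speech starts
utterance.postUtteranceDelay = 0.5; // pause (in seconds) after the speech ends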

Also, one thing to note about AVSpeechSynthesizer: if there is an ongoing speech and you ask the synthesizer to speak another text, that text is added to a queue and spoken after the first speech finishes. So, in summary, only one text can be spoken at a time; the rest are queued and dequeued on a first come, first served (FCFS) basis.
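The synthesizer also gives you control over that queue with its pause/stop/continue methods. A small sketch, assuming the synthesizer instance created above:

// Both utterances are queued; the second is spoken after the first finishes.
[synthesizer speakUtterance:[[AVSpeechUtterance alloc] initWithString:@"First sentence."]];
[synthesizer speakUtterance:[[AVSpeechUtterance alloc] initWithString:@"Second sentence."]];

// Pause once the current word finishes, then resume where it left off.
[synthesizer pauseSpeakingAtBoundary:AVSpeechBoundaryWord];
[synthesizer continueSpeaking];

// Stop immediately; this also discards everything left in the queue.
if (synthesizer.isSpeaking) {
    [synthesizer stopSpeakingAtBoundary:AVSpeechBoundaryImmediate];
}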

We have just made Siri say a few words in the device's current locale voice. Now let's go ahead and change the voice on demand, which can be done as below:

AVSpeechSynthesisVoice *voice = [AVSpeechSynthesisVoice voiceWithLanguage:@"zh-CN"];
[utterance setVoice:voice];

Notice that we have created a voice object and set it on the utterance. So, again, it is the utterance that carries all the attributes of a speech. Also, in the above we have set the voice to Chinese, but that doesn't mean our English text will be translated and Siri will speak Chinese :).
It simply means the voice will sound like English spoken by a Chinese speaker; in short, the voice is the accent of the speech.

Voice strings look like en-US (English, United States), ar-SA (Arabic, Saudi Arabia), fr-FR (French, France), zh-CN (Chinese, China), and many more.
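If you are not sure which voice codes are available on a given device, you can query them at runtime; a quick sketch:

// Log every voice installed on the device, e.g. "en-US", "fr-FR", ...
for (AVSpeechSynthesisVoice *voice in [AVSpeechSynthesisVoice speechVoices]) {
    NSLog(@"Available voice: %@", voice.language);
}

// The code that matches the device's current locale settings.
NSLog(@"Current language code: %@", [AVSpeechSynthesisVoice currentLanguageCode]);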

AVSpeechSynthesizer also has a set of delegate methods in the AVSpeechSynthesizerDelegate protocol, which keep you posted on the current state of the speech: started, finished, paused, continued, cancelled, and the range of the string that will be spoken next. All of the delegate methods are optional. Below is the list of delegate methods:

- (void)speechSynthesizer:(AVSpeechSynthesizer *)synthesizer didStartSpeechUtterance:(AVSpeechUtterance *)utterance;
- (void)speechSynthesizer:(AVSpeechSynthesizer *)synthesizer didFinishSpeechUtterance:(AVSpeechUtterance *)utterance;
- (void)speechSynthesizer:(AVSpeechSynthesizer *)synthesizer didPauseSpeechUtterance:(AVSpeechUtterance *)utterance;
- (void)speechSynthesizer:(AVSpeechSynthesizer *)synthesizer didContinueSpeechUtterance:(AVSpeechUtterance *)utterance;
- (void)speechSynthesizer:(AVSpeechSynthesizer *)synthesizer didCancelSpeechUtterance:(AVSpeechUtterance *)utterance;
- (void)speechSynthesizer:(AVSpeechSynthesizer *)synthesizer willSpeakRangeOfSpeechString:(NSRange)characterRange utterance:(AVSpeechUtterance *)utterance;
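As a quick sketch of how these can be used, the snippet below highlights each word as it is spoken and logs when the speech finishes. It assumes a view controller that has set itself as the delegate (synthesizer.delegate = self;) and has a hypothetical textView outlet showing the spoken string:

- (void)speechSynthesizer:(AVSpeechSynthesizer *)synthesizer
willSpeakRangeOfSpeechString:(NSRange)characterRange
                utterance:(AVSpeechUtterance *)utterance
{
    // Highlight the word that is about to be spoken.
    NSMutableAttributedString *text =
        [[NSMutableAttributedString alloc] initWithString:utterance.speechString];
    [text addAttribute:NSForegroundColorAttributeName
                 value:[UIColor redColor]
                 range:characterRange];
    self.textView.attributedText = text;
}

- (void)speechSynthesizer:(AVSpeechSynthesizer *)synthesizer
 didFinishSpeechUtterance:(AVSpeechUtterance *)utterance
{
    NSLog(@"Finished speaking: %@", utterance.speechString);
}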


Check out the video below to see AVSpeechSynthesizer in action in a demo app.

(Sorry there is no sound, as it is a screen recording, but you can check out the code and see for yourself.)


The source code of this demo app can be found on my GitHub page here.

Thanks for checking it out and happy coding :)