Greetings list,

I am working on a project in which we want to use Speex with Google's
Automatic Speech Recognition (ASR) service: we send Speex-encoded audio to
Google ASR and get back the text of the speech in that audio stream. However,
Google ASR's Speex support requires the non-standard Speex-with-header-byte
format, and my group cannot find any worthwhile documentation on how to
encode that format properly.

As a starting point, we referred to the following blog post, which mostly
focuses on using FLAC with Google ASR:
<http://mikepultz.com/2011/03/accessing-google-speech-api-chrome-11/>
That article *does* mention the following GitHub project, which successfully
writes a Speex-with-header-byte file. We have confirmed, to some degree, that
Google ASR accepts such a file and returns the text of the spoken audio:
<https://github.com/QXIP/Speex-with-header-bytes>
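
To make the question concrete, this is how we currently understand the
framing. It is our working assumption, not something we found documented: each
encoded Speex frame is preceded by a single byte holding that frame's length
in bytes. A minimal C++ sketch of that assumption follows; the helper name
FrameWithHeaderByte is hypothetical and is not part of Speex or of the project
above.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of Speex): prepend a one-byte length header
// to a single encoded Speex frame.
// Assumption: the header is exactly one byte, so a frame must be <= 255 bytes.
std::vector<std::uint8_t> FrameWithHeaderByte(
    const std::vector<std::uint8_t>& payload) {
    if (payload.size() > 255) {
        throw std::length_error("frame does not fit a one-byte length header");
    }
    std::vector<std::uint8_t> framed;
    framed.reserve(payload.size() + 1);
    // Byte 0: length of the payload that follows.
    framed.push_back(static_cast<std::uint8_t>(payload.size()));
    // Bytes 1..N: the encoded frame itself.
    framed.insert(framed.end(), payload.begin(), payload.end());
    return framed;  // total size is payload.size() + 1
}
```

Under this assumption, the number of bytes sent for each frame is always one
more than the value that speex_bits_write() returns.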
However, we have our own code that attempts to reimplement that project for a
Cocoa/Objective-C application, and unfortunately it does not yet seem to
produce data that Google ASR is willing to accept (we get "Bad Data" errors
back when we send it). My group has permitted me to share the following code:
CODE BELOW:
SpeexRecorder::SpeexRecorder()
{
    mFileCount = 0;
    mRecordPacket = 0;
    mRecordData = NULL;
    mAudioStreamer = NULL;
    int sampling_rate = 16000;

    // Set up the bit buffer and a wideband (16 kHz) encoder.
    memset(&bits_, 0, sizeof(bits_));
    speex_bits_init(&bits_);
    encoder_state_ = speex_encoder_init(&speex_wb_mode);

    // Ask the encoder how many samples make up one frame.
    speex_encoder_ctl(encoder_state_, SPEEX_GET_FRAME_SIZE,
                      &samples_per_frame_);

    int quality = kSpeexEncodingQuality;
    speex_encoder_ctl(encoder_state_, SPEEX_SET_QUALITY, &quality);

    // Enable variable bit rate, so encoded frame lengths can differ.
    int vbr = 1;
    speex_encoder_ctl(encoder_state_, SPEEX_SET_VBR, &vbr);

    memset(encoded_frame_data_, 0, sizeof(encoded_frame_data_));
}

SpeexRecorder::~SpeexRecorder()
{
    speex_bits_destroy(&bits_);
    speex_encoder_destroy(encoder_state_);
}

void SpeexRecorder::WriteToFile(int16 * buf, int count)
{
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    // Only whole frames can be encoded; drop any trailing partial frame.
    count -= (count % samples_per_frame_);
    for (int i = 0; i < count; i += samples_per_frame_)
    {
        // Encode the frame that starts at sample i.
        speex_encode_int(encoder_state_, (spx_int16_t *)(buf + i), &bits_);
        // Byte 0 is reserved for the length header; the payload follows it.
        int frame_length = speex_bits_write(&bits_, encoded_frame_data_ + 1,
                                            kMaxSpeexFrameLength);
        encoded_frame_data_[0] = static_cast<char>(frame_length);
        speex_bits_reset(&bits_);
        NSUserDefaults *defs = [NSUserDefaults standardUserDefaults];
        // Send the header byte plus the payload: frame_length + 1 bytes.
        NSData *dataToSend = [NSData dataWithBytes:encoded_frame_data_
                                            length:frame_length + 1];
        NSArray *array = [NSArray arrayWithObjects:dataToSend,
                          [defs objectForKey:@"inLang"], nil];
        NSLog(@"WriteToFile -> dataToSend: [%lu]",
              (unsigned long)[dataToSend length]);
        [mAudioStreamer performSelectorOnMainThread:@selector(sendDataToServer:)
                                         withObject:array
                                      waitUntilDone:YES];
    }
    [pool drain];
}

void SpeexRecorder::OpenNextFile()
{
    mFileCount++;
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    NSUserDefaults *defs = [NSUserDefaults standardUserDefaults];
    if (mAudioStreamer)
    {
        // Send 0 bytes to the stream to signify the end. The stream will be
        // closed, causing HTTP chunking to end and Google to respond.
        NSData *dataToSend = [NSData data];
        NSLog(@"OpenNextFile -- #%d# -- [%lu] bytes", mFileCount,
              (unsigned long)[dataToSend length]);
        NSArray *array = [NSArray arrayWithObjects:dataToSend,
                          [defs objectForKey:@"inLang"], nil];
        [mAudioStreamer performSelectorOnMainThread:@selector(sendDataToServer:)
                                         withObject:array
                                      waitUntilDone:YES];
    }
    else
    {
        // First call: create the streamer and open the connection.
        mAudioStreamer = [[AudioStreamer alloc] init];
        NSLog(@"OpenNextFile -- #%d# -- [%d] bytes", mFileCount, 0);
        [mAudioStreamer performSelectorOnMainThread:@selector(_setupConnection:)
                                         withObject:[defs objectForKey:@"inLang"]
                                      waitUntilDone:YES];
    }
    [pool drain];
}
CODE ABOVE:
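
In case it helps anyone reproduce the problem, a check along the following
lines could be run on the bytes before they are sent. This is only a sketch
under our own assumption about the format (a one-byte length followed by that
many payload bytes, repeated, with zero-length frames treated as invalid). The
function name CountHeaderByteFrames is hypothetical and is not part of our
application or of Speex.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sanity check: walks a buffer that is assumed to be a sequence
// of [length byte][length payload bytes] records.
// Returns the number of frames if the buffer is consumed exactly,
// or -1 if a record is empty or runs past the end of the buffer.
int CountHeaderByteFrames(const std::vector<std::uint8_t>& stream) {
    std::size_t pos = 0;
    int frames = 0;
    while (pos < stream.size()) {
        const std::size_t len = stream[pos];
        // Assumption: a zero-length frame is never valid.
        if (len == 0 || pos + 1 + len > stream.size()) {
            return -1;
        }
        pos += 1 + len;  // skip the header byte and its payload
        ++frames;
    }
    return frames;
}
```

A result of -1 on the concatenated output would point at a record that is
shorter than its length byte claims, which is the kind of framing mistake that
could plausibly produce a "Bad Data" response.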
Is there anything obvious in our code that we may have missed? I greatly
appreciate any help you can offer, and I apologize in advance if this is the
wrong place to post such messages. I also understand that this (likely)
non-standard way of encoding Speex may not be something the members of this
list can support, and I attach no particular weight to a lack of response or
to you kind folks being unable to help us with this problem.

Thanks in advance for your time and willingness to consider our situation!

--Quinn Ebert