Monday, April 07, 2008

Dictate, Round Two

by Marc Zeedar macopinion@designwrite.com

In my last column I wrote about MacSpeech is Dictate software. I had not intended that to be a review, per se - it was really my first impressions of the software. While I think those first impressions were accurate, they don't really reflect the actual usefulness of the software as it might be after you've trained it and are more familiar with how it works. So I decided to take another look and write a little bit more about the software. I think that is only fair.

I am writing this column with my voice, and I'm using a headset microphone instead of my laptop's built-in microphone. I haven't yet decided if I'm going to edit this text after I write it. I may just leave it as it is, verbatim. The software actually works well enough now that, other than a few specialized words and phrases, it doesn't need much editing. I'm actually surprised at how much better it works with the headset microphone: it really does make a significant difference.

I'm in a very quiet place, and I thought the built-in microphone would be okay. But the headset definitely improves recognition. When I used the old software, I listen, it did not make much difference whether I used a headset or not: either way there were so many errors it was useless. With Dictate, the recognition is almost flawless with a headset.

As you can already tell, there are very few errors in this text. However, it is not perfect. It made a mistake in the first line, where it did not make the word "MacSpeech" possessive (possessives seem to be a consistent problem). In the last paragraph, it did not know the product name "iListen." (Wow! It worked that time. Weird.)

When I used Dictate the first time I often found myself frustrated because it had trouble recognizing ordinary words. For instance, it could not tell the difference between bat and cat and fat. I had thought that was a problem with the speech recognition, but it now seems it was just a microphone issue. When ordinary words give you such trouble it really slows down the usage of speech recognition and I found the product frustrating to use and slower than typing. But now that I'm using it with a headset I find I can talk at almost normal speeds and it will keep up with me, and the accuracy is excellent.

Another significant issue with speech recognition is that, to work best, it needs the context of the surrounding words. This is how it is able to tell the difference between two words that sound alike and pick the right one. Unfortunately, when the software isn't recognizing your words, you tend to slow your speech down and speak one word at a time, and that has the side effect of making overall speech recognition worse because there's less context for it to interpret. Now that I'm using the headset, it's recognizing simple words more easily and I can speak at a normal rate. I find that I'm able to feed it longer sentences, and the accuracy on ambiguous words is significantly better.

Some of the points I made in my first-impressions review are still valid, however. I'm still not convinced of Dictate's usefulness for fiction. Fiction is too broad and complex: character names, unusual punctuation, jargon, slang, foreign expressions, and even made-up words are all common in fiction writing. Dictate does have a mode where it can import samples of your writing and analyze them to help it recognize your particular style. That's a good idea and I think it can help in some ways, but it only works if you already have writing for it to import. If you're starting a new novel with brand-new characters, for example, there's nothing for it to read.

When Dictate does make mistakes, they are often subtle. In some ways the increased accuracy I'm getting with the headset is worse, simply because it works so well much of the time that I miss errors because I assume it's correct. This has happened to me several times in my experiments, where it inserted a "the" instead of an "and" or something similar. If I'm not watching carefully or proofreading, I might miss such an error. Because the words are spelled correctly, they are unlikely to be caught by a spelling checker, and because the errors are so minor, they are easy to overlook and could end up published.

Another issue is that I don't see a way to correct the program when it makes a mistake, to teach it the right way. It's frustrating when it keeps suggesting the same wrong word over and over every time you say a particular word. I don't know if the program learns from what you're saying. For instance, if it makes a suggestion and I tell it to forget it, is it learning from that correction? If so, what happens when I edit myself and simply change my mind about what I wanted to write? Does it think it made a mistake in recognition? I don't want to get in the habit of teaching it incorrectly. Perhaps it does learn from my corrections, but it isn't very obvious. I would prefer a more manual method of correcting it, such as a menu command that tells it the last recognized word was incorrect.

Overall, however, I am pleased with the software. I don't know that I would bother with it without the headset -- there were just too many errors and it was too frustrating to use to make it worth the trouble. With the headset, however, it's surprisingly useful. I could definitely see myself using it for certain types of work or if my fingers or wrists were bothering me from too much typing. I have had minor bouts of repetitive stress injury in the past and it's always been a concern of mine that such a thing could disable me. With Dictate, I can write with my voice and rest my hands. That's pretty impressive. (Though I still find having to use a headset a bother.)

I'm also curious about the amount of time it takes to actually write something using voice. When I tested this before, it felt much slower than typing. I had to speak slowly, enunciate, repeat phrases and words, and fight with the dictation software so often that even if it didn't take longer, it felt like it. So with this article I am keeping track of my time: I started writing this at 6:43 PM and it is now 7:18 PM. So about 35 minutes, which is not bad. I doubt I could have typed this much faster (keeping in mind I'm composing thoughts as I type or speak, not just typing text that's already written). With that kind of speed, if the errors are minimal or acceptable, I could see myself using this for e-mails or other kinds of routine correspondence. That's pretty cool.

As promised, I have not made any manual corrections to this text. I did make corrections with voice if I saw that it misunderstood something I said. But generally speaking, simply repeating the phrase and enunciating made it recognize what I said. That's much different than before when I would say the phrase 10 times and it would give me 10 different responses (all of them wrong).

I do think there are some improvements that could be made to the software, but overall I am much more impressed by it now than I was before. Before, it was like hearing a dog talk: it was impressive that he could do it at all, but it wasn't practical or useful. Now the recognition is accurate enough that I think I could actually use this in day-to-day work. If you look through this column, you will notice that it is recognizing things like dates and times, trademarks and product names, and some interesting and unusual turns of phrase. That's amazing.

I will continue to play with the software and experiment with it on various projects and perhaps I will dictate more on the subject in the future.


Posted by Charles in Less Tangible