Voice Recognition - Will It Throw Transcription Companies On The Scrap Heap?
But then I would say that, wouldn't I? Below are some of the reasons why I remain unconvinced that it will replace me quite yet.
Voice recognition software currently comes in two forms - Dragon Naturally Speaking and IBM's ViaVoice.
As well as being used to give a computer commands (e.g. to save a file), it can also be used to convert speech straight into a Word document.
However, the main disadvantage is that voice recognition software is a 'dog for one master' only.
It's possible to use the software successfully for dictation but it can't cope with even one-to-one interviews and would probably go into meltdown if you tried it with group meetings or focus groups.
The software needs to be trained to become used to one voice.
Asking it to recognise and accurately transcribe the different voice of your interviewee as well as your own questions, or the multiple voices of a group, is simply not possible at the moment.
Even with single-voice dictation, the software still needs to be corrected and trained to recognise new or unfamiliar words, technical terms and names.
The only way to use voice recognition software effectively for an interview situation would be to listen to the recording and re-speak everything you hear - both questions and responses.
Obviously, this would be very time-consuming, as you would have to keep stopping and starting the recording so that you can 'speak' what you're hearing.
Essentially, you're attempting to perform simultaneous translation - concentrating on listening to someone else's speech, while saying those words a few seconds later but without losing what's being said next.
Try it with a recorded TV or radio programme and you'll see how difficult a skill that is to master.
Add on the time taken at the end to proofread and tidy up the Word document and it becomes clear that the whole process would take far longer than the original interview, and would be no quicker than having a professional transcriber complete the transcription.
Is that a valuable and productive use of your time?
Magnify that time and effort many times if you then try to tackle 'speaking' a focus group, especially if the recording is less than clear.
Another major factor when considering speech recognition software is its inability to judge which homophone (a word that sounds the same as another but is spelt differently) is meant.
The latest versions are apparently now capable of recognising the more common ones in simple sentences, such as deciding whether it's 'there' or 'their'.
However, long, complicated sentences can defeat it.
All this adds to your proofreading time at the end.
You also need to consider how the software will tackle commands for punctuation or formatting certain words in bold, for example.
Most programs require you to leave a pause between the command and the next chunk of 'text'.
If the pause isn't long enough, you'll find those commands entered as part of the text itself, which will need to be edited out later.
For an interview format, you will need to indicate each change of speaker with a new line, the speaker's initials and a tab command.
Even if you just decide to enter a change of speaker on a new line and tidy it up afterwards, this all adds to the time taken.
Allegedly, one of the latest versions (Naturally Speaking) will also punctuate for you - deciding where all the commas and full stops go.
Be prepared to correct this later - its idea of punctuation is not mine!
'Naturally Speaking', as it were, I'm biased! Even so, I'm convinced that voice recognition software is a useful tool fit for a specific purpose; that purpose is just not yet the transcription of all recordings in all circumstances.