by A Cassandra.
If you can help, please pass this on.
Shorthand typist revived.
I had thought of calling this "Digging up the shorthand typist", because my first piece here at WdC was "SHT", later renamed "Short hand typist". It was short, not very clear, and contained a plea for help that I will be repeating.
It started by badly describing Pitman shorthand and the role of the shorthand typist in one overly short paragraph that read:
"in the 1950s a shorthand typist would listen to her boss, and copy what he said in a pad as pitman symbols. Symbols that represent the sound he uttered (not the correctly spelled words). These symbols would then be converted into a perfectly typed document."
Whilst this covered the important point that Pitman shorthand is a way of recording spoken sounds as marks, I completely missed the opportunity to speculate as to what she might be writing whilst her moron of a boss was dictating the same pro forma letter for the 400th time. I also missed the very important point without which shorthand could not work. People speak at about one hundred and sixty words per minute! 160 wpm! About 2.7 words per second. It is important that I impress upon you that, in terms of computer operation, this is very, very slow.
Telecom and sound engineers will tell you that it takes a bandwidth of about 3 kHz, and therefore a sample rate of at least 6 kHz, to record the human voice. But I have just told you that shorthand typists took dictation at 160 wpm, less than 3 words per second. You might think the difference is because those women had brains. Yes, that is obviously true. It is also true that half of those women will have been more intelligent than the bloke droning on at them, but that is not the reason for the discrepancy. The difference between the two figures is down to the difference between sounds and words. Sound is the channel through which words are communicated.
Picture this: Tony and Sid get their copies of the shorthand typist program. Sid is a hassled boss. With the program loaded, he tries it. He speaks into the microphone, expecting word salad. Watching the screen, expecting to see the letters appear one at a time, he instead sees the sentence he spoke appear as a block of correct text. After playing with it for a bit, he notices that it has an odd problem with false homophones, such as "an eye" for "a high". This would only be a problem for early versions of the program. As he is a professional, everything gets proofread anyway.
Tony knows it will not work out of the box for him. He has a disability that makes his speech slow and hard to understand for people who do not know him. The installer had left his family a text-to-speech program that they could adjust to make it sound like Tony. The installer now sets up the shorthand typist program and uses the text-to-speech program to train it in about fifteen minutes. Incidentally, that is the same text-to-speech engine as would have been used to train the out-of-the-box version.
In the original post I then went on to describe how it would work, in just six sentences. Here is a slightly less brief explanation of how it will work.
Overview: in essence, the shorthand typist program follows the same steps as a shorthand typist of old would have done, but it has a lot less intelligence at the input end.
The input from the microphone must first have all of the sounds lower in pitch than 80 Hz or higher than 3000 Hz removed, in order to remove a lot of the background noise that cannot be speech. The next thing to filter out is the pitch and tone of the voice. Although these qualities of speech are essential for listeners to understand the subtle nuances of what is being said, they are of no use when it comes to the question of which word is being used. This depersonalizing, anonymizing action is done by adding multiple short echoes, up to a maximum delay of around 10 ms. The result is that, although the words are still recognizable, they will sound as if they have been produced by a machine.
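The two filtering steps above can be sketched in a few lines of plain Python. This is only a rough illustration under stated assumptions, not a finished design: the function names, the simple one-pole filters (which roll off gently rather than cutting sharply at 80 Hz and 3000 Hz), and the choice of four evenly spaced echoes are all hypothetical and do not come from the original post. The sketch shows the mechanics of mixing in short echoes; it does not demonstrate how well that removes pitch and tone.

```python
import math


def one_pole_high_pass(samples, cutoff_hz, rate_hz):
    """Attenuate content below cutoff_hz (gentle one-pole roll-off)."""
    dt = 1.0 / rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = rc / (rc + dt)
    out, prev_y, prev_x = [], 0.0, 0.0
    for x in samples:
        prev_y = alpha * (prev_y + x - prev_x)
        prev_x = x
        out.append(prev_y)
    return out


def one_pole_low_pass(samples, cutoff_hz, rate_hz):
    """Attenuate content above cutoff_hz (gentle one-pole roll-off)."""
    dt = 1.0 / rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    out, prev = [], 0.0
    for x in samples:
        prev = prev + alpha * (x - prev)
        out.append(prev)
    return out


def band_limit(samples, rate_hz, low_hz=80.0, high_hz=3000.0):
    """Keep roughly the 80 Hz to 3000 Hz band named in the text."""
    high_passed = one_pole_high_pass(samples, low_hz, rate_hz)
    return one_pole_low_pass(high_passed, high_hz, rate_hz)


def add_short_echoes(samples, rate_hz, max_delay_ms=10.0, echo_count=4):
    """Mix in echo_count delayed copies, evenly spaced up to max_delay_ms.

    echo_count is an assumed value; the text only gives the 10 ms maximum.
    """
    out = list(samples)
    for k in range(1, echo_count + 1):
        delay = round(rate_hz * max_delay_ms * k / (1000.0 * echo_count))
        for i in range(delay, len(samples)):
            out[i] += samples[i - delay]
    # Scale back down so the mix stays in the same range as the input.
    return [v / (echo_count + 1) for v in out]


# Usage: one second of a 200 Hz tone sampled at 8000 Hz (assumed rate).
rate = 8000
tone = [math.sin(2.0 * math.pi * 200.0 * n / rate) for n in range(rate)]
machine_voice = add_short_echoes(band_limit(tone, rate), rate)
```

A real implementation would probably use a sharper filter from a signal-processing library, but the order of operations (band-limit first, then echo) is the part taken from the description above.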
Before I go on, I feel the need to point out the obvious fact that, although it is easy to filter out and discard information, it is a lot harder to recreate it once it is lost.
This "machine voice" has to be sampled so that it can be identified as being a voice or noise. If it is a voice, which sound is it? In the spirit of the above warning, I would suggest about 30 samples per second. This will give the comparison engine several chances to recognize each spoken sound. The comparison engine performs the simple task of comparing the current sample from the input to a set of stored examples that have been passed through the same filter. If the comparison engine finds a match, it issues the output that matches the input. If the current sample does not match any of the stored examples, then it issues an output equal to pressing the space bar.
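The comparison engine described above could look something like the following sketch. It assumes, purely for illustration, that each of the 30 samples per second has already been reduced to a short list of numbers, and that "a match" means being within some distance of a stored example. The names `compare_frame`, `transcribe_frames` and `max_distance`, the three-number samples, and the threshold of 0.5 are all hypothetical.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two equal-length lists of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def compare_frame(frame, stored_examples, max_distance=0.5):
    """Return the letter of the closest stored example, or a space.

    A space is returned when nothing is within max_distance, which
    mirrors "an output equal to pressing the space bar".
    """
    best_symbol, best_distance = " ", max_distance
    for symbol, example in stored_examples.items():
        d = frame_distance(frame, example)
        if d <= best_distance:
            best_symbol, best_distance = symbol, d
    return best_symbol


def transcribe_frames(frames, stored_examples, max_distance=0.5):
    """Run the comparison on every sample and join the letters."""
    return "".join(
        compare_frame(f, stored_examples, max_distance) for f in frames
    )


# Made-up stored examples: one per sound, already passed through the filter.
stored = {
    "k": [1.0, 0.0, 0.0],
    "w": [0.0, 1.0, 0.0],
    "i": [0.0, 0.0, 1.0],
}
frames = [
    [0.9, 0.1, 0.0],
    [1.0, 0.0, 0.1],
    [0.1, 0.9, 0.0],
    [0.0, 0.1, 0.95],
    [0.4, 0.4, 0.4],   # matches nothing closely enough
    [0.95, 0.0, 0.0],
]
print(repr(transcribe_frames(frames, stored)))  # prints 'kkwi k'
```

Whatever tool does the matching, the contract is the same: one letter or one space out for every sample in.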
In Pitman shorthand, vowels are mainly recorded only when there is a risk of confusion. The shorthand typist program will need to record all of the vowel sounds and put out an appropriate single letter for every vowel sound. The reason is that the shorthand typist program does not have the intelligence of even the least capable human shorthand typist. Pitman shorthand does not use the letters q, x, and c. "Quick", if you think about it, is pronounced with the sounds "kwik". C does not make a unique sound of its own; it makes either an s or a k sound. It will be most sensible to record the sound and let the spell checker choose the correct spelling.
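To make the point about q, x and c concrete, here is a small sketch that respells an ordinary word using only the remaining letters. It is a deliberately crude illustration: the function name `sound_spelling` and its rules (qu becomes kw, x becomes ks, c becomes s before e, i or y and k otherwise) are assumptions made for this example, and real English spelling has many exceptions (for instance "ch") that it ignores.

```python
def sound_spelling(word):
    """Respell a word without q, x or c, using very rough rules."""
    word = word.lower()
    out = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch == "q":
            out.append("kw")
            if nxt == "u":
                i += 1  # treat "qu" as a single unit
        elif ch == "x":
            out.append("ks")
        elif ch == "c":
            if nxt == "k":
                pass  # "ck": the following k already carries the sound
            elif nxt in ("e", "i", "y"):
                out.append("s")
            else:
                out.append("k")
        else:
            out.append(ch)
        i += 1
    return "".join(out)


print(sound_spelling("quick"))  # prints kwik
print(sound_spelling("city"))   # prints sity
print(sound_spelling("box"))    # prints boks
```

Something along these lines could be one way to build the phonetic side of the spell checker's word list from an ordinary dictionary.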
In the original post I focused on the use of a chatbot program to perform the function of the comparison engine. But whatever tool is used, what matters is the function it performs: matching each sample against the stored examples and issuing either the matching letter or a space. The output from the comparison engine will be added to an internal working text document, as 30 letters or spaces per second. The reason for going for 30 samplings per second is to be sure that no spoken sounds get missed. The excessive duplication can be easily dealt with by a spell checker. This spell checker would need to be set up to convert the phonetically spelled words into their correctly spelled words. There is an obvious problem caused by homophones (spoken words that have different spellings and meanings) and false homophones (words that sound very close to each other but which a human being would not confuse). It will probably take a few generations of this program, with improved spell checkers and grammar checkers, to arrive at a finished version.
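The clean-up stage could be sketched as follows. This is a minimal illustration under stated assumptions rather than a real spell checker: the function names and the tiny `PHONETIC_DICTIONARY` are invented for the example, and the approach of collapsing every run of repeated letters would also merge a sound that is genuinely spoken twice in a row, which a real version would have to handle.

```python
from itertools import groupby


def collapse_repeats(raw):
    """Reduce each run of identical characters to a single character."""
    return "".join(ch for ch, _ in groupby(raw))


def to_phonetic_words(raw):
    """Turn the 30-per-second letter stream into phonetic words."""
    return collapse_repeats(raw).split()


# Hypothetical phonetic dictionary: sound spelling -> possible words.
PHONETIC_DICTIONARY = {
    "kwik": ["quick"],
    "foks": ["fox"],
    "tu": ["to", "too", "two"],  # homophones: more than one candidate
}


def spell_candidates(phonetic_word, dictionary=PHONETIC_DICTIONARY):
    """Look up correctly spelled candidates; fall back to the input."""
    return dictionary.get(phonetic_word, [phonetic_word])


raw_stream = "kkkwwwiiikk   fffoookkksss  "
words = to_phonetic_words(raw_stream)
print(words)                                 # prints ['kwik', 'foks']
print([spell_candidates(w) for w in words])  # prints [['quick'], ['fox']]
print(spell_candidates("tu"))                # prints ['to', 'too', 'two']
```

Where a lookup returns several candidates, as with "tu", choosing between them is the job left for the grammar checker mentioned above.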
If you know of anyone who can develop this idea please pass it along.