Okay, so from what I gather, the biggest difference between the old SAPI 5.1 and SAPI 5.3 is that SAPI 5.3 takes less code to do some of the same things. This is just a simple example that says it is running Notepad and then runs it.
One thing to remember is that the name of the speech library has changed again. Now you have to do this: Project -> Add Reference -> .NET -> System.Speech
CODE
Imports System.Speech

Public Class Form1
    Public WithEvents recognizer As New System.Speech.Recognition.SpeechRecognitionEngine
    Dim gram As New System.Speech.Recognition.DictationGrammar()
    Public synth As New System.Speech.Synthesis.SpeechSynthesizer
    Dim cmd As String

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        ' Load the grammar, listen on the default input device, start one async recognition.
        recognizer.LoadGrammar(gram)
        recognizer.SetInputToDefaultAudioDevice()
        recognizer.RecognizeAsync()
    End Sub

    Private Sub GotSpeech(ByVal sender As System.Object, ByVal phrase As System.Speech.Recognition.SpeechRecognizedEventArgs) Handles recognizer.SpeechRecognized
        cmd = phrase.Result.Text
        Dim words() As String = cmd.Split(" "c)
        ' Need at least two words, and the first one has to be "Run" or "run".
        If words.Length > 1 AndAlso (words(0) = "Run" OrElse words(0) = "run") Then
            If words(1) = "Notepad" OrElse words(1) = "notepad" Then
                synth.Speak("Running Notepad.")
                Shell("notepad.exe", AppWinStyle.NormalFocus, False)
            End If
        End If
    End Sub
End Class
Now some explanation.
cmd is the text of what the speech recognition engine decided was said. This simple example lets someone say "run notepad" and Notepad will start. You can adapt this to pretty much any purpose in an app.
Also, I check both "Run" and "run" because I'm not sure whether the recognizer will capitalize the first letter or not.
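If checking every capitalization by hand gets tedious, a case-insensitive comparison should cover it. This is only a rough sketch: CommandParser and GetRunTarget are names I made up for illustration, not anything from System.Speech.

```vbnet
Imports System

Module CommandParser
    ' Hypothetical helper: returns the word that follows "run" (lower-cased),
    ' or Nothing when the phrase is not a "run <something>" command.
    Function GetRunTarget(ByVal phrase As String) As String
        If String.IsNullOrEmpty(phrase) Then Return Nothing

        ' Split on spaces and drop empty entries so double spaces do not matter.
        Dim words() As String = phrase.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
        If words.Length < 2 Then Return Nothing

        ' OrdinalIgnoreCase matches "Run", "run", "RUN", and so on.
        If Not String.Equals(words(0), "run", StringComparison.OrdinalIgnoreCase) Then Return Nothing

        Return words(1).ToLowerInvariant()
    End Function
End Module
```

Inside GotSpeech you could then write something like If GetRunTarget(cmd) = "notepad" Then ... and not worry about how the recognizer capitalized the phrase.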
gram is a DictationGrammar object. As it turns out, there are two grammar objects in SAPI 5.3. As you've probably already guessed, DictationGrammar is for free-form spoken language and the normal Grammar object is for simple grammar rules.
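For comparison, here is roughly what the rule-based kind looks like, built with Choices and GrammarBuilder from System.Speech.Recognition. I have not tuned this at all; the module name, the function name, and the "calculator" entry are placeholders I picked for the example.

```vbnet
Imports System.Speech.Recognition

Module CommandGrammarSketch
    ' Builds a grammar that only matches "run notepad" or "run calculator".
    Function BuildRunGrammar() As Grammar
        ' The set of words allowed after "run" (example values only).
        Dim programs As New Choices("notepad", "calculator")

        ' The phrase is the fixed word "run" followed by one of the choices.
        Dim builder As New GrammarBuilder("run")
        builder.Append(programs)

        Return New Grammar(builder)
    End Function
End Module
```

You would load it with recognizer.LoadGrammar(BuildRunGrammar()) in place of the dictation grammar. The recognizer then only has to pick between a couple of known phrases, which I would expect to work better for commands than open dictation.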
So basically: load the grammar, set the audio input to the default device, and then start an asynchronous recognize call. There is also a blocking call, Recognize(). One thing to note here is that RecognizeAsync(), called with no arguments like this, turns itself off after each successfully recognized phrase. That means you have to call RecognizeAsync() again at the end of the GotSpeech sub if you want to keep picking up voice commands. As it stands, this will pick up one command and no more because recognition stops.
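Another way to keep listening, as far as I can tell, is to pass RecognizeMode.Multiple to RecognizeAsync so the engine does not stop after the first phrase. A minimal sketch, with ContinuousListener being a class name I made up:

```vbnet
Imports System
Imports System.Speech.Recognition

Public Class ContinuousListener
    Private WithEvents recognizer As New SpeechRecognitionEngine()

    Public Sub Start()
        recognizer.LoadGrammar(New DictationGrammar())
        recognizer.SetInputToDefaultAudioDevice()
        ' Multiple mode keeps recognizing until it is explicitly stopped.
        recognizer.RecognizeAsync(RecognizeMode.Multiple)
    End Sub

    Private Sub OnSpeech(ByVal sender As Object, ByVal e As SpeechRecognizedEventArgs) Handles recognizer.SpeechRecognized
        ' Fires once per recognized phrase; no need to restart recognition here.
        Console.WriteLine(e.Result.Text)
    End Sub

    Public Sub [Stop]()
        ' Lets the current recognition finish, then stops listening.
        recognizer.RecognizeAsyncStop()
    End Sub
End Class
```

In the Form1 example the equivalent change would be swapping recognizer.RecognizeAsync() for recognizer.RecognizeAsync(System.Speech.Recognition.RecognizeMode.Multiple) in Form1_Load.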
One neat thing about the synthesis engine is that you can give it things like ".NET" and Anna will say "dot net". Also, as it stands, the periods aren't spoken, so they are pretty much useless here. It also acts just like the Vista recognition engine, so if you spell out a word, Anna will spell it out too.
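If you want to play with the synthesizer on its own, something like this console sketch should do. The PromptBuilder and SayAs.SpellOut part is my own addition for forcing a word to be spelled out; it is not something the example above uses, and the module name is made up.

```vbnet
Imports System.Speech.Synthesis

Module SynthDemo
    Sub Main()
        Using synth As New SpeechSynthesizer()
            synth.SetOutputToDefaultAudioDevice()

            ' Plain text: the installed voice decides how ".NET" is pronounced.
            synth.Speak("Welcome to .NET.")

            ' Assumption: asking for a spelled-out word via a say-as hint.
            Dim prompt As New PromptBuilder()
            prompt.AppendText("Notepad is spelled ")
            prompt.AppendTextWithHint("notepad", SayAs.SpellOut)
            synth.Speak(prompt)
        End Using
    End Sub
End Module
```

Speak blocks until the voice is done, which is fine for a quick test. In a form you would probably want SpeakAsync so the UI does not freeze while Anna is talking.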
Anyways, hope it helps someone out there. If you want to know more, well, MSDN is your friend.