docs/doc/en/audio/digit.md
Lines changed: 17 additions & 11 deletions
@@ -13,15 +13,15 @@ update:
## Maix-Speech
-
[`Maix-Speech`](https://github.com/sipeed/Maix-Speech) is an offline speech library specifically designed for embedded environments. It features deep optimization of speech recognition algorithms, achieving a significant lead in memory usage while maintaining excellent WER. For more details on the principles, please refer to the open-source project.
+
[`Maix-Speech`](https://github.com/sipeed/Maix-Speech) is an offline speech recognition library specifically designed for embedded environments. It has been deeply optimized for speech recognition algorithms, significantly reducing memory usage while maintaining excellent recognition accuracy. For detailed information, please refer to the [Maix-Speech Documentation](https://github.com/sipeed/Maix-Speech/blob/master/usage_zh.md).
speech.init(nn.SpeechDevice.DEVICE_MIC, "hw:0,0") # Specify the audio input device
```
-
- This uses the onboard microphone and supports both `WAV` and `PCM` audio as input devices.
+
- This uses the onboard microphone and supports both `WAV` and `PCM` audio as input.
```python
speech.init(nn.SpeechDevice.DEVICE_WAV, "path/audio.wav") # Using WAV audio input
@@ -74,11 +74,10 @@ speech.init(nn.SpeechDevice.DEVICE_PCM, "path/audio.pcm") # Using PCM audio in
arecord -d 5 -r 16000 -c 1 -f S16_LE audio.wav
```
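The `arecord` flags above request 16 kHz, mono, signed 16-bit little-endian samples. As a minimal sketch, a WAV file from another source could be checked against that same format with Python's standard `wave` module before it is passed to `speech.init`. The helper name `matches_arecord_format` is hypothetical and not part of MaixPy.

```python
import wave


def matches_arecord_format(path: str) -> bool:
    """Return True if the WAV file is 16 kHz, mono, 16-bit PCM."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == 16000   # matches -r 16000
            and wav.getnchannels() == 1   # matches -c 1
            and wav.getsampwidth() == 2   # matches -f S16_LE (2 bytes per sample)
        )
```

A file that fails this check would likely need to be resampled or re-recorded before recognition.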
-
- When recognizing `PCM/WAV` , if you want to reset the data source, such as for the next WAV file recognition, you can use the `speech.devive` method, which will automatically clear the cache:
-
+
- When recognizing `PCM/WAV` , if you want to reset the data source, such as for the next WAV file recognition, you can use the `speech.device` method, which will automatically clear the cache:
-
-Users can register several decoders (or none), which decode the results from the acoustic model and execute the corresponding user callback. Here, a `digit` decoder is registered to output the Chinese digit recognition results from the last 4 seconds. The returned recognition results are in string format and support `0123456789 .(dot) S(ten) B(hundred) Q(thousand) W(ten thousand)`. For other decoder usages, please refer to the sections on Real-time voice recognition and keyword recognition.
+
-The user can configure multiple decoders simultaneously. `digit` decoder is registered to output the Chinese digit recognition results from the last 4 seconds. The returned recognition results are in string format and support `0123456789 .(dot) S(ten) B(hundred) Q(thousand) W(ten thousand)`.
- When setting the `digit` decoder, you need to specify a `blank` value; exceeding this value (in ms) will insert a `_` in the output results to indicate idle silence.
-
- After registering the decoder, use the `speech.deinit()` method to clear the initialization.
+
- If a decoder is no longer needed, you can deinitialize it by calling the `speech.dec_deinit` method.
+
+
```python
+
speech.dec_deinit(nn.SpeechDecoder.DECODER_DIG)
+
```
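The `digit` result string described above uses `_` to mark idle silence. A minimal sketch of splitting such a string into separately spoken groups in plain Python follows. `split_digit_result` is a hypothetical helper, and the sample string is made up for illustration rather than captured from a device.

```python
def split_digit_result(result: str) -> list[str]:
    """Split a digit-decoder result string on the '_' silence marker."""
    # Leading, trailing, or repeated '_' produce empty pieces; drop them.
    return [group for group in result.split("_") if group]


# Illustrative input only, not real decoder output.
print(split_digit_result("_12_3.5__7"))
```

Each returned group could then be interpreted on its own, for example as one spoken number.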
5. Recognition
@@ -102,12 +105,15 @@ while not app.need_exit():
    frames = speech.run(1)
    if frames <1:
        print("run out\n")
-
        speech.deinit()
        break
```
- Use the `speech.run` method to run speech recognition. The parameter specifies the number of frames to run each time, returning the actual number of frames processed. Users can choose to run 1 frame each time and then perform other processing, or run continuously in a single thread, stopping it with an external thread.
+
- To clear the cache of recognized results, you can use the `speech.clear` method.
+
+
- When switching decoders during recognition, the first frame after the switch may produce incorrect results. You can use `speech.skip_frames(1)` to skip the first frame and ensure the accuracy of subsequent results.
+
### Recognition Results
If the above program runs successfully, speaking into the onboard microphone will yield continuous Chinese digit recognition results, such as:
docs/doc/en/audio/keyword.md
Lines changed: 17 additions & 11 deletions
@@ -13,15 +13,15 @@ update:
## Maix-Speech
-
[`Maix-Speech`](https://github.com/sipeed/Maix-Speech) is an offline speech library specifically designed for embedded environments. It features deep optimization of speech recognition algorithms, achieving a significant lead in memory usage while maintaining excellent WER. For more details on the principles, please refer to the open-source project.
+
[`Maix-Speech`](https://github.com/sipeed/Maix-Speech) is an offline speech recognition library specifically designed for embedded environments. It has been deeply optimized for speech recognition algorithms, significantly reducing memory usage while maintaining excellent recognition accuracy. For detailed information, please refer to the [Maix-Speech Documentation](https://github.com/sipeed/Maix-Speech/blob/master/usage_zh.md).
speech.init(nn.SpeechDevice.DEVICE_MIC, "hw:0,0") # Specify the audio input device
```
-
- This uses the onboard microphone and supports both `WAV` and `PCM` audio as input devices.
+
- This uses the onboard microphone and supports both `WAV` and `PCM` audio as input.
```python
speech.init(nn.SpeechDevice.DEVICE_WAV, "path/audio.wav") # Using WAV audio input
@@ -81,11 +81,10 @@ speech.init(nn.SpeechDevice.DEVICE_PCM, "path/audio.pcm") # Using PCM audio in
arecord -d 5 -r 16000 -c 1 -f S16_LE audio.wav
```
-
- When recognizing `PCM/WAV` , if you want to reset the data source, such as for the next WAV file recognition, you can use the `speech.devive` method, which will automatically clear the cache:
-
+
- When recognizing `PCM/WAV` , if you want to reset the data source, such as for the next WAV file recognition, you can use the `speech.device` method, which will automatically clear the cache:
-
-Users can register several decoders (or none), which decode the results from the acoustic model and execute the corresponding user callback. Here, a `kws` decoder is registered to output a list of probabilities for all registered keywords from the last frame. Users can observe the probability values and set their own thresholds for activation. For other decoder usages, please refer to the sections on Real-time voice recognition and continuous Chinese numeral recognition.
+
-The user can configure multiple decoders simultaneously. `kws` decoder is registered to output a list of probabilities for all registered keywords from the last frame. Users can observe the probability values and set their own thresholds for activation.
- When setting up the `kws` decoder, you need to provide a `keyword list` separated by spaces in Pinyin, a `keyword probability threshold list` arranged in order, and specify whether to enable `automatic near-sound processing`. If set to `True`, different tones of the same Pinyin will be treated as similar words to accumulate probabilities. Finally, you need to set a callback function to handle the decoded data.
-
- After registering the decoder, use the `speech.deinit()` method to clear the initialization.
+
- If a decoder is no longer needed, you can deinitialize it by calling the `speech.dec_deinit` method.
+
+
```python
+
speech.dec_deinit(nn.SpeechDecoder.DECODER_KWS)
+
```
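The threshold check described above can be done in plain Python inside the decoder callback. This is a minimal sketch, assuming the callback receives one probability per registered keyword in registration order. The function name, keyword strings, and threshold values are illustrative assumptions, not taken from the MaixPy API.

```python
def triggered_keywords(keywords, probabilities, thresholds):
    """Return the keywords whose probability meets or exceeds its own threshold."""
    return [
        keyword
        for keyword, prob, limit in zip(keywords, probabilities, thresholds)
        if prob >= limit
    ]


# Hypothetical Pinyin keywords, thresholds, and probabilities, for illustration only.
keywords = ["xiao3 ai4 tong2 xue2", "ni3 hao3"]
thresholds = [0.3, 0.5]
print(triggered_keywords(keywords, [0.42, 0.10], thresholds))
```

Keeping the comparison in a small pure function makes it easy to tune thresholds against logged probabilities without the device attached.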
5. Recognition
@@ -123,12 +126,15 @@ while not app.need_exit():
    frames = speech.run(1)
    if frames <1:
        print("run out\n")
-
        speech.deinit()
        break
```
- Use the `speech.run` method to run speech recognition. The parameter specifies the number of frames to run each time, returning the actual number of frames processed. Users can choose to run 1 frame each time and then perform other processing, or run continuously in a single thread, stopping it with an external thread.
+
- To clear the cache of recognized results, you can use the `speech.clear` method.
+
+
- When switching decoders during recognition, the first frame after the switch may produce incorrect results. You can use `speech.skip_frames(1)` to skip the first frame and ensure the accuracy of subsequent results.
+
### Recognition Results
If the above program runs successfully, speaking into the onboard microphone will yield keyword recognition results, such as:
docs/doc/en/audio/recognize.md
Lines changed: 17 additions & 11 deletions
@@ -13,15 +13,15 @@ update:
## Maix-Speech
-
[`Maix-Speech`](https://github.com/sipeed/Maix-Speech) is an offline speech library specifically designed for embedded environments. It features deep optimization of speech recognition algorithms, achieving a significant lead in memory usage while maintaining excellent WER. For more details on the principles, please refer to the open-source project.
+
[`Maix-Speech`](https://github.com/sipeed/Maix-Speech) is an offline speech recognition library specifically designed for embedded environments. It has been deeply optimized for speech recognition algorithms, significantly reducing memory usage while maintaining excellent recognition accuracy. For detailed information, please refer to the [Maix-Speech Documentation](https://github.com/sipeed/Maix-Speech/blob/master/usage_zh.md).
speech.init(nn.SpeechDevice.DEVICE_MIC, "hw:0,0") # Specify the audio input device
```
-
- This uses the onboard microphone and supports both `WAV` and `PCM` audio as input devices.
+
- This uses the onboard microphone and supports both `WAV` and `PCM` audio as input.
```python
speech.init(nn.SpeechDevice.DEVICE_WAV, "path/audio.wav") # Using WAV audio input
@@ -78,11 +78,10 @@ speech.init(nn.SpeechDevice.DEVICE_PCM, "path/audio.pcm") # Using PCM audio in
arecord -d 5 -r 16000 -c 1 -f S16_LE audio.wav
```
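Because both `WAV` and `PCM` input are supported, a recording made with the command above can be turned into headerless PCM by copying only its sample data. This is a minimal sketch using Python's standard `wave` module. The function name and paths are hypothetical, and it assumes the raw PCM input expects the same 16 kHz mono 16-bit layout as the WAV recording.

```python
import wave


def wav_to_pcm(wav_path: str, pcm_path: str) -> int:
    """Copy the raw sample bytes of a WAV file into a headerless PCM file."""
    with wave.open(wav_path, "rb") as wav:
        frames = wav.readframes(wav.getnframes())
    with open(pcm_path, "wb") as pcm:
        pcm.write(frames)
    return len(frames)  # number of bytes written
```

The returned byte count divided by 2 gives the number of 16-bit mono samples, which is a quick sanity check on the conversion.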
-
- When recognizing `PCM/WAV` , if you want to reset the data source, such as for the next WAV file recognition, you can use the `speech.devive` method, which will automatically clear the cache:
-
+
- When recognizing `PCM/WAV` , if you want to reset the data source, such as for the next WAV file recognition, you can use the `speech.device` method, which will automatically clear the cache:
-
-Users can register several decoders (or none), which decode the results from the acoustic model and execute the corresponding user callback. Here, a `lvcsr` decoder is registered to output continuous speech recognition results (for fewer than 1024 Chinese characters). For other decoder usages, please refer to the sections on continuous Chinese numeral recognition and keyword recognition.
+
-The user can configure multiple decoders simultaneously. `lvcsr` decoder is registered to output continuous speech recognition results (for fewer than 1024 Chinese characters).
- When setting up the `lvcsr` decoder, you need to specify the paths for the `sfst` file, the `sym` file (output symbol table), the path for `phones.bin` (phonetic table), and the path for `words.bin` (dictionary). Lastly, a callback function must be set to handle the decoded data.
-
- After registering the decoder, use the `speech.deinit()` method to clear the initialization.
+
- If a decoder is no longer needed, you can deinitialize it by calling the `speech.dec_deinit` method.
+
+
```python
+
speech.dec_deinit(nn.SpeechDecoder.DECODER_LVCSR)
+
```
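The `lvcsr` decoder described above needs four files, so a missing path is an easy mistake. This is a minimal sketch of a pre-flight check with `pathlib`. `missing_model_files` is a hypothetical helper, and the directory and file names in the usage lines are placeholders, not the names shipped with MaixPy.

```python
from pathlib import Path


def missing_model_files(*paths: str) -> list[str]:
    """Return the given paths that do not exist as regular files."""
    return [p for p in paths if not Path(p).is_file()]


# Placeholder paths: substitute the sfst, sym, phones.bin and words.bin files you use.
missing = missing_model_files(
    "models/example.sfst",
    "models/example.sym",
    "models/phones.bin",
    "models/words.bin",
)
if missing:
    print("Missing decoder files:", ", ".join(missing))
```

Running a check like this before registering the decoder gives a clearer error than a failure inside the recognition loop.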
5. Recognition
@@ -110,12 +113,15 @@ while not app.need_exit():
    frames = speech.run(1)
    if frames <1:
        print("run out\n")
-
        speech.deinit()
        break
```
- Use the `speech.run` method to run speech recognition. The parameter specifies the number of frames to run each time, returning the actual number of frames processed. Users can choose to run 1 frame each time and then perform other processing, or run continuously in a single thread, stopping it with an external thread.
+
- To clear the cache of recognized results, you can use the `speech.clear` method.
+
+
- When switching decoders during recognition, the first frame after the switch may produce incorrect results. You can use `speech.skip_frames(1)` to skip the first frame and ensure the accuracy of subsequent results.
+
### Recognition Results
If the above program runs successfully, speaking into the onboard microphone will yield real-time speech recognition results, such as: