add a kaldi rule (at least for sleep/wake)
Is your feature request related to a problem? Please describe.
Dragon has a bunch of built-in commands that help with use. It would be nice for switchers to have a Kaldi rule loosely based on the functionality available in base Dragon.
Describe the solution you'd like
A Kaldi grammar with the following features:
- [x] Sleep/wake
- [ ] An automatic program opener. In Dragon, you say "open " and it automatically finds and opens the program you want. It seems to be very good at automapping these utterances to program .exe's. I'm not sure how it works.
- [x] An automatic program switcher. In Dragon, you say "switch to ".
- [x] A universal button presser. I think this is just a matter of adding the buttons that are missing in hitinnav.py.
- [x] A "maximize window" command that works correctly.
To add sleep/wake, we need to translate the grammar in this file into Caster.
@daanzu if you have any pointers or know of someone who has done this please let us know.
This potentially isn't too hard to implement, and it would be relevant to all engines, not just Kaldi. Kaldi makes this work through set_exclusiveness(). An exclusive grammar takes precedence over all other active grammars: while any grammar is exclusive, only rules in exclusive grammars are available for recognition.
Note that setting a grammar to exclusive overrides DNS's built-in sleep/wake function. When using DNS, call natlink.setMicState("sleeping") so that the mic state matches the state of the grammar.
Another option: a FuncContext with a mapping rule.
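To make the exclusiveness behavior described above concrete, here is a toy model in plain Python. It does not use dragonfly: `GrammarState` and `recognizable` are hypothetical names invented for this sketch, and the filtering rule is only a reading of the description in this thread, not the engine's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class GrammarState:
    """Hypothetical stand-in for a loaded grammar (not a dragonfly class)."""
    name: str
    enabled: bool = True
    exclusive: bool = False


def recognizable(grammars):
    """Return the names of the grammars that can currently be recognized."""
    enabled = [g for g in grammars if g.enabled]
    exclusive = [g for g in enabled if g.exclusive]
    # As soon as one enabled grammar is exclusive, all non-exclusive grammars are ignored.
    return [g.name for g in (exclusive or enabled)]


grammars = [GrammarState("navigation"), GrammarState("sleep")]
print(recognizable(grammars))  # both grammars are heard
grammars[1].exclusive = True   # the "go to sleep" state
print(recognizable(grammars))  # only the sleep grammar is heard
```

In this model, waking up is just setting `exclusive` back to False on the sleep grammar.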
Overriding the DNS default commands for sleep/wake would be nice because you can use shorter commands such as "snore" to put the mic to sleep. Only downside is I'm not sure if we could get the taskbar icon toggling from green to blue. I think the Kaldi implementation is more important regardless.
> Only downside is I'm not sure if we could get the taskbar icon toggling from green to blue.

Fortunately, I believe this can be handled by natlink.setMicState(state). According to the documentation it controls the mic, where state is 'on', 'off' or 'sleeping'; natlink.getMicState() returns the current state. Therefore the DNS icon could be kept in sync with the exclusive grammar state.
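A minimal sketch of keeping the two in sync, assuming the grammar object exposes set_exclusiveness() as described earlier in this thread. `SleepController` is a hypothetical name, and the mic setter is injected so the same code can run on engines without natlink. Under DNS you would presumably pass `natlink.setMicState`; how DNS treats the wake command while the mic is 'sleeping' is not something this sketch checks.

```python
class SleepController(object):
    """Hypothetical helper: toggles an exclusive grammar and mirrors the mic state."""

    def __init__(self, grammar, set_mic_state=None):
        self.grammar = grammar              # anything with set_exclusiveness(bool)
        self.set_mic_state = set_mic_state  # e.g. natlink.setMicState under DNS, else None
        self.asleep = False

    def sleep(self):
        self._apply(True)

    def wake(self):
        self._apply(False)

    def _apply(self, asleep):
        self.asleep = asleep
        # While asleep, only the (exclusive) sleep grammar should be recognized.
        self.grammar.set_exclusiveness(asleep)
        if self.set_mic_state is not None:
            # 'sleeping' and 'on' are the state names quoted from the natlink docs above.
            self.set_mic_state("sleeping" if asleep else "on")
```

With dragonfly, `sleep` and `wake` could be bound to spoken commands through `Function` actions in a mapping rule, which would also allow shorter phrases such as "snore".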
I agree, though, that the implementation is more important for WSR/Kaldi.
> I agree, though, that the implementation is more important for WSR/Kaldi.

Regarding Kaldi, would implementation involve changing content_loader.py, or does this rule operate independently of how you load the other rules?
Fortunately we don't have to change anything in Caster to make grammars exclusive. It's a simple bool, and it works on any rule. The rule must already be loaded into the engine before it's set to be exclusive. Once set, no commands will be recognized except those in the one or more rules that are exclusive.
@lexxish did you ever figure out getting sleep to work?
@lexxish
With straight dragonfly this would be pretty easy. With Caster it's a bit different, because we don't know the grammar name being used: it's different every boot. I've been working on programmatically switching DNS modes in preparation for creating a unified mode manager for all engines. The following could be used in the sleep grammar:
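
    from dragonfly import get_engine

    grammar_cache = None  # cached at module level so the search only runs once

    def find_grammar_name():
        """Return the loaded grammar that contains the exported "Mode Rules" rule."""
        global grammar_cache
        if grammar_cache is None:
            for grammar in get_engine().grammars:
                for rule in grammar.rules:
                    # Grammar names change every boot, so match on the rule name instead.
                    if rule.exported and rule.name == "Mode Rules":
                        grammar_cache = grammar
                        return grammar_cache
        return grammar_cache
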
In another function you can then use grammar_cache.set_exclusiveness(0) or grammar_cache.set_exclusiveness(1) to toggle exclusiveness.
You can also check the running engine type if there are differences that need to be handled based on the engine implementation. For example, with DNS:

    if get_engine()._name == 'natlink':
        import natlink
        # Do something

> A "maximize window" command that works correctly.

What's wrong with the current behavior @kendonB?

> An automatic program switcher. In Dragon, you say "switch to ".

Besides creating a GUI, the backend information could be obtained by tweaking the function below to use get_all_windows(), returning a list for all PIDs instead of only Window.get_foreground():

    import re
    from pathlib import Path
    from dragonfly import Window

    # get_active_window_path() is another Caster helper, assumed to be defined in the same module.
    def get_active_window_info():
        '''Returns foreground window executable_file, executable_path, title, handle, classname'''
        FILENAME_PATTERN = re.compile(r"[/\\]([\w_ ]+\.[\w]+)")
        window = Window.get_foreground()
        executable_path = str(Path(get_active_window_path()))
        match_object = FILENAME_PATTERN.findall(window.executable)
        executable_file = None
        if len(match_object) > 0:
            executable_file = match_object[0]
        return [executable_file, executable_path, window.title, window.handle, window.classname]

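The tweak suggested above could look roughly like this sketch. It accepts any iterable of window-like objects with `executable`, `title`, `handle` and `classname` attributes, so with dragonfly you would pass in `Window.get_all_windows()`. `collect_window_info` is a hypothetical name. Returning dictionaries and taking the last regex match rather than the first are choices made for this sketch, not part of the existing function.

```python
import re

# Same pattern as get_active_window_info(): a path separator followed by "name.ext".
FILENAME_PATTERN = re.compile(r"[/\\]([\w_ ]+\.[\w]+)")


def collect_window_info(windows):
    """Return one dict per window with its executable file, path, title, handle and class."""
    results = []
    for window in windows:
        path = window.executable or ""
        names = FILENAME_PATTERN.findall(path)
        results.append({
            # The last match is the file name; earlier matches can be dotted folder names.
            "executable_file": names[-1] if names else None,
            "executable_path": path,
            "title": window.title,
            "handle": window.handle,
            "classname": window.classname,
        })
    return results
```

A "switch to" command could then search the `title` and `executable_file` fields of every entry instead of only inspecting the foreground window.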
> @lexxish did you ever figure out getting sleep to work?

I have not tried yet. Will update you all if I do.
I do have some "switch to"-like code I can post if anyone wants it. I use a phonetic distance library to choose the best match based on what is currently running. I also have an "open"-like command that searches a couple of directories (e.g. the desktop). It's not perfect, and I think the way "bring" allows you to specify programs is also nice for things you use a lot.
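The matching idea can be illustrated without the actual code. Since the phonetic distance library is not named here, this sketch swaps in plain string similarity from Python's standard `difflib` as a stand-in. `best_window_match` is a hypothetical name and the 0.5 cutoff is an arbitrary assumption; a real phonetic metric would likely cope better with misrecognized words.

```python
import difflib


def best_window_match(spoken, candidates, cutoff=0.5):
    """Return the candidate name closest to the spoken text, or None if nothing is close."""
    lowered = {}
    for name in candidates:
        # Compare case-insensitively but remember the original spelling.
        lowered.setdefault(name.lower(), name)
    matches = difflib.get_close_matches(spoken.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None


running = ["firefox", "notepad", "explorer"]
print(best_window_match("fire fox", running))
```

The candidate list would come from whatever is currently running, for example window titles or executable names.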
Another item that would be nice would be ability to use Kaldi for commands, but DNS for dictation - similar to how I believe Kaldi can be used with Google Speech Recognition.
Last item that would be nice to have (but deserves its own issue number) is integration with accessibility APIs like DNS has, so you can say things like "Click X" when X is a button in a browser.
I could be wrong, but I think Caster's default maximize uses "alt+SPACE, x" to maximize rather than sending the foreground window a maximize message (https://docs.microsoft.com/en-us/windows/win32/learnwin32/window-messages). I don't think "alt+SPACE, x" works for every application, but I can't think of a specific one right now. I believe the same type of scenario exists for closing windows in Caster too, where we could send SIGTERM and/or SIGKILL message equivalents (probably two different voice commands) instead of using keyboard shortcuts, and it would (hopefully) work more consistently.
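For comparison, sending the request to the window directly could look like the sketch below. It uses only `ctypes` and the documented Win32 message constants. The helper names are hypothetical, the functions simply return False on non-Windows platforms, and this is not a description of what Caster does.

```python
import ctypes
import sys

WM_SYSCOMMAND = 0x0112  # system-menu command message
SC_MAXIMIZE = 0xF030    # wParam meaning "maximize"
WM_CLOSE = 0x0010       # polite close request, roughly the SIGTERM analogue


def _post(hwnd, message, wparam=0):
    """Post a message to a window handle; returns False when not running on Windows."""
    if sys.platform != "win32":
        return False
    return bool(ctypes.windll.user32.PostMessageW(hwnd, message, wparam, 0))


def maximize_window(hwnd):
    """Ask the window to maximize itself, as the system menu would."""
    return _post(hwnd, WM_SYSCOMMAND, SC_MAXIMIZE)


def request_close(hwnd):
    """Ask the window to close; the application may still prompt or refuse."""
    return _post(hwnd, WM_CLOSE)
```

The handle would be the foreground window's `handle`, as returned by get_active_window_info() earlier in this thread.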
> I could be wrong, but I think Caster's default maximize uses "alt+SPACE, x" to maximize rather than sending the foreground window a maximize message

Back when implementing Kaldi support, I switched it from "alt+SPACE, x" to dragonfly's cross-platform implementation, which uses Win32 on Windows. If something's not behaving correctly with those minimize/maximize commands, let me know.
https://github.com/dictation-toolbox/Caster/blob/7d3834eed076d39db1f163582d4e457ab71ee5f4/castervoice/rules/core/navigation_rules/window_mgmt_rule.py#L13
https://github.com/dictation-toolbox/Caster/blob/7d3834eed076d39db1f163582d4e457ab71ee5f4/castervoice/lib/utilities.py#L77
> Last item that would be nice to have (but deserves its own issue number) is integration with accessibility APIs like DNS has, so you can say things like "Click X" when X is a button in a browser.

I will open up a new issue. Done: https://github.com/dictation-toolbox/Caster/issues/814
> Another item that would be nice would be ability to use Kaldi for commands, but DNS for dictation - similar to how I believe Kaldi can be used with Google Speech Recognition.

I don't have experience with Natlink, and don't currently have Dragon installed, but I'd be happy to help implement this. Is there a way with Natlink to just get straight dictation recognition text from audio data passed to it? https://github.com/daanzu/kaldi-active-grammar/issues/23
Perhaps there should be an issue in KaldiAG for working on this?
Agreed
@lexxish and @kendonB I will attempt to implement the sleeping grammar and modes for all engines. These modes will override DNS's built-in modes but will be kept in sync with the DNS GUI.
PR https://github.com/dictation-toolbox/Caster/pull/881 addresses the following request:

> An automatic program switcher. In Dragon, you say "switch to ".