Implement IGrammar tokenizeLine2 like vscode-textmate

Open • angelozerr opened this issue 9 years ago • 6 comments

See https://github.com/Microsoft/vscode/issues/16206#issuecomment-265166655 and https://github.com/Microsoft/vscode-textmate/blob/master/src/tests/themedTokenizer.ts#L25

tokenizeLine2 seems to provide:

  • a binary result (perhaps this reduces memory usage?)
  • a result that merges the grammar information with the theme information (perhaps this improves performance?)
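
In vscode-textmate's binary format, each token is described by two ints: its start offset and a 32-bit metadata word. That word packs grammar-derived fields (language ID, standard token type) together with theme-derived fields (font style, foreground and background color IDs). The minimal sketch below shows that packing. It assumes the older vscode-textmate bit layout (offsets 0, 8, 11, 14, 23) and assumes this is the layout tm4e ported. `TokenMetadataPacking` and `pack` are hypothetical names, not part of the tm4e API.

```java
import java.util.Arrays;

/**
 * Minimal sketch (hypothetical class, not the tm4e API) of packing one token's
 * grammar-derived and theme-derived attributes into a single 32-bit int.
 * ASSUMPTION: the older vscode-textmate metadata bit layout below.
 */
public class TokenMetadataPacking {

    // ASSUMED offsets; widths noted in the comments.
    static final int LANGUAGEID_OFFSET = 0;  // 8 bits, from the grammar
    static final int TOKEN_TYPE_OFFSET = 8;  // 3 bits, from the grammar (comment/string/regex)
    static final int FONT_STYLE_OFFSET = 11; // 3 bits, from the theme
    static final int FOREGROUND_OFFSET = 14; // 9 bits, theme color ID
    static final int BACKGROUND_OFFSET = 23; // 9 bits, theme color ID (uses the sign bit)

    /** Packs the five fields; each value must already fit in its bit width. */
    static int pack(int languageId, int tokenType, int fontStyle, int foreground, int background) {
        return (languageId << LANGUAGEID_OFFSET)
                | (tokenType << TOKEN_TYPE_OFFSET)
                | (fontStyle << FONT_STYLE_OFFSET)
                | (foreground << FOREGROUND_OFFSET)
                | (background << BACKGROUND_OFFSET);
    }

    public static void main(String[] args) {
        // A binary line is a flat int[] of (startIndex, metadata) pairs: two
        // tokens need four ints instead of token objects holding scope lists.
        int[] binaryLine = {
            0, pack(0, 0, 0, 1, 2), // token starting at offset 0
            8, pack(0, 0, 1, 3, 2)  // token starting at offset 8, other font style and foreground
        };
        System.out.println(Arrays.toString(binaryLine));
    }
}
```

Because the theme lookup is resolved into the metadata at tokenization time, a renderer only needs bit masks to get colors, instead of matching scope lists against theme rules for every token.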

angelozerr, Jan 27 '17 08:01

@sebthom I see that you are very active on TM4E. Thanks for your contributions!

If you have (a lot of) time, I think it would be really nice to work on this issue. I implemented tokenizeLine2, but I never consumed it. It would be really nice to consume it.

Why use tokenizeLine2? I suggest reading https://code.visualstudio.com/blogs/2017/02/08/syntax-highlighting-optimizations

angelozerr, May 01 '22 19:05

@angelozerr I had a look at tokenizeLine2, but I am unsure whether the current implementation is supposed to work already.

I only ever get two int values back per line. For example, I have the following test:

	@Test
	void testTokenizeLine2() throws Exception {
		final var path = "JavaScript.tmLanguage";
		final var line = "function add(a,b) { return a+b; }";
		try (var in = Data.class.getResourceAsStream(path)) {
			final var grammar = new Registry().loadGrammarFromPathSync(path, in);

			// Classic API: one token object per scope change, with its scope list
			final var lineTokens = grammar.tokenizeLine(line);
			for (final var token : lineTokens.getTokens()) {
				System.out.println("Token from " + token.getStartIndex() + " to " + token.getEndIndex()
						+ " with scopes " + token.getScopes());
			}

			System.out.println("----------");

			// Binary API: a flat int[] result
			final var lineTokens2 = grammar.tokenizeLine2(line);
			for (final int token : lineTokens2.getTokens()) {
				System.out.println(token);
			}
		}
	}

It outputs:

Token from 0 to 8 with scopes [source.js, meta.function.js, storage.type.function.js]
Token from 8 to 9 with scopes [source.js, meta.function.js]
Token from 9 to 12 with scopes [source.js, meta.function.js, entity.name.function.js]
Token from 12 to 13 with scopes [source.js, meta.function.js, meta.function.type.parameter.js, meta.brace.round.js]
Token from 13 to 14 with scopes [source.js, meta.function.js, meta.function.type.parameter.js, parameter.name.js, variable.parameter.js]
Token from 14 to 15 with scopes [source.js, meta.function.js, meta.function.type.parameter.js]
Token from 15 to 16 with scopes [source.js, meta.function.js, meta.function.type.parameter.js, parameter.name.js, variable.parameter.js]
Token from 16 to 17 with scopes [source.js, meta.function.js, meta.function.type.parameter.js, meta.brace.round.js]
Token from 17 to 18 with scopes [source.js, meta.function.js]
Token from 18 to 19 with scopes [source.js, meta.function.js, meta.decl.block.js, meta.brace.curly.js]
Token from 19 to 20 with scopes [source.js, meta.function.js, meta.decl.block.js]
Token from 20 to 26 with scopes [source.js, meta.function.js, meta.decl.block.js, keyword.control.js]
Token from 26 to 28 with scopes [source.js, meta.function.js, meta.decl.block.js]
Token from 28 to 29 with scopes [source.js, meta.function.js, meta.decl.block.js, keyword.operator.arithmetic.js]
Token from 29 to 32 with scopes [source.js, meta.function.js, meta.decl.block.js]
Token from 32 to 33 with scopes [source.js, meta.function.js, meta.decl.block.js, meta.brace.curly.js]
----------
0
16793600
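
Assuming tokenizeLine2 follows the vscode-textmate convention of returning (startIndex, metadata) pairs, the two ints above are a single token that starts at offset 0. The hypothetical decoder below (not part of tm4e) unpacks the metadata with the older vscode-textmate bit masks. If tm4e's `MetadataConsts` use a different layout, the masks would need adjusting.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical decoder (not part of tm4e) for the int[] returned by tokenizeLine2.
 * ASSUMPTION: entries are (startIndex, metadata) pairs and metadata uses the
 * older vscode-textmate bit layout below.
 */
public class BinaryTokenDecoder {

    // ASSUMED masks: language (bits 0-7), token type (8-10), font style (11-13),
    // foreground color ID (14-22), background color ID (23-31).
    static final int LANGUAGEID_MASK = 0x000000FF;
    static final int TOKEN_TYPE_MASK = 0x00000700;
    static final int FONT_STYLE_MASK = 0x00003800;
    static final int FOREGROUND_MASK = 0x007FC000;
    static final int BACKGROUND_MASK = 0xFF800000;

    record DecodedToken(int start, int end, int languageId, int tokenType,
                        int fontStyle, int foreground, int background) { }

    static List<DecodedToken> decode(int[] binary, int lineLength) {
        List<DecodedToken> tokens = new ArrayList<>();
        // Even index = start offset, odd index = packed metadata.
        for (int i = 0; i + 1 < binary.length; i += 2) {
            int metadata = binary[i + 1];
            // A token ends where the next token starts, or at the end of the line.
            int end = (i + 2 < binary.length) ? binary[i + 2] : lineLength;
            tokens.add(new DecodedToken(
                    binary[i], end,
                    metadata & LANGUAGEID_MASK,
                    (metadata & TOKEN_TYPE_MASK) >>> 8,
                    (metadata & FONT_STYLE_MASK) >>> 11,
                    (metadata & FOREGROUND_MASK) >>> 14,
                    // Unsigned shift: the background bits include the sign bit.
                    (metadata & BACKGROUND_MASK) >>> 23));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "function add(a,b) { return a+b; }";
        int[] binary = {0, 16793600}; // the values printed by the test above
        for (DecodedToken t : decode(binary, line.length())) {
            System.out.println(t);
        }
    }
}
```

Under that layout, 16793600 (0x01004000) decodes to language ID 0, token type 0, font style 0, foreground color ID 1 and background color ID 2. That is one token spanning the whole 33-character line with what look like the default theme colors.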

I would have expected to get at least more than two ints back from tokenizeLine2. I also tested it with an old commit, from before I started my refactoring attempts, to make sure I hadn't broken anything along the way, but the behavior there is the same.
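
One plausible explanation assumes tm4e mirrors vscode-textmate's `LineTokens`. When binary tokens are produced there, a token whose metadata equals that of the previously emitted token is skipped, so the earlier token simply extends over it. If no theme is configured, all 16 scope-based tokens above (none of them comments, strings or regexes) would resolve to the same metadata and collapse into a single pair. The sketch below illustrates that rule. `BinaryTokenMerging` and `produceBinary` are hypothetical names, not tm4e methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Sketch of the merging rule, modeled on vscode-textmate's LineTokens: a token
 * whose metadata equals that of the previously emitted token is not emitted,
 * so the previous token extends over it. Hypothetical names throughout.
 */
public class BinaryTokenMerging {

    static int[] produceBinary(int[] startIndices, int[] metadata) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < startIndices.length; i++) {
            // The last element of out is the metadata of the previously emitted token.
            if (!out.isEmpty() && out.get(out.size() - 1) == metadata[i]) {
                continue;
            }
            out.add(startIndices[i]);
            out.add(metadata[i]);
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        // Start offsets of the 16 tokens reported by tokenizeLine above.
        int[] starts = {0, 8, 9, 12, 13, 14, 15, 16, 17, 18, 19, 20, 26, 28, 29, 32};
        // ASSUMPTION: without a theme, every token gets the same metadata.
        int[] metadata = new int[starts.length];
        Arrays.fill(metadata, 16793600);
        System.out.println(Arrays.toString(produceBinary(starts, metadata)));
    }
}
```

This prints `[0, 16793600]`, the same two ints as the test output. Under this assumption, loading a theme whose rules style scopes such as `storage.type` or `entity.name.function` differently should yield more pairs.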

Any thoughts?

sebthom, May 04 '22 18:05

To be honest, when I implemented it I just copy-pasted the code from vscode-textmate and translated it from TypeScript to Java without fully understanding it. If I remember correctly, I did the same for the tests.

I cannot help you more, but I think it would be good to study it, because VS Code uses this strategy and not the old strategy that tm4e is using.

angelozerr, May 04 '22 18:05

Is it possible that embedded grammars will be detected better after this change? Or could this be a problem with the Java Oniguruma implementation?

zulus, Nov 13 '23 13:11

@zulus to be honest, I don't know. I did that to try to get the same behavior as vscode-textmate.

angelozerr, Nov 13 '23 13:11

Thanks, I'll try ;) Currently, Vue grammars behave differently compared to VS Code (and other TextMate-based editors like Nova on macOS). For example, I don't get JavaScript coloring inside v-if, @event, :attribute-bind, etc.

zulus, Nov 13 '23 13:11