Inconsistent variable highlighting?
OS X 10.11.2, Sublime Text 3113, Gravity theme, latest MagicPython.
Trying out MagicPython from PythonImproved. In the first Python file I opened, I see:
X_train and X_test are highlighted purple, but y_train and y_test are white. Capitalizing the Y vars turns them purple.
I'm assuming that any word starting with a capital is treated as a "constant" and colored accordingly. Is this intentional? Wouldn't checking for ALL_CAPS be better?
Yeah, we've tuned the rules to highlight things like PROTOCOL_v2 as constants. I don't think we want to add any additional heuristics there.
Here's an excerpt from MagicPython unittests (since GitHub uses MagicPython, it's highlighted the same as it would be in an editor):
QQQQ QQQQ_123 QQQQ123 PROTOCOL_v2 QQQ.bar baz.AA_a _AAA foo._AAA _A __A ___A
QQQq QQQq123 self.FOOO() _ _1 __1 _1A __1A _a __a __ ___ ___a ___1 __aA ___Aa
I agree with highlighting PROTOCOL_V2, and almost every example in your unit tests. However, the example I used when opening the issue is a case I believe shouldn't be highlighted: X_test. The closest unit tests are QQQq and QQQq123.
There are a significant number of cases, especially in statistics, where a variable may start with a capital. I'd recommend at the very least that QQQq be treated as a variable, and perhaps any name with lowercase letters?
QQQq is already lexed as a variable. I guess I can adjust the grammar to treat names that start with a capital letter and an underscore and have the rest in lowercase as variables. This way X_something will be a variable, but XX_something will be highlighted as a constant. Would that work?
Hmmm...maybe X*_something should be a variable (capital, any number of letters, underscore, anything lowercase)? Lots of code uses Greek letters, for example Theta_coefficients.
Hey! I recently discovered this cool project. Awesome work!
Regarding this issue, I actually use this 'bug' to highlight some variables. It helps with a quick high level view of the flow.
Coincidentally, I am apparently viewing the same script as @pikeas, and I really like how I can keep track of X_train and X_test.
Just adding my two cents.
After months of deliberations (believe it or not, we actually talked about this many times) we've come to the conclusion that single-upper-case-letter constants are probably more of an exception than the typical use case in generic code.
The math module defines its constants in lower case anyway (so it's e and not E). In general, in sciences and math, single-letter constants are frequently lower-case, so we can assume that, for readability, code will use that style for established constants (c, e, g, h, etc.). Likewise, there are some established single-letter constants that are traditionally upper-case (G, R, H_0). These are very specific domains, though. It's likely that within them there is no strong need to add emphasis to these constants in code, as people using them are already conditioned to pay attention to these special symbols.
In a generic program, however, a single-letter variable more likely indicates a temporary intermediate computation result, or similarly a result that does not merit extra emphasis. It is considered good style to give important global concepts (classes, functions or constants) a meaningful human-readable name. If the name is long and readable, but only contains a single upper-case letter, it's more reasonable to conclude that it doesn't merit extra emphasis and is probably not a constant (e.g. X_train or even a single-letter class X).
And I've just released MagicPython v1.0.0 with this change.
It is very good to see that you put in the effort and take this so seriously. I agree now in principle. And in practice, I could easily solve my use case of highlighting interesting variables using some nifty packages that allow multiple, simultaneous highlights. In Atom, I use this: https://atom.io/packages/quick-highlight
Closing this issue. Thanks for reporting and discussion!
We consider something to be a "special constant" if it starts with "enough" (2 or more in this case) upper-case letters. Any leading underscores are ignored for the purpose of this definition. Also any number of underscores and digits are allowed in between the first 2 upper-case letters. To fully satisfy the requirement, the "leading" 2 upper-case letters must be followed by only upper-case letters or digits until the first underscore.
The reason for this "2 upper-case letters minimum" rule is outlined in the previous comments and is still valid.
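For anyone who wants to check a name against the rule described above, here is a rough sketch of it as a Python regular expression. This is an approximation written from the prose description, not the pattern used in the MagicPython grammar itself; the names SPECIAL_CONSTANT and is_special_constant are made up for illustration, and the pattern assumes ASCII identifiers only.

```python
import re

# Hypothetical pattern reconstructed from the rule as described in this thread.
# It is NOT copied from the MagicPython grammar and only handles ASCII names.
SPECIAL_CONSTANT = re.compile(
    r"""
    _*           # leading underscores are ignored
    [A-Z]        # first upper-case letter
    [_\d]*       # underscores and digits may sit between the first two capitals
    [A-Z]        # second upper-case letter
    [A-Z\d]*     # only upper-case letters or digits until the first underscore
    (?:_\w*)?    # after that underscore, any word characters are allowed
    """,
    re.VERBOSE | re.ASCII,
)


def is_special_constant(name: str) -> bool:
    """Return True if the whole name satisfies the sketched rule."""
    return SPECIAL_CONSTANT.fullmatch(name) is not None


if __name__ == "__main__":
    # Names taken from the discussion above.
    for name in ("PROTOCOL_v2", "XX_something", "X_train", "QQQq", "Theta_coefficients"):
        print(name, is_special_constant(name))
```

Under this sketch, PROTOCOL_v2 and XX_something count as special constants, while X_train, QQQq and Theta_coefficients do not, which matches the behavior discussed in the earlier comments.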
I'll keep this open to stay visible and be a reference for further discussions and new issues.