Make source location of thrown errors easily accessible
It would be beneficial to diagnostic tools to expose source location directly on errors eg.
try
livescript.compiler code
catch
report-error e.message, e.location
and even better add second message property without line information e.g.
try
livescript.compiler code
catch
# short.message doesn't contain 'on line X'
report-error e.short-message, e.location
Right now in my WIP implementation of LSP (Language Server Protocol) to find out where error was generated I need to use regex to match their message or depend on not always accessible property hash.
For integrating LS into an IDE, wouldn't it be better if we provided an alternative to the throw-first-error model anyway, so that multiple issues could be highlighted? Or have you come up with a clever workaround for that?
That would be superb, but it is a lot more work. What I propose is a little upgrade of what livescript already can do. It doesn't look like a big thing so if you're ok with the idea I would like to discuss code style and I'll look into this issue myself.
Also about idea of producing multiple errors I've been thinking that at AST level it could be done quite robust e.g.
- start with leafs
- validate
- gather errors
- replace failed leafs with some stubs
- move level up
But lexer for me is still kind of black box.
It would be a lot more work to make the compiler recover from errors so that more than one can be reported. But it would not be much more work to design a new error logging interface intended for this use case and send location information through that (for the first error only, for now). That would move us in a better direction, instead of adding data to an error that in the ideal version of this interaction would not be thrown. If you're going to try to look into this, I'd much prefer a solution like that.
Ok. I'll try to look into that logging interface, but reporting multiple errors will bring a lot of changes. I would like to know what kind of modification would be no-go, so I could keep to minimum potentially controversial code
Well, I'm largely looking to you to show me what you think the ideal interface would be, since you're the one actively working on an IDE integration! But my specific expectations are that you'd only add a line to the couple of places that SyntaxError is thrown (two carp functions in ast.ls and lexer.ls, plus one weirdo in Decl in ast.ls), prior to the throw, sending whatever information you need to some method on something. Where that method lives, how it gets set up, and what its parameters are, I don't really have a view on right now; but if you're worried that your potential design may be controversial, feel free to sketch it out here and we can discuss.
It would be a lot more work to make the compiler recover from errors so that more than one can be reported.
That's the issue when virtually any text is a valid parse :P.
It came out rather long, it has four parts first is about data and it base on official LSP specification. Second is about how actual processing would be implemented and it probably be controversial. In third I analyze how could this be integrated into LiveScript. Last contains some questions.
Data
'Logger' could be used not only for errors but also for more generic information e.g. monitoring state of build process. Every message will need to implement this interface.
interface Information {
/**
* The information type: diagnostics, notification, compilation state, resource used etc.
* default to 'message'
*/
type: string;
/**
* The diagnostic's message.
*/
message: string;
/**
* The time of generating information.
* If missing current timestamp would be inserted
*/
timestamp?: number;
}
For our current usage (reporting errors) this interface will be used. In future it can also support warnings, hints and similar stuff.
interface Diagnostic extends Information {
/**
* The location at which the message applies.
*/
location: Location;
/**
* The diagnostic's severity.
*/
severity: number;
/**
* The diagnostics type e.g. error, warning, linting error, hint
*/
diagnosticType: string;
}
In LSP Location doesn't contain uri. Adding it to location gives opportunity to write messages to streams as soon as they are generated because it contains all information to determine to determine which part of code it referencing
interface Location {
uri: string;
range: Range;
}
to keep things simple those would be same as in LSP:
Implementation
To make this more useful in future I took inspiration from event systems, and stream processing and came with minimalistic design of a "message bus". It would be implemented as single class that could be used in future for different kind of communication needs. It would implement all following interfaces:
interface MessageBus extends InformationConsumer, InformationProducer {
/**
* sends information to every connected consumer
*/
send(inf: Information): void;
}
interface InformationConsumer {
/**
* Consumes information
* What actually happens is implementation dependent
*/
consume(inf: Information): void;
}
interface InformationProducer {
/**
* Connects producer to consumer
*/
connect(handler: InformationConsumer): void;
/**
* Disconnects consumer from producer
*/
disconnect(reader: InformationConsumer): void;
/**
* Connect function to this produced by wrapping it in anonymous consumer
*/
on(f: (inf:Information) => void): void;
/**
* Return InformationProducer that produces only filtered information
*/
filter(f: (inf:Information) => bool): InformationProducer
}
Integration
I'm writing this assuming that above would not be introduce into livescript only this interface:
interface Logger {
/**
* sends information to logging
*/
send(inf: Information): void;
}
I can see two main ways of integrating this to LiveScript
First is providing logger as an arguments using options as carrier e.g.
livescript.compiler code, logger: new Logger
Pros:
- no modification to existing classes
- user can provide its own implementation
Cons:
- we still need to provide some stub in every entry point using options param in case that user forget to suply it e.g.
ls.compiler code, logger: send: !->
Second is adding property to classes e.g.
class Lexer
logger: send: !->
Pros:
- only one modification per base class
- user can provide its own implementation
Cons:
- user needs to ensure that every class have set the correct logger
- but it can't be done until base Node class in ast is exposed otherwise every node type need to have its own property logger set
Questions
- Is data format ok?
- Do we limit message bus only to logging?
- Do we integrate actual logger or stub only?
- How to call it?
- How to integrate it?
I'm waiting for feedback.
Borrowing from the LSP standard broadly makes sense. I don't have time right this moment to review this proposal in depth—I will do so in the next day or two. Some thoughts off the top of my head:
- The
Information/Diagnosticstructs seem rather heavy (are bothseverityanddiagnosticTypereally needed?), and I would lean towards starting those as small as possible and incrementally adding fields if and when LS starts actually needing them. This probably even extends to having a separation at all betweenInformationandDiagnostic. These are additive refinements that can be deferred until later, not compromises that would require a redesign if they were changed. - I agree that LiveScript itself doesn't need the producer/consumer pattern, just a simple
Logger(I'm honestly a little confused about why you thought that pattern needed to be brought up here at all). Although, -
Loggershould probably just be a function instead of a class with one method. This is JavaScript, not Java. - I lean towards putting the logger on the options object rather than injecting it into internal compiler classes. I also think that the stub value shouldn't be a do-nothing function, but rather something that logs to the console (if present), and
o.logcan replaceconsole.login most places that the latter is used.
After cutting things down and making it more compatible with LiveScript, here next iteration:
namespace LiveScript {
function logger(diag: Diagnostic): void;
}
interface Diagnostic {
location: Location;
message: string;
type: string;
}
interface Location {
uri: string;
range: Range;
}
inclusive range
interface Range {
start: Position;
end: Position;
}
position is same as in LiveScript
interface Position {
line: number;
column: number;
}
Is it lean enough?
Looks pretty good to me! Some more questions:
- How are errors and warnings distinguished—the
typefield? What values do you expecttypeto take? Why not LSP'sseverity: numberapproach? - What value should
uritake when LiveScript is compiling:- stdin with
lsc -s - from the REPL
- inline
<script type="text/ls">elements in the browser - strings passed to
LiveScript.compile/.run - expressions passed to
lsc -e
- stdin with
- Why change to inclusive range when LSP specifies exclusive range?
How are errors and warnings distinguished—the type field? What values do you expect type to take? Why not LSP's severity: number approach?
type will be string field used to distinguished between different kinds of errors possible values
<[ error warning ]>
I dropped severity because I thought it would be either unreadable to newcomers e.q
# what does `1` mean? Is it bad?
log message: 'shit happened' severity: 1
or would require creating shared enum containing severity e.g
# enums.ls
export SEVERITY =
ERROR: 0
WARNING: 1
# other-file.ls
const { SEVERITY } = require \./enums
log message: "all failed", severity: SEVERITY.ERROR
strings introduce less changes and can work just fine for current use-cases
What value should uri take when LiveScript is compiling:
When we know from were source came from: filename, url of script, user supplied argument, we would set uri to that. In all other cases we could set it to some predefine value e.g. 'memory://stdio' or leave it unset.
Why change to inclusive range when LSP specifies exclusive range?
I wanted to make it more align with what is used now in LiveScript first_line,last_line. Are those not inclusive?
I wanted to make it more align with what is used now in LiveScript
first_line,last_line. Are those not inclusive?
I believe not:
$ bin/lsc -pe 'require \. .ast "1\n2\n3\n" .lines.0'
Literal {
value: '1',
first_line: 1,
first_column: 0,
last_line: 2,
last_column: 0,
line: 1,
column: 0 }
Thanks for correcting me. Range will be exclusive on the end. And what about lines indexing position, leave them as they are now 1-based?
On this question, slavishly adhering to what has come before doesn't seem all that important. I do dislike how lines are 1-indexed and columns are 0-indexed in the current scheme anyway. LSP specifies 0-indexing and I can get behind that, although if you were in favor of 1-indexing for some reason I would entertain that too. While we're changing things, in the above example, I kind of think last_line and last_column ought to be 1 and 1 (assuming the current mixed-indexing).
The simplest thing to do to implement this feature would be to leave first_line etc. as-is and compute derived ranges for the logger, in whatever scheme we agree to standardize on; then we can refactor away the derivations later. (Changing first_line and friends internally would mean looking at source map generation and making sure it stays correct—or, if there's an off-by-one error in there already, fixing that.)
I would also prefer 0-based indexing. And about example if we take approach from LSP that range would contain new-line character. If first_line and last_line would be == 1 then new-line would be excluded. From error reporting point of view new-lines most of time are undesirable on the end of ranges.
For now I thing it is safer to leave line as they are and just substract 1 on logging:
range: start: line: first_line - 1 end: last_line - 1
And what about auto computing omitted props e.g start.column, end.columnt? Would it be ok to set them to 0 by default?
I would say every node reporting a warning or error should have correct column values (it's a bug if not). Did you have a specific case in mind when column would be omitted?
No I didn't have anything particular in mind.