LiveScript icon indicating copy to clipboard operation
LiveScript copied to clipboard

Make source location of thrown errors easily accessible

Open bartosz-m opened this issue 7 years ago • 17 comments

It would be beneficial to diagnostic tools to expose source location directly on errors eg.

   try
      livescript.compiler code
   catch
     report-error e.message, e.location

and even better add second message property without line information e.g.

   try
      livescript.compiler code
   catch
     # short.message doesn't contain 'on line X'
     report-error e.short-message, e.location

Right now in my WIP implementation of LSP (Language Server Protocol) to find out where error was generated I need to use regex to match their message or depend on not always accessible property hash.

bartosz-m avatar Mar 17 '18 23:03 bartosz-m

For integrating LS into an IDE, wouldn't it be better if we provided an alternative to the throw-first-error model anyway, so that multiple issues could be highlighted? Or have you come up with a clever workaround for that?

rhendric avatar Mar 18 '18 00:03 rhendric

That would be superb, but it is a lot more work. What I propose is a little upgrade of what livescript already can do. It doesn't look like a big thing so if you're ok with the idea I would like to discuss code style and I'll look into this issue myself.

Also about idea of producing multiple errors I've been thinking that at AST level it could be done quite robust e.g.

  1. start with leafs
  2. validate
  3. gather errors
  4. replace failed leafs with some stubs
  5. move level up

But lexer for me is still kind of black box.

bartosz-m avatar Mar 18 '18 00:03 bartosz-m

It would be a lot more work to make the compiler recover from errors so that more than one can be reported. But it would not be much more work to design a new error logging interface intended for this use case and send location information through that (for the first error only, for now). That would move us in a better direction, instead of adding data to an error that in the ideal version of this interaction would not be thrown. If you're going to try to look into this, I'd much prefer a solution like that.

rhendric avatar Mar 18 '18 00:03 rhendric

Ok. I'll try to look into that logging interface, but reporting multiple errors will bring a lot of changes. I would like to know what kind of modification would be no-go, so I could keep to minimum potentially controversial code

bartosz-m avatar Mar 18 '18 01:03 bartosz-m

Well, I'm largely looking to you to show me what you think the ideal interface would be, since you're the one actively working on an IDE integration! But my specific expectations are that you'd only add a line to the couple of places that SyntaxError is thrown (two carp functions in ast.ls and lexer.ls, plus one weirdo in Decl in ast.ls), prior to the throw, sending whatever information you need to some method on something. Where that method lives, how it gets set up, and what its parameters are, I don't really have a view on right now; but if you're worried that your potential design may be controversial, feel free to sketch it out here and we can discuss.

rhendric avatar Mar 18 '18 01:03 rhendric

It would be a lot more work to make the compiler recover from errors so that more than one can be reported.

That's the issue when virtually any text is a valid parse :P.

vendethiel avatar Mar 18 '18 01:03 vendethiel

It came out rather long, it has four parts first is about data and it base on official LSP specification. Second is about how actual processing would be implemented and it probably be controversial. In third I analyze how could this be integrated into LiveScript. Last contains some questions.

Data

'Logger' could be used not only for errors but also for more generic information e.g. monitoring state of build process. Every message will need to implement this interface.

interface Information {
  /**
   * The information type: diagnostics, notification, compilation state, resource used etc.
   * default to 'message'
   */
  type: string;

  /**
   * The diagnostic's message.
   */
  message: string;

  /**
   * The time of generating information.
   * If missing current timestamp would be inserted
   */
  timestamp?: number;
}

For our current usage (reporting errors) this interface will be used. In future it can also support warnings, hints and similar stuff.

interface Diagnostic extends Information {
  /**
   * The location at which the message applies.
   */
  location: Location;

  /**
   * The diagnostic's severity.
   */
  severity: number;

  /**
   * The diagnostics type e.g. error, warning, linting error, hint
   */
  diagnosticType: string;
}

In LSP Location doesn't contain uri. Adding it to location gives opportunity to write messages to streams as soon as they are generated because it contains all information to determine to determine which part of code it referencing

interface Location {
  uri: string;
  range: Range;
}

to keep things simple those would be same as in LSP:

Implementation

To make this more useful in future I took inspiration from event systems, and stream processing and came with minimalistic design of a "message bus". It would be implemented as single class that could be used in future for different kind of communication needs. It would implement all following interfaces:

interface MessageBus extends InformationConsumer, InformationProducer {
  /**
   * sends information to every connected consumer
   */
  send(inf: Information): void;
}
interface InformationConsumer {
  /**
   * Consumes information
   * What actually happens is implementation dependent
   */
  consume(inf: Information): void;
}
interface InformationProducer {
  /**
   * Connects producer to consumer
  */
  connect(handler: InformationConsumer): void;


  /**
   * Disconnects consumer from producer
  */
  disconnect(reader: InformationConsumer): void;

  /**
   * Connect function to this produced by wrapping it in anonymous consumer
   */
  on(f: (inf:Information) => void): void;

  /**
   * Return InformationProducer that produces only filtered information
   */
  filter(f: (inf:Information) => bool): InformationProducer
}

Integration

I'm writing this assuming that above would not be introduce into livescript only this interface:

interface Logger {
  /**
   * sends information to logging
   */
  send(inf: Information): void;
}

I can see two main ways of integrating this to LiveScript

First is providing logger as an arguments using options as carrier e.g.

  livescript.compiler code, logger: new Logger

Pros:

  • no modification to existing classes
  • user can provide its own implementation

Cons:

  • we still need to provide some stub in every entry point using options param in case that user forget to suply it e.g. ls.compiler code, logger: send: !->

Second is adding property to classes e.g.

class Lexer
    logger: send: !->

Pros:

  • only one modification per base class
  • user can provide its own implementation

Cons:

  • user needs to ensure that every class have set the correct logger
  • but it can't be done until base Node class in ast is exposed otherwise every node type need to have its own property logger set

Questions

  1. Is data format ok?
  2. Do we limit message bus only to logging?
  3. Do we integrate actual logger or stub only?
  4. How to call it?
  5. How to integrate it?

I'm waiting for feedback.

bartosz-m avatar Mar 19 '18 23:03 bartosz-m

Borrowing from the LSP standard broadly makes sense. I don't have time right this moment to review this proposal in depth—I will do so in the next day or two. Some thoughts off the top of my head:

  • The Information/Diagnostic structs seem rather heavy (are both severity and diagnosticType really needed?), and I would lean towards starting those as small as possible and incrementally adding fields if and when LS starts actually needing them. This probably even extends to having a separation at all between Information and Diagnostic. These are additive refinements that can be deferred until later, not compromises that would require a redesign if they were changed.
  • I agree that LiveScript itself doesn't need the producer/consumer pattern, just a simple Logger (I'm honestly a little confused about why you thought that pattern needed to be brought up here at all). Although,
  • Logger should probably just be a function instead of a class with one method. This is JavaScript, not Java.
  • I lean towards putting the logger on the options object rather than injecting it into internal compiler classes. I also think that the stub value shouldn't be a do-nothing function, but rather something that logs to the console (if present), and o.log can replace console.log in most places that the latter is used.

rhendric avatar Mar 20 '18 04:03 rhendric

After cutting things down and making it more compatible with LiveScript, here next iteration:

namespace LiveScript {
  function logger(diag: Diagnostic): void;
}
interface Diagnostic {
  location: Location;
  message: string;
  type: string;
}
interface Location {
  uri: string;
  range: Range;
}

inclusive range

interface Range {
  start: Position;
  end: Position;
}

position is same as in LiveScript

interface Position {
  line: number;
  column: number;
}

Is it lean enough?

bartosz-m avatar Mar 20 '18 21:03 bartosz-m

Looks pretty good to me! Some more questions:

  • How are errors and warnings distinguished—the type field? What values do you expect type to take? Why not LSP's severity: number approach?
  • What value should uri take when LiveScript is compiling:
    • stdin with lsc -s
    • from the REPL
    • inline <script type="text/ls"> elements in the browser
    • strings passed to LiveScript.compile/.run
    • expressions passed to lsc -e
  • Why change to inclusive range when LSP specifies exclusive range?

rhendric avatar Mar 22 '18 02:03 rhendric

How are errors and warnings distinguished—the type field? What values do you expect type to take? Why not LSP's severity: number approach?

type will be string field used to distinguished between different kinds of errors possible values

 <[ error warning ]>

I dropped severity because I thought it would be either unreadable to newcomers e.q

# what does `1` mean? Is it bad?
log message: 'shit happened'  severity:  1

or would require creating shared enum containing severity e.g

# enums.ls
export SEVERITY = 
    ERROR: 0
    WARNING: 1

# other-file.ls
const { SEVERITY } = require \./enums

log message: "all failed", severity: SEVERITY.ERROR

strings introduce less changes and can work just fine for current use-cases

What value should uri take when LiveScript is compiling:

When we know from were source came from: filename, url of script, user supplied argument, we would set uri to that. In all other cases we could set it to some predefine value e.g. 'memory://stdio' or leave it unset.

Why change to inclusive range when LSP specifies exclusive range?

I wanted to make it more align with what is used now in LiveScript first_line,last_line. Are those not inclusive?

bartosz-m avatar Mar 22 '18 16:03 bartosz-m

I wanted to make it more align with what is used now in LiveScript first_line,last_line. Are those not inclusive?

I believe not:

$ bin/lsc -pe 'require \. .ast "1\n2\n3\n" .lines.0'
Literal {
  value: '1',
  first_line: 1,
  first_column: 0,
  last_line: 2,
  last_column: 0,
  line: 1,
  column: 0 }

rhendric avatar Mar 22 '18 16:03 rhendric

Thanks for correcting me. Range will be exclusive on the end. And what about lines indexing position, leave them as they are now 1-based?

bartosz-m avatar Mar 22 '18 17:03 bartosz-m

On this question, slavishly adhering to what has come before doesn't seem all that important. I do dislike how lines are 1-indexed and columns are 0-indexed in the current scheme anyway. LSP specifies 0-indexing and I can get behind that, although if you were in favor of 1-indexing for some reason I would entertain that too. While we're changing things, in the above example, I kind of think last_line and last_column ought to be 1 and 1 (assuming the current mixed-indexing).

The simplest thing to do to implement this feature would be to leave first_line etc. as-is and compute derived ranges for the logger, in whatever scheme we agree to standardize on; then we can refactor away the derivations later. (Changing first_line and friends internally would mean looking at source map generation and making sure it stays correct—or, if there's an off-by-one error in there already, fixing that.)

rhendric avatar Mar 22 '18 17:03 rhendric

I would also prefer 0-based indexing. And about example if we take approach from LSP that range would contain new-line character. If first_line and last_line would be == 1 then new-line would be excluded. From error reporting point of view new-lines most of time are undesirable on the end of ranges.

For now I thing it is safer to leave line as they are and just substract 1 on logging:

range: start: line: first_line - 1 end: last_line - 1

And what about auto computing omitted props e.g start.column, end.columnt? Would it be ok to set them to 0 by default?

bartosz-m avatar Mar 22 '18 17:03 bartosz-m

I would say every node reporting a warning or error should have correct column values (it's a bug if not). Did you have a specific case in mind when column would be omitted?

rhendric avatar Mar 22 '18 18:03 rhendric

No I didn't have anything particular in mind.

bartosz-m avatar Mar 22 '18 18:03 bartosz-m