apply-refact icon indicating copy to clipboard operation
apply-refact copied to clipboard

Windows files are written in CRLF newlines

Open ndmitchell opened this issue 5 years ago • 5 comments

Given a file on Windows with LF newlines, when I run it through apply-refact, it comes out with CRLF newlines. It would be better if apply-refact (or maybe ghc-exactprint?) could preserve whatever type of newline I have, in addition to everything else.

ndmitchell avatar Sep 12 '20 15:09 ndmitchell

The fix to this should be done in ghc-exactprint, since parseModule and exactPrint don't round-trip wrt LF vs. CRLF. I wonder if GHC ParseResult contains the newline encoding information, or is it lost? I don't see it in ParseResult. If that information is lost, perhaps we can post-process the output, depending on whether the first newline in the input file is LF or CRLF?

zliu41 avatar Nov 22 '20 07:11 zliu41

If that information is lost, perhaps we can post-process the output, depending on whether the first newline in the input file is LF or CRLF?

Mmm, ideally ghc-exactprint should preserve the EOL for each line and honour the previous EOL line if each additional line. Nobody sane would mix both line endings but who knows. Other alternative could be add as a a new parameter all info that may be lost in the actual data, line endings (and encoding?) and apply it to output uniformly, in a new function to keep backwards compatibility. Then apply-refact could decide what EOL is used (the first line f.e.) or ask clients for the same info.

jneira avatar Nov 23 '20 06:11 jneira

I agree with

Nobody sane would mix both line endings

I don't mind adding a new function with an additional parameter, but I guess users rarely want to specify whether they want LF or CRLF. They just want whatever the input file has. So I'd add it only if someone says they need it.

As to encoding, I don't think GHC can even parse any non-UTF-8 encoded source file, so we can safely assume that all source files are UTF-8.

zliu41 avatar Nov 23 '20 06:11 zliu41

Oh, you probably meant that the new function takes the already parsed module as input. In that case adding a new parameter is definitely useful (although ideally, that information should be part of Anns).

How about adding it to applyRefactorings'? Can you think of other things like this besides LF vs. CRLF?

zliu41 avatar Nov 23 '20 07:11 zliu41

Well, taking a look to System.IO there are three axis in the output settings (for completeness):

  • binary vs text encoding: subsumed within the other ones, it will be always text encoding for us.
  • encoding: we could assume it always is UTF-8 as ghc only supports it.
  • eol: agree adding a paremeter as we have commented here seems to be the right solution (at least in applyRefactoring')

jneira avatar Nov 23 '20 07:11 jneira