mkskel.sh & Apple sed

Open stek29 opened this issue 8 years ago • 13 comments

Since 3f2b9a4 it only works with GNU sed.

On Apple sed (BSD too?) it breaks lines at the letter r, producing an invalid C file.
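To make the failure concrete, here is a small simulation. It assumes the affected sed reads \r inside a bracket expression as an ordinary r rather than a carriage return, so [^r] stands in for the misread pattern; the function name and the input line are made up for illustration.

```shell
#!/bin/sh
# Simulation only: [^r] stands in for how a sed without \r support is
# assumed to read [^\r]. The input line is a hypothetical example.
quote_line_misread() {
    sed 's/[^r]*/  "&",/'
}

# The match stops at the first "r", so the closing quote lands mid-line.
printf 'char *p;\n' | quote_line_misread
# prints:   "cha",r *p;
```

The quoted part ends before the first r, which is why the generated file no longer compiles.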

stek29 avatar Dec 13 '17 16:12 stek29

Ouch, this is getting tricky, since mkskel.sh invokes the shell (incl. the internal field separator), sed and m4, each with its own supposed EOL treatment (inherited from the compilation?), while the input file could either follow the OS's EOL standard or have converted EOLs (which might happen by, e.g., cloning with git).

@westes I must admit that this is a little bit beyond my knowledge of sed and company, in particular when it comes to OS cross-overs: I am sitting in front of a Windows box using the coreutils shipped by Cygwin or MSYS, which is always a bit of a stretch when it comes to consistent EOL treatment.

jannick0 avatar Dec 14 '17 22:12 jannick0

Probably using perl at least makes sense, or doing tr '\r' '\n' before sed

stek29 avatar Dec 14 '17 22:12 stek29

No to perl.

The tr command is probably wrong in the general case but may be ok in inputs we care about.
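A quick sketch of why tr '\r' '\n' is not safe in general: on CRLF input every line ending turns into two newlines, while on CR-only input it does what one would want. The helper name is hypothetical.

```shell
#!/bin/sh
# Count the lines left after tr '\r' '\n' for a given printf format string.
count_lines_after_tr() {
    printf "$1" | tr '\r' '\n' | wc -l | tr -d ' '
}

count_lines_after_tr 'a\r\nb\r\n'   # two CRLF lines become 4 lines
count_lines_after_tr 'a\rb\r'       # two CR-only lines stay 2 lines
```

So the tr approach only fits if the input is known to use lone CR line endings.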

westes avatar Dec 14 '17 22:12 westes

Yeah it is kind of a mess, unfortunately.

westes avatar Dec 14 '17 23:12 westes

What about something like sed -E ':a;N;$!ba;s/(\r\n|\r)/\n/g', which slurps the whole file with a label loop, to normalize EOLs at some stage(s) of mkskel.sh? (The -E is needed for the ( | ) alternation.)
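For comparison, a rough sketch of the same normalization that avoids \r in the sed script and does not slurp the file. It assumes only POSIX sed and tr; the function name is made up.

```shell
#!/bin/sh
# Normalize CRLF and lone-CR line endings to LF with POSIX sed and tr.
normalize_eol() {
    cr=$(printf '\r')                  # literal carriage return byte
    # 1) drop a CR that sits directly before the LF, 2) turn leftover CRs into LF
    sed "s/${cr}\$//" | tr '\r' '\n'
}

printf 'a\r\nb\rc\n' | normalize_eol
```

One caveat: for a CR-only file the final CR is removed by the sed step, so whether the output ends in a newline depends on the sed in use.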

jannick0 avatar Dec 15 '17 15:12 jannick0

You'd also have to remember what the original state of the file is so that you can write it back in the way the caller expects, I think.
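Remembering the original state could look roughly like this: count CR and LF bytes up front and classify the file before any conversion. This is a heuristic sketch, not what mkskel.sh does; the function name and the four labels are assumptions.

```shell
#!/bin/sh
# Classify a file's line endings by counting CR and LF bytes.
# Prints one of: lf, crlf, cr, mixed. Heuristic only.
detect_eol() {
    n_cr=$(( $(tr -cd '\r' < "$1" | wc -c) ))
    n_lf=$(( $(tr -cd '\n' < "$1" | wc -c) ))
    if [ "$n_cr" -eq 0 ]; then echo lf
    elif [ "$n_lf" -eq 0 ]; then echo cr
    elif [ "$n_cr" -eq "$n_lf" ]; then echo crlf
    else echo mixed
    fi
}

tmp=$(mktemp)
printf 'a\r\nb\r\n' > "$tmp"
detect_eol "$tmp"
rm -f "$tmp"
```

The result could then pick the EOL to write back after processing; a "mixed" result is the case where there is no single answer to restore.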

westes avatar Dec 15 '17 15:12 westes

Umm - then what about using gawk to remember the EOL structure of flex.skl as is?

# mkskel.awk
# sample call: gawk -f ./mkskel.awk flex.skl > skel1.c

BEGIN{
	oRS = RS
	RS = "\f" 	# or '\v'; any character which is rare or even not contained in the input stream / file
			# such that gawk slurps the input stream ideally in one single step 
	lines = ""
	dbg = 0
	#dbg = 1
}

{
	lines = lines == "" ? $0 : lines RS $0
	c++
}

END{
	if ( dbg )
		print "input stream read in " c " step(s)" > "/dev/stderr"
	
	if ( lines == "" )
	{
		print "no lines from input file / stream read" > "/dev/stderr"
		exit 1
	}
	
	# compose string of char array skel  
	# where input lines are concatenated with original EOLs
	s = "/* File created from flex.skl via mkskel.sh */" oRS oRS
	s = s "#include \"flexdef.h\"" oRS oRS
	s = s "const char *skel[] = {" oRS
	
	# the 4th split() argument (aEOL, the matched separators) is a gawk extension, non-POSIX
	n = split(lines, aLine, "\r\n|\r|\n", aEOL )
	for ( i = 1; i <= n; i++)
		s = s "\t\"" aLine[i] "\"," ( i < n ? aEOL[i] :  "" )
	
	s = s oRS "\t0" oRS "};"
	
	print s
}

jannick0 avatar Dec 16 '17 00:12 jannick0

We can't assume it's GNU awk.

But if some fairly generic awk will do that, then I'm open to it.

And even some Linux distributions have some pretty abominable excuses calling themselves "awk", so it's not just a BSD/OSX thing.
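As a rough idea of what a fairly generic awk could do, here is a sketch that sticks to POSIX awk features (no 4-argument split). It only covers the quoting step and leaves out the m4 preprocessing; the function name is hypothetical, and keeping a trailing CR outside the quotes is an assumption about the desired output.

```shell
#!/bin/sh
# Sketch: turn each input line into a C string literal using POSIX awk only.
# A trailing CR, if present, is moved outside the quotes so the output keeps
# the input's line-ending style. This is not the real mkskel logic.
skel_lines() {
    awk '{
        eol = ""
        if (sub(/\r$/, "")) eol = "\r"   # remember a CRLF ending
        gsub(/\\/, "&&")                 # double each backslash
        gsub(/"/, "\\\"")                # escape double quotes
        printf "  \"%s\",%s\n", $0, eol
    }'
}

printf 'say "hi"\n' | skel_lines
```

CR-only input would arrive as a single record here, so it would need a different record separator or a prior conversion.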

westes avatar Dec 16 '17 00:12 westes

OK, in this package, mkskel.zip, I tried to put together a POSIX-compliant awk script that should do the trick: the output EOLs are identical to the input file's EOLs unless an EOL is given on the awk command line.

Additional notes:

  • EOL consistency check for the input file (if the EOL is not provided on the command line, i.e. from outside of the script)
  • the awk script could replace mkskel.sh, and thus could make m4 obsolete for the preprocessing step. For this, the only m4preproc define, M4_GEN_PREFIX, is migrated to an awk function. The VERSION number is mandatory on the awk command line.
  • POSIX compliance checked with gawk --posix (or gawk -P)
  • the package contains a makefile to check for any differences between the output of mkskel.sh and mkskel.awk after running both against flex.skl. Here I see the additional header line with the date stamp and quotation issues in C comments, thus effectively no differences with impact on flex code
  • the current version of the script processes flex.skl as it stands right now. TODOs in the script indicate where code could be removed or amended if corresponding changes in flex.skl were applied; this could shrink the code quite a bit, I would expect.
  • the output file type is governed by the version of awk used, which I think is not important here, since C compilers do not care about the nasty EOL issue, I would hope.

@westes ... and as always please do feel free to amend as you might find appropriate. But I hope that helps.

jannick0 avatar Dec 17 '17 16:12 jannick0

Thanks. I'll have a look. Most likely after 2.6.5 is released which is next on my flex todo list, but we'll see how things go.

westes avatar Dec 17 '17 16:12 westes

Excuse me, but what was this issue about? Was it only about [^\r] incompatibility or was it something more? I think the fix should be easy—no need to bother with awk or perl. As I experimented with sed syntax when working with PR #321, I think I can take this one.

But here's one thing I need to know first: Which EOL (end of line) convention are we expecting for flex.skl? LF only, CR+LF, or CR, or do we accept all three?
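One possible shape of such a fix, as a sketch rather than a tested patch: pass a literal CR byte into the bracket expression instead of writing \r, so the script does not depend on how a given sed parses that escape. The function name is hypothetical, and this handles LF and CR+LF input only.

```shell
#!/bin/sh
# Wrap each line in quotes without relying on \r inside the sed script.
# The CR is a literal byte produced by printf, so the bracket expression
# means "anything but a carriage return" regardless of escape support.
quote_lines() {
    cr=$(printf '\r')
    sed "s/[^${cr}]*/  \"&\",/"
}

printf 'char *p;\n' | quote_lines
# prints:   "char *p;",
```

With CR+LF input the CR stays outside the closing quote, right before the LF. A CR-only file would still reach sed as one long line, so that case would need converting first.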

Explorer09 avatar Apr 24 '18 01:04 Explorer09

In theory we accept any line termination at all.

In practice, flex is built in an Ubuntu container (although at some point I'll get the build to run in an OS X container as well, because Travis offers that feature). The *BSD folks who are also contributors to flex use standard LF line termination.

westes avatar Apr 24 '18 01:04 westes

Maybe this is related to issue #539?

wendajiang avatar Aug 10 '22 07:08 wendajiang