Mixed Formats (DOS and UTF-8)

Open tamer73 opened this issue 2 years ago • 25 comments

What Happened?

The error message "No text found. Maybe corrupt or no text file" appears when I try to open a 178 KB PHP file with Code 7.1.0-1 on Manjaro Linux (fully up to date). Every other PHP file works as expected. It looks like Code can't read the file into the buffer, so it can't even recognize it as a PHP file.

Steps to Reproduce

  1. Opening the 178 KB PHP file from within Code or from the Dolphin file manager results in the following error: (screenshot: pantheon-code_issue)

Expected Behavior

I just wanted to view the PHP code.

OS Version

Other Linux

Software Version

Latest release (I have run all updates)

Log Output

No response

Hardware Info

CPU: dual core Intel Core i3-4130 (-MT MCP-) speed/min/max: 1450/800/3400 MHz Kernel: 6.1.44-1-MANJARO x86_64 Up: 29m Mem: 3.15/11.6 GiB (27.1%) Storage: 461.98 GiB (3.9% used) Procs: 194 Shell: Zsh inxi: 3.3.29

tamer73, Aug 20 '23 16:08

What happens when you press "Show Anyway"? Can you load the file with nano or another simple text editor?

jeremypw, Aug 20 '23 19:08

This error message is shown when the Gtk.SourceFileLoader throws an error while loading the file. I wouldn't have thought file size would be an issue on modern hardware.

jeremypw, Aug 20 '23 19:08

Yes, I can open it anyway, but it opens without syntax highlighting, so I have to set it to PHP manually. There was some further strange behaviour afterwards, so I gave up and opened it with Kate, the standard editor on Manjaro Linux, without any issues.

But anyway, I like the Code editor because it's very small and powerful. I want to make it my main code editor. It has the best search and replace I've ever seen, for example without needing regex for line breaks.

tamer73, Aug 21 '23 06:08

I'm not at home right now; I'll try it on my laptop with a simple editor soon.

tamer73, Aug 21 '23 06:08

@tamer73 Thanks for the info. Could you try running Code from the terminal command line (io.elementary.code) and see what output is produced when you try to load the problematic file? You should see a critical error message from the SourceFileLoader with more information. If you could make the problem file available it would help investigate the problem.

jeremypw, Aug 21 '23 08:08

Sure, I will do that as soon as possible. I'm on vacation right now😊

tamer73, Aug 22 '23 09:08

Sorry for answering late, but now I'm on my laptop and get the same result.

Running it from the terminal prints: ** (io.elementary.code:2360): CRITICAL **: 17:54:05.757: Document.vala:373: Es gab einen Fehler bei der Zeichensatzkonvertierung und ein Ausweichzeichen musste genutzt werden.

I can open the file with nano without any issues.

Sadly I can't provide the file, because it's sensitive code from my company!

I hope it helps anyway.

tamer73, Aug 26 '23 16:08

The translation: There was a character set conversion error and a fallback character had to be used.

tamer73, Aug 27 '23 16:08

Ah, OK. I wonder why Gtk.SourceLoader produces that error but nano does not. I'm not sure whether we need to show that information to the user or should just load the file anyway. As you found, you can use "Show Anyway" to load the file. Can you see which character(s) have been altered? If you have the right language pack(s) installed, I would have thought you'd have all the character sets you need.

jeremypw, Aug 27 '23 18:08

Maybe because nano is console-based? I'll take a look at it soon. I just got home a few minutes ago and my cats are surprisingly impatient😂

I think I'll find some time tomorrow. Have a good evening 🙋🏻‍♂️

tamer73, Aug 27 '23 18:08

Is the original file encoded as UTF-8 or something else?

It may be possible to fix this by setting candidate encodings in the loader so that more than one encoding is tried. If you are able to produce a non-sensitive file that still gives the error that would help develop a fix.
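A minimal sketch of the candidate-encodings idea, written in Python rather than Code's Vala and independent of GtkSourceView (whose `GtkSourceFileLoader` has a `set_candidate_encodings()` method for the real thing). The helper name and the candidate list are assumptions for illustration: each encoding is tried in order, and the first one that decodes the whole byte string without errors wins.

```python
# Hypothetical helper: return (encoding, text) for the first candidate that
# decodes the data strictly, i.e. without any replacement characters.
CANDIDATES = ("utf-8", "windows-1252", "iso-8859-1")


def decode_with_candidates(data: bytes, candidates=CANDIDATES):
    for encoding in candidates:
        try:
            return encoding, data.decode(encoding)  # strict error handling by default
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next candidate
    raise ValueError("no candidate encoding could decode the data")


print(decode_with_candidates(b"\xfcberpr\xfcfen"))  # ('windows-1252', 'überprüfen')
```

Order matters here: ISO-8859-1 maps every byte value to a character, so it can never fail and only makes sense as the last resort; putting it first would silently turn genuine UTF-8 files into mojibake.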

jeremypw, Aug 27 '23 18:08

I will take a look as soon as possible

tamer73, Aug 27 '23 19:08

After loading the file anyway, it's necessary to select PHP for syntax highlighting. But the major issue here is that I can't save changes! Save As doesn't work either, by the way...

I can't see any changes, but it's hard to check 4477 lines of code.

After closing the editor (which I had started from zsh), I get the following errors:

(io.elementary.code:2044): GLib-GObject-CRITICAL **: 12:49:06.607: ../glib/gobject/gsignal.c:2778: instance '0x5609752ab400' has no handler with id '575'

(io.elementary.code:2044): GLib-GObject-CRITICAL **: 12:49:06.614: ../glib/gobject/gsignal.c:2778: instance '0x560975511cd0' has no handler with id '3329'

(io.elementary.code:2044): GLib-GObject-CRITICAL **: 12:49:06.615: ../glib/gobject/gsignal.c:2778: instance '0x5609755368f0' has no handler with id '3445'

(io.elementary.code:2044): GLib-GObject-CRITICAL **: 12:49:06.615: ../glib/gobject/gsignal.c:2778: instance '0x5609755325e0' has no handler with id '3414'

I think there are four errors because I tried to save the file, including with Save As.

But I think this will help:

file edit-test.php

edit-test.php: HTML document, ISO-8859 text, with very long lines (3330), with CRLF, CR line terminators
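The `file` output above reports both CRLF and bare CR line terminators, i.e. mixed line endings on top of a non-UTF-8 encoding. A small Python sketch (a hypothetical helper, not part of Code) that counts each kind of terminator in the raw bytes:

```python
def count_line_terminators(data: bytes) -> dict:
    """Count Windows (CRLF), old Mac (bare CR) and Unix (bare LF) line endings."""
    crlf = data.count(b"\r\n")
    bare_cr = data.count(b"\r") - crlf  # every CRLF also contains one CR
    bare_lf = data.count(b"\n") - crlf  # ...and one LF
    return {"CRLF": crlf, "CR": bare_cr, "LF": bare_lf}


print(count_line_terminators(b"a\r\nb\rc\r\n"))  # {'CRLF': 2, 'CR': 1, 'LF': 0}
```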

Hope this helps in the meantime; I need to pick up my girlfriend now.

tamer73, Aug 28 '23 10:08

@tamer73 I think you need to post the test file to e.g. https://pastebin.com/ or maybe use the "Attach files by dragging & dropping, selecting or pasting them" function at the bottom of the GitHub comment box (although I've only ever used that for pictures). Or you could send it as an email attachment to [email protected] or [email protected]

jeremypw, Aug 28 '23 14:08

I'm really sorry that I can't do that; the data is too sensitive. I already tried to narrow it down to non-sensitive parts, but that's really too much effort for me right now!

But I think I found the issue while trying to find the longest line!

awk 'length > max_length { max_length = length; longest_line = $0 } END { print longest_line }' edit-test.php

awk: Kommandozeile:1: (FILENAME=edit-test.php FNR=361) ... Warnung: Es wurden unbekannte Multibyte-Daten gefunden. Ihre Daten entsprechen eventuell nicht der gesetzten Region ...

Translation of the warning: Warning: Unknown multibyte data was found. Your data may not correspond to the set region
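The awk warning points at a record (FNR=361) containing bytes that are not valid in the UTF-8 locale. A rough Python equivalent, sketched as a hypothetical helper, lists every line that fails strict UTF-8 decoding. One caveat: `bytes.splitlines()` also breaks on bare CR, so its line numbers can differ from awk's, which by default splits records on LF only.

```python
def non_utf8_lines(data: bytes) -> list:
    """Return the 1-based numbers of lines that are not valid UTF-8."""
    bad = []
    # splitlines() treats \n, \r\n and bare \r as line breaks
    for number, line in enumerate(data.splitlines(), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(number)
    return bad


sample = b"<?php\r\n//### \xfcberpr\xfcfen ###\r\necho 'ok';\r\n"
print(non_utf8_lines(sample))  # [2]
```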

I'm just surprised that I can't reproduce this in other editors.

Now installing Notepadqq, which was my favourite editor in the past. It's a lightweight editor and takes up less space than Geany, but much more than elementary Code because of its dependencies on current Manjaro Linux.

tamer73, Aug 28 '23 14:08

OK, I've now narrowed it down to a 125-byte PHP file, and it looks like the issue only appears when these non-UTF-8 characters occur multiple times! It looks like a temporary buffer overflow when too many exceptions are handled!

Notepadqq doesn't have issues with this file.

I have sent you an email with this file. It shows that Notepadqq and elementary/code handle the non-UTF-8 characters in the same file differently! My locale is set to UTF-8.

Finally, a small file that reproduces the issue.

tamer73, Aug 28 '23 15:08

Thanks for your efforts in narrowing down the cause :heart: - I'll try to get a fix out soon.

jeremypw, Aug 29 '23 08:08

Glad to help, you're welcome! You've put so much effort into this, and I really like small-footprint projects like this. I don't know any other graphical editor that is so tiny and yet so powerful. Despite being so small, it has the best search and replace I've seen outside of regex.

Thank you too for this masterpiece 😊👍

tamer73, Aug 29 '23 08:08

So it seems that your file is encoded in "DOS format" (according to nano) and the culprit line is converted by nano to

//######################################## �berpr�fen ########################################

and by Code to

//######################################## \FCberpr\FCfen ########################################

Two characters have been replaced by "unknown character" characters.
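The two replaced characters are single 0xFC bytes. In ISO-8859-1 (and Windows-1252) 0xFC is "ü", but in UTF-8 it can never start a valid sequence, so a UTF-8 decoder that substitutes a fallback character produces U+FFFD for each one. A short Python illustration on a shortened copy of the culprit line (this is not Code's actual loading path):

```python
raw = b"//### \xfcberpr\xfcfen ###"  # shortened culprit line, as raw bytes

as_utf8 = raw.decode("utf-8", errors="replace")  # each invalid byte becomes U+FFFD
as_latin1 = raw.decode("iso-8859-1")             # every byte maps to a character

print(as_utf8.count("\ufffd"))  # 2
print(as_latin1)                # //### überprüfen ###
```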

If you use "Save As" in Code to save the file (immediately after using "Show Anyway") with either the same or different name, close the original tab and then open the saved file it loads correctly and is recognized as PHP. This is actually intended behaviour for dealing with what Code thinks are potentially corrupted/non-text files - it stops you trying to edit them and potentially make things worse.

However, in this case the original file was misidentified as problematic due to DOS encoding which, it appears, the Gtk.SourceLoader does not handle properly by default. I'll see if there is a way round this.

jeremypw, Aug 29 '23 08:08

That's strange: in one of my attempts I deleted everything else, and then Code opened it without any messages. That's why I thought several issues must combine to trigger this.

I've now opened that file with Notepadqq from the command line, and it shows no errors there that might help...

tamer73, Aug 29 '23 08:08

I can convert your file so that Code shows the expected characters (I presume) using the command:

iconv -f ISO-8859-1 -t UTF-8//TRANSLIT edit-test2.php -o iconv-utf.php

I sent the output to a separate file to avoid overwriting the original. Opening the converted file shows:

//######################################## überprüfen ########################################

An octal dump of the original shows that the problem characters are encoded as the single byte hexadecimal FC ("ü" in ISO-8859-1), which is not valid UTF-8 on its own, so that may be why Code chokes on it without the pre-conversion.

I presume the file comes from a Windows system? Would you be wanting to return it to Windows after editing on Linux?
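For anyone who prefers to do the conversion from a script, this is a Python sketch of what the `iconv -f ISO-8859-1 -t UTF-8` command above does; the function name is made up for illustration. It works on raw bytes, so the file's CRLF/CR line endings are left untouched, and `//TRANSLIT` has no counterpart here because UTF-8 can represent every ISO-8859-1 character.

```python
from pathlib import Path


def latin1_to_utf8(src, dst) -> None:
    """Re-encode an ISO-8859-1 file as UTF-8, writing to a separate file."""
    # ISO-8859-1 maps all 256 byte values to a character, so this cannot fail.
    text = Path(src).read_bytes().decode("iso-8859-1")
    # Writing bytes directly avoids text-mode newline translation.
    Path(dst).write_bytes(text.encode("utf-8"))


# Usage, mirroring the iconv command (output goes to a new file):
# latin1_to_utf8("edit-test2.php", "iconv-utf.php")
```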

jeremypw, Aug 29 '23 10:08

Looking into this, it is surprisingly complicated to fix. I can get Code to load the file with the Windows character set by forcing the loader to use that charset/encoding, but then "normal" Linux files have some characters misinterpreted. There does not seem to be any guaranteed way to get the encoding and charset from the file automatically before actually loading it, and it seems the Gtk.SourceFileLoader only detects the encoding, not the character set, while actually loading it.

I see NotepadQQ has gone to a lot of trouble to handle a wide variety of encodings/charsets and allows the user to choose and convert between them so it is clearly possible.

However, Code is primarily targeted at developing software on Linux, and there are limited resources for its development. As this is an edge case, it may not be fixed soon. Probably the best we could do is to offer the user a choice of character sets to try on the file if it fails to load, though this assumes the user knows which character set to choose.

The best way forward for you is probably to convert the file out of an old unsupported format and into a modern one that both Linux and Windows support out of the box.
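Besides re-encoding to UTF-8, converting to a modern format may also mean settling on one line-ending convention, since `file` reported mixed CRLF and bare CR terminators. A minimal Python sketch (hypothetical helper) that normalizes already-decoded text to a single newline style:

```python
def normalize_line_endings(text: str, newline: str = "\n") -> str:
    """Convert CRLF and bare CR line endings to a single style."""
    # Replace CRLF first so it is not turned into two line breaks.
    unified = text.replace("\r\n", "\n").replace("\r", "\n")
    return unified if newline == "\n" else unified.replace("\n", newline)


print(repr(normalize_line_endings("a\r\nb\rc\n")))              # 'a\nb\nc\n'
print(repr(normalize_line_endings("a\nb\r", newline="\r\n")))   # 'a\r\nb\r\n'
```

LF is the Linux convention; passing `newline="\r\n"` keeps Windows-style endings if the file has to go back to Windows.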

jeremypw, Aug 29 '23 12:08

Thanks for taking a look at it. I'm OK with this and can easily work around it. I just wanted to let you know, and at least we've shed a little more light on this behaviour.

tamer73, Aug 29 '23 12:08

The source of this file is close to twenty years old and needs to be rewritten anyway, so no worries. My boss wrote it in a way no one would today. So everything's fine and there's no pressure from any side :-) Should I close this issue?

tamer73, Aug 29 '23 12:08

Well I'll leave it open with a revised description as it is a valid issue, but it will probably have a low priority for fixing unless another dev can see an easy fix.

jeremypw, Aug 29 '23 15:08