Special Characters in File Paths

Open ajacquinot opened this issue 1 year ago • 4 comments

Hello, I'm currently using the gpujpeg_image_load_from_file function to open images in a C++ project. However, I've encountered an issue when trying to open files that have special characters in their names, such as (((φ(◎ロ◎;)φ))).jpg. It seems that the function takes a char* for the file path parameter and relies on fopen internally, which does not handle file paths containing such characters the way I need.

Is there another way to open this file, or would it be possible to update the gpujpeg_image_load_from_file function to handle file paths with special characters? (Maybe by using wide-character strings (wchar_t*) and _wfopen on Windows)

Regards, Adrien

ajacquinot avatar Sep 04 '24 14:09 ajacquinot

Hi,

Thanks for reporting. I've tried _wfopen with the file name argument converted by MultiByteToWideChar using various code pages (CP_UTF8, CP_ACP), but nothing worked.

Anyway, it seems that the value in argv is already incorrect: I tried 'φ' ("\xcf\x86" in UTF-8) and the corresponding argv[x] is the single byte 0x66 ('f' in ASCII). I also tried a couple of other things, such as typing phi directly in the code as a C string and passing it to the function, either converted with MultiByteToWideChar(CP_UTF8, ...) and opened with _wfopen, or without conversion (assuming fopen would understand the multi-byte encoding), but without success.

Is there another way to open this file,

You can of course open the file by yourself and read the data to a byte array - gpujpeg_decoder_decode() needs just the pointer to the buffer and its length. No need to use gpujpeg_image_load_from_file.
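A minimal sketch of that approach, assuming C++17 or later and a UTF-8 encoded file name. The helper names path_from_utf8 and read_file_bytes are hypothetical and not part of GPUJPEG; the idea is only to let std::filesystem deal with the platform's native path encoding and to read the file into a byte buffer:

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper (not a GPUJPEG function): build a filesystem path
// from a UTF-8 encoded string.
std::filesystem::path path_from_utf8(const std::string &utf8) {
#if defined(__cpp_char8_t)
    // C++20 and later: char8_t strings are interpreted as UTF-8.
    return std::filesystem::path(std::u8string(utf8.begin(), utf8.end()));
#else
    // C++17: u8path() performs the same conversion.
    return std::filesystem::u8path(utf8);
#endif
}

// Hypothetical helper: read a whole file into `out`.
// Returns false if the file cannot be opened or reading fails.
bool read_file_bytes(const std::filesystem::path &p,
                     std::vector<std::uint8_t> &out) {
    std::ifstream in(p, std::ios::binary);
    if (!in) {
        return false;
    }
    out.assign(std::istreambuf_iterator<char>(in),
               std::istreambuf_iterator<char>());
    return !in.bad();
}
```

After a successful call, the buffer's data() and size() give the pointer and the length mentioned above.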

or would it be possible to update the gpujpeg_image_load_from_file function to handle file paths with special characters

I'd like to, if I knew how. I spent some time trying without much success, so I cannot say whether I will try any further.

Anyway, if you have some expertise or an idea, could you make a minimal working example (POC) showing how to open such a file? Something like:

#include <cstdio>
int main(int argc, char *argv[]) {
    wchar_t *filename;
    // code to convert argv[1] to filename
    FILE *f = _wfopen(filename, L"rb");
    if (f) {
        puts("success!");
    }
}

(The string doesn't necessarily have to be passed from the command line. If you write it directly in the code, I believe that will be OK as well, since passing it to gpujpeg_image_load_from_file is enough for you. This could serve as a workaround if there is some separate problem with passing the string from the terminal.)

I am not sure whether I have some problem with locales. So, as a starting point to check that we both have the same situation, can you try this?

#include <cstdio>
#include <cstring>
int main(int argc, char *argv[]) {
    printf("%zu\n", strlen(argv[1]));
}

and call it with φ as the argument? For me, it writes 1, which should not be the case unless you have a Greek 8-bit encoding (I don't believe many non-Greek 8-bit encodings map the Greek alphabet).
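To see which bytes actually arrive, not just how many, the check can be extended to dump them in hex. This is only a sketch; hex_bytes is a hypothetical helper name:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: format the bytes of a C string as space-separated
// lowercase hex, e.g. for inspecting what the runtime put into argv[1].
std::string hex_bytes(const char *s) {
    std::string out;
    char buf[4];
    for (const unsigned char *p = reinterpret_cast<const unsigned char *>(s);
         *p != 0; ++p) {
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(*p));
        if (!out.empty()) {
            out += ' ';
        }
        out += buf;
    }
    return out;
}
```

Printing hex_bytes(argv[1]) would give "cf 86" for a UTF-8 encoded φ and "66" for the transliterated 'f' described above.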

MartinPulec avatar Sep 06 '24 07:09 MartinPulec

Hi, I think I've resolved the problem. There are a few points that I was not aware of:

  1. regarding my snippet above: to handle Unicode characters in a generic way, wmain with wchar_t *argv[] must be used. Otherwise MSW converts the characters to something like Windows-1252 (I am not sure whether this depends on the locale; I used English). Characters that have a direct representation are kept (8-bit encoded, I believe), while those that do not get transliterated (e.g. 'ě'->'e' or 'φ'->'f').
  2. using _wfopen really helps (fopen does not seem to be usable with UTF-8 in general; according to the documentation it supports only the ANSI and OEM code pages, unless the OEM code page can be set to UTF-8, which I doubt).
  3. working with wide chars (_UNICODE) in Windows is really tedious and I advise avoiding that unless needed
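For point 1, here is a sketch of one way to get untransliterated arguments while keeping an ordinary main. Note that this is a different technique from wmain: it asks Windows for the wide command line via GetCommandLineW and CommandLineToArgvW and converts each argument to UTF-8. The helper name utf8_args is hypothetical, and on other platforms the sketch simply copies argv, assuming the terminal already passes UTF-8:

```cpp
#include <string>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#include <shellapi.h>  // CommandLineToArgvW; link with Shell32.lib
#endif

// Hypothetical helper: return the program arguments as UTF-8 strings.
std::vector<std::string> utf8_args(int argc, char *argv[]) {
    std::vector<std::string> args;
#ifdef _WIN32
    (void)argc;
    (void)argv;  // the narrow argv may already be transliterated
    int wargc = 0;
    wchar_t **wargv = CommandLineToArgvW(GetCommandLineW(), &wargc);
    if (wargv == nullptr) {
        return args;
    }
    for (int i = 0; i < wargc; ++i) {
        // First call: required size in bytes, including the terminating NUL.
        int len = WideCharToMultiByte(CP_UTF8, 0, wargv[i], -1,
                                      nullptr, 0, nullptr, nullptr);
        std::string s(len > 0 ? len : 0, '\0');
        if (len > 0) {
            WideCharToMultiByte(CP_UTF8, 0, wargv[i], -1,
                                &s[0], len, nullptr, nullptr);
            s.resize(len - 1);  // drop the NUL counted in len
        }
        args.push_back(s);
    }
    LocalFree(wargv);
#else
    for (int i = 0; i < argc; ++i) {
        args.emplace_back(argv[i]);
    }
#endif
    return args;
}
```

Calling utf8_args(argc, argv) at the start of main keeps the rest of the program on narrow UTF-8 strings, which fits the advice in point 3.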

In the end, I implemented gpujpeg_image_load_from_file and gpujpeg_image_save_to_file with _wfopen(), as you suggested. The argument to those functions should be passed as a UTF-8 "narrow" C string; it is converted to wide characters inside those functions. The implemented solution seems to work for me:

gpujpegtool.exe -e 8x8.tst '(((φ(◎ロ◎;)φ))).jpg'

produces the output JPEG named as requested.
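The conversion described above can be sketched roughly as follows. This is not the actual GPUJPEG code; fopen_utf8 and utf8_to_wide are hypothetical names, and the sketch assumes the caller always passes UTF-8:

```cpp
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>

// Convert a NUL-terminated UTF-8 string to UTF-16; empty result on error.
static std::wstring utf8_to_wide(const char *s) {
    // First call: required size in wchar_t units, including the NUL.
    int len = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                  s, -1, nullptr, 0);
    if (len <= 0) {
        return std::wstring();
    }
    std::wstring w(len, L'\0');
    MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, &w[0], len);
    w.resize(len - 1);  // drop the NUL counted in len
    return w;
}
#endif

// Hypothetical wrapper: open a file whose name is given as UTF-8.
// Returns nullptr on failure, like fopen.
std::FILE *fopen_utf8(const char *path, const char *mode) {
#ifdef _WIN32
    const std::wstring wpath = utf8_to_wide(path);
    const std::wstring wmode = utf8_to_wide(mode);
    if (wpath.empty() || wmode.empty()) {
        return nullptr;
    }
    return _wfopen(wpath.c_str(), wmode.c_str());
#else
    // Assumption: on other systems fopen passes the bytes through unchanged.
    return std::fopen(path, mode);
#endif
}
```

Keeping the public interface on narrow UTF-8 strings confines the wide-character handling to a single place.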

Note that currently the updated functions and gpujpegtool can process special characters only in the names of RAW and JPEG files. Formats like Y4M/PAM/PNM/TGA/BMP etc. still fopen() their input and output, so those must be fixed separately.

MartinPulec avatar Jun 10 '25 13:06 MartinPulec

Thank you for your response! I’ll test your fix once the next release is available. Sorry I didn’t answer your previous questions — I had to find a workaround by first using a temporary file without special characters in its name, then renaming it afterward. Thanks again for your work and the changes you’ve made!

Regards, Adrien

ajacquinot avatar Jun 10 '25 13:06 ajacquinot

Sorry I didn’t answer your previous questions

No problem at all, at least I got some more insight into how non-ASCII stuff is handled in Windows (although I am still not completely certain in this area).

MartinPulec avatar Jun 11 '25 06:06 MartinPulec