gallery-dl

"OSError: [Errno 36] File name too long" when len(filename) is less than 255

Open hikineet0 opened this issue 3 years ago • 3 comments

gallery-dl version: 1.22.4
Operating system: Debian Bullseye
File system: ext4
Error:

[download][warning] OSError: [Errno 36] File name too long: '/home/user/gallery-dl/kemonoparty/fanbox/1234567/1921259_《1000円プランおまけPart1(陰毛無し)》水泳の授業で溺れた生徒を優しく介護してくれる先生【陰毛無しver、おっぱい+股間+同時攻めver、潮吹きあり、全裸あり、台詞(有・無)、おまけ(男無しver)】/1921259_54.jpg.part'

I get a "File name too long" error even when len() of the file name (or, in this case, the directory name) is well under 255. I believe this is because the limit counts the bytes of the encoded name, not the number of characters (ext4's maximum file name length is 255 bytes):

>>> name = "1921259_《1000円プランおまけPart1(陰毛無し)》水泳の授業で溺れた生徒を優しく介護してくれる先生【陰毛無しver、おっぱい+股間+同時攻めver、潮吹きあり、全裸あり、台詞(有・無)、おまけ(男無しver)】"
>>> len(name)
112
>>> len(name.encode("utf-8"))
280

Because of this, using {filename:.255} in the format string doesn't fix the issue: it truncates to 255 characters, and the character count is not the problem.
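The mismatch is easy to reproduce in plain Python. A format precision such as .255 (which {filename:.255} relies on, assuming gallery-dl hands the spec to Python's format()) counts characters, while ext4's limit applies to the UTF-8 bytes of each path component. fits_name_limit is a hypothetical helper, and UTF-8 is assumed as the file system encoding, which is typical on Linux.

```python
def fits_name_limit(name: str, limit: int = 255) -> bool:
    """Return True if `name` fits a per-component limit measured in UTF-8 bytes."""
    return len(name.encode("utf-8")) <= limit


# 300 hiragana characters; each one takes 3 bytes in UTF-8.
name = "あ" * 300

# A precision of .255 truncates to 255 *characters*, not bytes.
truncated = format(name, ".255")
print(len(truncated))                  # 255 characters
print(len(truncated.encode("utf-8")))  # 765 bytes
print(fits_name_limit(truncated))      # False: still too long for ext4
```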

hikineet0, Jul 19 '22

Indeed, file systems commonly limit names in bytes, not Unicode characters. You can encode the string, slice the resulting bytes, and decode them again, ignoring a possibly broken character at the end (a multi-byte character that the slice cut short).
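A minimal sketch of that encode, slice, decode approach, assuming UTF-8; the function name is hypothetical. Because the bytes come from a valid encode and the slice starts at 0, the only invalid data decode can meet is the incomplete character at the end, which errors="ignore" drops.

```python
def truncate_utf8(text: str, max_bytes: int = 255) -> str:
    """Truncate `text` so its UTF-8 encoding is at most `max_bytes` bytes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # The slice may split a multi-byte character; 'ignore' discards that partial tail.
    return encoded[:max_bytes].decode("utf-8", "ignore")


print(truncate_utf8("あいう", 4))  # prints あ (the 4th byte begins い and is dropped)
print(truncate_utf8("abc"))        # prints abc (already short enough)
```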

I think you can run this inside the format string with a Python expression or module (https://github.com/mikf/gallery-dl/blob/master/docs/formatting.md).

For example:

title.encode('utf-8')[:255].decode('utf-8', 'ignore')
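The module route could look roughly like the sketch below. It assumes the \fM format type described in formatting.md, which calls a function with the current metadata dict and uses the string it returns; check that your gallery-dl version supports it. The module name safe_names and the functions in it are hypothetical. Reading fields with kwdict.get means a post without a title yields just the id prefix instead of an error.

```python
# safe_names.py (hypothetical module name)
# Sketch of a function for gallery-dl's "\fM module:function" format type,
# assuming it receives the metadata dict and must return a string.


def _truncate_utf8(text: str, max_bytes: int) -> str:
    # Slice the UTF-8 bytes, then drop a character cut in half at the end.
    return text.encode("utf-8")[:max(max_bytes, 0)].decode("utf-8", "ignore")


def dirname(kwdict: dict) -> str:
    """Build '{id}_{title}' capped at 255 UTF-8 bytes (hypothetical helper)."""
    prefix = f"{kwdict.get('id', '')}_"
    title = str(kwdict.get("title", ""))  # missing title becomes "" instead of an error
    budget = 255 - len(prefix.encode("utf-8"))
    return prefix + _truncate_utf8(title, budget)
```

It would then be referenced with a format string along the lines of "\fM safe_names:dirname"; see formatting.md for how the module name or path has to be written.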

A more complex implementation that keeps the file extension when needed is in https://github.com/nicotine-plus/nicotine-plus/pull/2053/files.
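Keeping the extension comes down to spending the byte budget on the stem only. The sketch below is an independent illustration of that idea, not the nicotine-plus code; the function name is hypothetical and UTF-8 is assumed.

```python
import os.path


def truncate_keep_extension(filename: str, max_bytes: int = 255) -> str:
    """Cap `filename` at `max_bytes` UTF-8 bytes while keeping its extension."""
    if len(filename.encode("utf-8")) <= max_bytes:
        return filename
    stem, ext = os.path.splitext(filename)
    ext_bytes = len(ext.encode("utf-8"))
    if ext_bytes >= max_bytes:
        # Pathological extension: fall back to plain byte truncation.
        return filename.encode("utf-8")[:max_bytes].decode("utf-8", "ignore")
    # Truncate only the stem, leaving room for the extension.
    budget = max_bytes - ext_bytes
    stem = stem.encode("utf-8")[:budget].decode("utf-8", "ignore")
    return stem + ext


print(truncate_keep_extension("あ" * 100 + ".jpg"))  # 83 copies of あ, then .jpg (253 bytes)
```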

layercak3, Jul 21 '22

title.encode('utf-8')[:255].decode('utf-8', 'ignore')

Does that work in the config file?

hikineet0, Jul 21 '22

It should, using gallery-dl's special format string syntax (for example the \fF prefix for f-strings).

Hrxn, Jul 22 '22

@Hrxn @hikineet0 What does this setting look like in the config file? I'm not able to get it working.

Shouldn't this be the default behavior?

rpdelaney, Dec 08 '22

@rpdelaney Here's what I used:

"directory": ["{category}", "{service}", "{username} ({user})", "\fF {id}_{title.encode('utf-8')[:255 - len(id) - len('_') - len('imgext')].decode('utf-8', 'ignore')}"]

hikineet0, Feb 25 '23
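The slice bound in that directory format is a byte budget: 255 minus the bytes taken by the id and the underscore. len('imgext') is simply the literal 6, so it reserves six spare bytes rather than measuring a real extension (a directory name has no extension anyway). A plain-Python sketch of the same arithmetic, with a hypothetical function name and sample values; str(post_id) is used defensively in case the id arrives as an integer.

```python
def directory_name(post_id, title: str, reserve: int = len("imgext")) -> str:
    """Reproduce the '{id}_{title...}' directory format outside gallery-dl."""
    budget = 255 - len(str(post_id)) - len("_") - reserve
    truncated = title.encode("utf-8")[:budget].decode("utf-8", "ignore")
    return f"{post_id}_{truncated}"


name = directory_name("1921259", "あ" * 300)
print(len(name.encode("utf-8")))  # 248: budget of 241 bytes holds 80 full characters
```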

Thanks @hikineet0. I'm getting an error when downloading from Instagram:

[instagram][error] DirectoryFormatError: Applying directory format string failed (NameError: name 'title' is not defined)

rpdelaney, Mar 03 '23
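As the error shows, the Instagram metadata has no title field, so any format string that evaluates title fails there. One way around it, assuming gallery-dl's usual per-extractor config layout (extractor, then category, then option), is to set the custom directory only under kemonoparty so other sites keep their defaults. The Python below only builds and prints such a config fragment as a sketch; merging it into an existing config is left to you.

```python
import json

# The directory format from the comment above, as a Python list.
# "\f" is the form-feed character gallery-dl uses to mark special format strings.
KEMONO_DIRECTORY = [
    "{category}",
    "{service}",
    "{username} ({user})",
    "\fF {id}_{title.encode('utf-8')[:255 - len(id) - len('_') - len('imgext')]"
    ".decode('utf-8', 'ignore')}",
]

config = {
    "extractor": {
        # Scoped to kemonoparty only, so extractors without a 'title'
        # field (such as instagram) keep their default directory format.
        "kemonoparty": {
            "directory": KEMONO_DIRECTORY,
        },
    },
}

print(json.dumps(config, indent=4, ensure_ascii=False))
```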

Okay, here's what I came up with. It seems to work on normal files, but I don't have a test case to verify that it solves the problem with long Unicode names:

{
    "filename": {
        "len(f'{id}_{title}.{extension}'.encode('utf-8')) > 255": "\fF {id}_{title.encode('utf-8')[:255 - len(str(id)) - len('_\u2026.'.encode('utf-8')) - len(extension)].decode('utf-8', 'ignore')}\u2026.{extension}",
        "": "{id}_{title}.{extension}"
    }
}

rpdelaney, Mar 03 '23
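Both halves of a conditional filename like this can be checked in plain Python before they go into a config. A condition written as a quoted literal, such as "{title} + '.' + {extension}", measures that literal text (27 bytes), not the substituted values, so it never exceeds 255; it has to evaluate the fields themselves. The title's byte budget must also leave room for the id, the underscore, the 3-byte "…", the dot and the extension, and the decode needs 'ignore' so a character split by the slice doesn't raise UnicodeDecodeError. Function names and sample values below are hypothetical.

```python
ELLIPSIS = "\u2026"  # "…", 3 bytes in UTF-8
LIMIT = 255


def needs_truncation(post_id, title: str, extension: str) -> bool:
    """Condition: does '{id}_{title}.{extension}' exceed LIMIT bytes?"""
    return len(f"{post_id}_{title}.{extension}".encode("utf-8")) > LIMIT


def truncated_filename(post_id, title: str, extension: str) -> str:
    """Build '{id}_{title}….{extension}' that fits in LIMIT bytes."""
    fixed = len(f"{post_id}_{ELLIPSIS}.{extension}".encode("utf-8"))
    short = title.encode("utf-8")[:LIMIT - fixed].decode("utf-8", "ignore")
    return f"{post_id}_{short}{ELLIPSIS}.{extension}"


def filename(post_id, title: str, extension: str) -> str:
    # Mirrors the conditional config: the first matching condition wins,
    # and "" acts as the always-true fallback.
    if needs_truncation(post_id, title, extension):
        return truncated_filename(post_id, title, extension)
    return f"{post_id}_{title}.{extension}"


# The literal-string condition never fires: it measures 27 bytes of text.
print(len("{title} + '.' + {extension}".encode("utf-8")))  # 27

long_name = filename("1921259", "あ" * 300, "jpg")
print(len(long_name.encode("utf-8")))  # 255
```

gallery-dl downloads into a temporary name ending in .part (see the error at the top of this issue), so a final name of exactly 255 bytes would presumably still fail during the download; setting LIMIT to 250 leaves room for that suffix.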

I think this should be fixed with https://github.com/mikf/gallery-dl/commit/69865dcc0567807fc0921337a9a0879610e103a6?

[formatter] implement slicing strings as bytes (https://github.com/mikf/gallery-dl/discussions/4087): prefixing a slice '[10:30]' with a lowercase b, '[b10:30]', encodes the string to bytes in filesystem encoding before applying the slice

Hrxn, Jun 21 '23
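Going by that commit message, a byte slice presumably written as {title[b:240]} would cap the title at 240 bytes in versions that include the commit, making the workarounds above unnecessary. The Python below only approximates that behavior for experimenting outside gallery-dl: it encodes with the file system encoding, slices, and decodes while dropping any split character. How gallery-dl treats a split character internally is an assumption here, and slice_bytes is a hypothetical name.

```python
import sys


def slice_bytes(text: str, start=None, stop=None, encoding=None) -> str:
    """Approximate a byte-based slice: encode, slice, decode, drop partial chars."""
    enc = encoding or sys.getfilesystemencoding()
    return text.encode(enc)[start:stop].decode(enc, "ignore")


title = "水泳の授業で溺れた生徒"
print(slice_bytes(title, stop=10, encoding="utf-8"))  # prints 水泳の (9 bytes; the 10th is dropped)
```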