"OSError: [Errno 36] File name too long" when len(filename) is less than 255
gallery-dl version: 1.22.4
Operating system: Debian Bullseye
File system: ext4
Error:
[download][warning] OSError: [Errno 36] File name too long: '/home/user/gallery-dl/kemonoparty/fanbox/1234567/1921259_《1000円プランおまけPart1(陰毛無し)》水泳の授業で溺れた生徒を優しく介護してくれる先生【陰毛無しver、おっぱい+股間+同時攻めver、潮吹きあり、全裸あり、台詞(有・無)、おまけ(男無しver)】/1921259_54.jpg.part'
I'm getting a "file name too long" error even though the len() of the filename (in this case, the directory name) is well under 255. I believe this is because name limits count the bytes taken by the encoded string, not the number of characters in it (see "Max. filename" in the ext4 info box):
>>> name = "1921259_《1000円プランおまけPart1(陰毛無し)》水泳の授業で溺れた生徒を優しく介護してくれる先生【陰毛無しver、おっぱい+股間+同時攻めver、潮吹きあり、全裸あり、台詞(有・無)、おまけ(男無しver)】"
>>> len(name)
112
>>> len(name.encode("utf-8"))
280
Because of this, using {filename:.255} in the format string doesn't fix the issue: it truncates the name to 255 characters, and the character count is not the problem.
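A small sketch of why a ".255" precision cannot help: Python's format precision cuts a string by characters, while ext4 checks each path component in bytes. The sample name below is made up for illustration and is not taken from the report.

```python
# Sample name (made up): 200 CJK characters, each 3 bytes in UTF-8.
name = "水泳" * 100

# A ".255" precision truncates to at most 255 *characters*, like "{filename:.255}".
truncated = format(name, ".255")

print(len(truncated))                     # 200 characters, so nothing was cut
print(len(truncated.encode("utf-8")))     # 600 bytes
print(len(truncated.encode("utf-8")) > 255)  # True: still too long for ext4
```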
Indeed, the usual file system limit is in bytes, not Unicode characters. You can slice the encoded bytes and then decode the result, ignoring a possibly broken character at the end (one whose remaining bytes were cut off by the slice).
I think you can use a Python expression or module (https://github.com/mikf/gallery-dl/blob/master/docs/formatting.md) to do this inside the format string.
For example:
title.encode('utf-8')[:255].decode('utf-8', 'ignore')
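The expression above can be wrapped in a plain function to see what happens when the cut lands inside a multi-byte character. `truncate_utf8` is a hypothetical helper name used only for this sketch; the sample title is made up so that the 255-byte boundary splits a character.

```python
def truncate_utf8(text: str, max_bytes: int = 255) -> str:
    """Hypothetical helper: cut `text` so its UTF-8 encoding fits in `max_bytes`.

    Slicing the encoded bytes may split a multi-byte character in half;
    decoding with "ignore" silently drops that incomplete trailing sequence.
    """
    return text.encode("utf-8")[:max_bytes].decode("utf-8", "ignore")


# 1 ASCII byte + 100 three-byte characters = 301 bytes (made-up sample).
title = "a" + "先生" * 50
short = truncate_utf8(title)

# Bytes 1..252 hold 84 whole characters; the 2 leftover bytes are a partial
# character, which "ignore" drops, so the result is 253 bytes, not 255.
print(len(short.encode("utf-8")))  # 253
```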
And https://github.com/nicotine-plus/nicotine-plus/pull/2053/files has a more complex implementation that keeps the file extension if needed.
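A minimal sketch of the "keep the extension" idea, assuming the goal is to shorten only the part before the extension. This is a hypothetical helper written for illustration, not the nicotine-plus code from the linked PR.

```python
import os


def truncate_keep_extension(filename: str, max_bytes: int = 255,
                            encoding: str = "utf-8") -> str:
    """Hypothetical helper: shorten the stem so the whole name fits in
    `max_bytes` when encoded, leaving the extension untouched."""
    if len(filename.encode(encoding)) <= max_bytes:
        return filename
    stem, ext = os.path.splitext(filename)  # "x.jpg" -> ("x", ".jpg")
    budget = max_bytes - len(ext.encode(encoding))
    if budget <= 0:
        # The extension alone does not fit; fall back to a plain byte cut.
        return filename.encode(encoding)[:max_bytes].decode(encoding, "ignore")
    # Cut the stem by bytes and drop any character split by the cut.
    short_stem = stem.encode(encoding)[:budget].decode(encoding, "ignore")
    return short_stem + ext


result = truncate_keep_extension("1921259_" + "先生" * 100 + ".jpg")
print(len(result.encode("utf-8")), result[-4:])  # 255 .jpg
```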
Does title.encode('utf-8')[:255].decode('utf-8', 'ignore') work in the config file?
It should, using the special format string syntax described in the formatting docs.
@Hrxn @hikineet0 What does this setting look like in the config file? I'm not able to get it working.
Shouldn't this be the default behavior?
@rpdelaney
"directory": ["{category}", "{service}", "{username} ({user})", "\fF {id}_{title.encode('utf-8')[:255 - len(id) - len('_') - len('imgext')].decode('utf-8', 'ignore')}"]
Here's what I used.
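The directory format above can be checked outside gallery-dl by repeating its arithmetic in plain Python. `directory_component` is a hypothetical name, and the id and title are made-up sample values standing in for the extractor's metadata. Note that len('imgext') is the length of that literal word (6), so it acts as a fixed 6-byte margin rather than the real extension length.

```python
def directory_component(post_id: str, title: str, limit: int = 255) -> str:
    """Hypothetical re-implementation of the config's f-string:
    "{id}_{title.encode('utf-8')[:255 - len(id) - len('_') - len('imgext')]
    .decode('utf-8', 'ignore')}"."""
    budget = limit - len(post_id) - len("_") - len("imgext")  # 241 for a 7-char id
    short_title = title.encode("utf-8")[:budget].decode("utf-8", "ignore")
    return f"{post_id}_{short_title}"


component = directory_component("1921259", "先生" * 100)
# 241 bytes allow 80 whole 3-byte characters (240 bytes); the stray byte is dropped.
print(len(component.encode("utf-8")))  # 248
```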
Thanks @hikineet0: I'm getting an error when downloading from Instagram:
[instagram][error] DirectoryFormatError: Applying directory format string failed (Name Error: name 'title' is not defined)
Okay, here's what I came up with. It seems to work for normal files, but I don't have a test case to verify that it solves the problem with long Unicode names:
{
"filename": {
"len((str(id) + '_' + title + '.' + extension).encode('utf-8')) > 255": "\fF {id}_{title.encode('utf-8')[:250 - len(str(id)) - len(extension)].decode('utf-8', 'ignore')}\u2026.{extension}",
"": "{id}_{title}.{extension}"
}
}
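Here is a plain-Python sketch of the conditional-filename logic, useful as a test case for long Unicode titles. `build_filename` is a hypothetical helper, and the sample id and titles are made up. It keeps "{id}_{title}.{extension}" when that fits in 255 bytes; otherwise it trims the title by bytes, reserving room for the id, "_", the 3-byte "…", ".", and the extension.

```python
ELLIPSIS = "\u2026"  # "…", 3 bytes in UTF-8


def build_filename(post_id: str, title: str, extension: str,
                   limit: int = 255) -> str:
    """Hypothetical helper mirroring a conditional "filename" setting."""
    full = f"{post_id}_{title}.{extension}"
    if len(full.encode("utf-8")) <= limit:
        return full
    # Bytes used by everything except the title: id, "_", "…", ".", extension.
    fixed = len(f"{post_id}_{ELLIPSIS}.{extension}".encode("utf-8"))
    budget = limit - fixed
    # Cut the title by bytes; "ignore" drops a character split by the cut.
    short_title = title.encode("utf-8")[:budget].decode("utf-8", "ignore")
    return f"{post_id}_{short_title}{ELLIPSIS}.{extension}"


print(build_filename("1921259", "short title", "jpg"))  # 1921259_short title.jpg
long_name = build_filename("1921259", "先生" * 100, "jpg")
print(len(long_name.encode("utf-8")))                    # 255
```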
I think this should be fixed with https://github.com/mikf/gallery-dl/commit/69865dcc0567807fc0921337a9a0879610e103a6?
[formatter] implement slicing strings as bytes (https://github.com/mikf/gallery-dl/discussions/4087)

Prefixing a slice such as '[10:30]' with a lowercase b, as in '[b10:30]', encodes the string to bytes in the filesystem encoding before applying the slice.
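A rough Python approximation of what a '[b10:30]' slice does according to the commit message: encode in the filesystem encoding, then slice the bytes. How gallery-dl turns the sliced bytes back into text is an assumption here (decoding with errors ignored); `byte_slice` is a hypothetical name, not gallery-dl's internal function.

```python
import sys


def byte_slice(text: str, start=None, stop=None) -> str:
    """Hypothetical approximation of a "[b<start>:<stop>]" slice.

    Assumption: the sliced bytes are decoded back with the same encoding,
    dropping any character that the slice cut in half.
    """
    encoding = sys.getfilesystemencoding()  # usually "utf-8" on Linux
    return text.encode(encoding)[start:stop].decode(encoding, "ignore")


# With UTF-8, the first 4 bytes hold "先" plus 1 byte of "生", which is dropped.
print(byte_slice("先生abc", None, 4))
```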