Fix wildcard sync command if files have special characters

Open briceflaceliere opened this issue 1 year ago • 2 comments

Problem encountered: I am using s5cmd to sync several million files, some of which contain special characters (such as \u00a0, quotes, etc.). However, the fmt.Sprintf("%q") call used in generateCommand rewrites these file names with Go escape sequences, so the generated cp command can no longer find the files.

Solution provided: To resolve this issue, I used the github.com/kballard/go-shellquote library to properly escape URL parameters without converting UTF-8 characters. This process is applied only when necessary, ensuring file names are preserved.
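To illustrate the idea of quoting only when necessary while leaving UTF-8 bytes untouched, here is a standard-library sketch of POSIX-style single quoting. This is an assumption-laden illustration: it is neither the go-shellquote implementation nor the code in this patch, and the names needsQuoting and shellQuote as well as the safe-character set are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// needsQuoting reports whether s contains anything outside a
// conservative set of characters that are safe to leave unquoted.
// The set is an assumption chosen for this sketch.
func needsQuoting(s string) bool {
	if s == "" {
		return true
	}
	for _, r := range s {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9':
			// ASCII letters and digits are safe.
		case strings.ContainsRune("@%+=:,./_-", r):
			// Common URL and path punctuation is safe.
		default:
			return true
		}
	}
	return false
}

// shellQuote wraps s in single quotes only when needed. Embedded
// single quotes are closed, escaped, and reopened; every other
// character, including non-ASCII ones, is passed through unchanged.
func shellQuote(s string) string {
	if !needsQuoting(s) {
		return s
	}
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

func main() {
	for _, key := range []string{
		"s3://bucket/plain.txt",
		"s3://bucket/report\u00a02024.txt",
		"s3://bucket/it's here.txt",
	} {
		fmt.Println(shellQuote(key))
	}
}
```

Unlike %q, this style never rewrites the characters of the name itself, so a shell-style splitter can recover the original key byte for byte.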

Test changes: I modified the tests accordingly. However, I am not certain of the potential impact on other parts of the project. I will be able to confirm if this resolves the synchronization issues once the migration of our millions of files is completed.

Related issues:

  • https://github.com/peak/s5cmd/issues/728
  • https://github.com/peak/s5cmd/issues/521
  • https://github.com/peak/s5cmd/issues/691

briceflaceliere Oct 01 '24 16:10

Hey @briceflaceliere, thanks for sending a patch! Appreciated. Using shellquote looks good to me.

Could you add your scenario to the cp integration test cases to demonstrate the fix, please? Thank you.

https://github.com/peak/s5cmd/blob/12d10381b36a27004e1f690992ebb7f6ba97c171/e2e/cp_test.go#L2142-L2146

igungor Oct 18 '24 07:10

Hello, is there anything that could be done to get this fix into the next release, please?

rmey Jun 16 '25 07:06