Copying subtitles with diacritics unexpectedly strips spaces
Environment
- OS name Windows 10, mpvacious_v0.23 Describe the bug A description of what the bug is.
To Reproduce Steps to reproduce the behavior:
- Load subtitles with diacritics. Copy the subtitles either automatically or using Ctrl+C.
- Subtitles have spaces trimmed if string contains diacritics. So, "héllo héllo" becomes "héllohéllo"
Expected behavior Subtitles are copied as-is
I can't seem to isolate where the problem lies. Individual components work correctly but collectively it doesn't work. A bat file outputs the content as expected:
@echo off
chcp 65001 >nul
echo héllo héllo|clip
I've also tried hardcoding the string and that produced strings with whitespaces correctly:
self.copy_to_clipboard = function(text)
...
mp.commandv("run", "cmd.exe", "/d", "/c", string.format("@echo off & chcp 65001 >nul & echo %s|clip", "héllo héllo"))
I tried using set /p instead of echo but that didn't do anything
I've tried powershell but that didn't seem to work at all - it just flashes a terminal window without copying anything, maybe I didn't get the params quite right
Any suggestions would be appreciated
Any suggestions would be appreciated
You can use powershell to send text to clipboard, like it was done in videoclip, here. Not sure if you tried this exact method or not.
Also, have you tried GNU/Linux? Here is a list of recommended GNU distributions.
Thank you for getting back to me! I've figured out the problem: this works fine before 0.20. And 0.21 is a breaking change. I've set nuke_spaces=no and it works as expected, strings with diacritics do not get spaces removed! 🎉
I assume this is because it finds a non latin character (é) and then assumes it's language e.g. Japanese?
Thank you for your suggestion, I'm using Linux as well but haven't switched completely due to other tools I'm using :)
I assume this is because it finds a non latin character (é) and then assumes it's language e.g. Japanese?
Yes
It would be beneficial if someone made a PR that replaces cmd.exe calls with powershell when copying text to the system clipboard. Initially, we implemented copying via cmd.exe for compatibility with windows 7, but today I think people don't use that anymore. I can't do this PR myself because I don't use windows.
If it ain't broke, don't fix it? In this particular issue, I don't think it's an issue with cmd, but rather the implementation of nuke_spaces. And I think this issue will be present in Mac / Unix systems too
I think it's a bug, because it's clearly confusing how spaces are being removed (and it used to work fine before 0.21). I think nuke_spaces should either check that
- ALL characters are non-latin and only then trim spaces
- Or maybe enable spaces as soon as ANY latin character is detected (if it's a mix of e.g. English and Japanese we still want spaces?)
If it ain't broke, don't fix it? In this particular issue, I don't think it's an issue with cmd
Some concerns have been raised regarding the use of cmd. https://github.com/Ajatt-Tools/mpvacious/pull/20