mpvacious icon indicating copy to clipboard operation
mpvacious copied to clipboard

Copying subtitles with diacritics unexpectedly strips spaces

Open artjomsR opened this issue 2 years ago • 6 comments

Environment

  1. OS name Windows 10, mpvacious_v0.23 Describe the bug A description of what the bug is.

To Reproduce Steps to reproduce the behavior:

  1. Load subtitles with diacritics. Copy the subtitles either automatically or using Ctrl+C.
  2. Subtitles have spaces trimmed if string contains diacritics. So, "héllo héllo" becomes "héllohéllo"

Expected behavior Subtitles are copied as-is

I can't seem to isolate where the problem lies. Individual components work correctly but collectively it doesn't work. A bat file outputs the content as expected:

@echo off
chcp 65001 >nul
echo héllo héllo|clip

I've also tried hardcoding the string and that produced strings with whitespaces correctly:

self.copy_to_clipboard = function(text)
    ...
    mp.commandv("run", "cmd.exe", "/d", "/c", string.format("@echo off & chcp 65001 >nul & echo %s|clip", "héllo héllo"))

I tried using set /p instead of echo but that didn't do anything

I've tried powershell but that didn't seem to work at all - it just flashes a terminal window without copying anything, maybe I didn't get the params quite right

Any suggestions would be appreciated

artjomsR avatar Jul 27 '23 22:07 artjomsR

Any suggestions would be appreciated

You can use powershell to send text to clipboard, like it was done in videoclip, here. Not sure if you tried this exact method or not.

Also, have you tried GNU/Linux? Here is a list of recommended GNU distributions.

tatsumoto-ren avatar Jul 27 '23 23:07 tatsumoto-ren

Thank you for getting back to me! I've figured out the problem: this works fine before 0.20. And 0.21 is a breaking change. I've set nuke_spaces=no and it works as expected, strings with diacritics do not get spaces removed! 🎉

I assume this is because it finds a non latin character (é) and then assumes it's language e.g. Japanese?

Thank you for your suggestion, I'm using Linux as well but haven't switched completely due to other tools I'm using :)

artjomsR avatar Jul 28 '23 11:07 artjomsR

I assume this is because it finds a non latin character (é) and then assumes it's language e.g. Japanese?

Yes

tatsumoto-ren avatar Jul 28 '23 20:07 tatsumoto-ren

It would be beneficial if someone made a PR that replaces cmd.exe calls with powershell when copying text to the system clipboard. Initially, we implemented copying via cmd.exe for compatibility with windows 7, but today I think people don't use that anymore. I can't do this PR myself because I don't use windows.

tatsumoto-ren avatar Jul 28 '23 20:07 tatsumoto-ren

If it ain't broke, don't fix it? In this particular issue, I don't think it's an issue with cmd, but rather the implementation of nuke_spaces. And I think this issue will be present in Mac / Unix systems too

I think it's a bug, because it's clearly confusing how spaces are being removed (and it used to work fine before 0.21). I think nuke_spaces should either check that

  • ALL characters are non-latin and only then trim spaces
  • Or maybe enable spaces as soon as ANY latin character is detected (if it's a mix of e.g. English and Japanese we still want spaces?)

artjomsR avatar Jul 29 '23 13:07 artjomsR

If it ain't broke, don't fix it? In this particular issue, I don't think it's an issue with cmd

Some concerns have been raised regarding the use of cmd. https://github.com/Ajatt-Tools/mpvacious/pull/20

tatsumoto-ren avatar Jul 30 '23 14:07 tatsumoto-ren