[Feature Request] Allow word-splitting at the cursor position in the Spectrogram when in Karaoke Mode by pressing a key

Open MianSoft opened this issue 4 months ago • 2 comments

Description

I would like to request the ability to directly split at the cursor position in the Spectrogram while in Karaoke Mode, triggered by pressing a dedicated key (for example, the middle mouse button or a customizable hotkey). This would reduce errors and improve efficiency when creating Karaoke timing.

Reason

I have recently been using Aegisub to create TTML lyrics (see this project for details), and I encountered a significant workflow issue.

My usual process is to perform a quick rough segmentation for all lines first (see below):

他生了茧的手掌

{\k14}他{\k31}生{\k15}了{\k25}茧{\k47}的{\k27}手{\k27}掌

Some words in the song require additional splitting in the middle or at the end.
For example, in the second line below, the empty {\k3} after 了 and the empty {\k8} after 掌 cover two cases: a word has finished being sung but more words follow, and the final word of the line has been sung but the line itself has not ended.

{\k14}他{\k31}生{\k15}了{\k25}茧{\k47}的{\k27}手{\k27}掌

{\k14}他{\k31}生{\k15}了{\k3}{\k25}茧{\k47}的{\k27}手{\k27}掌{\k8}

Currently, this process forces me to constantly shift my focus between the Segmentation Bar and the Spectrogram. When a line contains many word-splits, the cognitive load increases significantly. Moreover, if the cursor is slightly misaligned when I only want to add a split, the click can instead merge an existing split.
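To make the requested operation concrete, here is a minimal Python sketch of "split at the cursor position". It is not Aegisub code: the function names (parse_k, render_k, split_at) are hypothetical, and the cursor is assumed to be given in centiseconds from the start of the line. Note that the example above keeps 了 at {\k15} and adds a separate {\k3}, whereas this sketch divides the existing duration of the syllable under the cursor, which is only one possible behavior.

```python
import re

def parse_k(line):
    """Parse a karaoke line such as '{\\k14}他{\\k31}生' into [(14, '他'), (31, '生')]."""
    return [(int(d), t) for d, t in re.findall(r"\{\\k(\d+)\}([^{]*)", line)]

def render_k(syllables):
    """Turn [(duration_cs, text), ...] back into a string of \\k tags and text."""
    return "".join(f"{{\\k{d}}}{t}" for d, t in syllables)

def split_at(syllables, cursor_cs):
    """Split the syllable that contains cursor_cs (hypothetical helper).

    The first half keeps the text; the second half becomes an empty
    syllable covering the rest of the original duration.
    """
    start = 0
    for i, (dur, text) in enumerate(syllables):
        end = start + dur
        if start < cursor_cs < end:
            head = (cursor_cs - start, text)
            tail = (end - cursor_cs, "")
            return syllables[:i] + [head, tail] + syllables[i + 1:]
        start = end
    # Cursor sits exactly on a boundary or outside the line: nothing to split.
    return list(syllables)

line = r"{\k14}他{\k31}生{\k15}了{\k25}茧{\k47}的{\k27}手{\k27}掌"
print(render_k(split_at(parse_k(line), 57)))
```

With the cursor at 57 cs, which falls inside 了 (45 to 60 cs), the syllable becomes {\k12}了{\k3} and every other syllable is left untouched.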

Translation by ChatGPT.

MianSoft, Sep 22 '25 21:09

I totally agree with him.

amanosatosi, Sep 25 '25 14:09

Here is what I think could be added to Aegisub.

How it could work - method 1

  • Add a new button near the audio tools.

  • When activated, it reuses the existing karaoke splitting logic (you can already split at any character).

  • In this mode, editing happens directly in the spectrogram:

    • Left-click sets a \k boundary for the current split/character.
    • Drag adjusts boundaries more precisely.

Example

Lyrics: あめにじんだあしもと (There’s a brief no-vocal gap between “め(me)” and “に(ni)”.)

  1. Initial split: あ | め || に | じ | ん | だ | あ | し | も | と

  2. Then click in the spectrogram to set:

    • First for “あ”
    • Next for “め”
    • Then add an extra click for the silent gap (like {\k15}了{\k3}{\k25}茧)
    • Then continue with “に”, “じ”, “ん”, etc.
  3. Commit behavior:

    • If some splits/characters are left untimed when pressing Enter/Commit, they automatically get all of the remaining duration as their \k value.
    • If the user clicks once after the last character, that extra boundary creates a final empty \k covering any leftover silence.

That way the workflow is clear: you only click what you need, and the system guarantees every segment gets a \k.
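The commit rules of method 1 can be sketched in Python as follows. This is only an illustration under stated assumptions, not Aegisub behavior: durations_from_clicks is a hypothetical name, each click is taken to mark the end of the current segment in centiseconds from the line start, and "get all of the remaining duration" is interpreted here as sharing the remainder evenly among the untimed segments (other interpretations are possible).

```python
def durations_from_clicks(segments, clicks_cs, line_duration_cs):
    """Map spectrogram clicks to (duration_cs, text) pairs (hypothetical helper).

    segments: texts in order, with "" standing for a silent gap.
    clicks_cs: end boundary of each timed segment, in order.
    """
    if len(clicks_cs) > len(segments):
        raise ValueError("more clicks than segments")
    if any(b < a for a, b in zip([0] + list(clicks_cs), clicks_cs)):
        raise ValueError("clicks must be non-negative and in ascending order")
    if clicks_cs and clicks_cs[-1] > line_duration_cs:
        raise ValueError("click lies beyond the end of the line")

    result = []
    prev = 0
    for text, end in zip(segments, clicks_cs):
        result.append((end - prev, text))
        prev = end

    untimed = segments[len(clicks_cs):]
    remaining = line_duration_cs - prev
    if untimed:
        # Assumption: untimed segments share the remainder evenly;
        # any leftover centiseconds go to the last one.
        share, extra = divmod(remaining, len(untimed))
        for i, text in enumerate(untimed):
            last = i == len(untimed) - 1
            result.append((share + (extra if last else 0), text))
    elif remaining > 0:
        # Every segment was clicked: a final empty \k covers the leftover silence.
        result.append((remaining, ""))
    return result

# "あ | め || に" with three clicks on a 100 cs line: に is left untimed.
print(durations_from_clicks(["あ", "め", "", "に"], [20, 45, 50], 100))
```

In this sketch, clicking a fourth time at 90 cs would end に there and append an empty 10 cs segment for the trailing silence.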

Method 2 (Vocaloid-style workflow)

  1. Set the timing with the mouse first

    • In the spectrogram, click/drag to mark the \k segments for every sung part.
    • Leave silent areas untimed; a blank \k segment will be added for each of them.
  2. Assign text afterward

    • Once the segments are in place, type or paste the lyric text.
    • Each segment gets one character/syllable in order.
  3. Control splits with keyboard

    • Press Space after the characters of a syllable to indicate “split here” (the following text goes to the next segment).
    • Press Double Space to insert an actual space character into the lyric and also advance to the next \k segment.

This matches the workflow of Vocaloid/UTAU editors: timing is aligned first, and the lyric text is “poured into” the prepared slots afterward.
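A rough Python sketch of method 2, again under assumptions rather than as a description of any existing editor: split_lyric and pour_into_slots are hypothetical names, slots are (start_cs, end_cs) pairs relative to the line start, a single space separates syllables, and a double space keeps one literal space at the end of the current syllable before advancing.

```python
import re

def split_lyric(text):
    """Split typed lyric text into syllables using the Space rules above."""
    syllables = []
    current = ""
    # Match a double space before a single space so it is seen as one token.
    for token in re.findall(r" {2}| |[^ ]+", text):
        if token == "  ":
            syllables.append(current + " ")  # keep a real space, then advance
            current = ""
        elif token == " ":
            syllables.append(current)        # plain "split here"
            current = ""
        else:
            current = token
    if current:
        syllables.append(current)
    return syllables

def pour_into_slots(slots, syllables, line_duration_cs):
    """Assign one syllable per timed slot and fill silent areas with blank segments."""
    if len(slots) != len(syllables):
        raise ValueError("need exactly one syllable per timed slot")
    result = []
    cursor = 0
    for (start, end), text in zip(slots, syllables):
        if start > cursor:
            result.append((start - cursor, ""))  # untimed gap before this slot
        result.append((end - start, text))
        cursor = end
    if line_duration_cs > cursor:
        result.append((line_duration_cs - cursor, ""))  # trailing silence
    return result

slots = [(0, 20), (20, 45), (50, 90)]
print(pour_into_slots(slots, split_lyric("a me ni"), 100))
```

Keeping the timing and the text as two separate inputs mirrors the "timing first, text afterward" order: the slots can be adjusted without retyping the lyric, and the lyric can be re-poured without touching the timing.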

amanosatosi, Sep 25 '25 16:09