LinguaCafe

For Japanese, bug where lemma is blank for word, word reading is off.

Open etherealite opened this issue 1 year ago • 7 comments

Woops, found another one.

[image]

Looks like this may have been parsed correctly, but the lemma reading and the word reading got crossed or something. The lemma is empty.

[image]

Apr 30 '24 17:04 etherealite

That's an interesting one. Can you please copy paste this word here?

Apr 30 '24 17:04 simjanos-dev

Sure thing, I have it right here from the raw file.

過ごせる

In context.

9
00:00:44,060 --> 00:00:52,000
もうそういう人は、僕が頑張って働くからこそ、日本ではゴールデンウィークを過ごせる人がいっぱいいるんだ。

Apr 30 '24 17:04 etherealite

Just a note: there's a known Japanese issue with readings, #120.

Apr 30 '24 17:04 simjanos-dev

That's a weird one. I deleted that single word from my database (don't do this on your production db), imported it again, and it was correct. I'll investigate this more in the future with a fresh database, and I'll use the subtitle file to test it. Please comment here if you find more cases like this. I used Japanese, but I haven't seen this problem before, or I just haven't noticed it because it's rare.

I also realized that I know this word, I just haven't been reading for a long time. :(

Apr 30 '24 17:04 simjanos-dev

Hey, I'm super impressed that you can keep up more than one language at a time. I hope I don't forget as well lol.

You remember this thought right? 助けてくれてありがとう!

Apr 30 '24 18:04 etherealite

> Hey, I'm super impressed that you can keep up more than one language at a time.

I'm not sure what you mean, I only learn Japanese.

> You remember this thought right? 助けてくれてありがとう!

Yes, I do!

Apr 30 '24 19:04 simjanos-dev

Sorry, but I cannot replicate this. This is what I see when I use an empty database, create an .srt file from your example, and import it as a subtitle:

[image: ghissue]
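For anyone who wants to retry the reproduction described above, the single-cue .srt file can be created with a few lines of Python. This is only a rough sketch: the cue text is copied from the report earlier in this thread, while the `write_repro_srt` helper and the `repro.srt` file name are made up for illustration and are not part of LinguaCafe.

```python
from pathlib import Path

# Cue copied verbatim from the report earlier in this thread.
SRT_CUE = (
    "9\n"
    "00:00:44,060 --> 00:00:52,000\n"
    "もうそういう人は、僕が頑張って働くからこそ、"
    "日本ではゴールデンウィークを過ごせる人がいっぱいいるんだ。\n"
)


def write_repro_srt(path: str = "repro.srt") -> Path:
    """Write the single-cue subtitle file as UTF-8 and return its path.

    The helper name and the default file name are hypothetical.
    """
    target = Path(path)
    target.write_text(SRT_CUE, encoding="utf-8")
    return target
```

Calling `write_repro_srt()` leaves a `repro.srt` file in the current directory, which can then be imported as a subtitle into an empty database.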

It is possible that it was imported from another source first, and the inaccurate reading was generated there.

Are there maybe other words where you have kanji in your reading field? Or did you maybe use vocabulary import?
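One rough way to check for this, assuming the words and their readings have been exported as (word, reading) pairs by some means: the sketch below flags every reading that still contains kanji. The `readings_with_kanji` helper and the sample entries are hypothetical, and nothing here reflects LinguaCafe's actual database schema or API.

```python
import re

# CJK Unified Ideographs, CJK Extension A, and the iteration mark 々.
# Hiragana and katakana fall outside these ranges, so kana-only readings pass.
KANJI_RE = re.compile(r"[\u4e00-\u9fff\u3400-\u4dbf\u3005]")


def readings_with_kanji(entries):
    """Return the (word, reading) pairs whose reading contains any kanji."""
    return [(word, reading) for word, reading in entries
            if KANJI_RE.search(reading)]


# Hypothetical sample data: only the second entry has kanji in its reading.
sample = [
    ("過ごせる", "すごせる"),
    ("過ごせる", "過ごせる"),
    ("人", "ひと"),
]
suspicious = readings_with_kanji(sample)
```

Any pair that ends up in `suspicious` would be a candidate for the kind of crossed reading reported in this issue.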

May 15 '24 16:05 simjanos-dev

That's alright, thanks for giving it a shot. I haven't had anything like this happen since, so I can't be of much help.

I'll be sure and report if I find this behavior again. You can close the issue.

Jun 10 '24 09:06 etherealite