
Editing of one entry cannot be successful whatsoever

Open abuali129 opened this issue 9 years ago • 25 comments

I'm still continuing with my project, and I developed a pattern for localizing TPP to Arabic, which appears to be successful in every .subp and all entries in them. But there's one entry that gets corrupted inside the game whatever I do:

Entry Id="2161021477" in tape.subp. Cassette tape: Skull Face's Objective [4]. Track: Secret Recording of Skull Face and Code Talker [2].

Whenever I modify this entry and put it in the game, the subtitles won't show, and the rewind and forward buttons also get corrupted. I'm providing two .subp samples containing just the entry in question, one original and one modified. Also see the videos to compare the original behavior with the corrupted one.

Videos:
Original: https://www.youtube.com/watch?v=VpmW2LG6oBk
Corrupted: https://www.youtube.com/watch?v=xNhw1nteyLg

Samples:
Original: https://mega.nz/#!zM8khIwb!L-ORh-oHNcA3H1NqgC7YUOijx6oeTvp4BelGiJW6MbU
Corrupted: https://mega.nz/#!rEkFDSaL!OpJ2ntxyVx3ittkq2r6i72V2U_GoRYIU7LI5YBhx24E

abuali129 avatar Feb 10 '17 12:02 abuali129

The only difference between the two files you provided is the text content and length (if they are both utf-8 encoded).

The "corrupted" one seems to be missing the character ID prefixes ([C=37]) in each line. Perhaps the game can only "skip" to lines with a character ID.

Original:

<Line Text="[C=37]Forgive me, but my schedule has changed.">
    <Timing Start="576" End="830" />
</Line>

Modded:

<Line Text=".ばチをす ぬウ タガぐケろぅ オズぬち コォガ ぁタガ つケふぉ         ">
    <Timing Start="576" End="830" />
</Line>
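To rule out the missing-prefix theory quickly, a check along these lines could list every line without a character ID. This is only a sketch: the `<Entry>` wrapper, the second `<Line>` and the function name are made up for illustration, and the prefix pattern is inferred from the `[C=37]` sample above.

```python
import re
import xml.etree.ElementTree as ET

# Assumed prefix shape, inferred from the "[C=37]" sample above.
CHARACTER_ID = re.compile(r"^\[C=\d+\]")


def lines_missing_character_id(xml_text):
    """Return the Text of every <Line> that lacks a leading [C=n] prefix."""
    root = ET.fromstring(xml_text)
    return [
        line.get("Text", "")
        for line in root.iter("Line")
        if not CHARACTER_ID.match(line.get("Text", ""))
    ]


# Hypothetical entry: the first line is the original one quoted above,
# the second is invented to show a line without a prefix.
sample = """
<Entry Id="2161021477">
  <Line Text="[C=37]Forgive me, but my schedule has changed.">
    <Timing Start="576" End="830" />
  </Line>
  <Line Text="No prefix on this line">
    <Timing Start="831" End="900" />
  </Line>
</Entry>
"""

print(lines_missing_character_id(sample))
```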

Atvaark avatar Feb 10 '17 20:02 Atvaark

I already tried keeping the character ID prefixes, but got the same result. I found other entries that have the same issue. I'll have them ready in a moment.

abuali129 avatar Feb 10 '17 20:02 abuali129

Entry Id="856784307". Cassette tape: Truth Records. Track: Secret Recording with PAZ and ZERO.

Original: https://mega.nz/#!uRVTSDTa!UB7xpGYQY5WcEQrw02G0JF2MqC1-YX9YZPiCWPUkKtg
Modified: https://mega.nz/#!7A1WmChD!YxvS2283CXup87wsXOtNNUTA9Cc-eo5t155vx07DxZg
Corrupted: https://mega.nz/#!Td8RXZwZ!Un70HOvJIJcFP8efntkQ0HVSRXm-8FZnhCroHrzuoiw

Look at lines 138 and 336.

In the Modified version the file works properly, because I didn't touch these 2 lines. In the Corrupted version I added my text, and the same issue as before happened.

abuali129 avatar Feb 10 '17 20:02 abuali129

I did some tests on the last sample that I provided. It appears that the problem occurs if the length of the whole file exceeds 17,572. I didn't take the length into account in my project before; I thought it wasn't going to cause a problem. I'll do some tests on the first sample as well, to get a good picture of what is happening.

abuali129 avatar Feb 10 '17 22:02 abuali129

It could also be related to the characters in a line and the line length.

Your example is a lot larger than the original line. [C=20]ザはズぐぢケガく ホざアばをガく タア クスゴぉ

Could you try replacing the original line with substrings of varying length of your modded line? (1, 2, 3... characters) Maybe you can find out which length triggers the corruption.

Atvaark avatar Feb 11 '17 11:02 Atvaark

Yes, I tried that already. I even put English words in instead (while still exceeding the maximum length). I am pretty sure that the problem happens because I exceeded the maximum length of the entry. I did some tests on both files and got a clear picture of what happened.

abuali129 avatar Feb 11 '17 13:02 abuali129

The conclusion is that the problem occurs whenever the length of the whole entry exceeds a specific length that each entry can take, no matter which substring causes that. The substring itself is not directly related; only the length of the whole thing is.

I hope there's some way to increase the "length limit". And by the way, not all characters are equal in length; some add +4 to the length for a single letter. I think that's related to the Unicode encoding of the characters.

abuali129 avatar Feb 12 '17 19:02 abuali129

You're right with the different sizes for different UTF-8 codepoints.

As each entry in a subp file is saved as a single string with $-characters separating the lines, the max character limit per entry should be (assuming the entry has at least one line): 2^16-len(lines)-1, where len(lines) is the number of lines in the entry.

  1. 2^16-1 is max 16bit
  2. len(lines)-1 is the amount of $-characters required to separate the lines
  3. -1 for the NULL-terminator of the entry
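The three terms listed above can be added up as in the sketch below. The function name is hypothetical, and the result is only the theoretical ceiling implied by a 16-bit length, not a limit the game is known to honor.

```python
MAX_UINT16 = 2**16 - 1  # term 1: largest value a 16-bit length can hold


def max_entry_text_bytes(line_count):
    """Theoretical room left for subtitle text in one entry."""
    if line_count < 1:
        raise ValueError("an entry is assumed to have at least one line")
    separators = line_count - 1  # term 2: one '$' between adjacent lines
    terminator = 1               # term 3: trailing NULL byte
    return MAX_UINT16 - separators - terminator


print(max_entry_text_bytes(1))  # a single line needs no separator
print(max_entry_text_bytes(3))  # three lines need two separators
```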

As the game can't load these files correctly there have to be some other limits. Could you perhaps check which unmodified file has the largest entry and check if the supported size can be increased by changing the flags?

Atvaark avatar Feb 12 '17 21:02 Atvaark

Update: it seems that reducing the length in the first sample doesn't help either. I managed to shrink the length of the first sample down to 21987 by combining some line text strings and changing their timing values. Maybe the method itself is not working? I still don't know. Here is the result: https://mega.nz/#!vYcTHKaC!eHBW258SCIP0UXOWFPJ5xHrxvKfIYbjKHtPMP_v1ISw

As for the flags that you mentioned, I reported earlier in another issue that the Flags value is related to content: 1024 is for cutscenes, 768 for cassette tapes, and other values have other uses.

abuali129 avatar Feb 12 '17 23:02 abuali129

Combining 2 lines will just save a single byte. 3472 of the 4525 UTF-8 codepoints used in your latest example are 3 bytes wide (the rest are 1 byte wide). So you won't save much space by combining them.
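A breakdown like the one above can be reproduced by counting the characters of a string by their encoded width. This is a small sketch; the sample string is shortened from a line quoted earlier in this thread and the function name is made up.

```python
from collections import Counter


def utf8_width_histogram(text):
    """Count the characters of `text` by how many bytes each takes in UTF-8."""
    return Counter(len(ch.encode("utf-8")) for ch in text)


sample = "[C=20]ザはズ"
histogram = utf8_width_histogram(sample)

for width, count in sorted(histogram.items()):
    print(f"{count} codepoints are {width} byte(s) wide")

# The widths add up to the encoded size of the whole string.
total_bytes = sum(width * count for width, count in histogram.items())
print(total_bytes, len(sample.encode("utf-8")))
```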

Did you check if the size limit you found is the same for each subp file or if some of them have different limits?

Atvaark avatar Feb 12 '17 23:02 Atvaark

As for the size limit, I'm not talking about the .subp file as a whole, because I have files with more bytes in them that work perfectly.

But the size limit of an entry inside the .subp differs from one entry to another. For the provided samples, the first sample's length limit is 22,420 and the second sample's is 17,572.

abuali129 avatar Feb 13 '17 00:02 abuali129

You're mapping arbitrary Japanese UTF-8 codepoints to Arabic letters, right? Could you try using only codepoints that are 1 byte wide instead of using the 3 byte ones? That alone could net you 6944 additional codepoints (to your latest example) before the corruption will start again.
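The 6944 figure follows from the counts quoted earlier (3472 of 4525 codepoints being 3 bytes wide): each 3-byte codepoint swapped for a 1-byte one frees 2 bytes. As a quick worked check, with variable names chosen only for illustration:

```python
total_codepoints = 4525
three_byte_codepoints = 3472
one_byte_codepoints = total_codepoints - three_byte_codepoints

# Size of the text as it is now, and if every codepoint were 1 byte wide.
current_bytes = three_byte_codepoints * 3 + one_byte_codepoints * 1
all_one_byte = total_codepoints * 1

bytes_saved = current_bytes - all_one_byte
print(bytes_saved)  # same as three_byte_codepoints * (3 - 1)
```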

Atvaark avatar Feb 13 '17 12:02 Atvaark

I also had a third entry sample which was corrupted, but now I managed to fix it. If you want to look at it, just let me know. The 2nd sample is also fixed, just by removing some unwanted spaces, but the first one cannot be repaired. Here is the second sample, fixed: https://mega.nz/#!mQ8FVZwI!pC4-oZslXXptnyJWbYjiLqgwvgZQHI9NPDUXlD5a1UU

I tried using 1-byte letters as you suggested for the first sample, but they can't cover all of the Arabic letters, so I ended up using 2-byte letters along with them. The file is still corrupted. Even merging line texts, which was the solution for the third sample, brought no benefit. I managed to shrink the length to 21,862 with 2- and 3-byte letters, and to 17,537 with merged line texts.

abuali129 avatar Feb 14 '17 05:02 abuali129

How many distinct letters are there in the Arabic alphabet (plus numerals and punctuation)? You should make sure that the most frequently used letters are encoded in 1- or 2-byte codepoints to save additional space. Either use this as a source or analyze the frequency of your own subtitles.
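Analyzing your own subtitles could look roughly like this: count how often each letter occurs, then give the most frequent letters the stand-in codepoints with the shortest UTF-8 encoding. The function name and the candidate pool are hypothetical; which codepoints are actually free depends on your font mapping.

```python
from collections import Counter


def assign_by_frequency(text, candidates):
    """Map the most frequent characters of `text` to the cheapest candidates."""
    # Cheapest stand-ins first; sorted() is stable, so ties keep their order.
    by_cost = sorted(candidates, key=lambda ch: len(ch.encode("utf-8")))
    counts = Counter(ch for ch in text if not ch.isspace())
    by_frequency = [ch for ch, _ in counts.most_common()]
    if len(by_frequency) > len(by_cost):
        raise ValueError("not enough candidate codepoints for this text")
    return dict(zip(by_frequency, by_cost))


# Three alef, two beh, one teh (written as escapes to avoid bidi confusion).
arabic_sample = "\u0627\u0627\u0627\u0628\u0628\u062a"
# Hypothetical pool: one 3-byte and two 2-byte stand-ins.
mapping = assign_by_frequency(arabic_sample, ["ザ", "é", "ñ"])

for letter, stand_in in mapping.items():
    print(repr(letter), "->", stand_in, len(stand_in.encode("utf-8")), "byte(s)")
```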

Atvaark avatar Feb 14 '17 07:02 Atvaark

Only the letters and punctuation, 140 in total; numbers and symbols are shared with Latin. Also, I cannot replace the one-byte Latin letters, as I use almost all of them. Anyway, it looks like I will skip translating Entry Id="2161021477".

abuali129 avatar Feb 14 '17 07:02 abuali129

That's unfortunate.

Since I can't change the limits imposed by the engine I'd rather print an error if one of the subtitles doesn't fit in an entry.

I'll have to analyze all the unedited subp files to get some more facts about the limits.

Atvaark avatar Feb 14 '17 12:02 Atvaark

Is there any information I can provide for this? You just have to ask. And thanks a hundred times for the awesome tool.

abuali129 avatar Feb 14 '17 12:02 abuali129

Could you perhaps upload a zip archive with all subtitles? I don't have the game installed right now and would have to redownload it first.

Add me on Steam, as sharing all these files publicly here is likely against the GitHub ToS.

Atvaark avatar Feb 14 '17 12:02 Atvaark

All right, is your Steam ID the same as here?

abuali129 avatar Feb 14 '17 12:02 abuali129

There are three users by your name, I cannot identify you :)

abuali129 avatar Feb 14 '17 12:02 abuali129

Link

Atvaark avatar Feb 14 '17 13:02 Atvaark

Invitation sent

abuali129 avatar Feb 14 '17 13:02 abuali129

The entry with id 2161021477 is indeed the largest one in all subs.

The max sizes (in bytes) in the unmodified files are as follows:
File: 513497
Entry: 7517
Line: 308

As long as these aren't exceeded the game should load them fine. Anything above these values needs some more testing.
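Until the real limits are known, a pre-flight check against these observed maxima could flag risky subtitles before packing. This is a rough sketch: the function name is hypothetical, and measuring an entry as its lines joined by `$` is an assumption, since the exact way the figures above were measured isn't stated here.

```python
# Largest sizes (in bytes) seen in the unmodified files, as listed above.
KNOWN_GOOD_MAX_ENTRY = 7517
KNOWN_GOOD_MAX_LINE = 308


def check_entry(entry_id, lines):
    """Return warnings for sizes beyond what the unmodified files use."""
    warnings = []
    for index, text in enumerate(lines, start=1):
        size = len(text.encode("utf-8"))
        if size > KNOWN_GOOD_MAX_LINE:
            warnings.append(
                f"entry {entry_id}, line {index}: {size} bytes "
                f"(largest known-good line is {KNOWN_GOOD_MAX_LINE})"
            )
    # Assumption: an entry is measured as its lines joined by '$'.
    entry_size = len("$".join(lines).encode("utf-8"))
    if entry_size > KNOWN_GOOD_MAX_ENTRY:
        warnings.append(
            f"entry {entry_id}: {entry_size} bytes "
            f"(largest known-good entry is {KNOWN_GOOD_MAX_ENTRY})"
        )
    return warnings


for warning in check_entry("2161021477", ["a" * 300] * 30):
    print(warning)
```

Exceeding these values does not prove the game will fail, it only marks territory that the original files never enter.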

Atvaark avatar Feb 15 '17 07:02 Atvaark

I will look at this in the evening, thanks.

abuali129 avatar Feb 15 '17 07:02 abuali129

I got 573,670 bytes for the modified file, which works fine except for entry id 2161021477. However, when I put the original entry id 2161021477 back alongside the modified entries, the result is 572,211 bytes without any problems. The last 19 entries are still in their original state, though:

Entry Id="3976005522"
Entry Id="3983176914"
Entry Id="3985838335"
Entry Id="4015605908"
Entry Id="4033776865"
Entry Id="4038494047"
Entry Id="4044911970"
Entry Id="4084410907"
Entry Id="4123126871"
Entry Id="4131631805"
Entry Id="4181857144"
Entry Id="4201908311"
Entry Id="4205344688"
Entry Id="4209208445"
Entry Id="4230996696"
Entry Id="4272505980"
Entry Id="4275351727"
Entry Id="4277855698"
Entry Id="4289530536"

abuali129 avatar Feb 15 '17 20:02 abuali129