Hard-hitting crash (game_sa.dll @ 0x00011d7a)
Describe the bug There is a Top 10 crash (game_sa.dll, offset 0x00011d7a) with a count of 11,000 unique victims.
Pre-note: Before describing the issue, below you will find regularly updated offsets. I hope that listed offsets will allow crash victim server owners to find this issue & discover what's going on, triggered by bad model(s) on their server.
Update (18th Dec 2021): The crash offset is 0x000121ea with the new main release (r21048). I got reminded of this issue (as i commented again at the test PR #2319) as this crash is still the first non-memory abuse (server optimization issues) one from the top down. After a week on the current build, it's now at rank #11 as you can see at the new top crash stats and to give a better start to server owners, here's a list of top servers for this crash to occur: server crash list
The offset obviously shifts with new MTA revisions, but then generally starts with 00011d or 000121 (last 2 digits shifting), although as reported by a server owner with affected players (see clarified later in this issue) players probably won't see that anyways, as instead of a crash dialog they are said to have MTA suddenly close and face a window like "MTA was unable to replace a model [id x]" instead of crash dialog.
Dumptrace 1:
*******************************************************************************
* *
* Exception Analysis *
* *
*******************************************************************************
CONTEXT: (.ecxr)
eax=00000000 ebx=0028f1b8 ecx=11c0af28 edx=00000000 esi=0028f1b8 edi=1b96a2e8
eip=71e51d7a esp=0028f100 ebp=0028f128 iopl=0 nv up ei pl zr na pe nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000246
game_sa!CFileLoader_SetRelatedModelInfoCB+0xda:
71e51d7a 8be5 mov esp,ebp
Resetting default scope
EXCEPTION_RECORD: (.exr -1)
ExceptionAddress: 71e51d7a (game_sa!CFileLoader_SetRelatedModelInfoCB+0x000000da)
ExceptionCode: c0000409 (Security check failure or stack buffer overrun)
ExceptionFlags: 00000001
NumberParameters: 1
Parameter[0]: 00000002
Subcode: 0x2 FAST_FAIL_STACK_COOKIE_CHECK_FAILURE
PROCESS_NAME: gta_sa.exe
ERROR_CODE: (NTSTATUS) 0xc0000409 - The system detected an overrun of a stack-based buffer in this application. This overrun could potentially allow a malicious user to gain control of this application.
EXCEPTION_CODE_STR: c0000409
EXCEPTION_PARAMETER1: 00000002
FAULTING_THREAD: 00001380
STACK_TEXT:
0028dee4 00000000 56050048 0028e744 00000000 ntdll!ZwGetContextThread+0x12
FAULTING_SOURCE_LINE: C:\BuildAgent\work\675e5b8e8f135823\Client\game_sa\CFileLoaderSA.cpp @ 161
SYMBOL_NAME: game_sa!CFileLoader_SetRelatedModelInfoCB+da
IMAGE_NAME: game_sa.dll
FAILURE_BUCKET_ID: STACK_BUFFER_OVERRUN_STACK_BUFFER_OVERRUN_MISSING_GSFRAME_c0000409_game_sa.dll!CFileLoader_SetRelatedModelInfoCB
Dumptrace 2:
*******************************************************************************
* *
* Exception Analysis *
* *
*******************************************************************************
CONTEXT: (.ecxr)
eax=00000000 ebx=0028ea98 ecx=d925f008 edx=000000c0 esi=0028ea98 edi=20aebd08
eip=6faf1d7a esp=0028e9e0 ebp=0028ea08 iopl=0 nv up ei pl zr na pe nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00200246
game_sa!CFileLoader_SetRelatedModelInfoCB+0xda:
6faf1d7a 8be5 mov esp,ebp
Resetting default scope
EXCEPTION_RECORD: (.exr -1)
ExceptionAddress: 6faf1d7a (game_sa!CFileLoader_SetRelatedModelInfoCB+0x000000da)
ExceptionCode: c0000409 (Security check failure or stack buffer overrun)
ExceptionFlags: 00000001
NumberParameters: 1
Parameter[0]: 00000002
Subcode: 0x2 FAST_FAIL_STACK_COOKIE_CHECK_FAILURE
PROCESS_NAME: proxy_sa.exe
ERROR_CODE: (NTSTATUS) 0xc0000409 - The system detected an overrun of a stack-based buffer in this application. This overrun could potentially allow a malicious user to gain control of this application.
EXCEPTION_CODE_STR: c0000409
EXCEPTION_PARAMETER1: 00000002
FAULTING_THREAD: 00000f98
STACK_TEXT:
0028d730 00000000 0028d74c 0028d764 00000001 ntdll!NtGetContextThread+0x12
FAULTING_SOURCE_LINE: C:\BuildAgent\work\675e5b8e8f135823\Client\game_sa\CFileLoaderSA.cpp @ 161
SYMBOL_NAME: game_sa!CFileLoader_SetRelatedModelInfoCB+da
IMAGE_NAME: game_sa.dll
FAILURE_BUCKET_ID: STACK_BUFFER_OVERRUN_STACK_BUFFER_OVERRUN_MISSING_GSFRAME_c0000409_game_sa.dll!CFileLoader_SetRelatedModelInfoCB
As you can see, both traces match in not having a stacktrace (probably related to what's going on here) and the type of exception is really weird.
But the crash line is CFileLoaderSA.cpp @ 161 which is the end of the last function of that file, that file has been newly added to MTA codebase last year by saml1er as part of his PR #1265 (Fix model memory leaks)
To reproduce Unknown so far (collected from crash stats)
Version MTA:SA Client r20845
Additional context
About this particular crash, the PR author (saml1er) told me in the past:

I am not entirely sure about that.. considering it has become 1 of the most popular MTA crashes ever since, i think we at least need to dig deeper & make sure, or try to avert the crash. It's not so good to suddenly have many thousands of crashes more after those improvements against memory leaks, even if they are bad DFF's it should be worth averting.
This is a conversation with an affected server owner (trying to help us pinpoint it): GIF < here
The referred models he wrote "are responsible for like 90% of the crashes" (unconfirmed on our side) are attached below. repro_chance_models.zip
Even if it's is what saml1er thought it is, those leak fixes and related improvements made MTA less tolerant to a certain type of common DFF 'corruption', probably caused by certain DFF exporters like hinted at in the above discussion with that server owner. The leak fixes are immensely valuable and MTA should have never been put in a position of 'executing corrupted DFFs' to begin with, so this would be the only expected outcome - but then we can focus on averting by removing/jumping over the specific corrupted bits, if we manage to identify them. Some hints to identify it are in that conversation, and some might be in the attached model files above.
Sidenote: it is possible that the real, as in - more useful stacktrace (before something, or some additional related changes made after that PR, started cutting it) is seen below:

Because even though it was then a significantly different offset (game_sa @ 0x0000e99a), it's the same line number in the same newly added code file, same server owner (including that, at the time, most of those crashes happened on his server) and also a stack overrun - so it could be the same but with a significantly shifted offset, also due to as I said changes (including to that code file) which came after that PR.
Well, the only possible buffer overflow I can see here is the name[24] in CFileLoader. GetNameAndDamage might try to write more than 24 characters into the buffer, which, obviously, triggers the security cookie (the buffer overrun checker).
I'll have to first of all reverse the function to see what it actually does, and then we can wonder.
Also, as a side note: Adding proper DFF checking shouldn't be too hard. I'm kind of familiar with the file structure, and I have a DFF maker lying around that has built in checks, so I could export those that code into MTA.
Already found out the exact cause of this crash, with the help of @samr46
See this: https://i.imgur.com/IpySYUO.png
I'm a little lazy to type it all out properly, but here's the gist of it:
- Length of names inside DFF (individual elements in the hierarchy) higher than 22, triggers the crash
- Exporters like rwio don't trim these element names to a safe length for MTA (if the modeller typed extraordinarily long ones), but to 23, just 1 too high. But who knows other exporters don't trim it at all. Edit: Kams exporter doesn't trim it at all, even worse
Well, the only possible buffer overflow I can see here is the
name[24]in CFileLoader.GetNameAndDamagemight try to write more than 24 characters into the buffer
So your suspicions were on point
Now that we know the cause, it should be easy to fix right @Pirulax ?
Note: the repro files from the chat picture are attached below repro.zip
Remember that loading such a 'bad DFF' introduces a strong crashing chance, it may happen randomly or after, as he mentioned, a couple reconnects/DFF reloads. In the past i felt it also had something to do with streaming in, or the angle from which you view the model physically as it's replaced.
Closing PR #2319 in favour of this discovery
Further remarks from samr46:
I don't really know if vehicles and skins are affected in the same way as objects (never checked it)
But i guess all RW sections inside DFF uses this limit of 22 characters for titles
.COL files uses 22 characters limit as well
So while considering how to fix this, also worth looking at things like .COL, and see how it affects different model types, and if a limit fix or trim enforcement has to cover them as well (or even if it doesn't crash, if that is good practise to do regardless)
I think i found the real issue.. char name[24] at PR #1265's patch of CFileLoader_SetRelatedModelInfoCB

So the element name length of circulating DFF's just doesn't fit the buffer ( = buffer overflow).
If someone would be kind enough to merge that PR the issue would be fixed right away. (In another PR the buffer size could be increased as well)
Also, that buffer should be sufficient. First the node name plugin's buffer size should be increased.
If someone would be kind enough to merge that PR the issue would be fixed right away
The code review that was requested for prioritisation was completed, so merged. This issue should probably remain open as a reminder for the final buffer size tweaks you got planned.