Crashing programs for fun and profit

2022-06-28

Aw snap, how could that possibly have happened? Right, it's frustrating when programs crash over and over. But did you know that you sometimes really want things to crash?

No, no I haven't gone insane, not yet at least. But I recently found myself in need of crashing programs on purpose, with very specific reasons. Why? Ok, so here goes the story...

Difficulty: easy

I'm a long-time contributor to the MultiMC project. To make it short, MultiMC is designed to be an alternative Minecraft launcher with many, many advanced features. As such, people often use it for big modpacks, which makes the game unstable and prone to crashing. On an average day, 3 or 4 people seek help with their crashing game, often caused by some mod. As long as the crash happens to be in Java code, it is actually quite easy to read:

java.lang.RuntimeException: Could not execute entrypoint stage 'main' due to errors, provided by 'elevator'!
	at net.fabricmc.loader.impl.entrypoint.EntrypointUtils.lambda$invoke0$0(EntrypointUtils.java:51)
	at net.fabricmc.loader.impl.util.ExceptionUtil.gatherExceptions(ExceptionUtil.java:33)
	at net.fabricmc.loader.impl.entrypoint.EntrypointUtils.invoke0(EntrypointUtils.java:49)
	at net.fabricmc.loader.impl.entrypoint.EntrypointUtils.invoke(EntrypointUtils.java:35)
	at net.fabricmc.loader.impl.game.minecraft.Hooks.startClient(Hooks.java:52)
	at net.minecraft.class_310.<init>(class_310.java:437)
	at net.minecraft.client.main.Main.main(Main.java:177)
	at net.fabricmc.loader.impl.game.minecraft.MinecraftGameProvider.launch(MinecraftGameProvider.java:460)
	at net.fabricmc.loader.impl.launch.knot.Knot.launch(Knot.java:74)
	at net.fabricmc.loader.impl.launch.knot.KnotClient.main(KnotClient.java:23)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.multimc.onesix.OneSixLauncher.launchWithMainClass(OneSixLauncher.java:210)
	at org.multimc.onesix.OneSixLauncher.launch(OneSixLauncher.java:245)
	at org.multimc.EntryPoint.listen(EntryPoint.java:143)
	at org.multimc.EntryPoint.main(EntryPoint.java:34)
	
[... lots of text]
Process exited with code 255.

As you can see here...

Jan... that's a wall of text and not easy to read AT ALL!

Right, right. Let me simplify. We can mostly ignore all the stuff at the bottom, which just lists MultiMC, some Java internals and the modloader in use (Fabric in this case).

java.lang.RuntimeException: Could not execute entrypoint stage 'main' due to errors, provided by 'elevator'!
	at net.fabricmc.loader.impl.entrypoint.EntrypointUtils.lambda$invoke0$0(EntrypointUtils.java:51)

In this case, Fabric even goes as far as telling us which mod caused the crash, how nice! Now we know we can blame elevator. Case solved, right?

Not really, we still don't know why elevator is causing the game to crash, but if we look a bit further down, we find this snippet:

Caused by: java.lang.NoClassDefFoundError: me/sargunvohra/mcmods/autoconfig1u/ConfigData
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
	at net.fabricmc.loader.impl.launch.knot.KnotClassLoader.defineClassFwd(KnotClassLoader.java:186)
	at net.fabricmc.loader.impl.launch.knot.KnotClassDelegate.tryLoadClass(KnotClassDelegate.java:346)
	at net.fabricmc.loader.impl.launch.knot.KnotClassDelegate.loadClass(KnotClassDelegate.java:218)
	at net.fabricmc.loader.impl.launch.knot.KnotClassLoader.loadClass(KnotClassLoader.java:145)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	at net.reveter.elevator.ElevatorMod.onInitialize(ElevatorMod.java:34)
	at net.fabricmc.loader.impl.entrypoint.EntrypointUtils.invoke0(EntrypointUtils.java:47)
	... 15 more

Oh look, we even have net.reveter.elevator.ElevatorMod here, so it indeed is elevator! The main feature here is the first line again, telling us that Java is unable to find the class definition for me/sargunvohra/mcmods/autoconfig1u/ConfigData. Pasting this into Google is enough to find the download for the required mod auto-config... and voilà, the game works after adding it!

I told you, really simple! Ok, I get it, it's a lot of text, and you indeed need some knowledge in order to know what those funny magic words mean, but after a while you get used to reading those errors. Especially if you know how to program in Java and help people on Discord.

Difficulty: medium

Though this wouldn't be worth a blog entry if it was just about some simple Java exceptions... so let's step up the game a bit. From time to time someone uploads a log which ends like this:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00007ffa2e4c2ee7, pid=10840, tid=0x00000000000037cc
#
# JRE version: Java(TM) SE Runtime Environment (8.0_321-b07) (build 1.8.0_321-b07)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.321-b07 mixed mode windows-amd64 compressed oops)
# Problematic frame:
# C  [ig7icd64.dll+0x22ee7]
#
# Failed to write core dump. Minidumps are not enabled by default on client versions of Windows
#
# An error report file with more information is saved as:
# C:\Users\User\Desktop\MultiMC\instances\unpredictable\.minecraft\hs_err_pid10840.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
Process exited with code 1.

That looks a lot less friendly and a lot more scary. A fatal error has been detected - and a bunch of random hexadecimal numbers. Indeed, those errors are way harder to debug, unless you have seen this exact one before. Let's pretend we haven't seen this one, so how do we actually know what happened?

The first piece of information we get is that some kind of fatal error has occurred... well, so that's what led to termination. Who would have imagined?

We also get told the exact error: EXCEPTION_ACCESS_VIOLATION. And now it sounds even more scary! The hexadecimal number 0xc0000005 signals exactly the same, it's just the numerical representation of the error name. Pasting EXCEPTION_ACCESS_VIOLATION into Google sadly doesn't yield any useful results, only those "driver update, no malware, 100% free" sites (spoiler: never ever use any of those driver updaters, they all are malware!).

How exactly do we then know what EXCEPTION_ACCESS_VIOLATION means? We already know that it is an error, or well, an exception, but the access violation part is puzzling. What exactly is violating access... and what did it even try to access?

If you are really, really lucky, searching for "access violation" may actually bring up an explanation like this:

“Access Violation at Address” errors can happen on every version of Windows, including Windows 10. If you see this message, it means the software you’re trying to run is attempting to access a protected memory address

I don't link the source on purpose, because while it actually explains what an access violation is, it also suggests some "fixes" which you should absolutely not do. But anyway... so an access violation just means the program attempted to access memory it wasn't allowed to access.

To explain what really happened, let's pull up an incredibly incorrect but sufficient diagram of memory on a computer:

Memory diagram

UGH, I know how memory management works and this is definitely not how! Why are you using such an incorrect diagram?

Well, listen...

The numbers beneath the blocks define the beginning and ending address of each memory block available to a program. In reality this is much, much more complicated due to virtual addressing, paging and a lot of other shenanigans - but the idea remains the same. Explaining all the techniques used to distribute memory would be waaaayyy too complex; technically multiple programs can access the same address without accessing the same physical location and such. For now just imagine every program as boxed into its own range.

Each program is only allowed to access its unique range of memory - after all, you don't want Discord to directly read your credit card details out of your browser, do you? Now, as long as Java only accesses memory between 0x00FF and 0x01FE, everything is fine, nobody is going to complain. But what if Java suddenly tries to access 0x0000, which we can see belongs to Chrome?

The operating system (let's assume Windows here) is going to scream out loud - and prevent that access for the better. Java has no right to intrude on Chrome's private memory space! What then follows is a so-called access violation exception: Windows tells Java that it just attempted something illegal. The default behavior would be to instantly kill the application, but Java opted to catch that error and output some diagnostic noise before terminating itself.

So if Java actually knows that it just attempted something illegal and has been stopped, why doesn't it just keep running and simply not do it again? Well... the reason Java attempted to access some memory outside its range is that something went wrong and triggered that access. An illegal access should never happen on purpose (ok, we will see if that statement holds up...) and is therefore considered a bug. Moreover, if it already tried to access something outside its range, it can't know for sure that it didn't already mess up something inside its range too! The only reasonable action thus is to print the error and exit.

At this point I should probably note that Java code alone should never be able to access something outside its range. The JVM guards the Java code it executes and handles those errors internally, long before the operating system would ever see them.

That also explains why the error doesn't include a nice location or the name of whatever part caused the crash. The JVM itself simply doesn't know either! For all intents and purposes, it attempts to display as much information as it can recover, but due to the potential of internal corruption it should be taken with a grain of salt.

With that knowledge we can go back to the crash report. There is some information which we can take away, although it doesn't help that much. The wording "at" suggests that the JVM at least knows to some extent where the error happened, and it helpfully spits out some magic numbers: pc=0x00007ffa2e4c2ee7, pid=10840, tid=0x00000000000037cc. pc means program counter and tells us where in memory the faulting code was. pid and tid stand for process identifier and thread identifier and tell us which process crashed and which thread. Though after all, this process is now gone and so is the thread. Unless you are actually developing some code which runs on the JVM, you can't do much with those numbers!

But Jan, you suggested that you know this report, how did you read it then?

Well, we didn't look at one last point of information here. Right at the bottom we find the hint See problematic frame for where to report the bug., and a bit further up we find the lines Problematic frame: C [ig7icd64.dll+0x22ee7]. You might even recognize the last 4 digits to be the ones from the pc, this can't be a coincidence!

So the JVM was actually able to tell us that the faulting code belongs to ig7icd64.dll! Some bits of information can actually be recovered. As indicated by the report, the JVM even writes a longer, more detailed report to another file, but from experience this usually doesn't help much.

Anyway, going down the rabbit hole further... what is ig7icd64.dll and why is it causing a crash? Obviously it's some kind of DLL... but what even is a DLL? Sigh, soooo many questions to answer. Quick incorrect diagram time again!

DLL's

UGH, that's not how it...

Don't you dare... Ok so, we got our 3 beloved programs again, Java, Chrome and Discord. DLL is an abbreviation and means Dynamic Link Library. DLL's are roughly the same as shared objects on Linux and dynamic libraries on macOS - in the end all of those fulfill the same purpose: sharing code or data with different programs.

There are some common operations that every program needs or at least some kind of programs need. In this case every library which is used by multiple programs is yellow and every library which belongs to a specific program red. Even though they can be shared, DLL's are not necessarily shared. However, it doesn't really make a difference, I've just included them for the sake of completeness.

To explain what's going on here, we will be looking at the yellow DLL's, text.dll and graphics.dll. Let's pretend text.dll is required for writing text, and graphics.dll is required for drawing. Chrome and Discord both require text.dll in order to write text. This functionality is provided by the operating system, so text.dll is part of it. This also means that both Chrome and Discord use the same text.dll - it just exists once on disk, and each process can load it individually. So we call text.dll a library for drawing text!

Basically the same applies to graphics.dll, which is required by Java, Chrome and Discord in order to draw on the screen. While those names are only imaginary, we should now be able to understand why ig7icd64.dll exists and what it is. We don't know where it comes from, and what exactly it does, but we know that it is something provided externally. And we also know that it somehow managed to crash Java!

We now could try to identify where ig7icd64.dll came from, but honestly... I'm as lazy as you are, so let's just stuff the name into Google!

Ah well, we immediately encounter those DLL's download sites... remember me saying don't use any driver updater? Yeah, please never download a DLL from these sites either. Anyway, just searching the DLL name also brings up a bunch of Minecraft related entries, so the issue seems to be common. And we learn: ig7icd64.dll is the Intel graphics driver! So here is our graphics.dll, it's just called ig7icd64.dll.

At this point you might get stuck. This is also the point where I have no idea what happened. So after all, this big fat crash report was able to tell us that the Intel graphics driver caused the crash. The usual answer would probably be "Well, update your graphics driver then!". Valid, but even if you do, this won't solve the issue... How do I know? Trust me, many people tried...

I can only tell you that by experimenting, people found out that Java 8 update 51 is the last working version. As far as I and many others are aware, Intel doesn't properly support old integrated graphics anymore and their latest driver has problems with the latest Java 8. And then the driver crashes Java!

Phew... quite a journey, huh? Take a deep breath, we'll recap what we learned so far:

  1. If a crash occurs in Java code, it is quite easy to read and decipher
  2. If a crash occurs somewhere else, the JVM attempts to recover as much information as possible, but interpreting this information is hard

This could be the end, you could have reached the final station... but let me tell you, the reason this blog entry exists just starts here.

You see, we went from easy to decipher to medium to decipher. But we are missing a stage: hard to decipher! Sadly there are times when you get no information about a crash whatsoever. Those crashes are nearly impossible to debug just based on the log, but we can try anyway!

Difficulty: hard

We'll start as usual, by taking a look at an example log:

[16:47:59] [ThaumicJEI Aspect Cache/INFO] [thaumicjei]: ItemStack Aspect checking at 77%
[16:48:04] [ThaumicJEI Aspect Cache/INFO] [thaumicjei]: ItemStack Aspect checking at 78%
[16:48:09] [ThaumicJEI Aspect Cache/INFO] [thaumicjei]: ItemStack Aspect checking at 80%
[16:48:14] [ThaumicJEI Aspect Cache/INFO] [thaumicjei]: ItemStack Aspect checking at 81%
[16:48:19] [ThaumicJEI Aspect Cache/INFO] [thaumicjei]: ItemStack Aspect checking at 83%
[16:48:24] [ThaumicJEI Aspect Cache/INFO] [thaumicjei]: ItemStack Aspect checking at 84%
[16:48:29] [ThaumicJEI Aspect Cache/INFO] [thaumicjei]: ItemStack Aspect checking at 86%
[16:48:34] [ThaumicJEI Aspect Cache/INFO] [thaumicjei]: ItemStack Aspect checking at 87%
[16:48:39] [ThaumicJEI Aspect Cache/INFO] [thaumicjei]: ItemStack Aspect checking at 88%
Process crashed with exitcode -805306369.

And that's it. No further information.

Even further up in the log we get no information, everything seems fine. The game starts, loads... and then suddenly crashes. How do you even approach such a problem?

Let's first make the situation clear: a user has just joined the chat, provided the log and is now asking for help. His setup seems fine: no mod conflicts, a proper Java version is installed and the amount of memory available looks fine too. The only pieces of information we have are the log and the fact that his game crashed.

Well... there is one single thing which this log has in common with all the others above. Did you spot it?

The very last line says Process crashed with exitcode -805306369. We have seen this line before! Almost at least. Recall the 2 previous logs, the first one ended with Process exited with code 255. and the second one with Process exited with code 1..

Ok, I get it, the number tells us what went wrong!

Yeah... no. You would almost be correct, if exit codes were useful at all. They are not, usually. See, the exit codes are supplied by the application, so if the application detects an error, it can return whatever. The only standard everyone seems to agree on is that 0 is for success. I wouldn't bet my life on that though, somewhere out there probably is an application which does it differently.

Ok, but we'll assume 0 means success. Let's talk about 1, -1 and 255 then, as these are the next most common ones. All of those are usually used to indicate some kind of problem. But that's also really all the information we get from it - something went wrong.

You see, 255 and -1 may even be the exact same code! Wonderful, right? Let me very quickly demonstrate that with the following C program:

int main(int argc, const char **argv) {
   return /* exit code -> */ -1;
}

Let's run it:

$ gcc program.c -o program
$ ./program
$ echo $?
255

Ok, why does echo $? yield 255, even though we exited with return -1;? Ready for another diagram...?

32 vs 8 bit

On Windows this would have actually worked correctly, but on Unix-like operating systems (I use Arch btw!) the kernel simply only keeps the lowest 8 bits of the exit code. This means that while we set all 32 bits to 1, only the lowest 8 actually get used.

Uhh... but it still is 11111111, shouldn't that still be -1?

I mean... yes, but no. For whatever reason, the final exit code is unsigned - thus, it can only be positive! Which means that instead of the most significant (leftmost) bit being interpreted as a minus sign, it simply adds to the value, creating a solid 255.

Mind you, this whole adventure only works on Unix-like operating systems (macOS, Linux, you name it...) but not on Windows! On Windows we would have actually gotten -1.

This however brings up an entirely new point: Windows does indeed save all 32 bits of the exit code, not erasing any. Which in turn means we can actually stuff information into the exit code without losing too much of it! And believe it or not, Windows does actually put useful information there when it kills a process without its consent!

So if, and only if, Windows decides to instantly kill a process, the exit code can be useful. And only on Windows! You may have actually noticed another difference in the logs before. The first two say exited instead of crashed, so MultiMC even knew that the process didn't terminate itself but rather was killed! Let's find out how...

After digging through the MultiMC source code a bit, you find the following snippet:

if (status == QProcess::NormalExit)
{
   //: Message displayed on instance exit
   emit log({tr("Process exited with code %1 (0x%2).").arg(exit_code).arg(exit_code, 0, 16)}, MessageLevel::Launcher);
   changeState(LoggedProcess::Finished);
}
else
{
   //: Message displayed on instance crashed
   if(exit_code == -1)
      emit log({tr("Process crashed.")}, MessageLevel::Launcher);
   else
      emit log({tr("Process crashed with exitcode %1 (0x%2).").arg(exit_code).arg(exit_code, 0, 16)}, MessageLevel::Launcher);
   changeState(LoggedProcess::Crashed);
}

Ok, so everything depends on the status variable. Where is that one coming from?

LoggedProcess::LoggedProcess(QObject *parent) : QProcess(parent)
{
   // <snip>
   connect(this, SIGNAL(finished(int,QProcess::ExitStatus)), SLOT(on_exit(int,QProcess::ExitStatus)));
   // <snip>
}

// <snip>

void LoggedProcess::on_exit(int exit_code, QProcess::ExitStatus status)
{
// <snip>

I tell you, searching through MultiMC's code is not easy, but here we go! Apparently status is a QProcess::ExitStatus, which comes from QProcess::finished, which in turn is emitted by Qt when a process terminates. Yep, I'm as confused as you are!

The problem here is that MultiMC is using a huge amount of abstraction above the OS in the form of Qt. But you know what? Qt is documented, so we can take a look there...

The documentation is not helpful at all for QProcess::ExitStatus, only telling us that NormalExit means a normal exit and CrashExit means a crash - thanks captain obvious! It additionally hints at QProcess::exitStatus(), which includes this paragraph:

On Windows, if the process was terminated with TerminateProcess() from another application, this function will still return NormalExit unless the exit code is less than 0.

Not super useful either, but it at least talks about TerminateProcess, which for now we will keep in mind, it sounds interesting! The function also suggests that everything below 0 may be a crash, but it could as well just be related to TerminateProcess. I guess we have no choice other than looking at the Qt source code itself, oh hell...

I'll spare you the details on how exactly I found this code snippet, but here is what it all boils down to:

DWORD theExitCode;
if (GetExitCodeProcess(pid->hProcess, &theExitCode)) {
    exitCode = theExitCode;
    crashed = (exitCode == 0xf291   // our magic number, see killProcess
               || (theExitCode >= 0x80000000 && theExitCode < 0xD0000000));
}

So essentially the process is treated as crashed if, and only if, the exit code is exactly 0xf291 or falls in the range from 0x80000000 (inclusive) to 0xD0000000 (exclusive). And out of those, 0xf291 is only here because some Qt maintainer thought it would be neat to use it as an indicator when Qt kills a process, it has no further meaning whatsoever... nice! My best bet is that the two range bounds are likewise just based on "these are probably Windows errors at this point".

So Jan, you are telling me, that we found out exactly... nothing and this crashed thing is useless?

Yes. You are absolutely right. It has no meaning whatsoever.

But wait! This doesn't stop us. We still have the exit code, and it doesn't matter whether Qt thinks something has crashed, it is up to us to handle that! Despite that, Qt seems to mostly guess correctly whether a process really crashed.

What were we even looking for?

Ah right, extracting information from exit codes. Let's pull up that number from our "hard" crash log again: -805306369. However, this number looks different from what we have been dealing with the last few paragraphs. Namely, it has a minus sign and is decimal instead of hexadecimal. We can fix that though...

#include <stdio.h>

int main(int argc, const char **argv) {
    int to_convert = -805306369;
    /* %X expects an unsigned int, so convert explicitly */
    printf("0x%X\n", (unsigned int) to_convert);
    return 0;
}

You really had to write a program for that, didn't you?

I know, I know, I could have just googled it. But where would be the fun in that? Anyway, using %X as a format specifier interprets the number directly as unsigned and gives us the correct hex format: 0xCFFFFFFF.

Oh hey, we have seen something similar before! When Java told us it crashed with EXCEPTION_ACCESS_VIOLATION, the number was 0xc0000005. And now it starts with 0xC again! Let's dump that into Google...

Ok, there seems to be some talk about STATUS_APPLICATION_HANG. Maybe searching for that gives us more information...

Aha! Here we go! This page even talks about STATUS_ACCESS_VIOLATION and a bunch of other exceptions. For whatever reason Microsoft decided that exactly 3 exceptions were not to be found anywhere in the Windows SDK, but rather only on this very specific page.

Well, we still don't know what STATUS_APPLICATION_HANG means or when it occurs. And I honestly have no idea how I found this out or if someone told me... but apparently, if you click "kill" in task manager when the application isn't responding it gets killed with this code, or when you click on the "terminate" button in the "application isn't responding" popup. Good to know, huh?

Luckily, this is documented absolutely nowhere, thanks Microsoft.

Are you tired of exit codes already? I am, especially at MultiMC not displaying any useful information about them. And I made just the right thing for this: nt-status-gen. I'll spare you the details, but here is a really quick rundown:

  1. gather all exceptions from ntstatus.h (and hardcode the 3 special ones)
  2. save their names and values together
  3. provide a function to look up their name by value
  4. make this a library that a program such as MultiMC can easily consume

Jan... you have been talking about exit codes for 20 minutes now! I wanted to see you crash programs, not analyze crashes

We get there when we get there! Which is... now.

The part where he crashes you

We've done some great work so far. Let's recap:

That's great and all, but how do we test this functionality? Sure, integrating it into MultiMC is simple enough, we just use our library to translate the exit code (formatted and added comments for readability):

// Filter out some exit codes, which would only result in erroneous output
// -1, 0, 1 and 255 are usually program generated and don't aid much in debugging
if((exit_code < -1 || exit_code > 1) && (exit_code != 255))
{
    // Gross hack for preserving the **exact bit pattern**,
    // we need to "cast" while ignoring the sign bit
    unsigned int u_exit_code = *((unsigned int *) &exit_code);

    std::string statusName;
    std::string statusDescription;
    
    // Here the magic happens, u_exit_code contains the code we want to look up
    //
    // Internally this function also looks up a message from the OS to gather
    // even more information
    bool hasNameOrDescription = Sys::lookupSystemStatusCode(
        u_exit_code,
        statusName, // reference
        statusDescription // reference
    );
    
    if(hasNameOrDescription)
    {
        // We have information, spit it out
        emit log({tr("Below is an analysis of the exit code. THIS MAY BE INCORRECT AND SHOULD BE TAKEN WITH A GRAIN OF SALT!")}, MessageLevel::Launcher);

        if(!statusName.empty())
        {
            // This is what our library provides
            emit log({tr("System exit code name: %1")
                .arg(QString::fromStdString(statusName))}, MessageLevel::Launcher);
        }

        if(!statusDescription.empty())
        {
            // This is the internal message lookup
            emit log({tr("System exit code description: %1")
                .arg(QString::fromStdString(statusDescription))}, MessageLevel::Launcher);
        }
    }
}

// And a disclaimer for good measure
emit log({tr("Please note that usually neither the exit code, nor its description are enough to diagnose issues!")}, MessageLevel::Launcher);
emit log({tr("Always upload the entire log and not just the exit code.")}, MessageLevel::Launcher);

Phew, that's quite a bit of code. Though it all boils down to "look up the exit code, if it has a name, print it". We also filter out our common codes of -1, 0, 1 and 255, since we know they don't mean anything.

So, does this work? Let's take a look at the log MultiMC now generates:

[23:51:56] [Client thread/INFO]: Created: 256x128 textures/mob_effect-atlas
[23:52:02] [Client thread/INFO]: Stopping!
Process exited with code 0 (0x0).
Please note that usually neither the exit code, nor its description are enough to diagnose issues!
Always upload the entire log and not just the exit code.

Ahhh, wonderful, this looks way better! Let's force the game to crash by holding F3+C for 10 seconds:

#@!@# Game crashed! Crash report saved to: #@!@# C:\Users\Janrupf\Desktop\MultiMC\instances\1.14\.minecraft\crash-reports\crash-2022-06-27_23.53.32-client.txt
Process exited with code -1 (0xffffffffffffffff).
Please note that usually neither the exit code, nor its description are enough to diagnose issues!
Always upload the entire log and not just the exit code.

Yep, looks good!

Though... we only had exit code 0 and -1 here. Remember, we explicitly exclude those from translation in order to avoid confusion. And now we need to cause some real bad crashes in order to see whether our code works!

I don't have a computer with some bad graphics driver or corruption though. Sounds like we need another plan.

And that's where fault injection comes into play! What is fault injection? Here, take the Wikipedia definition:

Fault injection is a testing technique for understanding how computing systems behave when stressed in unusual ways. This can be achieved using physical- or software-based means, or using a hybrid approach.

Since I really don't want to pull out my memory while the computer is running, we'll go with the software based approach. But how? How do we crash another process on purpose...

Jan, congrats, it just took you 2 eternities to start talking about what I've been waiting for

The answer is: the same way a process crashes itself! Let's create a crashing program, shall we?

int main(int argc, const char **argv) {
    // 0 is not a valid memory address...
    int *whoops = 0x0;
    
    // And writing to it causes a crash!
    *whoops = 0x1337;

    return 0;
}

That should be good enough! And sure enough, running it on Linux:

$ gcc program.c -o program
$ ./program
[1]    74846 segmentation fault (core dumped)  ./program
$ echo $?
139

But wait, I said we only care about Windows from now on... so let's try the same on Windows:

C:\Projects\single-file-demos>cl crashing.c /nologo /link /out:crashing.exe
crashing.c
C:\Projects\single-file-demos>.\crashing.exe
C:\Projects\single-file-demos>echo %errorlevel%
-1073741819

Yes, I did just compile the program on the command line on Windows. Magic!

Alright, that seems to work too! echo %errorlevel% on Windows is basically equivalent to echo $? on Linux - so we have an exit code of -1073741819.

We'll use our magic translator program again, which this time outputs... 0xC0000005! Hey, we've seen this one! Recall from the "medium" Java crash, we learned that 0xC0000005 is EXCEPTION_ACCESS_VIOLATION. To be honest, this was predictable, we purposefully accessed invalid memory. But point proven, we now have source code which can crash with a known error! All we need to do is to get some foreign process to execute our code...

There are a million ways to achieve this, from replacing DLL's on disk to injecting them into processes, but I have chosen simpler ones. For Minecraft we specifically need Java to execute our code. And so Crash It! was born!

Back when we were looking at QProcess, we already stumbled upon the promising function TerminateProcess. Now sounds like an excellent opportunity to see how it works. Here is the definition:

BOOL TerminateProcess(
    [in] HANDLE hProcess,
    [in] UINT   uExitCode
);

Looks like it wants the process to terminate and the exit code to terminate the process with. What a fit!

Now, we somehow need to obtain a HANDLE to the process we want to terminate. And I happen to know which function we need... OpenProcess is just the right candidate! I'll spare you the details, but with a bit of plumbing we have this setup:

int main(int argc, const char **argv) {
    // Magic function which opens a process by its name
    HANDLE java = CiOpenProcessByName(L"javaw.exe");
    
    // And down it goes!
    TerminateProcess(java, 0x1337);
    
    return 0;
}

Starting up the game and running our crasher, MultiMC outputs this:

[00:50:28] [Client thread/INFO]: Created: 256x128 textures/mob_effect-atlas
Process exited with code 4919 (0x1337).
Please note that usually neither the exit code, nor its description are enough to diagnose issues!
Always upload the entire log and not just the exit code.

WOOOHOOOO! What a journey!

Let's substitute 0x1337 with 0xCFFFFFFF aka STATUS_APPLICATION_HANG:

// And down it goes!
TerminateProcess(java, 0xCFFFFFFF);

And see the result:

[00:55:02] [Client thread/INFO]: Created: 256x128 textures/mob_effect-atlas
Process crashed with exitcode -805306369 (0xffffffffcfffffff).
Below is an analysis of the exit code. THIS MAY BE INCORRECT AND SHOULD BE TAKEN WITH A GRAIN OF SALT!
System exit code name: STATUS_APPLICATION_HANG
Please note that usually neither the exit code, nor its description are enough to diagnose issues!
Always upload the entire log and not just the exit code.

MultiMC successfully decoded the error code! There is some bit and sign magic going on when displaying the hexadecimal representation, but we can still clearly make out 0xCFFFFFFF. So we can now terminate arbitrary processes with arbitrary exit codes!
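The "bit and sign magic" is most likely sign extension: if the 32-bit exit code sits in a signed integer and gets widened to 64 bits before being printed as hex, the upper half fills with ones. That is an assumption about what happens inside MultiMC, not something taken from its source. A sketch of the effect, with made-up function names:

```c
#include <stdint.h>

/* Widens a signed 32-bit exit code to 64 bits, the way an implicit
 * conversion to a 64-bit signed type would. Negative values gain
 * 32 leading one bits. */
uint64_t widen_exit_code(int32_t exit_code) {
    return (uint64_t)(int64_t)exit_code;
}

/* Masks the widened value back down to the original 32-bit status. */
uint32_t low_32_bits(uint64_t widened) {
    return (uint32_t)(widened & 0xFFFFFFFFu);
}
```

Widening -805306369 produces 0xffffffffcfffffff, and masking it brings back 0xCFFFFFFF. A positive code such as 0x1337 passes through unchanged, which fits the earlier log line showing no extra digits.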

So we are done now, aren't we?

Ehm, yes. We could stop at this point. But I have one last promise to fulfill! I promised we would get the remote process to execute some code which crashes it. Currently, we are just telling Windows to terminate it with some code - this technically does not crash the process but rather, well, terminates it.

So back to the drawing board. There are ways of really injecting code into the remote process, but I have another idea for now. Under the assumption that we can call any remote function, what if we call a null pointer? That should also cause an access violation, right?

Let's modify our self-crashing program to try:

int main(int argc, const char **argv) {
    // 0 is not a valid memory address...
    void(*whoops)() = 0x0;
    
    // And calling it causes a crash!
    whoops();

    return 0;
}

Instead of writing to 0x0, we try calling it. And voilà:

C:\Projects\single-file-demos>cl crashing.c /nologo /link /out:crashing.exe
crashing.c
C:\Projects\single-file-demos>.\crashing.exe
C:\Projects\single-file-demos>echo %errorlevel%
-1073741819

It crashes for the same reason! So, yes, by calling a null pointer we can indeed crash a process with an access violation! Now we only need to get Java to call a null pointer.

The obvious way would be to somehow make the entrypoint function of Java a null pointer - however, that's not really possible, our process is running already. Though who said you can only have one entrypoint?

In order to run multiple tasks in parallel, threads are used. And you know what threads have? Yep, an entrypoint which is a function! So if we create a thread, which ends up calling null, we should crash.

On Windows, a thread is created using CreateThread. So I guess we'll just use that. The documentation reads:

Creates a thread to execute within the virtual address space of the calling process. To create a thread that runs in the virtual address space of another process, use the CreateRemoteThread function.

Ah, right. CreateThread operates on the process creating the thread. So unless we want to crash ourselves again, we need to create the thread in another process. Luckily, this seems to be a common use case (this is somewhat concerning actually...) as the documentation directly hints at CreateRemoteThread.

Here is what CreateRemoteThread looks like:

HANDLE CreateRemoteThread(
    [in]  HANDLE                 hProcess,
    [in]  LPSECURITY_ATTRIBUTES  lpThreadAttributes,
    [in]  SIZE_T                 dwStackSize,
    [in]  LPTHREAD_START_ROUTINE lpStartAddress,
    [in]  LPVOID                 lpParameter,
    [in]  DWORD                  dwCreationFlags,
    [out] LPDWORD                lpThreadId
);

Oof, that's quite a few parameters! But here is the fun part, almost all of them can be zero! You see, a lot of those can be used to customize the startup behavior of the thread - which we don't need, we just want to crash anyway. Only the hProcess parameter is interesting, as it tells Windows in which process the thread should be created.

Let's try it:

// And down it goes
CreateRemoteThread(
    java,
    NULL,
    0,
    NULL, /* <- our null function */
    NULL,
    0,
    NULL
);

Running Minecraft through MultiMC and our crasher...

[01:21:00] [Client thread/INFO]: Created: 256x128 textures/mob_effect-atlas
Process crashed with exitcode -1073741819 (0xffffffffc0000005).
Below is an analysis of the exit code. THIS MAY BE INCORRECT AND SHOULD BE TAKEN WITH A GRAIN OF SALT!
System exit code name: STATUS_ACCESS_VIOLATION
System exit code description: The instruction at 0x%p referenced memory at 0x%p. The memory could not be %s.

Please note that usually neither the exit code, nor its description are enough to diagnose issues!
Always upload the entire log and not just the exit code.

YESSSSS! We successfully crashed Java by injecting a real fault and MultiMC managed to recover some information. Obviously, parts are missing, such as which address has been accessed - the exit code simply does not contain this data. But this is way better than before, when it just used to say Process crashed with exitcode -1073741819.
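Decoding like this boils down to a lookup from status code to name. The sketch below is not MultiMC's implementation; the table layout and function name are assumptions, and the table only lists a handful of well-known NTSTATUS values:

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of the hypothetical lookup table. */
struct status_name {
    uint32_t code;
    const char *name;
};

/* A few well-known NTSTATUS values; a real table would be much longer. */
static const struct status_name KNOWN_STATUSES[] = {
    { 0xC0000005u, "STATUS_ACCESS_VIOLATION" },
    { 0xC00000FDu, "STATUS_STACK_OVERFLOW" },
    { 0xC000013Au, "STATUS_CONTROL_C_EXIT" },
    { 0xCFFFFFFFu, "STATUS_APPLICATION_HANG" },
};

/* Returns the name for a signed exit code, or NULL if it is not in the table. */
const char *status_name_for_exit_code(int32_t exit_code) {
    uint32_t status = (uint32_t)exit_code;
    size_t count = sizeof KNOWN_STATUSES / sizeof KNOWN_STATUSES[0];
    for (size_t i = 0; i < count; i++) {
        if (KNOWN_STATUSES[i].code == status) {
            return KNOWN_STATUSES[i].name;
        }
    }
    return NULL;
}
```

With this table, -1073741819 resolves to STATUS_ACCESS_VIOLATION and -805306369 to STATUS_APPLICATION_HANG, while an ordinary code like 4919 yields NULL and would be printed without a name.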

And the world now has a program which can crash other programs, because we all needed that!

Closing thoughts

This article turned out to be way more complex and challenging than I initially thought. Crash It! has become a somewhat complex piece of software and this article just breaks it down to its basics. If you really want to know how it works, I highly encourage you to check out the source code over on GitHub. And sure, play with it, I know you want to!

I've also been quite busy, which is why it took me 2 months (oh god) to finish this article. But here it is!

In the future I plan on going over the internals of Crash It! and talking about shell code injection and other fun things. Oh, and there is an Atom feed as well as a Twitter now, right below!