• Yukkafuzz

Tricky Bugsquashing

This week I spent maybe three or four hours trying to squash one insidious bug. It was an infinite loop, which doesn't make the game crash; instead, it makes it "hang": the program is still running, but I can't take any actions. The only way to stop it is to open Task Manager and "End task" on Unity, the game engine I use to develop EXO. So every time the bug occurs, I have to restart Unity from scratch. That also means I can't view any of the debug logs (messages I put in the code that print to the console), which are my most effective way to track down bugs. Instead, I was forced to narrow down the issue by "commenting out" (temporarily removing) parts of the code and running the program again to check whether it still hangs or whether removing that code stopped the issue. Normally that wouldn't take very long, but this particular infinite loop happened in the generation code, which makes simply commenting out certain lines much harder.

To understand why, let's explore how the generation system works in a bit more depth. First, generation is hierarchical. You can think about the motion of a body in, say, a solar system as a hierarchy of orbits. The Earth and everything in its orbit move slowly around the Sun, completing one orbit per year. Objects orbiting the Earth, like the Moon, move not only relative to the Sun but also relative to the Earth. So first you move the Earth along with all of its orbiters, and then you move the Moon (and every other Earth orbiter) along its own orbit around the Earth. The hierarchy can keep going: what about a space probe orbiting the Moon? It moves with the Earth and the Moon, plus along its own orbit around the Moon.
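To make "move the parent first, then the child relative to it" concrete, here is a minimal sketch in Python (EXO itself is built in Unity, which uses C#). It assumes simple circular orbits in 2D; the `Body` class, its fields, and all the numbers are hypothetical, chosen for illustration, not EXO's actual code.

```python
import math
from typing import List, Optional, Tuple


class Body:
    """A node in an orbit hierarchy (hypothetical illustration, not EXO's code)."""

    def __init__(self, name: str, orbit_radius: float = 0.0, orbit_period: float = 1.0):
        self.name = name
        self.orbit_radius = orbit_radius  # distance from the parent body
        self.orbit_period = orbit_period  # time for one full orbit of the parent
        self.parent: Optional["Body"] = None
        self.orbiters: List["Body"] = []

    def add(self, child: "Body") -> "Body":
        """Attach child as an orbiter of this body and return it."""
        child.parent = self
        self.orbiters.append(child)
        return child

    def local_offset(self, t: float) -> Tuple[float, float]:
        """Position relative to the parent on a circular orbit at time t."""
        angle = 2 * math.pi * t / self.orbit_period
        return (self.orbit_radius * math.cos(angle), self.orbit_radius * math.sin(angle))

    def world_position(self, t: float) -> Tuple[float, float]:
        """Move the parent first (recursively), then add this body's own orbit."""
        if self.parent is None:
            return (0.0, 0.0)  # the root (e.g. the Sun) sits at the origin
        px, py = self.parent.world_position(t)
        dx, dy = self.local_offset(t)
        return (px + dx, py + dy)


# Sun -> Earth -> Moon -> probe, with made-up radii and periods.
sun = Body("Sun")
earth = sun.add(Body("Earth", orbit_radius=10.0, orbit_period=4.0))
moon = earth.add(Body("Moon", orbit_radius=1.0, orbit_period=1.0))
probe = moon.add(Body("Probe", orbit_radius=0.1, orbit_period=0.5))
print(probe.world_position(1.0))  # approximately (1.1, 10.0)
```

The probe never needs to know about the Sun: each body only stores its orbit around its direct parent, and the recursion stacks the offsets up the hierarchy.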

That hierarchy can get quite large in multiple-star systems. A binary star seems simple enough: both stars orbit a common point, and then each has its own orbiters. But in rarer cases (actually, I'm not sure whether science has confirmed exoplanets like this or not), planets may orbit that common point rather than either star specifically. Then a trinary star adds another layer: one star orbits the center of mass of the other two, which are orbiting each other. A (highly unusual) quaternary star can go one of two ways: two binary stars orbiting each other, or one star orbiting the center of mass of a trinary star. Each of these layers could, at least in theory, also have other objects orbiting those centers of mass.
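One way to picture these layouts is as nested pairs, where each pair orbits its own center of mass (barycenter). For two bodies of mass m1 and m2 separated by a distance a, the heavier one sits closer to the barycenter: r1 = a·m2/(m1 + m2) and r2 = a·m1/(m1 + m2). The Python sketch below is a hypothetical data structure for illustration, not how EXO actually stores its stars; the masses and separations are made up.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass
class Star:
    name: str
    mass: float  # e.g. in solar masses


@dataclass
class Pair:
    """Two components orbiting their shared center of mass (hypothetical structure)."""
    first: "Node"
    second: "Node"
    separation: float  # distance between the two components' centers of mass

    @property
    def mass(self) -> float:
        return self.first.mass + self.second.mass

    def distances_from_barycenter(self) -> Tuple[float, float]:
        """The heavier side sits closer: r1 * m1 == r2 * m2 and r1 + r2 == separation."""
        m1, m2 = self.first.mass, self.second.mass
        total = m1 + m2
        return (self.separation * m2 / total, self.separation * m1 / total)


Node = Union[Star, Pair]

# Trinary: one star orbiting the barycenter of a close binary.
inner = Pair(Star("A", 1.0), Star("B", 0.8), separation=0.5)
trinary = Pair(inner, Star("C", 0.6), separation=20.0)

# The two quaternary layouts: a pair of binaries, or a trinary plus one more star.
double_binary = Pair(inner, Pair(Star("C", 0.6), Star("D", 0.4), separation=0.3), separation=50.0)
trinary_plus_one = Pair(trinary, Star("D", 0.4), separation=200.0)
```

Because a `Pair` exposes a `mass` just like a `Star`, any pair can itself be one side of a bigger pair, which is exactly the extra layer each additional star adds.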

This orbit hierarchy also provides the most natural way to generate artificial solar systems. First, generate the star. Then generate the planets and other objects orbiting the star. Then, for each of those objects, generate the objects orbiting it, then those objects' orbiters, and so on, until everything that should have orbiters has created them.
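That procedure is naturally recursive: generating a body means generating its own properties and then calling the same function for each of its orbiters. Here is a minimal Python sketch under made-up assumptions; the body kinds, the orbiter limits in `MAX_ORBITERS`, and the `generate` function are hypothetical and much simpler than EXO's real rules.

```python
import random
from dataclasses import dataclass, field
from typing import List

# Hypothetical limits; EXO's real rules for what may orbit what are not shown here.
MAX_ORBITERS = {"star": 8, "planet": 3, "moon": 1, "moonlet": 0}
CHILD_KIND = {"star": "planet", "planet": "moon", "moon": "moonlet"}


@dataclass
class GeneratedBody:
    kind: str
    orbiters: List["GeneratedBody"] = field(default_factory=list)


def generate(kind: str, rng: random.Random) -> GeneratedBody:
    """Generate one body, then recursively generate everything orbiting it."""
    body = GeneratedBody(kind)
    # ... this body's own properties (orbit, mass, atmosphere, ...) would go here ...
    for _ in range(rng.randint(0, MAX_ORBITERS[kind])):
        body.orbiters.append(generate(CHILD_KIND[kind], rng))
    return body


def count_bodies(body: GeneratedBody) -> int:
    """Total number of bodies in this subtree, including the body itself."""
    return 1 + sum(count_bodies(child) for child in body.orbiters)


system = generate("star", random.Random(42))
print(count_bodies(system))
```

The recursion stops on its own because a moonlet is allowed zero orbiters, so `rng.randint(0, 0)` always returns 0 and the loop body never runs.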

Because each object does essentially the same thing to generate its orbit and its orbiters (and a few other characteristics), all types of objects share the code that does those things. In fact, the code sharing is also a hierarchy (a normal inheritance hierarchy, for my programmers out there). Usually this organization helps with fixing bugs, but in this case it arguably made things harder. I wanted to temporarily remove the code that caused the bug, but all I knew was that it was somewhere in this generation hierarchy. Sure, I can stop the bug from occurring by telling the star not to generate any orbiters. But that also stops all of those orbiters' own orbiters from being created, so, great: I've established that generating the characteristics of the star alone does not cause the bug. There are probably around 500 objects in the system, and I've eliminated one of them as a possibility. Whoop-dee-doo.
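As a rough illustration of that kind of sharing, here is a small Python inheritance sketch. The class names, `max_orbiters`, `orbiter_type`, and the properties are all hypothetical, not EXO's actual classes. The point is that `generate_orbiters` is written once in the base class, so switching it off for debugging switches it off for every type at once.

```python
import random


class CelestialBody:
    """Shared generation logic (hypothetical names; not EXO's actual classes)."""

    max_orbiters = 0

    def __init__(self, rng: random.Random, depth: int = 0):
        self.rng = rng
        self.depth = depth
        self.orbiters = []

    def generate(self):
        self.generate_properties()
        self.generate_orbiters()
        return self

    def generate_properties(self):
        # Shared characteristics; subclasses extend this via super().
        self.mass = self.rng.uniform(0.5, 1.5)

    def generate_orbiters(self):
        # One implementation shared by every type: a bug here (or in anything it
        # calls) can surface under any object in the system.
        for _ in range(self.rng.randint(0, self.max_orbiters)):
            child = self.orbiter_type()(self.rng, self.depth + 1)
            self.orbiters.append(child.generate())

    def orbiter_type(self):
        raise NotImplementedError


class Moon(CelestialBody):
    max_orbiters = 0  # never calls orbiter_type()


class GasGiant(CelestialBody):
    max_orbiters = 5

    def generate_properties(self):
        super().generate_properties()
        self.ring_count = self.rng.randint(0, 3)

    def orbiter_type(self):
        return Moon


class Star(CelestialBody):
    max_orbiters = 10

    def orbiter_type(self):
        return GasGiant


star = Star(random.Random(7)).generate()
```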

Still, that's the starting point. As it happens, the system causing the bug was quite large (partly because I was also testing some new changes to generation): there were more than 40 objects orbiting the star. That made everything take even longer, because generating a system that large takes a non-negligible amount of time. The strategy from here looks like this: limit how many orbiters the star can generate, starting with half. If the bug still occurs (it did), halve the limit again. Now the star only generates ten orbiters. Does the bug occur? No, so raise the limit by half of your last change (up to 15). Bug? No, so raise it by half again (up to 17.5, rounded to 18). Bug? Yes, so now try 16. Bug? No. That leaves the 17th or 18th orbiter, and the culprit turned out to be somewhere in the generation of the 17th. (This process is known as binary search.) But guess what? That orbiter is a gas giant the size of Jupiter! You know what that means? Moons!
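The halving process above can be written as a standard binary search. In this Python sketch, `bug_occurs(n)` is a hypothetical stand-in for "generate the system with the star capped at n orbiters and see whether it hangs"; in practice each call was a manual test run (plus a Unity restart whenever it hung). It assumes the check is monotonic: once the culprit orbiter is included, every larger cap also triggers the bug.

```python
from typing import Callable


def first_bad_count(total: int, bug_occurs: Callable[[int], bool]) -> int:
    """Smallest orbiter cap n in 1..total for which bug_occurs(n) is True.

    Assumes bug_occurs(total) is True and that the check is monotonic.
    The result n means the n-th orbiter is the one to investigate.
    """
    lo, hi = 1, total  # invariant: the answer lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if bug_occurs(mid):
            hi = mid  # bug with mid orbiters: the culprit is at or before mid
        else:
            lo = mid + 1  # no bug: the culprit comes after mid
    return lo


print(first_bad_count(40, lambda n: n >= 17))  # 17
```

With 40 orbiters and the culprit at position 17, this checks caps of 20, 10, 15, 18, 17, and 16, so about log2(40), or six, test runs instead of up to 40.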


But let me take a step back once more. Another reason this bug was so tricky is that it didn't occur for every object of a particular type, even though all objects of the same type share 100% of their code. At first, I thought maybe some somewhat unusual type, like a comet or a derelict spaceship, had a guaranteed infinite loop in it. But after I examined the system the bug was occurring in, that clearly wasn't the case: some rarer objects had been generated successfully before the 17th orbiter we just identified as the cause. And the other objects' code simply couldn't have contained an infinite loop. I manually went through it to check, several times, since it really seemed like that had to be the cause. But no.

Back to the moons. This is where the code sharing makes things a little difficult. The gas giant shares its orbiter-generation code with the star (and with just about every other object). I wanted to stop the gas giant from generating its orbiters, to find out whether it or one of its orbiters was causing the bug, but I didn't want to stop the star from generating its own, since that would prevent the gas giant from existing at all. A bit of copy/paste and a few if statements take care of that, but it's one more extra step.
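Here is one way such a targeted switch might look, as a hypothetical Python sketch rather than the actual change made in EXO: each body is identified by its path in the hierarchy, and the shared generation function skips orbiters only for the body at one chosen path. The `generate` function, the `orbiter_counts` table, and the `skip_orbiters_of` parameter are all invented for illustration.

```python
from typing import Dict, List, Optional, Tuple

Path = Tuple[int, ...]  # () is the star, (17,) its 17th orbiter, (17, 1) that orbiter's first moon


def generate(path: Path, orbiter_counts: Dict[Path, int], generated: List[Path],
             skip_orbiters_of: Optional[Path] = None) -> None:
    """Generate one body and, recursively, its orbiters.

    skip_orbiters_of is a hypothetical debugging switch: the body at that path
    is still generated, but nothing orbiting it is, while every other body
    (including the star) generates its orbiters as usual.
    """
    generated.append(path)
    # ... this body's own properties would be generated here ...
    if path == skip_orbiters_of:
        return
    for i in range(1, orbiter_counts.get(path, 0) + 1):
        generate(path + (i,), orbiter_counts, generated, skip_orbiters_of)


# A made-up system: 20 orbiters, the 17th has 5 moons, and its first moon has a moonlet.
counts = {(): 20, (17,): 5, (17, 1): 1}
with_skip: List[Path] = []
generate((), counts, with_skip, skip_orbiters_of=(17,))
```

If the hang disappears with the switch set to `(17,)`, the problem is in one of that body's orbiters rather than in the body itself.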

Unfortunately for me, the 17th orbiter (the gas giant itself) was not the cause of the bug: the program ran fine when the gas giant didn't generate its orbiters. So I repeated the binary search process, this time with the thankfully small number of moons (five). I could smell it; I was getting close! But then something threw me off the trail. The problem was occurring with the first moon, which, as it turned out, was the only moon with a baby moonlet of its own orbiting it. When I figured all of that out, I thought, "It must be caused by a moon orbiting another moon!" So I went deep into the code, searching the pieces related to that for an incorrectly written "while" loop, since those are almost always the culprit in this kind of problem.

But I didn't find anything. In fact, I didn't find any while loops at all, let alone an incorrect one. I needed more clues to find my mistake. I had narrowed it down to the one actual object whose property generation (not its orbiter generation) triggered the bug, but I didn't know where in the property-generation process it was happening. A bit more copy/paste and a few more if statements later, I could comment out the properties one by one. And don't forget: every time the bug occurred, I had to restart Unity.

Finally, I found the property causing the problem: the atmosphere (which, for a moonlet this small, was practically nonexistent anyway). But even scouring those few lines of code, I couldn't tell what the problem was. So it was time to bring out the big guns I rarely use: the debugger. It's a handy tool that lets you execute code line by line, so I could find which loop was actually going infinite, and inspect the contents of an object, so maybe I could tell what was unique about this tiny moon compared to all the other moons of the same type.

So I pulled up the debugger, hit run, waited a couple of seconds, and then... it didn't work.

Some things aren't as easy as they sound. I spent another 20 minutes trying to figure out that problem. In the end, a quick update of my IDE (Visual Studio), the program I use for writing code, did the trick.

With the debugger finally running, I found that the loop going infinite was in code I use to generate practically everything in the entire game! It was one of my random value generation functions, the kind that lets you put bounds on the generated value. (Programmers: I use normal distributions, not uniform distributions, so this isn't exactly reinventing the wheel.) That code had been used over and over; was there really a problem with it? Well, sort of. I quickly found that the loop went infinite because the minimum bound was greater than the maximum bound, which was caused by a missing line of code in the atmosphere generation function. The catch is that only an unusual combination of other random values could make the atmosphere code set a minimum that large, which is why the bug hadn't been showing up for every object.
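The post doesn't show the bounded random function, so what follows rests on an assumption: a common way to draw a normally distributed value within bounds is rejection sampling, which keeps drawing until a value lands inside [min, max]. Under that assumption, each draw is accepted with probability Φ((max − μ)/σ) − Φ((min − μ)/σ), where Φ is the standard normal CDF, and the expected number of draws is the reciprocal of that probability. When min > max, no value can ever be accepted, so the loop never ends. This small Python helper just computes those quantities; it does not sample anything.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def acceptance_probability(mean: float, std_dev: float, lo: float, hi: float) -> float:
    """Chance that one draw from N(mean, std_dev^2) lands in [lo, hi]."""
    if lo > hi:
        return 0.0  # empty interval: no draw can ever be accepted
    return normal_cdf((hi - mean) / std_dev) - normal_cdf((lo - mean) / std_dev)


def expected_draws(mean: float, std_dev: float, lo: float, hi: float) -> float:
    """Expected iterations of a rejection-sampling loop (infinite if p == 0)."""
    p = acceptance_probability(mean, std_dev, lo, hi)
    return math.inf if p == 0.0 else 1.0 / p
```

Bounds of ±1 standard deviation need about 1.5 draws on average, while an inverted range needs infinitely many. Bounds far out in one tail are a milder version of the same problem: the loop does finish, but a range like 4 to 5 standard deviations above the mean needs on the order of 30,000 draws.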

I added the missing line to the atmosphere function, but I also added a line to my bounded random value function that crashes the program if the minimum is greater than the maximum, so a bug like this will fail loudly instead of silently hanging. One test run to confirm the fix, and finally, after quite the ordeal, that bug was squashed.
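A bounded normal sampler with that kind of precondition check might look like the Python sketch below. It assumes the same rejection-sampling approach as before; the name `bounded_normal` and its parameters are hypothetical, not EXO's actual function. Python's `assert` statements are skipped when Python runs with `-O`, so this sketch raises `ValueError` instead, which crashes the program just as loudly.

```python
import random
from typing import Optional


def bounded_normal(mean: float, std_dev: float, lo: float, hi: float,
                   rng: Optional[random.Random] = None) -> float:
    """Draw from a normal distribution, redrawing until the value is in [lo, hi].

    Preconditions are checked up front: with lo > hi the loop below could
    never exit, so the call fails immediately instead of hanging the game.
    """
    if lo > hi:
        raise ValueError(f"bounded_normal: min {lo} is greater than max {hi}")
    if std_dev <= 0:
        raise ValueError("bounded_normal: std_dev must be positive")
    rng = rng if rng is not None else random.Random()
    while True:
        value = rng.gauss(mean, std_dev)
        if lo <= value <= hi:
            return value


rng = random.Random(0)
print(bounded_normal(10.0, 2.0, 8.0, 12.0, rng))  # some value between 8 and 12
```

The check costs one comparison per call, and it turns a hang with no logs into an immediate error message that names the bad bounds.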

Hope you enjoyed reading about the work that can go into a game like this!


P.S. The moral of the story is: assert your preconditions.

One Knight Studio © 2019