You are here

Broken Synapse: writing a DSO decompiler

ivan's picture
I've been a huge fan of Frozen Synapse ever since it was released back in 2011. It's a strategy game which looks like chess, only players move their pieces at the same time and discover the outcome at the end of the turn.
I had thought about looking into the game's internals a few times these past years and I finally found some time to do it. Here's what I figured: like a poorly written piece of poker software, I thought it was likely that the game client received information about the position of enemy units that would give a tactical edge to anyone reading them. Not that I intended to cheat, but I wanted to know it someone could. This post retraces the steps I took to accomplish this.

The first thing I do is use Wireshark to look at the network traffic. A simple filter (ip.addr == 62.197.39.230, the game server IP being displayed in the settings) makes it possible to see all the data sent to and received by the server. The good news is, nothing is encrypted. The protocol looks like this:

textcom	prelogon
textcom	command		saltRec		29007	2
textcom	login		[user]		[password]	33
textcom	loggedIn	30787		[user]
textcom	command		setMyOS		windows.steam
writeFile		psychoff/activeGames.txt	121	gz	161
textcom	command		refreshPeopleOnline

I spend some time figuring out which algorithm is used to hash passwords. It turns out it's MD5(salt + MD5(password)), but the password's MD5 is converted to uppercase before being re-hashed.
After a few hours, I have a Python client which can connect to the game server, get information regarding ongoing or finished games, etc. Frozen Synapse leagues could probably use this script to automate result checking, etc. It can also retrieve the files describing elapsed turns. These are the files I think may contain information which is not displayed to the player; sadly they use an undocumented binary format:

0000h: 06 00 00 00 88 31 33 32 31 38 33 34 09 6D 75 6C  ....ˆ1321834.mul 
0010h: 74 69 74 75 72 6E 09 63 61 66 66 69 6E 61 74 6F  titurn.caffinato 
0020h: 72 09 32 09 34 09 30 09 4D 61 74 63 68 6D 61 64  r.2.4.0.Matchmad 
0030h: 65 09 30 09 31 09 30 09 53 6E 69 70 65 72 5A 77  e.0.1.0.SniperZw 
0040h: 6F 6C 66 09 30 09 2D 31 09 30 20 31 09 35 32 20  olf.0.-1.0 1.52  
0050h: 36 32 09 36 31 30 20 32 34 38 09 34 32 37 36 09  62.610 248.4276. 
0060h: 31 38 38 09 31 38 09 31 32 39 09 63 61 66 66 69  188.18.129.caffi 
0070h: 6E 61 74 6F 72 09 53 6E 69 70 65 72 5A 77 6F 6C  nator.SniperZwol 
0080h: 66 09 2D 34 30 2E 30 09 30 09 30 09 30 00 00 00  f.-40.0.0.0.0... 
0090h: 00 08 00 00 00 00 00 00 00 0D 0A 6D 61 63 68 69  ...........machi 
00A0h: 6E 65 47 75 6E 00 01 00 00 00 01 00 00 00 00 FF  neGun..........ÿ 
00B0h: FF FF FF 00 00 80 BF 03 00 00 00 00 00 00 00 00  ÿÿÿ..€¿......... 
00C0h: 00 00 00 40 AB 19 44 D0 44 CE 41 01 00 00 00 00  ...@«.DÐDÎA..... 
00D0h: 00 80 BF 00 00 00 00 01 00 00 00 2E 00 00 00 00  .€¿............. 
00E0h: 00 00 00 01 00 00 00 40 AB 19 44 D0 44 CE 41 5B  .......@«.DÐDÎA[ 
00F0h: 67 16 44 E2 23 6E 42 00 00 00 00 00 00 00 00 00  g.Dâ#nB......... 
0100h: 00 00 00 5B 67 16 44 E2 23 6E 42 01 00 00 00 00  ...[g.Dâ#nB..... 
0110h: 00 80 BF 00 00 00 00 02 00 00 00 39 00 00 00 69  .€¿........9...i 
0120h: CE 9F C2 9C DE 7B C0 00 39 00 00 00 37 DC 80 C2  ϜÞ{À.9...7܀ 
0130h: 34 AF 3D 42 00 00 00 00 00 0E 72 6F 63 6B 65 74  4¯=B......rocket 
0140h: 4C 61 75 6E 63 68 65 72 00 02 00 00 00 01 00 00  Launcher........ 
0150h: 00 00 FF FF FF FF 00 00 80 BF 05 00 00 00 00 00  ..ÿÿÿÿ..€¿...... 
0160h: 00 00 00 00 00 00 7E 5F 2D 44 DB 39 B7 43 01 00  ......~_-DÛ9·C.. 
0170h: 00 00 00 00 80 BF 00 00 00 00 03 00 00 00 2E 00  ....€¿.......... 
0180h: 00 00 00 00 00 00 27 00 00 00 4B 01 E7 43 FD C2  ......'...K.çCýÂ 
0190h: A0 43 05 00 00 00 01 00 00 00 7E 5F 2D 44 DB 39   C........~_-DÛ9 

People familiar with the game will recognize some elements here and there: game ID, player names, and also some unit names a bit further. It's fairly obvious that the file is split into two parts: a header which is composed of tab-separated text, followed by unintelligible data.
I start with the header: the first field (6) is likely a version number, and the next byte (0x88) seems to match the header's size. To figure out what the rest is, I get the information for a lot of games, draw a big table on a sheet of paper and make educated guesses. Again, a Python script is worth a thousand words. The following table describes the header's structure:

Field index Description
0 The identifier of the game.
2 Name of the opponent.
3 Which "side" of the map belongs to the player.
4 The current turn.
5 Whether the player has committed his turn.
6 Description of the game.
7 Whether we are currently in the bidding phase.
8 Whether the game is finished.
9 Whether we are spectating the game.
10 The name of the player who was "replaced" if we are spectating.
12 An estimation of who is favoured in the game based on the players' ELO and match history.
13 The win/loss record between the two players. The two numbers are separated by a space.
14 The overall win/loss record for the first player.
15 The overall win/loss record for the second player.
16 The rank of the first player on the leaderboard.
17 The rank of the second player on the leaderboard.
18 The level of the first player.
19 The level of the second player.
20 The name of the first player.
21 The name of the second player.
22 The score of the game.
23 Whether the game is using timed turns.
24 The duration of the turns in case they are timed.
25 Whether the opponent has committed his turn.

The second part of the file is way harder. Based on the strings in it (machineGun, rocketLauncher), we can assume that we're looking at information related to units. I stumble around for a long while, but ultimately fail to make significant progress. Even though I find out the unit's starting coordinates and owner ID, I never get how the rest, which I'm sure is waypoints, is structured.

First attempt

Time to dive into it: I run the game with a debugger and place a breakpoint on networking functions such as recv, hoping to find which function parses the received data. Sadly, I end up looping inside a huge switch-case. Frozen Synapse's Wikipedia page indicates that the game relies on an engine called Torque Game Engine which uses its own script language. I'm reversing the interpreter. A quick glance inside the game's folder reveals a lot of ".cs.dso" files which contain the bytecode generated by those scripts. In all likelihood, the processing I'm looking for takes place in there.

Second attempt

Reversing a VM is far from being easy, but I get a lucky break: Torque Engine 2D was open-sourced in 2013. After a lot of reading, I identify the following places of interest in the project:

To sum up briefly, the Torque Game Engine's VM doesn't have registers. It uses three stacks which are used to perform all its operations: one for the strings, one for the integers and one for the floats. For instance, to add two numbers, you would push them both on the float stack, then use the OP_ADD opcode to let the VM know that the two operands must be popped, and that the result should be pushed.

The DSO file structure isn't very intricate: the header contains a version number, tables of immediate values used in the script and instructions. Here is the detailed description, starting from offset 0:

Name Size of the field (bytes) Description
version 4 The version of the VM which generated the bytecode. It can only be run if there is an exact match (no backward-compatibility).
gst_size 4 The size of the global string table.
global_string_table gst_size The global string table. All the strings used in the script are located here, separated by null bytes. Strings are referenced via an offset inside this "table". For instance, if the opcode OP_LOADIMMED_STR is followed by N, then the string which should be used starts at offset 8+N and stops at the first null byte encountered.
fst_size 4 The size of the string table used by functions.
function_string_table fst_size The function string table. It works exactly like the previous one, except that it contains strings which are used in the scope of functions (i.e. if we are in a function and a string is referenced, then it should be retrieved in this table instead of the previous one). I'm not really sure why they went through the trouble of splitting them into two tables.
gft_size 4 The global float table.
global_float_table 8 * gft_size An array of floats used in the script. They are accessed through their 0-based offset (2 would designate the third float, etc.).
fft_size 4 Size of the float table used by functions.
function_float_table 8 * fft_size The function float table. Like the strings, floats used in functions are located in a separate table.
code_size 4 The size of the bytecode in opcodes.
linebreak_count 4 The number of breakpoints.
code ? The opcode stream which constitutes the script. Opcodes can either be encoded as a single byte (for values between 0 and 254), or as 0xFF followed by 4 bytes (for values greater than 254). They are referenced by their 0-based index in this stream : the opcode which is executed after a jump to location N is code[N].
linebreak_pairs 2 * 4 * linebreak_count Information relative to breakpoints. Based on the name, we can guess that they contain pairs of opcodes on which the VM should break and their associated line number, but I didn't spend too much time on those.
it_size 4 Number of entries in the IdentTable.
ident_table ? References to the strings are absent from the opcode stream (a value of 0 is left everywhere string offsets should be found), probably because of how the bytecode is generated. This table is used to "patch" the opcode stream to put the string offsets back in. Each entry has the following structure:
Name Size of the field (bytes) Description
real_offset 4 The correct offset to insert.
count 4 The number of places to patch.
locations 4 * count A list of offsets in the code stream where the null value must be replaced by real_offset.

This script can be used to parse a DSO file.

The nice thing is, just like Java's bytecode, the name of functions and variables is present in the file. This is why it's possible to recreate a source code very close to the original.

Writing a decompiler

Now that all the information contained in the DSO file has been retrieved, I can start working on a decompiler. To make things easier, I use the open-source Torque Engine to generate increasingly complex scripts and I put breakpoints in the VM to study the execution process. Every time I encounter a new opcode, I implement it in Python. I start by writing an emulator which helps me keep track of the VM's stacks.
I implement the arithmetic operations first (OP_ADD, OP_SUB, etc.). When the VM encounters one, it uses the two top values from the float stack as arguments. For instance, the following bytecode computes a sum:

OP_LOADIMMED_FLT [offset in the float table]
OP_LOADIMMED_FLT [offset in the float table]
OP_ADD

The two first opcodes load immediate values from the float table into the float stack, then OP_ADD pops them and pushes their sum. My decompiler does the exact same thing, except it saves the operation instead of the result (i.e. instead of pushing "5" on the float stack, I would push the string "2 + 3"). And the code reappears mechanically.
I handle comparisons (OP_CMPEQ, etc.) the same way, except that they use the integer stack.
The Torque VM doesn't optimize anything, which makes the process a lot easier.

Let's look at how values are assigned now:

OP_LOADIMMED_STR [offset in the string table]
OP_SETCURVAR [offset in the string table]
OP_SAVEVAR_STR

In this example, two strings are used: the first one is the assigned value (pushed on the string stack), and the second one is the name of the destination variable: this means that the emulator also has to keep track of the current variable. The last opcodes pops the string and affects it to the variable. It's trivial to figure out the original code here: just put an equals sign between the two strings.

You get the idea: I'm going to fast forward to the most difficult part. Control structures (if/then/else, for/while) and function declarations. In order to handle those, I have to stop matching the original VM's behaviour.
The reason is, after an "if" statement, the VM can keep executing opcodes normally, whereas I have to close the bracket. Or when the OP_RETURN is encountered, the VM knows it has to leave the function, but as far as I'm concerned there may be other exit points so I can't use these opcodes to figure out whether the function stops here or not.
My solution was to create additional opcodes, such as META_ENDFUNC ou META_ENDWHILE to "remember" where structures end. The problem when you insert opcodes into the stream is that any absolute jumps get desynchronized. I made the assumption that the code never jumps out of a control structure, and it seems to hold so far. Since my opcodes are only inserted at the end of control structures, and since jumps don't go further than that, no desynchronization occurs as long as I remember to remove the opcodes I inserted when I reach them. Had the language possessed a goto keyword, this assumption would have been false.
The other problem is to reliably identify what kind of structure we're facing when an conditional jump is encountered. In order to figure this out, we can look at the opcode which is located just before the jump's destination. I ended up with the following table:

Opcode before the jump destination Control structure
OP_JMP The bytecode looks like this:
    [test which pushes its result on the stack]
*   OP_JMPIFNOT
|   ["True" branch, executed if the test succeeds]
|   OP_JMP            -------------------------------,
`-> ["False" branch]                                 |
    [opcodes which follow the control structure]  <--'

This is an if-then-else structure.

OP_JMP This is very similar to the previous one: if the opcode before OP_JMP is OP_LOAD*, this may be a the ternary operator being used (a ? b : c). The only way to know is to emulate the code of the branch and look at the stacks afterwards.
OP_JMPIFNOT or OP_JMPIFFNOT A for or while loop:
      [test to check if we should get out of the loop]
*     OP_JMPIFNOT
| ,-> [loop instructions]
| |   [test to check if we should get out of the loop]
| *   OP_JMPIFNOT
`---> [opcodes which follow the control structure]
Anything else This is a simple if with no else:
    [test]
*   OP_JMPIFNOT
|   ["True" branch, executed if the test succeeds]
`-> [opcodes which follow the control structure]

I have hidden other issues that I faced, but the important thing is that I managed to write a decompiler which regenerates a source code very close to the original. Interested readers can look at this script to learn more.
The code is neither clean nor elegant, but it works.

Decompiling Frozen Synapse

Now that I can decompile .cs.dso files, I can read the game's scripts to find out all I want to know. Victory! Or so I thought. My decompiler fails miserably and I understand why: Frozen Synapse's DSO files were generated circa 2011, a few game engine versions ago. The old bytecode is not compatible with the latest VM.

But it can't have changed that much, can it? I brace myself and open Frozen Synapse with IDA Pro. Tip: I choose to reverse the Linux version of the game, hoping that the developers have forgotten to strip the game's symbols and I'm rewarded for it. Thanks to this, I don't get lost in the compiled game engine and I can focus on finding the differences with the 2013 Torque Game Engine. Two major differences turn up:

  • The opcode list has been shifted in two places because of two new opcodes.
  • Offsets in the string table are encoded on 4 bytes in the old version, whereas they are encoded on 8 bytes in the new one. Comments in the VM code indicate that this may be related to a 64 bits version of the engine.

So I fix my decompiler to make it support both versions of the bytecode - a lot of pain summed up in a single sentence.

In the end, I manage to recreate Frozen Synapse's source code. Better: the code is so close to the original that it compiles back to DSO files, and I can use it instead of the compiled scripts! (The Torque VM recompiles scripts on the fly when a more recent version is found in the game folder.)

Modding the game

Apart from looking at how the game works, I can now modify Frozen Synapse in a very clean manner. All I have to do is open any script with the notepad, edit the code, and voila! The first thing I do is grant myself an achievement which is impossible to obtain today, since it involves beating a player who isn't active any more.

But remember, the goal was to discover the position of enemy units. Instead of setting up a proxy, parsing turn files, etc., why not simply disable the fog of war in the game's client? Easy as pie: here is an excerpt from psychoff/gameScripts/mtInGame.cs 2.0:

function loadMTStage4()
{
    if ($manager.visMode != 0.0)
    {
        error("****** Forcing \"Light\" visibility...");
        $manager.visMode = 0;
    	$gmPrefix = "\"Light\" ";
    	MatchInfoVisGraphic.setBitmap("common/gui/images/15black");
    }
    else
    {
        $gmPrefix = "Light";
        MatchInfoVisGraphic.setBitmap("common/gui/images/15white");
    }
   // ...
}

Witness the result in all its glory:

Conclusion

I realize that publishing this article will cast some suspicions on my last Frozen Synapse wins. I hope people will believe me when I say that I worked on this for the intellectual challenge and that I only used that mod once or twice online to verify that it worked.
My conclusion is that when designing a client-server architecture, any information which should never be displayed to the user must simply not be sent to him.
As I write this, Frozen Synapse 2 has just been announced and I'm very curious to find out whether Mode7 Games will use Torque Game Engine again, and whether I will be able to re-use my decompiler!