Testing a Robust Netcode with Godot
The biggest challenge I faced in developing Little Brats! was the online multiplayer part: synchronizing computers with sometimes significant latency while maintaining the “fast-paced action game” aspect was far from simple. I'll tell you all about it!
Lag compensation, prediction/reconciliation, etc.
I'm not going to do a detailed tutorial on these points, as there are tons of them already, but to give you an idea of the principle: when a client computer performs an action (in my case, for example, pressing the button to slap another kid), the server will receive this action, calculate what's going on, and send the result back to the client...
The problem is that even with a slight latency between the two computers, say 10ms, you end up with 20ms between pressing a button and receiving the result. It may not sound like much, but if you put a 20ms delay between each press of your keyboard keys and the execution of the resulting action, you're going to lose your mind in no time.
In principle, this is compensated for by several techniques: in my case, the client “validates” the action by default and applies it immediately in its local scene. This is a prediction. When the server receives the action, it rewinds the game by the duration of the latency (so, for example, 10ms backwards), applies the action, and runs the game forward again for the equivalent of 10ms, all in the background, without any of this being visible in the server's game. Then it sends the final state to the client, which either keeps its own state if its prediction was correct, or corrects it if the server returned a different result (this is called “reconciliation”).
I'm not going to lie to you: it's very VERY complicated to implement all this reliably and “invisibly”. In other words, the people playing the game have to be as unaware of it as possible, and there must be no camera “jumps”, inconsistencies, weird stuff... In practice, this will always happen a little, of course: if your client had calculated that you'd managed to hit a kid, but on the server that kid had already moved out of the way, well, you'll see your action “cancelled”, and the kid will carry on as if nothing had happened.
It was because of a delay on this network aspect that I was forced to postpone the release by 2 weeks, because my code was far too unstable, and in some cases the game was left broken or in strange, buggy states. Anyway.
The thorny question of testing
Obviously, the best way to test a multiplayer game is with several people. But in the meantime, when you're in the middle of development and iterating little by little, you still need to test “alone” at your PC.
Of course, you can have several machines to test on, but it's still relatively tedious, especially if you have to update the game on all machines every time you modify a line of script.
In short, the easiest thing to do is to run two or more instances on
your own PC and create a game on localhost. Godot makes it very easy
to launch several instances while keeping the debugger open on each of
them, which is extremely handy. All this already makes it possible to
test good communication between several instances, and that's no mean
feat. Except, of course, that latency between two instances on the
same machine is very, very low, and communication between the two
instances is 100% reliable (we'll never “lose” a packet within the
same computer): we're not at all in real network conditions.
This is where a handy command (on GNU/Linux) comes in: tc, for
traffic control settings. Let's take a look at this command:
# tc qdisc add dev lo root netem delay 50ms loss 1%
This command, run as root, artificially adds 50ms of latency on the loopback interface (lo) and drops 1% of the packets passing through it. In this way, we can simulate more or less degraded network conditions while remaining on a single machine, and thus test the robustness of our network code. Pretty handy, don't you think?
To remove this artificial degradation, simply do:
# tc qdisc del dev lo root
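To check that the rule is really in place, something like the following can help. It is only a sketch: the function names has_netem and check_loopback are made up for the example, and only tc, grep and ping are real commands. Since both the ping request and its reply go through the loopback interface, the reported round trip should be roughly twice the configured delay.

```shell
#!/bin/bash
# Sketch: check that a netem rule is active on the loopback interface.
# has_netem and check_loopback are hypothetical helper names.

# Succeeds if the given `tc qdisc show` output mentions a netem qdisc.
has_netem() {
    printf '%s\n' "$1" | grep -q 'netem'
}

# Prints the current qdisc of the loopback interface, says whether
# netem is active, then measures the round-trip time with ping.
check_loopback() {
    local current
    current=$(tc qdisc show dev lo)
    echo "$current"
    if has_netem "$current"; then
        echo "netem is active on lo"
    else
        echo "no netem rule on lo"
    fi
    ping -c 5 127.0.0.1
}
```

Calling check_loopback before and after the tc qdisc add command should show the difference, both in the qdisc line and in the ping times.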
Godot, reliable/unreliable
Godot provides a high-level network API that abstracts from low-level
network protocols such as UDP or TCP. Here, Little Brats! uses the
ENetMultiplayerPeer class, which uses the ENet library, itself based on UDP.
To explain the difference between UDP and TCP, take a look at one of the many memes on the subject:
Basically, TCP transmits packets reliably but slowly, and UDP unreliably (packets can be lost) but quickly. With its high-level API, however, Godot lets you choose the reliability mode:
reliable: guarantees that all packets arrive, in the order they were sent. It is a layer on top of UDP, with an internal mechanism that checks the reception of packets and resends them if reception fails. This can obviously be slower, especially on a degraded network with many losses. This gives us a kind of TCP equivalent, but on a UDP basis.
unreliable: packets can be lost, a kind of “raw” UDP mode.
unreliable_ordered: still unreliable, but at least the order of arrival of packets is guaranteed (a mode I've never used myself).
In practice, how does this work when the game is running? Well, I thought I'd set up a few tests to measure this.
I've set up a small program that calls a remote function (via
an rpc call in Godot) every 4 frames, 50 times in a row. On the
client side, we note, for
each frame, how many times we've received this call. The following
diagrams show the number of packets received at a given time (and, for
correspondence, the packets sent by the server below).
(If you're interested, you can download the Godot project to run the tests yourself.)
Obviously, if we have a perfect network (no latency and no packet
loss), we end up with the server's sends almost perfectly synchronized
with the client's receives, whether in reliable or unreliable mode:
If we add a little latency (50ms), we can see the time lag between
the two “combs”. On the other hand, reception remains more or less
regular, and once again, there's no difference between reliable and
unreliable:
Of course, the difference in behavior shows up once we add packet
loss. Here, for example, is the effect of a 1% loss rate when using
the unreliable mode:
And if we push to 5%:
We can see that some calls are lost. Far more than 1% or 5% of the calls are lost, because a call is made up of several packets (and it only takes one lost packet to cause the entire function call to be lost). On the other hand, the calls that are actually received arrive with good regularity.
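The amplification can be estimated with a bit of arithmetic: if a call needs n packets and each one is lost independently with probability p, the call gets through only if all n arrive, so it is lost with probability 1 - (1 - p)^n. Here is a minimal sketch, where call_loss_pct is a made-up helper and the number of packets per call is a guess, not a measured value:

```shell
#!/bin/bash
# Sketch: estimated percentage of lost calls, assuming independent losses.
# call_loss_pct is a hypothetical helper name.
#   $1: packet loss rate in percent (the value given to netem)
#   $2: number of packets per call (an assumption, not a measured value)
call_loss_pct() {
    LC_ALL=C awk -v p="$1" -v n="$2" \
        'BEGIN { printf "%.2f\n", 100 * (1 - (1 - p / 100) ^ n) }'
}
```

For example, with a hypothetical 4 packets per call, call_loss_pct 1 4 prints 3.94 and call_loss_pct 5 4 prints 18.55.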
What's interesting is what happens in reliable mode with 1% loss:
And with 5%:
Do you understand what's going on? When a call gets lost, Godot will try to resend it until it gets through... I don't know the implementation details, but I imagine that there's some sort of packet indexing and that, on the client side, Godot waits until it has received all the packets in order before calling the functions.
As a result, when a packet is lost and the loss is “fixed”, all the late calls are received at once! This method is therefore very effective in ensuring that no function calls are lost... but it does come at a cost: some packets may arrive very late, delaying subsequent packets!
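This “everything arrives at once” effect can be reproduced with a toy model. The sketch below is not how ENet is actually implemented; it only applies the simplest rule that keeps calls in order: a call is handed to the game only once every earlier call has arrived. The function name and the timings are made up for the example.

```shell
#!/bin/bash
# Toy model of in-order delivery (not ENet's real algorithm).
# in_order_delivery is a hypothetical helper name.
# Arguments: arrival time in ms of call 1, call 2, ... in sending order.
# Prints the time at which each call can be delivered to the game.
in_order_delivery() {
    local latest=0 out="" arrival
    for arrival in "$@"; do
        # A call cannot be delivered before every earlier call has arrived.
        if [ "$arrival" -gt "$latest" ]; then
            latest=$arrival
        fi
        out="$out${out:+ }$latest"
    done
    echo "$out"
}
```

Suppose five calls would normally arrive at 50, 117, 183, 250 and 317ms, but the third one is lost and its resend only arrives at 400ms: in_order_delivery 50 117 400 250 317 prints 50 117 400 400 400. The last three calls are all delivered at the same instant, which is the burst visible on the graphs.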
In practice, I use unreliable mode when the server sends the state
of the game to clients: in this case, if a state is lost, it's not a
big deal, and it's more useful for the client to have the next
state “well synchronized” than to receive several states at once.
I use the reliable mode for sending client inputs to the server: the
server needs to be able to recalculate the state of the game reliably,
and it's not acceptable for some client inputs to get “lost”. This may
cause a bit of latency, and a bit more work for the server, which will
have to “rewind” the game a bit further if an input arrives very late,
but that's the price to pay for a stable game.
And of course, it goes without saying that reliable mode is used for
everything that requires a guarantee of reception: opening
communication between server and client, sending signals such as game
start, stop, score, etc.
Going even further?
We already have a good basis for testing with tc, but we can do
better (or worse, depending on your point of view): in practice, the
quality of a connection between two computers can vary over time (a
network that becomes congested, someone playing on a phone in transit,
etc.). What happens if, all of a sudden, the latency of one of the
computers increases from 15ms to 50ms? If we start losing 2% of
packets instead of 0%?
To test this, I've set up this little script, to be run as root while the game instances are running.
#!/bin/bash
while true; do
    delay=$((RANDOM % 91 + 10))   # latency: 10 to 100 ms
    loss=$((RANDOM % 4))          # loss rate: 0 to 3 %
    # time before the next change: 2 to 5 seconds
    interval=$(awk -v min=2 -v max=5 'BEGIN{srand(); print min+rand()*(max-min)}')
    echo "Real network simulated with: delay=${delay}ms loss=${loss}%"
    tc qdisc add dev lo root netem delay "${delay}ms" loss "${loss}%"
    sleep "$interval"
    tc qdisc del dev lo root
done
This script changes the network quality at regular intervals (every 2 to 5 seconds, at random), adding a latency of between 10 and 100ms and a loss rate of between 0% and 3%. So yes, this simulates a really variable and rotten network, but after all, if the game runs in rotten conditions, it should run without trouble in decent ones!
I won't show you any graphs for this variant, as you'd have to leave it running for a long time to see anything, and the “comb” becomes too dense, but I think it's an interesting little piece of scripting.
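A possible variant, as a sketch: tc qdisc replace creates the rule if it doesn't exist and updates it in place otherwise, so the loopback never goes back to a perfect network between two settings, and a trap removes the rule when the script is stopped with Ctrl+C. The function names and the DRY_RUN variable are made up for the example (DRY_RUN=1 only prints the tc commands instead of running them). Like the script above, it has to be run as root.

```shell
#!/bin/bash
# Sketch of a variant that updates the netem rule in place and cleans up
# when interrupted. All function names and DRY_RUN are hypothetical.

# Runs tc, or only prints the command when DRY_RUN=1.
run_tc() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "tc $*"
    else
        tc "$@"
    fi
}

# Creates the netem rule, or updates it if it already exists.
#   $1: delay in ms, $2: loss rate in percent
set_netem() {
    run_tc qdisc replace dev lo root netem delay "${1}ms" loss "${2}%"
}

# Removes the rule; errors are ignored in case it is already gone.
clear_netem() {
    run_tc qdisc del dev lo root 2>/dev/null || true
}

# Main loop, with the same ranges as the original script
# (the pause is a whole number of seconds here, from 2 to 5).
vary_network() {
    trap 'clear_netem; exit 0' INT TERM
    while true; do
        set_netem "$((RANDOM % 91 + 10))" "$((RANDOM % 4))"
        sleep "$((RANDOM % 4 + 2))"
    done
}
```

The block only defines functions: add a call to vary_network at the end of the file (or source it and call the function) to start the loop.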
Conclusion
Of course, none of this is a substitute for real multiplayer testing, with people connected from more or less distant cities: if not for the network aspect, then at least for the interaction between players.
But with a few tc commands, you can already simulate a real,
imperfect network, and thus debug and fix a lot of things without
leaving the comfort of your single workstation (well, provided you
have enough RAM to run several instances of the game).
And for those of you who wanted to understand a little better the
concrete behavior of Godot's reliable and unreliable modes, I hope
these explanations and graphs have helped you.
See you soon for new adventures :)