Event-driven design for gaming applications

Imagine you are designing a brand-new online platform where groups of friends can play card games together.
Where do you start?

A good idea could be to start from your end users.
In the case of Controbuio the main user journey can be summarised as:

Users authenticate, create a game room and invite their friends to join them.
Once at least 2 players are in a game room, the dealer can kick off a game.

While this is pretty vague as many things remain unclear or unspecified, at the very least we know about 3 entities to be modelled: users, rooms and games.

So, how do you model a game?

Games are state machines

A poker game - perhaps any game - can be modelled as a state machine.
When a game is about to start, there’s a group of players sitting in a circle and a deck of cards. Then the deck gets shuffled, the blinds are collected, and the game moves to a state where you’re waiting for a player to act.
Depending on what that player does (and what other players have previously done - the current state) the game moves into a new state.

Events in input

How do you move from one state to another?
The answer is: something has to happen.
Something like “Joe wants to bet 100$”, “Jack has left the room” or simply “Five more minutes have passed”.
We call these “events”.

At any given point in time, the combination of the current state with an event might trigger a transition into a new state.
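To make this concrete, here is a minimal sketch of such a transition in Python. Everything in it (the `GameState` fields, the `Bet` and `Left` events, the `transition` function) is a hypothetical, heavily simplified model made up for illustration, not Controbuio's actual code: there are no cards, blinds or betting rounds, only a pot and a turn.

```python
from dataclasses import dataclass, replace
from enum import Enum, auto


class Phase(Enum):
    WAITING_FOR_PLAYER = auto()
    FINISHED = auto()


@dataclass(frozen=True)
class GameState:
    phase: Phase
    players: tuple[str, ...]  # players still in the hand, in seating order
    to_act: int               # index into `players` of whoever must act next
    pot: int


@dataclass(frozen=True)
class Bet:                    # "Joe wants to bet 100$"
    player: str
    amount: int


@dataclass(frozen=True)
class Left:                   # "Jack has left the room"
    player: str


def transition(state, event):
    """Return the next state; events that do not apply leave it unchanged."""
    if state.phase is Phase.FINISHED:
        return state
    current = state.players[state.to_act]
    if isinstance(event, Bet) and event.player == current:
        return replace(state, pot=state.pot + event.amount,
                       to_act=(state.to_act + 1) % len(state.players))
    if isinstance(event, Left) and event.player in state.players:
        remaining = tuple(p for p in state.players if p != event.player)
        if len(remaining) < 2:
            # Nobody left to play against: the game is over.
            return replace(state, players=remaining, to_act=0,
                           phase=Phase.FINISHED)
        if event.player == current:
            to_act = state.to_act % len(remaining)  # turn passes to the next seat
        else:
            to_act = remaining.index(current)       # turn stays where it was
        return replace(state, players=remaining, to_act=to_act)
    return state  # e.g. an out-of-turn bet triggers no transition
```

Note the "might" in the sentence above: an out-of-turn bet is a perfectly valid event, it just doesn't trigger any transition.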

Events in output

How will other parts of the system know what is happening? How will players know when it’s their turn to act?

Turns out that all they want is to be notified of a state transition.
In fact, state transitions can also be modelled as events such as “The deck was shuffled”, “Joe raised to 40$”, “Jim took too long to respond”, “George won 100$”.
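One possible way to express this, sketched in Python under simplifying assumptions, is a step function that returns the new state together with the events describing what just happened. All names here (`State`, `WantsToBet`, `DeadlineExpired`, `PlayerBet`, `PlayerTimedOut`, `TurnChanged`, `step`) are invented for the example, and a player who times out simply loses their turn, which is not how real poker rules work.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class State:
    players: tuple[str, ...]
    to_act: int = 0
    pot: int = 0


# Input events
@dataclass(frozen=True)
class WantsToBet:
    player: str
    amount: int


@dataclass(frozen=True)
class DeadlineExpired:
    player: str


# Output events
@dataclass(frozen=True)
class PlayerBet:
    player: str
    amount: int


@dataclass(frozen=True)
class PlayerTimedOut:      # "Jim took too long to respond"
    player: str


@dataclass(frozen=True)
class TurnChanged:         # lets clients know whose turn it is
    player: str


def step(state, event):
    """Return (new_state, output_events) for one input event."""
    current = state.players[state.to_act]
    if getattr(event, "player", None) != current:
        return state, []   # not this player's turn: no transition, no output
    nxt = (state.to_act + 1) % len(state.players)
    turn = TurnChanged(state.players[nxt])
    if isinstance(event, WantsToBet):
        new_state = replace(state, to_act=nxt, pot=state.pot + event.amount)
        return new_state, [PlayerBet(current, event.amount), turn]
    if isinstance(event, DeadlineExpired):
        return replace(state, to_act=nxt), [PlayerTimedOut(current), turn]
    return state, []
```

Anything interested in the game (the players' clients, a hand history, an analytics job) only ever needs the second element of the returned pair.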

Games are pure functions

The internal representation of the state of a game is completely irrelevant to the rest of the system and should be hidden from it.
For this reason, the game logic can be extracted and encapsulated in a single, independent component.

From the outside, such a component can be seen as a function that takes a stream of events in input and produces a stream of events in output.
This function is (or should be) pure, as it doesn’t produce any side effects and its output depends exclusively on the stream of events in input.
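As a rough illustration, the Python generator below plays the role of such a component. The event shape (`kind`, `player`, `amount`) and the event names are assumptions made up for this sketch, and the "game" does nothing more than track a pot.

```python
from typing import Iterable, Iterator

Event = tuple[str, str, int]  # (kind, player, amount), a made-up shape


def game(events: Iterable[Event]) -> Iterator[Event]:
    """Lazily turn a stream of input events into a stream of output events."""
    pot = 0  # internal state, invisible to the caller
    for kind, player, amount in events:
        if kind == "bet" and amount > 0:
            pot += amount
            yield ("bet_placed", player, pot)
        elif kind == "fold":
            yield ("folded", player, pot)
        # any other input produces no output


inputs = [("bet", "Joe", 100), ("fold", "Jack", 0)]
outputs = list(game(inputs))
```

Feeding the same inputs twice yields the same outputs, and the caller never sees `pot`. One consequence of this view is that anything random, such as the shuffle, would have to reach the function as part of its input (for instance as a seed) for it to stay pure.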

Game rooms are state machines

Initially there are no rooms, no one is connected, then the first player comes along and asks to join a room. Suddenly we enter a state where a room does exist and has one player in it.

A lobby is a much simpler state machine compared to a game, but the concept is identical and the same considerations largely apply.
In fact, one could think of a lobby as a function consuming events such as “Joe wants to join room X”, and producing events such as “Joe joined room X” or “Joe was not admitted to room X as the room is full”.
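A lobby along these lines could be sketched as follows. `MAX_PLAYERS`, `JoinRequested`, `Joined`, `NotAdmitted` and the `lobby` function are hypothetical names, and the capacity of 2 is chosen only to keep the example short.

```python
from dataclasses import dataclass

MAX_PLAYERS = 2  # hypothetical capacity, deliberately tiny

Rooms = dict[str, tuple[str, ...]]  # room name -> players in it


@dataclass(frozen=True)
class JoinRequested:   # "Joe wants to join room X"
    player: str
    room: str


@dataclass(frozen=True)
class Joined:          # "Joe joined room X"
    player: str
    room: str


@dataclass(frozen=True)
class NotAdmitted:     # "Joe was not admitted to room X as the room is full"
    player: str
    room: str
    reason: str


def lobby(rooms: Rooms, event: JoinRequested):
    """Return (new_rooms, output_events) without mutating `rooms`."""
    members = rooms.get(event.room, ())
    if event.player in members:
        return rooms, []  # already in the room, nothing to announce
    if len(members) >= MAX_PLAYERS:
        return rooms, [NotAdmitted(event.player, event.room, "room is full")]
    updated = {**rooms, event.room: members + (event.player,)}
    return updated, [Joined(event.player, event.room)]
```

Starting from an empty dictionary, the first join request creates the room implicitly, which mirrors the "initially there are no rooms" scenario described above.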

An event-centric architecture

Now that we have isolated the two main components (the lobby and the game), we can finally glue things together by connecting to the remote players.

What we need (on the backend side) is a middle-layer component - let’s call it the web app - that is neither particularly smart nor special, and whose only job is to handle the end users, perform authentication, and expose a bidirectional channel (a websocket for instance) for the remote players to interact with the system.

Decoupling

In theory, our 3 components could live in the same codebase and be deployed as a monolith. This is obviously the simplest solution. However, there are some advantages in having independently deployable units.

For a start, each unit could run on a different platform and even be implemented in a different programming language.

Another advantage is scalability and resource optimisation.
A single instance of your application might not be able to cope with thousands of simultaneously connected users.
Moreover, different components use resources differently: the game logic - for example - is much heavier on CPU compared to the web app.
The ability to scale components independently, or even to configure their environments differently, could have a big impact.

An even more important advantage is isolation.
Services communicate indirectly by publishing and subscribing to a centralised event stream, which means that they don’t even need to know about each other:

- New services could be plugged in without changing a single line of code somewhere else
- As long as the model doesn’t change, an existing service could be replaced or even removed without impacting the rest of the system
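The two points above can be illustrated with a toy, in-process event bus. This is only a sketch: the `EventBus` class, the topic names and the services are all invented for the example, and a real deployment would use an external message bus rather than Python function calls.

```python
from collections import defaultdict


class EventBus:
    """A toy in-memory stand-in for a centralised event stream."""

    def __init__(self):
        self._subscribers = defaultdict(list)
        self.log = []  # every published event, in publication order

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        self.log.append((topic, event))
        for handler in list(self._subscribers[topic]):
            handler(event)


bus = EventBus()
delivered = []  # stands in for messages the web app pushes to websockets
audit = []      # a service added later, e.g. for auditing


def lobby_service(event):
    # The lobby only knows about the bus, not about who is listening.
    bus.publish("lobby.out", {"type": "joined", **event})


bus.subscribe("lobby.in", lobby_service)
bus.subscribe("lobby.out", delivered.append)
bus.subscribe("lobby.out", audit.append)  # plugged in without touching the others

bus.publish("lobby.in", {"player": "Joe", "room": "X"})
```

Adding the audit subscriber required no change to the lobby or to the web app, and removing it again would be just as invisible to them.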

The price to pay to achieve this level of decoupling is, of course, an increased infrastructure complexity, and the dependency on an external tool, such as a message bus.
There are several options out there, and Kafka is a popular one. I eventually chose Redis Streams to start with, as it is cheaper and simpler than Kafka, yet good enough for some use cases.

Event sourcing

It is clear at this point that the state of the entire system can be defined entirely in terms of the sequence of events that produced it.
Replaying an identical stream of events will lead to the same state, which is basically the main idea behind event sourcing.

In short, persistence comes for free as long as the stream of events is persisted, leading to some advantages:

- As events are immutable, you’re safe from race conditions and free from locking
- You can go back in time just by replaying all events up to a certain point
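Replaying boils down to a left fold over the event log. The sketch below assumes a made-up event shape (`kind`, `player`, `amount`) and tracks nothing but chip stacks; `apply` and `replay` are hypothetical names.

```python
from functools import reduce


def apply(stacks, event):
    """Fold one event into the chip stacks, returning a new dictionary."""
    kind, player, amount = event
    current = stacks.get(player, 0)
    if kind in ("bought_in", "won"):
        return {**stacks, player: current + amount}
    if kind == "bet":
        return {**stacks, player: current - amount}
    return stacks  # unknown events leave the state untouched


def replay(events, upto=None):
    """Rebuild the state from the first `upto` events (all of them if None)."""
    return reduce(apply, events[:upto], {})


log = [
    ("bought_in", "Joe", 500),
    ("bet", "Joe", 100),
    ("won", "Joe", 250),
]

now = replay(log)               # state after every event
earlier = replay(log, upto=2)   # state as it was before Joe won the pot
```

Since `apply` never mutates its input, replaying the same log always produces the same state, and "going back in time" is just a matter of stopping the fold earlier.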

Event sourcing comes with a cost, of course.
In practice, it might become too expensive (financially and computation-wise) to strictly rely on the persisted event stream for re-building the state of a game.
For this reason, it might make sense to maintain up-to-date state snapshots.
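A snapshot can be as simple as a state paired with the number of events already folded into it, so that restoring only needs to replay the tail of the log. The sketch below uses invented names (`Snapshot`, `take_snapshot`, `restore`) and a state that is nothing more than the pot size.

```python
from functools import reduce
from typing import NamedTuple


class Snapshot(NamedTuple):
    state: int   # in this sketch the state is just the pot size
    offset: int  # how many events are already folded into `state`


def apply(pot, event):
    kind, amount = event
    return pot + amount if kind == "bet" else pot


def take_snapshot(events):
    return Snapshot(reduce(apply, events, 0), len(events))


def restore(snapshot, events):
    """Replay only the events recorded after the snapshot was taken."""
    return reduce(apply, events[snapshot.offset:], snapshot.state)


log = [("bet", 10), ("bet", 20)]
snapshot = take_snapshot(log)

log = log + [("bet", 5), ("check", 0)]
pot = restore(snapshot, log)  # replays 2 events instead of 4
```

The event log remains the source of truth: restoring from an empty `Snapshot(0, 0)` gives the same result, only more slowly.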

An algorithmic approach

When you’re building your shiny new game application you might be tempted to start working on the fun part of it right away: the game algorithm.
This doesn’t necessarily pay off. In fact, it can lead to fundamental design issues that are expensive to resolve.

Let’s forget about state machines, events and all the rest for a moment, and let’s focus on the game logic instead.
The first steps of a Texas hold ’em algorithm are:

1. Shuffle the deck
2. Collect the blinds
3. Distribute the cards
4. Ask the first player to act
   a) if the player calls then do …
   b) if the player raises then do …
   c) if the player folds then do …

From this perspective, it might feel natural to want to establish some sort of direct communication with the remote player: we need a player to act in order to move on, so we’ll need to talk to the player, right?
The first idea that comes to mind could be to send the player a message and wait for a response.
This could work, of course.

Now, let’s imagine that while we’re waiting for a player’s response, all other players suddenly decide to abandon the game, possibly out of boredom or poor internet connection.
In this case, the only remaining player can just grab the whole pot and the game terminates immediately. However, we are blocked waiting for the player to respond, so we need some sort of mechanism to keep an eye on what other players are doing and conditionally interrupt our wait.
This is solvable in a variety of ways, but in practice it might get pretty complicated in a pull-based model, where you ask for what you need when you need it.

On the other hand, our event-driven architecture offers an elegant solution to this problem, as our game state machine never blocks, but continuously consumes and reacts to new events, including “Joe wants to bet 100$” and “Jack disconnected”.
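Here is how the scenario above might play out in a small Python sketch. The state layout, the event tuples and the `react` and `run` functions are assumptions made for illustration, and the game is assumed to start with at least two active players.

```python
def successor(players, player):
    """The player sitting after `player`, wrapping around the table."""
    i = players.index(player)
    return players[(i + 1) % len(players)]


def react(state, event):
    """Handle one event and return (new_state, output_events). Never waits."""
    kind, player = event[0], event[1]
    if state["winner"] is not None or player not in state["active"]:
        return state, []
    if kind == "disconnected":
        active = tuple(p for p in state["active"] if p != player)
        if len(active) == 1:
            # Last player standing takes the pot, whoever we were waiting for.
            return ({**state, "active": active, "winner": active[0]},
                    [("won", active[0], state["pot"])])
        waiting_for = state["waiting_for"]
        if waiting_for == player:
            waiting_for = successor(state["active"], player)
        return ({**state, "active": active, "waiting_for": waiting_for},
                [("left", player)])
    if kind == "bet" and player == state["waiting_for"]:
        amount = event[2]
        return ({**state, "pot": state["pot"] + amount,
                 "waiting_for": successor(state["active"], player)},
                [("bet", player, amount)])
    return state, []


def run(state, inbox):
    """Consume events in arrival order, collecting everything emitted."""
    outputs = []
    for event in inbox:
        state, emitted = react(state, event)
        outputs.extend(emitted)
    return state, outputs


start = {"active": ("Joe", "Jack", "Jim"), "waiting_for": "Joe",
         "pot": 30, "winner": None}
final, outputs = run(start, [("disconnected", "Jack"), ("disconnected", "Jim")])
```

The game is "waiting" for Joe only in the sense that `waiting_for` says so. When Jack and Jim disconnect, their events are handled like any other and Joe wins the pot without ever having acted.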

Conclusion

Despite being a great fit for Controbuio, event-driven design is not a one-size-fits-all solution.

What this experience proved to me is that shifting away from an algorithm-centric mindset towards a data-centric one might lead to a different solution, possibly a better one.

If you’d like to know more, get in touch at info@epifab.solutions