Monogame Networking .. a Decade Later

Today I’ve been investigating options for integrating a multiplayer layer into my MonoGame-based game engine. When I first opened my browser to take a look, I popped open my bookmarks and saw a series of sites and postings circa 2012-2014 that talked exclusively about Lidgren, RakNet, or ‘roll your own’. Worse yet, there were numerous links to the now-dead XNA.Networking API.

Enter Unreliability

After a bit of link purging, I began a new phase of research and stumbled upon the excellent BenchmarkNet project (https://github.com/nxrighthere/BenchmarkNet) which is a testing app for reliable UDP libraries.

Now, I must admit, I’m partial to UDP, and to reliable UDP in particular. The topic is somewhat controversial, but most high-end games use some variation of TCP, UDP, or reliable UDP, sometimes in combination. Most ‘roll your own’ systems eventually become reliable UDP. I won’t rehash the arguments, but an excellent post can be found here and a discussion here.
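To make "reliable UDP" concrete, here is a minimal sketch of one common scheme: every packet carries its own sequence number, the newest sequence number received from the peer, and a 32-bit bitfield acknowledging the 32 packets before it. This is the approach popularized by Glenn Fiedler's "Gaffer On Games" articles. It is not the internals of LiteNetLib, Neutrino, or any other library mentioned here, and it is written in Python for brevity, whereas a MonoGame engine would do this in C#. All names are hypothetical, and real libraries add retransmit timeouts, ordering channels, and congestion control on top.

```python
from dataclasses import dataclass, field

SEQ_MOD = 1 << 16  # assumption: 16-bit sequence numbers that wrap around


def seq_greater(a: int, b: int) -> bool:
    """True if sequence a is newer than b, accounting for wrap-around."""
    return ((a > b) and (a - b <= SEQ_MOD // 2)) or ((a < b) and (b - a > SEQ_MOD // 2))


@dataclass
class Packet:
    seq: int        # this packet's sequence number
    ack: int        # newest sequence received from the peer (-1 = none yet)
    ack_bits: int   # bit i set => peer's packet (ack - 1 - i) was received
    payload: bytes


@dataclass
class Endpoint:
    next_seq: int = 0
    remote_seq: int = -1   # newest sequence received from the peer
    recv_bits: int = 0     # which of the 32 sequences before remote_seq arrived
    unacked: dict = field(default_factory=dict)  # seq -> payload awaiting an ack

    def send(self, payload: bytes) -> Packet:
        # Piggyback our receive history on every outgoing packet.
        pkt = Packet(self.next_seq, self.remote_seq, self.recv_bits, payload)
        self.unacked[self.next_seq] = payload
        self.next_seq = (self.next_seq + 1) % SEQ_MOD
        return pkt

    def receive(self, pkt: Packet) -> None:
        # 1. Remember that pkt.seq arrived so our next packet acknowledges it.
        if self.remote_seq < 0:
            self.remote_seq = pkt.seq
        elif seq_greater(pkt.seq, self.remote_seq):
            shift = (pkt.seq - self.remote_seq) % SEQ_MOD
            # The old remote_seq becomes bit (shift - 1) of the history.
            self.recv_bits = ((self.recv_bits << shift) | (1 << (shift - 1))) & 0xFFFFFFFF
            self.remote_seq = pkt.seq
        else:
            diff = (self.remote_seq - pkt.seq) % SEQ_MOD
            if 1 <= diff <= 32:
                self.recv_bits |= 1 << (diff - 1)
        # 2. Forget every packet the peer says it has received.
        if pkt.ack >= 0:
            self.unacked.pop(pkt.ack, None)
            for i in range(32):
                if pkt.ack_bits & (1 << i):
                    self.unacked.pop((pkt.ack - 1 - i) % SEQ_MOD, None)

    def pending_resends(self) -> list:
        """Payloads not yet acknowledged; a real library would also wait for a timeout."""
        return list(self.unacked.values())
```

If packet 1 of 3 is lost, the reply from the peer acknowledges packets 0 and 2. Only packet 1's payload is left to resend, and it can go out in a fresh packet alongside newer data.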

In my personal experience, TCP in game dev has given me headaches due to retransmission issues and the lack of packet prioritization. I’ll admit, though, that every game or project I worked on in the 2000s was fully or mostly TCP, including the failed Shadowrun MMO and RunUO (Ultima Online). Times have changed, though, and reliable UDP is no longer a bad word (or so I hope). So let’s look at some of the primary options…
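One of those retransmission issues is head-of-line blocking. A stream that must deliver in order cannot hand later messages to the game until a lost earlier one has been resent. The toy model below compares in-order (TCP-like) delivery with unordered (UDP-like) delivery. It is a simplification with hypothetical names: one-way latency is RTT/2, and a lost message is assumed to be resent exactly one RTT after its original send. Real TCP retransmission timers are usually longer, which makes the stall worse.

```python
def delivery_times(send_times, lost, rtt, ordered):
    """Return the time (ms) each message is handed to the game.

    send_times: send time of each message, in send order
    lost: indices whose first transmission is dropped
    rtt: round-trip time in ms
    ordered: True models in-order delivery, False models unordered delivery
    """
    arrivals = []
    for i, t in enumerate(send_times):
        # A dropped message is resent one RTT later (simplifying assumption).
        sent = t + (rtt if i in lost else 0)
        arrivals.append(sent + rtt / 2)
    if not ordered:
        return arrivals
    # In-order delivery: nothing is released before every earlier message.
    released, latest = [], 0.0
    for arrival in arrivals:
        latest = max(latest, arrival)
        released.append(latest)
    return released


# Five state updates sent every 16 ms, the second one lost, 100 ms RTT.
sends = [0, 16, 32, 48, 64]
print(delivery_times(sends, {1}, 100, ordered=False))
print(delivery_times(sends, {1}, 100, ordered=True))
```

With unordered delivery, only the lost update arrives late. With ordered delivery, every update behind it is held until the retransmission lands, which is exactly when fresh position data matters most.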

Let the games begin

Below are the latest results from the 64-connected-client test on BenchmarkNet’s GitHub wiki.

As you can see, most of the libraries perform within 10% of each other, except for a few particularly bad performances turned in by UNet and Lidgren, with issues related to memory consumption and CPU utilization, respectively.

With the spread so narrow, I began to look at other things that I find important when picking out a library — source code access, license, and features. I won’t go through each one but I ruled out all but two options due to performance, license, lack of access to source, or monetization schemes I was uninterested in.

And the winner…

In the end, I noticed that LiteNetLib often had the lowest CPU utilization, while Neutrino was often not far behind but with lower bandwidth utilization. Better yet, both are open source and MIT licensed! In addition, both libraries are exceptionally cross-platform, feature complete, have tight serialization, and work in either client-server or P2P configurations.


Ever Present Multiplayer – The Local Server

The approach I’m leaning towards is the local game server pioneered by id Software with Doom and Quake. A server embedded in the client lets you code the game as if it were always multiplayer, while still supporting online gameplay modes. I think this approach would mesh well with the existing Entity Component System (ECS) by hooking into the same points the AI already uses for input and rendering. My thinking at the moment is that the new NetworkSystem can create AINodes (or a variant of them) that represent either the other players or the decisions of the AISystem. Either way, their logic remains largely the same and ‘just works’.
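A minimal sketch of the idea that an entity shouldn't care whether its decisions come from the AI or from the network. Every name here (`Command`, `CommandSource`, `ScriptedAI`, `RemotePlayer`, `Entity`, `simulate_tick`) is hypothetical and stands in for the AINode/NetworkSystem design described above. It is written in Python for brevity and is not the engine's actual ECS API.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Command:
    """Hypothetical per-tick input for one entity."""
    move_x: float = 0.0
    move_y: float = 0.0
    fire: bool = False


class CommandSource(Protocol):
    """Plays the role of an AINode: anything that can decide an entity's next move."""
    def next_command(self, tick: int) -> Command: ...


class ScriptedAI:
    """Decides locally, as the existing AISystem would."""
    def next_command(self, tick: int) -> Command:
        return Command(move_x=1.0, fire=(tick % 10 == 0))


class RemotePlayer:
    """Replays commands that a (hypothetical) NetworkSystem received for this entity."""
    def __init__(self) -> None:
        self.inbox: dict = {}

    def push(self, tick: int, cmd: Command) -> None:
        self.inbox[tick] = cmd

    def next_command(self, tick: int) -> Command:
        # No packet for this tick yet: stand still (a real game might extrapolate).
        return self.inbox.get(tick, Command())


@dataclass
class Entity:
    name: str
    source: CommandSource
    x: float = 0.0
    y: float = 0.0


def simulate_tick(entities, tick: int) -> None:
    # The simulation never asks where a command came from.
    for e in entities:
        cmd = e.source.next_command(tick)
        e.x += cmd.move_x
        e.y += cmd.move_y
```

Swapping a bot for a human on another machine then means swapping the `source`, while the movement and rendering code stay untouched.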

If my logic is sound, I can deploy to the Xbox with the local server, and if/when I get network API access on the Xbox, I can point to a remote server and it should ‘just work’.
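One way to get the "point it at a remote server" switch is to hide the transport behind two methods, `send` and `receive`. With no address, the client talks to an embedded server through in-memory queues. With an address, it talks over a real UDP socket. This is a sketch under stated assumptions, written in Python. `LocalEndpoint`, `UdpEndpoint`, and `connect` are hypothetical names, and a plain UDP socket stands in for whatever LiteNetLib or Neutrino would actually provide.

```python
import socket
from collections import deque


class LocalEndpoint:
    """One side of an in-process 'connection' to an embedded local server."""
    def __init__(self, outbox: deque, inbox: deque) -> None:
        self._out, self._in = outbox, inbox

    def send(self, data: bytes) -> None:
        self._out.append(data)

    def receive(self) -> list:
        items = list(self._in)
        self._in.clear()
        return items


def local_pair():
    """Return (client_end, server_end) wired back-to-back in memory."""
    client_to_server, server_to_client = deque(), deque()
    return (LocalEndpoint(client_to_server, server_to_client),
            LocalEndpoint(server_to_client, client_to_server))


class UdpEndpoint:
    """Same two methods over a UDP socket (stand-in for a real networking library)."""
    def __init__(self, remote_addr, bind_addr=("0.0.0.0", 0)) -> None:
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self._sock.bind(bind_addr)
        self._sock.setblocking(False)
        self._remote = remote_addr

    def send(self, data: bytes) -> None:
        self._sock.sendto(data, self._remote)

    def receive(self) -> list:
        items = []
        while True:
            try:
                data, _ = self._sock.recvfrom(2048)
            except BlockingIOError:  # nothing more waiting right now
                return items
            items.append(data)


def connect(server_addr=None):
    """No address: embedded local server. With an address: remote server."""
    if server_addr is None:
        return local_pair()                   # (client_end, server_end)
    return UdpEndpoint(server_addr), None     # server runs elsewhere
```

The game loop only ever calls `send` and `receive` on the client end, so moving from the local server to a remote one is a change to the argument passed to `connect`.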


In any case, I’ll post back with my results on this whole networking refactoring!

P.S. Short aside: you might be wondering what happened to the whole ‘migrating PC Game Engine to UWP’ project. Well, it turns out it was pretty painless. After a few minor changes (e.g., the Window class not having a Position), I managed to get the engine up and running in under an hour. It turns out all of the planning and anguish I had spent on selecting only cross-platform libraries was worth it. This is a first…

-Jonathan

4 thoughts on “Monogame Networking .. a Decade Later”

    1. Hmm, “Repository owner locked as resolved and limited conversation to collaborators”. Very unexpected to be met with such a thing on GitHub!

      So I will repost my issue report here:

      I’ve profiled performance a little more and looked deeper in the source code (including LiteNetLib).

      It turns out this benchmark is biased toward network libraries with lower client overhead and cannot be used to measure actual network library performance.
      It creates an unrealistically high number of clients on a single machine (each adding significant overhead), which makes the resulting measurement useless. The benchmark results show how well 1 server and N clients perform together on a single PC. That is a far cry from what we actually want to measure: the performance of the network library itself, and how much work a single server can handle compared to its competitors.

      If the client implementation uses more CPU, the benchmark will report lower results for that networking library. The more clients are used, the more skewed the results become. 1000 low-overhead clients impose less total load, and so allow better bandwidth, than 1000 high-overhead clients, even if the server implementation in the latter case is many times faster! Even if the difference in client performance is minimal, once it is multiplied by 1000 clients it can become the bottleneck!
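The skew argument can be made concrete with a toy CPU-budget model. When clients and server share one machine, every client costs its own client-side CPU plus its share of server CPU. On a dedicated server, only the server cost counts. The numbers below are hypothetical "CPU units" chosen purely for illustration, not measurements of any library, and the model ignores core counts and scheduler effects.

```python
def clients_on_shared_machine(budget: int, client_cost: int, server_cost: int) -> int:
    """Clients that fit when each one also runs its client on the same PC."""
    return budget // (client_cost + server_cost)


def clients_on_dedicated_server(budget: int, server_cost: int) -> int:
    """Clients a server machine can handle when clients run elsewhere."""
    return budget // server_cost


BUDGET = 10_000  # hypothetical CPU units available per second

# Library A: heavy client (9), fast server (1). Library B: light client (5), slow server (3).
print(clients_on_shared_machine(BUDGET, 9, 1), clients_on_shared_machine(BUDGET, 5, 3))
print(clients_on_dedicated_server(BUDGET, 1), clients_on_dedicated_server(BUDGET, 3))
```

On the single-machine benchmark, B "wins" (1250 vs 1000 clients) despite having a server three times slower. On a dedicated server, A handles 10000 clients to B's 3333.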

      Let me demonstrate with another perfectly valid one-line change to the source code that totally changes the results and illustrates my point.
      LiteNetLib uses 100000 microseconds as its poll duration, while Lidgren uses just 1000 microseconds (1 ms). In both cases the poll runs in a while (true) loop. This way Lidgren supposedly has lower latency but trades a little more CPU for it. (BTW, Neutrino doesn’t use polling; it uses async calls instead.)
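The polling pattern described here looks roughly like the Python analog below. `select.select` plays the role of .NET's `Socket.Poll`, which takes its timeout in microseconds; `select` takes seconds, so the sketch converts. `receive_loop` and `idle_wakeups_per_second` are hypothetical helpers, not code from Lidgren or LiteNetLib. The key point is that a poll returns early when data arrives, but on an idle socket each thread wakes once per timeout.

```python
import select
import socket


def receive_loop(sock: socket.socket, poll_timeout_us: int, handle, should_stop) -> int:
    """Poll-then-receive loop; returns how many times the loop woke up."""
    timeout_s = poll_timeout_us / 1_000_000
    wakeups = 0
    while not should_stop():
        wakeups += 1
        # Blocks until the socket is readable or the timeout expires.
        readable, _, _ = select.select([sock], [], [], timeout_s)
        if readable:
            data, addr = sock.recvfrom(2048)
            handle(data, addr)
    return wakeups


def idle_wakeups_per_second(poll_timeout_us: int, threads: int) -> int:
    """Wake-ups per second when no traffic arrives: one per timeout, per thread."""
    return threads * 1_000_000 // poll_timeout_us


print(idle_wakeups_per_second(1_000, 1000))    # 1 ms poll, 1000 client threads
print(idle_wakeups_per_second(100_000, 1000))  # 100 ms poll, 1000 client threads
```

That is 1,000,000 versus 10,000 idle wake-ups per second across 1000 client threads: the factor of 100 that the measurements below are sensitive to.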

      Now multiply this little overhead by 1000 client threads, and the CPU usage climbs sharply. The results will be skewed. How skewed? Let’s replace 1000 microseconds in the Lidgren code with 100000, build it, and use it in the benchmark. My measurements:

      (with 1000 clients, Intel Core i7-4770K@4.26GHz, “Ultimate Performance” Windows 10 power mode with 100% CPU power mode and minimum background processes)

      LiteNetLib – 15 seconds to connect 1000 clients, 30-33% sustained CPU usage afterwards
      Neutrino – 16 seconds to connect 1000 clients, 32-34% sustained CPU usage afterwards
      Lidgren (RELEASE, with default 1000 poll duration) – 93 seconds to connect 1000 clients (very slow after about 800 clients), 90% sustained CPU usage afterwards
      Lidgren (RELEASE, with custom 100000 poll duration) – 15 seconds to connect 1000 clients, 17-18% sustained CPU usage afterwards

      And… Lidgren now turns out to be the fastest and most memory-efficient network library in this benchmark suite.

      As you can see, the overhead of more frequent socket polling (which supposedly reduces latency in real-world scenarios) is so high that, with 1000 clients, default Lidgren performance in this benchmark is bottlenecked on polling.
      Needless to say, this is not a realistic scenario – in real world we will have multiple machines each running their own client and a separate server.

      1. Hello Vladimir!
        Excellent research, I really appreciate the work you put into these benchmarks. You should post the code as a gist or Git repo. I’d be interested in looking at your changes.

        You’re right with regards to the connection overhead. I think there should be a way to isolate that overhead but still report it. As I mentioned on Twitter, connection-induced CPU spikes (and the inherent latency) are important to know about if you’re trying to build a multiplayer game server. For other use cases, I agree (though I have to run the benchmarks with your suggested modifications myself) that Lidgren would likely become the best performer once the session has started (e.g., a lobby-based game or a trading session).
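One possible way to "isolate that overhead but still report it" is to split a run's samples at the moment the last client connects. The connection phase then reports its own time and peak CPU, and the steady state reports the sustained range. The `Sample` record and `phase_report` function are hypothetical names; BenchmarkNet does not expose data in this form, and the sample values in the usage line are synthetic.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    t: float        # seconds since the benchmark started
    cpu: float      # process CPU usage, percent
    connected: int  # clients connected at this moment


def phase_report(samples, total_clients: int) -> dict:
    """Report the connection phase and the steady state separately."""
    ready = next((s.t for s in samples if s.connected >= total_clients), None)
    if ready is None:
        raise ValueError("not every client connected during the run")
    connecting = [s.cpu for s in samples if s.t < ready]
    steady = [s.cpu for s in samples if s.t >= ready]
    return {
        "time_to_connect_s": ready,
        "connect_peak_cpu": max(connecting, default=0.0),
        "steady_cpu_min": min(steady),
        "steady_cpu_max": max(steady),
        "steady_cpu_mean": mean(steady),
    }


# Synthetic samples, for illustration only.
run = [Sample(0, 20.0, 0), Sample(5, 85.0, 400), Sample(10, 90.0, 900),
       Sample(15, 35.0, 1000), Sample(20, 31.0, 1000), Sample(25, 33.0, 1000)]
print(phase_report(run, 1000))
```

Keeping both halves in one report lets a server developer see connection spikes without letting them dominate the steady-state comparison.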

        I’m curious, did you build the other libraries in release with similar optimizations? It’s important to compare apples to apples.

        1. Thanks for your response, Jonathan!
          This benchmark and my further research are particularly interesting to me, as I’m developing an online game targeting about 300 players online per server (.NET Core), so I have to be certain I’m using the best available tool for the job. In my internal testing Lidgren was always very fast and stable, without unnecessary CPU or RAM usage. So I’ve stuck with it for the past few years, but I’m always looking for possible alternatives, especially now that many new libraries have reached a usable state and Lidgren’s active development ended years ago.

          Regarding posting my code changes as a gist: when I copy-pasted from GitHub, the links to the particular code lines disappeared, so it might sound like I did something extraordinary or applied some “optimization”. But the only actual change I made to the Lidgren source code was to match its socket poll duration with LiteNetLib’s (Lidgren https://github.com/lidgren/lidgren-network-gen3/blob/cf976b8566dbc233630cdcaadc6d4fe67581724a/Lidgren.Network/NetPeer.Internal.cs#L405 and LiteNetLib https://github.com/RevenantX/LiteNetLib/blob/f92d54bae34d9432a525b38a7678db9bc6b59fd5/LiteNetLib/NetSocket.cs#L19 ). As I wrote above, Lidgren trades some CPU time to make 100 times more socket poll calls per second, which supposedly reduces latency. That can only be measured in a completely isolated client-server benchmark, and I hope someone will eventually do it, as I have some doubts that the trade-off is justified. The overhead itself is not huge, but it becomes huge when multiplied by a high number of clients (especially noticeable with 1000+ clients).

          Another “change” was that I built Lidgren in RELEASE mode (as everyone should). This resolved a very serious, unnecessary bottleneck related to the initial network interface inquiry on Windows (I also reported this and provided a screenshot of the profiler report). It was the reason the benchmark’s developer measured such unusually high CPU usage for Lidgren, especially in the 64 clients case.
          The applied change demonstrated that Lidgren matches LiteNetLib and Neutrino in the time it takes to connect 1000 clients, and even beats them on CPU and RAM usage after all the clients are connected. The overall test duration in all three cases was surprisingly the same in my case (95-96 seconds total). With the previous socket poll duration, the benchmark was clearly capped by the performance of 1000 client threads stuck in a loop (the “while (true)” socket poll call), as JetBrains dotTrace reported to me. The performance is still capped by that (so the test methodology is not useful), but at least Lidgren now matches LiteNetLib’s socket poll duration, so we get more “valid” results.

          Regarding the comparison of apples-to-apples – I agree. We should always build from the source in such cases.
          I’ve tried to build LiteNetLib and Neutrino from source (in Neutrino’s case the latest source is 1 year old, while the latest benchmark release is 3 months old), and to use the already available LiteNetLib releases, but it turns out they are all incompatible with the benchmark, which is very strange. The developer should have linked the exact sources he used to build them or provided links to the releases. Reported accordingly: https://github.com/nxrighthere/BenchmarkNet/issues/4 and https://github.com/nxrighthere/BenchmarkNet/issues/5
          The best we can do for now is to use the DLLs included with the benchmark suite as I simply cannot make it work with any other version I got.
          Regarding the test methodology: it’s neither fair nor realistic, as it doesn’t isolate the clients from the server. It only makes sense to benchmark 1 server vs 1 client this way. Otherwise it will favor the network library with the least client overhead, even if another library’s server is much faster. With 1 server and 1000 clients, a total of 1001 or more threads are doing the work, so the server at best gets only 1/1001 of the available resources, while in the real world these would all be separate, independent machines!
          A realistic benchmark could only be done with independent virtual machines from a cloud host (such as AWS, Azure, or GCP) in the same datacenter. That way it would be possible to measure the actual performance of the server and the client, how well the server scales with load, etc. But such a benchmark is much more complicated to develop and run: virtual machine orchestration can be simplified with Docker containers, but it’s still a huge effort to implement properly. I also believe there are restrictions on launching so many virtual machines simultaneously just to run a benchmark for a few minutes. It’s also relatively expensive: tests on your PC are “free”, but running 1001 virtual machines costs about $0.01 per hour each.
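For scale, the quoted rate works out as follows. `benchmark_cost` is a hypothetical helper, and the billing increment is an assumption. Providers differ (per-second, per-minute, or per-hour billing, sometimes with minimum charges), so check the provider’s actual terms.

```python
import math


def benchmark_cost(vms: int, hourly_rate: float, minutes: float,
                   billing_increment_minutes: int = 1) -> float:
    """Rough cost of one benchmark run; run time is rounded up to the billing increment."""
    billed = math.ceil(minutes / billing_increment_minutes) * billing_increment_minutes
    return vms * hourly_rate * billed / 60


print(round(benchmark_cost(1001, 0.01, 60), 2))     # a full hour of 1001 VMs
print(round(benchmark_cost(1001, 0.01, 5), 2))      # a 5-minute run, per-minute billing
print(round(benchmark_cost(1001, 0.01, 5, 60), 2))  # same run if billed per started hour
```

At $0.01 per VM-hour, 1001 VMs cost about $10 per hour. The real cost of a short run depends heavily on billing granularity, on top of the launch limits mentioned above.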

          If you want to perform your own measurements now, you can apply the socket poll duration change to Lidgren yourself or you can download already compiled Lidgren assemblies from my Drive here https://drive.google.com/open?id=13RST8REk1PGZbSAAvTByeSASr8hIYPt4
          You will need to download the latest BenchmarkNet release https://github.com/nxrighthere/BenchmarkNet/releases and replace Lidgren.Network.dll.
          Run it with all the default parameters. For accurate measurements ensure that you have 100% CPU power mode in Windows Power Options, have Turbo boost disabled in BIOS (so the CPU clock will not change) and there are no background CPU-intensive applications. Please measure the overall duration of the test (displayed when the test is finished), time to connect all the clients, and sustained CPU and RAM usage (which can be monitored with Task Manager or Process Explorer).
          Regards!
