Roblox CEO Explains What Happened In Massive Server Outage

2 years 5 months ago

Roblox, one of the most popular games in the world, was offline for more than 60 hours over the Halloween weekend in a massive outage. Roblox Corp. founder and CEO David Baszucki has now published a blog post that explains what went wrong.

Problems with Roblox started to appear on the afternoon of Thursday, October 28, when players reported trouble logging in. Baszucki said it was at this time that the company's teams began "working around the clock" to find the source of the issue and get the game back up and running.

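But this was not easy, Baszucki said. In short, a core system in Roblox's infrastructure became overwhelmed, prompted by a "subtle bug" in the game's backend while under heavy load. Baszucki attributed the failure to the "growth in the number of servers" in Roblox's datacenters rather than to a spike in player traffic.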

"This was an especially difficult outage in that it involved a combination of several factors. A core system in our infrastructure became overwhelmed, prompted by a subtle bug in our backend service communications while under heavy load," Baszucki said. "This was not due to any peak in external traffic or any particular experience. Rather the failure was caused by the growth in the number of servers in our datacenters. The result was that most services at Roblox were unable to effectively communicate and deploy."

The reason it took the team multiple days to get Roblox back up and running came down to the "difficulty in diagnosing the actual bug," Baszucki said.

"Recovery took longer than any of us would have liked. Upon successfully identifying this root cause, we were able to resolve the issue through performance tuning, re-configuration, and scaling back of some load. We were able to fully restore service as of this afternoon," Baszucki said in the post, dated October 31.

Looking ahead, Baszucki said Roblox Corp. will release a more in-depth post-mortem report about what happened once the teams complete their analyses. That report will also include details on what the company is doing to make sure this type of outage doesn't happen again. Additionally, it sounds like Roblox Corp. will compensate creators in some form for losses related to the outage.

"We will implement a policy to make our creator community economically whole as a result of this outage. There are more details on this to come," Baszucki said.

Finally, Baszucki said players don't need to worry about any of their "player persistence data" getting compromised as a result of the outage.

"We are grateful for the patience and support of our players, developers, and partners during this time," the executive said.

To put the scale of the outage into perspective, Roblox Corp. reported that the game had 48 million players in August. Like other tech companies, Roblox Corp. is trying to create a metaverse of sorts with Roblox.

Author
Eddie Makuch
