Yesterday a flaw was discovered in one of the core components of Linux. The bug was quickly named “Ghost” (CVE-2015-0235) and got its own logo. And when a bug has a name and a logo, it’s generally not a good sign. So what did we do in the last 24 hours to protect our customers? Exactly what we did with previous Internet-threatening bugs such as Heartbleed, POODLE and Shellshock: be the first provider in the Netherlands to fix the whole server farm.
How does the Linux Ghost flaw work?
At the core of every Linux system is a library that handles generic tasks: the GNU C library, glibc. One of these tasks is translating names into IP addresses, such as “www.byte.nl” into “184.108.40.206” and so on. But somebody discovered that when you ask glibc’s gethostbyname() function to translate a specially crafted name, it overflows an internal buffer and the process doing the lookup crashes. And, depending on the input, an attacker can make it do other things than translating. Such as – let’s get creative – installing a backdoor on your system.
So, you might ask: how can somebody else force your system to translate a particular name? There happen to be a gazillion ways to do this. For example, when you enter your email address on a site, the site will often do a translation on the domain part of your email address to check that it exists.
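As an illustration of that path (a hypothetical sketch, not any specific site’s code): a naive email check hands the domain part – attacker-controlled input – straight to the resolver.

```python
import socket

def email_domain_resolves(address: str) -> bool:
    """Naive email check: does the domain part of the address resolve?

    The domain string comes straight from user input -- exactly the
    kind of path that fed crafted names to the vulnerable glibc call.
    """
    _, _, domain = address.partition("@")
    if not domain:
        return False
    try:
        socket.gethostbyname(domain)
        return True
    except OSError:
        return False
```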
What did Byte do?
The bug was published yesterday at the end of the day. Our team gathered on WhatsApp and Skype to discuss the impact, and we agreed that swift action was required. Debian and Ubuntu were really quick to provide fixed system packages. Installing these on all of our hundreds of servers was a breeze, thanks to the automation we have set up for exactly this purpose. However, just upgrading was not enough: every running service (FTP, web, PHP, MySQL, time, Redis and so on) keeps the old library loaded in memory, and needs a restart to pick up the fixed code. For most services that is trivial, but restarting a busy database server can take up to 15 minutes.
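Finding which processes still hold the old library in memory can be automated. A minimal Linux-only sketch of the idea (the same trick used by tools like Debian’s checkrestart/needrestart): after the upgrade, the old libc file is deleted on disk, so any process whose memory map still references a deleted libc was started before the fix and needs a restart.

```python
import glob

def processes_using_deleted_libc():
    """Return (pid, path) pairs for processes that still map a libc
    that has been deleted on disk -- i.e. they were started before the
    upgrade and must be restarted to load the fixed code."""
    hits = []
    for maps_path in glob.glob("/proc/[0-9]*/maps"):
        pid = int(maps_path.split("/")[2])
        try:
            with open(maps_path) as f:
                for line in f:
                    if "libc" in line and line.rstrip().endswith("(deleted)"):
                        hits.append((pid, line.split()[-2]))
                        break
        except OSError:
            continue  # process already exited, or we lack permission
    return hits
```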
So our database specialists got together and came up with a very clever procedure to fail over all of our database servers to their standby replicas, so that the restarts would not affect our customers’ uptime.
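The core idea behind such a failover (a generic sketch, not Byte’s actual procedure) is: stop writes on the primary, wait until a standby replica has applied everything the primary has written, then promote that replica and restart the old primary at leisure. The replication-position numbers below are purely illustrative:

```python
def replica_caught_up(primary_pos, replica_pos):
    """True once the replica has applied everything up to the
    primary's replication position (writes on the primary are
    assumed to have been stopped first)."""
    return replica_pos >= primary_pos

def plan_failover(primary_pos, replica_positions):
    """Pick the first standby replica that has fully caught up,
    or None if no replica is safe to promote yet."""
    for name, pos in replica_positions.items():
        if replica_caught_up(primary_pos, pos):
            return name
    return None
```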
And so it happened. At 1:58 at night our efforts had paid off, and all systems were declared “SAFE!”.