Category: "Chicago DC"

OnApp Outage (Resolved)

April 11, 2015 at 1:01 PM

We are aware of an issue on our OnApp cloud that is causing downtime for certain VPSes, and we are actively working to resolve it.

We thank you in advance for your patience while we work to resolve this issue. We currently expect it to be resolved within the next 30 minutes, at approximately 1:30 PM CST.

Update @ 01:30 PM CST on 04/11/2015

Most VPS customers are back online at this time. A few VPSes are still offline and are now being booted back up; they should be back online within the next 10 minutes.

We traced the cause of this outage to a bug in the OnApp software on certain hypervisors that were running a slightly different software version, which we discovered while performing routine maintenance today. Those hypervisors have now been updated to the same software version as the rest of the cloud.

If you have any questions, or have any issues that you would like to report, please contact us at [email protected].

Emergency Maintenance Reboots - All Systems (1/27/2015)

January 27, 2015 at 12:08 PM

We will be rebooting all shared and reseller hosting systems shortly to address an urgent security update. Expected downtime is less than 15 minutes per server. We will update this blog post once the process is completed.

Update: The emergency maintenance has been completed.

Incoming TCP SYN Flood - Helios (12/11/2014)

December 11, 2014 at 4:26 PM

Users on our Helios shared hosting server are currently experiencing downtime as the result of an in-progress TCP SYN flood attack. We are working to restore service as quickly as possible.

We apologize for any inconvenience and thank you for your patience while we work to restore your service.

EDIT: 4:30 PM CST 

The server is back online at this time.

Emergency Hypervisor Maintenance

November 16, 2014 at 4:54 PM

We are currently experiencing an issue on one of our VPS hypervisors that requires immediate maintenance. This will not affect all VPS customers; however, a handful will experience intermittent service during this time. We expect this issue to be resolved within 15-30 minutes. We are very sorry for the inconvenience. Please check this post for further updates as the maintenance proceeds.

Update: Service has been restored for all VMs directly affected by the maintenance this evening. We are going through all VMs to ensure everyone is back online.

Incoming DDoS Attack - Helios (11/1/2014)

November 1, 2014 at 5:14 PM

Since last night, our admin Geeks have been dealing with a series of DDoS attacks targeting the Helios server. These attacks have caused significant downtime for users on the primary shared IP of Helios. We do have a solution, and most Helios customers have received e-mails about it. Here is how we handle a DDoS attack.

DDoS Mitigation

These are the steps we take to mitigate incoming and recurring DDoS attacks:

Step 1: Nullroute. 

Our first line of defense against DDoS is to immediately halt incoming traffic to the attacked IP address(es). This is called a "nullroute." Our upstream provider nullroutes the attacked IP as soon as they detect an attack that meets specific criteria (namely, that it would significantly impact other users).

A nullroute does two things: it keeps the attack from taking down other IPs on our network, and it stops traffic from being routed to the attacked IP, so the attacker can no longer reach it. This means they can't attack anything until our IP is back online. We leave the IP offline for a while and bring it back up once the attacker has lost interest. Often this ends the attacks, but sometimes they resume later.
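The nullroute decision and its timed restoration can be modeled in a few lines. This is a minimal sketch, not our upstream provider's actual tooling: real nullroutes are configured on the provider's routers, and the `NullrouteTable` class, the 50%-of-uplink threshold, and the one-hour hold time are hypothetical values chosen for illustration.

```python
class NullrouteTable:
    """Toy model of an upstream router's nullroute (blackhole) list.

    Hypothetical: thresholds and hold times are illustrative assumptions.
    Times are plain numbers, e.g. seconds since the attack began.
    """

    def __init__(self, hold_time: float = 3600.0) -> None:
        self.hold_time = hold_time           # assumed: how long an IP stays nullrouted
        self._release: dict[str, float] = {}  # ip -> time the nullroute expires

    def should_nullroute(self, attack_mbps: float, link_capacity_mbps: float,
                         threshold: float = 0.5) -> bool:
        # Assumed criterion: nullroute when the attack alone would use more than
        # `threshold` of the shared uplink and so significantly impact other users.
        return attack_mbps > threshold * link_capacity_mbps

    def nullroute(self, ip: str, now: float) -> None:
        # Drop all traffic destined for `ip` until now + hold_time.
        self._release[ip] = now + self.hold_time

    def is_routed(self, ip: str, now: float) -> bool:
        # An IP is reachable unless it has an unexpired nullroute.
        release = self._release.get(ip)
        if release is None:
            return True
        if now >= release:
            del self._release[ip]  # hold time is over: bring the IP back online
            return True
        return False


table = NullrouteTable(hold_time=3600.0)
if table.should_nullroute(attack_mbps=8000, link_capacity_mbps=10000):
    table.nullroute("198.51.100.10", now=0)
```

In this model, nullrouting 198.51.100.10 (a documentation address) leaves its neighbor 198.51.100.11 reachable, and the nullrouted IP returns automatically once the hold time expires.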

Step 2: Identify the attacked website.

Every DDoS attack has a target. The attacker may have a grudge against a particular domain on the server, or may simply dislike the server itself. The underlying reason is rarely known; most attackers never reach out to tell us why.

If we can identify the attacked website, we can move its traffic to a different IP address. That way, we only need to nullroute the IP hosting the targeted site instead of an IP hosting a few hundred websites. This limits the service interruption to one account and keeps other customers online during DDoS attacks. If the attacks never subside, the account holder can choose to use a DDoS mitigation service to keep their site online through an attack, although such services can be very costly.
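To see why isolating the target matters, compare how many accounts a nullroute knocks offline before and after the targeted site gets its own IP. The sketch below assumes a simple in-memory mapping of IPs to account names; `isolate_account`, `nullroute_impact`, and the sample data are hypothetical.

```python
def isolate_account(ip_map: dict[str, list[str]], target: str,
                    spare_ips: list[str]) -> str:
    """Move `target` off its shared IP onto a dedicated spare IP.

    ip_map maps each IP to the accounts it hosts and is modified in place.
    Returns the dedicated IP now hosting only `target`.
    """
    for accounts in ip_map.values():
        if target in accounts:
            accounts.remove(target)
            break
    else:
        raise KeyError(f"{target} is not hosted on any IP")
    dedicated = spare_ips.pop()
    ip_map[dedicated] = [target]
    return dedicated


def nullroute_impact(ip_map: dict[str, list[str]], ip: str) -> int:
    # Number of accounts knocked offline if `ip` is nullrouted.
    return len(ip_map.get(ip, []))


ip_map = {"198.51.100.10": [f"acct{i}" for i in range(300)]}
before = nullroute_impact(ip_map, "198.51.100.10")
dedicated = isolate_account(ip_map, "acct42", ["198.51.100.50"])
after = nullroute_impact(ip_map, dedicated)
```

With 300 accounts on one shared IP, a nullroute takes all 300 offline; after isolation, nullrouting the dedicated IP affects only the targeted account, while the other 299 stay online.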

Step 3: Account IP Dispersion.

If we cannot identify the attacked website, we resort to something we call Account IP Dispersion. You may be familiar with your account IP address: the IP that hosts your account may also host anywhere from one to a few hundred websites. We try to bring that number down as far as possible by dispersing accounts among a dozen or more new IPs, using a bit of custom code we built here at GeekStorage.

In short, the code scans each account on the attacked IP to determine whether all of its hosted domains use our nameservers. If they do, we can change the account's IP reliably and without downtime: both the old and new IPs serve the website while the IP change takes effect, which can take up to 24 hours.
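Our dispersion code is not published, so the following is only a sketch of the logic described above, assuming each account's NS records have already been looked up. `OUR_NAMESERVERS` (with placeholder `example.net` names), `is_movable`, and `disperse` are hypothetical, and round-robin assignment is an assumed way of spreading accounts evenly.

```python
# Hypothetical nameserver names; substitute the real ones.
OUR_NAMESERVERS = {"ns1.example.net", "ns2.example.net"}


def is_movable(domain_ns: dict[str, set[str]]) -> bool:
    """True if every domain on the account delegates only to our nameservers.

    domain_ns maps each of the account's domains to its NS records, gathered
    beforehand (the DNS lookup itself is out of scope for this sketch).
    A domain with no NS records found is treated as not movable.
    """
    return all(ns and ns <= OUR_NAMESERVERS for ns in domain_ns.values())


def disperse(accounts: dict[str, dict[str, set[str]]], new_ips: list[str],
             old_ip: str) -> dict[str, list[str]]:
    """Spread movable accounts round-robin across new_ips.

    Accounts with any domain on outside nameservers stay on old_ip, because
    changing their IP would need DNS changes we do not control.
    """
    plan: dict[str, list[str]] = {ip: [] for ip in new_ips}
    plan[old_ip] = []
    i = 0
    for name in sorted(accounts):
        if is_movable(accounts[name]):
            plan[new_ips[i % len(new_ips)]].append(name)
            i += 1
        else:
            plan[old_ip].append(name)
    return plan


accounts = {
    "alice": {"alice.test": {"ns1.example.net", "ns2.example.net"}},
    "bob": {"bob.test": {"ns1.example.net"}, "bob2.test": {"ns.other.test"}},
    "carol": {"carol.test": {"ns2.example.net"}},
    "dave": {"dave.test": {"ns1.example.net", "ns2.example.net"}},
}
plan = disperse(accounts, ["198.51.100.21", "198.51.100.22"], "198.51.100.10")
```

The sketch only computes the new assignment. It does not model the transition period, during which each moved site answers on both its old and new IP until the change has fully propagated.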

The first round of account IP dispersion on Helios is complete: no IP now hosts more than 30 accounts, except the IP that holds accounts whose domains do not point to our nameservers. If the attacks continue, within 24 hours we will have narrowed the target down to one subset of at most 30 accounts, and we will disperse those accounts again until we identify the attacked website. Each round of dispersion reduces the number of accounts affected by the attacks, until ultimately only one account sees downtime.
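Repeated dispersion works like a search that shrinks the pool of suspects each time an attack reveals which IP was hit. This hypothetical simulation assumes the suspects are split across `fanout` new IPs per round (12 here, echoing "a dozen or more") and that each new attack takes down exactly the IP holding the target; `target` stands in for the unknown attacked account, and the function only "observes" which group contains it.

```python
import math


def find_target(accounts: list[str], target: str, fanout: int = 12) -> tuple[str, int]:
    """Simulate repeated account IP dispersion until one suspect remains.

    Returns (identified account, number of dispersion rounds used).
    """
    if fanout < 2:
        raise ValueError("fanout must be at least 2 to make progress")
    suspects = list(accounts)
    rounds = 0
    while len(suspects) > 1:
        # Split the current suspects into at most `fanout` groups, one per new IP.
        size = math.ceil(len(suspects) / fanout)
        groups = [suspects[i:i + size] for i in range(0, len(suspects), size)]
        # The next attack reveals which IP (and so which group) holds the target.
        suspects = next(g for g in groups if target in g)
        rounds += 1
    return suspects[0], rounds


accounts = [f"acct{i}" for i in range(360)]
found, rounds = find_target(accounts, "acct137", fanout=12)
```

With 360 accounts and 12 IPs per round, the first round leaves at most 30 suspects, the second leaves 3, and the third isolates the target, so the number of rounds grows roughly with the logarithm (base `fanout`) of the account count. In practice each round can take up to 24 hours because of the IP transition.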

Helios Status

The Helios server is currently 100% online. It has seen four downtime-causing DDoS attacks over the last 24 hours, but our Geeks are hard at work keeping an eye on incoming traffic and preparing for the next step should the attacks continue. We hope to post a quick resolution to this issue within the next 24 hours.

If you are concerned about additional downtime, there are a couple of things to keep in mind:

  • Migrating to a new server is almost as effective as moving to a new IP, and most customers are already on a new IP, so a server migration offers little additional protection. The account IP dispersion has reduced the likelihood of additional downtime for any particular account by about 85%. There is also a small chance that your site is the target, in which case a migration will not help.
  • Enabling CloudFlare via cPanel can improve your site's speed and reliability, whether or not your account is affected by DDoS attacks.

We hope this sheds some light on the actions we are taking to improve availability on the Helios server despite recent DDoS attacks. If you have any questions for our support team, please don't hesitate to ask. We are available 24/7 via our support desk or at [email protected].