Incident Report – Ad serving disruption December 12, 2013

On December 12, 2013, our ad server hosting platform experienced two periods during which ad serving was very slow or unavailable. The first began around 11:30 UTC and lasted approximately 90 minutes. The second began around 16:35 UTC and lasted approximately 20 minutes. Because this incident affected a very large subset of our customers, we are posting this message on our blog to provide more detail about its cause.

Most people are familiar with OpenX Source as the software used by website publishers, and to some extent by ad networks. However, it can also be used as an ad server for advertisers and media agencies. One of our hosting clients, a media agency, created a set of ad tags and sent them to a media company from which they had purchased ad inventory. An employee at the media company appears to have made a mistake when entering the details of the inventory purchase. Instead of scheduling the target number of impressions for delivery over several weeks, they entered a duration of just a few hours. As a result, the ad tags were requested at a rate of many millions per minute, flooding our ad server hosting platform to the point where our servers could no longer handle the volume.
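To illustrate why a shortened duration has such a dramatic effect, here is a minimal sketch of the arithmetic, assuming impressions are paced evenly across the campaign. The function name and all figures are hypothetical, chosen only to keep the numbers round; they are not the actual values from the campaign involved in this incident.

```python
def required_rate_per_minute(target_impressions: int, duration_hours: float) -> float:
    """Average requests per minute needed to deliver the target evenly."""
    if duration_hours <= 0:
        raise ValueError("duration_hours must be positive")
    return target_impressions / (duration_hours * 60)


# Hypothetical figures for illustration only, not the real campaign numbers.
TARGET_IMPRESSIONS = 604_800_000
INTENDED_HOURS = 4 * 7 * 24  # four weeks
MISTAKEN_HOURS = 3           # "just a few hours"

intended = required_rate_per_minute(TARGET_IMPRESSIONS, INTENDED_HOURS)
mistaken = required_rate_per_minute(TARGET_IMPRESSIONS, MISTAKEN_HOURS)

print(f"intended pacing: {intended:,.0f} requests per minute")
print(f"mistaken pacing: {mistaken:,.0f} requests per minute")
print(f"increase factor: {mistaken / intended:.0f}x")
```

Under these assumed numbers, compressing four weeks into three hours multiplies the request rate by 224, because the same impression target is divided over 224 times fewer minutes.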

Since the error was made by a third party with whom we have no direct relationship, it took us longer than we had wished to have the erroneous ad campaign deactivated. After things had settled down, it appears the same person made the exact same error once again, causing the second disruption a few hours later. Since it was already after business hours, we were unable to contact the media company, and therefore had no remedy left but to take down the ad server system we host for our affected customer, in order to ensure that the ad servers we host for our other customers were no longer affected.

We have since been in touch with the general manager of the company employing the staff member who caused so much trouble. They are reviewing their internal processes, and they have promised to report back to us on how they intend to prevent this from ever happening again.

Unfortunately, there simply isn’t any way to prevent this type of situation, especially since the cause of it was entirely outside of our control. We believe we did everything we could at the time of the incidents to minimize the impact on our other customers, up to the point of taking one customer's ad server offline entirely. As you can imagine, this was a measure we did not enjoy implementing, but we had no other choice.

OpenX Source Ad Server