Ad Serving Interruption
Incident Report for Kevel
Resolved
This issue has been resolved. Our system works in a star configuration where each engine has a complete record of all the data it needs to serve ads. Each engine pulls any changes (ad starts and stops) from a central database. The central database went down earlier today (an instance issue on Amazon), which our system is set up to handle. The engines continued to serve ads while we worked on restoring the database from the hot slave. While we were doing this, the old database appears to have become reachable again but with no data, and that empty state synced to all of the engines before we could un-hook them from the database.

Once this happened, the engines began to serve blanks. We saw this happen and immediately moved all engines to pull from the slave, but because all 100 engines were trying to pull all of the data, it took around 10 minutes to complete. As engines restored their data, they came back up and began serving ads again.
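
To make the sequence above concrete, here is a minimal sketch of the failure mode. It is an illustration under stated assumptions, not our production code: the `Database` and `Engine` classes, the `run_incident` helper, and the placement and ad names are all hypothetical, and the sync is simplified to copying a full snapshot rather than pulling incremental changes.

```python
class SourceUnavailable(Exception):
    """Raised when the (hypothetical) central database cannot be reached."""


class Database:
    def __init__(self, ads, reachable=True):
        self.ads = dict(ads)
        self.reachable = reachable

    def snapshot(self):
        if not self.reachable:
            raise SourceUnavailable()
        return dict(self.ads)


class Engine:
    def __init__(self, source):
        self.source = source  # None means the engine is un-hooked
        self.ads = {}

    def sync(self):
        if self.source is None:
            return  # un-hooked: keep serving from the local copy
        try:
            # Simplification: replace the local copy with whatever the source has.
            self.ads = self.source.snapshot()
        except SourceUnavailable:
            pass  # source is down: keep the last known data

    def serve(self, placement):
        return self.ads.get(placement, "blank")


def run_incident(detach_first):
    """Replay the outage; return what is served during it and after the old db reappears."""
    db = Database({"homepage": "ad-123"})
    engine = Engine(db)
    engine.sync()

    db.reachable = False          # central database goes down
    engine.sync()
    during_outage = engine.serve("homepage")

    if detach_first:
        engine.source = None      # un-hook before the old database can return

    db.ads, db.reachable = {}, True   # old database comes back, but empty
    engine.sync()
    return during_outage, engine.serve("homepage")
```

In this model, `run_incident(detach_first=False)` keeps serving the ad while the database is down and then serves a blank once the empty database is reachable again, while `run_incident(detach_first=True)` keeps serving the ad throughout.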

This is a scenario we have planned for and have a standard operating procedure for, but we didn't anticipate the central database coming back online with no data. We are modifying our plan to un-hook the engines first so that this cannot happen again. We are also setting up additional hot slaves so that if we ever need to re-populate from scratch, it will take much less time.
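
The revised plan can be sketched roughly as follows. This is only an outline under assumptions: `plan_failover` and `engines_per_replica` are hypothetical helpers, the engine and replica names are made up, and spreading engines round-robin across replicas is one simple way to share the load, not necessarily the one we will use.

```python
from collections import Counter
from itertools import cycle


def plan_failover(engine_ids, replicas):
    """Return ordered steps: un-hook every engine first, then re-attach across replicas."""
    if not replicas:
        raise ValueError("at least one replica is required")
    # Step 1: un-hook all engines so a stale or empty database cannot sync to them.
    detach = [("detach", engine, None) for engine in engine_ids]
    # Step 2: re-attach engines round-robin so no single replica serves them all.
    attach = [("attach", engine, replica)
              for engine, replica in zip(engine_ids, cycle(replicas))]
    return detach + attach


def engines_per_replica(steps):
    """Count how many engines each replica is asked to re-populate."""
    return Counter(replica for action, _, replica in steps if action == "attach")


# Hypothetical fleet: 100 engines (as in this incident) and four replicas (assumed).
engines = [f"engine-{i}" for i in range(100)]
replicas = ["replica-a", "replica-b", "replica-c", "replica-d"]
steps = plan_failover(engines, replicas)
load = engines_per_replica(steps)
```

With these assumed numbers, each replica re-populates 25 engines instead of one replica handling all 100. How much time that saves depends on where the bottleneck is, which this sketch does not model.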
Posted Oct 12, 2015 - 16:11 EDT
Update
Ad serving has been restored to all engines and you should see ads serving again. Full explanation coming soon.
Posted Oct 12, 2015 - 15:56 EDT
Monitoring
Ad serving is beginning to be restored with engines coming back online as soon as they sync the data they need.
Posted Oct 12, 2015 - 15:51 EDT
Identified
We are experiencing an interruption to ad serving. Requests are being returned, but they are serving blanks. We have already identified the issue and a fix is being deployed right now.
Posted Oct 12, 2015 - 15:42 EDT