February 2017

Elevated error rates on ad serving
We have resolved the issue. New engines weren't being registered properly with our proxies when they scaled up; we have fixed the core issue.
Feb 8, 09:37-09:55 EST

January 2017

Reporting Delay
Everything has been confirmed accurate and all data is loaded for Jan 1 and Jan 2. Jan 3 continues to process as normal.
Jan 2, 15:55 - Jan 3, 16:57 EST

December 2016

UI/API Changes delayed
This has now been resolved.
Dec 16, 17:05-18:23 EST

November 2016

Reporting Slowdown
Things are now caught up and we have resolved the core issue. We also have a number of performance improvements shipping this week that will help speed up reporting for all reports.
Nov 28, 09:20-13:58 EST
Increased errors in Management API
This incident has been resolved.
Nov 28, 11:13-11:46 EST
Reporting Slow
Reporting is back to normal. We are still working on a number of performance improvements that you will see over the next couple of weeks.
Nov 21, 11:50-14:43 EST

October 2016

Ad serving interruption
This incident has been resolved.
Oct 21, 07:43-16:05 EST
Reporting Delayed
The issue has been resolved and all reporting data is up to date.
Oct 1, 20:34 - Oct 2, 10:48 EST

September 2016

Report Failures Over the Weekend
Starting last Friday evening (Eastern Time), some reports were failing for some customers. This has been resolved as of this morning.
Sep 26, 10:53 EST
Elevated Latency
We experienced elevated latency for some customers between 1:00 and 2:00 AM EST. This issue has been resolved and engines are returning to normal.
Sep 17, 03:04 EST
Report Failures - September
We've implemented a fix and reports should return correctly for those dates.
Aug 31, 23:07 - Sep 1, 02:16 EST

August 2016

Reporting 2.0 Outage
2.0 is up to date and fully available. We will post a more in-depth retro on Monday covering what happened, how it was handled, and improvements we can make.
Aug 20, 20:20 - Aug 21, 16:25 EST

July 2016

Reporting 2.0 planned maintenance
Reporting 2.0 maintenance is complete.
Jul 23, 15:09-16:46 EST

June 2016

Reporting 2.0 Maintenance
Reporting 2.0 has caught up on yesterday's data (6/15) and this morning's data (6/16), and it is now processing normally.
Jun 16, 00:03-15:20 EST
Keyword Reporting issue
The final Keyword data is being processed now and will be available in the next hour. We apologize for this inconvenience. As we begin moving to Reporting 2.0 we look forward to leaving these issues in the past.
Jun 4, 16:38 - Jun 6, 15:57 EST

May 2016

Ad Serving Interruption
This issue has been completely resolved. While we never like to see downtime, this was a great test of our new bulkheading system, as many customers were not affected by an issue generated by a single customer. More information on this coming soon.
May 17, 19:31-19:50 EST
Ad Serving Issue
This issue has been resolved.
May 16, 17:57-18:07 EST
Reporting Service
The issue is resolved - we apologize for the inconvenience.
May 16, 16:44-17:25 EST
Updated SSL
We have updated our SSL cert on - this should not affect any ad serving but we wanted to send out a notification in case you are doing server to server and see any issues.
May 12, 12:06 EST
Reporting Delayed
Apologies for the delay in the update, but all non-keyword data has been imported. Keyword data will load over the next couple of hours. We apologize again for the inconvenience this has caused. We have put a tremendous amount of resources into building a very fault-tolerant pipeline in Reporting 2.0 and look forward to having everyone moved to that very soon.
May 11, 09:57-18:38 EST

April 2016

Reporting Unavailable
All reporting data for Sat and Sun was fully restored overnight. Monday has been fully loaded and today is running at close to normal delay (it might be an additional hour for some customers), and we fully expect it to catch up throughout the day. All of us here at Adzerk appreciate your patience and apologize for the inconvenience we know this caused.
Apr 24, 14:40 - Apr 26, 11:00 EST

March 2016

Engine Slowness
This issue has been resolved. We received a burst of a specific type of traffic that caused an issue on this cluster - we are working on a fix to make sure we can better scale to handle it in the future.
Mar 18, 15:16-15:22 EST

February 2016

No incidents reported for this month.

January 2016

Reporting Delayed
The final keyword data is loading and will be completed in the next hour. We are going to mark this as resolved. Please contact support if you have any questions or concerns.
Jan 14, 10:24 - Jan 15, 00:13 EST

December 2015

Elevated Response Times
This issue only affected a single customer and we have notified that customer, so we will be closing this out now.
Dec 13, 19:40-23:10 EST

November 2015

No incidents reported for this month.

October 2015

DNS Resolution Issue impacting Ad Serving on US East Coast
Things have been resolved by our CDN provider. The issue looks to have affected users on the east coast served by their Virginia, New Jersey, and Atlanta nodes. Cloudflare (our CDN provider) will post additional updates here if you would like more information.
Oct 15, 17:06-17:50 EST
Ad Serving Interruption
This issue has been resolved. Our system works in a star configuration where each engine has a complete record of all the data it needs to serve ads. Each engine pulls any changes (ad starts and stops) from a central database. The central database went down earlier today (an instance issue on Amazon), which our system is set up to handle. The engines continued to serve ads while we worked on restoring the database from the hot slave. While we were doing this, the old db appears to have become reachable again but with no data, and it synced that empty data set to all of the engines before we could un-hook them from the database. Once this happened the engines began to serve blanks. We saw this happen and immediately moved all engines to pull from the slave, but because all 100 engines were trying to pull all the data at once, it took around 10 minutes to complete. Once engines could restore their data they came back up and began serving ads. This is something we have planned for and have a standard operating procedure for, but we didn't anticipate the central db coming back online with no data. We are modifying our plan to un-hook the engines first to prevent this in the future. We are also setting up additional hot slaves so that if we did need to re-populate from scratch it would take much less time.
Oct 12, 15:42-16:11 EST
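The revised ordering described in the Oct 12 incident above (un-hook the engines first, then restore, then re-attach) can be illustrated with a minimal sketch. The `Database` and `Engine` classes and the `detach_all` and `attach_all` helpers are hypothetical names invented for this illustration; they are not our actual engine code.

```python
class Database:
    """Hypothetical stand-in for a central database or its hot slave."""

    def __init__(self, records):
        self.records = dict(records)


class Engine:
    """Hypothetical engine that keeps a full local copy of the ad data."""

    def __init__(self, source):
        self.source = source
        self.data = dict(source.records)

    def sync(self):
        # A detached engine (source is None) keeps serving its local copy.
        if self.source is not None:
            self.data = dict(self.source.records)


def detach_all(engines):
    """Step 1: un-hook every engine so nothing can sync from the old primary."""
    for engine in engines:
        engine.source = None


def attach_all(engines, new_source):
    """Final step: point every engine at the restored source and resync."""
    for engine in engines:
        engine.source = new_source
        engine.sync()
```

With this ordering, an old primary that comes back online empty cannot overwrite engine data, because no engine is attached to it while the restore is in progress.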
Reports not being returned in the UI
This incident has been resolved. You will see a slightly longer delay (2 hours) in reporting tonight, but things will catch up over the course of the night. Reporting was only interrupted for about an hour while we identified the issue and redirected it to our fail-over server. The issue was a large number of un-optimized queries being sent to the master server, which slowed it down to the point where it couldn't process the reports coming from the UI. After moving the reporting system to the fail-over we added additional indexes to handle the queries to master, then switched back over.
Oct 1, 14:52-23:24 EST

September 2015

Click Data Delayed
Click data has been reloaded and verified. A brief process is running to update revenue and that will be finished in 60 minutes. We again apologize for the inconvenience.
Sep 22, 09:31 - Sep 23, 01:55 EST
Elevated response times
Things have been steady since the last update and we are now marking this resolved. We will be following up with a post mortem to explain what happened and how we can prevent it in the future.
Sep 20, 08:43-15:08 EST
Reporting Delayed
Reporting is now completely caught up and running at its normal pace. We apologize for the inconvenience this may have caused. As part of this issue we have developed a new tool to help reporting catch up in the event it gets behind. We process almost a billion impressions a day in reporting, so any interruption can cause a delay that can often take hours or days to recover from. The new tool we created will help further compress the day's data to enable us to catch up reporting in a single import as opposed to through the normal process. Two quick notes on reporting. 1. We use a log system where every engine logs out impressions to a file that is then stored on a redundant storage system. This gives us the confidence that we are capturing every impression and never losing data, as we can always compare these raw logs to reporting data to ensure consistency. 2. We are in the process of testing a new reporting system that we believe will be much faster and more reliable, and will include additional data not currently tracked in reporting. We will have more news on this shortly. Thanks again for your patience and let us know if you have any questions or concerns.
Sep 16, 20:04 - Sep 17, 07:06 EST
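As a rough illustration of the log comparison mentioned in note 1 of the Sep 16 incident above, the sketch below counts impressions per day from raw log lines and lists the days where the count differs from the reported total. The log line format (an ISO date as the first whitespace-separated field) and both function names are assumptions made for this example, not the real log format or tooling.

```python
from collections import Counter


def count_impressions(log_lines):
    """Count impressions per day, assuming each line starts with an ISO date."""
    return Counter(line.split()[0] for line in log_lines if line.strip())


def find_discrepancies(log_counts, reported_counts):
    """Return {day: (log_count, reported_count)} for days that disagree."""
    days = set(log_counts) | set(reported_counts)
    return {
        day: (log_counts.get(day, 0), reported_counts.get(day, 0))
        for day in sorted(days)
        if log_counts.get(day, 0) != reported_counts.get(day, 0)
    }
```

Any day that appears in the result would be a candidate for reloading from the raw logs.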
Reporting Delayed
Due to unscheduled maintenance last night on one of our reporting machines, impression data will be delayed by 5-6 hours today. This will not affect the validity of the numbers and does not affect clicks or events. We apologize for the inconvenience.
Sep 15, 07:59 EST
UI/API Not Responding
All Clicks and Revenue numbers are now loaded and correct. Please let us know if you have any questions or concerns.
Sep 7, 12:41 - Sep 8, 14:50 EST

August 2015

Reporting Issues
Our partner looks to have resolved the issue and reports should be returning normally in the UI again. Reporting will be delayed by about an hour today.
Aug 10, 15:20-15:36 EST

July 2015

No incidents reported for this month.

June 2015

No incidents reported for this month.

May 2015

No incidents reported for this month.

April 2015

Reporting Revenue Calculation
We have completed the restore of data for the 7th, 8th, and 9th. All network level and site level data is now complete and available. Again we apologize for the issues.
Apr 8, 20:43 - Apr 10, 17:10 EST
Ad Serving Interruption
We have resolved the issue that caused our downtime, and things should be back to normal. We deployed a new release of the engine code today and that all went as planned. When we enabled a new feature, though, it was missing a configuration value, which caused all of the engines to crash at the same time. This caused a cascading failure in which engines couldn't come back up without being hit with 100x more traffic than they could handle. We brought up additional engines to cover the load and within 20 minutes had enough to handle our production load. We have identified the bug in the code that allowed this missing config to crash the engine and a fix for that is going out right now. We hate nothing more than having downtime. Please accept our apologies for this issue and know that we will be working to ensure that we prevent it in the future.
Apr 7, 17:03-17:49 EST
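One way to keep a missing configuration value from crashing a process, as in the Apr 7 incident above, is to validate the required keys before turning a feature on and to leave the feature off if anything is missing. The sketch below is illustrative only: `try_enable_feature`, the feature name, and the config keys are hypothetical, and this is not the fix that was shipped.

```python
def try_enable_feature(config, enabled_features, feature, required_keys):
    """Enable `feature` only if every required config key has a value.

    Returns the list of missing keys; an empty list means the feature
    was enabled. The caller can log the missing keys instead of crashing.
    """
    missing = [key for key in required_keys if config.get(key) is None]
    if missing:
        return missing
    enabled_features.add(feature)
    return []
```

For example, calling it with a config that lacks one required key returns that key's name and leaves the set of enabled features unchanged.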

March 2015

No incidents reported for this month.

February 2015

Reports not returning
The issue has been resolved. We store each generated report in a database table and we ran into an issue with the number of reports in that table. We are currently purging older reports (we will keep everything less than 90 days old). This will not affect your reporting data, it will only affect any report URLs you have saved for over 90 days. Please let us know if you have any questions.
Feb 17, 16:54-17:52 EST
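The 90-day retention rule from the Feb 17 incident above can be sketched as a simple filter. The record shape (a dict with a `created_at` datetime) and the function name are assumptions for illustration; the real purge runs against a database table.

```python
from datetime import datetime, timedelta


def purge_old_reports(reports, now, max_age_days=90):
    """Keep only saved reports that are less than `max_age_days` old."""
    cutoff = now - timedelta(days=max_age_days)
    return [report for report in reports if report["created_at"] > cutoff]
```

Only the saved report records are filtered here; the underlying reporting data is untouched, which matches the note above that reporting data is not affected.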

January 2015

No incidents reported for this month.

December 2014

Increased Ad Serving Latency and Error Rate
The capacity issue has been resolved and things are back to normal. We have identified the reason for the failure of the cluster to scale up to meet demand and are taking steps to prevent this in the future.
Dec 28, 08:12-08:50 EST
Reporting Inconsistency
The reload has completed and things now look correct. Again - apologies for the inconvenience this has caused.
Dec 18, 17:37-20:31 EST
Reporting Delays
We have completed the final updates to the data (updating revenue) and the restore is now complete. We all apologize for the inconvenience this has caused and the length of time it took for the restore to be completed. We have more than doubled our volume this year and due to that growth things took longer than expected.
Dec 16, 07:37 - Dec 18, 14:55 EST

November 2014

UI and API Unexpected Downtime
API and UI are both back up and running. Please let us know if you encounter any issues.
Nov 3, 10:54-11:45 EST

October 2014

Delayed Flight Changes
The backup has been cleared and things are back to normal. We will be releasing a new version of our flight processing system later today to build in some circuit breakers and speed improvements to help prevent this in the future.
Oct 21, 07:36-11:50 EST
Reporting Lag
This was previously resolved but the status wasn't updated.
Sep 28, 06:12 - Oct 6, 14:47 EST

September 2014

Reporting Lag
Yesterday successfully completed and today is in the process of catching up. Things will be back to normal by afternoon and today will finish on time. Thanks for your patience.
Sep 22, 17:52 - Sep 23, 07:56 EST
Spike in Latency and 502 Response Codes
We experienced a massive spike in latency and 502 responses starting at 11:30 EST. We were alerted to the issue and immediately began investigating. We were able to resolve the core issue by 11:45 EST, and things began to recover and have now fully recovered. We are still evaluating the original cause of the issue and will have an update on that by the end of the day.
Sep 18, 12:50 EST
August Reporting Numbers
We have completed the reload of numbers. We apologize for the huge inconvenience this has caused. The total discrepancy was less than 1% so you shouldn't see a big change in numbers, but we value accuracy and wanted to make sure we got it right. We will be reloading the first 4 days of September as well and then running our new audit system on a daily basis to ensure this doesn't occur again. We thank you for your patience!
Sep 4, 15:47 - Sep 5, 20:53 EST

August 2014

Delayed Click Reporting
All clicks were re-processed last night and all reporting data is correct again. Apologies for the inconvenience.
Aug 27, 09:42 - Aug 28, 07:44 EST
Elevated Response Times
We haven't seen this issue reoccur so we are going to go ahead and close the incident. We are still working to figure out why it was happening and how to prevent it in the future.
Aug 15, 12:19 - Aug 26, 14:07 EST

July 2014

Reporting Degraded
We have resolved this issue and reporting is back to normal. A larger-than-normal number of reports, new flights, and flight changes resulted in one of our databases being overutilized, which slowed reports. We have optimized access to that database to resolve the issue.
Jul 22, 12:55-16:10 EST

June 2014

No incidents reported for this month.

May 2014

CDN Outage
We have routed all traffic away from the troubled CDN provider and everything should be back to normal - it might take 10-15 minutes for these changes to propagate.
May 20, 16:06-16:37 EST
Creative Preview not working
We have resolved the issue with Creative Preview. We are continuing to test the new deployment and will be fixing any other issues we find.
May 16, 11:02-11:46 EST
Increased response times in ad serving
At approximately 11:45PM EST, this incident was resolved, and all ad requests should be handled at normal speeds.
May 15, 20:38 - May 16, 00:00 EST
Reporting Issue
We rolled back the change that caused this issue and things are resolved.
May 15, 15:38-16:07 EST

April 2014

Reporting Degraded Performance
We have completed our maintenance effectively doubling our reporting capacity. We hope that this will help speed up reports and make them more consistent. We are still working on some additional improvements that will be implemented over the coming weeks.
Apr 24, 19:40 - Apr 28, 09:21 EST
Customer Dashboards
This incident was due to a misconfiguration in our load balancer, affecting some white-label customers. It was resolved at approximately 10:15AM.
Apr 10, 10:04-10:22 EST

March 2014

Reporting Delays
Reporting caught up yesterday at 2:00pm EST. We left this incident open to ensure there were no other issues. We are confident now things are back to normal.
Mar 18, 10:40 - Mar 21, 15:55 EST
UI Stability
We believe we have identified and resolved the issues around UI Stability. We have been closely monitoring it this week and haven't seen the same issues we saw in the past. Please get in touch if you continue to see issues where you can't log in or receive an error message.
Mar 3, 11:14 - Mar 14, 09:48 EST

February 2014

Reporting Latency
The issue has been resolved.
Feb 25, 14:02-14:47 EST
Reporting Discrepancy
This issue has been resolved and the days in question have had their impressions reloaded from the original logs.
Jan 31, 14:05 - Feb 4, 10:34 EST

January 2014

Temporary disruption to email impression tracking
Fix was successfully deployed - impressions will be restored over the next 2-3 days.
Jan 25, 18:10-21:56 EST
AdFeedback Tracking
Deploy of the fix has been completed and events are now being properly recorded again.
Jan 23, 09:17-10:19 EST

December 2013

Reporting Delayed
Reports were delayed for a period today due to an issue with one of our services needed to process reports - the issue has been resolved and reports are now being processed. If you have been waiting for a report you should receive it shortly.
Dec 17, 15:04 EST
Click reporting delayed
This issue has been resolved.
Dec 9, 15:40-16:23 EST

November 2013

No incidents reported for this month.

October 2013

No incidents reported for this month.

September 2013

No incidents reported for this month.

August 2013

Ad Serving Outage
The load balancer that routes traffic to our ad servers has a health check that determines whether a given node in the cluster is operating correctly. It does this by collecting periodic "heartbeat" messages from different processes that are expected to be running on the node. If these heartbeats are not received within a certain period of time, the node is marked as unhealthy and removed from operation. We recently deprecated a process in our ad serving system that is no longer required. As we shut down this process across all nodes in the cluster, it stopped sending heartbeats to the health check process. Since the health check on each node expected heartbeats from this deprecated process, it began to report that the node was unhealthy. In turn, this caused the load balancer to remove all nodes from operation. We noticed the problem immediately and re-started the deprecated process, causing the health check to succeed again. The load balancer then re-added all nodes to the active cluster. We have since altered the configuration of the health check to not expect heartbeats from this deprecated process. Our ad serving system was down for approximately 60 seconds, followed by a period of approximately 60 seconds where the system was operational with increased latency, as nodes were re-added to the active cluster.
Aug 29, 16:09 EST
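The heartbeat-based health check described in the Aug 29 incident above can be sketched as follows. This is a simplified illustration under stated assumptions: the class name, process names, and timeout handling are hypothetical, and timestamps are passed in explicitly rather than read from a clock.

```python
class HeartbeatHealthCheck:
    """Reports a node healthy only if every expected process has sent
    a heartbeat within the timeout window."""

    def __init__(self, expected_processes, timeout_seconds):
        self.expected = set(expected_processes)
        self.timeout = timeout_seconds
        self.last_seen = {}  # process name -> time of last heartbeat

    def record_heartbeat(self, process, now):
        self.last_seen[process] = now

    def stop_expecting(self, process):
        # Mirrors the configuration change: a deprecated process is
        # removed from the expected set before it is shut down.
        self.expected.discard(process)

    def is_healthy(self, now):
        return all(
            process in self.last_seen
            and now - self.last_seen[process] <= self.timeout
            for process in self.expected
        )
```

In this sketch, shutting down a process that is still in the expected set makes `is_healthy` return False once the timeout passes, which is the failure mode described above. Calling `stop_expecting` for that process first avoids it.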
UI Reporting is Delayed
Process is running correctly and all numbers will be updated shortly.
Aug 26, 22:41 - Aug 27, 00:41 EST

July 2013

No incidents reported for this month.

June 2013

No incidents reported for this month.

May 2013

No incidents reported for this month.

April 2013

UI and API Intermittent Issues
We resolved the immediate issue and are working on a permanent fix.
Apr 11, 10:34-11:00 EST

March 2013

No incidents reported for this month.

February 2013

No incidents reported for this month.