Long response time, apps load slowly
Incident Report for SweetHawk
Resolved
We apologize to all customers affected by the intermittent outages and slow app response times earlier today. The long delays and timeouts when loading and using apps were caused by a combination of high load and inefficiencies in some operations, which caused requests to queue at both the database level and the web server level. The high load on the database server exposed inefficient database queries, which made the problem worse. We are fixing all of these issues. Some of the inefficiencies have already been addressed to prevent the same issue from recurring. These were mostly related to the Notify app, which is why that app was under maintenance longer than the others. We have planned further changes, in particular to the inner workings of the Tasks app, most notably the child ticket status updates, which normally take up a large percentage of the app server's workload. These changes will require Zendesk app review because some of the app's built-in triggers/targets need to be updated. Both before and after the new version is released, we'll be actively monitoring these processes to make sure they don't cause further problems.
Posted Sep 13, 2019 - 06:14 UTC
Update
All apps have been functioning well for the last few hours, including the Notify app, which had continued to experience some problems. We're continuing work to confirm the root cause while implementing various optimizations.
Posted Sep 13, 2019 - 02:15 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Sep 12, 2019 - 21:29 UTC
Identified
The issue has been identified and a fix is being implemented.
Posted Sep 12, 2019 - 21:09 UTC
Monitoring
A further cause has been identified, and response times are back to normal levels.
Posted Sep 12, 2019 - 20:57 UTC
Identified
We're still investigating further delays.
Posted Sep 12, 2019 - 20:32 UTC
Monitoring
We've cleared the long-running database requests. Server response times should return to normal levels shortly.
Posted Sep 12, 2019 - 20:25 UTC
Identified
We've identified the inefficient queries and are optimizing them now. Some long-running app server requests are being stopped.
Posted Sep 12, 2019 - 19:43 UTC
Update
We are continuing to investigate this issue.
Posted Sep 12, 2019 - 19:40 UTC
Investigating
We're investigating the cause of the database server delays.
Posted Sep 12, 2019 - 19:17 UTC