Post-Incident Review for Degraded Connectivity on May 6-7, 2026
A full account of the May 6–7 network disruption, root cause, and the steps we've taken to prevent recurrence.
Summary
On May 6-7, 2026, the network connectivity supporting Skydio flight operations was degraded, and the resulting disruption prevented some customers from flying missions successfully. Customers flying X10s from Docks and controlling those flights through Remote Flight Deck first reported poor video performance, with heavily pixelated video and high latency, on the afternoon of May 6. Affected flights saw throughput of just 0.5 Mbps, well below the typical 3-5 Mbps.
By working with affected customers, we identified the root cause and performed system maintenance that restored normal service. The fix rolled out to all customers between 2 pm and 5 pm on May 7, 2026. We are implementing corrective measures to prevent the issue from recurring and are undertaking a detailed review to improve system robustness over time.
What Happened?
- The immediate symptom was severely degraded video quality from drones, and the proximate cause was reduced throughput. Customers reported degraded video quality and control latency, and our monitoring systems showed a clear drop of around 33% in average network utilization, from about 3 Mbps to about 2 Mbps. Some customers saw a larger drop and did not have sufficient network capacity to support flight.
- Critical flight safety functions, such as airspace deconfliction and Pathfinder, were not affected. Skydio Autonomy’s obstacle avoidance software runs on the drone itself and functioned normally throughout affected flights. Even when there is control lag, obstacle avoidance will prevent collisions. Likewise, the Pathfinder functions for autonomous flight were designed to work with lost communications and functioned as designed. No safety incidents were observed or reported during this time as a result of the disruption to connectivity.
- The root cause was a timer in the Skydio Cloud software, designed to track all flight-related events, that could overflow after a certain amount of system uptime. Until this incident, that condition had never occurred. When the overflow occurred, internal timers reset to zero and we could no longer estimate network latency accurately. As a result, our software estimated that far less bandwidth was available than there actually was, and it throttled video from the drone.
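The general failure mode described above, a counter that wraps to zero and corrupts elapsed-time calculations, can be illustrated with a minimal sketch. This is not Skydio's code: the 32-bit millisecond counter, the `WRAP` constant, and both function names are hypothetical, chosen only to show how a plain subtraction goes wrong across a wrap and how modular arithmetic avoids it.

```python
# Hypothetical: a 32-bit millisecond counter. The real timer's width and
# unit are not stated in this review.
WRAP = 2**32


def naive_elapsed_ms(sent_ms: int, received_ms: int) -> int:
    """Plain subtraction: wrong if the counter wrapped between the readings."""
    return received_ms - sent_ms


def wrap_safe_elapsed_ms(sent_ms: int, received_ms: int) -> int:
    """Modular subtraction: correct while the true gap is shorter than WRAP."""
    return (received_ms - sent_ms) % WRAP


sent = WRAP - 20                 # reading taken 20 ms before the wrap
received = (sent + 50) % WRAP    # reading taken 50 ms later, after the wrap

naive = naive_elapsed_ms(sent, received)
safe = wrap_safe_elapsed_ms(sent, received)

print(f"naive elapsed: {naive} ms")  # large negative value
print(f"safe elapsed:  {safe} ms")   # 50
```

In this sketch the naive calculation yields a large negative interval while the modular version recovers the true 50 ms. Any latency or bandwidth estimate built on the naive value would be badly wrong, which is consistent with the throttling behavior described above.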
How Did We Respond?
- Customer reports. Between 3:30 and 4 pm Pacific time on May 6, we received reports from two customers on the East Coast of degraded video from drone flights. On the morning of May 7, we received several additional reports from customers throughout the US and internationally, and our internal processes escalated the issue because multiple customers were unable to fly missions as normal.
- Creation of the “War Room” and incident publication. At 11 am, a core team was established to work on the problem. When the problem could not be immediately resolved, we published an update to the Skydio Cloud status page at 12 pm to let all of our customers know that we were aware of and working on the issue.
- Root Cause Analysis. To understand the issue, the war room analyzed flight logs from affected customers, confirmed that no recent software changes would have affected network performance, conducted internal flight tests, and tested multiple candidate “hot fixes”. This work identified the timestamp comparisons as the source of the issue.
- Infrastructure Test and Rollout. We brought up new production cloud infrastructure for test flights, and a few affected customers joined us for live flight testing in that production environment. When these customer tests succeeded, we rolled out the infrastructure changes across the entire Skydio Cloud from 2 pm to 5 pm.
- Intensive Monitoring of Fixes. At 5:30 pm, we updated the incident to describe the fixes and switched into monitoring mode to validate that our customers were able to resume normal mission flying.
- Incident Closed. By 9 pm, we had sufficient monitoring data to indicate that our fixes had restored mission capability to all affected customers, and we switched our focus to implementing fixes to prevent the problem from recurring and searching our code base for similar problems that can be proactively addressed.
How Are We Preventing Issues Like This in the Future?
Starting immediately, we will be making changes to prevent recurrence of this problem. Our postmortem process involves both making improvements that will prevent this class of issues in the future and improving detection and time to resolution if an incident does happen.
- Targeted code change. Our team has already made and deployed code changes with automated regression tests to address the specific issue found. We will be modifying code that uses timestamps to mitigate any effects caused by overflows.
- Infrastructure operational changes. Now that we understand the root cause, we have modified our cloud operational procedures to ensure that our infrastructure will not be vulnerable to this class of time-related issue in the future.
- Code audit. Our team is auditing the entire code base for similar classes of issues, using a combination of human-driven and AI-powered code audits, and will be deploying fixes as needed.
- Observability. While we have extensive monitoring and alerting to understand the customer experience, this event exposed a gap in our systems. We were too slow to identify this issue in our own telemetry and relied too heavily on customer reports to corroborate it. We are investing in additional real-time monitors that will alert us sooner when the customer experience degrades, so that we can resolve issues faster, or even prevent them from impacting customers at all.
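As an illustration of the kind of real-time monitor described in the observability item, the sketch below compares each new fleet-average throughput sample against a rolling baseline and flags a sharp drop. Everything here is an assumption made for the example: the class name `ThroughputDropMonitor`, the window size, and the 25% threshold are hypothetical and do not describe Skydio's actual monitoring.

```python
from collections import deque
from statistics import mean


class ThroughputDropMonitor:
    """Flags samples that fall well below a rolling baseline (hypothetical)."""

    def __init__(self, baseline_window: int = 12, drop_fraction: float = 0.25):
        # Recent "healthy" samples used as the baseline.
        self.baseline = deque(maxlen=baseline_window)
        # Fractional drop below the baseline that triggers an alert.
        self.drop_fraction = drop_fraction

    def observe(self, fleet_avg_mbps: float) -> bool:
        """Return True if the sample is below baseline by more than drop_fraction."""
        alert = False
        if len(self.baseline) == self.baseline.maxlen:
            threshold = mean(self.baseline) * (1 - self.drop_fraction)
            alert = fleet_avg_mbps < threshold
        if not alert:
            # Only healthy samples update the baseline, so a sustained
            # degradation does not quietly become the new normal.
            self.baseline.append(fleet_avg_mbps)
        return alert


monitor = ThroughputDropMonitor(baseline_window=3, drop_fraction=0.25)
samples_mbps = [3.0, 3.1, 2.9, 2.0]  # last sample mirrors the ~33% drop
alerts = [monitor.observe(s) for s in samples_mbps]
print(alerts)  # [False, False, False, True]
```

With a baseline near 3 Mbps, a sample of 2 Mbps falls below the 25% threshold and raises an alert. A production monitor would also need to handle low-traffic periods and per-customer variation, which this sketch ignores.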
Final Remarks
We are committed to both transparency and continuous improvement. Future software releases will fully address the issues documented in this post. These changes occur on Skydio Cloud, and therefore, all customers on Skydio Cloud will automatically receive up-to-date software without needing to take any action.
In the past year, we’ve made significant investments and improvements to our infrastructure and instrumentation to help us identify and resolve issues like this. It is because of these improvements that we were able to resolve this incident on the timeline we did. However, it was still too long for the critical customers and industries we serve. We will use the lessons from this incident to keep improving both our technology and our internal systems to ensure we are delivering a highly reliable system you can count on when you need it.
We especially appreciate those customers who worked with us after reporting issues, even going so far as to be flight testers of our proposed fixes. As we were pinpointing the problem, these diagnostic flights were invaluable in our troubleshooting process by gathering real-world data to test our theories and fixes.
Finally, for our customers: We recognize that you choose to partner with us at Skydio because of the unique capabilities we provide. We are proud to support your missions and understand the criticality of your work, and we will continue to be transparent when we fumble the ball and relentlessly improve our products to help you do more for your customers and communities. As part of that, we continuously look for ways to improve the technical and engineering support that directly contributes to the success of your mission. We welcome and encourage feedback. If you have questions or comments about this incident and root cause analysis, please reach out to us through your customer success or account teams.