December 17, 2019

Inside the mind of the Skydio 2


Every day we get comments about the Skydio 2 being magic, technology from the future, or a total fake. The truth is that this magic is the result of a small group of world-class engineers investing years of research into building the AI that powers this drone. Our dedicated R&D combined with full vertical integration of the hardware is what makes Skydio 2 far ahead of anything in industry or academia in trustworthy autonomous flight.

That is why the videos blow people away, and also why the Skydio 2 is not just the best flying tool for outdoor enthusiasts, but also for inspection, mapping, monitoring, and first responders. The technology for mobile robots has just reached the tipping point to transform major industries and save lives, all on affordable consumer hardware.

In this post we dive into the Skydio Autonomy Engine — how the Skydio 2 sees the world and decides how to fly and film — by walking through an example. In this clip, watch how Skydio 2 keeps a cinematic framing of the biker while dodging obstacles at high speed downhill. To do this, it has to estimate its own motion, build an obstacle map, estimate the biker’s path, and handle the aerodynamics of fast flight, all in real time onboard the drone.

Video Placeholder

Visual sensing

Everything the autonomy engine does comes from processing visual data. In addition to the user camera, the Skydio 2 has 45 megapixels of visual sensing with six 4K navigation cameras, each of which has a super-fisheye lens with a 200° field-of-view.

If you’ve ever piloted a drone, you know how stressful it can be to navigate with only a forward facing camera — every sideways or backward motion is a potential crash. The Skydio 2 was built from the ground up to enable 360 vision, so that the autonomy engine can confidently guide Skydio 2 in any direction. Three of the navigation cameras are pointed up and three are pointed down, so not only can Skydio 2 see in every direction at once, it’s got triple coverage to minimize the chance of missing anything.

Video Placeholder

Humans are amazing at naturally understanding images, but to a computer they just start as a bunch of pixel values. Breakthroughs in computer vision and deep learning only recently made it possible for robots to understand the world from those pixel values.

The first step is to understand the camera calibrations — how 2D pixels in the image map to directions in 3D. Note how distorted the fisheye images are: trees on one side are upside down relative to the other. To undo this, we build a detailed model of the lens warping and calibrate each camera both in the factory and during every flight. We even calibrate the lens distortion parameters online, because they change significantly with temperature.
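The pixel-to-ray mapping can be sketched with a simple equidistant fisheye model (r = f·θ). This is an illustrative textbook approximation, not our actual calibration model, which includes additional per-lens distortion terms; the focal length and principal point values in the test usage are made up:

```python
import numpy as np

def pixel_to_ray(u, v, fx, fy, cx, cy):
    """Unproject a fisheye pixel (u, v) to a unit 3D viewing direction.

    Uses the equidistant fisheye model (radius = focal * theta), a common
    approximation for super-wide lenses. A production calibration would
    add per-lens distortion terms on top of this.
    """
    # Normalized image-plane offsets from the principal point (cx, cy).
    x = (u - cx) / fx
    y = (v - cy) / fy
    r = np.hypot(x, y)       # radial distance in normalized units
    theta = r                # equidistant model: view angle grows linearly with r
    if r < 1e-9:
        return np.array([0.0, 0.0, 1.0])  # pixel at the principal point
    # Rotate the optical axis by theta toward the pixel's direction.
    s = np.sin(theta) / r
    ray = np.array([x * s, y * s, np.cos(theta)])
    return ray / np.linalg.norm(ray)
```

With this model a pixel 90° off-axis (θ = π/2) lands a quarter-circle from the optical axis, which is how a single lens can cover a 200° field of view.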

In addition, the cameras are so wide that they can see many parts of the drone itself. See if you can spot the battery, propellers, and the protective camera fins. These parts of the images need to be ignored, since they aren’t useful data about the scene. We call them invalid regions.

By applying our camera calibrations and masking out invalid regions, we can combine the cameras to create a 360 view of the scene.
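As a rough sketch of that combination step, here is a hypothetical helper that, for a single viewing direction, picks the best camera that both faces that direction and does not have it masked as an invalid region. The selection rule (closest optical axis) and all names are illustrative, not our actual stitching pipeline:

```python
import numpy as np

def select_camera(ray, cam_axes, invalid):
    """Pick which camera to sample for one 3D viewing direction.

    ray:      unit 3D direction we want to look in.
    cam_axes: (N, 3) array of unit optical axes for the navigation cameras.
    invalid:  length-N booleans, True when this direction falls inside an
              invalid region (propeller, battery, camera fin) for that camera.
    Returns the index of the best valid camera, or -1 if none can see it.
    """
    scores = cam_axes @ ray                 # cosine of angle to each optical axis
    scores[np.asarray(invalid)] = -np.inf   # never sample masked regions
    best = int(np.argmax(scores))
    return best if np.isfinite(scores[best]) else -1
```

Because the six cameras overlap, a direction masked out in one camera is usually still visible, unmasked, in another — which is the point of the triple up/down coverage.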

Video Placeholder

This isn’t exactly how our algorithms process the images, but it shows how we use geometry to combine information across multiple cameras. The user camera’s field of view is highlighted in yellow to show how narrow it is in comparison, and to give a sense of how much seeing everywhere at once improves Skydio 2’s ability to fly.

Building a map

In order for Skydio 2 to navigate in unknown environments, it needs to estimate its own motion and build a local map of the scene from its camera data. These are two of the most fundamental robotics problems, and an area where we’ve invested years of research to push the limits.

Camera data is incredibly rich, but it also presents very difficult challenges: thin objects, sun glare, high dynamic range, motion blur, reflections, textureless surfaces, vibrations, dirt, smudges, and fog. On top of that, our navigation cameras use rolling shutters, meaning each row of the image is captured at a slightly different time. These realities are what make the gap between a demo and production-ready autonomy so huge.
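The rolling shutter effect is easy to state precisely: every image row gets its own timestamp, spread across the frame readout. A minimal sketch of that timing (the 10 ms readout in the example below is an assumed illustration value, not our sensor's actual timing):

```python
def row_timestamp(frame_start, row, num_rows, readout_time):
    """Timestamp of one image row under a rolling shutter.

    Each row is exposed slightly later than the one above it, so any
    camera or scene motion during the readout skews the image. Motion
    estimation has to account for this per-row timing rather than
    treating the whole frame as a single instant.
    """
    return frame_start + readout_time * row / (num_rows - 1)
```

For example, with a 10 ms readout the bottom row of a 720-row image is captured a full 10 ms after the top row — a long time at the speeds Skydio 2 flies.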

At the core, we handle these challenges by carefully combining 3D geometric modeling with deep learning. By integrating our knowledge of the drone, its cameras, and physics into our neural networks, we’re able to get smaller networks that run at high frame rates onboard and generalize better to new scenarios. Our internal research team is constantly using data from tens of thousands of flights to learn to fly safely in more difficult environments, through tighter gaps, and at higher speeds than ever before.

Video Placeholder

We believe that in June 2018 we were the first to fly a deep-learning-based obstacle avoidance system in a complex environment. That system has now shipped with the Skydio 2, and it produces 3D maps that look more like they came from a LIDAR system (too costly and bulky to fly) than from visual sensors.
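A common way to represent such a 3D obstacle map is a voxel grid with log-odds occupancy updates. The toy class below illustrates that generic idea, not our onboard mapping system; the grid size, resolution, and update values are all made up:

```python
import numpy as np

class OccupancyGrid:
    """Minimal voxel occupancy map with log-odds updates.

    Each depth measurement that hits a voxel raises that voxel's
    log-odds of being occupied; repeated agreeing measurements push
    it past a confidence threshold. A stand-in for the real map.
    """

    def __init__(self, size=(64, 64, 32), resolution=0.25):
        self.log_odds = np.zeros(size)   # 0 = unknown (even odds)
        self.resolution = resolution     # meters per voxel (illustrative)

    def integrate_hit(self, point, hit_update=0.85):
        """Register a 3D point measurement as evidence of occupancy."""
        idx = tuple((np.asarray(point) / self.resolution).astype(int))
        self.log_odds[idx] += hit_update

    def occupied(self, threshold=1.5):
        """Boolean mask of voxels confidently believed occupied."""
        return self.log_odds > threshold
```

Accumulating evidence this way is what lets the map stay robust to the occasional bad depth estimate: a single spurious hit is not enough to mark a voxel as an obstacle.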

Video Placeholder

Tracking and filming

Once the Skydio 2 has a 3D map, it needs to track the biker, estimate his trajectory, and plan a cinematic shot while dodging obstacles. It’s critical that this happens with low latency so that the drone can react quickly at high speeds to new obstacles and motions of the subject.

To do the best job of staying locked on to one subject, we track multiple nearby objects so that we have the most information to discriminate between possible matches. We estimate each object’s position, trajectory, and appearance, again using a combination of geometric reasoning and deep networks, and then solve an optimization problem that decides where each object has moved in each new image frame.
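That kind of optimization can be pictured as a classic assignment problem: build a cost matrix of how poorly each existing track matches each new detection, then pick the matching with the lowest total cost. The brute-force sketch below is generic textbook data association, not our actual solver, and the cost values in the usage are invented:

```python
import itertools
import numpy as np

def associate(cost):
    """Match tracked objects (rows) to new detections (columns).

    cost[i, j] combines position and appearance disagreement between
    track i and detection j (lower means more likely the same object).
    Brute force over permutations is fine for the handful of nearby
    objects a tracker keeps; a real system would use a faster
    assignment solver.
    """
    cost = np.asarray(cost)
    n = cost.shape[0]
    best_perm, best_cost = None, np.inf
    for perm in itertools.permutations(range(cost.shape[1]), n):
        c = sum(cost[i, j] for i, j in enumerate(perm))
        if c < best_cost:
            best_perm, best_cost = perm, c
    return list(best_perm), best_cost
```

For two tracks and two detections where each track clearly matches one detection (say costs `[[0.1, 5.0], [5.0, 0.2]]`), the matching `[0, 1]` wins with total cost 0.3.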

Video Placeholder

When using the Skydio Beacon accessory, the drone additionally receives a GPS signal which is blended with visual tracking to provide a seamless tracking experience — when the subject goes through tight gaps that the Skydio 2 can’t safely navigate, it will catch back up and re-acquire a visual lock. We’ve worked hard to smoothly blend between visual and GPS tracking so the footage stays cinematic even during the transitions.
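One simple way to picture that blending is a confidence-weighted average of the two position estimates. This is only an illustrative sketch: the scalar confidences below stand in for the real system's uncertainty estimates, and the function name is hypothetical:

```python
import numpy as np

def fuse_position(visual_pos, visual_conf, gps_pos, gps_conf):
    """Blend a visual track with a Beacon GPS fix by confidence.

    A confidence-weighted average of the two estimates. As visual
    lock degrades (visual_conf -> 0), the result smoothly falls back
    to the GPS position, and smoothly returns to the visual estimate
    once lock is re-acquired -- no hard switch, so no jump in framing.
    """
    w_v = visual_conf / (visual_conf + gps_conf)
    return w_v * np.asarray(visual_pos) + (1 - w_v) * np.asarray(gps_pos)
```

The key property is continuity: because the weight varies smoothly rather than snapping between sources, the fused subject position (and therefore the footage) stays smooth through the transition.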

The most important signal for getting cinematic footage is not where the subject is, but where the subject is going. Our motion planner considers the future motion of the subject and decides on a flight plan to get the best shot given the nearby environment and the high level settings chosen by the user.
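As a toy illustration of planning against where the subject is going rather than where they are, here is a constant-velocity forecast of the subject's path. Our real motion model is considerably richer; this only shows the shape of the idea:

```python
def predict_path(pos, vel, horizon, dt):
    """Constant-velocity forecast of the subject's 2D path.

    Returns predicted (x, y) positions at each dt step out to the
    planning horizon. The planner can then place the drone to frame
    these future positions instead of chasing the current one.
    """
    steps = int(horizon / dt)
    return [(pos[0] + vel[0] * dt * k, pos[1] + vel[1] * dt * k)
            for k in range(1, steps + 1)]
```

Even this crude predictor captures why anticipation matters: a shot planned against positions one second ahead leads the subject instead of lagging behind them.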

Video Placeholder

At a rate of 500 times per second, the autonomy engine refines its flight plan by considering over forty objectives that balance smooth flight, effective tracking, image framing, obstacles, aerodynamics, power constraints, and many other factors. Since R1, we’ve completely rewritten our motion planning and controls system to make this new algorithm possible in real time, which leads to much more intelligent and holistic decision making.
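The weighted-objective idea can be sketched in a few lines: every candidate flight plan gets a single scalar cost that sums all the competing objectives, so one number trades off smoothness against framing against clearance. The objective names and weights below are hypothetical illustrations, not our actual objective set:

```python
def plan_cost(plan, objectives):
    """Total cost of a candidate flight plan.

    objectives: list of (weight, fn) pairs, where each fn scores one
    aspect of the plan (e.g. smoothness, framing error, obstacle
    proximity). Minimizing this single weighted sum is what forces
    the planner to balance all the goals at once.
    """
    return sum(weight * fn(plan) for weight, fn in objectives)
```

For instance, with a made-up plan represented as a dict and two toy objectives:

```python
objectives = [
    (2.0, lambda p: abs(p["jerk"])),        # penalize jerky motion
    (1.0, lambda p: p["framing_error"]),    # penalize bad subject framing
]
cost = plan_cost({"jerk": 0.5, "framing_error": 0.3}, objectives)  # 2*0.5 + 1*0.3
```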

Into the wild

When technology makes a large leap forward, it can look like magic, or draw skepticism born of a history of false promises. Skydio 2 sets a new bar for intelligent autonomy and unlocks a new category of applications that were previously inaccessible. However, we’re still just beginning the journey to smarter, smaller, faster, and safer flight.

We’re so excited to be shipping and we’re doing everything we can to get more Skydio 2 out into the world.

Skydio Autonomy is hiring engineers and researchers who want to accelerate the rate at which robotics breakthroughs have an impact in the real world. If you get excited about robust implementations of cutting edge algorithms, and you’re interested in having significant ownership of key initiatives as part of our small-but-incredibly-powerful team, we’d love to hear from you.