Artificial insect intelligence and vision for nano drones – how many pixels do you actually need?
My long-time interest has been in developing insect-type vision for small drones. Properly implemented, I believe that such vision hardware and algorithms will allow small drones to fly safely amidst clutter and obstacles. Of course, I have been following with great interest other approaches to providing autonomy to small drones. Like almost everyone else, I am in awe of what Skydio has achieved- nice work!
My own work, though, has been to implement similar types of autonomy on much smaller “nano” scale platforms. These are tiny sparrow-sized drones that can fit in your hand and weigh just tens of grams. This is well under the 250-gram threshold the FAA uses to determine whether a drone needs to be registered, and certainly much smaller than most commercially available drones. Nano drones have some fantastic advantages- they are small, stealthy, easy to carry, and (in my opinion) inherently safe. They also fit into much tighter spaces than larger drones and can thus get closer to the objects being inspected.
That small size, however, does not lend itself well to carrying lots of on-board processing, say from a GPU single-board computer. One of my drones in its entirety (vision and all) weighs less than available GPU boards (especially once you add the required heat sink!). Until single-digit-gram GPU modules (inclusive of everything but the battery) are available, I am stuck with much more modest processing levels, say from advanced microcontrollers capable of hundreds of MIPS (million instructions per second) rather than the teraflops you can get from GPUs. Given that most contemporary approaches to vision-based autonomy use VGA- or similar-resolution cameras to acquire imagery and GPUs to process it, you might think implementing vision on a nano drone is not feasible.
Well, it turns out implementing vision on nano drones is quite doable, even without a GPU. In past work, I’ve found you can do quite a lot with just a few hundred to a few thousand pixels. I’ll get into examples below, but first let’s consider what types of solutions flying insects have evolved. If you begin a study of insect vision systems, you will notice two things right away. First, insect vision systems tend to be omnidirectional. Their only blind spots tend to be directly behind them, blocked by the insect’s thorax. Second, insect vision systems have what we would think of as very low resolution. An agile dragonfly may have about 30,000 photoreceptors (nature’s equivalent of pixels), almost three orders of magnitude fewer than the pixels in your smartphone camera. And the lowly fruit fly? Only about 800 photoreceptors!
How do insects see the world with such low resolution? Much like contemporary drones, they make use of optical flow. Unlike contemporary drones, which generally use a single camera mounted on the bottom of the drone (to measure lateral drift or motion), insect vision systems measure optical flow in all directions. If the optical flow sensing of a contemporary drone is analogous to one optical mouse looking down, an insect vision system is analogous to hundreds or thousands of optical mice aimed in different directions to cover every part of the visual field.
The vision systems of flying insects also include neurons that are tuned to respond to global optical flow patterns that, it is believed, contribute to stability and obstacle avoidance. Imagine yourself as an insect gaining altitude- the optical flow all around you will be downward, as every object appears to descend relative to you. You can imagine similar global optical flow patterns as you move forward, yet others as you turn in place, and yet other expanding patterns if you are on a collision course with a wall.
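To make this idea concrete, here is a minimal sketch of how a few such wide-field “neurons” could be modeled in software: each one correlates the flow measured around a horizontal ring of viewing directions against a template pattern for one kind of ego-motion. This is my own simplified illustration (the ring of 16 bearings, the unit-distance assumption, and the names are all mine), not the circuitry insects or any particular chip actually uses.

```python
import numpy as np

# Viewing directions of a ring of N optical flow sensors in the horizontal
# plane; bearing 0 is straight ahead, angles increase counterclockwise.
N = 16
bearings = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)

# Idealized "template" flow fields (tangential flow at each bearing) for unit
# ego-motions, assuming the surroundings all sit at the same unit distance.
# Signs follow one fixed convention; only the overall pattern matters here.
templates = {
    "forward":  np.sin(bearings),    # zero fore/aft, strongest to the sides
    "sideways": -np.cos(bearings),   # strongest fore/aft, zero to the sides
    "yaw":      -np.ones(N),         # turn in place: uniform flow everywhere
}

def ego_motion_scores(measured_flow):
    """Project the measured flow field onto each template (a dot product,
    much like a wide-field neuron summing weighted inputs from local motion
    detectors) and return the response strengths."""
    return {name: float(measured_flow @ t) / N for name, t in templates.items()}

# Example: a noisy pure-rotation flow field excites the 'yaw' template most.
rng = np.random.default_rng(0)
measured = templates["yaw"] + 0.1 * rng.standard_normal(N)
print(ego_motion_scores(measured))
```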
Another trick insects perform is that, when they fly, they follow purposeful flight trajectories that cause optical flow patterns to appear in a predictable manner when obstacles are present. Next time you are outside, pay attention to a wasp or bee flying near flowers or its nest- you will see it tends to zig-zag left and right, as if it were clumsy or drunk. This is not clumsiness- when the insect moves sideways, any object in front of it produces a clear, distinct optical flow pattern that reveals not only that something is there but also its shape. Insects do this without relying on stereo vision. Essentially, you can say flying insects use time to make up for their lack of spatial resolution.
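The geometry behind that sideways “peering” is simple: if you translate sideways at a known speed, the distance to whatever sits roughly broadside to your motion falls straight out of the measured flow rate. Here is a tiny sketch of that relationship; the function name and the numbers are mine, purely for illustration.

```python
def distance_from_parallax(lateral_speed_mps, flow_rate_rad_per_s):
    """For an object roughly perpendicular to the direction of travel,
    angular flow ~= lateral_speed / distance, so distance ~= speed / flow.
    No stereo baseline is needed: the sideways motion supplies the baseline
    over time."""
    if abs(flow_rate_rad_per_s) < 1e-6:
        return float("inf")  # no measurable flow: object is effectively at infinity
    return lateral_speed_mps / abs(flow_rate_rad_per_s)

# Peering sideways at 0.5 m/s while a feature directly ahead drifts across
# the visual field at 0.25 rad/s implies it is about 2 meters away.
print(distance_from_parallax(0.5, 0.25))  # -> 2.0
```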
Such purposeful flight trajectories can be combined with optical flow perception to implement “stratagems”, or holistic strategies, that allow the insect to achieve a behavior. For example, an insect can fly down the center of a tunnel by keeping the left and right optical flow balanced. An insect can avoid an obstacle by steering away from regions of high optical flow. Biologists have identified a number of such flight control “stratagems”: essentially holistic combinations of flight paths, the resulting optical flow patterns, and responses thereto that let the insect perform some sort of safe flight maneuver.
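As an illustration of the centering stratagem, the control loop can be surprisingly small. The sketch below is my own simplified version (the function, its inputs, and the gain are assumptions for illustration, not Centeye’s actual flight code): it simply steers away from whichever side reports stronger flow.

```python
def centering_command(flow_left, flow_right, gain=1.0):
    """Steer toward the side with weaker optical flow.
    At a given flight speed, flow magnitude grows as a wall gets closer, so
    equalizing left and right flow keeps the vehicle near the middle of a
    corridor. Returns a steering command: positive steers right, negative
    steers left."""
    total = flow_left + flow_right
    if total < 1e-6:
        return 0.0  # no texture or no motion: hold the current course
    # Normalized imbalance in [-1, 1]; steer away from the stronger-flow side.
    return gain * (flow_left - flow_right) / total

# Example: strong flow on the left (a nearby wall) commands a rightward correction.
print(centering_command(2.0, 0.5))  # -> 0.6
```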
In my work over the past two decades, I have been implementing such artificial insect vision stratagems and flying them on actual drones. At Centeye we took an integrated approach to this (the subject of a future article): we designed camera chips, or “vision chips”, with insect-inspired pixel layouts and analog processing circuitry, matching lenses, and small circuit boards with processors to operate these vision chips. We then wrote matching optical flow and control algorithms and tested them in flight. Nobody at Centeye is just a hardware engineer or just a software engineer- the same group of minds designed all of the above, allowing for a holistic and well-integrated implementation. The result was robust laboratory-grade demonstrations of different vision-based control behaviors at known pixel resolutions and processor throughputs. See the list below for specific examples. The list focuses on early implementations, most from a decade or more ago, to emphasize what can be performed with limited resources. It also includes one more recent example using stereo vision, to show that even stereo depth perception can be performed at low resolution.
Altitude hold (2001), 16-88 pixels, 1-4 MIPS: Have a fixed-wing drone hold its altitude above ground by measuring the optical flow in the downward direction (a sketch of this stratagem follows the list). Video links: https://youtu.be/IYlCDDtSkG8 and https://youtu.be/X6n7VeU-m_o
Avoid large obstacles (2003), 264 pixels, 30 MIPS total: Have a fixed-wing drone avoid obstacles by turning away from directions with high optical flow. Video link: https://youtu.be/qxrM8KQlv-0
Avoid a large cable (2010), 128 pixels, 60 MIPS: Have a rotary-wing drone avoid a horizontal cable by traveling forward in an up-down serpentine path.
Yaw control (2009), 8 pixels, 20 MIPS (overkill): Visually stabilize yaw angle of a coaxial helicopter without a gyro. Video link: https://youtu.be/AoKQmF13Cb8
Hover in place (2009), 250-1024 pixels, 20-32 MIPS: Have a rotary-wing drone hover in place and move in response to changing set points using omnidirectional optical flow. Video links: https://youtu.be/pwjUcFQ9b3A and https://youtu.be/tvtFc49mzgY
Hover and obstacle avoidance (2015), 6400 pixels, 180 MIPS: Have a nano quadrotor hover in place, move in response to changing set points, and avoid obstacles, using omnidirectional stereo vision. Video link: https://youtu.be/xXEyCZkunoc
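To give one of these entries a concrete shape, here is a toy version of the altitude-hold stratagem from the first item in the list. It is my own simplified sketch (the function, its parameters, and the gain are illustrative assumptions, not the 2001 firmware): over flat ground, downward optical flow is roughly forward speed divided by height, so regulating that flow toward a set point regulates altitude.

```python
def altitude_hold_command(measured_flow, forward_speed, target_height, gain=0.5):
    """Over flat ground, downward optical flow ~= forward_speed / height,
    so the flow set point corresponding to the target height is
    forward_speed / target_height. Flow above the set point means the drone
    is too low, so the command is positive (climb); flow below it means the
    drone is too high, so the command is negative (descend)."""
    flow_setpoint = forward_speed / target_height
    return gain * (measured_flow - flow_setpoint)

# Flying at 10 m/s with a 10 m target height gives a flow set point of 1 rad/s.
# Measuring 1.25 rad/s implies the drone has sunk to about 8 m, so climb.
print(altitude_hold_command(1.25, 10.0, 10.0))  # -> 0.125 (climb)
```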
What we see is that a wide variety of control stratagems can be implemented with just a few hundred pixels and with processor throughputs orders of magnitude below that of a contemporary GPU. Even omnidirectional stereo vision was implemented with just a few thousand pixels. Most notable was yaw control, which was performed with just eight pixels! The above are not academic simulations- they are solid existence proofs that these behaviors can be implemented at the stated resolutions. The listed demonstrations did not achieve the reliability of flying insects, but then again Nature, by evolving populations of insects in the quadrillions over 100 million years, can do more than a tiny group of engineers over a few years.
There are several implications of all this. First, the race for ever more pixels that the image sensor industry is pursuing is not a panacea. Sure- more pixels yield higher spatial resolution, bringing out details missed by coarser images, but at the cost of producing much more raw data than is needed, and certainly much more than is easily processed if you don’t have a GPU at your disposal!
Second, this raises the question- are GPUs really the cure-all for every situation in which processing throughput is a bottleneck? Don’t get me wrong- I love the idea of having a few teraops to play with. But is this always necessary? For many applications, even those involving vision-based control of a drone, perhaps we are better off grabbing fewer pixels and using a simpler but better-tuned algorithm.
Personally, I am a big fan of the so-called 80/20 principle, which here implies that 80% of the usable information comes from just 20% of the pixels. The 80/20 principle is recursive- the top 4% may provide 64% of the value, and taken to an extreme, the top fraction of a percent of the pixels or other computational elements of a vision system will still provide information within an order of magnitude of what the full set provides. It might seem like information is being thrown away, until you realize that it is much easier to process a thousand pixels than a million. I wonder what other implications this has for machine vision and for artificial intelligence in general…
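For concreteness, here is the arithmetic of that recursion (the 80/20 split is the usual rule of thumb, not a measured value):

```python
# Apply the 80/20 rule recursively: at each level, keep the top 20% of the
# remaining pixels and assume they carry 80% of the remaining information.
fraction_of_pixels, fraction_of_value = 1.0, 1.0
for level in range(1, 5):
    fraction_of_pixels *= 0.20
    fraction_of_value *= 0.80
    print(f"level {level}: top {fraction_of_pixels:.2%} of pixels "
          f"-> ~{fraction_of_value:.1%} of the information")
# level 1: top 20.00% of pixels -> ~80.0% of the information
# level 2: top 4.00% of pixels -> ~64.0% of the information
# level 3: top 0.80% of pixels -> ~51.2% of the information
# level 4: top 0.16% of pixels -> ~41.0% of the information
```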
Third, this is very good news for nano drones, or even future insect-sized “pico” drones- if we need just a few thousand pixels and a few hundred MIPS of processing throughput, current semiconductor processes will let us build a vision system that supports this within a small fraction of a gram. Of course, we need the RIGHT thousand pixels and the RIGHT algorithms!
Thank you for indulging me in this. Please let me know what you think.