Where will Computer Vision be in 2020?

What is computer vision? It is the ability of a computer to take input from a camera and process the incoming images to discern something about the scene. We see it today in home security cameras that watch user-defined zones, facial recognition that unlocks devices, cameras that find a face and focus on it, cars that use cameras to detect obstacles on the road, and social media that recognizes our friends in photos. Where will that leave us in 2020?

By 2020 our sensors for capturing the visual world will improve, giving us a richer visual representation of our surroundings. Light field video capture will be a budding area, with computer vision algorithms applied to make better sense of what the sensors see. Light field capture differs from traditional image capture in that it records light arriving from multiple directions at every capture point, rather than a single intensity per pixel. As sensors capture more visual data about the world, we will continue to push the envelope of what single systems, especially mobile systems, can process on their own. And even as sensors capture our visual world with greater fidelity, computer vision remains limited to a visual representation of the world; there are many other ways of sensing it. To overcome the limits of individual sensors and processors, we will see greater use of combined sensors as well as systems that offload processing to external systems.
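One way to picture light field capture is as a 4D function: for each viewpoint (u, v) in a grid of sub-apertures, a small 2D image over (s, t). A classic thing you can do with such data, impossible with a single flat image, is refocus after capture by shifting and summing the sub-aperture views. The toy sketch below is purely illustrative: the grid sizes, the synthetic pixel values, and the clamped shift-and-sum step are my assumptions, not a real capture pipeline.

```python
# A toy light field: a 3x3 grid of viewpoints (u, v), each holding
# a tiny 4x4 image (s, t). Pixel values are synthetic placeholders.
U, V, S, T = 3, 3, 4, 4

def make_light_field():
    """Build a synthetic 4D light field L[u][v][s][t]."""
    return [[[[float(u + v + s + t) for t in range(T)]
              for s in range(S)]
             for v in range(V)]
            for u in range(U)]

def refocus(lf, shift):
    """Shift-and-sum refocusing: average all sub-aperture views,
    shifting each in proportion to its offset from the center
    viewpoint. `shift` selects the focal plane (0 = no shift)."""
    out = [[0.0] * T for _ in range(S)]
    count = U * V
    cu, cv = U // 2, V // 2  # center viewpoint
    for u in range(U):
        for v in range(V):
            du, dv = (u - cu) * shift, (v - cv) * shift
            for s in range(S):
                for t in range(T):
                    # Clamp shifted coordinates to stay in bounds.
                    ss = min(max(s + du, 0), S - 1)
                    tt = min(max(t + dv, 0), T - 1)
                    out[s][t] += lf[u][v][ss][tt] / count
    return out

lf = make_light_field()
focused = refocus(lf, shift=1)
```

With `shift=0` this collapses to a plain average of the views; varying `shift` sweeps the synthetic focal plane, which is the kind of post-capture processing that makes light field data interesting to computer vision algorithms.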

Combining multiple sensors on a single system is traditionally known as sensor fusion. An example of sensor fusion with computer vision today is combining camera, accelerometer, gyroscope, and compass data to estimate depth from an image. By 2020 I anticipate sensor fusion that extends beyond single systems. For example, multiple devices will share their sensor data with a cloud-based system that applies machine learning to make better sense of the surroundings, computing a single description or simulation of the environment that is then shared back to many systems. Today we collect traffic and road condition data from many mobile devices and share traffic conditions among them; by 2020 we should be able to share visual, depth, acoustic, and other sensor data that is processed in the cloud. Making sense of this flood of sensor data from many systems will require machine learning. By 2020, as the number of connected and sensor-equipped systems grows, computer vision will be just one of the fundamental tools cloud-based systems use to build a shared understanding of the world.
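The depth-from-motion fusion mentioned above can be sketched roughly: inertial sensors estimate how far the camera translated between two frames (the baseline), and the pixel disparity of a tracked feature between those frames then gives depth by triangulation, depth = f · B / d. The function names and every number below are illustrative assumptions; a real system would fuse gyroscope and compass data and filter the drift-prone integration (e.g. with a Kalman filter).

```python
def baseline_from_accel(accel_samples, dt):
    """Estimate camera translation (meters) by double-integrating
    acceleration samples (m/s^2) taken every `dt` seconds.
    Naive integration like this drifts quickly; real sensor fusion
    corrects it with gyroscope/compass data and filtering."""
    velocity, position = 0.0, 0.0
    for a in accel_samples:
        velocity += a * dt
        position += velocity * dt
    return position

def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Triangulate depth (pinhole model): depth = f * B / d."""
    return focal_px * baseline_m / disparity_px

# Illustrative values: 0.1 s of accelerometer samples at a constant
# 1 m/s^2, a 1000-pixel focal length, and an 8-pixel disparity.
baseline = baseline_from_accel([1.0] * 10, dt=0.01)
depth = depth_from_disparity(focal_px=1000.0, baseline_m=baseline,
                             disparity_px=8.0)
```

The point of the sketch is the division of labor: the inertial sensors supply the scale (baseline) that a single camera cannot observe, and the camera supplies the disparity — neither sensor alone recovers metric depth.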