Computer vision is a field of artificial intelligence (AI) that enables computers and systems to derive meaningful information from digital images, videos and other visual inputs — and take actions or make recommendations based on that information.
If artificial intelligence enables computers to think, computer vision enables them to see, observe and understand.
Computer vision works much the same as human vision, except humans have a head start. Human sight has the advantage of a lifetime of context to learn to tell objects apart, judge how far away they are, detect whether they are moving and much more. Computer vision trains machines to perform these functions, but it has to do so in a much shorter time — using cameras, data and algorithms rather than retinas, optic nerves and a visual cortex.
To catch up, computer vision needs data — lots of data. It runs analyses of the data over and over until it discerns distinctions and ultimately recognizes images. For example, to train a computer to recognize apples, it must be fed vast quantities of apple images and apple-related items so that it learns the differences and can reliably identify an apple.
Two essential technologies are used to accomplish this: a type of machine learning called deep learning and a type of neural network called a convolutional neural network (CNN).
Machine learning uses algorithmic models that enable a computer to teach itself about the context of visual data. If enough data is fed through the model, the computer will “look” at the data and teach itself to tell one image from another. Algorithms enable the machine to learn by itself, rather than a programmer programming it to recognize an image.
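The idea of a model teaching itself from data can be sketched with a toy example. Below, a minimal logistic-regression classifier (a stand-in for the far larger models used in practice; the "bright versus dark" task and every number are illustrative assumptions) learns to separate two classes purely from labeled examples — no rule for what makes an image "bright" is ever programmed in:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "images": 4-pixel vectors. Class 1 is bright on average, class 0 is dark.
X = np.vstack([rng.uniform(0.6, 1.0, (50, 4)),   # bright examples
               rng.uniform(0.0, 0.4, (50, 4))])  # dark examples
y = np.array([1] * 50 + [0] * 50)

# Train by gradient descent: the model adjusts its weights from the data
# alone, which is the "teaching itself" step described above.
w, b = np.zeros(4), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))   # predicted probability of class 1
    grad_w = X.T @ (p - y) / len(y)      # gradient of the log loss
    grad_b = np.mean(p - y)
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b

accuracy = np.mean((p > 0.5) == y)
print(accuracy)
```

Because the two classes here are cleanly separated, the learned weights classify every example correctly; real image data is far messier, which is why vast quantities of it are needed.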
The CNN helps the machine learning or deep learning model “look” by breaking images down into pixels that are given tags or labels. It uses the labels to perform convolutions (a mathematical operation on two functions that produces a third function) and makes predictions about what it is “seeing.” It runs convolutions and checks the accuracy of its predictions over a series of iterations until the predictions become accurate. At that point, it is recognizing or seeing images in a way similar to humans.
Much like a human making out an image at a distance, the CNN first discerns hard edges and simple shapes, then fills in information as it runs iterations of its predictions.
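The convolution step can be sketched directly. In this minimal NumPy example, a hand-built vertical-edge kernel (chosen for illustration; a real CNN learns its kernel values during training) slides over a tiny synthetic image and responds strongly exactly where the brightness changes — the "hard edges" the network picks up first:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image, taking a weighted sum at each position."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for yy in range(out.shape[0]):
        for xx in range(out.shape[1]):
            out[yy, xx] = np.sum(image[yy:yy + kh, xx:xx + kw] * kernel)
    return out

# A tiny 5x5 "image": bright left half, dark right half.
image = np.array([[1, 1, 1, 0, 0]] * 5, dtype=float)

# A vertical-edge kernel: positive on the left, negative on the right,
# so it responds where brightness changes from left to right.
kernel = np.array([[1, 0, -1]] * 3, dtype=float)

response = convolve2d(image, kernel)
print(response)  # zero on flat regions, strong response at the edge
```

Stacking many such learned kernels, layer after layer, is what lets a CNN progress from edges and simple shapes to whole objects.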
A CNN is used to understand single images. A recurrent neural network (RNN) is used in a similar way for video applications, helping computers understand how the pictures in a series of frames relate to one another.
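The frame-to-frame memory of an RNN can be sketched as a single recurrent layer. Here the per-frame feature vectors and weight matrices are random placeholders (in a real system a CNN would supply the features and training would set the weights); the point is that the hidden state carries information from one frame to the next, so each step depends on everything seen so far:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-frame feature vectors: 10 frames, 8 features each.
frames = rng.normal(size=(10, 8))

W_x = rng.normal(scale=0.1, size=(8, 16))   # input-to-hidden weights
W_h = rng.normal(scale=0.1, size=(16, 16))  # hidden-to-hidden weights
h = np.zeros(16)                            # hidden state: the "memory"

for x in frames:
    # Each update mixes the current frame with the state built from
    # all previous frames — this is what relates frames to one another.
    h = np.tanh(x @ W_x + h @ W_h)

print(h.shape)  # the final state summarizes the whole sequence
```

A CNN applied frame by frame would treat each picture independently; the recurrent state is precisely what a plain CNN lacks for video.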
Two areas that are related to but different from computer vision are image processing and image analysis. These fields focus more on enhancing the clarity or other aspects of an image for human review or manipulation, rather than recognizing and understanding the image itself.(1)
Another term sometimes incorrectly associated with computer vision is computer vision syndrome, an eye-strain condition resulting from prolonged focusing on a computer screen.
Computer vision is used today in industries ranging from agriculture to automotive — and the market is growing. It’s expected to reach USD 48.6 billion by 2022, according to Forbes.