Computer vision is a field of artificial intelligence (AI) that enables computers and systems to derive meaningful information from digital images, videos and other visual inputs, and to take actions or make recommendations based on that information. If AI enables computers to think, computer vision enables them to see, observe and understand.

Computer vision works much the same as human vision, except humans have a head start. Human sight has the advantage of lifetimes of context to train how to tell objects apart, how far away they are, whether they are moving and whether there is something wrong in an image. Computer vision trains machines to perform these functions, but it has to do so in much less time, using cameras, data and algorithms rather than retinas, optic nerves and a visual cortex.

Computer vision is used in industries ranging from energy and utilities to manufacturing and automotive – and the market is continuing to grow.

Real-world applications demonstrate how important computer vision is to endeavors in business, entertainment, transportation, healthcare and everyday life. A key driver for the growth of these applications is the flood of visual information flowing from smartphones, security systems, traffic cameras and other visually instrumented devices. This data could play a major role in operations across industries, but much of it goes unused today. The information creates a test bed for training computer vision applications and a launchpad for integrating them into a wide range of human activities.

How Computer Vision Works

Computer vision needs lots of data. It runs analyses of that data over and over until it discerns distinctions and ultimately recognizes images. For example, to train a computer to recognize automobile tires, it needs to be fed vast quantities of tire images and tire-related items to learn the differences and recognize a tire, especially one with no defects.

Two essential technologies are used to accomplish this: a type of machine learning called deep learning and a convolutional neural network (CNN).

Machine learning uses algorithmic models that enable a computer to teach itself about the context of visual data. If enough data is fed through the model, the computer will “look” at the data and teach itself to tell one image from another. Algorithms enable the machine to learn by itself, rather than someone programming it to recognize an image.

Deep learning is a subset of machine learning in which neural networks, algorithms inspired by the human brain, learn from large amounts of data. Deep learning algorithms perform a task repeatedly and gradually improve the outcome through deep layers that enable progressive learning. With accelerated computational power and large data sets, deep learning algorithms can learn hidden patterns within data and use them to make predictions.

Neural networks are a subset of machine learning, and they are at the heart of deep learning algorithms. They are composed of node layers: an input layer, one or more hidden layers, and an output layer.
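As a minimal sketch of that layered structure (written with Keras, which also appears later in this article; the layer sizes and the ten-class output are illustrative assumptions, not part of any system described here):

```python
import tensorflow as tf

# A minimal fully connected network: input layer, one hidden layer, output layer.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),              # input layer: a flattened 28x28 image
    tf.keras.layers.Dense(128, activation="relu"),    # hidden layer of 128 nodes
    tf.keras.layers.Dense(10, activation="softmax"),  # output layer: one node per class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```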

Convolution is a mathematical operation in which a small filter (kernel) is slid across an image, producing a new image that highlights wherever the filter’s pattern appears. Convolutions are very good at detecting simple structures in an image, such as edges and corners, and at combining those simple features into progressively more complex ones. Convolutional neural networks built from such layers power image recognition and many other computer vision tasks.
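A small, self-contained sketch of the idea: a hand-built 3x3 edge filter is slid across a toy grayscale image and responds only where the intensity changes (the image and kernel values are invented for illustration):

```python
import numpy as np

# A tiny grayscale "image" with a vertical edge between columns 2 and 3.
image = np.array([
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
], dtype=float)

# A 3x3 vertical-edge kernel: large response where intensity changes
# left to right, near zero in flat regions.
kernel = np.array([
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
], dtype=float)

def convolve2d(img, k):
    """Slide the kernel over every valid position and sum the element-wise
    products (like deep learning frameworks, the kernel is not flipped)."""
    kh, kw = k.shape
    out_h, out_w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

print(convolve2d(image, kernel))  # non-zero only around the edge
```

Stacking many such filters, and learning their values from data instead of writing them by hand, is what a convolutional layer does.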

A CNN helps a machine learning or deep learning model “look” by breaking images down into pixels that are given tags or labels. It uses the labels to perform convolutions and makes predictions about what it is “seeing.” The neural network runs convolutions and checks the accuracy of its predictions over a series of iterations until the predictions become accurate. It then recognizes, or sees, images in a way similar to humans.

Computer Vision on AWS

Amazon Rekognition is a service that makes it easy and quick to add deep learning-based visual search and image classification to your applications. With Rekognition, you can detect objects, scenes, and faces in images. You just provide an image or video to the Amazon Rekognition API, and the service can identify objects, people, text, scenes, and activities. You can also search and compare faces, recognize celebrities, and identify inappropriate content.


Amazon Rekognition is based on the same proven, highly scalable, deep learning technology developed by Amazon’s computer vision scientists to analyze billions of images and videos daily. It requires no machine learning expertise to use. Amazon Rekognition includes a simple, easy-to-use API that can quickly analyze any image or video file that’s stored in Amazon S3 and build the analysis into any web, mobile or connected device application.
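For example, here is a minimal sketch of calling Rekognition from Python with boto3, assuming the image is already stored in S3 (the bucket and object names below are placeholders, and AWS credentials and region are assumed to be configured):

```python
import boto3

rekognition = boto3.client("rekognition")

# Detect up to 10 labels in an image stored in S3 (placeholder bucket/key).
response = rekognition.detect_labels(
    Image={"S3Object": {"Bucket": "my-images-bucket", "Name": "photos/car.jpg"}},
    MaxLabels=10,
    MinConfidence=75,
)

for label in response["Labels"]:
    print(f'{label["Name"]}: {label["Confidence"]:.1f}%')
```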

With Amazon Rekognition Custom Labels, you can identify the objects and scenes in images that are specific to your business needs. For example, you can build a model to classify specific machine parts on your assembly line or to detect unhealthy plants.
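Once such a custom model has been trained and started, inference is a single API call. A sketch with boto3, where the project version ARN, bucket and image key are placeholders:

```python
import boto3

rekognition = boto3.client("rekognition")

# Placeholder ARN of a trained and running Custom Labels model version.
MODEL_ARN = ("arn:aws:rekognition:us-east-1:123456789012:"
             "project/machine-parts/version/v1/1234567890123")

response = rekognition.detect_custom_labels(
    ProjectVersionArn=MODEL_ARN,
    Image={"S3Object": {"Bucket": "my-images-bucket", "Name": "line/part-0042.jpg"}},
    MinConfidence=80,
)

for label in response["CustomLabels"]:
    print(label["Name"], round(label["Confidence"], 1))
```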

Integrated with AWS, Amazon Rekognition provides a fast, scalable, reliable and secure image recognition platform that helps customers cost-effectively gain insight and uncover new revenue opportunities from their image libraries at the scale of their business.


Process Automation in the Insurance Sector

Within the insurance sector, computer vision technology is currently being used to help improve claims processes. Computer vision enables insurance companies to expedite the claims settlement process by letting AI perform damage assessments using pictures, rather than in-person appraisals. This improves the overall customer experience, as policies can be priced more accurately and efficiently while claims can be settled in a timelier manner.

When an insured vehicle is damaged in an accident, the insurance company bears the cost of repair. Cost estimation is an intensive manual process that requires experts from the body shop to evaluate the damage. The process is time-consuming, increases the turnaround time for claim settlement and leaves room for human error.

Automating the vehicle condition assessments behind insurance claims ensures higher process consistency and increases overall assessment efficiency.

IBM’s Solution Architecture

Leveraging cognitive image analytics, IBM’s CDAT is an analytical model that uses advanced computer vision and deep neural network-based techniques to assess the type and extent of damage incurred by the vehicle. This first-of-a-kind solution in the insurance industry integrates with a mobile app and clients’ back-end systems to provide a seamless user experience. IBM has successfully implemented this solution for clients in the insurance sector.

Key objectives expected to be realized through this initiative:

  1. Accelerating coverage decisions for incoming claims
  2. Reducing leakage (fraud, erroneous decisions, and subsequent payment/non-payment decisions)
  3. Improving customer service by expediting the claims process and enabling earlier payments

The customer can upload photos of the damaged part through a mobile app. The AI engine analyzes the photos and within seconds generates a list of parts that need to be repaired or replaced. These parts are then searched in the historical claims database for the average cost of repair or replacement. Within a few minutes, the total cost is displayed in the customer’s mobile app.
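An illustrative sketch of the estimation step only; the part names, historical figures and in-memory lookup below are invented for the example, and a real system would query the claims database:

```python
from statistics import mean

# Invented stand-in for the historical claims database: past repair costs per part.
historical_claims = {
    "front bumper": [412.0, 389.5, 450.0],
    "headlight":    [180.0, 210.0],
    "hood":         [620.0, 575.0, 640.0],
}

def estimate_repair_cost(damaged_parts):
    """Average the historical repair cost of each detected part and total them."""
    estimate = {part: mean(historical_claims[part])
                for part in damaged_parts if part in historical_claims}
    return estimate, sum(estimate.values())

per_part, total = estimate_repair_cost(["front bumper", "headlight"])
print(per_part, f"total: {total:.2f}")
```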

In the current solution, IBM uses TensorFlow and Keras for image recognition and classification. TensorFlow is an open-source library created by Google. It combines many models and algorithms, enabling users to develop deep neural networks that identify and classify images. Keras is a high-level API that makes implementing the complex and powerful functions of TensorFlow easier.
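As a rough sketch of the kind of Keras model such a pipeline could use (the input size, number of classes and architecture below are assumptions for illustration, not details of IBM’s model):

```python
import tensorflow as tf

NUM_CLASSES = 5  # e.g. bumper, hood, headlight, door, no damage; illustrative only

# A small convolutional classifier for 224x224 RGB photos of vehicle damage.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)  # datasets not shown here
```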

IBM’s Solution with AWS

Alternatively, IBM can implement the solution using Amazon’s Rekognition service. Although there may not be a pre-trained model for classifying the damaged parts of a vehicle, Amazon SageMaker Ground Truth makes it easy to build a training dataset from unlabeled data. A customized computer vision model can then be trained using Amazon Rekognition Custom Labels, an automated machine learning (AutoML) feature that lets you train custom ML models for image analysis without requiring ML expertise. Additionally, Amazon S3 can serve as the data repository for the large volumes of data required for training.
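A sketch of how such a model could be created and trained with boto3, assuming a labeling manifest produced by SageMaker Ground Truth is already in S3 (all project, bucket and file names below are placeholders):

```python
import boto3

rekognition = boto3.client("rekognition")

# Create a Custom Labels project (placeholder name).
project = rekognition.create_project(ProjectName="vehicle-damage-assessment")

# Train a model version from a Ground Truth output manifest stored in S3.
rekognition.create_project_version(
    ProjectArn=project["ProjectArn"],
    VersionName="v1",
    OutputConfig={"S3Bucket": "damage-models", "S3KeyPrefix": "training-output/"},
    TrainingData={"Assets": [{"GroundTruthManifest": {
        "S3Object": {"Bucket": "damage-datasets", "Name": "labels/output.manifest"}}}]},
    TestingData={"AutoCreate": True},
)
# Training runs asynchronously; poll describe_project_versions, then call
# start_project_version before running detect_custom_labels on new photos.
```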

For a deeper analysis, we can take other factors into consideration, such as incident location, weather conditions and activity on social media platforms, to increase decision accuracy. These services are accessible as individual APIs and can be called from an AWS Lambda function, which can handle multiple third-party API requests in parallel. By combining AWS Lambda with other AWS services, we can build powerful web applications that are highly scalable and easy to use.
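A sketch of a Lambda handler that fans out those enrichment calls in parallel; the weather endpoint and the event fields are hypothetical:

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

import boto3

rekognition = boto3.client("rekognition")

def detect_damage(bucket, key):
    # Label the uploaded damage photo stored in S3.
    return rekognition.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}}, MaxLabels=10)

def fetch_weather(lat, lon):
    # Hypothetical third-party weather history API.
    url = f"https://api.example-weather.com/v1/history?lat={lat}&lon={lon}"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)

def handler(event, context):
    # Run the Rekognition call and the weather lookup concurrently.
    with ThreadPoolExecutor() as pool:
        damage = pool.submit(detect_damage, event["bucket"], event["photo_key"])
        weather = pool.submit(fetch_weather, event["lat"], event["lon"])
        return {"labels": damage.result()["Labels"],
                "weather": weather.result()}
```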
