Mobile apps that use visual search engines are becoming more prominent. Use cases are increasing in such industries as defense, insurance, health care, and fashion as these technologies mature. The ability to take a picture and have an algorithm recognize the objects in that image requires a data store for the algorithm to perform comparisons, and these data stores are increasingly being located in the cloud. This article provides an understanding of visual search engine algorithms, how they use data stores, how to connect your application to these data stores, and the pros and cons of choosing specific vendor solutions.
What are visual search engines?
Visual search engines are not limited to mobile devices, but smartphones and tablets are the most prevalent endpoint, or user interface, because they now ship with built-in cameras. Mobile apps can process the images those cameras capture asynchronously.
By using visual search engines, users can take a two-dimensional picture and use the "search" algorithm to determine whether the image contains recognized objects. These algorithms are deployed in mobile apps via software connectors called application programming interfaces (APIs). APIs from visual search engine providers like IQ Engines enable programmers to create their own apps using the visual search engine technology.
Some visual search engine providers, such as Google with Google Goggles, offer prebuilt mobile apps. However, Google has yet to deploy an API for Goggles, which limits the use cases for deploying this app in industry. A visual search engine is also available from an Italian company called Macroglossa. Little is known of the deployment or use of Macroglossa in industry, but it is an alternative to IQ Engines and Google. Regardless of vendor, visual search engines share common dynamics and processes.
Take IQ Engines VisionIQ as an example. With an app that uses VisionIQ loaded, the user takes a picture on the mobile device, which triggers the client-side visual search process. The visual search engine API service then calls the server-side software, which in turn references business rules derived from training on a cloud-based data store as a sanity check. Finally, for public data stores, crowdsourcing is enabled so that the public can help refine the search algorithm. Figure 1 demonstrates this process.
Figure 1. The visual search engine process as defined by IQ Engines VisionIQ
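The flow in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not VisionIQ's implementation: the `TRAINED_STORE` dictionary, the fingerprint strings, and the `visual_search` function are hypothetical names standing in for the cloud-based data store and the server-side software.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SearchResult:
    label: Optional[str]   # recognized object, or None when nothing matched
    sent_to_crowd: bool    # True when the query is handed to human reviewers


# Hypothetical stand-in for a trained, cloud-based data store:
# it maps an image fingerprint to the label the system was trained with.
TRAINED_STORE: Dict[str, str] = {
    "fp-red-sneaker": "sneaker",
    "fp-blue-mug": "mug",
}


def visual_search(fingerprint: str, store: Dict[str, str],
                  crowdsourcing_enabled: bool) -> SearchResult:
    """Server-side step: match against the trained store, then fall back
    to crowdsourcing only if the data store allows it."""
    label = store.get(fingerprint)
    if label is not None:
        return SearchResult(label=label, sent_to_crowd=False)
    return SearchResult(label=None, sent_to_crowd=crowdsourcing_enabled)


# Client-side step: the app submits the picture's fingerprint after capture.
print(visual_search("fp-blue-mug", TRAINED_STORE, crowdsourcing_enabled=True))
print(visual_search("fp-unknown", TRAINED_STORE, crowdsourcing_enabled=True))
```

The second call shows the crowdsourcing branch: an unrecognized image is flagged for human review only because the store in this sketch is treated as public.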
For the visual search engine algorithm to recognize images, a user must first train the system using a combination of the image and supporting attributes, which are then either uploaded to a cloud-based data store or placed in the path of a web crawler. Google Goggles uses the web crawler route. To use Goggles, you must first render the images and the supporting attributes or metadata into HTML. VisionIQ uses cloud-based data stores to which the developers and technical staff upload the content via RESTful (Representational State Transfer) APIs. Regardless of how the images and their associated metadata (color, texture, date, brand) are scanned or uploaded, the algorithm must analyze and continuously refine the data. Figure 2 provides an example of an image with the associated attributes.
Figure 2. The VisionIQ data model
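As a rough illustration of pairing an image with its supporting attributes, the record below uses the four attributes named above (color, texture, date, brand). The class name, field names, and JSON layout are assumptions made for this sketch and are not taken from the VisionIQ data model.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TrainingImage:
    """One training record: an image plus the metadata that describes it."""
    image_path: str   # where the picture lives before upload (hypothetical)
    label: str        # what the image shows
    color: str
    texture: str
    date: str         # ISO 8601 date string
    brand: str


def to_payload(item: TrainingImage) -> str:
    """Serialize the record to JSON, a common body format for REST uploads."""
    return json.dumps(asdict(item), sort_keys=True)


sample = TrainingImage(
    image_path="images/sneaker-001.jpg",
    label="sneaker",
    color="red",
    texture="canvas",
    date="2012-06-01",
    brand="ExampleBrand",
)
print(to_payload(sample))
```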
Note that the search algorithm and attributes must be refined iteratively, within the time the project schedule allows, to reach an acceptable success rate. After the system has been trained to an acceptable accuracy rate, it's ready for user acceptance testing (UAT). UAT lets the developers and the organization refine the images and the search algorithm so that the app is more useful in real-world scenarios with varying ambient light, unclear screens, and multiple angles. Because the cloud is often used to store the data for training these systems, it's prudent to go over what's meant by the "cloud".
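One simple way to decide when training is good enough for UAT is to compare the system's answers with known labels and check the result against a target. The sketch below assumes a plain fraction-correct metric and a 0.9 threshold; both are illustrative choices, and the function names are hypothetical.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Fraction of test images whose predicted label matches the known label."""
    if not expected or len(predicted) != len(expected):
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


def ready_for_uat(predicted: Sequence[str], expected: Sequence[str],
                  threshold: float = 0.9) -> bool:
    """True once accuracy reaches the (assumed) acceptance threshold."""
    return accuracy(predicted, expected) >= threshold


expected = ["mug", "sneaker", "mug", "hat"]
predicted = ["mug", "sneaker", "mug", "sneaker"]
print(accuracy(predicted, expected), ready_for_uat(predicted, expected))
```

If the check fails, the team would add or retag training images and measure again, which is the iterative refinement described above.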
Cloud and visual search converge
Now, let me explain how cloud computing and visual search engines work together.
The structure of cloud
Cloud computing is a rebranding, if you will, of the old application service provider (ASP) model. However, true cloud offerings have characteristics beyond the ASP model, namely resource pooling, on-demand utilization, multitenancy, and rapid elasticity. Cloud service providers (CSPs) offer services based on different cloud service and deployment models.
Cloud service models describe how an organization consumes the cloud according to its business requirements: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), or Software as a Service (SaaS). In contrast, cloud deployment models include public, private, hybrid, and community clouds. Most CSPs have public clouds, although some leverage external, private clouds, as well. Most large organizations leverage the cloud internally as a private cloud. Hybrid clouds leverage both public and private cloud solutions. A community cloud leverages resource pooling to a large extent, one example being a number of schools in a school district that share server resources for information processing.
Most visual search engine offerings combine more than one service model and more than one deployment model. For example, when IQ Engines' customers opt out of crowdsourcing, they are using a private cloud with both IaaS and PaaS elements: closed storage pools provide the IaaS-based data stores, and PaaS-based APIs are called by mobile apps. However, when consumers of IQ Engines' offerings enable public crowdsourcing of their data store, the deployment becomes a public cloud. Google Goggles is classified as a public SaaS solution, because it does not have an API yet and allows crowdsourcing.
Visual search engines and cloud computing
Mobile apps that use visual search engines frequently have their training images uploaded to cloud-based data stores via APIs. The most prevalent API format these days is REST. REST APIs follow the service-oriented architecture model and are therefore often used for integration in web services-based software architectures. Because they are web based, REST APIs are mostly used by thin-client software applications that are consumed via an Internet browser or web server. REST APIs are a relatively new technology with a specific yet flexible API format.
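To make the REST upload concrete, the sketch below builds (but does not send) an HTTP POST request using only Python's standard library. The endpoint URL, the JSON field names, and the bearer-token header are assumptions for illustration; a real provider documents its own URL, authentication scheme, and body format, and image uploads are often multipart rather than JSON.

```python
import json
from urllib.request import Request

# Hypothetical endpoint; not a real visual search provider URL.
TRAINING_URL = "https://api.example.com/v1/training/images"


def build_training_request(image_url: str, label: str, api_key: str) -> Request:
    """Build a POST request that registers one labeled training image."""
    body = json.dumps({"image_url": image_url, "label": label}).encode("utf-8")
    return Request(
        TRAINING_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )


req = build_training_request("https://example.com/shoe.jpg", "sneaker", "demo-key")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out here because the endpoint is fictional.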
Cloud computing services are distributed by nature, so the use of APIs, particularly web-based RESTful APIs, is a logical way for consumers to use services remotely. This solution becomes more relevant as requirements emerge for a remote and mobile workforce. One example is an insurance company whose multiple claims-based departments leverage a common cloud-based data store for business processing. In this example, insurance adjustors use mobile apps with VisionIQ REST APIs to take, tag, and recognize pictures of water-damaged homes after a natural disaster. Stored in a centralized, private cloud data store, these pictures can be used to iteratively refine the algorithm. Internal, private crowdsourcing can then further improve the accuracy of the system in distinguishing water damage from wear and tear or neglect.
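Private crowdsourcing of this kind can be approximated by collecting tags from several reviewers and accepting a label only when enough of them agree. The sketch below uses a simple majority-style vote; the function name and the 60% agreement threshold are assumptions, not a description of how any vendor aggregates tags.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_label(votes: Sequence[str],
                    min_agreement: float = 0.6) -> Optional[str]:
    """Return the most common tag if enough reviewers agree, else None."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else None


# Three adjustors tag the same photo of a damaged ceiling.
votes = ["water damage", "water damage", "wear and tear"]
print(consensus_label(votes))
```

A photo without consensus would stay out of the training set until more reviewers have tagged it, which keeps ambiguous examples from degrading the algorithm.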
Another use case is the deployment of mobile apps that use visual search engines and private cloud data stores for medical diagnosis services. These services combine an image of a scratch, cut, mark, bruise, or similar injury with several attributes the patient provides in answers to questions included in the app to provide an initial diagnosis. This information can be coupled with a telephone call by a health care professional to verify the diagnosis. Such an app can exchange data with a health care system's electronic health record system via REST APIs for complete automation.
Finally, for intelligence, legal, and law enforcement purposes, these apps will expedite the recognition of people of interest and evidence in cases. Facial recognition algorithms are not new, and it is safe to assume that mobile apps with this functionality already exist for these groups, although they are not yet as capable or as widespread as they will be in the future. It is not beyond the realm of possibility that crime scene investigators could use mobile apps for most of their work in several years.
As mobile apps mature, visual search engine technology will mature with them. APIs using contemporary formats like REST will drive both mobile and cloud interoperability for enhanced use cases of visual search engine technologies, especially when Google releases its API for combining custom mobile apps with Goggles. With the growth of visual search engines, cloud-based data stores will become more prevalent.
These stores will be called by APIs, and usage data will be used to further refine the search algorithm, which is the key to the success of this technology. This refinement can be done by the providers themselves or with the inclusion of crowdsourcing. As these algorithms are refined, the ways in which they are used will grow exponentially. Although this article includes several use cases, it's not a stretch to imagine that many more will be introduced over time. The proliferation of mobile devices with cameras has brought about a new era in technology that, when refined, will change the way we live our lives and do business.