Commentary / Opinion

The Continuum: Big Data, Cloud & Internet of Things


If geospatial systems are to remain relevant in a fast-changing world, then data sources that go beyond imagery and maps must become part of the analyst’s armory. Big Data, its analytics in the Cloud, and finally, the Internet of Things are what the future holds.

Q: How many Big Data scientists does it take to screw in a lightbulb?
A: Just a minute. Let me run the algorithm.

Fast-changing, human-driven events, such as the expansion of cities and the creation of transportation assets, are highly sensitive to stale data. Anyone who has been misled by a car navigation system can testify that digital road networks on devices are often out of date and missing new features. The need of the hour, then, is speed in both delivering and crunching data. Where does this data come from, and how can it be used in real time (or near real time) for decision-making?

Extracting business value

Source: IBM

Natural resources, as well as social, political and economic activities, have a strong bearing on the outcomes of such projects as urban growth, infrastructure building, and even a farmer’s decision to plant a specific crop.

Enter the world of Big Data, Big Data Analytics and Internet of Things.

More data is not always more intelligent data

“The rate at which we are generating data is rapidly outpacing our ability to analyze it,” says Dr. Patrick Wolfe, data scientist at University College London. “The trick here is to turn these massive data streams from a liability into a strength.” The extent to which we are missing extraordinarily valuable analytic opportunities is staggering: right now, only 0.5% of our information is ever analyzed. We have more data, but it is not always more intelligent data. Part of the problem with Big Data is that it is not valuable until it is understood. “You have to start with a question and not with the data,” stresses Andreas Weigend, lecturer at UC Berkeley. “The fact that data gets collected is a good thing,” he adds, “but what we really need is to figure out what problems we can solve with it.”

The promise of Big Data is exciting. It improves sustainability by reducing power usage, and less use of resources means savings — $200 billion per year, according to one estimate. Chicago and New York City are now being called “smart cities” in the press for integrating Internet of Things (IoT) sensors with analytics to streamline spending and improve infrastructural efficiency.

Technologies exist to solve the world’s problems, and those problems scale from big to small. “There may be problems at the scale of a city’s infrastructure: making sure a city works more effectively and efficiently might require monitoring the larger environment, like floods and climate. Or the focus can come down to the individual,” says Ed Parsons, geospatial technologist at Google.

Problem-solving

So how does technology make your life better? How does it save you a few minutes every day? How does it make you a little bit happier, dealing with the things you have to deal with? “We must be driven by user needs, saying: here is a problem we can solve. It might deliver just a small incremental gain, but scaled across everyone on the planet, that makes a huge difference,” Parsons adds.

“Our world is ever-changing, and fresh, dynamic applications that combine content, workflow, analytics and experience can be used in any area where we need to sense this change,” elaborates Atanu Sinha, director of Hexagon Geospatial, India & SAARC. Hexagon, for instance, already has Smart M.Apps to analyze green space, road areas, crime incidents, snow cover, forest burn ratio, iron oxide index in rocks and crop health, and to process UAV data.

Taner Kodanaz, director at DigitalGlobe, adds that there is broad applicability in economic monitoring, supply chain and logistics, commodity trading markets, environmental research and monitoring, the shipping and maritime industry, forestry and agriculture, land management, real estate and real estate investment, and energy markets.

As location intelligence becomes increasingly relevant across industries, Big Data, in the form of consumer-generated data tightly integrated with location data, is driving marketing benefits. Advertising and marketing is one big area that benefits from spatial analytics. Tony Boobier, insurance leader for IBM’s EMEA Business Analytics, highlights that weather forecasting uses data from sensors all over the world. Such forecasts can be used in the insurance sector, as well as in financial services to understand the volatility of assets and liabilities. They can also help the retail sector understand patterns of product sales at particular times of the year.

Geospatial Big Data

Big Data is characterized by five Vs: Volume, Velocity, Variety, Veracity and Value. Volume is easily understood; velocity, variety, veracity and value come down to the ability to take fast-moving, heterogeneous data and convert it into something of value through analytics. Traditional geospatial data, which includes remotely sensed data, is structured and stored for analysis after the fact in analytical systems like GIS. Modern data with useful geospatial content (photos, social media chats, video, voice and messages) now constitutes almost 80% of total data, but in its unstructured form it cannot be used in conventional analytic systems like GIS: its sheer volume far exceeds the available storage capacity, its velocity is high, and its veracity may require curation.


Sinha substantiates this view: “There was always a tussle between the advancement and availability of technology (how much and how fast we can capture, curate, manage, search, share, transfer, analyze and visualize) and the sheer amount, complexity and disparity of the available geospatial content.” Even today, despite vast increases in computing speed and storage capacity, our capacity to acquire geographic information remains orders of magnitude greater than our capacity to examine, visualize, analyze or make sense of it.

“Today, datasets are available from satellites, UAVs, ground-based sensors, smartphones and social media in near real-time, offering the potential of almost immediate discoveries and predictions. So we can say that there is definitely velocity and variety in the geospatial data itself. However, this is not true for traditional GIS technologies, and hence there is a need to effectively make this data manageable and available,” he adds.

Kodanaz echoes the same sentiment: “Even if one only considered traditional satellite imagery products as solely encompassing geospatial Big Data (which I do not), the near-term future holds significant potential growth in both variety and velocity from both industry leaders, such as DigitalGlobe, and new entrants working feverishly to launch their own assets.” He goes on to add that in 2014, DigitalGlobe alone produced 70 TB of data per day, compared with the 600 TB produced by Facebook. Adding the imagery produced by other entities, and that to come from new entrants, the total data velocity would exceed that of social media and other non-traditional sources.


Apart from this, everything from traditional GIS datasets — like roads, terrain maps, places of interest, boundaries, and transportation networks — to location information from mobile device movement, geo-tagged social media content created by users, UAS/UAV photos/videos created by commercial or private drones, and IoT data from non-stationary devices could also be considered part of the geospatial Big Data family. By itself, remote sensing data from satellites, aerial, and UAS/UAV sensors captures a plethora of content every day, representing a significant variety of geospatial Big Data.

Parsons supports this view through an example. Google collects people’s movements anonymously and analyzes them to show emerging patterns. For example, if you look at a business in Google Maps, like a hotel or a restaurant, Google shows a little graph indicating when that business is busiest; they do that by analyzing the content people contribute. “It is a simple process, but it is analytics at scale and I think that is where the geospatial industry can add particular value, because we can do these large-scale pieces of analysis viewing things through that sort of geographic lens.”
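The aggregation Parsons describes can be sketched in a few lines. This is a toy illustration, not Google's actual pipeline; the anonymized visit timestamps are hypothetical, and they are simply bucketed by hour of day to reveal when a business is busiest:

```python
from collections import Counter
from datetime import datetime

# Hypothetical anonymized visit timestamps for one business
visits = [
    "2017-08-01T12:15", "2017-08-01T12:40", "2017-08-01T19:05",
    "2017-08-02T12:30", "2017-08-02T13:10", "2017-08-02T19:20",
    "2017-08-03T12:05",
]

def busiest_hours(timestamps):
    """Count visits per hour of day -- the essence of a 'popular times' chart."""
    hours = Counter(datetime.fromisoformat(t).hour for t in timestamps)
    return hours.most_common()

print(busiest_hours(visits))  # the lunch hour (12) dominates
```

The value here is not the arithmetic but the scale: the same hour-of-day aggregation, run over contributions from millions of users and viewed through a geographic lens, becomes the kind of analytics-at-scale Parsons refers to.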

Ron Bisio, vice president of Trimble Geospatial, takes a more traditional view: “From the GIS viewpoint, Big Data describes datasets that are so large—both in volume and complexity—that they require advanced tools and skills for management, processing and analysis.”

Bhoopathi Rapolu, head of Analytics for EMEA, Cyient UK, points out that 80% of corporate data is spatially relevant. “We have been using this data without the spatial context all along. But now that we have the technology to bring in the spatial component and tightly integrate it with the corporate dataset, we can make spatial sense out of it, and broader insights are being generated from the spatial element.”

Parsons thinks it is about things that change in time and space. “It is about geography. Geography is interested in what changes in the world: the distribution of things over space, and over space and time.” With more detail, we move from being static viewers of the world to viewers with higher temporal resolution and cadence. “We have heard a lot about the potential of daily satellite coverage. Think about that in combination with the real-time location of people and facilities: that is where the real advances are going to be made. It is going to be that temporal aspect that drives it.”

According to Bisio, “By combining multiple datasets, it is possible to develop 4D models that enable users to view conditions over time.” This approach lets users detect and measure changes, and provides important benefits in applications such as construction, earthworks, agriculture and land administration. A fifth dimension, cost, can also be included with the spatial information. The resulting model helps users improve the efficiency and cost-effectiveness of asset deployment.

To effectively use such data, we need real-time or near real-time engines that analyze the data on the fly, to curate the data and establish patterns, which are stored and used with conventional geospatial structured data. As an analogy, consider a conventional GIS that applies different analytics on a stored database to realize meaningful reports. With Big Data, the stored database consists of analytic modules rather than data. These modules work simultaneously on a variety of data streams and deliver meaningful patterns.
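The analogy above can be sketched concretely. In this minimal illustration (all module names, record fields and thresholds are hypothetical), the "stored database" holds analytic modules rather than data, and every module runs over each record of the incoming streams, emitting patterns:

```python
# The "database" is a collection of analytic modules, not rows of data.

def traffic_spike(record):
    """Flag congestion when a traffic sensor reports a high vehicle count."""
    if record.get("type") == "traffic" and record["count"] > 100:
        return ("congestion", record["location"])

def flood_alert(record):
    """Flag flood risk when a river gauge exceeds a threshold level."""
    if record.get("type") == "gauge" and record["level_m"] > 3.0:
        return ("flood_risk", record["location"])

MODULES = [traffic_spike, flood_alert]

def process(stream):
    """Run every module over each record as it arrives; collect the patterns."""
    patterns = []
    for record in stream:
        for module in MODULES:
            result = module(record)
            if result:
                patterns.append(result)
    return patterns

stream = [
    {"type": "traffic", "count": 140, "location": "5th Ave"},
    {"type": "gauge", "level_m": 1.2, "location": "River N"},
    {"type": "gauge", "level_m": 3.4, "location": "River S"},
]
print(process(stream))
```

The extracted patterns, unlike the raw stream, are small and structured, so they can be stored and combined with conventional geospatial data in a GIS.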

Handling structured Big Data

Sinha thinks maps of the future need to be fresh, portable, dynamic and logical. Hexagon addresses this across its products: its ECW (Enhanced Compression Wavelet) technology helps Imagine and GeoMedia manage the sheer volume of Big Data, while its enterprise and Cloud offerings manage the velocity and variety contained within the datasets.


Kodanaz feels communications and storage speeds, storage access, and web-based services (APIs) for accessing the data are all improving rapidly. GPU processing approaches let users access just their area of interest rather than the entire binary file, even when that area represents a small portion of the overall file. These methods, along with machine learning approaches that operate on raw data within hours of acquisition (shortening to near real-time speeds in the coming years), are all being applied to move geospatial Big Data toward real-time or near real-time access.
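The area-of-interest idea can be illustrated with plain file seeks. This is a hedged sketch, not DigitalGlobe's actual delivery mechanism: a synthetic single-band raster (one byte per pixel, row-major) is written to disk, and only the bytes covering the requested window are ever read:

```python
import os
import tempfile

ROWS, COLS = 1000, 1000  # hypothetical single-band raster, 1 byte per pixel

# Write a synthetic raster to disk so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "scene.bin")
with open(path, "wb") as f:
    for r in range(ROWS):
        f.write(bytes((r + c) % 256 for c in range(COLS)))

def read_window(path, row_off, col_off, height, width):
    """Read only the bytes covering the area of interest.

    The file is 1 MB, but a 4x8 window touches just 32 bytes:
    we seek directly to each row's slice instead of loading the file.
    """
    window = []
    with open(path, "rb") as f:
        for r in range(row_off, row_off + height):
            f.seek(r * COLS + col_off)  # jump straight to this row's slice
            window.append(f.read(width))
    return window

aoi = read_window(path, 500, 250, 4, 8)
print(len(aoi), len(aoi[0]))  # 4 rows of 8 pixels
```

Real imagery formats add compression and tiling (as in Hexagon's ECW or cloud-optimized GeoTIFFs), but the principle is the same: index into the file so the cost scales with the window, not the scene.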

Curating unstructured data

Sadly, a very common problem with traditional and new data sources is quality, or the lack of it.

“That is where the validity, or veracity, of geospatial Big Data comes in. Unfortunately, it is a property geospatial Big Data often lacks: the data is frequently undocumented, missing metadata, and without clearly identified provenance,” stresses Sinha.

Automated machine learning approaches that identify and categorize objects within images are a simple example of curation, points out Kodanaz; this is sometimes referred to as “search space reduction” or “area reduction.” Semi-automated methods are used to assess the aesthetic qualities of an image, such as cloud cover, image quality and atmospheric distortion. Finally, manual review is typically used by analysts to identify the best images for specific use cases, and may leverage the automated and semi-automated processes as well.
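The three-stage curation Kodanaz outlines might look like this in miniature. All names, fields and thresholds here are hypothetical, and real object detection and quality scoring would sit behind the two filter functions:

```python
def auto_classify(image):
    # Stage 1: automated "search space reduction" -- keep only images
    # whose (hypothetical) detector output contains the target class.
    return "ship" in image["detected_objects"]

def quality_ok(image):
    # Stage 2: semi-automated quality screen (here, just cloud cover).
    return image["cloud_cover"] < 0.2

def curate(catalog):
    """Produce the shortlist that an analyst would review manually (stage 3)."""
    return [im for im in catalog if auto_classify(im) and quality_ok(im)]

catalog = [
    {"id": "a", "detected_objects": ["ship"], "cloud_cover": 0.05},
    {"id": "b", "detected_objects": ["forest"], "cloud_cover": 0.10},
    {"id": "c", "detected_objects": ["ship"], "cloud_cover": 0.60},
]
print([im["id"] for im in curate(catalog)])
```

Each stage cheaply discards most of the stream, so the expensive human review at the end only ever sees a small, relevant shortlist.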

“There’s a lot of work we need to do on natural language processing, on greater understanding of semantics, to try and pull out the meaning from those pieces of social media,” says Parsons. But social media also represents a more human view of the world. If you are talking on social media, you talk much more in terms of places than spaces. You don’t see coordinates expressed in tweets or in Facebook statuses; you see place names. “A better understanding of how we as humans interact, create place names and define space — that is a really interesting insight, and I think a lot of that is driven from this unstructured data. I think we do need it in the systems that we have developed to better reflect how we as humans see the world around us.”

Big Data analytics and IoT

The Internet of Things comes into its own in streamlining operations that involve machine-to-machine (M2M) and machine-to-human (M2H) interactions. A case in point is the smart city. In such a city, sensors can control traffic lights and detect traffic jams, alerting authorities like the police. Sensors can also alert municipal waste management services when refuse bins are full and need to be emptied.
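The refuse-bin example reduces to a simple threshold rule over sensor readings. A toy sketch, with hypothetical bin IDs and a hypothetical fill threshold:

```python
FULL_THRESHOLD = 0.9  # alert once a bin reports 90% full (assumed policy)

def collect_alerts(readings):
    """Return the bins whose latest fill-level reading warrants a pickup."""
    return [bin_id for bin_id, level in readings.items()
            if level >= FULL_THRESHOLD]

# Latest fill levels reported by IoT sensors on three bins
readings = {"bin-17": 0.95, "bin-18": 0.40, "bin-19": 0.91}
print(sorted(collect_alerts(readings)))
```

The intelligence in a real deployment lies less in the rule itself than in the plumbing around it: getting thousands of such readings reliably into one place, with their locations, so that pickup routes can be planned spatially.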

The technology for such intelligent systems is already available, but adoption is slow because the concept, as visualized by vendors, involves connecting all areas of city management to a centralized data infrastructure. Jascha Franklin-Hodge, Boston’s Chief Information Officer, thinks the movement is overhyped and that more targeted, less centralized IoT Big Data applications can be more effective.

“The interesting intersection of IoT and geospatial Big Data lies in sensors on the ground coupled with near real-time modeling of visible-spectrum data gathered from remote sensing. In other words,” says Kodanaz, “from micro to macro, tied together to describe the world in ways never before possible.”

The data size is humongous, and as Rapolu puts it, we are not even joining the dots yet: we are creating the dots, in bits and pieces, to understand the world. As intelligent applications connect with different databases, we will see the IoT emerge. “… It’s about connecting all the intelligent things and then making location sense out of it.”

Boobier takes a different view. “I think individuals, organizations and perhaps governments will also place certain restrictions on the amount of information that is commonly available. Organizations already tend to turn up the security levels around the information that is available to employees. We talk in terms of analytics being democratized; the democratization of information is one of the big ethical questions in the Big Data environment going forward.”

In the end, Big Data, Cloud and the Internet of Things are all parts of a continuum. It is hard to think about the Internet of Things without thinking about the Cloud, and it is hard to think about the Cloud without thinking about the analytics.

“It goes without saying that if you are going to have lots and lots of devices creating data, that data is going to exist in the Cloud, and because you have got large volumes of data, the only way to analyze them is to use analytical models—identify the current state of the world, but then also predict it, saying ‘If we see this pattern emerging, this is what we can expect to happen’,” sums up Parsons.

Learn more

To learn more about how IBM Watson IoT can help you obtain greater efficiency through smarter asset management, visit our website or contact a representative today. And read more about how Maximo Spatial can help you dynamically visualize asset relationships with geographic information.

This article originally appeared on Geospatial World.
