IBM Analytics Engine and Watson Studio now support a wide variety of geospatial functions as native components of these services. 

The library, developed by IBM Research, already provides geospatial and temporal functions for various other IBM products and now joins these two cloud services. With this feature, Watson Studio (in its Spark environments) and Analytics Engine support complex geospatial functions, extending data science to location analytics.

Key features of spatiotemporal functions

  • Geodetic function support—all functions are accurate for all geometries without the need for projections, including large geometries like entire countries or hemispheres and geometries that are near the poles or the anti-meridian.
  • In Analytics Engine and Watson Studio Spark environments, all geospatial functions are fully distributable and can take advantage of Spark’s native distributed processing capabilities, which improves overall performance.
  • Extensions of Spark distributed joins to perform spatial, temporal, and spatiotemporal joins.
  • Native geohash support for arbitrary geometries that can be used for simple aggregations and object storage spatial indexing, improving cloud storage retrieval.
  • SQL/MM extensions to Spark SQL, providing native SQL support for geospatial functions.
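
To make the geohash idea above concrete, here is a minimal sketch of the standard public geohash encoding for a single point, written in plain Python. It is not the library's API: the function name `geohash_encode` and the sample coordinates are assumptions made for illustration, and the library's support for arbitrary geometries goes beyond what this shows. The point of the sketch is that nearby locations share a common prefix, which is what makes geohashes useful for simple aggregations and for key-based spatial indexing in object storage.

```python
from collections import Counter

# Standard geohash base-32 alphabet (digits plus letters, without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=7):
    """Encode a latitude/longitude pair as a geohash string.

    Hypothetical helper for illustration only, not part of any IBM library.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits = bits << 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits = bits << 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


# Simple aggregation: count made-up sample points per 5-character cell.
points = [(57.64911, 10.40744), (57.64920, 10.40750), (40.7128, -74.0060)]
cell_counts = Counter(geohash_encode(lat, lon, 5) for lat, lon in points)
```

Because a shorter geohash is always a prefix of a longer one for the same point, truncating the string is a cheap way to coarsen the grid, and using the string as a storage key prefix keeps nearby objects together.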

Why geospatial support?

Location sensors have become first-class citizens in a wide range of devices, from phones to connected cars to IoT sensor feeds. Fusing geographic feature data (e.g., zip code polygons, address features) with this location sensor data is key to extracting contextual information and enriching the data with it. Adding such location context to most enterprise problems provides valuable business insights.

Geospatial information, either by itself or in combination with traditional relational data, can help institutions and businesses do things like decide in which areas to provide services or determine the locations of possible markets. For example:

  • The manager of a county welfare district can verify which welfare applicants and recipients actually live within the area that the district services. This can be done by analyzing the geometry of the service area and the addresses of the applicants and recipients.
  • The owner of a restaurant chain wants to open new restaurants in nearby cities and needs the answer to such questions as: 
    • Where in these cities are concentrations of the types of people who typically frequent restaurants like mine? 
    • Where are the major highways? 
    • Where is the crime rate lowest? 
    • Where are competing restaurants located?
  • A business analyst in a health insurance company wants to determine whether there are primary care providers within a 15-mile driving distance of each of their customers. A list of suggested health care providers can then be offered proactively and customized to each customer’s needs.

What does it do?

Our geospatial functions include, but are not limited to, the following:

  • Topological functions, per the DE-9IM standard
  • Metric functions, distance, azimuth, etc.
  • SQL/MM extensions to Spark SQL
  • Spatial indexing
  • WKT and GeoJSON support
  • Spatial, temporal, and spatiotemporal joins
  • Distributed spatial functions
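
As a rough illustration of what the metric functions in this list compute, the sketch below implements great-circle distance (haversine formula) and initial azimuth on a sphere in plain Python. This is an assumption-laden stand-in, not the library's implementation or API: the function names and the mean Earth radius constant are chosen here for illustration, and a spherical model is only an approximation of true geodetic calculations. It does show why working on the globe rather than in a projected plane handles the anti-meridian naturally.

```python
import math

# Mean Earth radius in kilometers (spherical approximation; an assumption).
EARTH_RADIUS_KM = 6371.0088


def great_circle_distance_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def initial_azimuth_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, in degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb))
    return math.degrees(math.atan2(y, x)) % 360.0


# Two points one degree apart on the equator, on opposite sides of the
# anti-meridian: the distance is about one degree of arc, not 359 degrees.
across_antimeridian_km = great_circle_distance_km(0.0, 179.5, 0.0, -179.5)
```

A planar calculation on raw longitude values would treat the two points in the last line as nearly a full circle apart, which is the kind of error geodetic functions avoid.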

Get started

This functionality is available in the Spark Python and Spark Scala environments on Watson Studio and Analytics Engine clusters. You can get started by going over the sample notebooks for Spatial and Spatial Index.

