Wherobots makes Spatial Data + AI pipelines 10 times faster on Databricks
Geospatial data is becoming increasingly prevalent in various industries and applications, from automotive and logistics to environmental monitoring and urban planning. However, processing and analyzing this data can be a major challenge due to its sheer size and complexity. Geospatial data processing requires specialized tools and techniques, and even with these in place, it can still be slow and time-consuming.
Many Databricks customers have used Apache Sedona for this very purpose for several years, thanks to its optimized spatial algorithms and breadth of functions. Now, Wherobots is offering even faster and more robust tooling for use with Databricks.
Why integrate Wherobots with Databricks
Wherobots leverages the power of Apache Sedona to provide scalable and efficient data processing. With Wherobots, developers can easily ingest, process, and analyze geospatial data at scale, without the need for specialized hardware or software. This means that developers can spend less time worrying about infrastructure and more time focusing on building meaningful applications and insights from their geospatial data.
What you can do with Wherobots on Databricks
With Wherobots on Databricks, you can complete complicated Spatial Data + AI tasks and see significant speedups on even the simplest spatial operations.
In the example below, using Wherobots, we can do spatial joins, cast GPS coordinates to geospatial points, and convert serialized WKT geometries to Sedona geometries.
The first step is to load a dataset and convert the latitudes and longitudes to Sedona geospatial points.
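A minimal sketch of this step, assuming the taxi trips are already registered as a temporary view. The view name `taxi_trips` and the column names `pickup_longitude` and `pickup_latitude` are hypothetical placeholders, not names taken from the original notebook. The helper only builds the SQL text; running it requires a Sedona-enabled Spark session.

```python
def pickup_points_sql(table="taxi_trips",
                      lon_col="pickup_longitude",
                      lat_col="pickup_latitude"):
    """Build a Sedona SQL query that adds a point geometry per pickup.

    Table and column names are hypothetical placeholders.
    ST_Point takes x (longitude) first, then y (latitude).
    """
    return (
        f"SELECT *, ST_Point(CAST({lon_col} AS DOUBLE), "
        f"CAST({lat_col} AS DOUBLE)) AS pickup_point "
        f"FROM {table}"
    )


query = pickup_points_sql()
print(query)

# With a Sedona-enabled Spark session (assumed here to be named `sedona`):
# pickups = sedona.sql(query)
# pickups.createOrReplaceTempView("pickups")
```

Passing longitude before latitude matters here: swapping the two is a common source of points that land in the wrong hemisphere.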
After we have the taxi dataset, we will assemble a data frame of NYC zip codes. Each row contains a WKT-encoded geometry that represents the boundary of one zip code.
Finally, we execute a spatial join that adds a column to each pickup location indicating which zip code the pickup fell inside.
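The last two steps could be sketched roughly as follows. The view names (`zip_codes_raw`, `zip_zones`, `pickups`) and column names (`zip_code`, `wkt`, `pickup_point`) are assumptions made for illustration; `ST_GeomFromWKT` and `ST_Contains` are standard Sedona SQL functions.

```python
def zip_zones_sql(table="zip_codes_raw", zip_col="zip_code", wkt_col="wkt"):
    """Parse WKT boundary strings into Sedona geometries.

    Table and column names are hypothetical placeholders.
    """
    return (
        f"SELECT {zip_col} AS zip_code, "
        f"ST_GeomFromWKT({wkt_col}) AS zone_geom "
        f"FROM {table}"
    )


def pickup_zip_join_sql(points_view="pickups", zones_view="zip_zones"):
    """Spatial join: tag each pickup point with the zip code that contains it."""
    return (
        f"SELECT p.*, z.zip_code AS pickup_zip "
        f"FROM {points_view} AS p "
        f"JOIN {zones_view} AS z "
        f"ON ST_Contains(z.zone_geom, p.pickup_point)"
    )


zones_query = zip_zones_sql()
join_query = pickup_zip_join_sql()
print(zones_query)
print(join_query)

# With a Sedona-enabled Spark session (assumed here to be named `sedona`):
# sedona.sql(zones_query).createOrReplaceTempView("zip_zones")
# joined = sedona.sql(join_query)
```

Because this sketch uses an inner join, pickups that fall outside every zip code polygon are dropped. A `LEFT JOIN` would keep them with a null `pickup_zip`.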
Compare Wherobots and Open-source Apache Sedona
The Wherobots Compute engine is highly efficient at performing spatial data processing operations. When compared to OSS Apache Sedona, Wherobots Compute executes spatial join operations ten times faster while using only half the compute resources. Additionally, it provides comprehensive support for H3 cells.
Fast spatial query execution
We ran a series of spatial join queries on the following datasets:
Nodes: the planet OSM road network dataset, using only the nodes.
Edges: the planet OSM road network dataset, using only the road segments (i.e., edges).
Roads: the planet OSM road network dataset, using the full shape of each road. Each road has many segments.
Zones: all zip code zones in the world. Each zone is a polygon and indicates a city.
We report the performance gain of Wherobots over OSS Apache Sedona as follows:

| Spatial join query | Wherobots performance gain over Apache Sedona |
| --- | --- |
| Nodes within Zones join | 10 times faster |
| Edges overlap Zones join | 11 times faster |
| Roads overlap Zones join | 5 times faster |
Comprehensive support for Uber H3
Wherobots Compute provides the following functions related to Uber H3. More details can be found at https://docs.wherobots.services/references.
ST_H3CellIDs(geom: geometry, level: Int, fullCover: Boolean)
ST_H3CellDistance(cell1: Long, cell2: Long)
ST_H3KRing(cell: Long, k: Int, exactRing: Boolean)
Let's use the Seattle road network dataset (from OSM) as an example.
Create H3 cell IDs for geometries
You can create H3 cell IDs using ST_H3CellIDs as follows.
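As a rough sketch, the query might look like the one built below. The view name `seattle_roads` and the geometry column `geom` are hypothetical. The resolution check reflects the H3 specification, which defines resolutions 0 through 15.

```python
def h3_cells_sql(table="seattle_roads", geom_col="geom",
                 level=10, full_cover=False):
    """Build a query that adds an array of H3 cell IDs for each geometry.

    Table and column names are hypothetical placeholders.
    """
    if not 0 <= level <= 15:
        raise ValueError("H3 resolution must be between 0 and 15")
    flag = "true" if full_cover else "false"
    return (
        f"SELECT *, ST_H3CellIDs({geom_col}, {level}, {flag}) AS h3_cells "
        f"FROM {table}"
    )


cells_query = h3_cells_sql()
print(cells_query)

# With a Sedona-enabled Spark session (assumed here to be named `sedona`):
# sedona.sql(cells_query).createOrReplaceTempView("roads_h3")
```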
Visualize/Debug H3 cells
You can use ST_H3ToGeom to generate the boundary of an H3 cell given its ID. To demonstrate, we will plot a subset of these cells using the basic GeoPandas plotting function, although you may prefer a more advanced map visualization technique for plotting H3 grids. The resulting plot is shown below:
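The plotting step might be sketched as follows. This is an illustration under assumptions, not the original notebook code: the view `roads_h3` and column `h3_cells` are hypothetical, and the sketch assumes ST_H3ToGeom accepts an array of cell IDs. Because the shape of its result (a single geometry or an array of polygons) may depend on the Sedona version, the helper flattens both forms before plotting.

```python
def h3_boundaries_sql(table="roads_h3", cells_col="h3_cells", limit=100):
    """Build a query that turns H3 cell IDs into boundary geometries.

    Table and column names are hypothetical placeholders.
    """
    return (
        f"SELECT ST_H3ToGeom({cells_col}) AS cell_geom "
        f"FROM {table} LIMIT {int(limit)}"
    )


def flatten_cells(values):
    """Flatten a mix of single geometries and lists of geometries."""
    flat = []
    for value in values:
        if value is None:
            continue
        if isinstance(value, (list, tuple)):
            flat.extend(v for v in value if v is not None)
        else:
            flat.append(value)
    return flat


boundaries_query = h3_boundaries_sql()
print(boundaries_query)

# With a Sedona-enabled Spark session (assumed to be named `sedona`)
# and GeoPandas installed:
# import geopandas as gpd
# rows = sedona.sql(boundaries_query).collect()
# cells = flatten_cells(row.cell_geom for row in rows)
# gpd.GeoSeries(cells).plot(edgecolor="black", facecolor="none")
```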
Join geometries by H3
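A join on H3 cell IDs could look roughly like the sketch below. It assumes two views, `roads_h3` and `zones_h3`, that each carry an `id` column and an `h3_cells` array column; all of these names are hypothetical. Each array is exploded into one row per cell, and rows are matched on equal cell IDs.

```python
def h3_join_sql(left="roads_h3", right="zones_h3", cells_col="h3_cells"):
    """Build a query that pairs rows of two views sharing an H3 cell.

    View and column names (including `id`) are hypothetical placeholders.
    """
    return (
        f"WITH l AS (SELECT *, explode({cells_col}) AS cell FROM {left}), "
        f"r AS (SELECT *, explode({cells_col}) AS cell FROM {right}) "
        f"SELECT DISTINCT l.id AS left_id, r.id AS right_id "
        f"FROM l JOIN r ON l.cell = r.cell"
    )


h3_join_query = h3_join_sql()
print(h3_join_query)

# With a Sedona-enabled Spark session (assumed here to be named `sedona`):
# matches = sedona.sql(h3_join_query)
```

Sharing a cell only shows that two geometries are close at the chosen resolution, so the result is approximate. Where exact results matter, the candidate pairs can be refined with a predicate such as `ST_Intersects`.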
The example above demonstrates how to join two datasets together using their H3 cell IDs. Even more impressively, you can create a ring buffer around the original geometry using ST_H3KRing and find matches using the rings. The following query returns roads located within 10 cells of ST_POINT(-122.390, 47.54717658413222):
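One way such a query might be written is sketched below. It is an illustration, not the original query. It assumes a hypothetical view `roads_h3` with an `h3_cells` array column, takes the first element of the array returned by ST_H3CellIDs as the origin cell, and uses Spark's built-in `arrays_overlap` to test whether a road shares any cell with the neighborhood. Based on the parameter name, it also assumes that `exactRing = false` keeps every cell within distance k rather than only the outer ring.

```python
def roads_near_point_sql(lon, lat, k=10, level=10,
                         table="roads_h3", cells_col="h3_cells"):
    """Build a query for roads within k H3 cells of a point.

    Table and column names are hypothetical placeholders.
    """
    # ST_H3CellIDs returns an array, so [0] picks the single cell of the point.
    origin = f"ST_H3CellIDs(ST_Point({lon}, {lat}), {level}, false)[0]"
    # Assumption: exactRing = false keeps all cells with distance <= k.
    disk = f"ST_H3KRing({origin}, {k}, false)"
    return (
        f"SELECT r.* FROM {table} AS r "
        f"WHERE arrays_overlap(r.{cells_col}, {disk})"
    )


near_query = roads_near_point_sql(-122.390, 47.54717658413222)
print(near_query)

# With a Sedona-enabled Spark session (assumed here to be named `sedona`):
# nearby_roads = sedona.sql(near_query)
```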
We can visualize the ring of cells created by ST_H3KRing(ST_H3CellIDs(ST_POINT(-122.390, 47.54717658413222), 10, false)[0], 10, true) as follows:
Alternatively, we can fill in all cells inside the ring with ST_H3KRing(ST_H3CellIDs(ST_POINT(-122.390, 47.54717658413222), 10, false)[0], 10, false):
How to set up Wherobots on Databricks
At Wherobots, we have implemented our own automation that makes it easy to deploy Wherobots to your Databricks account with just a few clicks on our website, https://www.wherobots.services.
Create a Databricks cluster
To get started with Wherobots on Databricks, first create a Databricks cluster.
Log in to your Databricks workspace.
Under the Data Science & Engineering menu, choose Compute.
Click the Create compute button. You can use the default setting if you just want a quick start.
Create Personal Access Token (PAT)
Once you have created a cluster, you will need to create a personal access token (PAT) in Databricks that allows Wherobots to manage that cluster. In the user menu at the top right of the dashboard, click User Settings and then open the Access tokens tab to generate a token. Make sure to save it for the next step.
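Before handing the token to another service, it can be worth confirming that it works. The sketch below builds a request to the Databricks REST endpoint `/api/2.0/clusters/list` with the token as a bearer credential. The workspace URL and token value are placeholders, and the helper names are made up for this example.

```python
import json
import urllib.request


def clusters_list_request(workspace_url, token):
    """Build (but do not send) a PAT-authenticated cluster listing request."""
    url = workspace_url.rstrip("/") + "/api/2.0/clusters/list"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def list_cluster_names(workspace_url, token):
    """Send the request and return the names of the visible clusters."""
    request = clusters_list_request(workspace_url, token)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return [c.get("cluster_name") for c in payload.get("clusters", [])]


# Placeholder values; in practice read the token from a secret store or an
# environment variable rather than hard-coding it.
req = clusters_list_request("https://example.cloud.databricks.com/",
                            "dapiEXAMPLE")
print(req.full_url)
```

If `list_cluster_names` returns the cluster you just created, the token has the access it needs for the next step.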
Install Wherobots to your cluster via our website
With your credentials in hand, log in or create an account at https://wherobots.services. Input your Databricks credentials on your account page. Then, head to https://wherobots.services/dashboard/clusters and select the cluster you created. Finally, press the Set Up Wherobots button to begin the setup process.
You will be prompted to select a Wherobots version. Choose the recommended version.
Once the installation completes, you can start your cluster with the Start button!
Run Wherobots notebooks on Databricks
Now you have Wherobots set up for your cluster and can run spatial queries on Databricks. We provide many ready-to-use example Python Jupyter notebooks that work on Databricks. Please try them out: Wherobots examples.
1. Navigate to your Databricks workspace and import your notebook.
2. Select your notebook file. You can get one from Wherobots examples.
3. Now all that's left is to run the notebook. The Wherobots documentation includes many details on how to use Wherobots with Apache Spark (including Databricks).
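The import step can also be scripted instead of done through the UI. The sketch below prepares the JSON body for the Databricks workspace import endpoint (`POST /api/2.0/workspace/import`), which expects the notebook content as base64 text. The target path and the tiny notebook stub are placeholders, and the helper name is made up for this example.

```python
import base64
import json


def notebook_import_payload(notebook_bytes, workspace_path, overwrite=False):
    """Build the request body for importing a Jupyter notebook."""
    return {
        "path": workspace_path,
        "format": "JUPYTER",
        "content": base64.b64encode(notebook_bytes).decode("ascii"),
        "overwrite": overwrite,
    }


# Placeholder notebook content and target path.
payload = notebook_import_payload(
    b'{"cells": []}',
    "/Users/someone@example.com/wherobots-example",
)
print(json.dumps(payload, indent=2))

# The payload would be sent as the JSON body of a POST request to
# <workspace-url>/api/2.0/workspace/import, with the personal access
# token in an "Authorization: Bearer ..." header.
```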
Try it now!
Processing geospatial data used to be a challenging and time-consuming task. However, with Wherobots and Databricks, developers can easily ingest, process, and analyze geospatial data at scale. Wherobots is the ideal platform for developers looking to extract value from their geospatial data, with its powerful geospatial processing capabilities, unified data interface, and visualization tools. Whether you're building applications for logistics, environmental monitoring, or urban planning, Wherobots can help you unlock the full potential of your geospatial data.