SedonaSnow: Integrating Apache Sedona with Snowflake
Updated: Jun 26
Today, the Wherobots.ai team announces the beta release of SedonaSnow, a plugin that brings Apache Sedona spatial SQL functionality to Snowflake. SedonaSnow will be provided free of charge to all Snowflake users after they sign up for Wherobots. The plugin doubles the number of spatial SQL functions supported by Snowflake, and hence enables Snowflake users to do more with their geospatial data. In this blog post, we explain the details of the Sedona-Snowflake integration.
Why Integrate Apache Sedona with Snowflake?
Apache Sedona is a powerful open-source framework for large-scale geospatial data processing and analysis. It is used where data engineers need to process spatial data at scale within a complex data stack. Snowflake, on the other hand, is a cloud-based data warehousing platform that allows for seamless data integration and processing.
Many Snowflake users collect and store geospatial data in the platform. To meet this need, Snowflake released the GEOGRAPHY data type and corresponding geospatial functions in June 2021. Since then, new features have been added to fulfill computation demands.
Apache Sedona is a top computation engine in the geospatial domain, providing comprehensive geospatial data processing capabilities to popular computation engines in the industry. The Wherobots team has observed a significant overlap in user bases between Snowflake and Sedona. For users who have been utilizing Sedona Spatial SQL APIs, we aim to enable them to continue using their favorite Sedona-flavored ST functions through Snowflake SQL.
Use your favorite Sedona Spatial ST_* Functions natively in Snowflake SQL
Wherobots has wrapped 120+ Sedona SQL ST functions as Java functions that can be registered as UDFs (User-Defined Functions) in Snowflake SQL. This doubles the number of ST functions available in Snowflake SQL and enables you to do things that Snowflake's built-in geospatial functions do not currently support.
Popular ST functions such as ST_SubDivideExplode and ST_MakeValid, which are used by Sedona users, will now be available in Snowflake SQL. You will be able to call them as UDFs in Snowflake, just as you would in Sedona Spark SQL.
You can use S2 cells to generate cells from shapes and match shapes with cell IDs.
Below is an example query that reads points stored in GEOJSON format, extracts cell IDs for the points, and finds the points that match the cell IDs with a polygon window.
We also imported the predicates to allow for spatial range queries and joins.
Use Sedona and Snowflake ST functions in the same query
We have designed Sedona’s user-defined functions (UDFs) to integrate seamlessly with Snowflake's built-in geospatial features.
Universal Serde format
Wherobots uses the Extended Well-Known Binary (EWKB) format as the input and output format for Sedona ST functions. Snowflake, in turn, provides a constructor from EWKB for its GEOMETRY type, which makes it easy to combine Sedona and Snowflake spatial functions within the same SQL query.
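To make the interchange format concrete, here is a minimal Python sketch of what an EWKB payload looks like for a single 2D point. It follows the common PostGIS-style EWKB layout (byte-order marker, geometry type with an SRID flag, SRID, coordinates). The helper names `point_to_ewkb` and `read_ewkb_point` are hypothetical and are not part of Sedona or Snowflake; the sketch only illustrates the bytes that cross the boundary between the two function families.

```python
import struct

# PostGIS-style EWKB flag: set on the geometry type when an SRID follows it.
EWKB_SRID_FLAG = 0x20000000
WKB_POINT = 1  # WKB geometry type code for a point


def point_to_ewkb(x: float, y: float, srid: int) -> bytes:
    """Encode a 2D point as little-endian EWKB with an embedded SRID."""
    # Layout: byte order (1) | type + SRID flag (4) | SRID (4) | x (8) | y (8)
    return struct.pack("<BIIdd", 1, WKB_POINT | EWKB_SRID_FLAG, srid, x, y)


def read_ewkb_point(data: bytes):
    """Decode a 2D EWKB point and return (srid or None, x, y)."""
    byte_order = "<" if data[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(byte_order + "I", data, 1)
    offset = 5
    srid = None
    if geom_type & EWKB_SRID_FLAG:
        (srid,) = struct.unpack_from(byte_order + "I", data, offset)
        offset += 4
    # Anything other than a plain 2D point (including Z/M variants) is rejected.
    if geom_type & ~EWKB_SRID_FLAG != WKB_POINT:
        raise ValueError("only 2D points are handled in this sketch")
    x, y = struct.unpack_from(byte_order + "dd", data, offset)
    return srid, x, y


payload = point_to_ewkb(1.0, 2.0, 4326)
print(payload.hex())
print(read_ewkb_point(payload))
```

Because the SRID travels inside the binary value, a geometry can be handed from one function family to the other without a separate metadata column.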
You can mix calls to Sedona and Snowflake functions as follows:
To pass data to a Sedona function, convert it with the ST_ASEWKB(GEOMETRY) function. To switch back to Snowflake's built-in geometry functions, call TO_GEOMETRY on the binary output of the Sedona function and continue from there. The example below shows how you can mix function calls in a single SQL query. Note that the Sedona functions are called with the sedona. prefix (sedona.ST_*).
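The wrapping pattern described above (ST_ASEWKB on the way in, TO_GEOMETRY on the way out) can be sketched as a small Python helper that composes the SQL text. This is an illustration only, not the original example from this post: the helper `via_sedona`, the table `my_shapes`, and the column `geom` are hypothetical, and the sketch assumes the Sedona function takes the geometry as its first argument.

```python
def via_sedona(function_name: str, geometry_expr: str, *extra_args: str) -> str:
    """Wrap a Snowflake GEOMETRY expression in a call to a Sedona UDF.

    The input is converted to EWKB with ST_ASEWKB, passed to
    sedona.<function_name>, and the binary result is turned back into a
    Snowflake GEOMETRY with TO_GEOMETRY.
    """
    if not function_name.isidentifier():
        raise ValueError(f"not a valid function name: {function_name!r}")
    args = ", ".join((f"ST_ASEWKB({geometry_expr})",) + extra_args)
    return f"TO_GEOMETRY(sedona.{function_name}({args}))"


# Hypothetical table and column names; ST_AREA is Snowflake's built-in function.
repaired = via_sedona("ST_MakeValid", "geom")
query = f"SELECT ST_AREA({repaired}) AS area FROM my_shapes"
print(query)
```

The generated statement repairs each shape with Sedona's ST_MakeValid and then measures it with a built-in Snowflake function, all within one query.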
How to install Sedona in Snowflake
At Wherobots, we have implemented our own automation that makes it easy to deploy Sedona Snowflake to your Snowflake account with just a few clicks on our website, https://www.wherobots.services. Currently, we are actively working with Snowflake to make Sedona installation available via the Snowflake Marketplace.
Registration and deployment to Snowflake through our service are free.
Set Up Permissions for Wherobots through Snowflake
You can either use an existing database or create a new one for Wherobots to utilize. Wherobots will install Sedona onto the selected database. To enable Wherobots to deploy to your environment, you need to prepare the following:
Set Up Schema Permissions and Assign the Role to a User
Wherobots registers functions in a hardcoded schema called sedona. To enable this, you have two options:
Create the sedona schema yourself with minimal permissions
Allow Wherobots to create the schema for you
The last step is to assign the role to a user.
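The two options and the final role assignment can be sketched as a Python helper that prints candidate Snowflake statements. The exact privileges Wherobots requires are not listed in this post, so the set below is an assumption: the function `permission_statements`, the role name, and the specific grants are illustrative, and a real deployment may need more (for example, access to a warehouse or a stage). Review the statements before running them.

```python
from typing import List


def permission_statements(database: str, role: str, user: str,
                          precreate_schema: bool) -> List[str]:
    """Build candidate Snowflake statements for the setup described above.

    precreate_schema=True  -> option 1: you create the sedona schema yourself
                              and grant the role only what it needs inside it.
    precreate_schema=False -> option 2: the role may create the schema itself.
    """
    statements = [
        f"CREATE ROLE IF NOT EXISTS {role};",
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role};",
    ]
    if precreate_schema:
        statements += [
            f"CREATE SCHEMA IF NOT EXISTS {database}.sedona;",
            f"GRANT USAGE ON SCHEMA {database}.sedona TO ROLE {role};",
            f"GRANT CREATE FUNCTION ON SCHEMA {database}.sedona TO ROLE {role};",
        ]
    else:
        statements.append(
            f"GRANT CREATE SCHEMA ON DATABASE {database} TO ROLE {role};"
        )
    # Last step: assign the role to the user whose credentials Wherobots will use.
    statements.append(f"GRANT ROLE {role} TO USER {user};")
    return statements


# Hypothetical names, for illustration only.
for line in permission_statements("MY_DB", "WHEROBOTS_ROLE", "MY_USER", True):
    print(line)
```

Option 1 keeps the role's footprint limited to the sedona schema, while option 2 is less work up front but grants a broader privilege on the database.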
Get your Snowflake Credentials
To use Snowflake from Wherobots, you will need the following credentials:
Role (created above)
Steps to get your Account Identifier
Log in to your Snowflake account
Click the icon that says “Copy account identifier”
Deploy Sedona to Snowflake
Step 1: Navigate to the Wherobots Cloud Providers Page and select Snowflake
Step 2: Enter the credentials you obtained above. For the Account Identifier, paste the value you copied and replace the period with a hyphen (e.g., XXXXXXX.WLB00000 => XXXXXXX-WLB00000).
Step 3: Select the database you want to use and click “Set Up Sedona”.
Step 4: Click "install" on the dialog and wait for Wherobots to finish installing to your database.
Step 5: Done! Now you can revoke your user credentials in Wherobots. Wherobots won't use the credentials for purposes other than deploying the Sedona ST_* functions.
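The account identifier conversion in Step 2 is easy to get wrong when done by hand, so here is a minimal Python sketch of it. The function name `to_hyphenated_account_identifier` is hypothetical; it only performs the period-to-hyphen replacement described in Step 2 and does not contact Snowflake or Wherobots.

```python
def to_hyphenated_account_identifier(copied: str) -> str:
    """Convert a copied identifier such as ORG.ACCOUNT into ORG-ACCOUNT."""
    value = copied.strip()
    if value.count(".") > 1:
        raise ValueError(f"expected at most one period, got: {value!r}")
    # An identifier that already uses a hyphen passes through unchanged.
    return value.replace(".", "-")


print(to_hyphenated_account_identifier("XXXXXXX.WLB00000"))
```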
Try it yourself!
Wherobots has integrated Apache Sedona with Snowflake, enabling users to leverage Sedona's geospatial data processing functionality through Snowflake SQL. Wherobots has wrapped Sedona's SQL ST functions as Java functions that can be registered as UDFs in Snowflake SQL, and has engineered them to work alongside Snowflake's built-in geospatial capabilities. Wherobots has also developed its own automation, which makes it easy to deploy Sedona Snowflake to any Snowflake account. Now it is time to try it yourself!