Mo Sarwat

Wherobots provides ready-to-use, spatially indexed Overture Maps data for free

The Overture Maps Foundation (OMF) recently released its first open-source dataset, divided into four themes: places of interest, buildings, transportation networks, and administrative boundaries. The dataset was released as Parquet files stored on Amazon S3. Wherobots applauds and fully supports this effort, as it paves the way for more open, accurate, and usable maps. Kudos to the OMF team!

However, the Parquet format lacks native spatial data types and spatial indexing schemes suited for efficient spatial analysis, so loading and preparing the data is time-consuming. A format optimized for geospatial workloads, such as GeoParquet, can significantly reduce this pre-processing time.
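To make the cost concrete, here is a minimal sketch in plain Python. It assumes the geometry column holds WKB (well-known binary) blobs, which is an assumption about the OMF files rather than something stated in this post, and it handles only 2D points. The sample rows and function names are hypothetical. The point is that without a spatial type or index, every blob has to be decoded before a simple window filter can run.

```python
import struct


def encode_wkb_point(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (used here only to build sample data)."""
    # Byte-order flag 1 = little-endian, geometry type 1 = Point.
    return struct.pack("<BIdd", 1, 1, x, y)


def decode_wkb_point(blob: bytes) -> tuple[float, float]:
    """Decode a 2D WKB point into (x, y)."""
    endian = "<" if blob[0] == 1 else ">"
    geom_type, x, y = struct.unpack(endian + "Idd", blob[1:21])
    if geom_type != 1:
        raise ValueError("only WKB points are handled in this sketch")
    return x, y


def places_in_window(rows, min_x, min_y, max_x, max_y):
    """Full scan: every blob is decoded before the window test can run."""
    hits = []
    for name, blob in rows:
        x, y = decode_wkb_point(blob)
        if min_x <= x <= max_x and min_y <= y <= max_y:
            hits.append(name)
    return hits


# Hypothetical rows standing in for a Parquet table with a binary geometry column.
rows = [
    ("cafe", encode_wkb_point(-122.33, 47.61)),
    ("museum", encode_wkb_point(2.34, 48.86)),
]
print(places_in_window(rows, -123.0, 47.0, -122.0, 48.0))  # ['cafe']
```

A spatially aware format and engine can avoid much of this per-row work, for example by skipping whole chunks of data whose extent falls outside the query window.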

Wherobots engineers built a data pipeline that converts OMF data from Parquet to GeoParquet. The pipeline has five steps: (1) read the OMF Parquet files from the Overture Maps Foundation S3 bucket; (2) use Apache Sedona to convert the OMF Parquet files to GeoParquet; (3) store the OMF GeoParquet files in an S3 bucket managed by Wherobots; (4) users download or access the GeoParquet files for free; (5) users run spatial queries on the OMF GeoParquet files using Sedona Spatial SQL and the spatial Python API.
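The five steps above can be sketched with Apache Sedona's Python API roughly as follows. This is an illustration under stated assumptions, not the actual Wherobots pipeline: the two S3 paths are placeholders (the post does not give the bucket locations), the geometry column is assumed to be named `geometry` and to hold WKB, the helper names are hypothetical, and a Spark session with Sedona and S3 access is assumed to be configured already.

```python
# Placeholder locations: the post does not give the actual bucket paths.
OMF_PARQUET_PATH = "s3a://<omf-bucket>/<release>/theme=places"  # hypothetical
GEOPARQUET_PATH = "s3a://<your-bucket>/omf-geoparquet/places"   # hypothetical


def bbox_query(view: str, min_x: float, min_y: float, max_x: float, max_y: float) -> str:
    """Build a Sedona SQL range query over a rectangular window."""
    return (
        f"SELECT * FROM {view} "
        f"WHERE ST_Intersects(geometry, "
        f"ST_PolygonFromEnvelope({min_x}, {min_y}, {max_x}, {max_y}))"
    )


def convert_and_query():
    """Run the read, convert, store, load, and query steps end to end."""
    # Imported lazily so bbox_query can be used without Spark installed.
    from pyspark.sql.functions import expr
    from sedona.spark import SedonaContext

    sedona = SedonaContext.create(SedonaContext.builder().getOrCreate())

    # Steps 1-2: read the plain Parquet files and decode the (assumed) WKB
    # column into Sedona's geometry type.
    raw = sedona.read.format("parquet").load(OMF_PARQUET_PATH)
    geo = raw.withColumn("geometry", expr("ST_GeomFromWKB(geometry)"))

    # Step 3: write the result as GeoParquet.
    geo.write.format("geoparquet").mode("overwrite").save(GEOPARQUET_PATH)

    # Steps 4-5: load the GeoParquet files and run a spatial range query.
    places = sedona.read.format("geoparquet").load(GEOPARQUET_PATH)
    places.createOrReplaceTempView("places")
    return sedona.sql(bbox_query("places", -122.5, 47.4, -122.2, 47.7))
```

Users of the published GeoParquet files would only need the last part (steps 4 and 5), pointing the `geoparquet` reader at the Wherobots-managed bucket instead of converting the data themselves.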

With that, users can run spatial queries and data engineering tasks on the Overture Maps datasets 60x faster using Sedona with GeoParquet than with the plain Parquet format. More details are available in this Medium article.

Enjoy 🙂
