New OSM file format: 30% smaller than PBF, 5x faster to import

The OSM dataset is huge, and keeps growing every day. Great news, of course, but sometimes the sheer volume can be overwhelming – there are just gobs and gobs of data!

Hence, we created GOB (“Geo-Object Bundle”), a new file format that makes tackling OSM data faster and easier. It’s a companion format to our now-familiar Geo-Object Library (GOL): essentially, a GOB is a tightly compressed GOL with its indexes stripped.

To support this new format, GOL Tool 2.1 has two new commands: save, which exports a GOL as a GOB, and load, which imports a GOB into a GOL. (Of course, like all of the GeoDesk Toolkit, the GOL Tool is free and open-source.)

  • GOB files are on average half the size of a GOL, and 30% smaller than PBFs.

  • Importing a GOB is 5 times faster than building a GOL from a PBF. A modern system loads a planet-size GOB into a GOL in 3 minutes. The speed advantage grows more pronounced on memory-constrained machines: gol build starts paging heavily with less than 32 GB of RAM, whereas gol load requires minimal resources (even a decade-old laptop loads the whole planet in under an hour).

  • GOBs are organized into tiles, so it’s easy to extract regional subsets (basically at file-copy speed) and stitch them back together; that makes GOB a convenient format for archiving and distributing geodata.

The image above shows some of the tiling structure, which mimics that of tile renderers. On the left, the smallest squares are zoom 6; the right shows the most granular level (zoom 12). A typical planet GOB has about 60,000 tiles.
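To get a feel for this kind of tiling, here is a minimal Python sketch of the standard "slippy map" scheme that tile renderers use. It assumes GOB tiles follow the same Web Mercator numbering, which the description above only implies; the function name `lonlat_to_tile` is made up for illustration and is not part of the GeoDesk Toolkit.

```python
import math

def lonlat_to_tile(lon, lat, zoom):
    """Return the (column, row) of the slippy-map tile containing a point.

    Assumption: standard Web Mercator tile numbering as used by tile
    renderers; the actual GOB tile layout may differ in detail.
    """
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    col = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    row = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the far edges still map to a valid tile
    col = max(0, min(col, n - 1))
    row = max(0, min(row, n - 1))
    return col, row

# Number of tiles in a complete grid at zoom 6 and at zoom 12
full_grid_z6 = 4 ** 6
full_grid_z12 = 4 ** 12
```

A complete zoom-12 grid would hold about 16.8 million tiles, so a planet GOB with roughly 60,000 tiles presumably uses coarse tiles where data is sparse and zoom-12 tiles only where it is dense.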

Below are some size statistics for the planet file and popular regional extracts (without metadata):

                  PBF       GOL   vs. PBF       GOB   vs. PBF
Planet        65.4 GB   93.6 GB    +43.1%   46.0 GB    -29.7%
California    1.18 GB   1.59 GB    +35.0%    770 MB    -36.5%
France        4.54 GB   5.89 GB    +29.7%   2.84 GB    -36.3%
Germany       4.29 GB   5.92 GB    +38.0%   2.67 GB    -37.5%
Italy         1.96 GB   2.63 GB    +34.0%   1.34 GB    -31.6%
Japan         2.13 GB   2.91 GB    +36.1%   1.34 GB    -37.0%
Poland        1.84 GB   2.72 GB    +47.6%   1.29 GB    -29.7%
Switzerland    487 MB    634 MB    +30.1%    311 MB    -36.2%

Dense, well-mapped areas tend to compress best as GOB. Less complete regions are below average in terms of GOB’s size advantage (GOBs for Brazil and China are only 23% smaller).
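The percentages in the table are size differences relative to the PBF. As a quick check, this small Python sketch recomputes the planet row; the helper name `relative_size` is only illustrative.

```python
def relative_size(size, pbf_size):
    """Percentage difference of `size` relative to `pbf_size` (same units)."""
    return (size - pbf_size) / pbf_size * 100.0

# Planet row of the table, all values in GB
pbf, gol, gob = 65.4, 93.6, 46.0

print(f"GOL vs. PBF: {relative_size(gol, pbf):+.1f}%")
print(f"GOB vs. PBF: {relative_size(gob, pbf):+.1f}%")
print(f"GOB vs. GOL: {gob / gol:.0%} of the size")
```

For the planet, this gives +43.1% for the GOL and -29.7% for the GOB, with the GOB at about 49% of the GOL's size. Recomputing the regional rows from the rounded sizes shown can give slightly different figures.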

Just like GOLs, GOBs don’t store:

  • metadata (timestamp of last edit, changeset, username, etc.)

  • history (each GOB is a snapshot of the OSM dataset)

Therefore, the GOB format is not intended for editing, but for archival and distribution.

You will need GOL Tool 2.1 or above (download).

To export a GOL as a GOB:

gol save <gol-file> [<gob-file>]

If <gob-file> is omitted, the GOB uses the same base name as the GOL. The .gol and .gob extensions are optional.

To limit the export to a specific area, use the --area (-a) option. You can specify a (multi)polygon as WKT, GeoJSON or simple coordinates (lon,lat pairs; rings are closed automatically), either directly or as a file. If no file extension is given, .wkt is assumed.

For example:

gol save world bodensee -a 9.55,47.4,8.78,47.66,9.01,47.88,9.85,47.58,9.82,47.46 

exports the tiles covering the region around the Bodensee (Lake Constance).
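To make the simple-coordinates form concrete, here is a rough Python sketch that turns such a lon,lat list into the equivalent WKT polygon, closing the ring if needed. It only illustrates the notation; how the GOL Tool parses the option internally is not shown here, and `coords_to_wkt` is a hypothetical helper.

```python
def coords_to_wkt(coords):
    """Convert 'lon,lat,lon,lat,...' into a WKT polygon with a closed ring."""
    tokens = [t.strip() for t in coords.split(",")]
    if len(tokens) < 6 or len(tokens) % 2 != 0:
        raise ValueError("expected at least three lon,lat pairs")
    for t in tokens:
        float(t)  # raises ValueError if a token is not a number
    # Pair up the tokens as (lon, lat), keeping the original number text
    points = list(zip(tokens[0::2], tokens[1::2]))
    if points[0] != points[-1]:
        points.append(points[0])  # close the ring
    ring = ", ".join(f"{lon} {lat}" for lon, lat in points)
    return f"POLYGON (({ring}))"

bodensee = "9.55,47.4,8.78,47.66,9.01,47.88,9.85,47.58,9.82,47.46"
print(coords_to_wkt(bodensee))
```

Saved to a file such as bodensee.wkt, the resulting polygon could be passed to -a by name instead of typing the coordinates inline.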

To import tiles into a GOL:

gol load <gol-file> [<gob-file>]

As with save, if <gob-file> is omitted, the base name of the GOL is used. If the GOL does not exist, it is created. To load only a specific region, use the -a option.

gol load japan -a shikoku

loads tiles from japan.gob into japan.gol (creating it if it doesn’t yet exist), but only those intersecting the area defined in shikoku.wkt.
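The file-name defaults described for save and load can be summarized in a short Python sketch. This is an illustration of the rules as stated here, not the GOL Tool's actual code; `resolve_names` and `with_default_ext` are hypothetical names, and the area is assumed to be given as a file name rather than as inline coordinates.

```python
from pathlib import Path

def with_default_ext(name, ext):
    """Append `ext` only if `name` has no file extension of its own."""
    return name if Path(name).suffix else name + ext

def resolve_names(gol, gob=None, area_file=None):
    """Return (gol_path, gob_path, area_path) after applying the defaults."""
    gol_path = with_default_ext(gol, ".gol")
    if gob is None:
        # No GOB given: reuse the base name of the GOL
        gob_path = str(Path(gol_path).with_suffix(".gob"))
    else:
        gob_path = with_default_ext(gob, ".gob")
    # Assumption: the area argument is a file name; .wkt is the default
    area_path = with_default_ext(area_file, ".wkt") if area_file else None
    return gol_path, gob_path, area_path

print(resolve_names("japan", area_file="shikoku"))
print(resolve_names("world", "bodensee"))
```

Note that this simple check treats anything after the last dot as an extension, so a base name containing a dot would need extra handling.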

This is still a work in progress, so the format may change. I’m experimenting with different compression algos beyond zlib to make it even tighter and faster (zstd didn’t yield any significant gains). I’m also in the process of enabling gol load to download a GOB directly from a URL and build the GOL in the background, which would bring the wall-clock import time to zero.

As always, questions and feedback are welcome! Please stop by on GitHub or at @geodesk@en.osm.town.
