Introduction

mapcv turns a bounding box and a set of polygon labels into a ready-to-train satellite imagery dataset for semantic segmentation. You give it a geographic region, a tile source, and optionally a KML or GeoJSON annotation file. It fetches the tiles, rasterizes your labels onto the image grid, extracts fixed-size patches, and writes everything to disk.

mapcv has no GDAL dependency, which keeps it lightweight and easy to install. Tile stitching, rasterization, and patch sampling are implemented in Rust, while the Python layer handles orchestration, configuration, and access via the CLI or API.

Key concepts

Bounding box: The target area in WGS-84 (longitude/latitude). mapcv expands the box outward to the enclosing tile boundaries, so every downloaded tile is used whole rather than partially cut off.
Zoom level: Controls resolution. Higher zoom means more tiles and more detail. Each additional zoom level roughly quadruples the tile count, so zoom 19 has about 16x as many tiles as zoom 17 for the same area.
Strips: The tile grid is divided into horizontal bands of strip_rows tile rows. Strips are processed and written one at a time to cap memory use.
Patches: Fixed-size square crops extracted from the stitched image. Each patch has a matching mask file of the same name.
Class IDs: Mask pixels are 0 (background) or 1..255 (class). When label_field is null, all polygons get class_id = 1. When a field name is given, IDs are assigned by encounter order starting at 1.
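
The tile-grid concepts above can be illustrated with a short Python sketch. This is not mapcv's implementation: it assumes the tile source uses the common Web Mercator XYZ ("slippy map") scheme, and the function names lonlat_to_tile, tile_range, tile_count, and strips are hypothetical helpers written for this example. It shows how a WGS-84 bounding box snaps outward to whole tiles, how the tile count grows with zoom, and how tile rows split into bands of strip_rows.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) index of the XYZ tile containing a WGS-84 point.

    Assumes the standard Web Mercator tiling scheme (an assumption,
    not something confirmed by the mapcv documentation).
    """
    n = 2 ** zoom  # tiles per axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the east/south edge stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_range(west, south, east, north, zoom):
    """Snap a bounding box outward to the whole tiles that cover it."""
    x_min, y_min = lonlat_to_tile(west, north, zoom)  # top-left tile
    x_max, y_max = lonlat_to_tile(east, south, zoom)  # bottom-right tile
    return x_min, y_min, x_max, y_max


def tile_count(west, south, east, north, zoom):
    """Number of tiles needed to cover the box at the given zoom."""
    x_min, y_min, x_max, y_max = tile_range(west, south, east, north, zoom)
    return (x_max - x_min + 1) * (y_max - y_min + 1)


def strips(y_min, y_max, strip_rows):
    """Split tile rows y_min..y_max into inclusive bands of strip_rows rows."""
    return [
        (y, min(y + strip_rows - 1, y_max))
        for y in range(y_min, y_max + 1, strip_rows)
    ]
```

For instance, tile_count(-180, -85, 180, 85, 4) is 16 times tile_count(-180, -85, 180, 85, 2), matching the rule that each zoom level quadruples the tile count. For small regions the ratio is only approximate, because the box is snapped to whole tiles at each zoom.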

When to use mapcv

You have polygon annotations (KML or GeoJSON) and want pixel-level segmentation masks.
You want unlabeled, image-only patches: just omit the labels section.
You need a reproducible pipeline that goes from a config file to a ready-to-train dataset in one command.

Quick start guide

Get up and running with mapcv and generate your first dataset in three simple steps.

1. Install mapcv

Install the latest version of mapcv using pip.

pip install mapcv

2. Configure your dataset

Generate a template configuration file and customize it to fit your needs:

mapcv init my_dataset.yaml

Note: For details on all available settings, check out the Configuration Reference page.

3. Generate the dataset

Once your configuration file is ready, run the following command to start the generation process:

mapcv generate my_dataset.yaml

That’s it! Your generated dataset will be saved to the output directory you specified in your configuration file.
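
Because each patch has a matching mask file of the same name, a training script can pair them by filename. The sketch below is a minimal illustration, not part of mapcv: the pair_patches helper is hypothetical, and it assumes patches and masks are written to two separate directories. The actual layout of the output directory is not described here, so adjust the paths to whatever your run produces.

```python
from pathlib import Path


def pair_patches(image_dir, mask_dir):
    """Return sorted (image_path, mask_path) pairs that share a filename.

    The two-directory layout is an assumption made for this example.
    Images without a same-named mask are skipped, which also covers
    image-only (unlabeled) datasets by returning an empty list.
    """
    masks = {p.name: p for p in Path(mask_dir).iterdir() if p.is_file()}
    pairs = []
    for image in sorted(Path(image_dir).iterdir()):
        if image.is_file() and image.name in masks:
            pairs.append((image, masks[image.name]))
    return pairs
```

A call might look like pair_patches("out/images", "out/masks"), where both directory names are placeholders rather than paths mapcv is known to create.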

Citation

If you use mapcv in your research, please consider citing it.

(Citation information will be added after publication.)