Examples

The most common case is to download satellite tiles and rasterize KML polygon annotations onto them.

Replace buildings.kml below with the path to your own KML or GeoJSON annotation file.

region:
  west: 4.883
  south: 52.371
  east: 4.896
  north: 52.378
  zoom: 17
tiles:
  source: google_satellite
  strip_rows: 4
  max_connections: 16
  policy: lenient
labels:
  path: buildings.kml
  label_field: null  # all polygons get class_id = 1
  all_touched: false
sampler:
  patch_size: 256
  stride: 0
  mode: grid
  edge_strategy: drop
  max_empty_ratio: 0.95
writer:
  staging_dir: ./output
  image_format: png
split:
  test_ratio: 0.20
  val_ratio: 0.10
  seed: 42
  strategy: stratified
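The region box is given in longitude/latitude degrees, and zoom selects the tile level. To get a rough idea of how many tiles a region like this covers, the sketch below converts its corners to tile indices with the standard Web Mercator XYZ (slippy-map) formula. It assumes the tool uses that common tiling scheme; lonlat_to_tile and tile_range are hypothetical helpers, not part of the tool.

```python
import math


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Map a WGS84 point (degrees) to XYZ tile indices (Web Mercator)."""
    n = 2 ** zoom  # tiles per axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y


def tile_range(west, south, east, north, zoom):
    """Inclusive (min, max) tile index ranges covering the bounding box."""
    x_min, y_min = lonlat_to_tile(west, north, zoom)  # north-west corner (y grows southward)
    x_max, y_max = lonlat_to_tile(east, south, zoom)  # south-east corner
    return (x_min, x_max), (y_min, y_max)


# Region values copied from the example config above.
xs, ys = tile_range(4.883, 52.371, 4.896, 52.378, 17)
cols, rows = xs[1] - xs[0] + 1, ys[1] - ys[0] + 1
print(f"x {xs}, y {ys}: {cols} x {rows} = {cols * rows} tiles")
```

Raising zoom by one doubles the tile count along each axis, so it roughly quadruples the number of tiles to download for the same region.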

Multiclass dataset (GeoJSON with class field)

When your GeoJSON has a property that identifies each feature's class, pass that property name as label_field. Class IDs are assigned in the order the values are first encountered, starting at 1.

labels:
  path: landuse.geojson
  label_field: class  # e.g. "residential", "commercial", "park"
  all_touched: false

The resulting class_map, recorded in the manifest, gives the exact mapping from class name to ID.
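First-encounter ordering means the IDs depend on the order of features in the file, which is why the manifest is the place to check them. The sketch below reproduces that rule on a tiny in-memory FeatureCollection. build_class_map is a hypothetical helper, and skipping features that lack the field is an assumption, not documented behavior.

```python
def build_class_map(geojson: dict, label_field: str) -> dict:
    """Assign IDs to label values in first-encounter order, starting at 1."""
    class_map = {}
    for feature in geojson.get("features", []):
        value = (feature.get("properties") or {}).get(label_field)
        if value is None:
            continue  # assumption: features without the field are skipped
        if value not in class_map:
            class_map[value] = len(class_map) + 1  # next unused ID
    return class_map


landuse = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"class": "residential"}, "geometry": None},
        {"type": "Feature", "properties": {"class": "park"}, "geometry": None},
        {"type": "Feature", "properties": {"class": "residential"}, "geometry": None},
        {"type": "Feature", "properties": {"class": "commercial"}, "geometry": None},
    ],
}
print(build_class_map(landuse, "class"))
# {'residential': 1, 'park': 2, 'commercial': 3}
```

Moving the "park" feature to the end of the file would swap the IDs of park and commercial, so compare class maps across manifests before mixing datasets.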


For an imagery-only dataset, omit the labels section entirely. No Masks/ directory is written.

region:
  west: 74.274
  south: 31.568
  east: 74.327
  north: 31.613
  zoom: 19
tiles:
  source: esri_satellite
sampler:
  patch_size: 512
  stride: 0
  mode: grid
  edge_strategy: shift
writer:
  staging_dir: ./output
  image_format: jpg
  jpg_quality: 90
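The examples on this page use three edge_strategy values (drop, shift, and pad) without defining them. The sketch below shows one plausible reading along a single image axis, assuming stride: 0 means the stride equals patch_size (non-overlapping patches). grid_origins is a hypothetical helper, and the semantics encoded in it are assumptions rather than the tool's confirmed behavior.

```python
def grid_origins(length: int, patch: int, stride: int = 0, edge: str = "drop") -> list[int]:
    """Patch start offsets along one axis of an image `length` pixels long."""
    step = stride if stride > 0 else patch  # assumption: stride 0 = non-overlapping
    if edge == "pad":
        # Keep stepping until the axis is covered; the last patch overhangs and is padded.
        return list(range(0, length, step))
    if edge not in ("drop", "shift"):
        raise ValueError(f"unknown edge_strategy: {edge}")
    if length < patch:
        return []  # no full patch fits along this axis
    starts = list(range(0, length - patch + 1, step))
    if edge == "shift" and starts[-1] + patch < length:
        # Add one patch moved back so it ends exactly at the image border.
        starts.append(length - patch)
    return starts  # "drop": the uncovered remainder is discarded


for edge in ("drop", "shift", "pad"):
    print(edge, grid_origins(1300, 512, 0, edge))
# drop [0, 512]
# shift [0, 512, 788]
# pad [0, 512, 1024]
```

Under this reading, shift keeps full-resolution pixels at the border at the cost of one overlapping patch, while pad adds synthetic fill instead of overlap.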

Use labeled_ratios to generate labeled/unlabeled split files for semi-supervised training.

split:
  test_ratio: 0.20
  val_ratio: 0.10
  labeled_ratios: [0.05, 0.10, 0.20]
  seed: 42
  strategy: stratified

This writes 5/labeled.txt, 5/unlabeled.txt, 10/labeled.txt, 10/unlabeled.txt, 20/labeled.txt, and 20/unlabeled.txt inside staging_dir/splits/. Each directory is named after its ratio expressed as an integer percentage.
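As an illustration of how the ratios turn into directory names and file contents, the sketch below partitions a list of patch IDs once per ratio. It assumes the labeled/unlabeled split is drawn from the training patches only and that a single seeded shuffle is reused, so the 5% labeled set is contained in the 10% set, and so on. labeled_splits is a hypothetical helper, not the tool's API.

```python
import random


def labeled_splits(train_ids, labeled_ratios, seed=42):
    """Map each ratio's directory name to a (labeled, unlabeled) pair of ID lists."""
    rng = random.Random(seed)
    shuffled = list(train_ids)
    rng.shuffle(shuffled)  # one shuffle, so smaller labeled sets nest in larger ones
    splits = {}
    for ratio in labeled_ratios:
        name = str(round(ratio * 100))  # 0.05 -> "5"; round() avoids float truncation
        k = round(ratio * len(shuffled))
        splits[name] = (shuffled[:k], shuffled[k:])
    return splits


splits = labeled_splits(range(1000), [0.05, 0.10, 0.20], seed=42)
for name, (labeled, unlabeled) in splits.items():
    print(f"{name}/labeled.txt: {len(labeled)} ids, {name}/unlabeled.txt: {len(unlabeled)} ids")
```

Using round() for the directory name matters: int(0.29 * 100) evaluates to 28 because of floating-point error, while round() gives 29.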


To use a tile server other than the built-in sources, pass any XYZ URL containing {z}, {x}, and {y} placeholders via url_template instead of source.

tiles:
  url_template: "https://tile.example.com/{z}/{x}/{y}.png"
  strip_rows: 4
  max_connections: 8
  policy: lenient
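The placeholders are filled per tile with the zoom level and the tile's column and row. The sketch below shows that substitution and a basic check that all three placeholders are present. tile_url is a hypothetical helper, and the tile indices in the usage line are arbitrary example values.

```python
PLACEHOLDERS = ("{z}", "{x}", "{y}")


def tile_url(template: str, z: int, x: int, y: int) -> str:
    """Fill an XYZ url_template for a single tile."""
    missing = [p for p in PLACEHOLDERS if p not in template]
    if missing:
        raise ValueError(f"url_template is missing {missing}")
    # Plain substitution rather than str.format, so any other braces are left untouched.
    return template.replace("{z}", str(z)).replace("{x}", str(x)).replace("{y}", str(y))


template = "https://tile.example.com/{z}/{x}/{y}.png"
print(tile_url(template, 17, 67313, 43073))
# https://tile.example.com/17/67313/43073.png
```

Because the placeholders are matched by name rather than position, a server that orders its path as {z}/{y}/{x} is handled the same way by this sketch.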

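Use mode: random to draw patches at random positions instead of on a regular grid. This is useful for large regions where you only need a sample. random_count sets how many patches are drawn, and random_seed makes the draw reproducible.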

sampler:
  patch_size: 256
  mode: random
  random_count: 500
  random_seed: 42
  edge_strategy: pad
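A minimal sketch of seeded random sampling, assuming patch origins are drawn uniformly so each window stays inside the stitched image, and that patches may overlap. random_origins is a hypothetical helper, and the 6000 x 4000 image size is an illustrative value, not something the config specifies.

```python
import random


def random_origins(width, height, patch, count, seed):
    """Draw `count` reproducible top-left corners for patch x patch windows."""
    rng = random.Random(seed)  # local generator: same seed -> same patches
    # Keep every window inside the image; if the image is smaller than a patch,
    # the only origin is 0 and the overhang would be padded (edge_strategy: pad).
    max_x = max(width - patch, 0)
    max_y = max(height - patch, 0)
    return [(rng.randint(0, max_x), rng.randint(0, max_y)) for _ in range(count)]


origins = random_origins(width=6000, height=4000, patch=256, count=500, seed=42)
print(len(origins), origins[:3])
```

Using a dedicated random.Random instance keeps the draw independent of any other code that touches Python's global random state, which is what makes a fixed seed reproducible.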