Examples
Labeled dataset (KML annotations)
The most common case is to download satellite tiles and rasterize KML polygon annotations onto them.
Replace buildings.kml below with the path to your own KML or GeoJSON annotation file.
region:
  west: 4.883
  south: 52.371
  east: 4.896
  north: 52.378
  zoom: 17

tiles:
  source: google_satellite
  strip_rows: 4
  max_connections: 16
  policy: lenient

labels:
  path: buildings.kml
  label_field: null   # all polygons get class_id = 1
  all_touched: false

sampler:
  patch_size: 256
  stride: 0
  mode: grid
  edge_strategy: drop
  max_empty_ratio: 0.95

writer:
  staging_dir: ./output
  image_format: png

split:
  test_ratio: 0.20
  val_ratio: 0.10
  seed: 42
  strategy: stratified

Multiclass dataset (GeoJSON with class field)
When your GeoJSON has a property that identifies the class, pass it as label_field. Class IDs are assigned by encounter order starting at 1.
labels:
  path: landuse.geojson
  label_field: class   # e.g. "residential", "commercial", "park"
  all_touched: false

The returned class_map (visible in the manifest) tells you the exact name -> id mapping.
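As an illustration of what "encounter order starting at 1" means, here is a minimal Python sketch. The function name build_class_map and the sample values are hypothetical and are not part of the tool; the real mapping is whatever the manifest reports.

```python
def build_class_map(label_values):
    """Assign integer class IDs in order of first appearance, starting at 1."""
    class_map = {}
    for value in label_values:
        if value not in class_map:
            # The next unused ID is one more than the number of classes seen so far.
            class_map[value] = len(class_map) + 1
    return class_map


# Hypothetical values of the "class" property, in the order the features appear.
values = ["residential", "park", "residential", "commercial", "park"]
print(build_class_map(values))
# {'residential': 1, 'park': 2, 'commercial': 3}
```

Under this reading, reordering the features in the file would change the IDs, which is why checking class_map in the manifest is the safer habit.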
Unlabeled dataset (image-only)
Omit the labels section entirely. No Masks/ directory is written.
region:
  west: 74.274
  south: 31.568
  east: 74.327
  north: 31.613
  zoom: 19

tiles:
  source: esri_satellite

sampler:
  patch_size: 512
  stride: 0
  mode: grid
  edge_strategy: shift

writer:
  staging_dir: ./output
  image_format: jpg
  jpg_quality: 90

Semi-supervised dataset
Use labeled_ratios to generate labeled/unlabeled split files for semi-supervised training.
split:
  test_ratio: 0.20
  val_ratio: 0.10
  labeled_ratios: [0.05, 0.10, 0.20]
  seed: 42
  strategy: stratified

This writes 5/labeled.txt, 5/unlabeled.txt, 10/labeled.txt, 10/unlabeled.txt, and so on for each ratio, inside staging_dir/splits/. The directory name is the ratio as an integer percentage.
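The naming rule can be sketched in Python as follows. This is only an illustration: labeled_split_files is a hypothetical helper, the patch IDs are made up, and the selection here is a plain seeded shuffle rather than the stratified strategy configured above.

```python
import random
from pathlib import PurePosixPath


def labeled_split_files(train_ids, labeled_ratios, seed=42, root="splits"):
    """Return {relative file path: list of ids} for every labeled ratio."""
    files = {}
    for ratio in labeled_ratios:
        # Directory name: the ratio as an integer percentage (0.05 -> "5").
        percent = round(ratio * 100)
        # Re-seed per ratio so each split is reproducible on its own.
        rng = random.Random(seed)
        shuffled = list(train_ids)
        rng.shuffle(shuffled)
        n_labeled = round(len(shuffled) * ratio)
        folder = PurePosixPath(root) / str(percent)
        files[str(folder / "labeled.txt")] = sorted(shuffled[:n_labeled])
        files[str(folder / "unlabeled.txt")] = sorted(shuffled[n_labeled:])
    return files


# Hypothetical training patch IDs.
ids = [f"patch_{i:04d}" for i in range(100)]
for path, members in labeled_split_files(ids, [0.05, 0.10, 0.20]).items():
    print(path, len(members))
```

Using round() rather than int() for the percentage avoids off-by-one directory names when a ratio is not exactly representable in floating point.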
Custom tile source
Pass any XYZ URL with {z}, {x}, {y} placeholders via url_template instead of source.
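To show how such placeholders are typically filled in, here is a small Python sketch. It assumes the conventional XYZ (Web Mercator "slippy map") tiling scheme; the helper names lonlat_to_tile and tile_url are hypothetical, and how the tool itself requests tiles is not described in this document.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Conventional XYZ tile indices for a WGS84 longitude/latitude."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


def tile_url(url_template, z, x, y):
    """Substitute the {z}, {x}, {y} placeholders in an XYZ URL template."""
    return (
        url_template.replace("{z}", str(z))
        .replace("{x}", str(x))
        .replace("{y}", str(y))
    )


template = "https://tile.example.com/{z}/{x}/{y}.png"
x, y = lonlat_to_tile(4.883, 52.378, 17)  # a corner of the first example's region
print(tile_url(template, 17, x, y))
```

Plain string replacement is used instead of str.format so that a template containing other braces does not raise an error.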
tiles:
  url_template: "https://tile.example.com/{z}/{x}/{y}.png"
  strip_rows: 4
  max_connections: 8
  policy: lenient

Random patch sampling
Use mode: random to draw patches at random positions instead of a regular grid. Useful for large regions where you only need a sample.
sampler:
  patch_size: 256
  mode: random
  random_count: 500
  random_seed: 42
  edge_strategy: pad
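The idea of seeded random sampling can be sketched in Python as below. This is an assumption-laden illustration, not the tool's sampler: random_patch_origins is a hypothetical function, the mosaic size is made up, and the sketch keeps every patch fully inside the image, so it does not model what edge_strategy: pad does.

```python
import random


def random_patch_origins(width, height, patch_size, count, seed):
    """Draw `count` top-left corners of patches that fit inside the image."""
    if patch_size > width or patch_size > height:
        raise ValueError("patch_size exceeds the image dimensions")
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    return [
        (rng.randint(0, width - patch_size), rng.randint(0, height - patch_size))
        for _ in range(count)
    ]


# Hypothetical 4096 x 4096 pixel mosaic, mirroring the config values above.
origins = random_patch_origins(4096, 4096, patch_size=256, count=500, seed=42)
print(len(origins), origins[:3])
```

Because the generator is seeded, running the sketch twice with the same arguments yields the same positions, which is the property random_seed is meant to provide.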