Synthetic and Manipulated Overhead Imagery For Forensics Research
Brandon B. May, Kirill Trapeznikov, Shengbang Fang, Matthew C. Stamm
Comprehensive Dataset of Synthetic and Manipulated Overhead Imagery For Development and Evaluation of Forensic Tools
https://arxiv.org/abs/2305.05784
@inproceedings{may2023comprehensive,
title={Comprehensive Dataset of Synthetic and Manipulated Overhead Imagery for Development and Evaluation of Forensic Tools},
author={May, Brandon B and Trapeznikov, Kirill and Fang, Shengbang and Stamm, Matthew},
booktitle={Proceedings of the 2023 ACM Workshop on Information Hiding and Multimedia Security},
pages={145--150},
year={2023}
}
Dataset Overview

Sources
Pristine Tile Providers

City Conditioning
Inpainting Category
| Inpainting Category | Region Type | Manipulated 🔄 Pristine | Mask |
|---|---|---|---|
| Greenspace | Bezier | ![]() | ![]() |
| Buildings | Bezier | ![]() | ![]() |
| Greenspace | GraphCut | ![]() | ![]() |
| Buildings | GraphCut | ![]() | ![]() |
Inpainting Size
| Inpainting Area Size | Manipulated 🔄 Pristine | Mask |
|---|---|---|
| Extra Small | ![]() | ![]() |
| Small | ![]() | ![]() |
| Medium | ![]() | ![]() |
| Large | ![]() | ![]() |
Fully Synthetic Categories

Dataset Stats
Train Set
The train set consists of 10,000 images in total:
- 4,964 pristine (unmanipulated)
- 2,539 fully synthetic
- 2,497 partially manipulated (with manipulated regions between 1/16 and 1/4 of the image area)
  - 1,260 `buildings_roads` manipulation class
  - 1,237 `greenspace_water` manipulation class
Each image has a corresponding binary mask image that indicates the manipulated region with the color white.
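The manipulated fraction of an image can be recovered from its mask by counting white pixels. The sketch below is illustrative only: it assumes the mask has already been decoded into rows of 8-bit grayscale values in which white is 255 (the dataset description only says "white"), and the function name `manipulated_fraction` and the toy mask are hypothetical. Reading a real mask file would need an image library, which is not shown here.

```python
from typing import Sequence


def manipulated_fraction(mask: Sequence[Sequence[int]], white: int = 255) -> float:
    """Return the fraction of pixels equal to `white` in a 2-D grayscale mask.

    Assumption: manipulated pixels are stored as 255 and all others as 0.
    """
    total = sum(len(row) for row in mask)
    if total == 0:
        raise ValueError("mask is empty")
    manipulated = sum(1 for row in mask for px in row if px == white)
    return manipulated / total


# Toy 4x4 stand-in for a real mask: a 2x2 white block covers 1/4 of the area.
toy_mask = [
    [0, 0, 0, 0],
    [0, 255, 255, 0],
    [0, 255, 255, 0],
    [0, 0, 0, 0],
]

fraction = manipulated_fraction(toy_mask)
print(fraction)  # 0.25
```

A value computed this way can be compared against the 1/16 and 1/4 area bounds quoted for the partially manipulated images.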
Test Set
The test set consists of 3,150 images in total:
- 1,511 pristine (unmanipulated)
- 751 fully synthetic
- 888 partially manipulated (including 150 with manipulated regions < 1/16 of the image area)
  - 432 `buildings_roads` manipulation class
  - 456 `greenspace_water` manipulation class
Each image has a corresponding binary mask image that indicates the manipulated region with the color white.
Dataset Download
Metadata CSV Format
Each dataset is provided with a metadata.csv file which contains detailed information about each image. The following columns are particularly relevant for training and testing:
- `image_filename`: Relative path to the image, e.g. `im0000.png`
- `mask_filename`: Relative path to the corresponding binary mask, e.g. `im0000_mask.png`
- `manipulated_fraction`: [0-1], fraction of the image that has been manipulated
- `manipulation_class`:
  - `buildings_roads`
  - `greenspace_water`
  - Empty for pristine and fully synthetic images
- `manipulation_type`:
  - `pristine`: Unmanipulated
  - `generated_uncond`: Fully synthetic, no basemap used
  - `generated_cond`: Fully synthetic, generated basemap used
  - `generated_real`: Fully synthetic, real basemap used
  - `inpainted_struct`: Partially manipulated, `buildings_roads`
  - `inpainted_unstruct`: Partially manipulated, `greenspace_water`
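The columns above can be read with Python's standard `csv` module. The following is a minimal sketch, not an official loader: the sample rows are invented for illustration (including the fraction values shown for pristine and fully synthetic images), and the coarse label names and the helper `load_rows` are hypothetical. Only the column names and `manipulation_type` values come from the description above.

```python
import csv
import io
from collections import Counter

# Hypothetical grouping of manipulation_type values into three coarse labels.
COARSE_LABEL = {
    "pristine": "pristine",
    "generated_uncond": "fully_synthetic",
    "generated_cond": "fully_synthetic",
    "generated_real": "fully_synthetic",
    "inpainted_struct": "partially_manipulated",
    "inpainted_unstruct": "partially_manipulated",
}

# Invented rows standing in for a real metadata.csv (values are assumptions).
SAMPLE_CSV = """image_filename,mask_filename,manipulated_fraction,manipulation_class,manipulation_type
im0000.png,im0000_mask.png,0.0,,pristine
im0001.png,im0001_mask.png,1.0,,generated_uncond
im0002.png,im0002_mask.png,0.12,buildings_roads,inpainted_struct
im0003.png,im0003_mask.png,0.04,greenspace_water,inpainted_unstruct
"""


def load_rows(handle):
    """Parse metadata rows, converting the fraction and adding a coarse label."""
    rows = []
    for row in csv.DictReader(handle):
        row["manipulated_fraction"] = float(row["manipulated_fraction"])
        row["coarse_label"] = COARSE_LABEL[row["manipulation_type"]]
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))
counts = Counter(r["coarse_label"] for r in rows)

# Partially manipulated images whose manipulated region is below 1/16 of the area.
small = [
    r["image_filename"]
    for r in rows
    if r["coarse_label"] == "partially_manipulated"
    and r["manipulated_fraction"] < 1 / 16
]
print(counts["partially_manipulated"], small)
```

For a real split, `load_rows` would be called on a file opened with `open(path, newline="")` in place of the `io.StringIO` wrapper; any extra columns in the file are kept untouched in each row dictionary.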
Additional Generation Modes
Partial Manipulations

Natural Disaster Simulation

Contact
For more information: Kirill Trapeznikov, kirill.trapeznikov@str.us; Brandon May, brandonbmay@gmail.com
Acknowledgment
This material is based upon work supported by DARPA under Contract No. HR0011-20-C-0129. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of DARPA.
DISTRIBUTION A. Approved for public release: distribution unlimited.















