
Face Manipulation Datasets

Comprehensive Dataset of Face Manipulations for Development and Evaluation of Forensic Tools.
Brian DeCann (brian.decann@str.us), Kirill Trapeznikov (kirill.trapeznikov@str.us), Technical Report.

CelebHQ-FM (Face Manipulations)

We compiled a dataset of edited portrait-style images. The image data was sourced from a subset of the CelebA-HQ dataset [1], restricted to identities that appear at least twice (i.e., there are at least two images of a given identity).

Manipulation Model: We applied the Pivotal Tuning approach by Roich et al. to create each manipulated image [2].

We created two partitions of image data for training and testing purposes.

Evaluation Protocol

For our portrait-style face manipulation dataset, we supply two challenges: detection and classification. Each challenge and its expected outputs are described in the following sections.

Download

        Train      Test
images  train.zip  test.zip
meta    train.csv  test.csv
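The metadata CSVs can be inspected with Python's standard csv module. The sketch below uses a hypothetical column layout (`filename`, `label`, `edit_type`) purely for illustration — check the actual header row of train.csv after downloading:

```python
import csv
import io

# Hypothetical metadata layout -- the real train.csv columns may differ;
# inspect the header row after downloading. An in-memory sample stands in
# for the downloaded file here.
sample = io.StringIO(
    "filename,label,edit_type\n"
    "00001.png,real,none\n"
    "00002.png,manipulated,smile\n"
)

rows = list(csv.DictReader(sample))          # one dict per image entry
real = [r for r in rows if r["label"] == "real"]
print(f"{len(rows)} entries, {len(real)} unedited")
```

For the real files, replace the `io.StringIO` sample with `open("train.csv", newline="")`.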

FFHQ-FM (Face Manipulations) in-the-wild

We compiled a dataset of edited in-the-wild-style images. The image data was sourced from a subset of the Flickr-Faces-HQ (FFHQ) dataset [3].

Manipulation Model: We adopt the approach of Tzaban et al. to inject edits into in-the-wild images [4]. Edits are localized to a region of the full-scene image. This contrasts with the portrait-style face manipulation dataset, where images are fully synthesized by face-based GANs.

We created two partitions of image data for training and testing (validation) purposes.

Evaluation Protocol

We supply three challenges: detection, localization, and classification. Each challenge and its expected outputs are described in the following sections.
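For the localization challenge, a common way to score a predicted manipulation mask against the ground-truth edited region is intersection-over-union (IoU). The NumPy sketch below is an illustrative scoring assumption, not the official protocol:

```python
import numpy as np

def mask_iou(pred: np.ndarray, truth: np.ndarray) -> float:
    """Intersection-over-union between two boolean manipulation masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:  # both masks empty: treat as perfect agreement
        return 1.0
    inter = np.logical_and(pred, truth).sum()
    return inter / union

# Toy 4x4 example: prediction and truth each mark 2 pixels, sharing 1.
pred = np.zeros((4, 4), dtype=bool)
truth = np.zeros((4, 4), dtype=bool)
pred[0, 0:2] = True    # predicted edited region
truth[0, 1:3] = True   # ground-truth edited region
print(mask_iou(pred, truth))  # 1 shared pixel / 3 total -> 0.333...
```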

In our in-the-wild face manipulation dataset, the edit types present in the training partition are the same as those in the testing partition, so the classification problem for this dataset is closed-set. This is in contrast to the portrait-style data, where the testing partition contains novel edit types. We encourage users of this data and challenge problem to consider open-set solutions, as the set of potential edit types is effectively unlimited.
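One simple open-set baseline for the portrait-style classification task is to threshold the classifier's maximum softmax probability and reject low-confidence samples as an unseen edit type. A minimal sketch — the threshold value and the edit-type names are illustrative assumptions, not part of the dataset:

```python
import numpy as np

def classify_open_set(logits: np.ndarray, classes: list, threshold: float = 0.7) -> str:
    """Return the predicted edit type, or 'unknown' when the max
    softmax probability falls below the rejection threshold."""
    z = logits - np.max(logits)            # shift for numerical stability
    p = np.exp(z) / np.sum(np.exp(z))      # softmax probabilities
    i = int(np.argmax(p))
    return classes[i] if p[i] >= threshold else "unknown"

classes = ["smile", "age", "hairstyle"]    # hypothetical edit types
print(classify_open_set(np.array([4.0, 0.5, 0.2]), classes))  # confident -> "smile"
print(classify_open_set(np.array([1.0, 0.9, 0.8]), classes))  # ambiguous -> "unknown"
```

More principled open-set methods exist (e.g., distance-based rejection in feature space), but confidence thresholding is a reasonable starting point.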

Download

        Train      Test      Test (Diffusion Model)
images  train.zip  test.zip  test_dm.zip
meta    train.csv  test.csv  test-dm.csv

Contact

Kirill Trapeznikov kirill.trapeznikov@str.us
Brian DeCann brian.decann@str.us

References

[1] C.-H. Lee, Z. Liu, L. Wu, and P. Luo. Maskgan: Towards diverse and interactive facial image manipulation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.

[2] D. Roich, R. Mokady, A. H. Bermano, and D. Cohen-Or. Pivotal tuning for latent-based editing of real images. arXiv preprint arXiv:2106.05744, 2021.

[3] T. Karras, S. Laine, and T. Aila. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4401–4410, 2019.

[4] R. Tzaban, R. Mokady, R. Gal, A. H. Bermano, and D. Cohen-Or. Stitch it in time: GAN-based facial editing of real videos. arXiv preprint arXiv:2201.08361, 2022.

[5] C. Yu, J. Wang, C. Peng, C. Gao, G. Yu, and N. Sang. BiSeNet: Bilateral segmentation network for real-time semantic segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), pages 325–341, 2018.

How to Cite

Please include the following citation when utilizing this dataset:

@misc{https://doi.org/10.48550/arxiv.2208.11776,
  doi = {10.48550/ARXIV.2208.11776},
  url = {https://arxiv.org/abs/2208.11776},
  author = {DeCann, Brian and Trapeznikov, Kirill},
  keywords = {Computer Vision and Pattern Recognition (cs.CV), Multimedia (cs.MM), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {Comprehensive Dataset of Face Manipulations for Development and Evaluation of Forensic Tools},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}

Acknowledgment

This material is based upon work supported by DARPA under Contract No. HR0011-20-C-0129. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of DARPA.

DISTRIBUTION A. Approved for public release: distribution unlimited.