Anonymous, The Danbooru Community, & Gwern Branwen; Danbooru2020: A Large-Scale Crowdsourced and Tagged Anime Illustration Dataset, 2021-01-12.

That should give you Danbooru2017 bit-identical to the version released on 2018-02-13. A combination of an n = 300k subset of the 512px SFW subset of Danbooru2017 and Nagadomi's moeimouto face dataset is available as a Kaggle-hosted dataset: Tagged Anime Illustrations (36GB).

Image boorus like Danbooru are image hosting websites developed by the anime community for collaborative tagging. These tags form a folksonomy to describe aspects of images. Beyond the expected tags like long_hair or looking_at_the_viewer, there are many strange and unusual tags, including many anime- or illustration-specific tags like seiyuu_connection (images where the joke is based on knowing that two characters in different anime are voiced by the same voice actor) or bad_feet (artists frequently draw two left feet by accident, or just bad_anatomy in general). With the exception of MNIST & Omniglot, almost all commonly used deep learning image datasets are photographic.

So the absence of a tag isn't as informative as the presence of a tag: eyeballing images and some rarer tags, I would guess that tags are present <10% of the time they should be.

Similar face datasets: DanbooruAnimeFaces:revamped (DAF:re) (Rios et al 2021), a reprocessed dataset using n = 460k larger 224px images of 300k characters.

Danbooru20xx datasets have been extensively used in projects & machine learning research.

Proposed fix: in Danbooru2019+'s 512px SFW subset, the downscaling has switched to adding white backgrounds rather than black backgrounds. The same issue can still arise with white line-art drawings on transparent backgrounds, but these are much rarer.
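To illustrate why the background color chosen during downscaling matters, here is a minimal sketch of flattening transparent pixels onto an opaque background. It assumes the issue is line-art that has the same color as the fill background becoming invisible, and it assumes straight (non-premultiplied) 8-bit RGBA pixels. The helpers `flatten_pixel` and `distinguishable` are hypothetical and are not the dataset's actual preprocessing code.

```python
# Hypothetical sketch of alpha flattening; NOT the actual Danbooru20xx preprocessing code.
# Assumes straight (non-premultiplied) 8-bit RGBA pixels.
from typing import Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]

BLACK: RGB = (0, 0, 0)
WHITE: RGB = (255, 255, 255)


def flatten_pixel(pixel: RGBA, background: RGB) -> RGB:
    """Composite one RGBA pixel over an opaque background color ("over" operator)."""
    r, g, b, a = pixel
    alpha = a / 255
    return tuple(
        round(channel * alpha + bg * (1 - alpha))
        for channel, bg in zip((r, g, b), background)
    )


def distinguishable(p1: RGBA, p2: RGBA, background: RGB) -> bool:
    """True if two pixels still differ after both are flattened onto `background`."""
    return flatten_pixel(p1, background) != flatten_pixel(p2, background)


if __name__ == "__main__":
    black_line = (0, 0, 0, 255)        # opaque black line-art stroke
    white_line = (255, 255, 255, 255)  # opaque white line-art stroke
    transparent = (0, 0, 0, 0)         # fully transparent background pixel

    for name, bg in (("black background", BLACK), ("white background", WHITE)):
        print(name,
              "| black lines visible:", distinguishable(black_line, transparent, bg),
              "| white lines visible:", distinguishable(white_line, transparent, bg))
```

On a black background, the black strokes flatten to the same color as the transparent area and vanish. On a white background they survive, while white strokes vanish instead, which matches the remaining (rarer) failure case noted above.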
