Create Anime Characters with A.I.!
We all love anime characters and are tempted to create our own, but most of us simply cannot because we are not professional artists. What if anime characters could be generated automatically at professional-level quality? Imagine that you just specify some attributes such as blonde + twin tail + smile, and an anime character with your customization is generated without any further intervention!
There are already some pioneers in anime generation, such as ChainerDCGAN, Chainerを使ってコンピュータにイラストを描かせる ("Making a Computer Draw Illustrations with Chainer"), and openly available code such as IllustrationGAN and AnimeGAN. But very often the results generated by these models are blurred and distorted, and it is still a challenge to generate industry-standard facial images for anime characters. As a step towards tackling this challenge, we propose a model that produces high-quality anime faces with a promising rate of success.
Dataset: a Model of Good Quality Begins with a Clean Dataset.
Teaching a computer to do things requires high-quality data, and our case is no exception. Large-scale image boards like Danbooru and Safebooru are noisy, and we think this is at least part of the reason for the issues in previous works. Instead, we use standing pictures (立ち絵) from games sold on Getchu, a website that provides information about and sells Japanese games. Standing pictures are diverse enough, since they come in different styles from games with a diverse set of themes, yet consistent, since they all belong to the domain of character images. We also need categorical metadata (a.k.a. tags/attributes) for the images, such as hair color and whether the character is smiling. Getchu does not provide such metadata, so we use Illustration2Vec, a CNN-based tool for estimating tags of anime images.
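As a rough illustration of how estimated tags can become training labels, the sketch below turns per-tag confidence scores into a 0/1 attribute vector. The attribute list, the 0.5 threshold, and the shape of the score dictionary are all assumptions made for this example; the dictionary only stands in for a tag estimator's output and is not the Illustration2Vec API.

```python
# Hypothetical attribute vocabulary; the actual tag set is not listed in the text.
ATTRIBUTES = ["blonde hair", "twin tails", "smile", "blue eyes"]


def tags_to_vector(tag_scores, threshold=0.5):
    """Convert {tag: confidence} estimates into a 0/1 attribute vector.

    Tags missing from tag_scores are treated as absent. The threshold is an
    assumed value, not one taken from the article.
    """
    return [1 if tag_scores.get(name, 0.0) >= threshold else 0
            for name in ATTRIBUTES]


# Made-up scores for a single image, standing in for a tag estimator's output.
scores = {"blonde hair": 0.93, "twin tails": 0.71, "smile": 0.12}
print(tags_to_vector(scores))  # [1, 1, 0, 0]
```

A vector like this can serve as the condition attached to each training image.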
Model: The Essential Part
A good model is also a must-have for our goal. The generator should know and follow the user's specified attributes, which are called the prior or condition, and should also have the freedom to generate different, detailed visual features, which is modeled using noise. We use a popular framework called GAN (Generative Adversarial Networks). A GAN uses a generator network G to generate images from the prior and noise, and another network D that tries to distinguish G's images from real images. We train them together, and in the end G should be able to generate images so realistic that D cannot tell them from real images, given the prior. However, it is infamously hard and time-consuming to train a GAN properly. Luckily, a recent advance named DRAGAN can give results comparable to other GANs while requiring the least computation power. We successfully train a DRAGAN whose generator is SRResNet-like. We also need our generator to know the label information so that the user's customization can be used. Inspired by ACGAN, we feed the labels to the generator G along with the noise, and we add a multi-label classifier on top of the discriminator D, which tries to predict the assigned tags for the images.
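The conditioning scheme described above can be sketched in plain Python. This is only an illustration under stated assumptions: the function names, the noise dimension of 128, the loss weight `cls_weight`, and the choice of binary cross-entropy are ours, the networks themselves are left out, and the DRAGAN gradient penalty is omitted entirely.

```python
import math
import random


def binary_cross_entropy(prob, target, eps=1e-7):
    """Cross-entropy between a predicted probability and a 0/1 target."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(prob) + (1 - target) * math.log(1.0 - prob))


def generator_input(labels, noise_dim=128, rng=random):
    """Concatenate Gaussian noise with the attribute labels (ACGAN-style input)."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return noise + [float(v) for v in labels]


def discriminator_loss(real_prob, fake_prob, tag_probs, true_tags, cls_weight=1.0):
    """Adversarial loss plus a multi-label tag classification loss.

    real_prob / fake_prob: D's estimate that a real / generated image is real.
    tag_probs: D's per-tag predictions for the real image.
    cls_weight is a hypothetical weighting factor.
    """
    adversarial = (binary_cross_entropy(real_prob, 1)
                   + binary_cross_entropy(fake_prob, 0))
    classification = sum(
        binary_cross_entropy(p, t) for p, t in zip(tag_probs, true_tags)
    ) / len(true_tags)
    return adversarial + cls_weight * classification


z_and_c = generator_input([1, 0, 1], noise_dim=4)
print(len(z_and_c))  # 7: four noise values followed by three labels
```

The classification term is what pushes generated images to actually carry the requested attributes, since the generator is rewarded when the classifier head recovers the tags it was given.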
Samples: A Picture is Worth a Thousand Words
To get a taste of the quality of our model, see generated images like the following: it handles different attributes and visual features well.

One interesting setting is to fix the random noise and sample random priors. The model is then required to generate images that share similar major visual features but have different attribute combinations, and it does well:

Also, by fixing the priors and randomly sampling the noise, the model can generate images that have the same attributes with different visual features:
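The two sampling settings above can be sketched as follows. This only builds the (noise, prior) pairs that would be fed to a generator; the generator itself is not included. Gaussian noise and independent 0/1 attributes are assumptions made for the example, and in practice some attributes, such as hair colors, would be mutually exclusive.

```python
import random


def sample_noise(dim, rng):
    """Draw one noise vector (a Gaussian is assumed here)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def sample_prior(num_attributes, rng):
    """Draw one random 0/1 attribute vector (independent attributes assumed)."""
    return [rng.randint(0, 1) for _ in range(num_attributes)]


def fixed_noise_batch(n, noise_dim, num_attributes, seed=0):
    """Same noise for every sample, a freshly sampled prior for each."""
    rng = random.Random(seed)
    noise = sample_noise(noise_dim, rng)
    return [(noise, sample_prior(num_attributes, rng)) for _ in range(n)]


def fixed_prior_batch(n, noise_dim, prior, seed=0):
    """Same prior for every sample, freshly sampled noise for each."""
    rng = random.Random(seed)
    return [(sample_noise(noise_dim, rng), list(prior)) for _ in range(n)]


same_look = fixed_noise_batch(4, noise_dim=8, num_attributes=5)
same_tags = fixed_prior_batch(4, noise_dim=8, prior=[1, 0, 1, 0, 0])
```

Feeding `same_look` to a generator would correspond to the first setting (shared visual features, varying attributes), and `same_tags` to the second (shared attributes, varying visual features).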

Web Interface: Bring Neural Generator to your Browser
In order to make our model more accessible, we build this website interface with React.js for open access. We also make the generation run entirely on the browser side, by using WebDNN and converting the trained Chainer model to a WebAssembly-based JavaScript model. For a better user experience, we would like to keep the generator model small, since users need to download the model before generating images; replacing the DCGAN generator with the SRResNet generator makes the model 4 times smaller. Speed-wise, even though all computations are done on the client side, it takes only about 6 seconds on average to generate a single image.