
Face Toon
This project was completed by a team of four for our second-year introduction to artificial intelligence course. We wrote the code in Python, using libraries such as NumPy and PyTorch.
The aim of the project was to create a machine learning model that could convert human faces to cartoons, hence the name Face Toon. The motivation came from photo-editing applications such as Adobe Photoshop and Snapchat, where a user can apply different filters to their face. We wanted to gain a deeper understanding of how these applications work and to attempt to replicate them using deep learning; the chosen approach was to use GANs (Generative Adversarial Networks).
​
In terms of data processing, two datasets were collected for this model: one of human faces and one of anime faces, anime being the cartoon style the team chose to apply for this project.
The datasets were cleaned in order to produce the desired outcome from our model: each image had to have the face centered and clear of any obstructions, and was resized to 64x64 pixels. Since the model used GANs, a validation set was not required, so the data was split into training and testing sets with an 80:20 ratio.
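The sketch below illustrates what this kind of preprocessing and split could look like with PyTorch and torchvision; the crop size, dataset path, and folder layout are assumptions for illustration rather than the project's exact values.

```python
# Illustrative preprocessing sketch: crop around the face, resize to 64x64
# pixels, and split 80:20 into training and test sets.
from torch.utils.data import random_split
from torchvision import datasets, transforms

preprocess = transforms.Compose([
    transforms.CenterCrop(178),                  # assumed square crop around the centered face
    transforms.Resize(64),                       # reduce to 64x64 pixels
    transforms.ToTensor(),
    transforms.Normalize([0.5] * 3, [0.5] * 3),  # scale RGB values to [-1, 1]
])

faces = datasets.ImageFolder("data/human_faces", transform=preprocess)  # hypothetical path
n_train = int(0.8 * len(faces))                  # 80:20 training/testing split
train_set, test_set = random_split(faces, [n_train, len(faces) - n_train])
```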
​
The baseline model was style transfer. We used a pretrained 19-layer VGG (Visual Geometry Group) network available in PyTorch, which simply takes two images and combines their styles in the output. The problem with this model was that it overlaid the two images onto each other without considering the position of the facial features, so a valid output could not be expected from it (Figure 1). Nonetheless, it allowed us to narrow our requirements from a model that converts whole faces to cartoons to one that converts some features of a human face to cartoons, which in turn narrowed the problem space to specific kinds of GANs we could implement.
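For reference, loading that pretrained network looks roughly like the sketch below; the layer indices chosen for the content and style features follow the common style-transfer recipe and are an assumption rather than a record of our exact settings.

```python
# Sketch of loading the pretrained 19-layer VGG from torchvision and collecting
# the intermediate activations used as content/style representations.
from torchvision import models

vgg = models.vgg19(pretrained=True).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)   # the network stays fixed; only the output image is optimised

def extract_features(image, layers={"0": "conv1_1", "5": "conv2_1", "10": "conv3_1",
                                    "21": "conv4_2", "28": "conv5_1"}):
    """Collect activations at the named VGG layers for a (1, 3, H, W) image tensor."""
    feats = {}
    x = image
    for name, layer in vgg._modules.items():
        x = layer(x)
        if name in layers:
            feats[layers[name]] = x
    return feats
```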
​
Therefore, a collective decision was made to train the model using CycleGAN, which incorporates two pairs of generators and discriminators. Generator A converted anime faces to human faces and its outputs were checked by Discriminator A; Generator B and Discriminator B performed the same operation in the opposite direction, producing anime faces. Finally, the two pairs were combined by running a real anime image through Generator A and then through Generator B, and applying MSE (Mean Squared Error) loss between the reconstructed image and the original anime image. The idea was that if we could recover the same image after converting it twice with our generators, then the style of the image space had been learnt while the features of the image (hair colour, eye shape, etc.) were preserved.
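A minimal sketch of that cycle-consistency check is given below, with placeholder names for the two generators; the adversarial losses contributed by the two discriminators in the full CycleGAN objective are omitted here.

```python
# Minimal sketch of the cycle-consistency step described above.
# G_anime_to_human and G_human_to_anime stand in for Generator A and Generator B.
import torch.nn as nn

mse = nn.MSELoss()

def cycle_consistency_loss(real_anime, G_anime_to_human, G_human_to_anime):
    """Translate anime -> human -> anime and penalise the reconstruction error."""
    fake_human = G_anime_to_human(real_anime)            # Generator A: anime -> human
    reconstructed_anime = G_human_to_anime(fake_human)   # Generator B: human -> anime
    return mse(reconstructed_anime, real_anime)          # small when style is learnt and features are kept
```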
​
The images below show the model's results on the training set (Figure 2) and on data it had not seen before (Figure 3).

Figure 1. Baseline model (Style Transfer)

Figure 2. Resulting outputs from the training data