Deep Learning: Faceswap model

Plan of Attack

1. Faceswap Intuition

Faceswap is the process of replacing one person's face with another's. There are many tools and libraries that do this job. However, I am going to build a "Faceswap" from scratch using one of the most powerful deep learning algorithms: the autoencoder. Why autoencoders?

An autoencoder is an unsupervised deep learning model that learns to reconstruct its input. It is often used for tasks such as dimensionality reduction, denoising, and recommendation/suggestion systems.

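So for our "Faceswap", we train the model on the faces of one person (faces_a). Then we pass an image of face_a or face_b to the model and let it reconstruct the face. Because the model has only learned to draw person A, its reconstruction looks like person A. For more detail, you can watch this video.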

2. Data Preparation

In this article, I am going to swap my face with "The Rock". So I need to collect images of his face, and the best place to find them is Google Images. You can use the code from my previous article for this step.

However, all we need is his face, so we write OpenCV code to crop every face into a folder; you can find the details of face cropping in this post. After cropping, we need to check the results manually and remove any face that is not his. For our own face, we can use a webcam to collect images (the code is here), and then we are ready for the next step.
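The cropping step can be sketched as follows. This is a minimal sketch, not the code from the linked post: it assumes a face detector has already produced bounding boxes in (x, y, w, h) form, which is the format OpenCV's CascadeClassifier.detectMultiScale returns. The helper name crop_faces and the margin parameter are hypothetical.

```python
import numpy as np

def crop_faces(image, boxes, margin=0.2):
    """Crop each (x, y, w, h) box out of an H x W x 3 image.

    margin widens every box by a fraction of its size so the crop keeps
    some context around the face; boxes are clamped to the image bounds.
    """
    height, width = image.shape[:2]
    crops = []
    for (x, y, w, h) in boxes:
        pad_x, pad_y = int(w * margin), int(h * margin)
        left, top = max(x - pad_x, 0), max(y - pad_y, 0)
        right, bottom = min(x + w + pad_x, width), min(y + h + pad_y, height)
        crops.append(image[top:bottom, left:right].copy())
    return crops

# Hypothetical example: a blank 100x100 image with one detected box.
image = np.zeros((100, 100, 3), dtype=np.uint8)
faces = crop_faces(image, [(40, 40, 20, 20)])
print(faces[0].shape)  # (28, 28, 3)
```

Each crop can then be written to the output folder and reviewed by hand, as described above.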

3. Init Model

Since our inputs are face images, it makes sense to use convolutional neural networks (convnets) as encoders and decoders. In practical settings, autoencoders applied to images are almost always convolutional autoencoders; they simply perform much better.

Let's implement it. The encoder will consist of a stack of Conv2D and MaxPooling2D layers (the strided convolutions and the max pooling both perform spatial down-sampling), while the decoder will consist of a stack of Conv2DTranspose and UpSampling2D layers.

First, we load the packages we need and initialize some variables, such as image_size = 64 and batch_size = 128.

from keras import backend as K
from keras.models import Model
from keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, Reshape, Conv2DTranspose, UpSampling2D, Activation
from keras.optimizers import Adam
from keras.callbacks import TensorBoard
from libs.read_images_from import read_images_from
import argparse

ap = argparse.ArgumentParser()
ap.add_argument("-m", "--model", type=str, default="",
    help="model name", required=True)
args = vars(ap.parse_args())

image_size = 64
input_shape = (image_size, image_size, 3)
layer_filters = [32, 64]
kernel_size = 5
latent_dim = 16
batch_size = 128

Next, let's define the encoder model.

inputs = Input(shape=input_shape, name='encoder_input')
x = inputs

for filters in layer_filters:
    x = Conv2D(filters=filters,
               kernel_size=kernel_size,
               strides=2,
               activation='relu',
               padding='same')(x)
    x = MaxPooling2D((2, 2), padding='same')(x)

shape = K.int_shape(x)

x = Flatten()(x)
latent = Dense(latent_dim, name='latent_vector')(x)
encoder = Model(inputs, latent, name='encoder')
encoder.summary()
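To see what encoder.summary() should report, the spatial size can be traced by hand. The sketch below is plain Python and does not need Keras; it assumes the usual rule that a layer with padding='same' and stride s outputs ceil(size / s). The function names are hypothetical.

```python
import math

def same_padding_out(size, stride):
    # With padding='same', the output size is ceil(size / stride).
    return math.ceil(size / stride)

def encoder_shape(image_size=64, layer_filters=(32, 64)):
    size, channels = image_size, 3
    for filters in layer_filters:
        size = same_padding_out(size, 2)  # Conv2D with strides=2
        size = same_padding_out(size, 2)  # MaxPooling2D with a (2, 2) pool
        channels = filters
    return size, size, channels

h, w, c = encoder_shape()
print((h, w, c), h * w * c)  # (4, 4, 64) 1024
```

So the Flatten layer hands 1024 values to the Dense layer, which compresses them to latent_dim = 16 values.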

Next, we define the decoder model.

# Build the Decoder Model
latent_inputs = Input(shape=(latent_dim,), name='decoder_input')
x = Dense(shape[1] * shape[2] * shape[3])(latent_inputs)
x = Reshape((shape[1], shape[2], shape[3]))(x)

for filters in layer_filters[::-1]:
    x = Conv2DTranspose(filters=filters,
                        kernel_size=kernel_size,
                        strides=2,
                        activation='relu',
                        padding='same')(x)
    x = UpSampling2D((2, 2))(x)


x = Conv2DTranspose(filters=3,
                    kernel_size=kernel_size,
                    padding='same')(x)
outputs = Activation('sigmoid', name='decoder_output')(x)

# Instantiate Decoder Model
decoder = Model(latent_inputs, outputs, name='decoder')
decoder.summary()
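The decoder has to undo the encoder's down-sampling so that the output has the same shape as the input image. A small sketch in plain Python, assuming that Conv2DTranspose with strides=2 and padding='same' doubles the spatial size, as does UpSampling2D((2, 2)); the function name is hypothetical.

```python
def decoder_shape(start=(4, 4, 64), layer_filters=(32, 64)):
    height, width, channels = start
    for filters in layer_filters[::-1]:
        height, width = height * 2, width * 2  # Conv2DTranspose, strides=2
        height, width = height * 2, width * 2  # UpSampling2D((2, 2))
        channels = filters
    channels = 3  # final Conv2DTranspose with filters=3 keeps the size
    return height, width, channels

print(decoder_shape())  # (64, 64, 3)
```

Starting from the (4, 4, 64) feature map saved in shape, two up-sampling stages of 4x each bring the output back to (64, 64, 3).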

Finally, the autoencoder model.

# Autoencoder = Encoder + Decoder
# Instantiate Autoencoder Model
autoencoder = Model(inputs, decoder(encoder(inputs)), name='autoencoder')
autoencoder.summary()
# Note: newer Keras releases name this argument learning_rate instead of lr.
optimizer = Adam(lr=5e-5, beta_1=0.5, beta_2=0.999)
autoencoder.compile(loss='mean_absolute_error', optimizer=optimizer)
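The mean_absolute_error loss is the average absolute difference between the target pixels and the reconstructed pixels. The sketch below computes it with NumPy on made-up values, purely for illustration; it is not how Keras implements the loss internally.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    # Average of |target - prediction| over every pixel and channel.
    return float(np.mean(np.abs(y_true - y_pred)))

# Made-up 2x2 "images" with values already scaled to [0, 1].
target = np.array([[0.0, 0.5], [1.0, 0.25]])
output = np.array([[0.1, 0.5], [0.8, 0.25]])

loss = mean_absolute_error(target, output)
print(round(loss, 4))  # 0.075
```

Because the images are scaled to [0, 1], multiplying the loss by 255 gives the average error in ordinary pixel intensity levels.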

4. Train Model

We need to load and prepare the face images before we fit the model. So we define two functions: read_images_from, which loads the images from a folder, and random_warp, which creates a warped copy of each image to use as the training input.

# random_warp.py
# Warps a copy of the image at random and returns it together with an
# aligned target image, both with shape (64, 64, 3).
import numpy
import cv2
from .umeyama import umeyama
coverage = 220

# get pair of random warped images from aligned face image
def random_warp(image):
    image = cv2.resize(image, (256, 256))
    assert image.shape == (256, 256, 3)
    range_ = numpy.linspace(128 - coverage//2, 128 + coverage//2, 5)
    mapx = numpy.broadcast_to(range_, (5, 5))
    mapy = mapx.T

    mapx = mapx + numpy.random.normal(size=(5, 5), scale=5)
    mapy = mapy + numpy.random.normal(size=(5, 5), scale=5)

    interp_mapx = cv2.resize(mapx, (80, 80))[8:72, 8:72].astype('float32')
    interp_mapy = cv2.resize(mapy, (80, 80))[8:72, 8:72].astype('float32')

    warped_image = cv2.remap(image, interp_mapx, interp_mapy, cv2.INTER_LINEAR)

    src_points = numpy.stack([mapx.ravel(), mapy.ravel()], axis=-1)
    dst_points = numpy.mgrid[0:65:16, 0:65:16].T.reshape(-1, 2)
    mat = umeyama(src_points, dst_points, True)[0:2]

    target_image = cv2.warpAffine(image, mat, (64, 64))

    return warped_image, target_image

# read_images_from.py
# Reads every image in the given folder and returns (warped, target) arrays.
import cv2
import os
import numpy as np
from .random_warp import random_warp

def read_images_from(path):
    valid_images = [".jpg",".gif",".png"]
    images = []
    wrap_images = []
    for f in os.listdir(path):
        ext = os.path.splitext(f)[1]
        if ext.lower() not in valid_images:
            continue

        image = cv2.imread(os.path.join(path,f))
        # cv2.imread returns None when it cannot decode a file; skip it
        if image is None:
            continue
        wrap, target = random_warp(image)
        images.append(target)
        wrap_images.append(wrap)

    return np.array(wrap_images), np.array(images)
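The heart of random_warp is a 5x5 grid of control points laid over the central 220 pixels of the 256x256 face. Each point is nudged by Gaussian noise with a standard deviation of 5 pixels, and the jittered grid is then interpolated into the per-pixel maps passed to cv2.remap. The sketch below reproduces only the grid construction with NumPy; the seeded generator is an assumption added to make the output repeatable.

```python
import numpy as np

coverage = 220
range_ = np.linspace(128 - coverage // 2, 128 + coverage // 2, 5)
mapx = np.broadcast_to(range_, (5, 5))  # x coordinate of each control point
mapy = mapx.T                           # y coordinate of each control point
print(range_.tolist())  # [18.0, 73.0, 128.0, 183.0, 238.0]

# Jitter every control point; seeded here only for repeatability.
rng = np.random.default_rng(0)
jittered_x = mapx + rng.normal(scale=5, size=(5, 5))
jittered_y = mapy + rng.normal(scale=5, size=(5, 5))
print(jittered_x.shape, jittered_y.shape)  # (5, 5) (5, 5)
```

The network receives the warped image as input and the unwarped target as the expected output, so it has to learn to undo the distortion and recover the face.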

To train it, we use the face images we prepared in step 2. The images must have shape (64, 64, 3) because that is the input shape of the first layer of our model; random_warp already takes care of the resizing. We also divide the pixel values by 255.0 to scale them into the range [0, 1], which matches the sigmoid output of the decoder. Then we train the model for 1500 epochs and save it when training is complete. It takes about 3 hours on my machine.

wraped_face, a_faces = read_images_from("images/the_rock")

a_faces = a_faces.astype('float32') / 255.
wraped_face = wraped_face.astype('float32') / 255.


autoencoder.fit(wraped_face,
                a_faces,
                epochs=1500,
                batch_size=batch_size,
                callbacks=[TensorBoard(log_dir='/tmp/autoencoder')])

# Use the same file names that the test script in step 5 loads.
autoencoder.save("model/{0}_face_model.h5".format(args["model"]))
autoencoder.save_weights("model/{0}_face_weight.h5".format(args["model"]))

Then run python train.py -m the_rock

After 1500 epochs, the model converges to a loss of 0.0883.

5. Test Model

For testing, we load the saved model and weights. Then we can load the images to swap.

from keras.models import load_model
from libs.read_images_from import read_images_from
import numpy as np
import cv2
import argparse

image_size = 64
ap = argparse.ArgumentParser()
ap.add_argument("-m", "--model", type=str, default="",
   help="model name", required=True)
args = vars(ap.parse_args())

autoencoder = load_model("model/{0}_face_model.h5".format(args["model"]))
autoencoder.load_weights("model/{0}_face_weight.h5".format(args["model"]))

wrap, a_faces = read_images_from("images/tests")

# show original image
for (index, img) in enumerate(a_faces):
    cv2.imshow("original_image_" + str(index), img)

# scale pixel values to [0, 1], as during training
a_faces = a_faces.astype('float32') / 255.
wrap = wrap.astype('float32') / 255.

# reconstruct the faces and convert them back to 8-bit images
decoded_imgs = autoencoder.predict(a_faces)
decoded_imgs = (decoded_imgs * 255).astype(np.uint8)
for (index, img) in enumerate(decoded_imgs):
    cv2.imshow("swap_image_" + str(index), img)

cv2.waitKey(0)
cv2.destroyAllWindows()
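The scaling to [0, 1] before prediction and back to 0-255 afterwards can be checked in isolation. The sketch below uses NumPy only, and the helper names are hypothetical. Unlike the script above, which truncates with astype(np.uint8), it rounds and clips before converting; that is a swapped-in variation, not the original behavior.

```python
import numpy as np

def to_unit_range(images):
    # uint8 pixels in 0-255 -> float32 values in [0, 1]
    return images.astype("float32") / 255.0

def to_uint8(images):
    # Round to the nearest level and clip so out-of-range values cannot wrap.
    return np.clip(np.rint(images * 255.0), 0, 255).astype(np.uint8)

pixels = np.array([[0, 64, 128, 255]], dtype=np.uint8)
restored = to_uint8(to_unit_range(pixels))
print(restored.tolist())  # [[0, 64, 128, 255]]
```

Rounding avoids losing one intensity level when a value such as 63.99999 would otherwise be truncated to 63.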

Then run the command python test_model.py -m the_rock to swap to "The Rock" face.

Let's look at the result, with the original on the left and the swap on the right:

Then run the command python test_model.py -m rathanak to swap to my face.

Let's look at the result, with the original on the left and the swap on the right:

It's swapped. However, it does not yet look close to a real face. This can be improved with more epochs or some parameter tuning, which I cannot do because my machine is too small.

Summary

We just finished building a faceswap from scratch. We talked about the method used to build the model, the tools, and especially the algorithm, which is one of the most powerful deep learning algorithms nowadays. Then we initialized, trained, and tested the model. However, the result does not yet look like a real face. So now it's your turn to improve the model.

That's it! If you have suggestions for more topics to be covered in this post (or in future posts), you can leave a comment below. 😃