- Usage
- Theory
- Experiments with different Parameters
- NST for Data Augmentation
- How Back Propagation works in NST
- Credits
The config file contains all the parameters you can tune; an example sketch is shown after this list.
- MODEL
- There are three options
- "VGG19"
- Generates good results but heavy model
- "VGG16"
- Generates average results but is a comparatively lighter model than VGG19
- "MobileNet"
- Don't use this. It was only used for experimentation
- Possibly because of the depthwise separable convolutions, the clarity of the generated images is poor
- "VGG19"
- There are three options
- CONTENT_IMAGE_PTH
- Absolute or relative path of the content image
- STYLE_IMAGE_PTH
- Absolute or relative path of the style image
- RESHAPE_CONTENT_IMAGE
- Desired height of new content image
- If there are too many pixels, GPU memory may get exhausted and training could take a lot of time. In such cases reduce the image resolution
- range : 1 to {preferably a number smaller than the content image height}
- Width will be automatically adjusted according to aspect ratio
- RESHAPE_STYLE_IMAGE
- Desired height of new style image
- If the style image has a huge resolution, the CNN will not be able to capture the patterns in it
- In such cases, reduce the resolution of the style image so that its patterns are easily captured
- Also, if there are too many pixels, GPU memory may get exhausted and training could take a lot of time. In such cases reduce the image resolution
- range : 1 to {preferably a number smaller than the style image height}
- Width will be automatically adjusted according to aspect ratio
- OUTPUT_PATH
- Absolute or relative path of the output folder
- EPOCS
- Number of epochs to train the generated image
- With LEARNING_RATE 1e1, the sweet spot is somewhere around 1500-2000
- range : 0 to +inf
- LEARNING_RATE
- Learning rate of the generated image, i.e. the speed with which the generated image is updated
- range : 0 to +inf
- CONTENT_WEIGHT
- Weight of the content component of the loss
- range : 0 to +inf
- STYLE_WEIGHT
- Weight of the style component of the loss
- For VGG19, a good pair of content and style weights is around (1e6, 1e3-2e3)
- range : 0 to +inf
- SAVE_FREQ
- Saving frequency of the output image in terms of epochs
- If set to 10, the output image will be saved after every 10 epochs
- range : 1 to EPOCS
NOTE : Don't worry about differences in the sizes of the content and style images. Any image size works
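For reference, the parameters above might be set like this (a minimal sketch: only the parameter names come from the list above; the file format and the concrete values are illustrative):

```python
# Illustrative config sketch -- parameter names from the list above,
# values and file format are assumptions.
MODEL = "VGG19"                # "VGG19", "VGG16" or "MobileNet"
CONTENT_IMAGE_PTH = "data/content.jpg"
STYLE_IMAGE_PTH = "data/style.jpg"
RESHAPE_CONTENT_IMAGE = 400    # desired height; width follows the aspect ratio
RESHAPE_STYLE_IMAGE = 400      # desired height; width follows the aspect ratio
OUTPUT_PATH = "output/"
EPOCS = 2000                   # with LEARNING_RATE 1e1 the sweet spot is ~1500-2000
LEARNING_RATE = 1e1
CONTENT_WEIGHT = 1e6
STYLE_WEIGHT = 1e3             # good VGG19 pair: (1e6, 1e3-2e3)
SAVE_FREQ = 10                 # save the output image every 10 epochs
```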
Now we are in an era where neural networks generate art. This repository is an implementation of the Neural Style Transfer paper by Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge.
The paper presents an algorithm for combining the content of one image with the style of another image using Convolutional Neural Networks. Here is an example of combining my content image with a style image made of black and white texture.
Styled Image (After Style Transfer) | Normal Image |
---|---|
![]() | ![]() |
Consider the following examples of applying various styles to a content image related to nature:
Style Image | Final Result |
---|---|
![]() | ![]() |
![]() | ![]() |
Loss Function
Loss is calculated using both the content image and the style image.
NST employs a pretrained convolutional neural network (CNN) to transfer style from a given image to another. This is done by defining a loss function that minimises the differences between the content image, the style image, and a generated image, so that the texture of the style image blends with the content image, producing beautiful results.
In the above image you can observe how the style loss is calculated from the feature maps of multiple layers, while the content loss is calculated from a single set of feature maps.
The style loss is calculated over multiple layers so that various texture sizes are considered while adding style to the target image. We will explain this in detail.
Finally, loss is calculated between the style image and the target image, and between the content image and the target image. These losses are added in a weighted fashion, which determines how much style is added to a given image.
There are two components of loss as described above. Therefore the total loss is

$$L_{total}(S, C, G) = \alpha \, L_{content}(C, G) + \beta \, L_{style}(S, G)$$

where $S$ is the style image, $C$ is the content image, $G$ is the generated image, $\alpha$ is the weight of the content loss, and $\beta$ is the weight of the style loss. We will see how these weights affect the target image (output).
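As a sketch of this weighted combination in PyTorch (the function name is illustrative, and the two loss components are assumed to be already computed):

```python
import torch

# Illustrative weights; in the repo they come from CONTENT_WEIGHT
# and STYLE_WEIGHT in the config file.
ALPHA, BETA = 1e6, 1e3

def total_loss(content_loss: torch.Tensor, style_loss: torch.Tensor) -> torch.Tensor:
    # L_total = alpha * L_content + beta * L_style
    return ALPHA * content_loss + BETA * style_loss
```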
Before going further, let's understand how feature maps are generated.
In the above image we can observe how a single convolution happens. The final cube at the far right of the image is called the feature maps/activation maps; there are 64 feature maps, each of shape 224×224.
Content loss represents the difference between the content of the content image and that of the generated image. At a given layer of the CNN, feature maps are generated for both the content image and the target image. Then we calculate the mean squared error between each pair of corresponding feature maps. In this repository we mainly use VGG19, where the feature maps of the second convolutional layer are used for calculating the content loss.
$$L_{content}(C, G) = \frac{1}{2} \sum_{i,j,k} \left( F^{C}_{ijk}(l) - F^{G}_{ijk}(l) \right)^2$$

Hence, at a given layer $l$ of the CNN, $k$ is the feature map number and $(i, j)$ is the index of the individual feature. According to the above diagram, $i = 224$, $j = 224$, $k = 64$.
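A minimal PyTorch sketch of this content loss, assuming the feature maps of the chosen layer have already been extracted for both images (the function name is hypothetical):

```python
import torch
import torch.nn.functional as F

def content_loss(content_features: torch.Tensor,
                 generated_features: torch.Tensor) -> torch.Tensor:
    # Mean squared error between corresponding feature maps of the
    # content image and the generated image at one layer.
    return F.mse_loss(generated_features, content_features)
```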
Style loss measures the difference between the style of the generated image and that of the style image. It helps in maintaining the texture patterns of the style image in the generated image.
A gram matrix (GM) is generated from the feature maps and is used to compute the style loss. The gram matrix is a correlation matrix among the different feature map channels at a given layer.
In the above image, correlation is calculated among the feature maps at a given layer to generate the gram matrix (GM). Therefore, if the shape of the feature maps of the style image is $4 \times 4 \times 5$, then the shape of the gram matrix is $5 \times 5$, because the correlation is calculated between each feature map of shape $4 \times 4$ and every other feature map, resulting in $5 \times 5$ combinations.
$$G^{S}_{ij}(l) = \sum_{k} F^{S}_{ik}(l) \, F^{S}_{jk}(l)$$

where $F^{S}_{ik}(l)$ is the flattened feature map $i$ at position $k$. Here $G^{S}_{ij}(l)$ means the number at position $(i, j)$ in the gram matrix of layer $l$ of the CNN when the style image is used as input.
Style loss is calculated from multiple layers as shown above. Different layers of a CNN extract different levels of patterns in an image (e.g., initial layers extract edges, then colours and textures, then larger patterns). Hence, considering different layers of the CNN for the style loss helps in adding various levels (edges, colours, small patterns, large patterns) of texture/style to the generated image.
Total style loss:

$$L_{style}(S, G) = \sum_{l=0}^{L} w_l \sum_{i,j} \left( G^{S}_{ij}(l) - G^{G}_{ij}(l) \right)^2$$

where $L$ represents the number of layers considered for the style loss and $w_l$ is the weight given to layer $l$.
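A sketch of the gram matrix and the multi-layer style loss in PyTorch (the function names and the equal per-layer weighting are assumptions; the repo's actual implementation may differ):

```python
import torch

def gram_matrix(features: torch.Tensor) -> torch.Tensor:
    # features: (channels, height, width) feature maps of one layer.
    c, h, w = features.shape
    flat = features.reshape(c, h * w)      # one row per flattened feature map
    return flat @ flat.t() / (c * h * w)   # (channels, channels) correlations

def style_loss(style_features: list, generated_features: list) -> torch.Tensor:
    # Sum the MSE between gram matrices over all layers considered.
    loss = torch.zeros(())
    for fs, fg in zip(style_features, generated_features):
        loss = loss + torch.mean((gram_matrix(fs) - gram_matrix(fg)) ** 2)
    return loss
```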
The gram matrix is a correlation matrix between feature maps. What this basically means is: given N feature maps, if one feature map is strongly activated and the correlation is calculated with another feature map that is also activated, then the corresponding value of the gram matrix will be high. Matching the gram matrices therefore forces the generated image to have similar style patterns.
Changing the weights of the style loss and the content loss changes the predominant features of the image. If the content weight is larger, content will be the dominant feature of the generated image, and vice versa. The images below were generated with different style weights.
NST, if used properly, can be very helpful for data augmentation. One good example is converting the daytime view of a city into night time, as shown below.
Content Image | Generated Image |
---|---|
![]() | ![]() |
![]() | ![]() |
Observe that the converted image has the night-time effects of the style image, like lights on building walls, etc.
In general, back propagation is used to update the weights of a neural network. But in NST we use back propagation to update the input image itself, so that we generate the stylised image.
Initially, the generated image is filled with Gaussian noise. While converting the generated image to a tensor, we set that tensor to trainable; all the parameters of the pretrained CNN are set to non-trainable. When we pass the generated image through the CNN, it becomes part of the computation graph, and since we set the input image to trainable, the loss gradients update the generated image, converting it into the stylised image.
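A sketch of this setup in PyTorch (simplified to a content-only loss so it stays self-contained; the layer choice, image sizes, and learning rate are illustrative):

```python
import torch
from torchvision.models import vgg19

cnn = vgg19(weights="IMAGENET1K_V1").features.eval()
for p in cnn.parameters():
    p.requires_grad_(False)               # freeze the pretrained CNN

content = torch.rand(1, 3, 224, 224)      # stand-in for the loaded content image
target = cnn[:10](content).detach()       # feature maps of an early layer

# Start from Gaussian noise; the *image* is the trainable parameter.
generated = torch.randn(1, 3, 224, 224, requires_grad=True)
optimizer = torch.optim.Adam([generated], lr=1e-1)

for epoch in range(200):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(cnn[:10](generated), target)
    loss.backward()                       # gradients flow into the image, not the weights
    optimizer.step()                      # update the pixels of the generated image
```

The full loss in the repo additionally includes the style term described above; this sketch keeps only the content term for brevity.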
- Use it for data augmentation
- How the tensor graph updates the image during back propagation
- Why the gram matrix works
- How changing the style image size helps the network recognise patterns easily (how to generate good styles)
- This implementation is inspired by this tutorial on YouTube by Aleksa Gordic
- Some of the images I have used for explanation are from here