- Inbasekaran Perumal : [email protected]
- Pranav Koundinya : [email protected]
- Prof. Sumam David S
- Prof. Deepu Vijayasenan
In this project, we attempt to develop a neural network model that can restore noisy images. Unlike the conventional approach, the model is trained with noisy images as targets instead of clean ground-truth images. The training images are corrupted with Gaussian noise of zero mean and nonzero variance.
In the traditional machine learning approach, image restoration is performed by creating a training dataset of noisy input images paired with clean images as ground truth.
This approach therefore requires a clean dataset as a prerequisite. In many scenarios, however, such clean data is simply not available. Examples include MRI and satellite images, where we have only the noisy images, not their clean counterparts. This shortcoming can be overcome if we have knowledge about the distribution of the noise in the images: in this new approach, we train the neural network with pairs of noisy images instead of clean-noisy pairs.
This approach is based on the assumption that the average of all the noisy images in the training set is the clean/denoised image.
Gaussian noise with zero mean is one of the many types of noise that satisfy this condition, and the neural network is trained on images corrupted with it. In a sense, the network learns to restore images without ever knowing what denoising is, or what a clean image looks like.
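This zero-mean assumption can be sketched numerically: averaging many independently corrupted copies of a hypothetical image recovers the original to within sampling error. The image and noise level below are illustrative choices, not the ones used in training.

```python
import numpy as np

rng = np.random.default_rng(0)

# A hypothetical 32x32 "clean" image with values in [0, 1].
clean = rng.random((32, 32))

# Corrupt it many times with independent zero-mean Gaussian noise.
sigma = 0.1
noisy = clean + rng.normal(0.0, sigma, (5000,) + clean.shape)

# Because the noise has zero mean, the per-pixel average of the noisy
# copies converges to the clean image as the number of copies grows
# (the error shrinks like sigma / sqrt(N)).
average = noisy.mean(axis=0)
print(np.max(np.abs(average - clean)))  # small residual error
```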
Image restoration has played, and will continue to play, an important role in research, commercial photography, medical imaging, surveillance, and remote sensing. In medical imaging and remote sensing, the acquired images are predominantly noisy and must be restored before any further processing. When we try to train neural networks to perform this restoration, the availability of ample clean training data becomes a bottleneck. This project aims to lift that requirement, enabling us to train deep neural networks to restore noisy images for which no clean sources exist.
The primary reference for the project is the paper titled “Noise2Noise: Learning Image Restoration without Clean Data”, published by researchers from NVIDIA, Aalto University, and MIT CSAIL. The paper focuses on restoring images using deep learning, achieved without providing any clean reference images at all. The data used was obtained from various sources, including ImageNet, Kodak, BSD300, and Set14. In summary, the paper shows that a network can learn to turn bad images into good images by looking only at bad images.
Two networks have been used to implement the algorithm: UNet (Ronneberger et al., 2015) and REDNet (Mao et al., 2016). UNet consists of a series of convolution and deconvolution layers, with skip connections to aid gradient propagation. The UNet that we have used has 7 convolution (encoder) layers and 11 deconvolution (decoder) layers. REDNet likewise has a series of convolution and deconvolution layers, but with padding so that the height and width of the feature maps remain the same; its skip connections are addition units rather than the concatenation units of UNet. The REDNet we have used has 5 convolution (encoder) and 5 deconvolution (decoder) layers.
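The difference between the two skip-connection styles can be illustrated with plain NumPy arrays standing in for feature maps (the channel counts below are illustrative, not the ones in our networks):

```python
import numpy as np

# Feature maps in (channels, height, width) layout; contents are arbitrary.
encoder_feat = np.ones((64, 32, 32))
decoder_feat = np.ones((64, 32, 32))

# UNet-style skip: concatenate along the channel axis. The channel count
# doubles, so the next layer must accept 128 input channels.
unet_skip = np.concatenate([encoder_feat, decoder_feat], axis=0)

# REDNet-style skip: element-wise addition. The channel count is unchanged,
# which is why REDNet keeps spatial dimensions fixed with padding.
red_skip = encoder_feat + decoder_feat

print(unet_skip.shape, red_skip.shape)  # (128, 32, 32) (64, 32, 32)
```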
The images have all been corrupted with Gaussian noise before being fed to the neural network. Gaussian noise is random noise whose amplitude follows the Gaussian (normal) distribution.
Gaussian noise can be generated in Python with NumPy: numpy.random.normal() takes the mean, standard deviation, and output shape as parameters (numpy.random.randn() draws only standard-normal samples, which must then be scaled and shifted). Throughout the training process, the mean has been chosen to be zero, whereas the model was trained for three values of standard deviation: 0.01, 0.02, and 0.03. The chosen values satisfy eq.(3).
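The corruption step can be sketched as follows; the helper name and the clipping to the valid pixel range are our illustrative choices:

```python
import numpy as np

def add_gaussian_noise(image, mean=0.0, std=0.01, rng=None):
    """Corrupt a normalized image (values in [0, 1]) with Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(mean, std, image.shape)
    # Clip so that corrupted pixels stay within the valid [0, 1] range.
    return np.clip(image + noise, 0.0, 1.0)

rng = np.random.default_rng(42)
image = rng.random((64, 64))
for std in (0.01, 0.02, 0.03):  # the three noise levels used in training
    noisy = add_gaussian_noise(image, std=std, rng=rng)
```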
The dataset used for training the networks is BSD500 (Berkeley Segmentation Dataset), which consists of 300 images for training, 100 for validation, and 100 for testing. The images are normalized and noised before being fed to the neural network. Training was performed for 40 epochs using the Adam (Kingma et al., 2015) optimizer, with mean squared error as the loss function.
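For reference, the Adam update rule used by the optimizer can be written out in a few lines of NumPy. The toy scalar problem below (minimizing an MSE-style quadratic) is purely illustrative; the learning rate and iteration count are our choices:

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (Kingma et al., 2015) for a single parameter tensor."""
    m = beta1 * m + (1 - beta1) * grad          # biased first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # biased second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias-corrected moments
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy use: minimize the MSE-style loss 0.5 * (w - 3)^2, whose gradient is (w - 3).
w = np.array(0.0)
m, v = np.zeros_like(w), np.zeros_like(w)
for t in range(1, 5001):
    grad = w - 3.0
    w, m, v = adam_step(w, grad, m, v, t)
print(w)  # approaches the minimizer 3.0
```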
Two performance metrics have been used for evaluating the performance of the model:
The Peak Signal-to-Noise Ratio (PSNR) is the ratio between the maximum possible power of an image and the power of the corrupting noise that degrades the quality of its representation. Estimating the PSNR of an image therefore requires a clean reference image to compare against. The PSNR of a noisy image is given by

PSNR = 10 log10( MAX^2 / MSE ),

where MAX is the maximum possible pixel value (1 for normalized images) and MSE denotes the mean squared error between the clean and noisy images.
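The formula above translates directly into NumPy; the test image and noise level here are illustrative:

```python
import numpy as np

def psnr(clean, noisy, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images normalized to [0, max_val]."""
    mse = np.mean((clean - noisy) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

rng = np.random.default_rng(0)
clean = rng.random((64, 64))
noisy = np.clip(clean + rng.normal(0.0, 0.03, clean.shape), 0.0, 1.0)
print(psnr(clean, noisy))  # roughly 30 dB for sigma = 0.03
```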
The Structural Similarity Index (SSIM) is a perceptual metric that quantifies image quality degradation caused by processing such as data compression or by losses in data transmission. The SSIM of two images x and y is given by

SSIM(x, y) = ( (2 μ_x μ_y + c1)(2 σ_xy + c2) ) / ( (μ_x^2 + μ_y^2 + c1)(σ_x^2 + σ_y^2 + c2) ),

where μ_x and μ_y are the means of x and y, σ_x^2 and σ_y^2 their variances, σ_xy their covariance, and c1, c2 are small constants that stabilize the division.
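A simplified single-window version of this formula can be sketched in NumPy. Note that the standard metric averages the same expression over small local windows, so this global variant is an approximation:

```python
import numpy as np

def ssim_global(x, y, max_val=1.0, k1=0.01, k2=0.03):
    """Simplified SSIM computed over the whole image as a single window."""
    c1, c2 = (k1 * max_val) ** 2, (k2 * max_val) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )

rng = np.random.default_rng(0)
img = rng.random((64, 64))
print(ssim_global(img, img))  # identical images score exactly 1.0
```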
The above figure compares the outputs of the UNet and REDNet architectures. While REDNet removes noise by smoothing (the network has learnt Gaussian denoising), UNet tries to retain the crispness and clarity of the image, although its output is slightly granular. The REDNet output has a mean PSNR of 30.30 dB, while the UNet output has a mean PSNR of 26.53 dB. The evolution of the loss function during training is shown in the figure below.
Next, we look at each model's ability to restore the image for various levels of noise in the input. The REDNet predictions figure visualizes the outputs of the REDNet model for inputs with noise standard deviations of 0.01, 0.02, and 0.03.
As expected, a network's ability to denoise an image degrades as the noise level it is trained to remove increases. A similar trend is observed for both the REDNet and UNet architectures. However, the image quality drops faster for UNet than for REDNet: the UNet SSIM falls from 0.645 to 0.375 as the noise standard deviation increases from 0.01 to 0.03.
σ (noise std) | Mean SSIM | Mean PSNR (dB)
---|---|---
0.01 | 0.6453 | 30.13
0.02 | 0.4661 | 32.33
0.03 | 0.3746 | 33.12
Table 1: Performance of UNet
σ (noise std) | Mean SSIM | Mean PSNR (dB)
---|---|---
0.01 | 0.5771 | 30.96
0.02 | 0.4860 | 31.43
0.03 | 0.4353 | 32.04
Table 2: Performance of REDNet
The conventional approach is to feed the neural network a large amount of data with noisy inputs and clean targets, from which the network learns the structure of the noise so that, when presented with a new, previously unseen noisy image, it can clean it up. This requires clean images for training, whereas the new approach does not need clean, high-quality images at all. Normally one would say this is impossible. However, under suitable constraints, such as knowing the distribution of the noise, it becomes possible to restore noisy signals without ever seeing clean ones. It has also been shown that this technique performs nearly as well as classical denoising neural networks. It can potentially be used in healthcare to restore MRI scans, or in satellite imagery, where access to clean images is nearly impossible due to technological limitations. Once training is complete, restoring a new image takes milliseconds. There is also a dark side: the technique can be used to remove text overlays such as watermarks, which raises copyright-infringement concerns.
[1] : Jaakko Lehtinen, Jacob Munkberg, Jon Hasselgren, Samuli Laine, Tero Karras, Miika Aittala, and Timo Aila, Noise2Noise: Learning Image Restoration without Clean Data, in Proceedings of the 35th International Conference on Machine Learning, 2018.
[2] : Xiao-Jiao Mao, Chunhua Shen, Yu-Bin Yang, Image Restoration Using Convolutional Auto-encoders with Symmetric Skip Connections, in NIPS, 2016.
[3] : Olaf Ronneberger, Philipp Fischer, Thomas Brox, U-Net: Convolutional Networks for Biomedical Image Segmentation, in MICCAI, 2015.
[4] : https://github.com/SunnerLi/Simple-Hourglass
[5] : https://www.youtube.com/watch?v=P0fMwA3X5KI&t=6s
[6] : https://www.youtube.com/watch?v=dcV0OfxjrPQ
[8] : https://www2.eecs.berkeley.edu/Research/Projects/CS/vision/bsds/
[9] : https://en.wikipedia.org/wiki/Structural_similarity
[10] : https://www.geeksforgeeks.org/python-peak-signal-to-noise-ratio-psnr/