An Efficient Video Compression Framework using Deep Convolutional Neural Networks (DCNN)

Abstract: Video streaming has grown in popularity and now accounts for a large percentage of internet traffic, making it challenging for service providers to stream videos at high rates while using less storage space. Previous video compression prototypes rely on non-learning-based designs that follow inefficient analytical coding schemes. We therefore propose a DCNN technique that integrates OFE-Net, MVE-Net, MVD-Net, MC-Net, RE-Net, and RD-Net to obtain an ideal collection of frames by linking each frame's pixels with the preceding and following frames, then finding linked blocks and minimizing unneeded pixels. In terms of MS-SSIM and PSNR, the proposed DCNN approach produces good video quality at low bit rates.


Introduction
Video now accounts for about 90% of internet traffic, and this share is expected to rise in the near future. As a result, an effective video compression model is required to deliver higher-quality frames while using less bandwidth.
Conventional video codecs compress videos using hand-crafted models. Despite their elaborate design, these models are poorly optimized as a whole, and the video compression process can be improved further by optimizing the entire codec model.
Deep neural networks have outperformed classic image codecs such as JPEG (Joint Photographic Experts Group). Deep neural network-based models that rely on highly nonlinear transformations require end-to-end training.
It is not easy to create a model that draws on a variety of video compression algorithms. Motion estimation, which creates and compresses motion information, is the most important part: to remove temporal redundancy, video compression relies heavily on motion information, and motion vectors can be expressed using an optical flow network. Although learning-based optical flow estimation focuses on obtaining exact flow fields, the most accurate optical flow is not always the best choice for a specific video application. Furthermore, optical flow carries more data than the motion information used in existing codecs, so directly compressing optical flow values with existing methods results in a high bit rate.
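Concretely, a dense optical flow field V_t assigns each pixel location a displacement such that, ideally,

$$F_t(\mathbf{x}) \approx F_{t-1}\big(\mathbf{x} + V_t(\mathbf{x})\big),$$

which is the standard brightness-constancy assumption (not stated explicitly in the paper); transmitting V_t plus a small residual is then far cheaper than transmitting F_t itself.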
Reducing rate-distortion cost aims to provide higher-quality reconstructed frames at the same bit rate; it is essential for an effective video compression technique.
To achieve the benefits of end-to-end training, deep learning-based video compression models must reduce rate-distortion cost. The key benefits of the model are as follows: all steps of the DCNN model are implemented using deep neural networks, and the model is trained with a single rate-distortion-based loss function that couples all of the steps, resulting in a high compression ratio. This study will aid researchers working on computer vision, video compression, and deep model creation.
Kumar and Janaki (2020) categorize video compression into three eras: the classical era, the era of generic heuristics, and the era of modern deep learning techniques. A detailed study of the literature over the past decades shows that various schemes have been proposed for video compression and have contributed efficient mechanisms in different ways; however, further improvements are still needed given the limitations observed. Birman et al. (2020) illustrate and explain various issues for the video compression process in the field of DNNs; additional investigation is still needed to achieve the next generation of neural network-based codecs. Ranjan and Black (2017) present a fast and lightweight deep network for optical flow; the earlier pyramid features are replaced with a U-shaped network, the model obtains better results, and it can help computer vision applications. Dai et al. (2009) describe an optical flow approach that provides the features of deep learning-based optical flow algorithms; it gives better accuracy than an existing method and surpasses it on several benchmarks. Bao et al. (2019) propose the Motion Estimation and Motion Compensation (MEMC) neural network for learning and improving video frame interpolation. This model takes advantage of the MEMC framework's capability for handling large amounts of motion data, as well as learning-based methods for extracting features quickly, and many video enhancement tasks can be performed with it; qualitative and quantitative evaluation against state-of-the-art video interpolation and enhancement algorithms on various standard datasets demonstrates that it outperforms them. Wu et al. (2020) describe a deep learning-based video compression framework that provides MV and RP networks; their experiments show that the MV and RP networks improve compression performance by accurately modeling spatial correlations among frames. Chen et al. (2017) present an efficient deep learning-based video compression framework; compared with x264, their DeepCoder shows a similar type of (lossy) coding efficiency on test sequences familiar to the video coding community, suggesting deep learning-based video compression as an alternative framework for future video coding. Chen et al. (2019) propose PMCNN, with spatiotemporal modeling to achieve predictive coding, and explore a learning-based framework for effective video compression; even without entropy coding, it achieves good results, exhibiting a new, attainable way of handling video compression. Balle et al. (2016) present a nonlinear transform coding-based image compression method and a framework to optimize it end-to-end for rate-distortion performance; nevertheless, additional visual improvements might be possible if the method were optimized for a perceptual metric rather than MSE. Balle et al. (2018) provide a trainable variational autoencoder-based image compression model that leads to strong image compression when evaluated with the MS-SSIM index and outperforms ANN-based techniques under a traditional metric based on squared error (PSNR).
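The single rate-distortion loss referred to above can be written in the standard form used in learned compression; the paper does not give the equation explicitly, so the following (using the notation introduced in Materials and Methods) is a standard formulation rather than the authors' exact objective:

$$\mathcal{L} = \lambda\, d\big(F_t, \hat{F}_t\big) + R\big(\hat{M}_t\big) + R\big(\hat{Y}_t\big),$$

where d(·,·) is a distortion measure such as MSE, R(·) is the estimated bit rate of the quantized motion and residual representations, and λ controls the rate-distortion trade-off.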

Related Work
The limitations of existing work are:
• Lu et al. (2019) and Yang et al. (2020a): the encoding procedures are more complex due to the processing of large-size videos
• Lu et al. (2019) and Yang et al. (2020b): at decoding, video quality is degraded by frequent frame drops during the encoding process

Materials and Methods
The above limitations are overcome in the proposed approach. The objectives of the proposed work are:
• To reduce the storage space occupied by video
• To decrease the time taken to transfer video
• To enhance video quality with a better compression ratio
Introducing the symbols: let V = {F1, F2, ..., Ft-1, Ft, ...} denote the current video sequence, where Ft is the frame at time step t. The symbols F̅t and F̂t denote the predicted frame and the reconstructed (decoded) frame, respectively. The residual (error) information between the original frame Ft and the predicted frame F̅t is Rt, and the reconstructed (decoded) residual is R̂t. Motion information is essential for reducing temporal redundancy: the optical flow (motion vector) field is Vt and its reconstructed form is V̂t. To improve compression efficiency, either linear or nonlinear transform techniques can be used; accordingly, the residual Rt is transformed to Yt, the motion information Vt is transformed to Mt, and their corresponding quantized versions are Ŷt and M̂t, respectively.
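In this notation, the residual and the frame reconstruction of Step 6 below satisfy:

$$R_t = F_t - \bar{F}_t, \qquad \hat{F}_t = \bar{F}_t + \hat{R}_t.$$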
The detailed architecture of the proposed DCNN approach is shown in Fig. 1, and each step is described as follows:
Step 1: Motion estimation. We use an OFE net to estimate the optical flow, which is taken as the motion information Vt.
Step 2: Motion compression. The MVE-MVD net is proposed for compressing and decoding optical flow values. A sequence of convolution and nonlinear-transform operations encodes the optical flow Vt as Mt, which is then quantized to M̂t (a minimal quantization sketch is given below). The MVD net takes the quantized representation M̂t and reconstructs the motion information V̂t. Entropy coding is also performed on the quantized representation M̂t.
Step 3: MC net. The motion compensation network produces the predicted frame F̅t, as close to the current frame Ft as possible, using both the previously reconstructed frame F̂t-1 and the motion vector V̂t. First, the previous frame F̂t-1 is warped to the current frame using the motion information V̂t (a warping sketch is given below). The warped frame still contains artifacts, so the warped frame W(F̂t-1, V̂t), the reference frame F̂t-1, and the motion vector V̂t are fed into another CNN, which produces the refined predicted frame F̅t. The proposed method follows a pixel-based motion compensation strategy that gives more precise temporal information.
Step 4: RE net and RD net. The residual encoder network encodes the residual between the original frame and the predicted frame as Yt, which is quantized to Ŷt; the residual decoder network reconstructs R̂t from Ŷt. Compared to the discrete cosine transform used in conventional video compression systems, our method can exploit the potential of nonlinear transforms more effectively and achieve higher compression efficiency.
Step 5: BRE net. During the testing stage, the quantized motion information M̂t from Step 2 and the quantized residual information Ŷt from Step 4 are coded into bits and transmitted to the decoder. During the training stage, the bit costs are estimated by CNNs (the BRE net in the figure), which learn the probability distribution of each symbol in M̂t and Ŷt.
Step 6: Frame reconstruction. Adding F̅t from Step 3 and R̂t from Step 4 yields the reconstructed frame F̂t, i.e., F̂t = F̅t + R̂t.
Training and Testing
Guo et al. (2019) provide the training (Vimeo) and testing (UVG and JCT-VC) datasets. Our proposed DCNN technique is trained on the Vimeo-90k dataset and tested on the Ultra Video Group (UVG) and Joint Collaborative Team on Video Coding (JCT-VC) datasets.
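The paper does not specify how Mt and Yt are quantized. A common choice in learned compression, used by the Balle et al. (2016) line of work cited above, is to replace hard rounding with additive uniform noise during training so the operation stays differentiable. The following PyTorch-style sketch is offered as an assumption, not as the authors' exact scheme:

```python
import torch


def quantize(y: torch.Tensor, training: bool) -> torch.Tensor:
    """Quantize latents: uniform noise as a differentiable proxy during
    training, hard rounding at test time (as in Balle et al., 2016).
    This scheme is an assumption; the paper does not state its own."""
    if training:
        return y + torch.empty_like(y).uniform_(-0.5, 0.5)
    return torch.round(y)
```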
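Step 3's warping operation W(F̂t-1, V̂t) can be implemented as bilinear backward warping by the decoded flow. A minimal sketch using PyTorch's grid_sample (the function and tensor layout are standard PyTorch conventions, not taken from the paper):

```python
import torch
import torch.nn.functional as F


def warp(frame: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp `frame` (N, C, H, W) towards the current frame using optical
    flow `flow` (N, 2, H, W), where flow[:, 0] holds horizontal and
    flow[:, 1] vertical displacements in pixels."""
    n, _, h, w = frame.shape
    # Base sampling grid of pixel coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=frame.dtype),
        torch.arange(w, dtype=frame.dtype),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=0).unsqueeze(0).to(frame.device)  # (1, 2, H, W)
    coords = grid + flow  # displaced sampling positions
    # Normalize coordinates to [-1, 1] as grid_sample expects.
    coords_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    sample_grid = torch.stack((coords_x, coords_y), dim=3)  # (N, H, W, 2)
    return F.grid_sample(frame, sample_grid, align_corners=True)
```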
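Putting Steps 1-6 together, one coding iteration under the single rate-distortion loss can be sketched as follows. All module arguments stand in for the paper's networks (OFE, MVE, MVD, MC, RE, RD, BRE); their internals and the value of λ are illustrative assumptions. The sketch reuses the quantize helper above:

```python
import torch
import torch.nn as nn


class DCNNCodec(nn.Module):
    """Hypothetical end-to-end wrapper around the paper's six steps."""

    def __init__(self, ofe, mve, mvd, mc, re, rd, bre, lmbda: float = 256.0):
        super().__init__()
        self.ofe, self.mve, self.mvd = ofe, mve, mvd   # flow estimation / MV codec
        self.mc, self.re, self.rd = mc, re, rd          # compensation / residual codec
        self.bre = bre                                   # bit-rate estimator (BRE net)
        self.lmbda = lmbda                               # rate-distortion trade-off

    def forward(self, frame, prev_recon):
        v = self.ofe(frame, prev_recon)                       # Step 1: motion estimation
        m_hat = quantize(self.mve(v), self.training)          # Step 2: MV encode + quantize
        v_hat = self.mvd(m_hat)                               #         MV decode
        pred = self.mc(prev_recon, v_hat)                     # Step 3: motion compensation
        residual = frame - pred
        y_hat = quantize(self.re(residual), self.training)    # Step 4: residual encode
        r_hat = self.rd(y_hat)                                #         residual decode
        recon = pred + r_hat                                  # Step 6: frame reconstruction
        rate = self.bre(m_hat) + self.bre(y_hat)              # Step 5: bit-rate estimate
        distortion = torch.mean((frame - recon) ** 2)
        loss = self.lmbda * distortion + rate                 # single RD loss
        return recon, loss
```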

Results and Discussion
Average PSNR versus bpp for the tested datasets is reported in Tables 1, 2, 3, and 4, and average MS-SSIM versus bpp in Tables 5, 6, 7, and 8.
The proposed approach also reduces the error rate: the mean squared error (MSE) for the tested datasets is given in Tables 9, 10, 11, and 12 (corresponding to the PSNR metric) and in Tables 13, 14, 15, and 16 (corresponding to the MS-SSIM metric).
The proposed approach likewise achieves a better compression ratio, as shown in Tables 17, 18, 19, and 20 (in terms of PSNR) and Tables 21, 22, 23, and 24 (in terms of MS-SSIM) for the tested datasets.
Finally, the compressed output video takes less transmission time than the original input video.
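For reference, the PSNR values reported above relate to MSE through the standard definition. A minimal computation, assuming 8-bit frames (the bit depth is an assumption, not stated in the paper):

```python
import math


def psnr(mse: float, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB for a given mean squared error,
    assuming 8-bit pixel values (max_val = 255)."""
    return 10.0 * math.log10((max_val ** 2) / mse)


# Example: an MSE of 10 on 8-bit frames corresponds to about 38.1 dB.
print(psnr(10.0))
```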

Conclusion
In this study, we propose a DCNN-based efficient video compression framework. We demonstrate that our DCNN technique outperforms both commonly used traditional video compression standards and more recent deep learning-based video compression solutions. The proposed DCNN model offers a higher compression ratio and lower error rates, delivering better video quality at low bit rates.