Date of Award
8-2025
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
School of Computing
Committee Chair/Advisor
Dr. Nianyi Li
Committee Member
Dr. Feng Luo
Committee Member
Dr. Long Cheng
Committee Member
Dr. Siyu Huang
Abstract
In today's digital era, visual data is vital across domains such as medical diagnostics, scientific imaging, surveillance, and entertainment. However, video data often suffers from degradations like noise, blur, compression artifacts, and low resolution, which reduce both quality and downstream usability. Video restoration aims to recover clean, high-fidelity video from such corrupted inputs. Unlike image restoration, video restoration must also maintain temporal consistency across frames, making it a significantly more complex problem. While supervised deep learning methods have achieved state-of-the-art results, they typically require large datasets of paired noisy-clean videos, which are scarce or impractical to obtain in real-world settings such as medical microscopy. Unsupervised methods overcome this challenge by learning directly from noisy data; however, current approaches are often limited in scope: they are tailored to Gaussian noise, rely on large-scale datasets, or fail under low frame rates and complex motion. Moreover, most methods are designed for a single domain and struggle to generalize across diverse modalities such as fluorescence, confocal, and natural-scene videos. This work proposes a suite of robust unsupervised deep learning frameworks designed to overcome these limitations by handling multi-modal video data, adapting to varying motion dynamics, operating under limited-data regimes, and denoising effectively across a broad range of noise types and intensities.
In the first study, we propose a novel unsupervised deep learning approach for video denoising that addresses data scarcity and is robust to diverse noise patterns. Our method consists of three modules: a Feature Generator that produces feature maps, a Denoise-Net that generates denoised yet slightly blurry reference frames, and a Refine-Net that restores high-frequency details. By employing a coordinate-based network, we simplify the architecture while effectively preserving fine details in the denoised frames. Extensive experiments on simulated and real-world videos, including calcium imaging, show that this approach can denoise corrupted videos without prior knowledge of the noise model or extensive data augmentation during training.
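To make the coordinate-based design concrete, the following minimal PyTorch sketch shows how such a network can map (x, y, t) coordinates to pixel intensities via a Fourier positional encoding. The module names, layer sizes, and encoding choices here are illustrative assumptions, not the dissertation's exact architecture.

```python
# Minimal sketch of a coordinate-based network, assuming an MLP over
# (x, y, t) queries with Fourier positional encoding; names and sizes
# are hypothetical, not the dissertation's exact design.
import torch
import torch.nn as nn


class FourierEncoding(nn.Module):
    """Map raw coordinates to sin/cos features at several frequencies."""
    def __init__(self, num_freqs: int = 6):
        super().__init__()
        self.register_buffer("freqs", 2.0 ** torch.arange(num_freqs) * torch.pi)

    def forward(self, coords):  # coords: (N, 3) in [-1, 1] for (x, y, t)
        angles = coords[..., None] * self.freqs       # (N, 3, F)
        feats = torch.cat([angles.sin(), angles.cos()], dim=-1)
        return feats.flatten(-2)                      # (N, 3 * 2F)


class CoordinateDenoiser(nn.Module):
    """Tiny MLP that predicts an intensity for each (x, y, t) query."""
    def __init__(self, num_freqs: int = 6, hidden: int = 128):
        super().__init__()
        self.encode = FourierEncoding(num_freqs)
        self.mlp = nn.Sequential(
            nn.Linear(3 * 2 * num_freqs, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, coords):
        return self.mlp(self.encode(coords))


# Query the network at every pixel of one frame (t fixed at 0).
model = CoordinateDenoiser()
ys, xs = torch.meshgrid(torch.linspace(-1, 1, 64),
                        torch.linspace(-1, 1, 64), indexing="ij")
t = torch.zeros_like(xs)
coords = torch.stack([xs, ys, t], dim=-1).reshape(-1, 3)
frame = model(coords).reshape(64, 64)  # denoised-frame estimate
```

Because the network is queried per coordinate rather than per pixel grid, the same compact MLP can render any frame of the sequence, which is what keeps the architecture simple.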
In the second study, we extend our work to medical imaging by introducing a Deep Temporal Interpolation method that leverages temporal correlation across long video sequences. This method incorporates a temporal signal filter into the lower CNN layers to restore microscopy videos corrupted by unknown noise types. Our unsupervised framework adapts to diverse noise conditions without requiring prior knowledge of noise distributions, addressing a critical gap in real-world medical applications. Evaluations on real microscopy recordings and simulated data confirm the framework's superior performance across a wide range of noise scenarios, where it consistently outperforms state-of-the-art supervised and unsupervised techniques.
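As an illustration of what a temporal signal filter in a lower CNN layer might look like, the sketch below uses a depthwise 3-D convolution whose kernel spans only the time axis, so each spatial location aggregates evidence across neighboring frames. This is a hedged approximation; the dissertation's actual filter design may differ.

```python
# Hypothetical "temporal signal filter" block for an early CNN layer:
# a depthwise Conv3d with a (t, 1, 1) kernel smooths along time only,
# followed by an ordinary spatial convolution.
import torch
import torch.nn as nn


class TemporalFilterBlock(nn.Module):
    def __init__(self, channels: int = 16, t_kernel: int = 5):
        super().__init__()
        # Depthwise temporal conv: kernel (t, 1, 1), padded to keep length.
        self.temporal = nn.Conv3d(
            channels, channels, kernel_size=(t_kernel, 1, 1),
            padding=(t_kernel // 2, 0, 0), groups=channels)
        # Spatial conv mixes information within each frame.
        self.spatial = nn.Conv3d(channels, channels, kernel_size=(1, 3, 3),
                                 padding=(0, 1, 1))
        self.act = nn.ReLU()

    def forward(self, x):  # x: (B, C, T, H, W)
        return self.act(self.spatial(self.act(self.temporal(x))))


# A noisy 16-frame feature clip; the block denoises by pooling over time.
clip = torch.randn(1, 16, 16, 64, 64)
out = TemporalFilterBlock()(clip)
print(out.shape)  # torch.Size([1, 16, 16, 64, 64])
```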
To generalize across multi-modal microscopy imaging, including 2D time-lapse sequences at different temporal rates and 3D volumetric stacks, while remaining effective in limited-data settings, the third study presents a Spatio-Temporal Sampling method that requires no per-modality tuning. Our network integrates a learnable Spatio-Temporal Weighted (STW) kernel guided by optical flow estimation to balance temporal information, a Guided Temporal Fusion module that adaptively warps and gates high-frequency details from the target frames, and a recurrent optimization loop for iterative refinement of the optical flow and denoised outputs. Extensive experiments demonstrate that our model effectively denoises microscopy videos while maintaining structural detail, temporal coherence, and multi-modal generalization.
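The sketch below illustrates the flow-guided fusion idea in simplified form: a neighboring frame is backward-warped by an optical flow field, then blended with the target frame through a learned per-pixel gate. The warp helper and FusionGate module are hypothetical names of our own; the STW kernel and the recurrent refinement loop are omitted for brevity.

```python
# Simplified flow-guided temporal fusion: warp a neighbor frame toward
# the target, then blend via a learned per-pixel gate. A sketch under
# stated assumptions, not the dissertation's exact modules.
import torch
import torch.nn as nn
import torch.nn.functional as F


def warp(frame, flow):
    """Backward-warp `frame` (B, C, H, W) by `flow` (B, 2, H, W) in pixels."""
    b, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h, dtype=frame.dtype),
                            torch.arange(w, dtype=frame.dtype), indexing="ij")
    grid_x = (xs + flow[:, 0]) / (w - 1) * 2 - 1  # normalize to [-1, 1]
    grid_y = (ys + flow[:, 1]) / (h - 1) * 2 - 1
    grid = torch.stack([grid_x, grid_y], dim=-1)  # (B, H, W, 2)
    return F.grid_sample(frame, grid, align_corners=True)


class FusionGate(nn.Module):
    """Predict a per-pixel weight for blending warped neighbor and target."""
    def __init__(self, channels: int = 1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, target, warped):
        w = self.net(torch.cat([target, warped], dim=1))
        return w * warped + (1 - w) * target


target = torch.rand(1, 1, 64, 64)
neighbor = torch.rand(1, 1, 64, 64)
flow = torch.zeros(1, 2, 64, 64)  # placeholder flow field
fused = FusionGate()(target, warp(neighbor, flow))
```

In a recurrent setup of the kind the study describes, the flow estimate and the fused output would be re-fed to the network for several iterations rather than computed once as shown here.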
In the fourth study, we introduce VR-INR, a novel video restoration method leveraging Implicit Neural Representations (INRs). Our approach integrates hierarchical spatial-temporal-texture encoding with multi-resolution implicit hash encoding, enabling adaptive reconstruction of sharp, noise-free frames from low-resolution inputs. VR-INR is trained solely at a single scale (×4) and effectively generalizes to arbitrary super-resolution scale factors at test time. Furthermore, our model demonstrates zero-shot denoising capabilities without prior training on noisy data. Extensive experiments validate VR-INR's superior performance over state-of-the-art methods in sharpness, detail preservation, and denoising across diverse unseen scales and degradations.
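To give a flavor of multi-resolution implicit encoding with arbitrary-scale decoding, the sketch below samples learnable feature grids at several resolutions and decodes any continuous coordinate with a small MLP. For brevity, dense grids stand in for a true multi-resolution hash table, so this is a simplified assumption rather than VR-INR's actual encoder.

```python
# Simplified multi-resolution implicit representation: small dense
# feature grids replace a hash table; any continuous (x, y) coordinate
# can be decoded, which is what enables arbitrary-scale output.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiResINR(nn.Module):
    def __init__(self, resolutions=(8, 16, 32, 64), feat_dim: int = 4):
        super().__init__()
        self.grids = nn.ParameterList(
            nn.Parameter(torch.randn(1, feat_dim, r, r) * 0.01)
            for r in resolutions)
        self.head = nn.Sequential(
            nn.Linear(feat_dim * len(resolutions), 64), nn.ReLU(),
            nn.Linear(64, 3))  # RGB output

    def forward(self, coords):  # coords: (N, 2) in [-1, 1]
        grid = coords.view(1, 1, -1, 2)  # (1, 1, N, 2)
        feats = [F.grid_sample(g, grid, align_corners=True)
                 .squeeze(0).squeeze(1).t()  # (N, feat_dim) per level
                 for g in self.grids]
        return self.head(torch.cat(feats, dim=-1))


# Decode the same field at two output resolutions: any scale at test time.
model = MultiResINR()
for size in (32, 128):
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, size),
                            torch.linspace(-1, 1, size), indexing="ij")
    coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
    img = model(coords).reshape(size, size, 3)
    print(img.shape)
```

Because the representation is queried at continuous coordinates rather than a fixed pixel grid, a model fit at one scale can be sampled at any other, which mirrors how training at ×4 can generalize to unseen scale factors.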
Recommended Citation
Aiyetigbo, Mary Damilola, "Unsupervised Deep Learning for Video Restoration" (2025). All Dissertations. 4088.
https://open.clemson.edu/all_dissertations/4088
Author ORCID Identifier
https://orcid.org/0009-0000-6238-9857