Beyond Alignment: Blind Video Face Restoration via Parsing-Guided Temporal-Coherent Transformer

Kepeng Xu1     Li Xu1     Gang He1†     Wenxin Yu2     Yunsong Li1
† Corresponding author
1Xidian University
2Southwest University of Science and Technology
IJCAI 2024

Demo

Abstract

Low-quality face videos in the real world suffer from multiple coupled, complex degradations. Blind video face restoration is therefore a highly challenging ill-posed problem: it requires not only hallucinating high-fidelity details but also preserving temporal coherence across diverse pose variations. Naively restoring each frame independently inevitably introduces temporal incoherence and artifacts caused by pose changes and keypoint localization errors. To address this, we propose the first blind video face restoration approach, built on a novel parsing-guided temporal-coherent transformer (PGTFormer), that requires no pre-alignment. PGTFormer leverages semantic parsing guidance to select optimal face priors and generate temporally coherent, artifact-free results. Specifically, we pre-train a temporal-spatial vector-quantized auto-encoder on high-quality video face datasets to extract expressive, context-rich priors. The temporal parse-guided codebook predictor (TPCP) then restores faces in different poses from face parsing context cues, without face pre-alignment; this reduces artifacts and mitigates the jitter caused by cumulative pre-alignment errors. Finally, the temporal fidelity regulator (TFR) enhances fidelity through temporal feature interaction and improves the temporal consistency of the restored video. Extensive experiments on face videos show that our method outperforms previous face restoration baselines. The code will be released at https://github.com/kepengxu/PGTFormer.
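The face prior in this pipeline is a discrete codebook learned by a vector-quantized auto-encoder. As a rough illustration of that building block only (not the released implementation; the class name, tensor shapes, codebook size, and commitment-loss weight below are assumptions), a standard nearest-neighbour codebook lookup with a straight-through gradient estimator looks like this:

import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """Minimal VQ layer: maps continuous features to their nearest codebook entries.
    Hypothetical sketch of the kind of codebook used as a face prior; PGTFormer's
    codebook is learned with a temporal-spatial auto-encoder on high-quality face videos."""
    def __init__(self, num_codes=1024, code_dim=256, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta  # commitment-loss weight (assumed value)

    def forward(self, z):  # z: (B, T, N, C) encoder features for a clip
        flat = z.reshape(-1, z.shape[-1])                        # (B*T*N, C)
        # squared L2 distance from every token to every codebook entry
        d = (flat.pow(2).sum(1, keepdim=True)
             - 2 * flat @ self.codebook.weight.t()
             + self.codebook.weight.pow(2).sum(1))
        idx = d.argmin(dim=1)                                    # nearest code per token
        z_q = self.codebook(idx).view_as(z)
        # codebook + commitment losses, straight-through estimator for gradients
        loss = F.mse_loss(z_q, z.detach()) + self.beta * F.mse_loss(z, z_q.detach())
        z_q = z + (z_q - z).detach()
        return z_q, idx.view(z.shape[:-1]), loss

For example, VectorQuantizer()(torch.randn(2, 4, 64, 256)) quantizes 2 clips of 4 frames with 64 spatial tokens each. At restoration time the code indices are not obtained by nearest-neighbour lookup on degraded features but predicted from parsing cues, which is the role of TPCP described above.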

Video

Network Architecture


The overall architecture of PGTFormer.
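Read as a data flow, the architecture amounts to the following per-clip pipeline. This is only a schematic sketch: the module names (parser, encoder, tpcp, tfr, decoder, codebook) are placeholders for illustration, not the released API.

def restore_clip(lq_frames, parser, encoder, tpcp, tfr, decoder, codebook):
    """Schematic PGTFormer-style forward pass over a short clip (hypothetical names).
    lq_frames: (B, T, 3, H, W) low-quality, unaligned face frames."""
    parse_maps = parser(lq_frames)      # semantic face-parsing cues per frame
    feats = encoder(lq_frames)          # degraded features, no face pre-alignment
    # TPCP: predict codebook indices conditioned on parsing context across time
    code_idx = tpcp(feats, parse_maps)
    priors = codebook(code_idx)         # look up high-quality face priors
    # TFR: fuse encoder features with priors via temporal feature interaction
    fused = tfr(feats, priors)
    return decoder(fused)               # temporally coherent restored frames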

Quantitative Results


Quantitative comparison on the VFHQ blind restoration setting.

Qualitative Results


Visual comparison results of different methods.

Acknowledgement

This research is supported by the National Key Research and Development Program from the Ministry of Science and Technology of the PRC (No. 2021ZD0110600), the Sichuan Science and Technology Program (No. 2022ZYD0116), the Sichuan Provincial M. C. Integration Office Program, and the IEDA Laboratory of SWUST (Grant CEIEC-2022-ZM02-0247). Thanks to Dr. Li Xu and Gang He for their constructive suggestions on this paper.

BibTeX

@inproceedings{kpgtformer,
  author    = {Kepeng Xu and Li Xu and Gang He and Wenxin Yu and Yunsong Li},
  title     = {Beyond Alignment: Blind Video Face Restoration via Parsing-Guided Temporal-Coherent Transformer},
  booktitle = {IJCAI},
  year      = {2024},
}