StyleGANv2 (CVPR'2020)

Analyzing and Improving the Image Quality of StyleGAN

Task: Unconditional GANs

Abstract

The style-based GAN architecture (StyleGAN) yields state-of-the-art results in data-driven unconditional generative image modeling. We expose and analyze several of its characteristic artifacts, and propose changes in both model architecture and training methods to address them. In particular, we redesign the generator normalization, revisit progressive growing, and regularize the generator to encourage good conditioning in the mapping from latent codes to images. In addition to improving image quality, this path length regularizer yields the additional benefit that the generator becomes significantly easier to invert. This makes it possible to reliably attribute a generated image to a particular network. We furthermore visualize how well the generator utilizes its output resolution, and identify a capacity problem, motivating us to train larger models for additional quality improvements. Overall, our improved model redefines the state of the art in unconditional image modeling, both in terms of existing distribution quality metrics as well as perceived image quality.

Results and Models

Results (compressed) from StyleGAN2 config-f trained by mmagic

Model	Dataset	Comment	FID50k	Precision50k	Recall50k	Download
stylegan2_c2_8xb4_ffhq-1024x1024	FFHQ	official weight	2.8134	62.856	49.400	model
stylegan2_c2_8xb4_lsun-car-384x512	LSUN_CAR	official weight	5.4316	65.986	48.190	model
stylegan2_c2_8xb4-800kiters_lsun-horse-256x256	LSUN_HORSE	official weight	-	-	-	model
stylegan2_c2_8xb4-800kiters_lsun-church-256x256	LSUN_CHURCH	official weight	-	-	-	model
stylegan2_c2_8xb4-800kiters_lsun-cat-256x256	LSUN_CAT	official weight	-	-	-	model
stylegan2_c2_8xb4-800kiters_ffhq-256x256	FFHQ	our training	3.992	69.012	40.417	model
stylegan2_c2_8xb4_ffhq-1024x1024	FFHQ	our training	2.8185	68.236	49.583	model
stylegan2_c2_8xb4_lsun-car-384x512	LSUN_CAR	our training	2.4116	66.760	50.576	model

FP16 Support and Experiments

We currently support FP16 training for StyleGAN2. The results of mixed-precision training are reported here. (Experiments for FFHQ1024 will come soon.)

Evaluation FID for FP32 and FP16 training

As shown in the figure, we provide 3 ways to do mixed-precision training for StyleGAN2:

stylegan2_c2_fp16_PL-no-scaler: In this setting, we follow the official FP16 implementation in StyleGAN2-ADA as closely as possible. As in the official version, we adopt FP16 training only for the higher-resolution feature maps (the last 4 stages in G and the first 4 stages in D). Note that we do not adopt the clamping that the official implementation uses to avoid gradient overflow. We use the autocast function from the torch.cuda.amp package.
stylegan2_c2_fp16-globalG-partialD_PL-R1-no-scaler: In this config, we adopt mixed-precision training for the whole generator but only for part of the discriminator (the first 4 higher-resolution stages). Note that we do not apply the loss scaler to the path length loss and the gradient penalty loss, because training always diverged when we used the loss scaler to scale the gradients of these two losses.
stylegan2_c2_apex_fp16_PL-R1-no-scaler: In this setting, we adopt the APEX toolkit to implement mixed-precision training with multiple loss/gradient scalers. In APEX, you can assign different loss scalers to the generator and the discriminator. Note that we still do not apply the gradient scaler to the path length loss and the gradient penalty loss.
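
The "partial" settings above restrict FP16 to the higher-resolution stages. The following is a minimal sketch of that selection rule in plain Python. It is not MMagic's or StyleGAN2-ADA's code: the helper names (stage_resolutions, fp16_plan) are hypothetical, and it assumes a square, power-of-two output resolution with stages at 4, 8, ..., image_size.

```python
def stage_resolutions(image_size, min_size=4):
    """Stage resolutions from low to high: 4, 8, ..., image_size (assumed power of two)."""
    sizes = []
    size = min_size
    while size <= image_size:
        sizes.append(size)
        size *= 2
    return sizes


def fp16_plan(image_size, num_fp16_stages=4):
    """Return (generator_plan, discriminator_plan) as lists of (resolution, use_fp16).

    The generator runs from low to high resolution, so its FP16 stages are the
    last ones. The discriminator runs from high to low resolution, so the same
    resolutions are its first stages.
    """
    g_order = stage_resolutions(image_size)
    d_order = list(reversed(g_order))
    fp16_resolutions = set(g_order[-num_fp16_stages:])
    g_plan = [(res, res in fp16_resolutions) for res in g_order]
    d_plan = [(res, res in fp16_resolutions) for res in d_order]
    return g_plan, d_plan


g_plan, d_plan = fp16_plan(256)
for name, plan in (("G", g_plan), ("D", d_plan)):
    for res, use_fp16 in plan:
        print(f"{name} stage {res:>3}x{res:<3} -> {'fp16' if use_fp16 else 'fp32'}")
```

For the FFHQ256 experiments, this rule would place the 32, 64, 128, and 256 stages in FP16 and keep the lower-resolution stages in FP32.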

Model	Comment	Dataset	FID50k	Download
stylegan2_c2_8xb4-800kiters_ffhq-256x256	baseline	FFHQ256	3.992	ckpt
stylegan2_c2-PL_8xb4-fp16-partial-GD-no-scaler-800kiters_ffhq-256x256	partial layers in fp16	FFHQ256	4.331	ckpt
stylegan2_c2-PL-R1_8xb4-fp16-globalG-partialD-no-scaler-800kiters_ffhq-256x256	the whole G in fp16	FFHQ256	4.362	ckpt
stylegan2_c2-PL-R1_8xb4-apex-fp16-no-scaler-800kiters_ffhq-256x256	the whole G&D in fp16 + two loss scalers	FFHQ256	4.614	ckpt
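
To read the table as a cost of mixed precision, the gap between each FP16 run and the FP32 baseline can be computed directly from the FID50k column. The sketch below only does that arithmetic on the numbers reported above; the dictionary keys and the function name fid_gaps are shorthand chosen here for illustration.

```python
# FID50k values copied from the table above (FFHQ256, 800k iterations).
BASELINE_FID = 3.992
FP16_FID = {
    "partial G and D": 4.331,
    "whole G, partial D": 4.362,
    "apex, whole G and D": 4.614,
}


def fid_gaps(baseline, runs):
    """Map each run to (absolute FID increase, relative increase in percent)."""
    gaps = {}
    for name, fid in runs.items():
        delta = fid - baseline
        gaps[name] = (delta, 100.0 * delta / baseline)
    return gaps


for name, (delta, percent) in fid_gaps(BASELINE_FID, FP16_FID).items():
    print(f"{name}: +{delta:.3f} FID ({percent:.2f}% above the FP32 baseline)")
```

Lower FID is better, so a larger gap means a larger quality loss relative to FP32 training.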

In the Results and Models table, precision and recall are reported as P&R50k_full, the metric used in StyleGANv1 and StyleGANv2. Here, full indicates that we use the whole dataset to extract the real distribution, e.g., 70000 images in the FFHQ dataset. Note that using the VGG16 provided by Tero requires PyTorch >= 1.6.0. Be careful about using PyTorch's own VGG16 to extract features, as it leads to higher precision and recall values.
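
The version requirement can be checked before running the evaluation. The sketch below is one possible way to do it, not part of MMagic: the helper names (version_tuple, meets_vgg16_requirement) are hypothetical. It parses the version string numerically, because a plain string comparison would rank "1.10.0" below "1.6.0".

```python
import re


def version_tuple(version):
    """Parse the leading 'major.minor[.patch]' of a version string into ints.

    Suffixes such as '+cu101' are ignored; a missing patch number counts as 0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def meets_vgg16_requirement(torch_version, minimum=(1, 6, 0)):
    """Return True if the given PyTorch version string is at least `minimum`."""
    return version_tuple(torch_version) >= minimum


for candidate in ("1.5.1", "1.6.0+cu101", "1.10.0"):
    print(candidate, meets_vgg16_requirement(candidate))
```

In an actual environment the string to pass in would be torch.__version__.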

Citation

@inproceedings{karras2020analyzing,
  title={Analyzing and improving the image quality of stylegan},
  author={Karras, Tero and Laine, Samuli and Aittala, Miika and Hellsten, Janne and Lehtinen, Jaakko and Aila, Timo},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={8110--8119},
  year={2020},
  url={https://openaccess.thecvf.com/content_CVPR_2020/html/Karras_Analyzing_and_Improving_the_Image_Quality_of_StyleGAN_CVPR_2020_paper.html},
}
