
Extending to Image Transformer #1

Open

alexmathfb opened this issue Aug 5, 2021 · 3 comments

@alexmathfb

Thanks for this very important work.

I'm trying to train the Image Transformer as a single-layer flow and have a few questions I hope you can help me with.

In section 4 you describe how to turn PixelCNN++ and related models (including the Image Transformer) into single-layer autoregressive flows.
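To check my understanding of the construction: the conditional DMOL CDF acts as the elementwise transform of the flow, i.e. z_d = F(x_d | x_{<d}), so the flow objective coincides with the autoregressive log-likelihood. A minimal sketch of the mixture-of-logistics CDF (my own illustration, not code from this repo):

```python
import torch

def mixture_of_logistics_cdf(x, logit_probs, means, log_scales):
    """CDF of a K-component logistic mixture, evaluated elementwise.

    x: (...,) pixel values; logit_probs, means, log_scales: (..., K),
    produced per pixel by the autoregressive network.
    """
    inv_scales = torch.exp(-log_scales)
    # Logistic CDF per component: sigmoid((x - mean) / scale)
    component_cdfs = torch.sigmoid((x.unsqueeze(-1) - means) * inv_scales)
    weights = torch.softmax(logit_probs, dim=-1)
    return (weights * component_cdfs).sum(dim=-1)
```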

Question 1. Is it correct that the modifications needed for PixelCNN++ and the Image Transformer are the same because both use DMOL?

Question 2. The comment below states that the PixelCNN++ implementation is a raw copy. If I extend to the Image Transformer, can I also just make a raw copy?

# Raw copy of https://github.com/pclucas14/pixel-cnn-pp

Question 3. It seems to me that the AutoregressiveSubsetFlow2d class does not assume PixelCNN++ and thus may work for the Image Transformer. In principle, if I change the following code to use the Image Transformer, should it work?

model = AutoregressiveSubsetFlow2d(base_shape = (3,32,32,),
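For concreteness, here is roughly what I have in mind. This is only a sketch: ImageTransformerNet stands in for my own Image Transformer implementation, and the exact constructor arguments may differ from the repo.

```python
# `ImageTransformerNet` is hypothetical: any module works as long as it is
# autoregressive in the pixel ordering and maps a (B, 3, 32, 32) image to
# per-pixel DMOL parameters, like the PixelCNN++ net does.
net = ImageTransformerNet(num_mixtures=10)

model = AutoregressiveSubsetFlow2d(base_shape=(3, 32, 32),
                                   net=net)
```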

@didriknielsen (Owner)

Hi and thanks for your interest!

Q1: Yes, for an Image Transformer with DMOL, the setup is the same. Only the neural architecture that parameterizes the flow will be different.

Q2: Yes, that should be fine if you have an implementation of the neural architecture used in the Image Transformer.

Q3: Yes, by passing the Image Transformer network in as `net`, everything should still work fine.
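A rough sketch of the interface the net needs to satisfy (names here are illustrative, not the repo's exact API):

```python
import torch.nn as nn

class ImageTransformerNet(nn.Module):
    """Illustrative stub: any network works as long as it is autoregressive
    in the pixel ordering and outputs DMOL parameters for every pixel."""

    def __init__(self, num_mixtures=10):
        super().__init__()
        self.num_mixtures = num_mixtures
        # ... build the Image Transformer layers here ...

    def forward(self, x):
        # x: (B, 3, H, W) image.
        # Should return (B, P, H, W) DMOL parameters, where P follows the
        # PixelCNN++ convention, e.g. P = 10 * num_mixtures for RGB images
        # (mixture logits, per-channel means, log-scales, and coefficients).
        raise NotImplementedError
```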

@alexmathfb (Author)

You are indeed correct; I managed to get very similar bpd early in training. I will comment tomorrow when training finishes.

(The image below shows the bpd loss of the Image Transformer trained autoregressively vs. as a single-layer normalizing flow.)

[image: bpd loss curves, autoregressive vs. single-layer flow]

@alexmathfb (Author) commented Aug 10, 2021

The training loss curves seem indistinguishable.

[image: training loss curves for both setups, nearly overlapping]
