
testing pretrained model - depth #15

Open
emergencyd opened this issue Jan 19, 2019 · 8 comments

emergencyd commented Jan 19, 2019

[attached screenshot]

I'm using the pretrained model in folder "rgb2depth". I want to reproduce "loss = 0.35". Which data should I use?

I've tried the "depth_zbuffer" test data, but "l1_loss, loss_g, loss_d_real, loss_d_fake" are around "0.6,0.7,0.9,0.7".

I suppose I used the wrong data or the wrong loss... should I use the "depth_euclidean" data instead?

Thank you!


b0ku1 commented Jan 20, 2019

@alexsax could you please comment on which models we used for testing?


b0ku1 commented Jan 20, 2019

Also, did you use a mask to mask out the "bad" depth values? Basically, we extracted the depth values from a mesh, and since there are holes in the mesh, some depth values in the ground truth are too high (we masked these values out during training).


alexsax commented Jan 20, 2019

@b0ku1 that's a good point; it might explain the relatively high losses. @emergencyd we reported the L1 loss, so that's the one you should pay attention to.

One contributing factor might be that the models we released were trained on an internal set of images that were processed a bit differently from the released data. The internal set always has a FoV of 75 degrees, but the released data has a range of 45-75 degrees. The pretrained networks don't work as well on images with a narrow FoV, like those in the released set. You can verify this for yourself on the Taskonomy demo site.

@emergencyd do you notice that the losses are significantly better for large-FoV images?


b0ku1 commented Jan 21, 2019

@emergencyd changing to rgb-large won't fix the FoV (field of view) problem; that's a discrepancy between the internal and the public dataset. Basically, we trained and tested on images with a fixed FoV (internal), but for more general public use, the released dataset has varying FoV.

Re the mask: @alexsax, does the released dataset come with masks?


alexsax commented Jan 21, 2019

Seconding @b0ku1 above.

And no need for an explicit mask—just check for pixels where the depth is equal to (or very close to) the max value, 2^16-1 :)
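
For instance, a minimal numpy sketch of that check plus the masked L1 loss (here `img_t` is the raw 16-bit depth_zbuffer image and `pred` is the prediction resized to the same shape; both names are placeholders, not code from this repo):

    import numpy as np

    # Pixels at the max 16-bit value sit in mesh holes; give them zero weight.
    MAX_DEPTH = 2**16 - 1
    valid = (img_t < MAX_DEPTH).astype(np.float32)   # 1 = keep, 0 = mask out

    # Masked L1 loss: mean absolute error over the valid pixels only.
    l1_loss = np.sum(np.abs(pred - img_t) * valid) / np.sum(valid)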

Finally, depth Euclidean is the distance from each pixel to the optical center. Depth z-buffer is something else (see the supplementary material for the full description!).

emergencyd (Author) commented

Now I can see the "full_plus", "full", "medium", and "small" split information for the whole dataset, but I can't find the FoV information for each image. Where should I get it?

Also, do I need to drop the pixels with extremely high values? (if I understand it right)


alexsax commented Jan 22, 2019

> Now I can see the "full_plus", "full", "medium", and "small" split information for the whole dataset, but I can't find the FoV information for each image. Where should I get it?

The pose files :)

> Also, do I need to drop the pixels with extremely high values? (if I understand it right)

Yes, exactly.
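
If it helps, here is a rough sketch of reading the FoV from a pose/point-info JSON and filtering on it; the file path below is only illustrative, the key that matters is `field_of_view_rads`:

    import json

    # Illustrative path; point this at wherever the pose / point_info JSONs live.
    pose_path = 'point_info/point_0_view_0_domain_fixatedpose.json'
    with open(pose_path) as f:
        pose = json.load(f)

    fov = pose['field_of_view_rads']
    # The internal training set used a ~75-degree FoV, i.e. roughly 1.3 rad.
    if fov > 1.3:
        print('wide-FoV image, FoV = %.2f rad' % fov)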


emergencyd commented Jan 25, 2019

  1. According to the supplementary material, I use depth_zbuffer rather than depth_euclidean as my target depth map.

  2. Then I use the "field_of_view_rads" information to pick the images with FoV larger than 1.3 rad.

  3. Then I use the code below to process the target image and calculate the L1 loss:

        #####################
        #### target data ####
        #####################
        # Load the raw 16-bit depth_zbuffer image and build a validity mask:
        # pixels at the max value (2**16 - 1) are mesh holes and get weight 0.
        img_t = load_raw_image_center_crop(target_name, color=False)
        mask_filt = np.where(img_t >= 2**16 - 1, 0, 1)

        # Replace the hole pixels with the maximum valid depth.
        img_t[img_t >= 2**16 - 1] = 0
        img_t[img_t == 0] = np.max(img_t)

        # Apply the same target preprocessing as in the training config
        # (alternatively: img_t = load_ops.resize_image(img_t, [256, 256, 1])).
        img_t = cfg['target_preprocessing_fn'](img_t, **cfg['target_preprocessing_fn_kwargs'])
        img_t = img_t[np.newaxis, :]

        # Resize the mask to the network resolution and add a batch dimension
        # (alternatively: weight_mask = np.ones(np.shape(img_t)) for no masking).
        mask_filt = load_ops.resize_image(mask_filt, [256, 256, 1])
        weight_mask = mask_filt[np.newaxis, :]

        #####################
        ###### predict ######
        #####################
        predicted, representation, losses = training_runners['sess'].run(
            [m.decoder_output, m.encoder_output, m.losses],
            feed_dict={m.input_images: img, m.target_images: img_t, m.masks: weight_mask})

I noticed that there is a function "depth_single_image", so I tried it and calculated the loss again:

        # Post-process the prediction, then compute the masked L1 loss by hand.
        predicted = depth_single_image(predicted)
        diff = np.abs(predicted - img_t)
        diff[weight_mask == 0] = 0                 # ignore mesh-hole pixels
        l1_loss = np.sum(diff) / np.sum(weight_mask)

But the loss still doesn't seem right (around 0.15). This time, though, the generated prediction looks the same as the result on the demo website:
[attached prediction image]

I guess there is something wrong with my processing of the target images, and I'm quite confused now.

@alexsax
