
Recognition problem with multi-person images #4

Open

xinzi2018 opened this issue Mar 16, 2022 · 15 comments

Comments

@xinzi2018

xinzi2018 commented Mar 16, 2022

I've found that recognition on multi-person images is quite poor; see the result in the top-right corner of the image below.
image

So I tried the approach in the code below: first detect the body boxes with DetectronV2, then feed each cropped body into the network one by one, and finally fuse the results.

def ClothSegMultiGen(self,img_cv,size=-1,):

        img = Image.fromarray(cv2.cvtColor(img_cv, cv2.COLOR_BGR2RGB))

        w,h = img.size

        body_boxes, _, sub_bodys = self.face_analysis.DetectronV2BodyBox(img_cv)
        total_rate = np.zeros((4,img_cv.shape[0],img_cv.shape[1]))-float('inf')  # initialize a matrix filled with -inf (smallest possible values)
        if len(sub_bodys)!=0:
            output_img = np.zeros((img_cv.shape[0], img_cv.shape[1]))  # placeholder, overwritten by the argmax below
            for i in range(len(sub_bodys)):
                total_sub_rate = np.zeros((4,img_cv.shape[0],img_cv.shape[1]))-float('inf')  # initialize a matrix filled with -inf (smallest possible values)

                left, top, right, bottom = body_boxes[i][0],body_boxes[i][1],body_boxes[i][0]+body_boxes[i][2],body_boxes[i][1]+body_boxes[i][3]
                sub_img = img.crop((left, top, right, bottom))
                sub_img, sub_rate, sub_img_color   = self.ClothSegGen(sub_img,640)
                if i==0:

                    total_rate[:,top:bottom,left:right]= sub_rate
                else:
                    # np.argmax(sub_rate, axis=1)
                    total_sub_rate[:,top:bottom,left:right]= sub_rate
                    total_rate = maxTwoNumpy(total_sub_rate,total_rate)

            output_img = np.argmax(total_rate, axis=0)
            output_img_color = self.indexColor(output_img,w,h)
        else:
            output_img,_,output_img_color   = self.ClothSegGen(img,640)

def ClothSegGen(self,img_cv,size=-1,): # size is the target short-edge length
        img = Image.fromarray(cv2.cvtColor(img_cv, cv2.COLOR_BGR2RGB))
        w,h = img.size
        if size!=-1:
            if w>h:
                h_out = size
                w_out = w * h_out // h
            else:  
                w_out = size
                h_out = h * w_out // w

            img = img.resize((w_out,h_out))

        image_tensor = self.transform_rgb(img)
        image_tensor = torch.unsqueeze(image_tensor, 0)
        print('Input image size for the cloth-segmentation network:',image_tensor.shape)
        output_tensor = self.net(image_tensor.to(self.device))
        output_tensor = F.log_softmax(output_tensor[0], dim=1)

        output_tensor_ori = output_tensor.clone()
        output_tensor = torch.max(output_tensor_ori, dim=1, keepdim=True)[1]  # torch.max(...)[1] returns only the argmax indices


       
        output_tensor = torch.squeeze(output_tensor, dim=0)
        output_tensor = torch.squeeze(output_tensor, dim=0)
        output_arr = output_tensor.cpu().numpy()
        output_img = Image.fromarray(output_arr.astype("uint8"), mode="L")

        
        output_img_color = self.indexColor(output_img,w,h)



        # separately extract the probability data
        output_tensor0 = output_tensor_ori.clone() # note: these floats are not probability values in [0, 1]
        output_tensor0 = torch.squeeze(output_tensor0, dim=0) 
        # output_tensor0 = torch.squeeze(output_tensor0, dim=0)
        output_rate = output_tensor0.cpu().numpy() # 4*h*w
 

        return output_img,output_rate,output_img_color 

However, this fusion approach has a serious problem: the previous body box overwrites the next one (for example, the seam between the two left people in the lower image above). Could this be caused by log_softmax?
Previously, with probabilities from sigmoid, the same fusion scheme merged results correctly.
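For reference, the key difference is that F.log_softmax returns log-probabilities, which are all ≤ 0, while F.softmax (like sigmoid) returns values in (0, 1). A minimal pure-Python sketch of one pixel's class scores (the logits below are made up for illustration):

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of logits
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

logits = [2.0, -1.0, 0.5, 0.3]  # hypothetical per-class scores for one pixel
probs = softmax(logits)
log_probs = [math.log(p) for p in probs]

# softmax values lie in (0, 1) and sum to 1, so max-based fusion across
# overlapping crops behaves like fusing sigmoid outputs
print(all(0.0 < p < 1.0 for p in probs))   # True
print(abs(sum(probs) - 1.0) < 1e-9)        # True

# log_softmax values are all <= 0, so mixing them with a buffer padded
# with -inf (or zeros) distorts the argmax at crop boundaries
print(all(lp <= 0.0 for lp in log_probs))  # True
```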

@levindabhi
Owner

I am not sure what your actual question is. If you are asking about the wrong output in the marked region of the image below, then it is caused by these two lines: total_rate[:,top:bottom,left:right]= sub_rate and total_sub_rate[:,top:bottom,left:right]= sub_rate.
image

I guess it should be total_rate[:,top:bottom,left:right] += sub_rate and total_sub_rate[:,top:bottom,left:right] += sub_rate, with total_rate and total_sub_rate initialized to zero.
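The suggestion can be illustrated with a toy 1-D analogue (box coordinates and scores below are invented): with a zero-initialized buffer and += accumulation, overlapping boxes combine instead of the later box overwriting the earlier one.

```python
import numpy as np

# Toy analogue of the fusion buffer: 4 classes over a 10-pixel row.
num_classes, width = 4, 10
total_rate = np.zeros((num_classes, width))  # zero init, as suggested

# Two hypothetical overlapping "body boxes" with random per-class scores.
boxes = [(0, 6), (4, 10)]
rng = np.random.default_rng(0)
for left, right in boxes:
    sub_rate = rng.random((num_classes, right - left))
    # Accumulate instead of overwriting, so the first box's scores
    # survive inside the overlap [4, 6).
    total_rate[:, left:right] += sub_rate

labels = np.argmax(total_rate, axis=0)
print(labels.shape)  # (10,)
```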

@levindabhi
Owner

Also, your idea of using DetectronV2 is nice. I would be more than happy to add a multi-person cloth-segmentation feature to this repo through your pull request.

@xinzi2018
Author

xinzi2018 commented Apr 8, 2022

I've found the root of the problem!!!
1. At generation time, change output_tensor = F.log_softmax(output_tensor[0], dim=1) to output_tensor = F.softmax(output_tensor[0], dim=1).
2. The generated output_tensor has shape 1/4/H/W, and the white (high-probability) region of output_tensor[0,0,:,:] represents the background, whereas my original initialization was -float('inf') everywhere; this broke the final fusion.
The detailed handling code is as follows:

def ClothSegMultiGen(self,img_cv,size=-1,):

        img = Image.fromarray(cv2.cvtColor(img_cv, cv2.COLOR_BGR2RGB))
        w,h = img.size
        body_boxes, _, sub_bodys = self.detectbody.gen_bodybox(img_cv)
        print('Number of people detected:',len(sub_bodys))
        total_rate = torch.zeros(1,4,img_cv.shape[0],img_cv.shape[1]).float()  # fusion buffer, initialized to zero
        total_rate[:,0,:,:] =  1 # background channel starts at probability 1
        if len(sub_bodys)>1:
            for i in range(len(sub_bodys)):
                print('----------------------------------')
                total_sub_rate = torch.zeros(1,4,img_cv.shape[0],img_cv.shape[1]).float()   # per-box buffer, initialized to zero
                total_sub_rate[:,0,:,:] = 1   # background channel starts at probability 1
                left, top, right, bottom = int(body_boxes[i][0]),int(body_boxes[i][1]),int(body_boxes[i][0])+int(body_boxes[i][2]),int(body_boxes[i][1])+int(body_boxes[i][3])
                sub_img = img.crop((left, top, right, bottom))
                sub_w,sub_h = sub_img.size
                sub_img, sub_rate, sub_img_color   = self.ClothSegGen(sub_img)
                total_sub_rate[0,:,top:bottom,left:right]= sub_rate
                total_rate[:,1:,:,:] = torch.max(total_sub_rate[:,1:,:,:],total_rate[:,1:,:,:])
                total_rate[:,0,:,:] = torch.min(total_sub_rate[:,0,:,:],total_rate[:,0,:,:])  
                a = torch.max(total_rate[:,:,:,:], dim=1, keepdim=True)[1]  # torch.max(...)[1] returns only the argmax indices
             

                a = torch.squeeze(a, dim=0)
                a = torch.squeeze(a, dim=0)
                output_arr = a.cpu().numpy()
                output_img = Image.fromarray(output_arr.astype("uint8"), mode="L")

            
                output_img_color = self.indexColor(output_img,w,h)
               
        else:
            output_img,_,output_img_color   = self.ClothSegGen(img,640)


        return output_img,_,output_img_color 

The final result is shown below:
image

@davichen2017

(quoting @xinzi2018's fix above)
What environment did you use to reproduce this? Why do I get the following errors when I try?
TypeError: Caught TypeError in DataLoader worker process 0.
TypeError: 'float' object cannot be interpreted as an integer

Is it a dataset problem?

@xinzi2018
Author

I don't think I ran into either of these problems with this code, though I don't remember exactly.
For "TypeError: Caught TypeError in DataLoader worker process 0.": this is most likely caused by the num_workers setting of your DataLoader; setting it to 0 should make the error go away.
For "'float' object cannot be interpreted as an integer": where exactly does the error occur?

My environment is torch 1.8, torchvision 0.9, Python 3.6.5, CUDA 11.1.

@davichen2017

davichen2017 commented Apr 25, 2022

(quoting @xinzi2018's reply above)

I later cast the following values to int, and the error went away:
self.image_info[index]["orig_height"] = int(row["Height"])
self.image_info[index]["orig_width"] = int(row["Width"])

My environment is torch 1.11, torchvision 0.12, Python 3.8, CUDA 11.3.

@davichen2017

(quoting @xinzi2018's reply above)

After training, I keep running out of GPU memory during evaluation/prediction, even though my GPU has 6 GB of memory, which should be plenty.
Have you run into this?

@xinzi2018
Author

Yes, that can happen: when the input image is too large, the GPU runs out of memory.

@davichen2017

Yes, that can happen: when the input image is too large, the GPU runs out of memory.

I picked images at random from the test set, and every one of them ran out of memory.

@xinzi2018
Author

How large are the images?

@davichen2017

Resolution: 2832*4256, at 96 dpi.

@xinzi2018
Author

That size will indeed run out of memory. Even on my 24 GB GPU, a few individual images run out of memory during batch generation.

@davichen2017

Right. I just tried an 800*1200 image and it works.

@davichen2017

May I ask: the multi-person recognition function you wrote looks great. We haven't tested it yet; is the recognition accuracy high?

@karndeepsingh

(quoting @xinzi2018's fix above)

Hi @xinzi2018 @levindabhi @davichen2017 .
I want to extract the different classes from the detected image together with a confidence level. How can I do so with the inference code provided in the repository?

for image_name in images_list:
    img = Image.open(os.path.join(image_dir, image_name)).convert('RGB')
    img_size = img.size
    img = img.resize((768, 768), Image.BICUBIC)
    image_tensor = transform_rgb(img)
    image_tensor = torch.unsqueeze(image_tensor, 0)
    
    output_tensor = net(image_tensor.to(device))
    output_tensor = F.log_softmax(output_tensor[0], dim=1)
    output_tensor = torch.max(output_tensor, dim=1, keepdim=True)[1]
    output_tensor = torch.squeeze(output_tensor, dim=0)
    output_tensor = torch.squeeze(output_tensor, dim=0)
    output_arr = output_tensor.cpu().numpy()

    output_img = Image.fromarray(output_arr.astype('uint8'), mode='L')
    output_img = output_img.resize(img_size, Image.BICUBIC)
    # output_img.save(os.path.join(result_dir, image_name[:-4]+'_generated.png'))
    output_img.putpalette(palette)
    output_img.save(os.path.join(result_dir, image_name[:-4]+'_generated.png'))

Need your help.
Thanks
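One possible way to get per-class confidence, sketched here as a NumPy analogue of the network output rather than the repository's own API: apply a softmax over the class dimension (instead of log_softmax) and read off the winning class and its probability per pixel. The array shapes and class count below are made up for illustration.

```python
import numpy as np

def per_class_confidence(logits):
    """logits: (C, H, W) raw class scores; returns labels (H, W) and
    conf (H, W), the softmax probability of the winning class."""
    shifted = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    exps = np.exp(shifted)
    probs = exps / exps.sum(axis=0, keepdims=True)        # softmax over classes
    labels = probs.argmax(axis=0)                         # per-pixel class index
    conf = probs.max(axis=0)                              # per-pixel confidence
    return labels, conf

# Hypothetical 4-class output on a 3x3 patch.
rng = np.random.default_rng(1)
logits = rng.normal(size=(4, 3, 3))
labels, conf = per_class_confidence(logits)
print(labels.shape, conf.shape)  # (3, 3) (3, 3)

# Example: pixels confidently assigned to class 1 (threshold is arbitrary).
mask_class1 = (labels == 1) & (conf > 0.5)
```

In the PyTorch code above, the analogous change would be computing F.softmax(output_tensor[0], dim=1) and taking torch.max over dim=1 to get both the confidence values and the indices.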
