Discussion on batch fetch strategy #28

Open

JingyunLiang opened this issue Apr 3, 2018 · 3 comments

Comments

@JingyunLiang

In your TF code, you first shuffle the person IDs and then repeat them forever. During training, you take batch_p persons from the dataset in queue order, and for each person you randomly choose batch_k examples each time (a rough sketch of this batching is included after the two questions below). Am I right? For this situation, I have two questions:

  1. The order of person IDs is repeated, which means each person is only ever compared against the batch_p=25 people around them in the queue. Some people are easier to identify than others, so you only maximize the margins within that small group.

  2. What about choosing the examples of one person in a repeating, sequential way? That would ensure every example is trained on repeatedly. I know that randomly choosing batch_k examples is theoretically fine, but how do the two differ in performance? Are they completely equivalent?
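To make sure we are talking about the same thing, here is a rough sketch (in plain NumPy, not your actual tf.data pipeline) of the batching as I understand it; `pk_batches` and `images_by_pid` are just hypothetical names for this illustration:

```python
# Rough sketch of P*K batching: shuffle the person IDs, walk through them in
# groups of batch_p, and draw batch_k images per person at random.
# images_by_pid is assumed to map each person ID to a list of its image paths.
import numpy as np

def pk_batches(images_by_pid, batch_p=25, batch_k=4, rng=None):
    rng = rng or np.random.default_rng()
    pids = list(images_by_pid)
    while True:
        rng.shuffle(pids)  # reshuffled on every pass, so the groups differ between epochs
        for start in range(0, len(pids) - batch_p + 1, batch_p):
            batch = []
            for pid in pids[start:start + batch_p]:
                # sample with replacement in case a person has fewer than batch_k images
                batch.extend(rng.choice(images_by_pid[pid], size=batch_k, replace=True))
            yield batch  # batch_p * batch_k image paths
```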

By the way, based on your code, I tried to implement a ResNet-50 fine-tuning baseline (just modifying the last FC layer) for image classification on CUB-200-2011. For testing, I feed data to the model in the normal ordered way, but the test accuracy (for classification) is only 22% (it is supposed to be around 81%). The training accuracy rises to 100% and the loss drops to 0.03 within 5000 iterations (about 100 epochs), though training accuracy is not very meaningful in this situation. What might be wrong? Is it due to the sampling strategy? Thank you.

@lucasb-eyer
Member

Thanks for your comments!

  1. No, I believe your understanding of tf.data is incorrect; I recommend looking at the documentation. Specifically, there is the reshuffle_each_iteration flag, which is enabled by default (see the small sketch after this list). Alternatively, you could just try it out in a small example, which is what we did back when we implemented it and the documentation was very bad 😄

  2. Sorry, I don't understand what you mean, maybe you are right. Could you give an example of the sampling you are thinking of?

  3. re CUB-200-2011: please keep issues focused and discuss separate points in separate issues. You actually opened one for this already. I don't think this has anything to do with the sampling, but rather what you describe is classic overfitting of a huge model on a tiny dataset, which can be combated in many different ways, but in general is hard to do.
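For illustration, here is a tiny sketch of point 1 (written in TF 2 eager style for brevity; the repo itself uses the TF 1 one-shot-iterator API, but the flag behaves the same way):

```python
# Dataset.shuffle reshuffles on every pass by default (reshuffle_each_iteration=True),
# so a person ID is not stuck next to the same batch_p-1 neighbours forever.
import tensorflow as tf

pids = tf.data.Dataset.range(6).shuffle(buffer_size=6, reshuffle_each_iteration=True)

for epoch in range(3):
    print([int(p) for p in pids])  # a different permutation of 0..5 on each pass
```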

@JingyunLiang changed the title from "Discussion on batch fetch startegy" to "Discussion on batch fetch strategy" on Apr 3, 2018
@JingyunLiang
Author

Thank you for your quick answer.
1. I printed the output of dataset.make_one_shot_iterator().get_next() and found that it does reshuffle each iteration. Sorry, I am new to TensorFlow.

2. Just repeat the examples of a class (or a person, maybe 60 images per class) infinitely, and choose indices 1-25 for the first fetch, 26-50 for the second, 51-75 for the third, and so on. I mean fetching the training examples in a fixed order so that every image is guaranteed to be used (a small sketch of what I mean is at the end of this comment).

3. Glad that you didn't close this issue ^_^. Actually, this issue is about the batch fetch strategy. I further tried evaluating on the training set with checkpoint 1500. Theoretically, 1500 iterations with a batch of 25*4=100 amounts to about 25 epochs over a training set of 5994 images, which is generally enough for fine-tuning ResNet-50. At checkpoint 1500, the real training loss (calculated with a normal batch fetch strategy) and training accuracy are 2.58 and 42% respectively; the testing loss and testing accuracy are 3.58 and 28% respectively. This indicates that the network needs more training (it is not over-fitting).

However, the loss actually used for training in TensorFlow (calculated with your batch fetch strategy) is close to 0.01, as shown in the figure above, which means the model will barely optimize any further. I want to use your batch fetch strategy for image classification problems.
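Here is a small sketch of the sequential fetching I described in point 2 (sequential_k is just a hypothetical helper, not code from this repo):

```python
# Cycle one person's images in a fixed order and hand them out in consecutive
# slices of batch_k, so every image is used once before any image is repeated.
from itertools import cycle, islice

def sequential_k(images_of_person, batch_k=4):
    it = cycle(images_of_person)         # repeat this person's images forever
    while True:
        yield list(islice(it, batch_k))  # indices 1..k, k+1..2k, ... in order
```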

@lucasb-eyer
Member

  1. No worries, just always checking the docs is a good idea 😉
  2. Oh, I think I see what you mean now. It's actually not a good idea in Market1501 to ensure all images are seen, because some PIDs have a huge number of images and many have very few. IIRC the first two or so PIDs are actually the dataset authors and appear many, many times. That would bias the network a lot towards them and diminish the importance of many "rare" PIDs. We actually did something similar in our very first experiments (in Theano) and it was maybe 1-2% mAP worse.
  3. I see why you mention it in this issue now. Although I'm still somewhat confused: you are talking about softmax classification now, right? I don't have experience combining softmax classification with this batching strategy, though what you report does sound wrong either way. Perhaps there is a mistake in the batch-loss computation? You could compute some other batch-level statistics, such as the batch-level classification accuracy (a small sketch follows below). But definitely write unit tests for these things.
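As a sketch of what I mean by a batch-level sanity check (not code from this repo; the helper name is made up), you could log something like this on the exact same batch the loss is computed on, so a near-zero loss paired with poor accuracy points at the loss computation rather than the data pipeline:

```python
import tensorflow as tf

def batch_loss_and_accuracy(logits, labels):
    # softmax cross-entropy and top-1 accuracy over the current batch
    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
    preds = tf.argmax(logits, axis=1, output_type=labels.dtype)
    accuracy = tf.reduce_mean(tf.cast(tf.equal(preds, labels), tf.float32))
    return loss, accuracy
```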
