Inconsistent number of iterations over a dataset #745

Open
eldarkurtic opened this issue Nov 5, 2020 · 1 comment

@eldarkurtic

Hi!
I have two (somewhat related) questions.

  1. The first one is about the reset method.

    def reset(self):
        """
        Resets the iterator after the full epoch.
        DALI iterators do not support resetting before the end of the epoch
        and will ignore such request.
        """
        if self._counter > self._size:
            self._counter = self._counter % self._size
        else:
            logging.warning("DALI iterator does not support resetting while epoch is not finished. Ignoring...")

    Here self._counter wraps around in a way that makes the number of iterations over the dataset differ from epoch to epoch. For example, say the dataset has 5 samples and batch_size is 4. In epoch = 0 the data loader makes 2 iterations: the first batch has 4 samples and the second batch also has 4 samples. (I don't know where the 3 additional samples in the second batch come from; my guess is that the last batch is padded up to the full batch_size?) At this point self._counter = 8, and before epoch = 1 starts the reset method sets it to 8 % 5 = 3. In epoch = 1 we then get just one iteration: the first batch has 4 samples, which raises the counter to self._counter = 7; that is larger than self._size, so the loop ends. So we had 2 iterations over the dataset in the first epoch and only 1 in the second. Is this behavior intended, or is it a bug? My "dirty" fix would be to simply reset the counter to zero, so that every epoch has the same number of iterations.

  2. The second question is about the "padded" samples that fill the last batch. How are they created: are they chosen randomly, or something else?
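
To make the arithmetic in question 1 concrete, here is a minimal sketch that simulates only the counter bookkeeping. It is not the library's code: the function name `iterations_per_epoch`, the `reset_to_zero` flag, and the stop condition (iteration ends once the counter reaches the dataset size) are my assumptions, inferred from the behavior described above. Only the `counter > size` check and the modulo come from the quoted `reset` method.

```python
def iterations_per_epoch(size, batch_size, epochs, reset_to_zero=False):
    """Simulate the counter logic only (hypothetical helper, not a DALI API)."""
    counter = 0
    per_epoch = []
    for _ in range(epochs):
        iterations = 0
        # Assumed stop condition: an epoch ends once counter reaches size.
        while counter < size:
            counter += batch_size  # every batch is full (last one is padded)
            iterations += 1
        per_epoch.append(iterations)
        # Mirrors the quoted reset(): only acts when counter > size.
        if counter > size:
            # Quoted behavior wraps with modulo; the proposed fix resets to 0.
            counter = 0 if reset_to_zero else counter % size
    return per_epoch


# 5 samples, batch_size 4, two epochs, as in the example above.
print(iterations_per_epoch(5, 4, 2))                      # [2, 1]
print(iterations_per_epoch(5, 4, 2, reset_to_zero=True))  # [2, 2]
```

With the modulo reset the second epoch starts at counter 3 and finishes after a single batch, while resetting to zero gives 2 iterations in both epochs.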

@a-esp-1

a-esp-1 commented Nov 16, 2020

up


3 participants