Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std: do not modify iterator state in TokenIterator.peek() #20427

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

gpanders
Copy link
Contributor

A common property of "peek()" functions is that they provide the next element in the iterator without modifying the internal iterator state. This is true of many other peek() functions in the Zig stdlib but is not true of TokenIterator.peek(), which advances the iterator's index to the beginning of the next token.

This can result in surprising behavior if a user of this function expects peek() to not modify the internal iterator state, especially since the doc comment of the function says that it "Does not advance to the next token".

A similar PR was made recently but closed by the author before it could be discussed: #20379

A common property of "peek()" functions is that they provide the next
element in the iterator without modifying the internal iterator state.
This is true of many other peek() functions in the Zig stdlib but is not
true of TokenIterator.peek(), which advances the iterator's index to the
beginning of the next token.

This can result in surprising behavior if a user of this function
expects peek() to not modify the internal iterator state, especially
since the doc comment of the function says that it "Does not advance to
the next token".
@squeek502
Copy link
Collaborator

One kinda nice property of the existing implementation is that there is a reduction of redundant work done when calling peek and then next.

For example, if we have the string:

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaZZ

and use tokenizeScalar(str, 'a'), then:

  • calling peek would advance the index to just before the ZZ, and return the ZZ slice
  • calling next afterwards would just need to re-read ZZ and return it

With peek not advancing index, then:

  • calling peek would iterate over all the a's and return the ZZ
  • calling next would again iterate over all the a's and return the ZZ

See the original PR that implemented peek for the very similar justification used at the time: #11925 (comment)

@gpanders
Copy link
Contributor Author

While my preference is for peek() to not modify the iterator state, I wouldn't object to the status quo so long as it's clearly documented that self.index is modified (the existing doc comment is a bit misleading, or at least ambiguous).

@andrewrk
Copy link
Member

andrewrk commented Jul 3, 2024

The problem statement here seems weak. The API upholds its guarantees. In what context was this discovered and determined to be problematic?

@gpanders
Copy link
Contributor Author

gpanders commented Jul 3, 2024

The context was looking at the following segment of code:

var iter = std.mem.tokenizeScalar(u8, line, ' ');
var ws_start: usize = 0;
while (iter.peek() != null) {
    const whitespace = line[ws_start..iter.index];
    const word = iter.next().?;
    ws_start = iter.index;
    // the rest of the code which operates on `word` and `whitespace`
}

When I encountered this code I was confused how it worked at all based on the assumption that peek() does not modify the iterator state (if that were true, ws_start would always equal iter.index and whitespace would always be an empty slice). It was not until I looked at the implementation of TokenIterator.peek() that I found that index was modified.

(The above code has since been rewritten to avoid this confusion in either case, but this is what prompted the initial question and PR.)

The API upholds its guarantees

Strictly speaking this is true, though explicitly stating that the index is/can be modified might help to avoid future confusion. If a functional change is unwanted I can repurpose this PR into a documentation change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants