Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change I: Iterator to I: IntoIterator #2

Open
jerry73204 opened this issue Nov 17, 2020 · 4 comments
Open

Change I: Iterator to I: IntoIterator #2

jerry73204 opened this issue Nov 17, 2020 · 4 comments

Comments

@jerry73204
Copy link

Consensus::model() for example, using I: IntoIterator instead of I: Iterator allows users to pass a vector directly. It gives us some convenient to avoid writing data.into_iter().

let data = vec![a, b, c];
consensus.model(&estimator, data);
@vadixidav
Copy link
Member

@jerry73204 I don't think we should do this. The reason is that the iterator is cloned MANY times. It could even be cloned millions of times, depending on the amount of data and the algorithm. The iterator is being used to look up the nth item repeatedly. It would be non-intuitive if you could pass a data structure in and it slowed down your algorithm massively. The intended use is to allocate the container somewhere else and then create an iterator tied to its lifetime that is trivial to clone.

If you think there might be a better way to do this, we can do that, but I would prefer to avoid breaking changes as well. Let me know what you think.

@jerry73204
Copy link
Author

Even we do not have IntoIterator, we can pass vec.into_iter(), causing great cloning cost. Though it could be mintigated by a cloning iterator vec.iter().cloned(), the nth() looking up still incurs O(n) cost. In my case, the Data is a five dimensional point, so a cloning iterator is not cheap. I see it might be resolved by Data = &'a T. I tried it once but got a trouble at something I can't remember.

To recap, my thought is that the original design already suffers cloning cost issue. Having IntoIterator does not save it, but gives user a little bit of convenience.

@vadixidav
Copy link
Member

@jerry73204

The implementation of nth does not take O(n) time: https://doc.rust-lang.org/beta/src/core/slice/iter/macros.rs.html#183

Additionally, the iterator which is cloned is only two pointers: https://doc.rust-lang.org/beta/src/core/slice/iter.rs.html#64-70

The intended use case is to pass vec.iter().copied() or vec.iter().cloned(). This will only take O(1) lookup time, and it is optimized into a simple index under the hood when compiled in release mode.

If you only have five small data points, then it probably isn't that impactful for you if there are a few extra allocations because the vector is cloned. However, it might come as a surprise to a user who has a dataset of millions of elements, in which case execution may never terminate if they accidentally pass the whole data structure.

Unfortunately, there is no container trait that allows borrowing the underlying items (SomeTrait::iter()), and this is not currently possible in part because we would need a streaming iterator to solve this problem, and that is not even possible to create without associated lifetime bounds in Rust (see this RFC: https://github.com/rust-lang/rfcs/blob/master/text/1598-generic_associated_types.md). Effectively, the collection trait would need to parameterize the returned iterator (and the iterator its returned type) with the lifetime of the collection, and this is not possible in the current Rust type system.

As a stop gap measure, we just take an arbitrary clonable iterator over the values rather than an abstraction over both the iterator and the iterable collection (collection traits are not possible due to a lack of generic associated type as per above RFC). The user is forced by the type system to clone the values of the iterator from the underlying references. This is done with .copied() or .cloned().

Let me know if there are any remaining issues. If we need to improve the documentation here to be more transparent, I would be glad to do that.

@jerry73204
Copy link
Author

The implementation of nth does not take O(n) time: https://doc.rust-lang.org/beta/src/core/slice/iter/macros.rs.html#183

Very interesting finding. For the copying cost, your concern is correct. The owning iterator vec.into_iter() is given copies the entire buffer in .clone().

Better if you can clarify it more in documentation and suggest vec.iter().copied() over vec.into_iter(). That what we could do before the lifetime issue could be solved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants