-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a hint to tell that the row iterator state is the row number in tables with row access #342
Comments
I think there would be a need for "trait" that verifies if a given table has a known number of rows and these rows are indexable (as not all tables support this). E.g. |
Yes! |
OK. I have looked at it a bit more. So:
The only question is how well this API is supported in the ecosystem, but I think that the best way forward is to just start using it and then the packages that miss it can add it. Probably more work needs to be done with (and in PrettyTables.jl maybe what can be done is e.g. try running |
Ah - and there is an internal
? |
Hi @bkamins ! But even if the interface allow us to obtain the number of rows, there is the problem about how to obtain the n-th row. There is nothing in the interface describing what the In fact, I think that AFAIK, the only way today to safely obtain the n-th row, even if the object can return the number of rows, is to iterate through the rows. |
No, the API is In other words: Tables.jl tables should support
|
Perfect! Thank you very much @bkamins ! I will update PrettyTables.jl to improve this use case. |
Unfortunately, it seems that julia> using GeoTables
julia> a = georef((a=rand(1000,1000),));
julia> Tables.subset(a, 1)
ERROR: ArgumentError: no default `Tables.subset` implementation for type: GeoTable{Meshes.CartesianGrid{2, Float64}, NamedTuple{(:a,), Tuple{Vector{Float64}}}}
Stacktrace:
[1] subset(x::GeoTable{Meshes.CartesianGrid{2, Float64}, NamedTuple{(:a,), Tuple{Vector{Float64}}}}, inds::Int64; viewhint::Nothing, view::Nothing)
@ Tables ~/.julia/packages/Tables/AcRIE/src/Tables.jl:632
[2] subset(x::GeoTable{Meshes.CartesianGrid{2, Float64}, NamedTuple{(:a,), Tuple{Vector{Float64}}}}, inds::Int64)
@ Tables ~/.julia/packages/Tables/AcRIE/src/Tables.jl:608
[3] top-level scope
@ REPL[27]:1 @juliohm Is it possible to implement |
Definitely. We will tackle this tomorrow and will let you know.
Em ter., 12 de set. de 2023 17:44, Ronan Arraes Jardim Chagas <
***@***.***> escreveu:
… Unfortunately, it seems that Tables.subet is only optional. For example,
GeoTables does not implement it:
julia> using GeoTables
julia> a = georef((a=rand(1000,1000),));
julia> Tables.subset(a, 1)
ERROR: ArgumentError: no default `Tables.subset` implementation for type: GeoTable{Meshes.CartesianGrid{2, Float64}, NamedTuple{(:a,), Tuple{Vector{Float64}}}}
Stacktrace:
[1] subset(x::GeoTable{Meshes.CartesianGrid{2, Float64}, NamedTuple{(:a,), Tuple{Vector{Float64}}}}, inds::Int64; viewhint::Nothing, view::Nothing)
@ Tables ~/.julia/packages/Tables/AcRIE/src/Tables.jl:632
[2] subset(x::GeoTable{Meshes.CartesianGrid{2, Float64}, NamedTuple{(:a,), Tuple{Vector{Float64}}}}, inds::Int64)
@ Tables ~/.julia/packages/Tables/AcRIE/src/Tables.jl:608
[3] top-level scope
@ REPL[27]:1
@juliohm <https://github.com/juliohm> Is it possible to implement
Tables.subset for GeoTables? I am planning to check if there is a valid
implementation of Tables.subset to obtain the i-th row. If not, I will
fallback to the default iteration. Hence, if you implement this function,
that problem related with vcrop_mode = :middle will vanish.
—
Reply to this email directly, view it on GitHub
<#342 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAZQW3JBBB2ZB6ZP6UOJ34TX2DCRRANCNFSM6AAAAAA4NLS5NU>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
And also I guess preferably |
We do support nrow and ncol from DataAPI.jl, thanks for checking @bkamins. As we discussed in another issue, it would be really nice if we could port these names to Tables.nrows and Tables.ncols so that anything related to the tables interface lives in a single package. |
Yes. Let us wait for other maintainers to comment there. |
@ronisbr a new patch release is available with the Please let us know if something else is missing.
It would be nice if Tables.jl had other maintainers with decision power besides @quinnj, to advance things more quickly. |
In general I like the fact that we have a single person who maintains a coherent design. I don't think think that an interface package like Tables.jl should prioritize quick changes; a lot of packages in the ecosystem rely on the API so it should move carefully. |
Yeah, On the @juliohm, any JuliaData admin/maintainer is free to make changes to Tables.jl and don't require approval from me. We've built a, IMO, healthy process to act by committee though, hence the invitation from other maintainers to weigh in. IMO, there's always going to be a bit of awkwardness between DataAPI.jl and Tables.jl interfaces. Some interfaces more obviously belong to one package or the other, but it also doesn't make sense for them to all live in one or the other. The functions in DataAPI have, so far, been those that aren't exactly tables-specific, so a more generic home seems appropriate for packages where it wouldn't make sense to take Tables.jl as a dependency but they want to overload a data-related interface. |
Can Tables.jl with unknown rows be considered the exceptional case instead
of the most common case? Perhaps a separate trait to distinguish the two
cases and then move nrows and ncols to Tables.jl? That would be more
intuitive IMO.
Em qua., 13 de set. de 2023 12:36, Jacob Quinn ***@***.***>
escreveu:
… Yeah, Tables.subset is the right answer here, as pointed out by @bkamins
<https://github.com/bkamins>.
On the Tables.rowcount question, that's really only an internal function
meant to operate on column-oriented table sources. So it's safe to call on
the result of Tables.columns, but not otherwise, so that makes me
hesitate to set it as the default for DataAPI.nrows.
@juliohm <https://github.com/juliohm>, any JuliaData admin/maintainer is
free to make changes to Tables.jl and don't require approval from me. We've
built a, IMO, healthy process to act by committee though, hence the
invitation from other maintainers to weigh in.
IMO, there's always going to be a bit of awkwardness between DataAPI.jl
and Tables.jl interfaces. Some interfaces more obviously belong to one
package or the other, but it also doesn't make sense for them to all live
in one or the other. The functions in DataAPI have, so far, been those that
aren't exactly tables-specific, so a more generic home seems appropriate
for packages where it wouldn't make sense to take Tables.jl as a dependency
but they want to overload a *data*-related interface. nrows/ncols are
admittedly a bit awkward here because they *seem* tables related, but in
reality, they impose a more strict interface that what Tables.jl requires
(i.e. Tables.jl doesn't require the # of rows to be known), so it seemed
prudent to have them live in DataAPI. So the awkwardness is understandable,
but hopefully the clarification here helps explain the current arrangement?
—
Reply to this email directly, view it on GitHub
<#342 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAZQW3NOVP2E2MAOWSRCRD3X2HHJLANCNFSM6AAAAAA4NLS5NU>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Alternatively define nrows(table) = undef by default. We just need the name
in the context of tables as opposed to general data.
Em qua., 13 de set. de 2023 12:41, Júlio Hoffimann <
***@***.***> escreveu:
… Can Tables.jl with unknown rows be considered the exceptional case instead
of the most common case? Perhaps a separate trait to distinguish the two
cases and then move nrows and ncols to Tables.jl? That would be more
intuitive IMO.
Em qua., 13 de set. de 2023 12:36, Jacob Quinn ***@***.***>
escreveu:
> Yeah, Tables.subset is the right answer here, as pointed out by @bkamins
> <https://github.com/bkamins>.
>
> On the Tables.rowcount question, that's really only an internal function
> meant to operate on column-oriented table sources. So it's safe to call on
> the result of Tables.columns, but not otherwise, so that makes me
> hesitate to set it as the default for DataAPI.nrows.
>
> @juliohm <https://github.com/juliohm>, any JuliaData admin/maintainer is
> free to make changes to Tables.jl and don't require approval from me. We've
> built a, IMO, healthy process to act by committee though, hence the
> invitation from other maintainers to weigh in.
>
> IMO, there's always going to be a bit of awkwardness between DataAPI.jl
> and Tables.jl interfaces. Some interfaces more obviously belong to one
> package or the other, but it also doesn't make sense for them to all live
> in one or the other. The functions in DataAPI have, so far, been those that
> aren't exactly tables-specific, so a more generic home seems appropriate
> for packages where it wouldn't make sense to take Tables.jl as a dependency
> but they want to overload a *data*-related interface. nrows/ncols are
> admittedly a bit awkward here because they *seem* tables related, but in
> reality, they impose a more strict interface that what Tables.jl requires
> (i.e. Tables.jl doesn't require the # of rows to be known), so it seemed
> prudent to have them live in DataAPI. So the awkwardness is understandable,
> but hopefully the clarification here helps explain the current arrangement?
>
> —
> Reply to this email directly, view it on GitHub
> <#342 (comment)>,
> or unsubscribe
> <https://github.com/notifications/unsubscribe-auth/AAZQW3NOVP2E2MAOWSRCRD3X2HHJLANCNFSM6AAAAAA4NLS5NU>
> .
> You are receiving this because you were mentioned.Message ID:
> ***@***.***>
>
|
Hi!
We have a problem in PrettyTables.jl related to huge matrices with row access. The current method, AFAIK, to access an element in those tables is to iterate the rows until we reach the desired one and then retrieve the column:
This method is OK for any usage that requires to read the entire data. However, when you are printing huge tables, you usually want a part of the table. In the case of PrettyTables.jl, we have a mode in which we print the beginning and the end of the table. Hence, for each printed column, we need to iterate through all the rows to the very bottom, even though we only need a few of them, leading to a very slow process.
The problem appeared here: ronisbr/PrettyTables.jl#217 (comment)
If we add to the Tables.jl API some hint that the state in the iteration is the row id, I can access, let's say, the element in the row 1,000,000 by just:
instead of looping through the entire iterator.
In the case of GeoTables.jl, which uses the row ID as the state, the printing process was reduced from 8s to 0.001s when we only show 10 rows and 10 columns with "middle" cropping mode.
The text was updated successfully, but these errors were encountered: