New table API endpoints for lower level table access #564

knabar · 2024-06-21T10:10:36Z

This PR adds two new endpoints for lower level table access:

For querying: /webgateway/table/123/rows/ takes a query argument in the usual format and returns the matching row numbers. No actual data is returned. Paging support is limited: each call returns up to MAX_TABLE_SLICE_SIZE rows and takes a start argument giving the row from which to start the search. rowCount and end are returned so that clients can easily detect the end of the results or the need for additional calls (with the new start set to the previous end).
For data retrieval: /webgateway/table/123/slice/ takes numeric lists of rows and columns and performs a slice. Data is returned in columnar format (as opposed to rows in the current table API). The number of requested rows times the number of requested columns cannot exceed MAX_TABLE_SLICE_SIZE, which defaults to 1 million. Both GET and POST are supported so that large numbers of rows and columns (exceeding the possible query string length) can be requested. To retrieve consecutive rows or columns, ranges can be specified, e.g. 1-5 instead of 1,2,3,4,5.

Examples:

/webgateway/table/123/rows/?query=object%3E100000&start=8470

{"rows":[8470,8471,8472,8473,8474,8475,8476],"meta":{"rowCount":8477,"start":8470,"end":8477}}

/webgateway/table/123/slice/?rows=8005-8010&columns=0-1

{"columns":[[114391,114392,114393,114394,114395,114396],[NaN,NaN,NaN,NaN,NaN,NaN]],"meta":{"columns":["object","name"],"rowCount":8477}}

Notes:

Added a way to supply additional formatting options to json.dumps, e.g. to remove whitespace from the output.
Unrelated to this PR: since NaNs are allowed in the output but are not supported by plain JSON, clients can use JSON5 for parsing.
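
For Python clients, the standard library `json` module already accepts bare `NaN` tokens when parsing, so no JSON5 parser is needed there. The sketch below also shows how compact separators remove whitespace from `json.dumps` output; that this is the formatting option the new endpoints pass is an assumption, not something stated in this PR.

```python
import json
import math

# Response body shaped like the /slice/ example, including bare NaN tokens.
body = (
    '{"columns":[[114391,114392],[NaN,NaN]],'
    '"meta":{"columns":["object","name"],"rowCount":8477}}'
)

# json.loads accepts NaN, Infinity and -Infinity unless parse_constant is overridden.
data = json.loads(body)
all_nan = all(math.isnan(value) for value in data["columns"][1])

# Compact separators drop the spaces json.dumps inserts by default.
compact = json.dumps({"rows": [1, 3], "meta": {"start": 0}}, separators=(",", ":"))
```

Strict parsers, such as `JSON.parse` in browsers, reject `NaN`, which is where JSON5 or a similar lenient parser comes in.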

will-moore · 2024-06-28T15:32:01Z

It looks like rowCount and end are based on the whole table rather than the results of the query:
e.g. on merge-ci with webgateway/table/22909/rows/?query=(Count>12)%26(Count<23) I'm seeing:

{
"rows": [
  1,
  3
],
"meta": {
  "rowCount": 13,
  "start": 0,
  "end": 13
}
}

Is that expected?

knabar · 2024-06-28T15:42:19Z

@will-moore Yes, that is expected.

The total number of results is not available, since the query is not run on the whole table, but only from start to start+MAX_TABLE_SLICE_SIZE.

Figuring out the total is left to the client, which can add up the number of returned rows across calls. For tables with fewer than MAX_TABLE_SLICE_SIZE rows, it'll be just one call anyway.

end is not actually the number of rows of the table, but the row where querying ended, so the client can easily continue querying by passing in the end of the first query as the start of the second. end==rowCount indicates that there is nothing else to query.

start is there to indicate at which row the querying started, which is redundant, since the client would know anyway, but perhaps there are situations where it is useful.
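
The client-side loop described above could look roughly like this. It is a sketch only: `fetch_page` is a hypothetical callable standing in for the HTTP request to /rows/, and `fake_fetch` is an in-memory stand-in used to exercise the loop without a server.

```python
def collect_matching_rows(fetch_page, start=0):
    """Page through /rows/ results until the whole table has been searched.

    fetch_page is a hypothetical callable: it takes a start row and returns
    the decoded JSON response as a dict with "rows" and "meta" keys.
    """
    matches = []
    while True:
        page = fetch_page(start)
        matches.extend(page["rows"])
        meta = page["meta"]
        if meta["end"] >= meta["rowCount"]:
            return matches  # end == rowCount: nothing left to query
        if meta["end"] <= start:
            raise RuntimeError("server made no progress; aborting")
        start = meta["end"]  # continue where the previous call stopped


def fake_fetch(start, table=tuple(range(25)), window=10):
    """Stand-in for the HTTP call: matches even values, searches `window` rows."""
    end = min(start + window, len(table))
    rows = [i for i in range(start, end) if table[i] % 2 == 0]
    return {"rows": rows, "meta": {"rowCount": len(table), "start": start, "end": end}}


all_rows = collect_matching_rows(fake_fetch)
total = len(all_rows)  # the total is only known after the last page
```

The guard against `end` not advancing is a defensive choice in this sketch, not behaviour documented for the endpoint.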

will-moore · 2024-06-28T16:02:05Z

Everything is working just fine for /slice when I use the query correctly, so I tried playing about a bit (Not all these need to be fixed/changed)...
If I make a mistake, the error handling is varied:

E.g. if I make an invalid query like this:
/slice/?rows=3-1&columns=oops
I get a nice "error": "Need to specify comma-separated list of rows and columns".

If I forget what needs to be in the query e.g.
/slice/?rows=3-1
I get a less friendly:

"message": "'NoneType' object has no attribute 'split'",
"stacktrace": "Traceback (most recent call last):\n  File \"/home/omero/workspace/OMERO-web/.venv3/lib64/python3.9/site-packages/omeroweb/webgateway/views.py\", line 1447, in wrap\n    rv = f(request, *args, **kwargs)\n  File \"/home/omero/workspace/OMERO-web/.venv3/lib64/python3.9/site-packages/omeroweb/webgateway/views.py\", line 3586, in perform_slice\n    for item in source.get(\"columns\").split(\",\")\nAttributeError: 'NoneType' object has no attribute 'split'\n"

If I use a range the wrong way around, I get the whole table (regardless of the number of rows and columns):

?rows=3-1&columns=4-3

If I go out of range,

/slice/?rows=3-1&columns=1-100

I get "error": "Error slicing table". Don't know how hard/expensive it is to check this before trying the slicing to give a more helpful message?

knabar · 2024-07-01T13:15:17Z

@will-moore added better error handling and checks

knabar · 2024-07-02T09:12:55Z

Made some more convenience changes:

Added a collapse query string argument for /webgateway/table/123/rows/ that collapses sequential row numbers into the same format that is supported by the /slice/ call, so the results can be passed back in easily while significantly reducing the amount of data transferred:

{
  "rows": [
    2,
    3,
    "5-7",
    9,
    10,
    "12-20",
    "22-27",
    "33-35"
  ],
  "meta": {}
}

Added additional items to the returned metadata, including columnCount and maxCells to give more information to the client on how to range check requests before submitting them, and partialCount, which is the number of matching rows returned from the /rows/ call. Note that this is not necessarily the number of matches in the whole table, which is why I called it partialCount, but open to suggestions for better names.

"meta": {
  "partialCount": 25,
  "rowCount": 14336,
  "columnCount": 105,
  "start": 0,
  "end": 14336,
  "maxCells": 1000000
}
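
A client could use these metadata fields to check a planned /slice/ request before sending it. The sketch below assumes zero-based row and column indices that must stay below rowCount and columnCount, which is inferred from the examples in this thread rather than stated; `check_slice_request` is a hypothetical helper.

```python
def check_slice_request(rows, columns, meta):
    """Return a list of problems with a planned /slice/ call (empty if none).

    rows and columns are lists of ints; meta is the "meta" dict taken from
    a previous response.
    """
    if not rows or not columns:
        return ["need at least one row and one column"]
    problems = []
    if min(rows) < 0 or max(rows) >= meta["rowCount"]:
        problems.append(f"rows must be within 0-{meta['rowCount'] - 1}")
    if min(columns) < 0 or max(columns) >= meta["columnCount"]:
        problems.append(f"columns must be within 0-{meta['columnCount'] - 1}")
    cells = len(rows) * len(columns)
    if cells > meta["maxCells"]:
        problems.append(f"{cells} cells requested, limit is {meta['maxCells']}")
    return problems


meta = {"partialCount": 25, "rowCount": 14336, "columnCount": 105,
        "start": 0, "end": 14336, "maxCells": 1000000}
ok = check_slice_request(list(range(100)), [0, 1], meta)
too_big = check_slice_request(list(range(14336)), list(range(105)), meta)
```

With the metadata shown above, requesting every row and every column would need 14336 × 105 = 1505280 cells, which exceeds maxCells, so such a request has to be split.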

chris-allan · 2024-07-02T09:30:57Z

I don't think having collapse is a good precedent to set. Looping over the results in pure Python is going to perform wildly differently depending on the result set so knowing whether to use collapse is not something that is easy to do.

Edit: Furthermore, I think it's a usability downgrade. The client then would need to decompress the result set of getWhereList() in order to know which rows actually match or use it with slice(). If you want contiguous slices then exposing read() is better anyway as it just takes the column numbers and a start stop.
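
To make the trade-off concrete, here is a rough pure-Python sketch of collapsing and of the decompression step a client would need. It is not the PR's implementation; the rule that only runs of three or more rows are collapsed is inferred from the example output earlier in this thread.

```python
def collapse_rows(rows):
    """Collapse runs of three or more consecutive ints into "a-b" strings."""
    out = []
    i = 0
    while i < len(rows):
        j = i
        # advance j to the last index of the current consecutive run
        while j + 1 < len(rows) and rows[j + 1] == rows[j] + 1:
            j += 1
        if j - i >= 2:
            out.append(f"{rows[i]}-{rows[j]}")
        else:
            out.extend(rows[i:j + 1])
        i = j + 1
    return out


def expand_rows(collapsed):
    """Inverse of collapse_rows: what a client needs to list matching rows."""
    out = []
    for item in collapsed:
        if isinstance(item, str):
            lo, hi = (int(part) for part in item.split("-"))
            out.extend(range(lo, hi + 1))
        else:
            out.append(item)
    return out


sample = [2, 3, 5, 6, 7, 9, 10, 12, 13, 14]
collapsed = collapse_rows(sample)
restored = expand_rows(collapsed)
```

Both functions loop over every row number in Python, so how much the collapsed form saves depends entirely on how many long runs the result set contains.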

will-moore

Error handling is improved, thanks.
Looks good.

knabar · 2024-07-03T12:19:48Z

Removed the collapse option after some performance testing that did not show a worthwhile improvement

chris-allan · 2024-07-15T12:54:23Z

Before this goes in we definitely need to expand the ome/openmicroscopy integration tests to cover these new endpoints. Specifically components/tools/OmeroWeb/test/integration/test_table.py.

knabar added 5 commits June 20, 2024 15:49

Add new API endpoints for row-based query and slice

d78b71a

Add doc strings

7e019bf

Simplify code

c73b151

Add comment

ae9e25e

Allow JSON output without whitespace and use for new endpoints

387f991

knabar requested review from chris-allan and will-moore June 21, 2024 10:10

Code cleanup

db88511

knabar force-pushed the feature-row-queries branch from 0fa91ba to db88511 Compare June 21, 2024 10:24

knabar and others added 3 commits July 1, 2024 14:28

Better errors

3623886

Abort when too many items given

72e5d9f

[pre-commit.ci] auto fixes from pre-commit.com hooks

a2104c0

for more information, see https://pre-commit.ci

knabar and others added 8 commits July 1, 2024 16:10

Add max cell setting to metadata

d138dac

Allow collapsing of resulting rows

0ca9a08

Better comments and logging

4c59b42

Return result count for this request

53438ec

Update docstrings

134d6a4

Fix counter when not collapsing

e7962e3

[pre-commit.ci] auto fixes from pre-commit.com hooks

8314ae8

for more information, see https://pre-commit.ci

Add missing arg docstring

9fc9273

will-moore approved these changes Jul 2, 2024

Remove optional collapse

e71cf99

Remove boundary check

091e5fc

knabar added this to the 5.27.0 milestone Jul 5, 2024

knabar mentioned this pull request Jul 19, 2024

Tests for get_where_list and slice table API calls ome/openmicroscopy#6397

Merged

sbesson self-requested a review July 22, 2024 21:21

Check row range; check both upper and lower limits

9674aba

sbesson reviewed Jul 24, 2024

knabar added 2 commits July 24, 2024 10:38

Better setting description

6f6d818

Better method and URL names

c581586

knabar merged commit b071a89 into ome:master Jul 29, 2024
10 checks passed

knabar deleted the feature-row-queries branch July 29, 2024 08:50
