Optimize permission lookups for a user #10906

Open
stevenwinship wants to merge 20 commits into develop from 6467-optimize-permission-lookups-for-a-user

Conversation

stevenwinship
Contributor

@stevenwinship stevenwinship commented Oct 3, 2024

What this PR does / why we need it: Adds a way to answer the query "which dataverses does user X have permission Y on?"

Which issue(s) this PR closes: #6467

Special notes for your reviewer: I'm not crazy about the API being called /allowedcollections/. Any suggestions for a better name would be appreciated.

Suggestions on how to test this: See the IT test. Also test a group within a group for nested group permissions. I have tested this manually and it worked. For Shibboleth, I tested the SQL manually. This still needs to be tested in a running environment.

Does this PR introduce a user interface change? If mockups are available, please link/include them here: No

Is there a release notes update needed for this change?: Included

Additional documentation:

@stevenwinship stevenwinship self-assigned this Oct 3, 2024
@stevenwinship stevenwinship added the Feature: Permissions, Size: 30 (a percentage of a sprint, 21 hours; formerly size:33), and FY25 Sprint 7 (2024-09-25 - 2024-10-09) labels Oct 3, 2024
@coveralls

coveralls commented Oct 3, 2024

Coverage Status

coverage: 20.856% (-0.01%) from 20.869%
when pulling be2c143 on 6467-optimize-permission-lookups-for-a-user
into a0cb73d on develop.

@stevenwinship stevenwinship removed their assignment Oct 4, 2024
@pdurbin pdurbin self-assigned this Oct 4, 2024
Member

@pdurbin pdurbin left a comment

Some initial feedback. My eyes glazed over at the SQL but I'll request a review from @scolapasta for that part.

@@ -0,0 +1,9 @@
The following API have been added:
Member

Once we've finalized the name of the API endpoint, please add it to the API Guide.

@@ -0,0 +1,9 @@
The following API have been added:

/api/users/{identifier}/allowedcollections/{permission}
Member

I can live with this name. I would probably camel case it as allowedCollections.

How would we extend this API to datasets or files, if we needed to? Like below?

/api/users/{identifier}/allowedDatasets/{permission}
/api/users/{identifier}/allowedFiles/{permission}

Just playing around below, maybe instead we could have...

/api/users/{identifier}/permissions/collections/{permission}
/api/users/{identifier}/permissions/datasets/{permission}
/api/users/{identifier}/permissions/files/{permission}

? You might want to get some other opinions.

Contributor Author

I like the top set of APIs better. I think allowedCollections etc. is more descriptive of what is being returned.

@qqmyers @scolapasta Any comment :)

Contributor

I'm fine with either set of names.

Comment on lines +8 to +9
Valid Permissions are: AddDataverse, AddDataset, ViewUnpublishedDataverse, ViewUnpublishedDataset, DownloadFile, EditDataverse, EditDataset, ManageDataversePermissions,
ManageDatasetPermissions, ManageFilePermissions, PublishDataverse, PublishDataset, DeleteDataverse, DeleteDatasetDraft, and "any" as a wildcard option.
Member

It's fine to list these here but it might be nice to have an API to list all these permissions.

Contributor Author

I wouldn't have bothered listing them in the release note, but I wanted to point out the "any" permission as a wildcard. But since this wasn't asked for, maybe we just leave it out of the release notes?

Member

Maybe, but do you think it's potentially useful to list these permissions via API? Maybe not for this PR but in the future.

The permissions should probably be listed in the guides for now. That way, the release note could link to them.

Contributor Author

@stevenwinship stevenwinship Oct 4, 2024

An API would be nice. There is already a way to list the permissions in a role for the UI so adding a simple list of all permissions should be a quick add.


/api/users/{identifier}/allowedcollections/{permission}

This API lists the dataverses/collections that the user has access to via the permission passed.
Member

What about cases where the user is granted access at runtime based on their membership in Shibboleth groups or IP groups?

We have this related comment in the code:

/**
 * We don't expect this to support Shibboleth groups because even though
 * a Shibboleth user can have an API token the transient
 * shibIdentityProvider String on AuthenticatedUser is only set when a
 * SAML assertion is made at runtime via the browser.
 */

On a related note, that comment sits above the call to this method in ServiceDocumentManagerImpl:

public List<Dataverse> getDataversesUserHasPermissionOn(AuthenticatedUser user, Permission permission)

Should this method be replaced by the new one in this PR:

public List<Dataverse> findPermittedCollections(AuthenticatedUser user, int permissionBit)

?

Contributor Author

I'll swap it out and test it

Contributor Author

@pdurbin I added IP Group support, but unfortunately ServiceDocumentManagerImpl doesn't have access to the request and therefore to the IP address of the caller. Unless you know of a way to get it?

Member

I don't see an obvious way. However, the SWORD API has never supported IP Groups and to my knowledge nobody has asked for it, so I think it's ok.

The main use case for IP Groups is read-only access, such as walking into a library and having access to data because you are on the library's IP range.
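
As a rough illustration of the library scenario above, an IP Group check boils down to asking whether the caller's address falls inside a configured range. The following is a minimal Java sketch for IPv4 only. `IpRangeSketch`, `toLong`, and `inRange` are hypothetical names used for illustration, not Dataverse's IP Group classes, and the addresses come from documentation-only ranges.

```java
public class IpRangeSketch {

    // Convert a dotted-quad IPv4 string such as "192.0.2.42" into a number for easy comparison.
    static long toLong(String dottedQuad) {
        String[] parts = dottedQuad.split("\\.", -1);
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + dottedQuad);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("Octet out of range in: " + dottedQuad);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    // True when address lies within [low, high], inclusive on both ends.
    static boolean inRange(String address, String low, String high) {
        long candidate = toLong(address);
        return toLong(low) <= candidate && candidate <= toLong(high);
    }

    public static void main(String[] args) {
        // 192.0.2.0/24 and 198.51.100.0/24 are documentation-only ranges (RFC 5737).
        System.out.println(inRange("192.0.2.42", "192.0.2.0", "192.0.2.255"));   // prints true
        System.out.println(inRange("198.51.100.7", "192.0.2.0", "192.0.2.255")); // prints false
    }
}
```

A caller outside the range simply gets no IP-group-derived roles; any roles assigned to the user directly would still apply.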

Contributor Author

I found a way to pass the IPAddress from the request made to SWORD.

src/main/java/edu/harvard/iq/dataverse/api/Users.java (outdated, resolved)

@cmbz cmbz added the Size: 10 (a percentage of a sprint, 7 hours) and FY25 Sprint 8 (2024-10-09 - 2024-10-23) labels and removed the Size: 30 (a percentage of a sprint, 21 hours; formerly size:33) label Oct 9, 2024
@pdurbin pdurbin added the Type: Bug (a defect) label Oct 9, 2024
@scolapasta scolapasta self-assigned this Oct 16, 2024
@cmbz cmbz added the FY25 Sprint 9 (2024-10-23 - 2024-11-06) label Oct 23, 2024
@cmbz cmbz added the FY25 Sprint 10 (2024-11-06 - 2024-11-20) label Nov 7, 2024
Contributor

@scolapasta scolapasta left a comment

Hi, finally am managing to review - looks good overall, a few questions, easier to ask as a general comment imo:

  1. Does the SQL query deal with nested groups (I think so, with the "WITH", but wanted to confirm)?
  2. Does the last union (IP groups) need the exists / permission bits clause?
  3. Do we have any sense of the performance of this query yet?

@stevenwinship
Contributor Author

> Hi, finally am managing to review - looks good overall, a few questions, easier to ask as a general comment imo:
>
>   1. Does the SQL query deal with nested groups (I think so, with the "WITH", but wanted to confirm)?
>   2. Does the last union (IP groups) need the exists / permission bits clause?
>   3. Do we have any sense of the performance of this query yet?

  1. Yes, it handles groups within groups. I did test that.
  2. Sorry, it's a bit confusing. The permission bit part is added in the "AND @IPRANGESQL" by calling findPermittedCollections.
  3. There hasn't been any formal performance testing, but local tests as well as testing on the perf server were pretty good.
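
For readers who skipped the SQL: the nested-group handling in point 1 amounts to a transitive closure over group membership. Below is a minimal in-memory sketch of that idea in Java. It is not the PR's query; the map layout, the member and group names, and `expandGroups` are all illustrative assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class NestedGroupsSketch {

    // memberToGroups maps a member (a user or a group) to the groups that directly contain it.
    // Returns every group the member belongs to, directly or through nesting.
    static Set<String> expandGroups(String member, Map<String, List<String>> memberToGroups) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>(memberToGroups.getOrDefault(member, List.of()));
        while (!pending.isEmpty()) {
            String group = pending.pop();
            // seen.add returns false for a group already visited, which also guards against cycles.
            if (seen.add(group)) {
                pending.addAll(memberToGroups.getOrDefault(group, List.of()));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Hypothetical data: the user is in inner-group, which is itself nested in outer-group.
        Map<String, List<String>> memberToGroups = Map.of(
                "user", List.of("inner-group"),
                "inner-group", List.of("outer-group"));
        System.out.println(expandGroups("user", memberToGroups)); // prints [inner-group, outer-group]
    }
}
```

Role assignments made to any group in the returned set would then count toward the user's permissions, which is the behaviour the nested-group test is meant to confirm.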

@scolapasta
Contributor

> > Hi, finally am managing to review - looks good overall, a few questions, easier to ask as a general comment imo:
> >
> >   1. Does the SQL query deal with nested groups (I think so, with the "WITH", but wanted to confirm)?
> >   2. Does the last union (IP groups) need the exists / permission bits clause?
> >   3. Do we have any sense of the performance of this query yet?
>
>   1. Yes, it handles groups within groups. I did test that.
>   2. Sorry, it's a bit confusing. The permission bit part is added in the "AND @IPRANGESQL" by calling findPermittedCollections.
>   3. There hasn't been any formal performance testing, but local tests as well as testing on the perf server were pretty good.

OK on 1 and 3. For 2, maybe I'm still missing it: in the query part added in lines 143-152 for the IP groups, I don't see any reference to permission bits.

@stevenwinship
Contributor Author

stevenwinship commented Nov 21, 2024

> OK on 1 and 3. For 2, maybe I'm still missing it: in the query part added in lines 143-152 for the IP groups, I don't see any reference to permission bits.

OK, you are correct: there is no permission bit check for IP groups. This is a bit confusing, but here goes.

In regular groups:
"_roleAlias": "contributor",
where contributor has a permissions value in which each bit denotes a permission (for file download, bit 4 is set)
sql: (dataverserole.permissionbits & 16 != 0)

In IP Groups the role is the specific permission, so there is no need to check the bit.
sql: look for 'fileDownloader' in WHERE roleassignment.assigneeidentifier IN (SELECT CONCAT('&ip/', persistedglobalgroup.persistedgroupalias) as assignee...)
{
  "status": "OK",
  "data": {
    "assignee": "&ip/ipGroupebc7aa00",
    "roleId": 2,
    "_roleAlias": "fileDownloader",
    "definitionPointId": 4
  }
}
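
The bit check described above can be sketched in Java. This is a minimal illustration, not the code in this PR: `SketchPermission`, `toBits`, and `grants` are hypothetical names, the two roles are made up, and the bit positions are assumed to follow the order in which the release note lists the permissions (which puts DownloadFile at bit 4, mask 16, matching the SQL above).

```java
import java.util.EnumSet;
import java.util.Set;

public class PermissionBitsSketch {

    // Hypothetical enum. The order is assumed from the release note's list,
    // so that ordinal() doubles as the bit position (DownloadFile lands on bit 4).
    enum SketchPermission {
        AddDataverse, AddDataset, ViewUnpublishedDataverse, ViewUnpublishedDataset, DownloadFile;

        long mask() {
            return 1L << ordinal();
        }
    }

    // Fold a set of permissions into one bit field, as a role would store it.
    static long toBits(Set<SketchPermission> permissions) {
        long bits = 0;
        for (SketchPermission p : permissions) {
            bits |= p.mask();
        }
        return bits;
    }

    // Java equivalent of the SQL test (permissionbits & mask) != 0.
    static boolean grants(long roleBits, SketchPermission permission) {
        return (roleBits & permission.mask()) != 0;
    }

    public static void main(String[] args) {
        // Made-up roles for illustration only; real role definitions live in the database.
        long downloadOnlyRole = toBits(EnumSet.of(SketchPermission.DownloadFile));
        long depositRole = toBits(EnumSet.of(SketchPermission.AddDataset,
                SketchPermission.ViewUnpublishedDataset));

        System.out.println(SketchPermission.DownloadFile.mask());                    // prints 16
        System.out.println(grants(downloadOnlyRole, SketchPermission.DownloadFile)); // prints true
        System.out.println(grants(depositRole, SketchPermission.DownloadFile));      // prints false
    }
}
```

Checking the bit rather than the role alias means any role that includes the permission qualifies, whatever the role happens to be called.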

@scolapasta
Contributor

> In IP Groups the role is the specific permission, so there is no need to check the bit.

I don't think that's accurate (or at least it's not how it's designed). Sure, IP groups are mostly meant to allow read access (usually file download), but the assignment could be for the role "viewer".

And technically, if you're logged in and coming from a certain IP address, then the contributor role should work too (as examples).

@stevenwinship
Contributor Author

I added the PERMISSIONBIT to the IpGroup SQL and changed the test to use the "CURATOR" role.

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:6467-optimize-permission-lookups-for-a-user
ghcr.io/gdcc/configbaker:6467-optimize-permission-lookups-for-a-user

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

Labels
Feature: Permissions, FY25 Sprint 7 (2024-09-25 - 2024-10-09), FY25 Sprint 8 (2024-10-09 - 2024-10-23), FY25 Sprint 9 (2024-10-23 - 2024-11-06), FY25 Sprint 10 (2024-11-06 - 2024-11-20), Size: 10 (a percentage of a sprint, 7 hours), Type: Bug (a defect)
Projects
Status: In Review 🔎
Development

Successfully merging this pull request may close these issues.

Optimize permission lookups for a user
6 participants