Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change direct access handling to support JupyterHub #880

Closed
wants to merge 11 commits into from

Conversation

Alex-Lewandowski
Copy link

@Alex-Lewandowski Alex-Lewandowski commented Nov 16, 2024

Closes #444

Currently, earthaccess attempts to determine a client's AWS region by by accessing EC2 metadata at IP 169.254.169.254. However, Zero-to-Jupyterhub-k8s explicitly denies access to the EC2 metadata endpoint for security.
jupyterhub/zero-to-jupyterhub-k8s/jupyterhub/values.yaml

  cloudMetadata:
    # block set to true will append a privileged initContainer using the
    # iptables to block the sensitive metadata server at the provided ip.
    blockWithIptables: true
    ip: 169.254.169.254

This means that no JupyterHub can currently take advantage of direct access unless it is configured to bypass recommended security settings. Additionally, it fails falsely, informing the user that they are not in us-west-2 even if they are. There is a workaround, in which you can set in_region to True before attempting direct access, but you have to know to do this and know to ignore the incorrect out-of-region error.

This PR removes the in_region member variable and adds out_of_region_handling and access parameters to the API download function and the store.get functions.

Unless access is set to external, direct access will be attempted under the assumption that it will succeed. If it fails and out_of_region_handling is set to "raise" (the default value), the exception is raised. If direct access fails and out_of_region_handling is set to "handle," a warning is raised, the exception is printed in the warning, and HTTPS is attempted.

If 'access' is set to external, HTTPS access will be used, and direct access will not be attempted at all.

Pull Request (PR) draft checklist - click to expand
  • [ X] Please review our
    contributing documentation
    before getting started.
  • [ X] Populate a descriptive title. For example, instead of "Updated README.md", use a
    title such as "Add testing details to the contributor section of the README".
    Example PRs: #763
  • [ X] Populate the body of the pull request with:
  • [ X] Update CHANGELOG.md with details about your change in a section titled
    ## Unreleased. If such a section does not exist, please create one. Follow
    Common Changelog for your additions.
    Example PRs: #763
  • [ X] Update the documentation and/or the README.md with details of changes to the
    earthaccess interface, if any. Consider new environment variables, function names,
    decorators, etc.

Click the "Ready for review" button at the bottom of the "Conversation" tab in GitHub
once these requirements are fulfilled. Don't worry if you see any test failures in
GitHub at this point!

Pull Request (PR) merge checklist - click to expand

Please do your best to complete these requirements! If you need help with any of these
requirements, you can ping the @nsidc/earthaccess-support team in a comment and we
will help you out!

  • Add unit tests for any new features.
  • [X ] Apply formatting and linting autofixes. You can add a GitHub comment in this Pull
    Request containing "pre-commit.ci autofix" to automate this.
  • Ensure all automated PR checks (seen at the bottom of the "conversation" tab) pass.
  • Get at least one approving review.

📚 Documentation preview 📚: https://earthaccess--880.org.readthedocs.build/en/880/

Copy link

github-actions bot commented Nov 16, 2024

Binder 👈 Launch a binder notebook on this branch for commit 3652637

I will automatically update this comment whenever this PR is modified

Binder 👈 Launch a binder notebook on this branch for commit 9f34d5a

@Alex-Lewandowski
Copy link
Author

pre-commit.ci autofix

@Alex-Lewandowski
Copy link
Author

I need to add tests for the out_of_region_handling argument, but I could use some help figuring out how best to do that. I have all unit and integration tests passing on a JupyterHub in us-west-2 but I don't fully understand how the GitHub actions environment where they will run is set up. Should we test for direct access failing (since I am guessing it won't be in us-west-2) and mock the case where it succeeds?

@Alex-Lewandowski Alex-Lewandowski marked this pull request as ready for review November 17, 2024 00:08
@itcarroll
Copy link
Collaborator

Really happy to see action on #444! Is there a link to any discussion notes leading to the decision to yank in_region? I'm not opposed, but I think its a decision that merits documentation?

Why not include the access kwarg for earthaccess.open, so that its logic is the same as for earthaccess.download?

@chuckwondo
Copy link
Collaborator

@Alex-Lewandowski, thanks for re-raising this issue. This is definitely a long-standing issue that deserves attention.

I believe a thorough solution would be a bit more involved, as there are many issues that have been created around this general problem, so I've just opened a new discussion for this, as I feel it warrants deeper design discussion: #881

Although I appreciate your effort in this PR, I suggest you close it in light of the further discussion it warrants. In the end, the solution in this PR adds more parameters to an already rather cluttered list of parameters for the relevant functions. As part of the discussion, I want to see the parameter list simplified, yet provide more general support for this general issue.

@Alex-Lewandowski
Copy link
Author

Really happy to see action on #444! Is there a link to any discussion notes leading to the decision to yank in_region? I'm not opposed, but I think its a decision that merits documentation?

Why not include the access kwarg for earthaccess.open, so that its logic is the same as for earthaccess.download?

Hi @itcarroll! There is a bit of discussion on the topic at the bottom of #444. It appears that there is no reliable way to determine a client's region. Boto will provide the default region that the user defined in their configuration, which is not guaranteed to be correct. Most JupyterHubs block access to the EC2 metadata endpoint that earthaccess is currently using to set in_region.

@Alex-Lewandowski
Copy link
Author

@Alex-Lewandowski, thanks for re-raising this issue. This is definitely a long-standing issue that deserves attention.

I believe a thorough solution would be a bit more involved, as there are many issues that have been created around this general problem, so I've just opened a new discussion for this, as I feel it warrants deeper design discussion: #881

Although I appreciate your effort in this PR, I suggest you close it in light of the further discussion it warrants. In the end, the solution in this PR adds more parameters to an already rather cluttered list of parameters for the relevant functions. As part of the discussion, I want to see the parameter list simplified, yet provide more general support for this general issue.

@chuckwondo Okay, it sounds like more discussion is warranted. I will close the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Change us-west-2 check, as only working in EC2 instances
3 participants