omero.gateway.BlitzGateway.__del__ hangs server #5637

Merged: 2 commits into ome:develop from gateway-del-fix on Jan 26, 2018

Conversation

joshmoore
Member

Problem: having any blocking actions in a `__del__` method
can lead to hung gunicorn processes in OMERO.web. The addition
of the method was intended to detect dangling services, but
each call left further resources open, detectable with `lsof`.

Short-term fix: by removing the method, OMERO.web should
no longer need to be periodically restarted.

Long-term fix: as a next step, the `_assert_unregistered`
method will need to be invoked again, perhaps by integration
tests, to detect the resources that were being left open.
Eventually, a rewrite of the `login_required` decorator as
well as the `close` logic of BlitzGateway should be considered
so that resource cleanup can be guaranteed.
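
As a rough illustration of the long-term direction (cleanup guaranteed by the caller rather than by `__del__`), here is a minimal sketch. `StubConnection` and `with_connection` are hypothetical stand-ins invented for this example; they are not the OMERO gateway API or the actual `login_required` decorator, and only show the `try`/`finally` pattern.

```python
import functools


class StubConnection:
    """Hypothetical stand-in for a gateway connection (not the OMERO API)."""

    open_count = 0  # number of connections currently open

    def __init__(self):
        StubConnection.open_count += 1
        self.closed = False

    def close(self):
        # Idempotent: closing twice must not corrupt the counter.
        if not self.closed:
            self.closed = True
            StubConnection.open_count -= 1


def with_connection(view):
    """Hypothetical decorator: open a connection and always close it."""

    @functools.wraps(view)
    def wrapper(*args, **kwargs):
        conn = StubConnection()
        try:
            return view(conn, *args, **kwargs)
        finally:
            # Runs on success and on error, so nothing is left to __del__.
            conn.close()

    return wrapper


@with_connection
def ok_view(conn):
    return "ok"


@with_connection
def failing_view(conn):
    raise RuntimeError("view failed")
```

With this shape, an integration test can assert that the open-connection counter returns to zero after each request, which is the kind of check `_assert_unregistered` was meant to provide.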

Testing this PR

  1. start OMERO.web 5.4.2 with public user (see omero-test-infra#6, "public_user: add configuration for running public user")
  2. run a number of calls against the server (e.g. `ab -c 5 -n 1000 http://host/webgateway...`)
  3. find the gunicorn processes of OMERO.web (`ps auxw -H | grep gunicorn`)
  4. find the number of file descriptors used by those processes (`lsof -p $PID | grep pipe`)
  5. watch it grow!
  6. install this patch and repeat the process
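
To make step 4 repeatable, the pipe count can be scripted. This is only a sketch mirroring `lsof -p $PID | grep pipe`; the function names `count_pipe_lines` and `pipe_count_for_pid` are made up for this example, and it assumes `lsof` is installed on the host.

```python
import subprocess


def count_pipe_lines(lsof_output):
    """Count lsof output lines mentioning 'pipe' (same match as `grep pipe`)."""
    return sum(1 for line in lsof_output.splitlines() if "pipe" in line)


def pipe_count_for_pid(pid):
    """Run `lsof -p PID` and count its pipe lines (assumes lsof is on PATH)."""
    result = subprocess.run(
        ["lsof", "-p", str(pid)],
        capture_output=True,
        text=True,
        check=False,  # a nonzero exit from lsof just yields an empty count
    )
    return count_pipe_lines(result.stdout)
```

Calling `pipe_count_for_pid` for each gunicorn PID before and after the `ab` run gives two numbers to compare, with and without the patch.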

@sbesson
Member

sbesson commented Jan 26, 2018

--rebased-to #5638

@chris-allan
Member

For reference this change was made in #5545.

@hflynn
Contributor

hflynn commented Jan 26, 2018

👍 to history

@joshmoore joshmoore merged commit 475a7fe into ome:develop Jan 26, 2018
@joshmoore joshmoore deleted the gateway-del-fix branch January 26, 2018 16:56
@joshmoore joshmoore added this to the 5.4.3 milestone Jan 29, 2018