Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parameterized launch_pytest tests don't close node log files #793

Open
asymingt opened this issue Aug 6, 2024 · 5 comments
Open

Parameterized launch_pytest tests don't close node log files #793

asymingt opened this issue Aug 6, 2024 · 5 comments
Assignees

Comments

@asymingt
Copy link

asymingt commented Aug 6, 2024

Bug report

When @pytest.mark.parametrize is used in a function scope with launch_pytest the node log files from each test parameterization remain open even when execution of that specific parameterized test is complete. For large projects with dozens or hundreds of nodes this eventually causes a OSError: [Errno 24] Too many open files error.

Required Info:

  • Operating System: Ubuntu Linux 20.04 amd64
  • Installation type: source
  • Version or commit hash: 9e517a3
  • DDS implementation: RTI Connext
  • Client library (if applicable): rclpy

Steps to reproduce issue

The code below cannot be run directly, but illustrates the issue. The TestContext object represents a specific test setup we want to run, and is constructed by name from a list of acceptable names in ALL_TEST_NAMES. This allows us to cherry-pick a subset of tests to run from our entire suite, with pytest -k or pytest --regex. The idea is each tests is run in sequence, against a new execution context and saved the results to a TEST_RESULTS data structure in the global context. When the test session ends, the AnalyzeResults function is called to analyze all results from all tests.

import pytest

import launch
import launch_pytest

from my_project.test_framework import TestContext, ALL_TEST_NAMES, AnalyzeResults

TEST_RESULTS = []

@pytest.fixture(scope='function')
def test_context(request):
    test_context = TestContext(launch_test=request.param)
    test_context.do_some_preprocessing_steps()
    return test_context

@launch_pytest.fixture(scope='function')
def launch_description(test_context):
    return launch.LaunchDescription(
        test_context.get_all_actions() + [launch_pytest.actions.ReadyToTest()])

@pytest.mark.parametrize('test_context', ALL_TEST_NAMES, indirect=True)
@pytest.mark.launch(fixture=launch_description)
def test_execution(test_context):
    results = test_context.do_some_post_postprocessing_steps()
    TEST_RESULTS.append(results)

@pytest.fixture(scope='session', autouse=True)
def run_assertions():
    yield
    AnalyzeResults(TEST_RESULTS)

Expected behavior

Test completes without throwing an error.

Actual behavior

Python throws an internal error OSError: [Errno 24] Too many open files.

This only happens several tests into the sequence, pointing to the fact that something internally is not cleaning up after itself. I have painfully tracked lsof -p <pid_of_pytest_invocation> over time and realized that the ROS2 node log files created by each parameterized test are not being closed, even though the stack has shut down. So, let's say we have k nodes and therefore k log files created by ROS per launch test that is run. If I run n parameterized tests sequentially the number of log files open at test i, where i is in {1 ... n}, will be i * k.

Additional information

You can always increase the number of files that may be opened at once with ulimit but this is a bandaid and not a solution.

@asymingt asymingt changed the title Parameterized launch_pytest tests don't close log files, eventually leading to an OSError Parameterized launch_pytest tests don't appear to close node log files Aug 6, 2024
@asymingt asymingt changed the title Parameterized launch_pytest tests don't appear to close node log files Parameterized launch_pytest tests don't close node log files Aug 7, 2024
@MichaelOrlov
Copy link

@wjwwood Could you please help to triage this issue?

@asymingt
Copy link
Author

asymingt commented Aug 27, 2024

@wjwwood, here's a standalone example that spawns 100 TF2 static broadcasters with their own log files, and then repeats this test 100 times to stress the logging system. Set ulimit -n 1024 and you should be able to easily open all the ROS shared libraries and log files for one test. Then, if you run the following test with pytest -vs ... you will see the first couple tests run until the unclosed logs hit the barrier and you 'll get a OSError: [Errno 24] Too many open files.

import pytest
import launch
import launch_pytest
import launch_ros
import random
import string

ALL_TEST_NAMES = [f"test_{k}" for k in range(0, 100)]
TEST_DURATION = 5.0

def gen_random_frame_name(N : int) -> string:
    """Generate a random frame name of N characters"""
    return ''.join(random.choices(string.ascii_uppercase + string.digits, k=N))
                      
@pytest.fixture(scope='function')
def test_context(request):
    """Indirect param access to get the test name"""
    return request.param

@launch_pytest.fixture(scope='function')
def launch_description(test_context):
    # Launch a bunch of nodes, each with their own log file.
    nodes = [
        launch_ros.actions.Node(
            name=f"{test_context}_node_{k}",
            package="tf2_ros",
            executable="static_transform_publisher",
            arguments=[
                "0",
                "0",
                "0",
                "0",
                "0",
                "0",
                gen_random_frame_name(8),
                gen_random_frame_name(8),
            ],
            output='own_log')
        for k in range(0, 100)
    ]
    # Shutdown this launch after TEST_DURATION seconds.
    shutdown_timer = launch.actions.TimerAction(
        period=TEST_DURATION,
        actions=[
            launch.actions.Shutdown(reason='launch is shutting down')
        ],
    )
    return launch.LaunchDescription(nodes + [
        shutdown_timer,
        launch_pytest.actions.ReadyToTest()
    ])

@pytest.mark.parametrize('test_context', ALL_TEST_NAMES, indirect=True)
@pytest.mark.launch(fixture=launch_description)
def test_execution(test_context):
    yield

@asymingt
Copy link
Author

asymingt commented Sep 4, 2024

Any progress on this? Is there anything I can do on my side to help?

@clalancette
Copy link
Contributor

Any progress on this? Is there anything I can do on my side to help?

Just to give you some kind of answer; we don't really use this functionality very much in the core. And the person who initially developed it is no longer working on ROS 2. So we don't have a lot of great insight into what's going on here.

My best suggestion for now is to dig into it and see what is happening. We'll happily review pull requests to improve it.

@asymingt
Copy link
Author

asymingt commented Sep 4, 2024

I did some work on this and confirm that this bug still affects Jazzy, and I suspect it's because launch.logging.get_output_loggers(...) calls made on these lines results in a file handler being opened and stored to the global launch.logging.launch_config here. I think that the ExecuteLocal class should probably close its own two specific loggers on cleanup here.

This workaround resets the logging system after each test completes and fixes the exception in my example:

@pytest.mark.parametrize('test_context', ALL_TEST_NAMES, indirect=True)
@pytest.mark.launch(fixture=launch_description)
def test_execution(test_context):
    yield
    launch.logging.reset()    # <------ THIS LINE

As a consequence, the above line forces each parameterized test now to create a new directory for its log files, which in a sense is actually more desirable, so I might just go with this solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants