Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PHP CLI process hangs on the PeriodicTaskExecutor destructor's thread_.join() call #1209

Closed
kisprof opened this issue Aug 17, 2024 · 1 comment · Fixed by #1213
Closed

PHP CLI process hangs on the PeriodicTaskExecutor destructor's thread_.join() call #1209

kisprof opened this issue Aug 17, 2024 · 1 comment · Fixed by #1213
Assignees
Labels
agent-php bug Something isn't working

Comments

@kisprof
Copy link

kisprof commented Aug 17, 2024

Describe the bug
In our setup a main PHP CLI process triggered by cron is creating sub PHP processes to do work parallelly. We have observed that after some time, there are multiple child processes that are stuck. Their unit of work is completed, but they are not terminating. Upon further investigating these processes, we have seen that they are waiting for a lock to be released. After reproducing the issue locally we have seen in the traces that the PHP CLI process hangs on the PeriodicTaskExecutor class destructor's thread_.join() call.

  • strace outpout of the stuck process

    strace: Process 488 attached
    futex(0x7f66c8bff910, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 492, NULL, FUTEX_BITSET_MATCH_ANY
    
  • gbd backtrace

    (gdb) backtrace
    #0  __futex_abstimed_wait_common64 (private=128, cancel=true, abstime=0x0, op=265, expected=492, futex_word=0x7f66c8bff910) at ./nptl/futex-internal.c:57
    #1  __futex_abstimed_wait_common (cancel=true, private=128, abstime=0x0, clockid=0, expected=492, futex_word=0x7f66c8bff910) at ./nptl/futex-internal.c:87
    #2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7f66c8bff910, expected=492, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=128) at ./nptl/futex-internal.c:139
    #3  0x00007f66d0636624 in __pthread_clockjoin_ex (threadid=140079431415360, thread_return=0x0, clockid=0, abstime=0x0, block=<optimized out>) at ./nptl/pthread_join_common.c:105
    #4  0x00007f66c93d9a63 in __gthread_join (__value_ptr=0x0, __threadid=<optimized out>) at /home/build/prerequisities/gcc-12.2.0/x86_64-linux-gnu/libstdc++-v3/include/x86_64-linux-gnu/bits/gthr-default.h:669
    #5  std::thread::join (this=0x5653fcdf7688) at ../../../.././libstdc++-v3/src/c++11/thread.cc:112
    #6  0x00007f66c9328f1a in elasticapm::php::PeriodicTaskExecutor::~PeriodicTaskExecutor (this=0x5653fcdf7640, __in_chrg=<optimized out>) at /source/agent/native/libcommon/code/./PeriodicTaskExecutor.h:35
    #7  elasticapm::php::PeriodicTaskExecutor::~PeriodicTaskExecutor (this=0x5653fcdf7640, __in_chrg=<optimized out>) at /source/agent/native/libcommon/code/./PeriodicTaskExecutor.h:37
    #8  std::default_delete<elasticapm::php::PeriodicTaskExecutor>::operator() (__ptr=0x5653fcdf7640, this=<optimized out>) at /opt/gcc-12.2.0/include/c++/12.2.0/bits/unique_ptr.h:95
    #9  std::unique_ptr<elasticapm::php::PeriodicTaskExecutor, std::default_delete<elasticapm::php::PeriodicTaskExecutor> >::~unique_ptr (this=0x5653fce212c8, __in_chrg=<optimized out>) at /opt/gcc-12.2.0/include/c++/12.2.0/bits/unique_ptr.h:396
    #10 elasticapm::php::AgentGlobals::~AgentGlobals (this=0x5653fce21290, __in_chrg=<optimized out>) at /source/agent/native/libcommon/code/./AgentGlobals.h:12
    #11 zm_globals_dtor_elastic_apm (elastic_apm_globals=0x7f66c947dd80 <elastic_apm_globals>) at /source/agent/native/ext/elastic_apm.cpp:299
    #12 0x00005653fafc4758 in module_destructor (module=0x7f66c947d8a0 <elastic_apm_module_entry>) at ./Zend/zend_API.c:3075
    #13 0x00005653fafd1015 in _zend_hash_del_el_ex (prev=<optimized out>, p=0x5653fca2df00, idx=59, ht=0x5653fb2458a0 <module_registry>) at ./Zend/zend_hash.c:1425
    #14 _zend_hash_del_el (p=0x5653fca2df00, idx=59, ht=0x5653fb2458a0 <module_registry>) at ./Zend/zend_hash.c:1452
    #15 zend_hash_graceful_reverse_destroy (ht=0x5653fb2458a0 <module_registry>) at ./Zend/zend_hash.c:1977
    #16 0x00005653fafc2e96 in zend_destroy_modules () at ./Zend/zend_API.c:2384
    #17 0x00005653fafbded7 in zend_shutdown () at ./Zend/zend.c:1099
    #18 0x00005653faf575ba in php_module_shutdown () at ./main/main.c:2453
    #19 php_module_shutdown () at ./main/main.c:2430
    #20 0x00005653fadecbf2 in main (argc=5, argv=0x5653fc9e96e0) at ./sapi/cli/php_cli.c:1346
    

    image

To Reproduce
Running the following setup reproduces the situation consistently for me:

  • Files:

    • ./Dockerfile
      FROM ubuntu:22.04
      ARG DEBIAN_FRONTEND=noninteractive
      ARG PHP_VERSION=8.2
      
      RUN apt -y update
      RUN apt-get -y install software-properties-common
      RUN apt-get -y install tar gdb strace
      
      RUN add-apt-repository -c main -c main/debug ppa:ondrej/php
      RUN apt -y update
      
      RUN apt-get -y install php${PHP_VERSION} php${PHP_VERSION}-cli-dbgsym
      RUN update-alternatives --set php /usr/bin/php${PHP_VERSION}
      
      RUN apt-get -y install php${PHP_VERSION}-dom php${PHP_VERSION}-mysql php${PHP_VERSION}-soap
      
      ADD --chmod=0755 https://github.com/elastic/apm-agent-php/releases/download/v1.13.0/apm-agent-php_1.13.0_amd64.deb /root
      ADD --chmod=0755 https://github.com/elastic/apm-agent-php/releases/download/v1.13.0/apm-agent-php-debugsymbols-linux-x86-64.tar /root
      
      RUN dpkg -i /root/apm-agent-php_1.13.0_amd64.deb
      RUN tar xvf /root/apm-agent-php-debugsymbols-linux-x86-64.tar -C /usr/lib/debug
      
      RUN touch /var/log/apm_process.log
      
      COPY ./src /root/src
      WORKDIR /root/src
      
    • ./docker-compose.yml
      name: apm_process
      services:
          apm_process:
              build:
                  context: "."
                  args:
                      PHP_VERSION: ${PHP_VERSION-8.2}
              volumes:
                  - "./src:/root/src"
                  - "./99-elastic-apm-custom.ini:/etc/php/${PHP_VERSION-8.2}/cli/conf.d/99-elastic-apm-custom.ini"
              cap_add:
                  -   SYS_PTRACE
              tty: true
              stdin_open: true
      
    • ./99-elastic-apm-custom.ini
      [elastic]
      elastic_apm.enabled = true
      elastic_apm.secret_token = "{{REPLACE_ME}}"
      elastic_apm.server_url = "{{REPLACE_ME}}"
      elastic_apm.environment = "local"
      elastic_apm.log_level = "DEBUG"
      elastic_apm.log_level_stderr = "DEBUG"
      elastic_apm.server_timeout = "30s"
      elastic_apm.service_name = "apm_process"
      elastic_apm.service_version = "v1"
      elastic_apm.transaction_max_spans = 500
      elastic_apm.transaction_sample_rate = 1.0
      elastic_apm.verify_server_cert = true
      elastic_apm.dev_internal_backend_comm_log_verbose = true
      elastic_apm.profiling_inferred_spans_enabled = true
      
    • ./src/test.php
      <?php
      
      use Elastic\Apm\ElasticApm;
      
      chdir(__DIR__);
      $transaction = ElasticApm::getCurrentTransaction();
      $transaction->setName('PARENT');
      
      echo 'PARENT transaction id:' . $transaction->getId() . "\n";
      
      for ($i = 0; $i < 10000; $i++) {
          $param = random_int(0, 60);
      
          $command = "nohup /usr/bin/php -f run.php " . $param . " " . getmypid() . '-' . $i . " 2>>/var/log/apm_process.log 1>&2 &";
          echo "Running: " . $command . "\n";
      
          $span = $transaction->beginChildSpan('child span ' . $i, 'process', 'run.php', 'execute');
          shell_exec($command);
          $span->end();
      
          $sleepBetweenChildren = random_int(0, 1000);
          echo "Sleeping for " . $sleepBetweenChildren . "(us)\n";
          usleep($sleepBetweenChildren);
      }
      
    • ./src/run.php
      <?php
      
      use Elastic\Apm\ElasticApm;
      
      $transaction = ElasticApm::getCurrentTransaction();
      $transaction->setName('CHILD');
      echo getmypid() . ' Child transaction id:' . $transaction->getId() . "\n";
      
      $span = $transaction->beginChildSpan('wait', 'custom');
      $sleepTime = $argv[1];
      echo getmypid() . ' Child sleep for ' . $sleepTime . "(ms)\n";
      usleep(1000 * $sleepTime);
      $span->end();
      
  • Steps:

    1. docker compose up
    2. docker compose exec apm_process /bin/bash
    3. php test.php 2>>/var/log/apm_process.log 1>&2
    4. After the process finished, check with top command the list of processes and there should be at least one, but possibly multiple like this /usr/bin/php -f run.php {{delay_time}} {{main_process_id}}-{{counter_of_child_processes}} stuck forever.
    5. with strace -p {{process_id}} we should see that the process is stuck on a futex() call like above.
  • Notes:

    • The issue is possibly a race condition, so it might take a few runs to reproduce, but for me the above setup consistently ended in multiple stuck processes.
    • I have originally started with our production setup and dependencies to reproduce the issue, but managed to strip down to this relative minimalistic example. The 3 extensions (dom, mysql, soap) seems to be needed to reproduce the issue. If any of them is taken out of the example, the problem did not occur for me.

Expected behavior
PHP process should not stuck upon terminating when APM extension is enabled.

@kisprof kisprof added the bug Something isn't working label Aug 17, 2024
@intuibase intuibase self-assigned this Aug 20, 2024
intuibase added a commit to intuibase/apm-agent-php that referenced this issue Aug 22, 2024
intuibase added a commit to intuibase/apm-agent-php that referenced this issue Aug 22, 2024
@intuibase
Copy link
Contributor

Hey @kisprof

Could you please verify if latest (1.14) release fixed issue for you?

Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
agent-php bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants