
i#95: Detach on Linux #6513

Merged: 46 commits into DynamoRIO:master on Dec 30, 2023

Conversation

@onroadmuwl (Contributor) commented Dec 18, 2023

Enable detach on Linux.

  • I set up a temporary separate stack for the nudged thread, and resume the normal execution of this thread after detach.
  • A new variable named nudged_sigcxt saves the sigcontext of the nudged thread.
  • An extra check, dcontext == GLOBAL_DCONTEXT, covers a new code path during thread exit that was not previously reachable.
  • I also add the is_detach_external variable for the smooth resumption of the nudged thread on the x64 ISA.
  • Finally, the frontend, automated test cases, and documentation are updated accordingly.

Issue: #95

@onroadmuwl onroadmuwl mentioned this pull request Dec 18, 2023
@derekbruening (Contributor) commented:
(Please request review when this is ready.)

@derekbruening (Contributor) left a review:


Thank you for contributing this feature! I think it will be very useful as some users had wanted it for a long time.

I had a few questions and suggestions: see the review comments.

Review comments were left on: tools/drdeploy.c, suite/tests/runall.cmake, suite/runsuite_wrapper.pl, core/nudge.c, and core/synch.c.
@derekbruening (Contributor) commented:
(With holidays and traveling there may be delays in reviews)

@onroadmuwl (Contributor, Author) replied:

> (With holidays and traveling there may be delays in reviews)

Enjoy your holidays and travels.

Further review comments were left on core/synch.c and core/synch.h.
@onroadmuwl (Contributor, Author) commented Dec 30, 2023

Thanks for your advice. My code is ready now; could you merge it if you see no problems?

@derekbruening derekbruening merged commit 10222c6 into DynamoRIO:master Dec 30, 2023
14 of 15 checks passed
@derekbruening (Contributor) commented:

The client.detach_test added here is flaky, making the automated tests red in 2 out of 4 PRs yesterday. Filed as #6536. @onroadmuwl, have you seen this in local testing? Can you reproduce it?

@abudankov commented:

Hello,

I am trying to use the attach/detach/reattach feature on Ubuntu 20.04 AArch64 with a single-threaded application, and it currently crashes. I built DynamoRIO in debug mode and enabled full logging:
./bin64/drrun -attach 4042305 -ops "-logdir /home/abudanko/dynamorio_fixes/build/log_dir -logmask 0x0fffffff -loglevel 4" -c ./clients/lib64/debug/libdrmemtrace.so -offline
Below is part of the execution log from the signal handler processing the nudge-detach signal:
...
86800:d_r_dispatch:
d_r_dispatch: target = 0x0000ffffab42755c
86802:check_thread_vm_area: check_thread_vm_area: pc = 0x0000ffffab42755c
86804:prepend_entry_to_fraglist: prepend_entry_to_fraglist: putting fragment @0x0000ffffab42755c (shared) on vmarea 0x0000ffffab427000-0
86806:check_thread_vm_area: check_thread_vm_area: check_stop = 0x0000ffffab428000
86808:build_bb_ilist:
interp: 86810:build_bb_ilist: start_pc = 0x0000ffffab42755c
0x0000ffffab42755c b98007e0 ldrsw +0x04(%sp)[4byte] -> %x0
0x0000ffffab427560 f94007e1 ldr +0x08(%sp)[8byte] -> %x1
0x0000ffffab427564 8b000020 add %x1 %x0 lsl $0x00 -> %x0
0x0000ffffab427568 f90007e0 str %x0 -> +0x08(%sp)[8byte]
0x0000ffffab42756c b94007e0 ldr +0x04(%sp)[4byte] -> %w0
0x0000ffffab427570 11000400 add %w0 $0x0001 lsl $0x00 -> %w0
0x0000ffffab427574 b90007e0 str %w0 -> +0x04(%sp)[4byte]
0x0000ffffab427578 b94007e1 ldr +0x04(%sp)[4byte] -> %w1
0x0000ffffab42757c 5290d3c0 movz $0x869e lsl $0x00 -> %w0
0x0000ffffab427580 72a00020 movk %w0 $0x0001 lsl $0x10 -> %w0
0x0000ffffab427584 6b00003f subs %w1 %w0 lsl $0x00 -> %wzr
0x0000ffffab427588 54fffead b.le $0x0000ffffab42755c
86812:build_bb_ilist: end_pc = 0x0000ffffab42758c

87048:dispatch_enter_fcache_stats: Entry into F2(0x0000ffffab42755c).0x0000aaaa929fb170
...
87086:dispatch_exit_fcache_stats: Exit from F2(0x0000ffffab42755c).0x0000aaaa929fb36c
...

87612:main_signal_handler_C:
main_signal_handler: thread=4042305, sig=32, xsp=0x0000fffda61c3da0, retaddr=0x1000000400000020
87614:main_signal_handler_C: siginfo: sig = 32, pid = 0, status = 0, errno = 268435460, si_code = -1
87616:dump_sigcontext: Signal context:
87618:dump_sigcontext: x0 = 0x000000000000a248
87620:dump_sigcontext: x1 = 0x0000ffffab425560
...
87682:dump_sigcontext: pc = 0x0000aaaa929fb208
...
87756:dump_sigcontext:
87758:is_signal_for_us: is_signal_for_us T4042305: sig=32 code=-1 errno=268435460
87760:is_signal_for_us: is_signal_for_us T4042305: sig=32 code=-1 errno=268435460
87762:handle_nudge_signal: received nudge version=1 flags=0x0 mask=0x4 id=0x00000000 arg=0x0000000000000000
87766:d_r_notify: SYSLOG_INFORMATION: received nudge mask=0x4 id=0x00000000 arg=0x0000000000000000
87768:unlink_fragment_for_signal: F2, f->tag = 0x0000ffffab42755c, f->start_pc = 0x0000aaaa929fb16c, pc = 0x0000aaaa929fb208
87770:unlink_fragment_for_signal: unlinking outgoing for interrupted F2
87772:unlink_fragment_outgoing: unlinking outgoing from frag F2(0x0000ffffab42755c)
87774:unlink_branch: unlinking branch F2.0x0000aaaa929fb368
87776:unlink_branch: unlinking branch F2.0x0000aaaa929fb36c
87778:unlink_fragment_for_signal: dcontext->nudge_pending = 0x0000000000000000
87780:enter_nolinking: entered function
87782:enter_nolinking: with CODE_CACHE
87784:flush_fragments_synch_unlink_priv: FLUSH STAGE 1: synch_unlink_priv(thread 4042305 flushtime 0): 0x0000000000000000-0xffffffffffff
87786:make_writable: make_writable: pc 0x0000aaaa929f5000 -> 0x0000aaaa929f5000-0x0000aaaa929f7000 0
87808:make_unwritable: make_unwritable: pc 0x0000aaaa929f5000 -> 0x0000aaaa929f5000-0x0000aaaa929f7000
87824:flush_fragments_synch_priv: considering thread #1/1 = 4042305
87826:flush_fragments_synch_priv: thread 4042305 synch not required
87828:flush_fragments_thread_unlink: thread 4042305 has no fragments in region to flush

I added code at the end of the unlink_fragment_for_signal() function in signal.c that flushes the fragment the nudge-detach signal hit:

static bool
unlink_fragment_for_signal(dcontext_t *dcontext, fragment_t *f,
                           byte *pc /* interruption pc */)
{
    ...
    if (!waslinking)
        enter_nolinking(dcontext, NULL, false);

    /* Added here to get out to the dispatcher in a forced order. */
    flush_fragments_synch_unlink_priv(dcontext, UNIVERSAL_REGION_BASE,
                                      UNIVERSAL_REGION_SIZE, false, false, false);
    flush_fragments_unlink_shared(dcontext, UNIVERSAL_REGION_BASE,
                                  UNIVERSAL_REGION_SIZE, NULL _IF_DGCDIAG(NULL));
    flush_fragments_end_synch(dcontext, false);

    return changed;
}

87830:flush_fragments_unlink_shared: FLUSH STAGE 2: unlink_shared(thread 4042305): flusher is 4042305
87832:flush_fragments_unlink_shared: flushing shared fragments
87836:add_process_lock: add_process_lock: 0 lock 0x00000000715f3e58: name=shared_delete_lock(mutex)@/home/abudanko/dynamorio_fixes/core/
rank=14 owner=4042305 owning_dc=0x0000fffda6188680 contended_event=0x7ffffffff prev=0x0000000000000000
lock count_times_acquired= 1 0 0 0
87838:vm_area_unlink_fragments: vm_area_unlink_fragments 0x0000000000000000..0xffffffffffffffff
87840:vm_area_unlink_fragments: marking region 0x0000ffffab427000..0x0000ffffab428000 for deletion & unlinking all its frags
87860:hashtable_fragment_add: hashtable_fragment_add(0x0000ffffab427578) mask=0x00000000000003ff offset=0 trying 0x000000000000032d:
87862:hashtable_fragment_add: hashtable_fragment_add: added 0x0000ffffab427578 to shared_future at table[813]
87864:hashtable_fragment_remove_helper_open_address: hashtable_fragment_remove_helper_open_address(table=0x0000fffda6180248, hindex=813)
87880:hashtable_fragment_add: hashtable_fragment_add(0x0000ffffab427554) mask=0x00000000000003ff offset=0 trying 0x000000000000022d:
87882:hashtable_fragment_add: hashtable_fragment_add: added 0x0000ffffab427554 to shared_future at table[557]
87884:hashtable_fragment_remove_helper_open_address: hashtable_fragment_remove_helper_open_address(table=0x0000fffda6180248, hindex=557)
87902:hashtable_fragment_add: hashtable_fragment_add(0x0000ffffab42758c) mask=0x00000000000003ff offset=0 trying 0x000000000000009e:
87904:hashtable_fragment_add: hashtable_fragment_add: added 0x0000ffffab42758c to shared_future at table[158]
87906:hashtable_fragment_remove_helper_open_address: hashtable_fragment_remove_helper_open_address(table=0x0000fffda6180248, hindex=158)
87922:hashtable_fragment_add: hashtable_fragment_add(0x0000ffffab42755c) mask=0x00000000000003ff offset=0 trying 0x00000000000001f4:
87924:hashtable_fragment_add: hashtable_fragment_add: added 0x0000ffffab42755c to shared_future at table[500]
87926:hashtable_fragment_remove_helper_open_address: hashtable_fragment_remove_helper_open_address(table=0x0000fffda6180248, hindex=500)
87942:hashtable_fragment_add: hashtable_fragment_add(0x0000ffffab42757c) mask=0x00000000000003ff offset=0 trying 0x0000000000000110:
87944:hashtable_fragment_add: hashtable_fragment_add: added 0x0000ffffab42757c to shared_future at table[272]
87946:hashtable_fragment_remove_helper_open_address: hashtable_fragment_remove_helper_open_address(table=0x0000fffda6180248, hindex=272)
87954:vm_area_unlink_fragments: Before removing vm area:
0x0000ffffab427000-0x0000ffffab428000 ---- ELF SO test_lib.so
87956:vm_area_unlink_fragments: Removing shared vm area 0x0000ffffab427000-0x0000ffffab428000
87962:vm_area_unlink_fragments: After removing vm area:
87964:vm_area_unlink_fragments: Unlinked 5 frags
87966:make_writable: make_writable: pc 0x0000aaaa929f5000 -> 0x0000aaaa929f5000-0x0000aaaa929f7000 0
87982:make_unwritable: make_unwritable: pc 0x0000aaaa929f5000 -> 0x0000aaaa929f5000-0x0000aaaa929f7000
87998:flush_fragments_unlink_shared: Flushed 5 fragments from 0x0000000000000000-0xffffffffffffffff
88000:flush_fragments_end_synch: FLUSH STAGE 3: end_synch(thread 4042305): flusher is 4042305
88002:main_signal_handler_C: main_signal_handler 32 returning now to 0x0000aaaa929fb208
...
88014:vm_area_check_shared_pending: thread 4042305 (flushtime 0) walking pending deletion list (was_I_flushed==F2)
88016:vm_area_check_shared_pending: Considering #0: 0x0000ffffab427000..0x0000ffffab428000 flushtime 1
88018:vm_area_check_shared_pending: dec => ref_count is now 0, flushtime diff is 0
88020:add_process_lock: add_process_lock: 0 lock 0x00000000715f3ed0: name=lazy_delete_lock(mutex)@/home/abudanko/dynamorio_fixes/core/vm
rank=15 owner=4042305 owning_dc=0x0000fffda6188680 contended_event=0xfffdffffffff prev=0x0000000000000000
lock count_times_acquired= 1 0 0 0
88022:check_lazy_deletion_list: checking lazy list @ timestamp 1
88024:vm_area_check_shared_pending: deleting all fragments in region 0x0000ffffab427000..0x0000ffffab428000 flushtime 1
88026:fragment_delete: fragment_delete: *0x0000fffda6151740 F5(0x0000ffffab427578).0x0000aaaa929fb6dc bb 0x47
88028:fcache_remove_fragment: fcache_remove_fragment: F5 from Basic block (shared) unit
88030:fragment_delete: fragment_delete: *0x0000fffda61516d0 F4(0x0000ffffab427554).0x0000aaaa929fb598 bb 0x47
88032:fcache_remove_fragment: fcache_remove_fragment: F4 from Basic block (shared) unit
88034:fragment_delete: fragment_delete: *0x0000fffda6151650 F3(0x0000ffffab42758c).0x0000aaaa929fb3c8 bb 0x47
88036:fcache_remove_fragment: fcache_remove_fragment: F3 from Basic block (shared) unit
88038:fragment_delete: fragment_delete: *0x0000fffda61515d0 F2(0x0000ffffab42755c).0x0000aaaa929fb16c bb 0x47
88040:fcache_remove_fragment: fcache_remove_fragment: F2 from Basic block (shared) unit
88042:fragment_delete: fragment_delete: *0x0000fffda6151520 F1(0x0000ffffab42757c).0x0000aaaa929fb008 bb 0x47
88044:fcache_remove_fragment: fcache_remove_fragment: F1 from Basic block (shared) unit
88046:add_process_lock: add_process_lock: 0 lock 0x00000000715f36a8: name=unit_flush_lock(mutex)@/home/abudanko/dynamorio_fixes/core/fca
rank=23 owner=4042305 owning_dc=0x0000fffda6188680 contended_event=0xfffdffffffff prev=0x0000000000000000
lock count_times_acquired= 1 0 0 0
88048:vm_area_check_shared_pending: thread 4042305 done walking pending list @flushtime 1
88050:vm_area_check_shared_pending: Flushed 5 frags
88052:dispatch_enter_dynamorio: Just flushed last_fragment

After returning from the signal handler, the thread gets back to the DynamoRIO dispatcher and starts instrumenting code and building the code cache again:

88054:d_r_dispatch:
d_r_dispatch: target = 0x0000ffffab42755c
88056:check_thread_vm_area: check_thread_vm_area: pc = 0x0000ffffab42755c
new shared vm area: 0x0000ffffab427000-0x0000ffffab428000 ---- ELF SO test_lib.so
88062:check_thread_vm_area: checking thread vmareas against executable_areas
88064:prepend_entry_to_fraglist: prepend_entry_to_fraglist: putting fragment @0x0000ffffab42755c (shared) on vmarea 0x0000ffffab427000-0
88066:check_thread_vm_area: check_thread_vm_area: check_stop = 0x0000ffffab428000
88068:build_bb_ilist:
interp: 88070:build_bb_ilist: start_pc = 0x0000ffffab42755c
0x0000ffffab42755c b98007e0 ldrsw +0x04(%sp)[4byte] -> %x0
0x0000ffffab427560 f94007e1 ldr +0x08(%sp)[8byte] -> %x1
0x0000ffffab427564 8b000020 add %x1 %x0 lsl $0x00 -> %x0
0x0000ffffab427568 f90007e0 str %x0 -> +0x08(%sp)[8byte]
0x0000ffffab42756c b94007e0 ldr +0x04(%sp)[4byte] -> %w0
0x0000ffffab427570 11000400 add %w0 $0x0001 lsl $0x00 -> %w0
0x0000ffffab427574 b90007e0 str %w0 -> +0x04(%sp)[4byte]
0x0000ffffab427578 b94007e1 ldr +0x04(%sp)[4byte] -> %w1
0x0000ffffab42757c 5290d3c0 movz $0x869e lsl $0x00 -> %w0
0x0000ffffab427580 72a00020 movk %w0 $0x0001 lsl $0x10 -> %w0
0x0000ffffab427584 6b00003f subs %w1 %w0 lsl $0x00 -> %wzr
0x0000ffffab427588 54fffead b.le $0x0000ffffab42755c
88072:build_bb_ilist: end_pc = 0x0000ffffab42758c
88074:instrument_basic_block:
instrument_basic_block ******************
88076:instrument_basic_block:
before instrumentation:
TAG 0x0000ffffab42755c
+0 L3 @0x0000fffda61a9448 b98007e0 ldrsw +0x04(%sp)[4byte] -> %x0
+4 L3 @0x0000fffda623b898 f94007e1 ldr +0x08(%sp)[8byte] -> %x1
+8 L3 @0x0000fffda61a8190 8b000020 add %x1 %x0 lsl $0x00 -> %x0
...

Should the detach code, i.e.:

handle_nudge() -> call_switch_stack(dcontext,
        (byte *)((ptr_uint_t)d_r_detachstack + DYNAMORIO_STACK_SIZE),
        (void (*)(void *))detach_externally_on_new_stack, NULL, true);

execute before getting back to the DynamoRIO dispatcher, so that application code instrumentation would not happen until the next attach?

Thanks,
Alexei
