target/riscv: Ensure to handle all triggered a halt events #1171

lz-bro · 2024-11-20T12:28:54Z

If all current halted states are due to a halt group, then a new "triggered a halt" event has occurred.

aap-sc · 2024-11-20T12:46:26Z

src/target/riscv/riscv.c

+				else
+					break;
+			}
+		}


This loop seems weird. What is its purpose? DCSR is writable, so the debugger can occasionally be tricked into a wrong conclusion.

Halt groups are an optional feature, and I'm quite confused that we don't check for them.

Could you please provide a test scenario to reproduce your issue? Is it possible to use Spike to model it, or do you need specific HW?

lz-bro · 2024-11-20

This bug was discovered while I was testing semihosting on our hardware with SMP enabled. It fails in the following situation: every hart currently seen as halted was halted by a halt group, while the other harts are still recorded as running. In reality, one of those harts has already halted, and it caused the others to halt through the halt group.

riscv-openocd/src/target/riscv/riscv.c

Lines 3792 to 3796 in f51900b

	} else if (halted && running) {
		LOG_TARGET_DEBUG(target, "halt all; halted=%d",
			halted);
		riscv_halt(target);
	} else {

If there is such a halted hart but its recorded state is still running, riscv_semihosting is never processed for it.

riscv-openocd/src/target/riscv/riscv.c

Lines 3605 to 3623 in f51900b

	if (halt_reason == RISCV_HALT_EBREAK) {
		int retval;
		/* Detect if this EBREAK is a semihosting request. If so, handle it. */
		switch (riscv_semihosting(target, &retval)) {
		case SEMIHOSTING_NONE:
			break;
		case SEMIHOSTING_WAITING:
			/* This hart should remain halted. */
			*next_action = RPH_REMAIN_HALTED;
			break;
		case SEMIHOSTING_HANDLED:
			/* This hart should be resumed, along with any other
			 * harts that halted due to haltgroups. */
			*next_action = RPH_RESUME;
			return ERROR_OK;
		case SEMIHOSTING_ERROR:
			return retval;
		}
	}
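
For readers following along, the decision step being discussed can be modelled roughly as below. This is only a sketch: `smp_decide` and the `SMP_*` names are hypothetical, and the first check (on `should_remain_halted`) is an assumption about the surrounding code in riscv_openocd_poll, which is not quoted in this thread. Only the `should_resume` and `halted && running` branches are taken from the discussion.

```c
#include <assert.h>

/* Hypothetical names: a simplified model of the SMP decision step. */
enum smp_decision {
	SMP_DO_NOTHING,
	SMP_HALT_ALL,
	SMP_RESUME_ALL
};

/* Picks one action for the whole SMP group from the per-pass counters.
 * Assumed order of checks: remain-halted, then resume, then "mixed". */
enum smp_decision smp_decide(unsigned int should_remain_halted,
		unsigned int should_resume, unsigned int halted,
		unsigned int running)
{
	if (should_remain_halted)
		return SMP_HALT_ALL;
	if (should_resume)
		return SMP_RESUME_ALL;
	if (halted && running)
		return SMP_HALT_ALL;	/* the branch quoted above */
	return SMP_DO_NOTHING;
}

/* A few inputs matching the situations described in this thread. */
void smp_decide_examples(void)
{
	/* One hart halted by the halt group, initiator still recorded as running. */
	assert(smp_decide(0, 0, 1, 1) == SMP_HALT_ALL);
	/* Semihosting was handled on one hart, the other halted by the group. */
	assert(smp_decide(0, 1, 1, 0) == SMP_RESUME_ALL);
}
```

In this model, the problematic situation reaches the mixed branch: the group is halted as a whole, and nothing in this step looks at why the hart still recorded as running actually stopped.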

lz-bro · 2024-11-22

@aap-sc I think this is a bug. Could you offer some suggestions?

aap-sc · 2024-11-22

@lz-bro I'm still trying to understand your reasoning and what exactly the issue is (the situation is still not quite obvious to me). It will take a couple of days. I'll ask additional questions if necessary.

JanMatCodasip · 2024-11-28T08:55:27Z

@lz-bro I am afraid I have not understood what case this merge request addresses; not even after reading the commit description and the discussion so far.

Could you please provide a very clear description in the commit message? Doing so will help:

- Code reviewers to understand your changes (and have them merged)
- Other users of the OpenOCD source code, both current and future.

Thank you.

zqb-all · 2024-11-29T17:03:41Z

Let me try to explain this issue.
Suppose we have two cores, core0 and core1, that belong to one SMP group.
OpenOCD constantly calls riscv_openocd_poll to check their status.

When OpenOCD calls riscv_openocd_poll on core0:

A. If no hardware state change occurs, the sequence is:
A.1. Check that core0 belongs to an SMP group and start checking the status of all cores in the group.
A.2. riscv_poll_hart checks core0. core0's state is still running (no change). next_action=RPH_NONE, running++
A.3. riscv_poll_hart checks core1. core1's state is still running (no change). next_action=RPH_NONE, running++
A.4. Finally, should_remain_halted=0, should_resume=0, halted=0, and running=2, so nothing happens.

B. If core0 hits a software breakpoint on hardware, one possible sequence is:
B.1. Check that core0 belongs to an SMP group and start checking the status of all cores in the group.
B.2. riscv_poll_hart checks core0. core0's state is still running (no change). next_action=RPH_NONE, running++
[Between B.2 and B.3, core0 hits the breakpoint and halts, and core1 halts immediately as well because of the hardware haltgroup]
B.3. riscv_poll_hart checks core1 and finds that its state has changed from running to halted. The reason is RISCV_HALT_GROUP. This results in next_action=RPH_NONE, halted++
B.4. Finally, should_remain_halted=0, should_resume=0, halted=1, running=1. The else if (halted && running) branch is entered and riscv_halt is called to halt all cores in the SMP group. In the process, core0's target->state is also corrected to halted, so the next riscv_openocd_poll will consider core0's state unchanged.
There is no problem in this case.

Now assume again that core0 and core1 are both running, and consider case C.

C. If core0 hits a semihosting ebreak on hardware, one possible sequence is:
C.1. Check that core0 belongs to an SMP group and start checking the status of all cores in the group.
C.2. riscv_poll_hart checks core0. core0's state is still running (no change). next_action=RPH_NONE, running++
[Between C.2 and C.3, core0 hits the semihosting ebreak and halts, and core1 halts immediately as well because of the hardware haltgroup]
C.3. riscv_poll_hart checks core1 and finds that its state has changed from running to halted. The reason is RISCV_HALT_GROUP. This results in next_action=RPH_NONE, halted++
C.4. Finally, should_remain_halted=0, should_resume=0, halted=1, running=1. The else if (halted && running) branch is entered and riscv_halt is called to halt all cores in the SMP group. In the process, core0's target->state is also corrected to halted, so the next riscv_openocd_poll will consider core0's state unchanged.
Now here is the problem: because the recorded state does not change, the next poll does not enter the branch that handles a newly halted hart. It therefore never notices that core0 actually stopped on a semihosting ebreak and never handles the request. core0 and core1 remain halted.

Now assume again that core0 and core1 are both running, and consider case D.

D. If core0 hits a semihosting ebreak on hardware, but earlier than in case C, one possible sequence is:
D.1. Check that core0 belongs to an SMP group and start checking the status of all cores in the group.
[Between D.1 and D.2, core0 hits the semihosting ebreak and halts, and core1 halts immediately as well because of the hardware haltgroup]
D.2. riscv_poll_hart checks core0 and finds that its state has changed from running to halted. Because halt_reason is RISCV_HALT_EBREAK, semihosting is checked and processed; one possible result is next_action=RPH_RESUME, should_resume++
D.3. riscv_poll_hart checks core1 and finds that its state has changed from running to halted. The reason is RISCV_HALT_GROUP. This results in next_action=RPH_NONE, halted++
D.4. Finally, should_remain_halted=0, should_resume=1, halted=1, running=0. The else if (should_resume) branch is entered, riscv_resume is called, and eventually core0 and core1 return to the running state.
There is no problem in this case.

Thank you.
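
The difference between cases C and D can be reproduced with a small stand-alone model. This is a minimal sketch under stated assumptions, not OpenOCD code: the types, `poll_once`, and `hw_ebreak_on_hart0` are hypothetical, the model has exactly two harts, the halt cause is only inspected when the recorded state changes from running to halted, and the "halt all" step only corrects the recorded state. The actual resume after handled semihosting is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified two-hart model; all names here are hypothetical. */
enum hart_state { HART_RUNNING, HART_HALTED };
enum halt_cause { CAUSE_NONE, CAUSE_EBREAK, CAUSE_GROUP };

struct hart {
	enum hart_state hw_state;	/* what the hardware is doing */
	enum hart_state recorded;	/* what the debugger last recorded */
	enum halt_cause cause;		/* why the hardware halted */
};

struct poll_result {
	unsigned int should_resume;
	unsigned int halted;
	unsigned int running;
	bool semihosting_handled;
};

/* Put both harts back into the "running, recorded as running" state. */
void reset_harts(struct hart harts[2])
{
	for (int i = 0; i < 2; i++) {
		harts[i].hw_state = HART_RUNNING;
		harts[i].recorded = HART_RUNNING;
		harts[i].cause = CAUSE_NONE;
	}
}

/* hart 0 hits a semihosting ebreak; hart 1 follows through the halt group. */
void hw_ebreak_on_hart0(struct hart harts[2])
{
	harts[0].hw_state = HART_HALTED;
	harts[0].cause = CAUSE_EBREAK;
	harts[1].hw_state = HART_HALTED;
	harts[1].cause = CAUSE_GROUP;
}

/* One poll pass. The ebreak fires just before hart `event_before` is
 * examined: 0 models case D, 1 models case C, 2 means no event. */
struct poll_result poll_once(struct hart harts[2], int event_before)
{
	assert(event_before >= 0 && event_before <= 2);
	struct poll_result r = {0, 0, 0, false};

	for (int i = 0; i < 2; i++) {
		if (i == event_before)
			hw_ebreak_on_hart0(harts);
		struct hart *h = &harts[i];
		if (h->recorded == HART_RUNNING && h->hw_state == HART_HALTED) {
			/* State change: the only place the cause is inspected. */
			h->recorded = HART_HALTED;
			if (h->cause == CAUSE_EBREAK) {
				r.semihosting_handled = true;
				r.should_resume++;
			} else {
				r.halted++;
			}
		} else if (h->recorded == HART_RUNNING) {
			r.running++;
		} else {
			r.halted++;
		}
	}
	if (r.should_resume == 0 && r.halted && r.running) {
		/* "halt all": recorded state is fixed up, cause is ignored. */
		for (int i = 0; i < 2; i++)
			harts[i].recorded = HART_HALTED;
	}
	return r;
}
```

Calling `poll_once(harts, 0)` on freshly reset harts gives the counters of D.4 with semihosting handled, while `poll_once(harts, 1)` gives the counters of C.4 and leaves the request unhandled. A further pass with no event sees no state change on core0, so the request stays unhandled in this model.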

zqb-all · 2024-11-30T04:38:41Z

Things are a bit complicated.
The software poll function takes time to run, and a hart may hit an ebreak (breakpoint or semihosting) at any moment. The other harts in the same SMP group may then be halted immediately (by a hardware haltgroup) or may keep running for a while (for example, when a hart is controlled by another DM and no hardware supports a synchronous halt).
We expect the poll function to eventually bring the whole SMP group to either halted or running. But before the software finishes processing, the harts that are still running may hit an ebreak as well, which makes the situation even more complicated.
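
One way to express the observation from the PR description ("if all current halted states are due to a halt group, then a new halt event has occurred") is a check like the one below. This is a hedged sketch only, with hypothetical names (`observed_hart`, `initiator_missed`); it is not the code proposed in this PR. It assumes that halt groups are actually in use and that the reported halt cause can be trusted, which are exactly the points the review above questions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified halt causes. */
enum halt_cause { CAUSE_EBREAK, CAUSE_TRIGGER, CAUSE_GROUP };

struct observed_hart {
	bool newly_halted;	/* changed from running to halted in this pass */
	bool recorded_running;	/* still recorded as running after this pass */
	enum halt_cause cause;	/* meaningful only if newly_halted is set */
};

/* Returns true when every newly halted hart reports the halt group as its
 * cause and at least one hart is still recorded as running. In that case
 * the hart that started the group halt has not been examined yet. */
bool initiator_missed(const struct observed_hart *harts, size_t n)
{
	assert(harts != NULL || n == 0);
	bool any_halted = false;
	bool any_running = false;

	for (size_t i = 0; i < n; i++) {
		if (harts[i].newly_halted) {
			any_halted = true;
			if (harts[i].cause != CAUSE_GROUP)
				return false;	/* the initiator was already seen */
		} else if (harts[i].recorded_running) {
			any_running = true;
		}
	}
	return any_halted && any_running;
}
```

When such a check returns true, a caller could examine the harts still recorded as running again before deciding to halt or resume the group, so that a pending semihosting request is not skipped.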

zqb-all · 2024-12-12T15:32:14Z

@JanMatCodasip @aap-sc Does my description help you understand what the issue is?

JanMatCodasip · 2024-12-17T10:46:58Z

@zqb-all Thank you for describing the situation in more detail. It will take me some time to get back to it.

target/riscv: Ensure to handle all triggered a halt events

21d836a

If all current halted states are due to a halt group, then a new "triggered a halt" event has occurred.

lz-bro force-pushed the handle-all-trigger-halt branch from 7652128 to 21d836a on November 20, 2024 12:34


