-
Notifications
You must be signed in to change notification settings - Fork 786
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dropped samples after system clock adjustments #5019
Comments
In my opinion, the stack functionality should be as independent of the system clock as possible. Otherwise, hard-to-diagnose errors will occur at random points in time (when the system clock jumps forward or backward), such as @ma30002000 pointed out above. From this point of view, the use of
Please comment on the five points presented above. |
Hi @ma30002000, |
Any indications and hints concerning the root cause (and a possible fix) would be highly appreciated.. |
Thank you for you patience here. I have rebased #5018 and added some commits that
These changes improve resilience against Regarding lifespan, section 2.2.3.16 of the DDS Standard states:
So there's not much we can do here. |
@ma30002000 @i-and FYI, we have just merged #5018 to master, and ordered backports to the supported versions (2.14.x, and 2.10.x) |
Hi @MiguelCompany, |
The proposed changes do not completely solve the issue. To confirm this, I wrote a test based on the example of hello_world and the revision of git 70314ce (Date: Fri Sep 6 08:48:14 2024 +0200). diff --git a/examples/cpp/hello_world/PublisherApp.cpp b/examples/cpp/hello_world/PublisherApp.cpp
index 63b433113..3bb9cc515 100644
--- a/examples/cpp/hello_world/PublisherApp.cpp
+++ b/examples/cpp/hello_world/PublisherApp.cpp
@@ -130,6 +130,16 @@ void PublisherApp::run()
{
while (!is_stopped() && ((samples_ == 0) || (hello_.index() < samples_)))
{
+ if (hello_.index() >= 5)
+ {
+ std::cout << "Index=" << hello_.index() << ". Paused forever..." << std::endl;
+ std::unique_lock<std::mutex> period_lock(mutex_);
+ cv_.wait(period_lock, [&]()
+ {
+ return is_stopped();
+ });
+ }
+
if (publish())
{
std::cout << "Message: '" << hello_.message() << "' with index: '" << hello_.index()
diff --git a/examples/cpp/hello_world/WaitsetSubscriberApp.cpp b/examples/cpp/hello_world/WaitsetSubscriberApp.cpp
index f039365b6..2581a83f6 100644
--- a/examples/cpp/hello_world/WaitsetSubscriberApp.cpp
+++ b/examples/cpp/hello_world/WaitsetSubscriberApp.cpp
@@ -21,6 +21,7 @@
#include <condition_variable>
#include <stdexcept>
+#include <thread>
#include <fastdds/dds/core/condition/GuardCondition.hpp>
#include <fastdds/dds/core/condition/WaitSet.hpp>
@@ -134,11 +135,11 @@ void WaitsetSubscriberApp::run()
reader_->get_subscription_matched_status(status_);
if (status_.current_count_change == 1)
{
- std::cout << "Waitset Subscriber matched." << std::endl;
+ std::cout << "Waitset Subscriber matched. Handle=" << status_.last_publication_handle << std::endl;
}
else if (status_.current_count_change == -1)
{
- std::cout << "Waitset Subscriber unmatched." << std::endl;
+ std::cout << "Waitset Subscriber unmatched." << status_.last_publication_handle << std::endl;
}
else
{
@@ -150,6 +151,12 @@ void WaitsetSubscriberApp::run()
if (changed_statuses.is_active(StatusMask::data_available()))
{
SampleInfo info;
+ if (received_samples_ == 1)
+ {
+ auto pause = std::chrono::milliseconds(60000);
+ std::cout << "When data available WAITING for " << pause.count() << std::endl;
+ std::this_thread::sleep_for(pause);
+ }
while ((!is_stopped()) &&
(RETCODE_OK == reader_->take_next_sample(&hello_, &info)))
{
@@ -158,7 +165,7 @@ void WaitsetSubscriberApp::run()
received_samples_++;
// Print Hello world message data
std::cout << "Message: '" << hello_.message() << "' with index: '"
- << hello_.index() << "' RECEIVED" << std::endl;
+ << hello_.index() << "' RECEIVED from " << info.publication_handle << std::endl;
if (samples_ > 0 && (received_samples_ >= samples_))
{
stop();
diff --git a/src/cpp/fastdds/subscriber/history/DataReaderHistory.cpp b/src/cpp/fastdds/subscriber/history/DataReaderHistory.cpp
index 0dcd18a45..f0bcfed2e 100644
--- a/src/cpp/fastdds/subscriber/history/DataReaderHistory.cpp
+++ b/src/cpp/fastdds/subscriber/history/DataReaderHistory.cpp
@@ -262,6 +262,7 @@ bool DataReaderHistory::received_change_keep_last(
DataReaderInstance::ChangeCollection& instance_changes = vit->second->cache_changes;
if (instance_changes.size() < static_cast<size_t>(history_qos_.depth))
{
+ std::cout << "history_qos_.depth=" << history_qos_.depth << std::endl;
ret_value = true;
}
else
@@ -270,11 +271,17 @@ bool DataReaderHistory::received_change_keep_last(
CacheChange_t* first_change = instance_changes.at(0);
if (a_change->sourceTimestamp >= first_change->sourceTimestamp)
{
+ std::cout << "As the instance is ordered by source timestamp, we can always remove the first one. "
+ "history_qos_.depth=" << history_qos_.depth << ", a_change->sourceTimestamp=" << a_change->sourceTimestamp <<
+ ", first_change->sourceTimestamp=" << first_change->sourceTimestamp << std::endl;
// As the instance is ordered by source timestamp, we can always remove the first one.
ret_value = remove_change_sub(first_change);
}
else
{
+ std::cout << "Received change is older than oldest, and should be discarded. "
+ "history_qos_.depth=" << history_qos_.depth << ", a_change->sourceTimestamp=" << a_change->sourceTimestamp <<
+ ", first_change->sourceTimestamp=" << first_change->sourceTimestamp << std::endl;
// Received change is older than oldest, and should be discarded
return true;
} After compiling the hello_world example, open three terminals and run the following in each:
Subscriber running. Please press Ctrl+C to stop the Subscriber at any time.
Waitset Subscriber matched. Handle=1.f.f7.f3.ed.58.f4.9f.0.0.0.0.0.0.1.3
history_qos_.depth=1
Message: 'Hello world' with index: '1' RECEIVED from 1.f.f7.f3.ed.58.f4.9f.0.0.0.0.0.0.1.3
history_qos_.depth=1
When data available WAITING for 60000
As the instance is ordered by source timestamp, we can always remove the first one. history_qos_.depth=1, a_change->sourceTimestamp=1725831240.997453690, first_change->sourceTimestamp=1725831240.896825205
As the instance is ordered by source timestamp, we can always remove the first one. history_qos_.depth=1, a_change->sourceTimestamp=1725831241.97925046, first_change->sourceTimestamp=1725831240.997453690
As the instance is ordered by source timestamp, we can always remove the first one. history_qos_.depth=1, a_change->sourceTimestamp=1725831241.198400104, first_change->sourceTimestamp=1725831241.97925046
Received change is older than oldest, and should be discarded. history_qos_.depth=1, a_change->sourceTimestamp=1725829275.63839469, first_change->sourceTimestamp=1725831241.198400104
Received change is older than oldest, and should be discarded. history_qos_.depth=1, a_change->sourceTimestamp=1725829275.164280786, first_change->sourceTimestamp=1725831241.198400104
Received change is older than oldest, and should be discarded. history_qos_.depth=1, a_change->sourceTimestamp=1725829275.264997918, first_change->sourceTimestamp=1725831241.198400104
Received change is older than oldest, and should be discarded. history_qos_.depth=1, a_change->sourceTimestamp=1725829275.365657331, first_change->sourceTimestamp=1725831241.198400104
Received change is older than oldest, and should be discarded. history_qos_.depth=1, a_change->sourceTimestamp=1725829275.466348101, first_change->sourceTimestamp=1725831241.198400104
Message: 'Hello world' with index: '5' RECEIVED from 1.f.f7.f3.ed.58.f4.9f.0.0.0.0.0.0.1.3
Waitset Subscriber matched. Handle=1.f.f7.f3.b.59.9d.7d.0.0.0.0.0.0.1.3 Publisher running. Please press Ctrl+C to stop the Publisher at any time.
Publisher matched.
Message: 'Hello world' with index: '1' SENT
Message: 'Hello world' with index: '2' SENT
Message: 'Hello world' with index: '3' SENT
Message: 'Hello world' with index: '4' SENT
Message: 'Hello world' with index: '5' SENT
Index=5. Paused forever... Publisher running. Please press Ctrl+C to stop the Publisher at any time.
Publisher matched.
Message: 'Hello world' with index: '1' SENT
Message: 'Hello world' with index: '2' SENT
Message: 'Hello world' with index: '3' SENT
Message: 'Hello world' with index: '4' SENT
Message: 'Hello world' with index: '5' SENT
Index=5. Paused forever... As you can see from terminal 1, all samples from the second publisher were discarded in the method About the new implementation of the function Fast-DDS/src/cpp/utils/time_t_helpers.hpp Lines 37 to 57 in 70314ce
I suggest not to do this and return to its previous implementation for the following reasons:
Based on the above, I propose to correct the situation with the system clock where they are used and at the same time use the standard system clock (without freezing at the time of FastDDS launch). Section 2.2.3.16 of the DDS Standard contains the following: In order for the Service (FastDDS) to detect the lack of synchronization, for example, you can implement an additional parameter set by the user. At the same time, the user would set this parameter to the "do not use a source timestamp" state in the case when he knew for sure that the time in the system was not synchronized. This would probably be a compromise solution that would not contradict the provision of the DDS standard. |
@i-and Thank you for testing. First, let me remark one of my comments above
So yes, the merged PR was only addressing the case of updating the system clock after the application has been launched.
We will try to review the implementation of the reader history to improve this situation, but in the mean time I suggest you use |
@MiguelCompany Thanks for the quick response.
When rework the reader's history, please take into account the actual state of |
Hi, sorry for the long delay.... I also do agree on reverting the implementation of A proper implementation of I helped myself with the following patch, allowing system_clock jumps on single host only usage of fast dds diff --git a/src/cpp/fastdds/subscriber/history/DataReaderHistory.cpp b/src/cpp/fastdds/subscriber/history/DataReaderHistory.cpp
index d8455e497..94ae73ad2 100644
--- a/src/cpp/fastdds/subscriber/history/DataReaderHistory.cpp
+++ b/src/cpp/fastdds/subscriber/history/DataReaderHistory.cpp
@@ -274,7 +274,9 @@ bool DataReaderHistory::received_change_keep_last(
{
// Try to substitute the oldest sample.
CacheChange_t* first_change = instance_changes.at(0);
- if (a_change->sourceTimestamp >= first_change->sourceTimestamp)
+ if (a_change->writerGUID == first_change->writerGUID ?
+ a_change->sequenceNumber >= first_change->sequenceNumber :
+ a_change->sourceTimestamp >= first_change->sourceTimestamp)
{
// As the instance is ordered by source timestamp, we can always remove the first one.
ret_value = remove_change_sub(first_change);
@@ -808,7 +810,9 @@ bool DataReaderHistory::completed_change_keep_last(
{
// Try to substitute the oldest sample.
CacheChange_t* first_change = instance_changes.at(0);
- if (change->sourceTimestamp >= first_change->sourceTimestamp)
+ if (change->writerGUID == first_change->writerGUID ?
+ change->sequenceNumber >= first_change->sequenceNumber :
+ change->sourceTimestamp >= first_change->sourceTimestamp)
{
// As the instance is ordered by source timestamp, we can always remove the first one.
ret_value = remove_change_sub(first_change); This should make sample substitution consistent with the history ordering in my opinion. |
@ma30002000 Thank you very much for that diff! |
@MiguelCompany Yes, the attached patch (together with the changes in #5018 + patching of high_resolution_clock) fixed the issue in my specific use case (single host system). |
Is there an already existing issue for this?
Expected behavior
When dealing with system clock adjustments (manually or due to clock server synchronization), my subscriber drops all samples once the system has been set back into the past.
This seems to be due to DataReaderHistory::received_change_keep_last / DataReaderHistory::completed_change_keep_last comparing all the incoming samples' sourceTimestamp (which will be a timestamp in the past once the clock has been changed) to the first change's, leading to all samples to be dropped:
If I remove the if and simply drop the first sample everything seems to work unaffected from the system clock adjustments.
Note that I would expect the current fast dds behaviour if DestinationOrderQosPolicy were implemented (which it is not) and set to BY_SOURCE_TIMESTAMP_DESTINATIONORDER_QOS. However, according to the manual should be BY_RECEPTION_TIMESTAMP_DESTINATIONORDER_QOS .
Note that I have created a pull request for additional observations when the system clock is adjusted (PR #5018).
This might be related to #4850.
Current behavior
Samples get dropped when publisher's system clock is set into the past.
Steps to reproduce
Set back system clock after disabling automatic synchronization via NTP.
Fast DDS version/commit
2.14.2
Platform/Architecture
Ubuntu Focal 20.04 amd64
Transport layer
Shared Memory Transport (SHM)
Additional context
No response
XML configuration file
No response
Relevant log output
No response
Network traffic capture
No response
The text was updated successfully, but these errors were encountered: