Evaluation of synchronization edge cases #53

bruno-f-cruz · 2024-08-31T21:07:03Z

Maintainer

This discussion relates to #48.

Definitions

Let:

S -> Subordinate device (the one that receives the synchronization pulse)
G -> Clock generator device
SE -> Synchronization pulse/event, the sequence of bits sent from G to S every second to allow S to synchronize
Heartbeat -> the periodic event sent by the board roughly every second

Discussion points

Immediate vs scheduled (deferred) synchronization

If I am not mistaken, the two cores use slightly different approaches.
The RP2040 core schedules a predicted heartbeat 1 second into the future using the most recent synch event. Since the next second is always assumed to be "correct" and any correction is applied when scheduling the following second, this prevents double hits. (E.g. if S runs faster than G, S could emit 2 heartbeat events back to back: it crosses the full second on its own clock and is then brought back by SE.)

The AtMega core seems to force the synchronization immediately. In theory, this allows for tighter synchronization at the cost of potential double hits.
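To make the double-hit argument concrete, here is a toy event model. It is a sketch, not either core's firmware. Assumptions: S's oscillator runs at a constant `ppm` offset from G, SE arrives `se_offset_us` relative to G's full second (negative = before the second) with zero transmission latency, and all names below are hypothetical. With immediate correction, a fast S that has already crossed the second is pulled back below it and emits that second again; with scheduling, each SE schedules exactly one upcoming second.

```python
US_PER_S = 1_000_000


def immediate_heartbeats(ppm: float, se_offset_us: int, seconds: int):
    """Toy model: S snaps its clock to G's time as soon as each SE arrives.

    Returns (second_label, true_time_us) for every heartbeat S emits.
    """
    rate = 1 + ppm * 1e-6          # S's clock speed relative to G
    t, local = 0.0, 0.0            # G's (true) time and S's local time, in us
    next_second = US_PER_S         # next whole second on S's local clock
    beats = []
    for k in range(1, seconds + 1):
        t_se = k * US_PER_S + se_offset_us
        # Emit every whole second S's clock reaches before this SE arrives.
        while True:
            t_hit = t + (next_second - local) / rate
            if t_hit > t_se:
                break
            beats.append((int(next_second // US_PER_S), t_hit))
            next_second += US_PER_S
        # Immediate synchronization: jump S's clock to G's time.
        t, local = float(t_se), float(t_se)
        # If the jump went backwards across a second, that second fires again.
        next_second = (local // US_PER_S + 1) * US_PER_S
    return beats


def scheduled_heartbeats(ppm: float, se_offset_us: int, seconds: int):
    """Toy model: each SE schedules exactly one heartbeat at G's next second,
    timed with S's own (drifting) timer."""
    rate = 1 + ppm * 1e-6
    beats = []
    for k in range(1, seconds + 1):
        t_se = k * US_PER_S + se_offset_us
        target = (t_se // US_PER_S + 1) * US_PER_S   # G's next whole second
        beats.append((int(target // US_PER_S), t_se + (target - t_se) / rate))
    return beats


if __name__ == "__main__":
    print([s for s, _ in immediate_heartbeats(100, -50, 3)])  # [1, 1, 2, 2, 3]
    print([s for s, _ in scheduled_heartbeats(100, -50, 3)])  # [1, 2, 3]
```

A double hit only appears when S has drifted past the second before SE lands, so the example uses a 100 ppm oscillator and an SE arriving 50 us before the second; both numbers are illustrative, not measured.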

Drift per second as part of the spec.

Should we enforce a maximum drift per second?
How should it be benchmarked?

How to handle edge cases of "going back in time"?

Do you think it should be allowed?
Should we have an arbitrary threshold that separates small out-of-synch events (e.g. < drift per second) from larger ones (e.g. when forcing a new timestamp in G)?
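One possible way to encode the threshold question is to classify each correction by its size. A minimal sketch, assuming the correction is measured as G's time minus S's time when SE arrives, and that the threshold would be whatever maximum drift per second the spec ends up allowing (the 100 us used below is a placeholder; `Correction` and `classify_correction` are hypothetical names):

```python
from enum import Enum


class Correction(Enum):
    NONE = "none"
    SLEW = "slew"                  # small error: absorbed into the next scheduled second
    STEP_FORWARD = "step_forward"  # large jump ahead (S was far behind G)
    STEP_BACK = "step_back"        # large jump back: timestamps go "back in time"


def classify_correction(error_us: int, threshold_us: int) -> Correction:
    """error_us = G's time minus S's time at the SE (positive: S is behind)."""
    if error_us == 0:
        return Correction.NONE
    if abs(error_us) <= threshold_us:
        return Correction.SLEW
    return Correction.STEP_FORWARD if error_us > 0 else Correction.STEP_BACK


print(classify_correction(-40, 100))         # Correction.SLEW
print(classify_correction(-5_000_000, 100))  # Correction.STEP_BACK
```

Under this policy, small errors would be absorbed into the next scheduled second, while `STEP_BACK` would be the explicit "going back in time" case (e.g. G being forced to a new timestamp) that a device could flag instead of silently emitting non-monotonic timestamps.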

bruno-f-cruz · 2024-09-03T23:35:03Z

Maintainer Author

After collecting some data with @Poofjunior, we also came across an interesting delay in clock synchronization events between G and S.
The experiment goes as follows:

Probe the heartbeat LED of a Gen 3 timestamp generator. Remember that this LED should toggle on each full second.
Probe the heartbeat LED of a behavior board. This LED also toggles on the full second.
It follows that, if the two boards are perfectly synchronized, the two LEDs should toggle at the same time.

What we see is that the Harp behavior LED turns on 170us after the generator's (we repeated the test on two different boards). I should point out that, as I stressed in a previous meeting, this has never been a problem before because clock generators have never had any functionality beyond the synch event itself. However, if one were to add a digital input to the clock board and record events from both the clock and the behavior board simultaneously, this delay would become apparent.

This is very interesting because, prior to these tests, we acquired some data to verify whether the behavior board and a Pico core board were synchronized (see below) and came across an unexplained gap of 215us. This may not be because the Pico core is not synchronized to the generator, but instead because the atmega core devices are not perfectly synchronized to the atxmega generator.

If one takes these differences into consideration, the remaining difference is much closer to the expected 32us jitter (i.e. 215 - 170 = 45us).

I will add the same oscilloscope test, run between the clock generator and a Pico device, later this week.

As a final thought, I think we should revisit the synchronization algorithm, make sure we are all in agreement on it, and make sure it supports new core implementations.


bruno-f-cruz · 2024-09-04T15:07:02Z

Maintainer Author

Another option is to calibrate each subordinate device using R_TIMESTAMP_OFFSET, which already exists in the core.
I don't like this solution: it seems like a bit of a hack, not to mention that it would not work "within" an architecture, since clock generators would have a different delay than subordinates.

1 reply

bruno-f-cruz Sep 4, 2024
Maintainer Author

On second thought, this wouldn't work, as the register spec doesn't even have enough expressivity to solve the issue. The current spec is a U8, which allows neither negative values nor offsets larger than 255 (or 127 if signed).
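A quick way to see the expressivity problem is to try packing the offsets discussed in this thread into an 8-bit field with Python's `struct` module. This sketch assumes, purely for illustration, that the offset would be expressed in microseconds; the register's actual units are not what is being checked here, only the sign and range of an 8-bit field.

```python
import struct


def encodable(offset: int, fmt: str) -> bool:
    """True if `offset` fits the given struct format ("<B" = U8, "<b" = I8)."""
    try:
        struct.pack(fmt, offset)
        return True
    except struct.error:
        return False


# Approximate offsets discussed in this thread, treated as microseconds.
for offset_us in (170, -66, 225):
    print(offset_us, "U8:", encodable(offset_us, "<B"), "I8:", encodable(offset_us, "<b"))
```

Neither format covers both signs at this magnitude: U8 rejects -66 and I8 rejects 170 and 225.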

bruno-f-cruz · 2024-09-10T16:00:41Z

Maintainer Author

For completeness' sake, here's the same experiment repeated with the RP2040 core (remember this is done via scheduling, hence the negative difference).

It sits at around 66us, which starts to explain the full 215us difference between the two boards.
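The bookkeeping can be written out explicitly. This sketch assumes a sign convention where positive means the heartbeat lags G's full second, uses the approximate values reported in this thread, and treats the 32us jitter mentioned earlier as the tolerance:

```python
# Heartbeat offset of each core relative to G's full second, in us.
# Positive = lags G (sign convention assumed for this sketch).
OFFSETS_US = {"atxmega": 170, "rp2040": -66}
MEASURED_GAP_US = 215   # behavior board vs Pico core board, measured earlier
JITTER_US = 32


def predicted_gap(a: str, b: str) -> int:
    """Expected offset between two devices if their offsets to G simply add up."""
    return OFFSETS_US[a] - OFFSETS_US[b]


gap = predicted_gap("atxmega", "rp2040")
residual = gap - MEASURED_GAP_US
print(gap, residual, abs(residual) <= JITTER_US)   # 236 21 True
```

Under these assumptions the two per-core offsets account for the measured gap to within the timer jitter.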


bruno-f-cruz · 2024-09-10T23:40:11Z

Maintainer Author

Additional details/experiments

Heartbeats in the Atxmega core do not respect the synchronization standard

Brown = clock synchronizer input
Red = Behavior board heartbeat led
White = White rabbit output triggered on the heartbeat

If everything were working as expected, one would expect the heartbeat callbacks to be as close as possible to the 672us specified by the protocol. While this holds for the white signal, the behavior board heartbeat appears to lag by 225us (perfectly matching what we saw between the Harp behavior and cuttlefish boards above).

The heartbeat LED may not be a good way to benchmark synchrony.

We thought that maybe the LED is not a perfect way to validate this approach. Indeed, if you look at when the LED of a clock generator blinks relative to ITS OWN generated clock synch signal (red against brown traces), we still see an unexpected 722 - 672 = 50us lag.

It seems the atxmega core is 170us off

Combining the previous two plots with the one shown here (#53 (comment)), and removing the contribution of the 50us LED delay (this assumes that the delay to turn on the LED on top of the full second is shared across all atxmega devices), we end up with roughly 170us. Given the current spec, I believe this should be considered a bug and patched as soon as possible to ensure interoperability across all devices.

Moreover, we should write down somewhere what we expect to be validated for a new core. I believe this should only be possible if the new core has a way to guarantee that a heartbeat callback materializes with very tight timing, but I am open to other suggestions. This way we could benchmark it against the generated clock signal rather than against an already existing device.
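The correction described above amounts to subtracting both the spec delay and the LED latency from the measured LED delay. A sketch, assuming (as above) that the 50us LED latency measured on the clock generator applies to every atxmega device; the 897us input is simply 672 + 225 reconstructed from the reported lag, and `core_offset_us` is a hypothetical helper:

```python
SPEC_DELAY_US = 672   # expected sync-input-to-heartbeat delay quoted above


def core_offset_us(led_delay_us: float, led_latency_us: float) -> float:
    """Heartbeat error attributable to the core once LED latency is removed."""
    return led_delay_us - SPEC_DELAY_US - led_latency_us


# LED latency: the generator's LED blinks 722 us after its own sync signal.
led_latency = 722 - SPEC_DELAY_US                  # 50
behavior_offset = core_offset_us(672 + 225, led_latency)
print(led_latency, behavior_offset)                # 50 175
```

The resulting 175us is the "roughly 170us" figure; its validity rests entirely on the shared-LED-latency assumption.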
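As a starting point for that validation, a benchmark could pair each edge of the generated clock signal with the new core's heartbeat callback edge and check the error against the expected delay. A sketch, assuming edge times exported from a scope or logic analyzer in microseconds, the 672us delay quoted above as the target, and a hypothetical 32us tolerance (the timer resolution mentioned in this thread); `benchmark_core` and `BenchmarkResult` are illustrative names:

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class BenchmarkResult:
    mean_us: float
    std_us: float
    min_us: float
    max_us: float
    passed: bool


def benchmark_core(sync_edges_us, heartbeat_edges_us,
                   expected_delay_us=672.0, tolerance_us=32.0):
    """Compare each heartbeat edge with the sync edge that should trigger it."""
    if not sync_edges_us or len(sync_edges_us) != len(heartbeat_edges_us):
        raise ValueError("need exactly one heartbeat edge per sync edge")
    # Error = observed delay minus the delay the protocol expects.
    errors = [hb - se - expected_delay_us
              for se, hb in zip(sync_edges_us, heartbeat_edges_us)]
    return BenchmarkResult(mean(errors), pstdev(errors), min(errors), max(errors),
                           all(abs(e) <= tolerance_us for e in errors))


result = benchmark_core([0, 1_000_000, 2_000_000], [682, 1_000_667, 2_000_692])
print(result.passed, result.min_us, result.max_us)   # True -5.0 20.0
```

Reporting the worst-case error alongside the mean matters here: a core whose mean is fine but which occasionally misses the window would still break interoperability.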

1 reply

bruno-f-cruz Sep 11, 2024
Maintainer Author

It is still weird that, if the delay is 170us, the difference between boards is 225us. Is digital input timestamping also affected by the 50us that affects the heartbeat LED?

Alternatively, it could be that the difference really is 225 and even the clock generator itself is not properly aligned to its own synch out.

filcarv · 2024-11-29T22:49:21Z

Maintainer

The atxmega delay of 208 +/- 16 us was fixed in this commit.

The current deviation is now a delay (A) relative to the timestamp generator of 22 +/- 16 us or, in other words, between 6 and 38 us, which seems perfectly acceptable.

The oscilloscope photo shows the delay recorded over 2 hours.
The blue falling edge is the second elapsing on the timestamp generator and the yellow falling edge is the second elapsing on the timestamp listener.
The photo shows that the drift stays within the 32 us precision of the timer used.

We can do better, namely a delay (B) between -6 and 29 us. I've used instruments to bring the timestamp listener's crystal to 0 and 80 degrees, to make the clock drift in both directions and to see significant drift. The test shows that the implementation of delay (B) is robust but very, very close to the edges, so I've preferred the implementation of delay (A).

1 reply

bruno-f-cruz Dec 12, 2024
Maintainer Author

Thanks for looking into this! Let's move this discussion to issue #62.
