-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fw_update: implement retry with backoff #703
Conversation
Visit the preview URL for this PR (updated for commit f18f360): https://golioth-firmware-sdk-doxygen-dev--pr703-szczys-fw-upda-r3e0ovzc.web.app (expires Fri, 20 Dec 2024 21:36:49 GMT) 🔥 via Firebase Hosting GitHub Action 🌎 Sign: a9993e61697a3983f3479e468bcb0b616f9a0578 |
0e88b3b
to
051546f
Compare
1d412b1
to
16fbd8c
Compare
051546f
to
8507b7b
Compare
16fbd8c
to
5144bca
Compare
8507b7b
to
31d5b66
Compare
Codecov ReportAttention: Patch coverage is
|
31d5b66
to
505a7d0
Compare
be31140
to
c60c269
Compare
ec86659
to
288f72d
Compare
c82b93e
to
867e299
Compare
288f72d
to
83faa32
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work @szczys! I left one comment about rollover. We should think about it holistically across all the places where we're making comparisons or performing math using the result of golioth_sys_now_ms()
so I don't think that should block merging this PR.
src/fw_update.c
Outdated
|
||
static int32_t backoff_ms_before_expiration(struct fw_update_component_context *ctx) | ||
{ | ||
uint64_t actual_duration = golioth_sys_now_ms() - ctx->last_fail_ts; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may need to consider overflow on FreeRTOS. The API for golioth_sys_now_ms()
returns a uint64_t, but it's calculated from the tick count which is a uint32_t (usually, technically it can be configured). Depending on the tick period this could rollover in as little as 49 days.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote this with rollover in mind. Though I did not realize we don't have access to the full bit-width of the type being return. Since I'm returning an int32 for the duration, why don't I just mask the uint64 tick count to uint32 before calculation? I think it will perform the same way.
867e299
to
7ec533d
Compare
83faa32
to
5888e01
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good @szcysy! Appreciate the extra attention to detail around rollovers 🙏
Record the timestamp at which a download fails and the desired backoff duration in the component context. Add functions to set these variables after each failure, check them to see if the backoff duration has expired, and to reset them at the beginning and end of a component download operation. Stored and calculated values for backoff timing use uint32_t. While golioth_sys_now_ms() returns uint64_t, not all platforms (ESP-IDF) support that full bit-width. Signed-off-by: Mike Szczys <[email protected]>
Move the mutex lock/unlock functions inside of received_new_target_component() as it should never be called without guarding the manifest pointer which is also updated on another thread by the on_ota_manifest callback. Check stored target component against what was received with manifest and use along with the backoff_duration_ms time value in the context to detect when a component is already being downloaded. Reset the backoff values in the context when a new component is received from the manifest. Signed-off-by: Mike Szczys <[email protected]>
When awaiting a retry, use the timeout value of the manifest semaphore to wait until the appropriate backoff duration has passed before retrying a download. This allows a new manifest to be received and acted upon while awaiting a retry. Set up the component for a retry as soon as the download step has finished (whether it is a success or failure). The retry continues to be in effect until all validation and other steps of the fw_update process have been confirmed. Finally, reset the backoff values to indicate a retry is not necessary. Signed-off-by: Mike Szczys <[email protected]>
Use the new GOLIOTH_OTA_REASON_AWAIT_RETRY reason when reporting IDLE state during retry backoff period. Signed-off-by: Mike Szczys <[email protected]>
5888e01
to
f18f360
Compare
Resolves https://github.com/golioth/firmware-issue-tracker/issues/699