Message ID | 20200928234945.3417905-1-jacob.e.keller@intel.com |
---|---|
State | New |
Headers | show |
Series | [RFC,iproute2-next] devlink: display elapsed time during flash update | expand |
On Mon, 28 Sep 2020 16:49:45 -0700 Jacob Keller wrote: > For some devices, updating the flash can take significant time during > operations where no status can meaningfully be reported. This can be > somewhat confusing to a user who sees devlink appear to hang on the > terminal waiting for the device to update. > > Provide a ticking counter of the time elapsed since the previous status > message in order to make it clear that the program is not simply stuck. > > Do not display this message unless a few seconds have passed since the > last status update. Additionally, if the previous status notification > included a timeout, display this as part of the message. If we do not > receive an error or a new status without that time out, replace it with > the text "timeout reached". > > Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> > --- > Sending this as an RFC because I doubt this is the best implementation. For > one, I get a weird display issue where the cursor doesn't always end up on > the end of line in my shell.. The % display works properly, so I'm not sure > what's wrong here. > > Second, even though select should be timing out every 1/10th of a second for > screen updates, I don't seem to get that behavior in my test. It takes about > 8 to 10 seconds for the first elapsed time message to be displayed, and it > updates really slowly. Is select just not that precise? I even tried using a > timeout of zero, but this means we refresh way too often and it looks bad. I > am not sure what is wrong here... Strange. Did you strace it? Perhaps it's some form of output buffering? > diff --git a/devlink/devlink.c b/devlink/devlink.c > index 0374175eda3d..7fb4b5ef1ebe 100644 > --- a/devlink/devlink.c > +++ b/devlink/devlink.c > @@ -33,6 +33,7 @@ > #include <sys/select.h> > #include <sys/socket.h> > #include <sys/types.h> > +#include <sys/time.h> > #include <rt_names.h> > > #include "version.h" > @@ -3066,6 +3067,9 @@ static int cmd_dev_info(struct dl *dl) > > struct cmd_dev_flash_status_ctx { > struct dl *dl; > + struct timeval last_status_msg; > + char timeout_msg[128]; Really you just need the length (as returned by snprintf), right? > + uint64_t timeout; > char *last_msg; > char *last_component; > uint8_t not_first:1, > @@ -3083,6 +3087,14 @@ static int nullstrcmp(const char *str1, const char *str2) > return str1 ? 1 : -1; > } > > +static void cmd_dev_flash_clear_elapsed_time(struct cmd_dev_flash_status_ctx *ctx) > +{ > + int i; > + > + for (i = 0; i < strlen(ctx->timeout_msg); i++) > + pr_out_tty("\b"); I wonder if it's not easier to \r, I guess the existing code likes \b so be it. > +} > + > static int cmd_dev_flash_status_cb(const struct nlmsghdr *nlh, void *data) > { > struct cmd_dev_flash_status_ctx *ctx = data; > +static void cmd_dev_flash_time_elapsed(struct cmd_dev_flash_status_ctx *ctx) > +{ > + struct timeval now, res; > + > + gettimeofday(&now, NULL); > + timersub(&now, &ctx->last_status_msg, &res); > + > + /* Don't start displaying a timeout message until we've elapsed a few > + * seconds... > + */ > + if (res.tv_sec > 3) { > + uint elapsed_m, elapsed_s; This may be the first uint use in iproute2.. > + /* clear the last elapsed time message, if we have one */ > + cmd_dev_flash_clear_elapsed_time(ctx); > + > + elapsed_m = res.tv_sec / 60; > + elapsed_s = res.tv_sec % 60;
On 9/29/2020 10:18 AM, Jakub Kicinski wrote: > On Mon, 28 Sep 2020 16:49:45 -0700 Jacob Keller wrote: >> For some devices, updating the flash can take significant time during >> operations where no status can meaningfully be reported. This can be >> somewhat confusing to a user who sees devlink appear to hang on the >> terminal waiting for the device to update. >> >> Provide a ticking counter of the time elapsed since the previous status >> message in order to make it clear that the program is not simply stuck. >> >> Do not display this message unless a few seconds have passed since the >> last status update. Additionally, if the previous status notification >> included a timeout, display this as part of the message. If we do not >> receive an error or a new status without that time out, replace it with >> the text "timeout reached". >> >> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> >> --- >> Sending this as an RFC because I doubt this is the best implementation. For >> one, I get a weird display issue where the cursor doesn't always end up on >> the end of line in my shell.. The % display works properly, so I'm not sure >> what's wrong here. >> >> Second, even though select should be timing out every 1/10th of a second for >> screen updates, I don't seem to get that behavior in my test. It takes about >> 8 to 10 seconds for the first elapsed time message to be displayed, and it >> updates really slowly. Is select just not that precise? I even tried using a >> timeout of zero, but this means we refresh way too often and it looks bad. I >> am not sure what is wrong here... > > Strange. Did you strace it? Perhaps it's some form of output buffering? > Haven't yet, just noticed the weird output behavior and timing inconsistency. >> diff --git a/devlink/devlink.c b/devlink/devlink.c >> index 0374175eda3d..7fb4b5ef1ebe 100644 >> --- a/devlink/devlink.c >> +++ b/devlink/devlink.c >> @@ -33,6 +33,7 @@ >> #include <sys/select.h> >> #include <sys/socket.h> >> #include <sys/types.h> >> +#include <sys/time.h> >> #include <rt_names.h> >> >> #include "version.h" >> @@ -3066,6 +3067,9 @@ static int cmd_dev_info(struct dl *dl) >> >> struct cmd_dev_flash_status_ctx { >> struct dl *dl; >> + struct timeval last_status_msg; >> + char timeout_msg[128]; > > Really you just need the length (as returned by snprintf), right? Hmm, yea probably don't need the full message. > >> + uint64_t timeout; >> char *last_msg; >> char *last_component; >> uint8_t not_first:1, >> @@ -3083,6 +3087,14 @@ static int nullstrcmp(const char *str1, const char *str2) >> return str1 ? 1 : -1; >> } >> >> +static void cmd_dev_flash_clear_elapsed_time(struct cmd_dev_flash_status_ctx *ctx) >> +{ >> + int i; >> + >> + for (i = 0; i < strlen(ctx->timeout_msg); i++) >> + pr_out_tty("\b"); > > I wonder if it's not easier to \r, I guess the existing code likes \b > so be it. > If we '\r', we'd have to re-write the whole line, right? Hmm. I'm also thinking that perhaps part of the display issue is that these "\b" outputs aren't being sent as one unit so some sort of buffering results in the inconsistent display. Note, the text does fully display properly, it just doesn't seem to update the cursor location correctly, resulting in the cursor position bouncing around rapidly and being very distracting. >> +} >> + >> static int cmd_dev_flash_status_cb(const struct nlmsghdr *nlh, void *data) >> { >> struct cmd_dev_flash_status_ctx *ctx = data; > >> +static void cmd_dev_flash_time_elapsed(struct cmd_dev_flash_status_ctx *ctx) >> +{ >> + struct timeval now, res; >> + >> + gettimeofday(&now, NULL); >> + timersub(&now, &ctx->last_status_msg, &res); >> + >> + /* Don't start displaying a timeout message until we've elapsed a few >> + * seconds... >> + */ >> + if (res.tv_sec > 3) { >> + uint elapsed_m, elapsed_s; > > This may be the first uint use in iproute2.. >> + /* clear the last elapsed time message, if we have one */ >> + cmd_dev_flash_clear_elapsed_time(ctx); >> + >> + elapsed_m = res.tv_sec / 60; >> + elapsed_s = res.tv_sec % 60;
On Tue, Sep 29, 2020 at 10:56:23AM -0700, Jacob Keller wrote: > > > On 9/29/2020 10:18 AM, Jakub Kicinski wrote: > > On Mon, 28 Sep 2020 16:49:45 -0700 Jacob Keller wrote: > >> For some devices, updating the flash can take significant time during > >> operations where no status can meaningfully be reported. This can be > >> somewhat confusing to a user who sees devlink appear to hang on the > >> terminal waiting for the device to update. > >> > >> Provide a ticking counter of the time elapsed since the previous status > >> message in order to make it clear that the program is not simply stuck. > >> > >> Do not display this message unless a few seconds have passed since the > >> last status update. Additionally, if the previous status notification > >> included a timeout, display this as part of the message. If we do not > >> receive an error or a new status without that time out, replace it with > >> the text "timeout reached". > >> > >> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> > >> --- > >> Sending this as an RFC because I doubt this is the best implementation. For > >> one, I get a weird display issue where the cursor doesn't always end up on > >> the end of line in my shell.. The % display works properly, so I'm not sure > >> what's wrong here. > >> > >> Second, even though select should be timing out every 1/10th of a second for > >> screen updates, I don't seem to get that behavior in my test. It takes about > >> 8 to 10 seconds for the first elapsed time message to be displayed, and it > >> updates really slowly. Is select just not that precise? I even tried using a > >> timeout of zero, but this means we refresh way too often and it looks bad. I > >> am not sure what is wrong here... > > > > Strange. Did you strace it? Perhaps it's some form of output buffering? > > > > Haven't yet, just noticed the weird output behavior and timing > inconsistency. Might be similar to this: https://git.kernel.org/pub/scm/network/iproute2/iproute2.git/commit/?id=8e6bce735a132150c23503a55ea0aef55a01425f
On 9/29/20 10:56 AM, Jacob Keller wrote: > On 9/29/2020 10:18 AM, Jakub Kicinski wrote: >> On Mon, 28 Sep 2020 16:49:45 -0700 Jacob Keller wrote: >>> For some devices, updating the flash can take significant time during >>> operations where no status can meaningfully be reported. This can be >>> somewhat confusing to a user who sees devlink appear to hang on the >>> terminal waiting for the device to update. >>> >>> Provide a ticking counter of the time elapsed since the previous status >>> message in order to make it clear that the program is not simply stuck. >>> >>> Do not display this message unless a few seconds have passed since the >>> last status update. Additionally, if the previous status notification >>> included a timeout, display this as part of the message. If we do not >>> receive an error or a new status without that time out, replace it with >>> the text "timeout reached". >>> >>> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> >>> --- >>> Sending this as an RFC because I doubt this is the best implementation. For >>> one, I get a weird display issue where the cursor doesn't always end up on >>> the end of line in my shell.. The % display works properly, so I'm not sure >>> what's wrong here. >>> >>> Second, even though select should be timing out every 1/10th of a second for >>> screen updates, I don't seem to get that behavior in my test. It takes about >>> 8 to 10 seconds for the first elapsed time message to be displayed, and it >>> updates really slowly. Is select just not that precise? I even tried using a >>> timeout of zero, but this means we refresh way too often and it looks bad. I >>> am not sure what is wrong here... >> Strange. Did you strace it? Perhaps it's some form of output buffering? >> > Haven't yet, just noticed the weird output behavior and timing > inconsistency. Add a fflush(stdout) at the end of pr_out_tty()? Or maybe simply disable stdout buffering with setvbuf(). >>> diff --git a/devlink/devlink.c b/devlink/devlink.c >>> index 0374175eda3d..7fb4b5ef1ebe 100644 >>> --- a/devlink/devlink.c >>> +++ b/devlink/devlink.c >>> @@ -33,6 +33,7 @@ >>> #include <sys/select.h> >>> #include <sys/socket.h> >>> #include <sys/types.h> >>> +#include <sys/time.h> >>> #include <rt_names.h> >>> >>> #include "version.h" >>> @@ -3066,6 +3067,9 @@ static int cmd_dev_info(struct dl *dl) >>> >>> struct cmd_dev_flash_status_ctx { >>> struct dl *dl; >>> + struct timeval last_status_msg; Having some 'time' concept in this name will make it less likely for the reader to confuse it with something that holds a text message - 't_', or '_secs' or something. sln
On 9/29/2020 11:07 AM, Ido Schimmel wrote: > On Tue, Sep 29, 2020 at 10:56:23AM -0700, Jacob Keller wrote: >> >> >> On 9/29/2020 10:18 AM, Jakub Kicinski wrote: >>> On Mon, 28 Sep 2020 16:49:45 -0700 Jacob Keller wrote: >>>> For some devices, updating the flash can take significant time during >>>> operations where no status can meaningfully be reported. This can be >>>> somewhat confusing to a user who sees devlink appear to hang on the >>>> terminal waiting for the device to update. >>>> >>>> Provide a ticking counter of the time elapsed since the previous status >>>> message in order to make it clear that the program is not simply stuck. >>>> >>>> Do not display this message unless a few seconds have passed since the >>>> last status update. Additionally, if the previous status notification >>>> included a timeout, display this as part of the message. If we do not >>>> receive an error or a new status without that time out, replace it with >>>> the text "timeout reached". >>>> >>>> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> >>>> --- >>>> Sending this as an RFC because I doubt this is the best implementation. For >>>> one, I get a weird display issue where the cursor doesn't always end up on >>>> the end of line in my shell.. The % display works properly, so I'm not sure >>>> what's wrong here. >>>> >>>> Second, even though select should be timing out every 1/10th of a second for >>>> screen updates, I don't seem to get that behavior in my test. It takes about >>>> 8 to 10 seconds for the first elapsed time message to be displayed, and it >>>> updates really slowly. Is select just not that precise? I even tried using a >>>> timeout of zero, but this means we refresh way too often and it looks bad. I >>>> am not sure what is wrong here... >>> >>> Strange. Did you strace it? Perhaps it's some form of output buffering? >>> >> >> Haven't yet, just noticed the weird output behavior and timing >> inconsistency. > > Might be similar to this: > https://git.kernel.org/pub/scm/network/iproute2/iproute2.git/commit/?id=8e6bce735a132150c23503a55ea0aef55a01425f > Yep, I needed the fflush! That resolved the display issue! Thanks, Jake
On 9/29/2020 11:45 AM, Jacob Keller wrote: > > > On 9/29/2020 11:07 AM, Ido Schimmel wrote: >> On Tue, Sep 29, 2020 at 10:56:23AM -0700, Jacob Keller wrote: >>> >>> >>> On 9/29/2020 10:18 AM, Jakub Kicinski wrote: >>>> On Mon, 28 Sep 2020 16:49:45 -0700 Jacob Keller wrote: >>>>> For some devices, updating the flash can take significant time during >>>>> operations where no status can meaningfully be reported. This can be >>>>> somewhat confusing to a user who sees devlink appear to hang on the >>>>> terminal waiting for the device to update. >>>>> >>>>> Provide a ticking counter of the time elapsed since the previous status >>>>> message in order to make it clear that the program is not simply stuck. >>>>> >>>>> Do not display this message unless a few seconds have passed since the >>>>> last status update. Additionally, if the previous status notification >>>>> included a timeout, display this as part of the message. If we do not >>>>> receive an error or a new status without that time out, replace it with >>>>> the text "timeout reached". >>>>> >>>>> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> >>>>> --- >>>>> Sending this as an RFC because I doubt this is the best implementation. For >>>>> one, I get a weird display issue where the cursor doesn't always end up on >>>>> the end of line in my shell.. The % display works properly, so I'm not sure >>>>> what's wrong here. >>>>> >>>>> Second, even though select should be timing out every 1/10th of a second for >>>>> screen updates, I don't seem to get that behavior in my test. It takes about >>>>> 8 to 10 seconds for the first elapsed time message to be displayed, and it >>>>> updates really slowly. Is select just not that precise? I even tried using a >>>>> timeout of zero, but this means we refresh way too often and it looks bad. I >>>>> am not sure what is wrong here... >>>> >>>> Strange. Did you strace it? Perhaps it's some form of output buffering? >>>> >>> >>> Haven't yet, just noticed the weird output behavior and timing >>> inconsistency. >> >> Might be similar to this: >> https://git.kernel.org/pub/scm/network/iproute2/iproute2.git/commit/?id=8e6bce735a132150c23503a55ea0aef55a01425f >> > > Yep, I needed the fflush! That resolved the display issue! > > Thanks, > Jake > This also appears to have resolved the weird timing aspect.. the line wasn't flushed right away. strace with relative timestamps confirmed that select was infact waking up. I'll post a non-RFC version that includes the suggested cleanups from Jakub. Shannon, I'd appreciate if you could try out the next revision with your device to see if it looks good! Thanks, Jake
diff --git a/devlink/devlink.c b/devlink/devlink.c index 0374175eda3d..7fb4b5ef1ebe 100644 --- a/devlink/devlink.c +++ b/devlink/devlink.c @@ -33,6 +33,7 @@ #include <sys/select.h> #include <sys/socket.h> #include <sys/types.h> +#include <sys/time.h> #include <rt_names.h> #include "version.h" @@ -3066,6 +3067,9 @@ static int cmd_dev_info(struct dl *dl) struct cmd_dev_flash_status_ctx { struct dl *dl; + struct timeval last_status_msg; + char timeout_msg[128]; + uint64_t timeout; char *last_msg; char *last_component; uint8_t not_first:1, @@ -3083,6 +3087,14 @@ static int nullstrcmp(const char *str1, const char *str2) return str1 ? 1 : -1; } +static void cmd_dev_flash_clear_elapsed_time(struct cmd_dev_flash_status_ctx *ctx) +{ + int i; + + for (i = 0; i < strlen(ctx->timeout_msg); i++) + pr_out_tty("\b"); +} + static int cmd_dev_flash_status_cb(const struct nlmsghdr *nlh, void *data) { struct cmd_dev_flash_status_ctx *ctx = data; @@ -3116,6 +3128,11 @@ static int cmd_dev_flash_status_cb(const struct nlmsghdr *nlh, void *data) return MNL_CB_STOP; } + cmd_dev_flash_clear_elapsed_time(ctx); + gettimeofday(&ctx->last_status_msg, NULL); + ctx->timeout_msg[0] = '\0'; + ctx->timeout = 0; + if (tb[DEVLINK_ATTR_FLASH_UPDATE_STATUS_MSG]) msg = mnl_attr_get_str(tb[DEVLINK_ATTR_FLASH_UPDATE_STATUS_MSG]); if (tb[DEVLINK_ATTR_FLASH_UPDATE_COMPONENT]) @@ -3124,6 +3141,8 @@ static int cmd_dev_flash_status_cb(const struct nlmsghdr *nlh, void *data) done = mnl_attr_get_u64(tb[DEVLINK_ATTR_FLASH_UPDATE_STATUS_DONE]); if (tb[DEVLINK_ATTR_FLASH_UPDATE_STATUS_TOTAL]) total = mnl_attr_get_u64(tb[DEVLINK_ATTR_FLASH_UPDATE_STATUS_TOTAL]); + if (tb[DEVLINK_ATTR_FLASH_UPDATE_STATUS_TIMEOUT]) + ctx->timeout = mnl_attr_get_u64(tb[DEVLINK_ATTR_FLASH_UPDATE_STATUS_TIMEOUT]); if (!nullstrcmp(msg, ctx->last_msg) && !nullstrcmp(component, ctx->last_component) && @@ -3155,11 +3174,66 @@ static int cmd_dev_flash_status_cb(const struct nlmsghdr *nlh, void *data) return MNL_CB_STOP; } +static void cmd_dev_flash_time_elapsed(struct cmd_dev_flash_status_ctx *ctx) +{ + struct timeval now, res; + + gettimeofday(&now, NULL); + timersub(&now, &ctx->last_status_msg, &res); + + /* Don't start displaying a timeout message until we've elapsed a few + * seconds... + */ + if (res.tv_sec > 3) { + uint elapsed_m, elapsed_s; + + /* clear the last elapsed time message, if we have one */ + cmd_dev_flash_clear_elapsed_time(ctx); + + elapsed_m = res.tv_sec / 60; + elapsed_s = res.tv_sec % 60; + + /** + * If we've elapsed a few seconds without receiving any status + * notification from the device, we display a time elapsed + * message. This has a few possible formats: + * + * 1) just time elapsed, when no timeout was provided + * " ( Xm Ys )" + * 2) time elapsed out of a timeout that came from the device + * driver via DEVLINK_CMD_FLASH_UPDATE_STATUS_TIMEOUT + * " ( Xm Ys : Am Ys)" + * 3) time elapsed if we still receive no status after + * reaching the provided timeout. + * " ( Xm Ys : timeout reached )" + */ + if (!ctx->timeout) { + snprintf(ctx->timeout_msg, sizeof(ctx->timeout_msg), + " ( %um %us )", elapsed_m, elapsed_s); + } else if (res.tv_sec <= ctx->timeout) { + uint timeout_m, timeout_s; + + timeout_m = ctx->timeout / 60; + timeout_s = ctx->timeout % 60; + + snprintf(ctx->timeout_msg, sizeof(ctx->timeout_msg), + " ( %um %us : %um %us )", + elapsed_m, elapsed_s, timeout_m, timeout_s); + } else { + snprintf(ctx->timeout_msg, sizeof(ctx->timeout_msg), + " ( %um %us : timeout reached )", elapsed_m, elapsed_s); + } + + pr_out_tty("%s", ctx->timeout_msg); + } +} + static int cmd_dev_flash_fds_process(struct cmd_dev_flash_status_ctx *ctx, struct mnlg_socket *nlg_ntf, int pipe_r) { int nlfd = mnlg_socket_get_fd(nlg_ntf); + struct timeval timeout; fd_set fds[3]; int fdmax; int i; @@ -3174,7 +3248,14 @@ static int cmd_dev_flash_fds_process(struct cmd_dev_flash_status_ctx *ctx, if (nlfd >= fdmax) fdmax = nlfd + 1; - while (select(fdmax, &fds[0], &fds[1], &fds[2], NULL) < 0) { + /* select only for a short while (1/20th of a second) in order to + * allow periodically updating the screen with an elapsed time + * indicator. + */ + timeout.tv_sec = 0; + timeout.tv_usec = 100000; + + while (select(fdmax, &fds[0], &fds[1], &fds[2], &timeout) < 0) { if (errno == EINTR) continue; pr_err("select() failed\n"); @@ -3196,6 +3277,7 @@ static int cmd_dev_flash_fds_process(struct cmd_dev_flash_status_ctx *ctx, return err2; ctx->flash_done = 1; } + cmd_dev_flash_time_elapsed(ctx); return 0; } @@ -3256,6 +3338,11 @@ static int cmd_dev_flash(struct dl *dl) } close(pipe_w); + /* initialize starting time to allow comparison for when to begin + * displaying a time elapsed message. + */ + gettimeofday(&ctx.last_status_msg, NULL); + do { err = cmd_dev_flash_fds_process(&ctx, nlg_ntf, pipe_r); if (err)
For some devices, updating the flash can take significant time during operations where no status can meaningfully be reported. This can be somewhat confusing to a user who sees devlink appear to hang on the terminal waiting for the device to update. Provide a ticking counter of the time elapsed since the previous status message in order to make it clear that the program is not simply stuck. Do not display this message unless a few seconds have passed since the last status update. Additionally, if the previous status notification included a timeout, display this as part of the message. If we do not receive an error or a new status without that time out, replace it with the text "timeout reached". Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> --- Sending this as an RFC because I doubt this is the best implementation. For one, I get a weird display issue where the cursor doesn't always end up on the end of line in my shell.. The % display works properly, so I'm not sure what's wrong here. Second, even though select should be timing out every 1/10th of a second for screen updates, I don't seem to get that behavior in my test. It takes about 8 to 10 seconds for the first elapsed time message to be displayed, and it updates really slowly. Is select just not that precise? I even tried using a timeout of zero, but this means we refresh way too often and it looks bad. I am not sure what is wrong here... devlink/devlink.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 88 insertions(+), 1 deletion(-) base-commit: b8663da04939dd5de223ca0de23e466ce0a372f7