Message ID | dc0006176e90cf3fb90e5b1c1917b54fe07c91cd.1701540918.git.lduncan@suse.com |
---|---|
State | New |
Headers | show |
Series | scsi: target: iscsi: Fix two protocol issues | expand |
On Thu, Dec 07, 2023 at 09:42:34AM -0800, lduncan@suse.com wrote: > From: Lee Duncan <lduncan@suse.com> > > Some iSCSI initiators send SCSI PDUs with the "immediate" bit > set, and this is allowed according to RFC 3720. Commands with > the "Immediate" bit set are called "immediate commands". From > section 3.2.2.1. "Command Numbering and Acknowledging": > > The target MUST NOT transmit a MaxCmdSN that is less than > ExpCmdSN-1. For non-immediate commands, the CmdSN field can take any > value from ExpCmdSN to MaxCmdSN inclusive. The target MUST silently > ignore any non-immediate command outside of this range or non- > immediate duplicates within the range. The CmdSN carried by > immediate commands may lie outside the ExpCmdSN to MaxCmdSN range. > For example, if the initiator has previously sent a non-immediate > command carrying the CmdSN equal to MaxCmdSN, the target window is > closed. For group task management commands issued as immediate > commands, CmdSN indicates the scope of the group action (e.g., on > ABORT TASK SET indicates which commands are aborted). > > This fixed an issue with fastlinq qedi Converged Network Adapter > initiator firmware, trying to use an LIO target for booting. These > changes made booting possible, with or without ImmediateData enabled. > > Signed-off-by: Lee Duncan <lduncan@suse.com> > Reviewed-by: David Bond <dbond@suse.com> > --- > drivers/target/iscsi/iscsi_target.c | 12 +++--------- > drivers/target/iscsi/iscsi_target_util.c | 10 ++++++++-- > 2 files changed, 11 insertions(+), 11 deletions(-) > > diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c > index 1d25e64b068a..f246e5015868 100644 > --- a/drivers/target/iscsi/iscsi_target.c > +++ b/drivers/target/iscsi/iscsi_target.c > @@ -1060,13 +1060,6 @@ int iscsit_setup_scsi_cmd(struct iscsit_conn *conn, struct iscsit_cmd *cmd, > ISCSI_REASON_BOOKMARK_INVALID, buf); > } > > - if (hdr->opcode & ISCSI_OP_IMMEDIATE) { > - pr_err("Illegally set Immediate Bit in iSCSI Initiator" > - " Scsi Command PDU.\n"); > - return iscsit_add_reject_cmd(cmd, > - ISCSI_REASON_BOOKMARK_INVALID, buf); > - } > - > if (payload_length && !conn->sess->sess_ops->ImmediateData) { > pr_err("ImmediateData=No but DataSegmentLength=%u," > " protocol error.\n", payload_length); This seems right, as the flag is checked again later in the same function. > @@ -1255,14 +1248,15 @@ int iscsit_process_scsi_cmd(struct iscsit_conn *conn, struct iscsit_cmd *cmd, > /* > * Check the CmdSN against ExpCmdSN/MaxCmdSN here if > * the Immediate Bit is not set, and no Immediate > - * Data is attached. > + * Data is attached. Also skip the check if this is > + * an immediate command. This comment addition seems redundant, isn't that what the "Immediate Bit is not set" already means? > * > * A PDU/CmdSN carrying Immediate Data can only > * be processed after the DataCRC has passed. > * If the DataCRC fails, the CmdSN MUST NOT > * be acknowledged. (See below) > */ > - if (!cmd->immediate_data) { > + if (!cmd->immediate_data && !cmd->immediate_cmd) { > cmdsn_ret = iscsit_sequence_cmd(conn, cmd, > (unsigned char *)hdr, hdr->cmdsn); > if (cmdsn_ret == CMDSN_ERROR_CANNOT_RECOVER) Are you sure this needs to be checking both conditions here? I'm struggling to understand why CmdSN checking would be bypassed for immediate data. Is this a longstanding bug where the condition should have been on immediate_cmd (and only immediate_cmd) instead? Or is this because of the handling the immediate data with DataCRC case mentioned? I do see iscsit_sequence_cmd also being called in iscsit_get_immediate_data. - Chris Leech
Apologies on my first reply having HTML. I'm learning a new MUA. On Wed, Dec 13, 2023 at 12:06 PM Chris Leech <cleech@redhat.com> wrote: > > On Thu, Dec 07, 2023 at 09:42:34AM -0800, lduncan@suse.com wrote: > > From: Lee Duncan <lduncan@suse.com> > > > > Some iSCSI initiators send SCSI PDUs with the "immediate" bit > > set, and this is allowed according to RFC 3720. Commands with > > the "Immediate" bit set are called "immediate commands". From > > section 3.2.2.1. "Command Numbering and Acknowledging": > > > > The target MUST NOT transmit a MaxCmdSN that is less than > > ExpCmdSN-1. For non-immediate commands, the CmdSN field can take any > > value from ExpCmdSN to MaxCmdSN inclusive. The target MUST silently > > ignore any non-immediate command outside of this range or non- > > immediate duplicates within the range. The CmdSN carried by > > immediate commands may lie outside the ExpCmdSN to MaxCmdSN range. > > For example, if the initiator has previously sent a non-immediate > > command carrying the CmdSN equal to MaxCmdSN, the target window is > > closed. For group task management commands issued as immediate > > commands, CmdSN indicates the scope of the group action (e.g., on > > ABORT TASK SET indicates which commands are aborted). > > > > This fixed an issue with fastlinq qedi Converged Network Adapter > > initiator firmware, trying to use an LIO target for booting. These > > changes made booting possible, with or without ImmediateData enabled. > > > > Signed-off-by: Lee Duncan <lduncan@suse.com> > > Reviewed-by: David Bond <dbond@suse.com> > > --- > > drivers/target/iscsi/iscsi_target.c | 12 +++--------- > > drivers/target/iscsi/iscsi_target_util.c | 10 ++++++++-- > > 2 files changed, 11 insertions(+), 11 deletions(-) > > > > diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c > > index 1d25e64b068a..f246e5015868 100644 > > --- a/drivers/target/iscsi/iscsi_target.c > > +++ b/drivers/target/iscsi/iscsi_target.c > > @@ -1060,13 +1060,6 @@ int iscsit_setup_scsi_cmd(struct iscsit_conn *conn, struct iscsit_cmd *cmd, > > ISCSI_REASON_BOOKMARK_INVALID, buf); > > } > > > > - if (hdr->opcode & ISCSI_OP_IMMEDIATE) { > > - pr_err("Illegally set Immediate Bit in iSCSI Initiator" > > - " Scsi Command PDU.\n"); > > - return iscsit_add_reject_cmd(cmd, > > - ISCSI_REASON_BOOKMARK_INVALID, buf); > > - } > > - > > if (payload_length && !conn->sess->sess_ops->ImmediateData) { > > pr_err("ImmediateData=No but DataSegmentLength=%u," > > " protocol error.\n", payload_length); > > This seems right, as the flag is checked again later in the same > function. > > > @@ -1255,14 +1248,15 @@ int iscsit_process_scsi_cmd(struct iscsit_conn *conn, struct iscsit_cmd *cmd, > > /* > > * Check the CmdSN against ExpCmdSN/MaxCmdSN here if > > * the Immediate Bit is not set, and no Immediate > > - * Data is attached. > > + * Data is attached. Also skip the check if this is > > + * an immediate command. > > This comment addition seems redundant, isn't that what the "Immediate > Bit is not set" already means? The spec is confusing with respect to this. The "Immediate Bit" means an immediate command. These commands are done "now", not queued, and they do not increment the expected sequence number. Immediate data is different, and unfortunately named IMHO. It's when a PDU supplies the data for the SCSI command in the current PDU instead of the next PDU. > > > * > > * A PDU/CmdSN carrying Immediate Data can only > > * be processed after the DataCRC has passed. > > * If the DataCRC fails, the CmdSN MUST NOT > > * be acknowledged. (See below) > > */ > > - if (!cmd->immediate_data) { > > + if (!cmd->immediate_data && !cmd->immediate_cmd) { > > cmdsn_ret = iscsit_sequence_cmd(conn, cmd, > > (unsigned char *)hdr, hdr->cmdsn); > > if (cmdsn_ret == CMDSN_ERROR_CANNOT_RECOVER) > > Are you sure this needs to be checking both conditions here? I'm > struggling to understand why CmdSN checking would be bypassed for > immediate data. Is this a longstanding bug where the condition should > have been on immediate_cmd (and only immediate_cmd) instead? The immediate data check was there already, and there haven't been any bugs I know of, so I assumed that part of the code was ok. > > Or is this because of the handling the immediate data with DataCRC case > mentioned? I do see iscsit_sequence_cmd also being called in > iscsit_get_immediate_data. I will check that but I suspect you are correct. > > - Chris Leech >
On Wed, Dec 13, 2023 at 05:24:54PM -0800, Lee Duncan wrote: > > > > > @@ -1255,14 +1248,15 @@ int iscsit_process_scsi_cmd(struct iscsit_conn *conn, struct iscsit_cmd *cmd, > > > /* > > > * Check the CmdSN against ExpCmdSN/MaxCmdSN here if > > > * the Immediate Bit is not set, and no Immediate > > > - * Data is attached. > > > + * Data is attached. Also skip the check if this is > > > + * an immediate command. > > > > This comment addition seems redundant, isn't that what the > > "Immediate Bit is not set" already means? > > The spec is confusing with respect to this. The "Immediate Bit" means > an immediate command. These commands are done "now", not queued, and > they do not increment the expected sequence number. > > Immediate data is different, and unfortunately named IMHO. It's when a > PDU supplies the data for the SCSI command in the current PDU instead > of the next PDU. I understand the protocol, just trying to make sense of the implementation and what the existing comment meant. And the existing comment already has two conditions in it, even if the code doesn't. I think I understand now why this is delaying CmdSN validation when there is immediate data, until after the DataCRC can be checked. This comment in iscsit_get_immediate_data, where the delayed processing occurs, also seems to read that "Immediate Bit" is in reference to an immediate command. * A PDU/CmdSN carrying Immediate Data passed * DataCRC, check against ExpCmdSN/MaxCmdSN if * Immediate Bit is not set. but neither of these locations (before these changes) that mention the "Immediate Bit" in the comments actually check for cmd->immediate_cmd. > > > * > > > * A PDU/CmdSN carrying Immediate Data can only > > > * be processed after the DataCRC has passed. > > > * If the DataCRC fails, the CmdSN MUST NOT > > > * be acknowledged. (See below) > > > */ > > > - if (!cmd->immediate_data) { > > > + if (!cmd->immediate_data && !cmd->immediate_cmd) { > > > cmdsn_ret = iscsit_sequence_cmd(conn, cmd, > > > (unsigned char *)hdr, hdr->cmdsn); > > > if (cmdsn_ret == CMDSN_ERROR_CANNOT_RECOVER) > > > > Are you sure this needs to be checking both conditions here? I'm > > struggling to understand why CmdSN checking would be bypassed for > > immediate data. Is this a longstanding bug where the condition should > > have been on immediate_cmd (and only immediate_cmd) instead? > > The immediate data check was there already, and there haven't been any > bugs I know of, so I assumed that part of the code was ok. > > > > > Or is this because of the handling the immediate data with DataCRC case > > mentioned? I do see iscsit_sequence_cmd also being called in > > iscsit_get_immediate_data. > > I will check that but I suspect you are correct. Is it correct to skip all of iscsit_sequence_cmd for an immediate command here? You are already skipping iscsit_check_received_cmdsn inside iscsit_sequence_cmd in this patch. If cmd->immediate_cmd is set, where does iscsit_execute_cmd now get called from? - Chris Leech
Apologies for the delay in the reply, but over the time to address Chris' two questions about this patch set. See below. On Thu, Dec 14, 2023 at 12:30 PM Chris Leech <cleech@redhat.com> wrote: > > On Wed, Dec 13, 2023 at 05:24:54PM -0800, Lee Duncan wrote: > > > > > > > @@ -1255,14 +1248,15 @@ int iscsit_process_scsi_cmd(struct iscsit_conn *conn, struct iscsit_cmd *cmd, > > > > /* > > > > * Check the CmdSN against ExpCmdSN/MaxCmdSN here if > > > > * the Immediate Bit is not set, and no Immediate > > > > - * Data is attached. > > > > + * Data is attached. Also skip the check if this is > > > > + * an immediate command. > > > > > > This comment addition seems redundant, isn't that what the > > > "Immediate Bit is not set" already means? > > > > The spec is confusing with respect to this. The "Immediate Bit" means > > an immediate command. These commands are done "now", not queued, and > > they do not increment the expected sequence number. > > > > Immediate data is different, and unfortunately named IMHO. It's when a > > PDU supplies the data for the SCSI command in the current PDU instead > > of the next PDU. > > I understand the protocol, just trying to make sense of the > implementation and what the existing comment meant. And the existing > comment already has two conditions in it, even if the code doesn't. > > I think I understand now why this is delaying CmdSN validation when > there is immediate data, until after the DataCRC can be checked. > > This comment in iscsit_get_immediate_data, where the delayed processing > occurs, also seems to read that "Immediate Bit" is in reference to an > immediate command. > > * A PDU/CmdSN carrying Immediate Data passed > * DataCRC, check against ExpCmdSN/MaxCmdSN if > * Immediate Bit is not set. > > but neither of these locations (before these changes) that mention the > "Immediate Bit" in the comments actually check for cmd->immediate_cmd. > I talked to Chris a bit about this offline, for clarification. I believe I understand his concern, and rather than try to assert the patch is ok by inspection, I decided to just test it. Turns out that normal PDU traffic for lots of writes generally includes "immediate data", and so it was easy to test this. Testing showed that Immediate Data still works correctly, in SCSI Write PDUs. Test was: * connect to an iSCSI target * Write a bunch of data * read back the data * disconnect from target and compare data In addition, I captured and analyzed the SCSI/iSCSI tcpdump trace, and immediate data was present, as expected. One co-worker ran a similar test (just the SCSI/iSCSI trace part), and found the same results. > > > > * > > > > * A PDU/CmdSN carrying Immediate Data can only > > > > * be processed after the DataCRC has passed. > > > > * If the DataCRC fails, the CmdSN MUST NOT > > > > * be acknowledged. (See below) > > > > */ > > > > - if (!cmd->immediate_data) { > > > > + if (!cmd->immediate_data && !cmd->immediate_cmd) { > > > > cmdsn_ret = iscsit_sequence_cmd(conn, cmd, > > > > (unsigned char *)hdr, hdr->cmdsn); > > > > if (cmdsn_ret == CMDSN_ERROR_CANNOT_RECOVER) > > > > > > Are you sure this needs to be checking both conditions here? I'm > > > struggling to understand why CmdSN checking would be bypassed for > > > immediate data. Is this a longstanding bug where the condition should > > > have been on immediate_cmd (and only immediate_cmd) instead? > > > > The immediate data check was there already, and there haven't been any > > bugs I know of, so I assumed that part of the code was ok. > > > > > > > > Or is this because of the handling the immediate data with DataCRC case > > > mentioned? I do see iscsit_sequence_cmd also being called in > > > iscsit_get_immediate_data. > > > > I will check that but I suspect you are correct. > > Is it correct to skip all of iscsit_sequence_cmd for an immediate > command here? You are already skipping iscsit_check_received_cmdsn > inside iscsit_sequence_cmd in this patch. If cmd->immediate_cmd is set, > where does iscsit_execute_cmd now get called from? I looked at the code and the SPEC in more detail, and I believe the answer is "yes", it is correct. That function checks the current PDU's sequence number with the following tests (and side effects), but not in this order: * check that seq# is not larger than maximum * check that seq# is not larger than expected * check that seq# is not smaller than expected * else the seq# is correct, so *SIDE* *EFFECT* increment the expected seq# for next PDU It turns out that the SPEC allow the sequence number to be out of range for immediate commands! So none of the checks in iscsit_sequence_check_received_cmndsn() are valid for immediate commands, as far as I can see. > > - Chris Leech >
On 12/7/23 11:42 AM, lduncan@suse.com wrote: > From: Lee Duncan <lduncan@suse.com> > > Some iSCSI initiators send SCSI PDUs with the "immediate" bit > set, and this is allowed according to RFC 3720. Commands with > the "Immediate" bit set are called "immediate commands". From > section 3.2.2.1. "Command Numbering and Acknowledging": > > The target MUST NOT transmit a MaxCmdSN that is less than > ExpCmdSN-1. For non-immediate commands, the CmdSN field can take any > value from ExpCmdSN to MaxCmdSN inclusive. The target MUST silently > ignore any non-immediate command outside of this range or non- > immediate duplicates within the range. The CmdSN carried by > immediate commands may lie outside the ExpCmdSN to MaxCmdSN range. > For example, if the initiator has previously sent a non-immediate > command carrying the CmdSN equal to MaxCmdSN, the target window is > closed. For group task management commands issued as immediate > commands, CmdSN indicates the scope of the group action (e.g., on > ABORT TASK SET indicates which commands are aborted). > > This fixed an issue with fastlinq qedi Converged Network Adapter > initiator firmware, trying to use an LIO target for booting. These > changes made booting possible, with or without ImmediateData enabled. > This is taking me a really long time to review because I've never looked at some of these code paths. Have you tested the error cases? What happens for a scsi command that's marked with the immediate bit and: 1. We get an abort and 1.A The scsi command has completed? 1.B The scsi command is being completed? For example, if the command we want to abort is not in the window, does iscsit_find_cmd_from_itt just not find the command and do we just return ISCSI_TMF_RSP_NO_TASK so the initiator will just escalate to lun reset. 2. For lun reset and abort, if the scsi command we want to abort/reset is not in the window (let's say it's cmdsn is higher than max_cmd_sn), does the iscsi layer complete the scsi command then complete the TMF or if the TMF has a lower cmdsn than the scsi command does the iscsi layer complete the scsi command then the TMF? 3. What happens for 1 and 2 and ERL 2 is used so we have ISCSI_TM_FUNC_TASK_REASSIGN and maybe are hitting the out of order code as well? Does it work ok?
diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c index 1d25e64b068a..f246e5015868 100644 --- a/drivers/target/iscsi/iscsi_target.c +++ b/drivers/target/iscsi/iscsi_target.c @@ -1060,13 +1060,6 @@ int iscsit_setup_scsi_cmd(struct iscsit_conn *conn, struct iscsit_cmd *cmd, ISCSI_REASON_BOOKMARK_INVALID, buf); } - if (hdr->opcode & ISCSI_OP_IMMEDIATE) { - pr_err("Illegally set Immediate Bit in iSCSI Initiator" - " Scsi Command PDU.\n"); - return iscsit_add_reject_cmd(cmd, - ISCSI_REASON_BOOKMARK_INVALID, buf); - } - if (payload_length && !conn->sess->sess_ops->ImmediateData) { pr_err("ImmediateData=No but DataSegmentLength=%u," " protocol error.\n", payload_length); @@ -1255,14 +1248,15 @@ int iscsit_process_scsi_cmd(struct iscsit_conn *conn, struct iscsit_cmd *cmd, /* * Check the CmdSN against ExpCmdSN/MaxCmdSN here if * the Immediate Bit is not set, and no Immediate - * Data is attached. + * Data is attached. Also skip the check if this is + * an immediate command. * * A PDU/CmdSN carrying Immediate Data can only * be processed after the DataCRC has passed. * If the DataCRC fails, the CmdSN MUST NOT * be acknowledged. (See below) */ - if (!cmd->immediate_data) { + if (!cmd->immediate_data && !cmd->immediate_cmd) { cmdsn_ret = iscsit_sequence_cmd(conn, cmd, (unsigned char *)hdr, hdr->cmdsn); if (cmdsn_ret == CMDSN_ERROR_CANNOT_RECOVER) diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c index 91a75a4a7cc1..546816b3de86 100644 --- a/drivers/target/iscsi/iscsi_target_util.c +++ b/drivers/target/iscsi/iscsi_target_util.c @@ -285,13 +285,19 @@ static inline int iscsit_check_received_cmdsn(struct iscsit_session *sess, u32 c int iscsit_sequence_cmd(struct iscsit_conn *conn, struct iscsit_cmd *cmd, unsigned char *buf, __be32 cmdsn) { - int ret, cmdsn_ret; + int ret, cmdsn_ret = CMDSN_NORMAL_OPERATION; bool reject = false; u8 reason = ISCSI_REASON_BOOKMARK_NO_RESOURCES; mutex_lock(&conn->sess->cmdsn_mutex); - cmdsn_ret = iscsit_check_received_cmdsn(conn->sess, be32_to_cpu(cmdsn)); + /* + * Check the sequence number iff we are not in an immediate command. + * See rfc3730 Section 3.2.2.1. Immediate commands can be outside + * the normal range. + */ + if (!cmd->immediate_cmd) + cmdsn_ret = iscsit_check_received_cmdsn(conn->sess, be32_to_cpu(cmdsn)); switch (cmdsn_ret) { case CMDSN_NORMAL_OPERATION: ret = iscsit_execute_cmd(cmd, 0);