diff mbox series

scsi: sd: Avoid sending an INQUIRY if the page is not supported

Message ID 20230505204950.21645-1-brian@purestorage.com
State New
Headers show
Series scsi: sd: Avoid sending an INQUIRY if the page is not supported | expand

Commit Message

Brian Bunker May 5, 2023, 8:49 p.m. UTC
When SCSI devices are discovered the function sd_read_cpr gets called.
This call results in an INQUIRY to page 0xb9. This VPD page is called
regardless of whether the target has advertised this page as supported.

Instead of just sending this INQUIRY page, first check to see if that
page is in the supported pages. This will avoid sending requests to
targets which do not support the page. The error is unexpected on the
target and leads to questions. I am not sure what percentage of SCSI
devices support this page, but this will eliminate at least one
request to the target in the discovery phase for all that do not. The
function added could also have potential users besides this specific
one.

Signed-off-by: Brian Bunker <brian@purestorage.com>
Reviewed-by: Seamus Connor <sconnor@purestorage.com>
Reviewed-by: Krishna Kant <krishna.kant@purestorage.com>
---
 drivers/scsi/scsi.c        | 40 ++++++++++++++++++++++++++++++++++++++
 drivers/scsi/sd.c          |  4 +++-
 include/scsi/scsi_device.h |  1 +
 3 files changed, 44 insertions(+), 1 deletion(-)

Comments

Brian Bunker May 8, 2023, 4:04 p.m. UTC | #1
> On May 8, 2023, at 5:26 AM, Martin K. Petersen <martin.petersen@oracle.com> wrote:
> 
> 
> Brian,
> 
>> When SCSI devices are discovered the function sd_read_cpr gets called.
>> This call results in an INQUIRY to page 0xb9. This VPD page is called
>> regardless of whether the target has advertised this page as
>> supported.
> 
> We used to check the page list first. However, we found several devices
> that we did not discover correctly because the pages, while present,
> were not advertised in page 0 and thus ignored. So not checking the
> supported pages first is a feature.

Wouldn’t it be better if those targets fixed that problem? Do they report
nothing when VPD page 0 is requested or an incomplete list that doesn’t
include this 0xb9?
> 
> What exactly is the problem you are experiencing wrt. your target
> getting an inquiry for the CPR page?

What happens for us is that our support and customers are used to looking
for errors when they see a problem as you would expect. When we are logging
VPD page 0xb9 not supported for every device every time a Linux host
reboots it leads to that being incorrectly attributed to whatever actual problem
is going on. There is nothing more we can do since we do return the list
of page codes supported and this 0xb9 is not in the list.

Thanks,
Brian
> 
> -- 
> Martin K. Petersen Oracle Linux Engineering
Brian Bunker May 8, 2023, 4:34 p.m. UTC | #2
> On May 8, 2023, at 3:09 AM, Benjamin Block <bblock@linux.ibm.com> wrote:
> 
> On Fri, May 05, 2023 at 01:49:50PM -0700, Brian Bunker wrote:
>> When SCSI devices are discovered the function sd_read_cpr gets called.
>> This call results in an INQUIRY to page 0xb9. This VPD page is called
>> regardless of whether the target has advertised this page as supported.
>> 
>> Instead of just sending this INQUIRY page, first check to see if that
>> page is in the supported pages. This will avoid sending requests to
>> targets which do not support the page. The error is unexpected on the
>> target and leads to questions. I am not sure what percentage of SCSI
>> devices support this page, but this will eliminate at least one
>> request to the target in the discovery phase for all that do not. The
>> function added could also have potential users besides this specific
>> one.
>> 
>> Signed-off-by: Brian Bunker <brian@purestorage.com>
>> Reviewed-by: Seamus Connor <sconnor@purestorage.com>
>> Reviewed-by: Krishna Kant <krishna.kant@purestorage.com>
>> ---
>> drivers/scsi/scsi.c        | 40 ++++++++++++++++++++++++++++++++++++++
>> drivers/scsi/sd.c          |  4 +++-
>> include/scsi/scsi_device.h |  1 +
>> 3 files changed, 44 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
>> index 09ef0b31dfc0..9265b3d6a18f 100644
>> --- a/drivers/scsi/scsi.c
>> +++ b/drivers/scsi/scsi.c
>> @@ -356,6 +356,46 @@ static int scsi_get_vpd_size(struct scsi_device *sdev, u8 page)
>> return result;
>> }
>> 
>> +/**
>> + * scsi_vpd_page_supported - Check if a VPD page is supported on a SCSI device
>> + * @sdev: The device to ask
>> + * @page: Check existence of this Vital Product Data page
>> + *
>> + * Functions which explicitly request a given VPD page
>> + * should first check whether that page is among the
>> + * supported VPD pages. This will avoid targets returning
>> + * unnecessary errors which can cause confusion. -EINVAL is
>> + * returned if the page is not supported and 0 if it is.
>> + */
>> +int scsi_vpd_page_supported(struct scsi_device *sdev, u8 page)
>> +{
>> + const struct scsi_vpd *vpd;
>> + uint16_t page_len;
> 
> Probably rather `u16` as per kernel-style.

Sure. This can easily be done.
> 
>> + int ret = -EINVAL;
> 
> Been wondering, whether it would make sense to have two different error
> levels here. One for the case where the page is not found in the loop
> that searches within page 0, and one for when page 0 is not present when
> we try to dereference the RCU protected pointer.
> 
> That way we could have a safe fallback. If the page is there, we use its
> data, if it is not, we blindly send the INQUIRY like we do today.
> 
> Not sure whether this is a bit too paranoid.. VPD page 0 is mandatory
> after all.

That could be done, but the problem would still exist for the PURE target.
We don’t support the page 0xb9, and we don’t advertise we do in the response
to VPD 0. This approach would still lead to the INQUIRY being sent to devices
who don’t support it, don’t expect it, and report an unexpected error. What I am
trying to avoid is the INQUIRY being sent to devices who don’t invite it.

Thanks,
Brian
> 
>> + int pos = 0;
>> +
>> + rcu_read_lock();
>> + vpd = rcu_dereference(sdev->vpd_pg0);
>> + if (!vpd)
>> + goto out;
>> +
>> + page_len = get_unaligned_be16(&vpd->data[2]);
>> +
>> + /*
>> + * The first supported page starts at byte 4 in the buffer.
>> + * Read from that byte to the last dictated by page_len above.
>> + */
>> + for (pos = 4; pos < page_len + 4; ++pos) {
>> + if (vpd->data[pos] == page)
>> + ret = 0;
>> + }
>> +
>> +out:
>> + rcu_read_unlock();
>> + return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(scsi_vpd_page_supported);
>> +
>> /**
>>  * scsi_get_vpd_page - Get Vital Product Data from a SCSI device
>>  * @sdev: The device to ask
>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>> index 1624d528aa1f..0304b7d60747 100644
>> --- a/drivers/scsi/sd.c
>> +++ b/drivers/scsi/sd.c
>> @@ -3127,7 +3127,9 @@ static void sd_read_cpr(struct scsi_disk *sdkp)
>> */
>> buf_len = 64 + 256*32;
>> buffer = kmalloc(buf_len, GFP_KERNEL);
>> - if (!buffer || scsi_get_vpd_page(sdkp->device, 0xb9, buffer, buf_len))
>> + if (!buffer ||
>> +    scsi_vpd_page_supported(sdkp->device, 0xb9) ||
> 
> Wouldn't it make sense to do this before the allocation? If it really
> turns out to be unsupported, that seems like a waste.
> 
>> +    scsi_get_vpd_page(sdkp->device, 0xb9, buffer, buf_len))
>> goto out;
>> 
>> /* We must have at least a 64B header and one 32B range descriptor */
>> diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
>> index f10a008e5bfa..359cd8b94312 100644
>> --- a/include/scsi/scsi_device.h
>> +++ b/include/scsi/scsi_device.h
>> @@ -431,6 +431,7 @@ extern int scsi_mode_select(struct scsi_device *sdev, int pf, int sp,
>>    struct scsi_sense_hdr *);
>> extern int scsi_test_unit_ready(struct scsi_device *sdev, int timeout,
>> int retries, struct scsi_sense_hdr *sshdr);
>> +extern int scsi_vpd_page_supported(struct scsi_device *sdev, u8 page);
>> extern int scsi_get_vpd_page(struct scsi_device *, u8 page, unsigned char *buf,
>>     int buf_len);
>> extern int scsi_report_opcode(struct scsi_device *sdev, unsigned char *buffer,
> 
> -- 
> Best Regards, Benjamin Block        /        Linux on IBM Z Kernel Development
> IBM Deutschland Research & Development GmbH    /   https://www.ibm.com/privacy
> Vors. Aufs.-R.: Gregor Pillen         /         Geschäftsführung: David Faller
> Sitz der Ges.: Böblingen     /    Registergericht: AmtsG Stuttgart, HRB 243294
Martin K. Petersen May 10, 2023, 2:29 p.m. UTC | #3
Brian,

> Do they report nothing when VPD page 0 is requested or an incomplete
> list that doesn’t include this 0xb9?

Unfortunately both scenarios are relatively common.

>> What exactly is the problem you are experiencing wrt. your target
>> getting an inquiry for the CPR page?
>
> What happens for us is that our support and customers are used to
> looking for errors when they see a problem as you would expect. When
> we are logging VPD page 0xb9 not supported for every device every time
> a Linux host reboots

I suggest you stop logging errors in that case.
Benjamin Block May 11, 2023, 4:44 p.m. UTC | #4
On Mon, May 08, 2023 at 09:34:15AM -0700, Brian Bunker wrote:
> > On May 8, 2023, at 3:09 AM, Benjamin Block <bblock@linux.ibm.com> wrote:
> > On Fri, May 05, 2023 at 01:49:50PM -0700, Brian Bunker wrote:
> > 
> >> + int ret = -EINVAL;
> > 
> > Been wondering, whether it would make sense to have two different error
> > levels here. One for the case where the page is not found in the loop
> > that searches within page 0, and one for when page 0 is not present when
> > we try to dereference the RCU protected pointer.
> > 
> > That way we could have a safe fallback. If the page is there, we use its
> > data, if it is not, we blindly send the INQUIRY like we do today.
> > 
> > Not sure whether this is a bit too paranoid.. VPD page 0 is mandatory
> > after all.
> 
> That could be done, but the problem would still exist for the PURE target.
> We don’t support the page 0xb9, and we don’t advertise we do in the response
> to VPD 0. This approach would still lead to the INQUIRY being sent to devices

I wasn't meaning to send the INQUIRY regardless to what the page says,
if it is present. I was just thinking to having fall-back for when the
page 0 is not there at all (initially, when you call
`rcu_dereference()`). That would support your storage, as you have page
0, and it would be present for the check I assume.

But anyway, it seems this is a no-go regardless. I didn't expect targets
sending a valid page 0, but still supporting pages that are not listed in it.

> who don’t support it, don’t expect it, and report an unexpected error. What I am
> trying to avoid is the INQUIRY being sent to devices who don’t invite it.
Douglas Gilbert May 12, 2023, 2:58 a.m. UTC | #5
On 2023-05-11 12:44, Benjamin Block wrote:
> On Mon, May 08, 2023 at 09:34:15AM -0700, Brian Bunker wrote:
>>> On May 8, 2023, at 3:09 AM, Benjamin Block <bblock@linux.ibm.com> wrote:
>>> On Fri, May 05, 2023 at 01:49:50PM -0700, Brian Bunker wrote:
>>>
>>>> + int ret = -EINVAL;
>>>
>>> Been wondering, whether it would make sense to have two different error
>>> levels here. One for the case where the page is not found in the loop
>>> that searches within page 0, and one for when page 0 is not present when
>>> we try to dereference the RCU protected pointer.
>>>
>>> That way we could have a safe fallback. If the page is there, we use its
>>> data, if it is not, we blindly send the INQUIRY like we do today.
>>>
>>> Not sure whether this is a bit too paranoid.. VPD page 0 is mandatory
>>> after all.
>>
>> That could be done, but the problem would still exist for the PURE target.
>> We don’t support the page 0xb9, and we don’t advertise we do in the response
>> to VPD 0. This approach would still lead to the INQUIRY being sent to devices
> 
> I wasn't meaning to send the INQUIRY regardless to what the page says,
> if it is present. I was just thinking to having fall-back for when the
> page 0 is not there at all (initially, when you call
> `rcu_dereference()`). That would support your storage, as you have page
> 0, and it would be present for the check I assume.
> 
> But anyway, it seems this is a no-go regardless. I didn't expect targets
> sending a valid page 0, but still supporting pages that are not listed in it.

SCSI standards did contain words that implied vendors would report all
VPD/log/mode pages and operation codes using the mechanisms added to
those standards over the years. I guess users pressured vendors to do
this. Then recently, in T10 approved proposal 21-063r2, the weasel
words "may or may not" were added to all those mechanisms. IMO that
was a pretty disgraceful move by the storage vendors that control T10.

Doug Gilbert


P.S. I have a new option appearing in some of my utilities: --examine
The idea is that it uses an iota function to check for unreported
VPD/log/mode pages as the "report supported" device server responses
"may or may not" be trustworthy :-)

>> who don’t support it, don’t expect it, and report an unexpected error. What I am
>> trying to avoid is the INQUIRY being sent to devices who don’t invite it.
>
diff mbox series

Patch

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 09ef0b31dfc0..9265b3d6a18f 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -356,6 +356,46 @@  static int scsi_get_vpd_size(struct scsi_device *sdev, u8 page)
 	return result;
 }
 
+/**
+ * scsi_vpd_page_supported - Check if a VPD page is supported on a SCSI device
+ * @sdev: The device to ask
+ * @page: Check existence of this Vital Product Data page
+ *
+ * Functions which explicitly request a given VPD page
+ * should first check whether that page is among the
+ * supported VPD pages. This will avoid targets returning
+ * unnecessary errors which can cause confusion. -EINVAL is
+ * returned if the page is not supported and 0 if it is.
+ */
+int scsi_vpd_page_supported(struct scsi_device *sdev, u8 page)
+{
+	const struct scsi_vpd *vpd;
+	uint16_t page_len;
+	int ret = -EINVAL;
+	int pos = 0;
+
+	rcu_read_lock();
+	vpd = rcu_dereference(sdev->vpd_pg0);
+	if (!vpd)
+		goto out;
+
+	page_len = get_unaligned_be16(&vpd->data[2]);
+
+	/*
+	 * The first supported page starts at byte 4 in the buffer.
+	 * Read from that byte to the last dictated by page_len above.
+	 */
+	for (pos = 4; pos < page_len + 4; ++pos) {
+		if (vpd->data[pos] == page)
+			ret = 0;
+	}
+
+out:
+	rcu_read_unlock();
+	return ret;
+}
+EXPORT_SYMBOL_GPL(scsi_vpd_page_supported);
+
 /**
  * scsi_get_vpd_page - Get Vital Product Data from a SCSI device
  * @sdev: The device to ask
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 1624d528aa1f..0304b7d60747 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3127,7 +3127,9 @@  static void sd_read_cpr(struct scsi_disk *sdkp)
 	 */
 	buf_len = 64 + 256*32;
 	buffer = kmalloc(buf_len, GFP_KERNEL);
-	if (!buffer || scsi_get_vpd_page(sdkp->device, 0xb9, buffer, buf_len))
+	if (!buffer ||
+	    scsi_vpd_page_supported(sdkp->device, 0xb9) ||
+	    scsi_get_vpd_page(sdkp->device, 0xb9, buffer, buf_len))
 		goto out;
 
 	/* We must have at least a 64B header and one 32B range descriptor */
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index f10a008e5bfa..359cd8b94312 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -431,6 +431,7 @@  extern int scsi_mode_select(struct scsi_device *sdev, int pf, int sp,
 			    struct scsi_sense_hdr *);
 extern int scsi_test_unit_ready(struct scsi_device *sdev, int timeout,
 				int retries, struct scsi_sense_hdr *sshdr);
+extern int scsi_vpd_page_supported(struct scsi_device *sdev, u8 page);
 extern int scsi_get_vpd_page(struct scsi_device *, u8 page, unsigned char *buf,
 			     int buf_len);
 extern int scsi_report_opcode(struct scsi_device *sdev, unsigned char *buffer,