
[RFC,3/3] mmc: core: Allow mmc block device to re-claim the host

Message ID 1494506343-28572-4-git-send-email-ulf.hansson@linaro.org
State New
Series mmc: core: Prepare mmc host locking for blk-mq

Commit Message

Ulf Hansson May 11, 2017, 12:39 p.m. UTC
The current mmc block device implementation is tricky when it comes to
claiming and releasing the host while processing I/O requests. In
principle, we need to claim the host when the first request enters the
queue, and then release the host as soon as the queue becomes empty. This
complexity relates to the asynchronous request mechanism that the mmc block
device driver implements.

For the legacy block interface that we currently implement, the above
issue can be addressed, as we can find out when the queue really becomes
empty.

However, checking whether the queue is empty isn't really an applicable
method when using the new blk-mq interface, as requests are instead pushed
to us via struct blk_mq_ops and its function pointers.

Being able to support the asynchronous request method using the blk-mq
interface means we have to allow the mmc block device driver to re-claim
the host from different tasks/contexts, as we may have more than one
request to operate upon.

Therefore, let's extend the mmc_claim_host() API to support reference
counting for the mmc block device.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

---
 drivers/mmc/core/core.c  | 14 ++++++++++----
 drivers/mmc/core/core.h  |  7 ++++++-
 include/linux/mmc/host.h |  1 +
 3 files changed, 17 insertions(+), 5 deletions(-)

-- 
2.7.4
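The claim rule introduced by this patch can be modelled outside the kernel. What follows is a minimal user-space sketch, not the mmc core code: `struct model_host`, `model_can_claim()`, `model_try_claim()` and `model_release()` are hypothetical names, tasks are plain integer ids, and the host spinlock, waitqueue and runtime PM calls are left out. It mirrors the three checks in the patched `__mmc_claim_host()` loop and the count handling in `mmc_release_host()`, reporting "would sleep" as a `false` return.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of the claim state touched by this patch.
 * Tasks are integer ids (0 = none) instead of struct task_struct pointers;
 * the host spinlock, waitqueue and runtime PM calls are not modelled.
 */
struct model_host {
	bool claimed;
	int claimer;            /* latest claimer, overwritten on each claim */
	bool claimer_is_blkdev; /* the latest claim came from the blkdev */
	int claim_cnt;          /* nesting/reference count */
};

/* Same three checks as the patched loop in __mmc_claim_host(). */
static bool model_can_claim(const struct model_host *h, int task,
			    bool is_blkdev)
{
	if (!h->claimed)
		return true;
	if (h->claimer_is_blkdev && is_blkdev)
		return true;
	return h->claimer == task;
}

/* Returns false where the kernel code would sleep and retry. */
static bool model_try_claim(struct model_host *h, int task, bool is_blkdev)
{
	if (!model_can_claim(h, task, is_blkdev))
		return false;
	h->claimed = true;
	h->claimer = task;
	h->claimer_is_blkdev = is_blkdev;
	h->claim_cnt += 1;
	return true;
}

/* Mirrors mmc_release_host(): only the last release drops the claim. */
static void model_release(struct model_host *h)
{
	assert(h->claimed && h->claim_cnt > 0);
	if (--h->claim_cnt == 0) {
		h->claimed = false;
		h->claimer = 0;
		h->claimer_is_blkdev = false;
	}
}
```

With this rule, a second block-device context joins an existing block-device claim and only bumps `claim_cnt`, while a non-blkdev caller still waits until the count drops back to zero.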

Comments

Adrian Hunter May 12, 2017, 8:36 a.m. UTC | #1
On 11/05/17 15:39, Ulf Hansson wrote:
> The current mmc block device implementation is tricky when it comes to
> claiming and releasing the host while processing I/O requests. In
> principle, we need to claim the host when the first request enters the
> queue, and then release the host as soon as the queue becomes empty. This
> complexity relates to the asynchronous request mechanism that the mmc block
> device driver implements.
>
> For the legacy block interface that we currently implement, the above
> issue can be addressed, as we can find out when the queue really becomes
> empty.
>
> However, checking whether the queue is empty isn't really an applicable
> method when using the new blk-mq interface, as requests are instead pushed
> to us via struct blk_mq_ops and its function pointers.

That is not entirely true. We can pull requests by running the queue, i.e.
blk_mq_run_hw_queues(q, false), returning BLK_MQ_RQ_QUEUE_BUSY, and
stopping/starting the queue as needed.
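A minimal user-space sketch of this pull model, assuming a single hardware queue and at most one request in flight on the host. `struct pull_hctx`, `pull_queue_rq()`, `pull_run_queue()` and `pull_complete()` are hypothetical stand-ins: `PULL_QUEUE_BUSY` plays the role of BLK_MQ_RQ_QUEUE_BUSY, and the `stopped` flag only approximates stopping and restarting a hardware queue. No real blk-mq calls are made.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for BLK_MQ_RQ_QUEUE_OK / BLK_MQ_RQ_QUEUE_BUSY. */
enum pull_ret { PULL_QUEUE_OK, PULL_QUEUE_BUSY };

struct pull_hctx {
	bool stopped;    /* approximates a stopped hardware queue */
	bool host_busy;  /* one request already in flight on the host */
	int pending;     /* requests still held by the block layer */
	int issued;      /* requests accepted by pull_queue_rq() */
};

/* ->queue_rq() stand-in: accept one request, or push back and stop. */
static enum pull_ret pull_queue_rq(struct pull_hctx *hctx)
{
	if (hctx->host_busy) {
		hctx->stopped = true;
		return PULL_QUEUE_BUSY;
	}
	hctx->host_busy = true;
	hctx->issued++;
	return PULL_QUEUE_OK;
}

/* Queue-run stand-in: dispatch until empty, stopped or pushed back. */
static void pull_run_queue(struct pull_hctx *hctx)
{
	while (!hctx->stopped && hctx->pending > 0) {
		if (pull_queue_rq(hctx) == PULL_QUEUE_BUSY)
			break;  /* request stays pending and is retried later */
		hctx->pending--;
	}
}

/* Completion: the host is free again, so restart and re-run the queue. */
static void pull_complete(struct pull_hctx *hctx)
{
	assert(hctx->host_busy);
	hctx->host_busy = false;
	hctx->stopped = false;
	pull_run_queue(hctx);
}
```

Starting with three pending requests, the first run issues one and stops the queue; each completion restarts it and pulls exactly one more request.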

But, as I have written before, we could start with the most trivial
implementation: ->queue_rq() puts the requests in a list and then the
thread removes them from the list.

That would be a good start because it would avoid having to deal with other
issues at the same time.
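The "most trivial implementation" can be sketched in user space with POSIX threads, assuming a FIFO list protected by a mutex and a single worker standing in for the existing mmc queue thread. All names (`struct list_queue`, `list_queue_rq()`, `list_thread()`, `list_stop()`) are hypothetical; the comment inside the worker marks where the real thread would claim the host and issue the request.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical request and queue types, not the mmc or blk-mq ones. */
struct list_req {
	int id;
	struct list_req *next;
};

struct list_queue {
	pthread_mutex_t lock;
	pthread_cond_t wake;
	struct list_req *head, *tail;
	bool stopping;
	int done;                 /* requests handled by the thread */
};

/* ->queue_rq() stand-in: runs in the submitter's context, appends the
 * request and wakes the thread. */
static void list_queue_rq(struct list_queue *q, struct list_req *rq)
{
	pthread_mutex_lock(&q->lock);
	rq->next = NULL;
	if (q->tail)
		q->tail->next = rq;
	else
		q->head = rq;
	q->tail = rq;
	pthread_cond_signal(&q->wake);
	pthread_mutex_unlock(&q->lock);
}

/* Queue thread stand-in: pops requests in FIFO order until stopped
 * and drained. Requests are assumed to be malloc()ed by submitters. */
static void *list_thread(void *arg)
{
	struct list_queue *q = arg;

	pthread_mutex_lock(&q->lock);
	for (;;) {
		while (!q->head && !q->stopping)
			pthread_cond_wait(&q->wake, &q->lock);
		if (!q->head)
			break;            /* stopping and list drained */
		struct list_req *rq = q->head;
		q->head = rq->next;
		if (!q->head)
			q->tail = NULL;
		pthread_mutex_unlock(&q->lock);
		/* ...claim host, issue rq to the card, release host... */
		free(rq);
		pthread_mutex_lock(&q->lock);
		q->done++;
	}
	pthread_mutex_unlock(&q->lock);
	return NULL;
}

/* Ask the thread to finish the remaining requests, then wait for it. */
static void list_stop(struct list_queue *q, pthread_t thr)
{
	pthread_mutex_lock(&q->lock);
	q->stopping = true;
	pthread_cond_signal(&q->wake);
	pthread_mutex_unlock(&q->lock);
	pthread_join(thr, NULL);
}
```

Because ->queue_rq() here only appends and signals, the submitter's context does about the same work as ->request_fn() does today, and the thread does the issuing.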

>
> Being able to support the asynchronous request method using the blk-mq
> interface means we have to allow the mmc block device driver to re-claim
> the host from different tasks/contexts, as we may have more than one
> request to operate upon.
>
> Therefore, let's extend the mmc_claim_host() API to support reference
> counting for the mmc block device.

Aren't you overlooking the possibility that there are many block devices per
host, i.e. one per eMMC internal partition?
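This concern can be illustrated with a reduced, hypothetical model of the patch's claim test, assuming each eMMC internal partition has its own block device and queue thread (so the same-task check never applies between them). Because `claimer_is_blkdev` does not record *which* block device holds the claim, a second partition's queue can join while the first still expects its own partition to be selected. `struct part_host`, `part_blkdev_claim()` and the `part`/`conflicts` fields are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: the patched claim state plus the partition each
 * block device expects to be selected on the card. */
struct part_host {
	bool claimed;
	bool claimer_is_blkdev;
	int claim_cnt;
	int part;        /* partition selected by the latest blkdev claimer */
	int conflicts;   /* joins by a blkdev targeting another partition */
};

/* Blkdev claim from a distinct task: per the patch, it proceeds whenever
 * the host is free or already claimed by any blkdev. */
static bool part_blkdev_claim(struct part_host *h, int part)
{
	if (h->claimed && !h->claimer_is_blkdev)
		return false;            /* the kernel code would sleep here */
	if (h->claimed && h->part != part)
		h->conflicts++;          /* two partitions now share one claim */
	h->claimed = true;
	h->claimer_is_blkdev = true;
	h->claim_cnt++;
	h->part = part;              /* stands in for a partition switch */
	return true;
}
```

The partition numbers in a usage run are arbitrary: claiming for partition 0 and then partition 1 succeeds twice and records one conflict, which is exactly the case the flag alone cannot prevent.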

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ulf Hansson May 15, 2017, 2:05 p.m. UTC | #2
On 12 May 2017 at 10:36, Adrian Hunter <adrian.hunter@intel.com> wrote:
> On 11/05/17 15:39, Ulf Hansson wrote:
>> The current mmc block device implementation is tricky when it comes to
>> claiming and releasing the host while processing I/O requests. In
>> principle, we need to claim the host when the first request enters the
>> queue, and then release the host as soon as the queue becomes empty. This
>> complexity relates to the asynchronous request mechanism that the mmc block
>> device driver implements.
>>
>> For the legacy block interface that we currently implement, the above
>> issue can be addressed, as we can find out when the queue really becomes
>> empty.
>>
>> However, checking whether the queue is empty isn't really an applicable
>> method when using the new blk-mq interface, as requests are instead pushed
>> to us via struct blk_mq_ops and its function pointers.
>
> That is not entirely true. We can pull requests by running the queue, i.e.
> blk_mq_run_hw_queues(q, false), returning BLK_MQ_RQ_QUEUE_BUSY, and
> stopping/starting the queue as needed.

I am not sure how that would work. It doesn't sound very efficient to
me, but I may be wrong.

>
> But, as I have written before, we could start with the most trivial
> implementation: ->queue_rq() puts the requests in a list and then the
> thread removes them from the list.

That would work...

>
> That would be a good start because it would avoid having to deal with other
> issues at the same time.

...however this doesn't seem like a step in the direction we want to
take when porting to blkmq.

There will be an extra context switch for each and every request, won't there?

My point is, to be able to convert to blkmq, we must also avoid
performance regressions, at least as far as possible.

>
>>
>> Being able to support the asynchronous request method using the blk-mq
>> interface means we have to allow the mmc block device driver to re-claim
>> the host from different tasks/contexts, as we may have more than one
>> request to operate upon.
>>
>> Therefore, let's extend the mmc_claim_host() API to support reference
>> counting for the mmc block device.
>
> Aren't you overlooking the possibility that there are many block devices per
> host, i.e. one per eMMC internal partition?

Right now, yes, you are right. Hopefully not for long. :-)

These internal eMMC partitions are today suffering from problems
similar to those we have for mmc ioctls: requests are I/O scheduled
separately for each internal partition, so requests for one partition
could starve requests for another.

I really hope we can fix this one way or another. Building upon Linus
Walleij's series for fixing the problems with mmc ioctls [1] is
probably the way to go.

Once we have managed to fix these issues, I think my approach of
extending the mmc_claim_host() API could be an intermediate step
towards completing the blkmq port. After that, we can continue trying
to remove/re-work the claim host lock altogether as an optimization
task.

Kind regards
Uffe

[1]
https://www.spinics.net/lists/linux-block/msg12677.html
Adrian Hunter May 16, 2017, 1:24 p.m. UTC | #3
On 15/05/17 17:05, Ulf Hansson wrote:
> On 12 May 2017 at 10:36, Adrian Hunter <adrian.hunter@intel.com> wrote:
>> On 11/05/17 15:39, Ulf Hansson wrote:
>>> The current mmc block device implementation is tricky when it comes to
>>> claiming and releasing the host while processing I/O requests. In
>>> principle, we need to claim the host when the first request enters the
>>> queue, and then release the host as soon as the queue becomes empty. This
>>> complexity relates to the asynchronous request mechanism that the mmc block
>>> device driver implements.
>>>
>>> For the legacy block interface that we currently implement, the above
>>> issue can be addressed, as we can find out when the queue really becomes
>>> empty.
>>>
>>> However, checking whether the queue is empty isn't really an applicable
>>> method when using the new blk-mq interface, as requests are instead pushed
>>> to us via struct blk_mq_ops and its function pointers.
>>
>> That is not entirely true. We can pull requests by running the queue, i.e.
>> blk_mq_run_hw_queues(q, false), returning BLK_MQ_RQ_QUEUE_BUSY, and
>> stopping/starting the queue as needed.
>
> I am not sure how that would work. It doesn't sound very efficient to
> me, but I may be wrong.

The queue depth is not the arbiter of whether we can issue a request. That
means there will certainly be times when we have to return
BLK_MQ_RQ_QUEUE_BUSY from ->queue_rq(), and perhaps stop the queue as well.

We could start with ->queue_rq() feeding every request to the existing
thread, and work towards having it submit requests immediately when
possible. Currently the mmc core cannot submit mmc_requests without
waiting, but the command queue implementation can, for read/write
requests, when the host controller and card are runtime resumed and the
card is switched to the correct internal partition (and we are not
currently discarding, flushing or recovering). So it might be simpler to
start with cmdq ;-)
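The conditions listed above for issuing a request directly from ->queue_rq() can be written down as a single predicate. This is a sketch only: `struct issue_state` and `can_issue_now()` are hypothetical, and the fields stand for state the mmc core tracks in other ways. When the predicate is false, the request would be handed to the thread or BUSY returned instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the state named above; these are not mmc core
 * fields, just one value per condition. */
struct issue_state {
	bool is_read_write;       /* plain read or write request */
	bool host_runtime_active; /* host controller is runtime resumed */
	bool card_runtime_active; /* card is runtime resumed */
	int current_part;         /* internal partition selected on the card */
	bool discarding;
	bool flushing;
	bool recovering;
};

/* True if a cmdq-style path could queue the request to the card straight
 * from ->queue_rq(); otherwise hand it to the thread or return busy. */
static bool can_issue_now(const struct issue_state *s, int target_part)
{
	assert(s != NULL);
	return s->is_read_write &&
	       s->host_runtime_active && s->card_runtime_active &&
	       s->current_part == target_part &&
	       !s->discarding && !s->flushing && !s->recovering;
}
```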

>
>>
>> But, as I have written before, we could start with the most trivial
>> implementation: ->queue_rq() puts the requests in a list and then the
>> thread removes them from the list.
>
> That would work...
>
>>
>> That would be a good start because it would avoid having to deal with other
>> issues at the same time.
>
> ...however this doesn't seem like a step in the direction we want to
> take when porting to blkmq.
>
> There will be an extra context switch for each and every request, won't there?

No, for synchronous requests, it would be the same as now. ->queue_rq()
would be called in the context of the submitter and would wake the thread
just like ->request_fn() does now.

>
> My point is, to be able to convert to blkmq, we must also avoid
> performance regressions, at least as far as possible.

It would still be better to start simple and measure the performance than
to guess where the bottlenecks are.

>
>>
>>>
>>> Being able to support the asynchronous request method using the blk-mq
>>> interface means we have to allow the mmc block device driver to re-claim
>>> the host from different tasks/contexts, as we may have more than one
>>> request to operate upon.
>>>
>>> Therefore, let's extend the mmc_claim_host() API to support reference
>>> counting for the mmc block device.
>>
>> Aren't you overlooking the possibility that there are many block devices per
>> host, i.e. one per eMMC internal partition?
>
> Right now, yes, you are right. Hopefully not for long. :-)
>
> These internal eMMC partitions are today suffering from problems
> similar to those we have for mmc ioctls: requests are I/O scheduled
> separately for each internal partition, so requests for one partition
> could starve requests for another.
>
> I really hope we can fix this one way or another. Building upon Linus
> Walleij's series for fixing the problems with mmc ioctls [1] is
> probably the way to go.
>
> Once we have managed to fix these issues, I think my approach of
> extending the mmc_claim_host() API could be an intermediate step
> towards completing the blkmq port. After that, we can continue trying
> to remove/re-work the claim host lock altogether as an optimization
> task.
>
> Kind regards
> Uffe
>
> [1]
> https://www.spinics.net/lists/linux-block/msg12677.html
>
Ulf Hansson May 16, 2017, 2:32 p.m. UTC | #4
On 16 May 2017 at 15:24, Adrian Hunter <adrian.hunter@intel.com> wrote:
> On 15/05/17 17:05, Ulf Hansson wrote:
>> On 12 May 2017 at 10:36, Adrian Hunter <adrian.hunter@intel.com> wrote:
>>> On 11/05/17 15:39, Ulf Hansson wrote:
>>>> The current mmc block device implementation is tricky when it comes to
>>>> claiming and releasing the host while processing I/O requests. In
>>>> principle, we need to claim the host when the first request enters the
>>>> queue, and then release the host as soon as the queue becomes empty. This
>>>> complexity relates to the asynchronous request mechanism that the mmc block
>>>> device driver implements.
>>>>
>>>> For the legacy block interface that we currently implement, the above
>>>> issue can be addressed, as we can find out when the queue really becomes
>>>> empty.
>>>>
>>>> However, checking whether the queue is empty isn't really an applicable
>>>> method when using the new blk-mq interface, as requests are instead pushed
>>>> to us via struct blk_mq_ops and its function pointers.
>>>
>>> That is not entirely true. We can pull requests by running the queue, i.e.
>>> blk_mq_run_hw_queues(q, false), returning BLK_MQ_RQ_QUEUE_BUSY, and
>>> stopping/starting the queue as needed.
>>
>> I am not sure how that would work. It doesn't sound very efficient to
>> me, but I may be wrong.
>
> The queue depth is not the arbiter of whether we can issue a request. That
> means there will certainly be times when we have to return
> BLK_MQ_RQ_QUEUE_BUSY from ->queue_rq(), and perhaps stop the queue as well.
>
> We could start with ->queue_rq() feeding every request to the existing
> thread, and work towards having it submit requests immediately when
> possible. Currently the mmc core cannot submit mmc_requests without
> waiting, but the command queue implementation can, for read/write
> requests, when the host controller and card are runtime resumed and the
> card is switched to the correct internal partition (and we are not
> currently discarding, flushing or recovering). So it might be simpler to
> start with cmdq ;-)

In the end I guess the only thing to do is to compare the patchsets to
see what the result would look like. :-)

My observation is that our current implementation of the mmc block
device and the corresponding mmc queue is still rather messy, even if
you and Linus have recently worked hard to improve the situation.

Moreover, it looks quite different from other block device drivers,
and that's not good when we are striving to make it more robust and
maintainable.

Therefore, I am not really comfortable with replacing one mmc hack for
block device management with yet another, which seems to be what your
approach would do, unless I am mistaken of course.

Instead, I would like us to move towards a more generic blk device
approach, whatever that means. :-)

>
>>
>>>
>>> But, as I have written before, we could start with the most trivial
>>> implementation: ->queue_rq() puts the requests in a list and then the
>>> thread removes them from the list.
>>
>> That would work...
>>
>>>
>>> That would be a good start because it would avoid having to deal with other
>>> issues at the same time.
>>
>> ...however this doesn't seem like a step in the direction we want to
>> take when porting to blkmq.
>>
>> There will be an extra context switch for each and every request, won't there?
>
> No, for synchronous requests, it would be the same as now. ->queue_rq()
> would be called in the context of the submitter and would wake the thread
> just like ->request_fn() does now.

You are right!

However, in my comparison I was thinking of how it *could* work if we
were able to submit/prepare requests in the context of the caller.

>
>>
>> My point is, to be able to convert to blkmq, we must also avoid
>> performance regressions, at least as far as possible.
>
> It would still be better to start simple and measure the performance than
> to guess where the bottlenecks are.

Yes, starting simple is always good!

That said, even if we start simple, we need to stop adding more
mmc-specific hacks to the mmc block device layer.

[...]

Kind regards
Uffe

Patch

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 0701e30..3633699 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -1019,12 +1019,12 @@  unsigned int mmc_align_data_size(struct mmc_card *card, unsigned int sz)
 EXPORT_SYMBOL(mmc_align_data_size);
 
 /**
- *	mmc_claim_host - exclusively claim a host
+ *	__mmc_claim_host - exclusively claim a host
  *	@host: mmc host to claim
  *
  *	Claim a host for a set of operations.
  */
-void mmc_claim_host(struct mmc_host *host)
+void __mmc_claim_host(struct mmc_host *host, bool is_blkdev)
 {
 	DECLARE_WAITQUEUE(wait, current);
 	unsigned long flags;
@@ -1036,7 +1036,11 @@  void mmc_claim_host(struct mmc_host *host)
 	spin_lock_irqsave(&host->lock, flags);
 	while (1) {
 		set_current_state(TASK_UNINTERRUPTIBLE);
-		if (!host->claimed || host->claimer == current)
+		if (!host->claimed)
+			break;
+		if (host->claimer_is_blkdev && is_blkdev)
+			break;
+		if (host->claimer == current)
 			break;
 		spin_unlock_irqrestore(&host->lock, flags);
 		schedule();
@@ -1045,6 +1049,7 @@  void mmc_claim_host(struct mmc_host *host)
 	set_current_state(TASK_RUNNING);
 	host->claimed = 1;
 	host->claimer = current;
+	host->claimer_is_blkdev = is_blkdev;
 	host->claim_cnt += 1;
 	if (host->claim_cnt == 1)
 		pm = true;
@@ -1054,7 +1059,7 @@  void mmc_claim_host(struct mmc_host *host)
 	if (pm)
 		pm_runtime_get_sync(mmc_dev(host));
 }
-EXPORT_SYMBOL(mmc_claim_host);
+EXPORT_SYMBOL(__mmc_claim_host);
 
 /**
  *	mmc_release_host - release a host
@@ -1076,6 +1081,7 @@  void mmc_release_host(struct mmc_host *host)
 	} else {
 		host->claimed = 0;
 		host->claimer = NULL;
+		host->claimer_is_blkdev = 0;
 		spin_unlock_irqrestore(&host->lock, flags);
 		wake_up(&host->wq);
 		pm_runtime_mark_last_busy(mmc_dev(host));
diff --git a/drivers/mmc/core/core.h b/drivers/mmc/core/core.h
index b247b1f..1598a37 100644
--- a/drivers/mmc/core/core.h
+++ b/drivers/mmc/core/core.h
@@ -122,9 +122,14 @@  int mmc_set_blocklen(struct mmc_card *card, unsigned int blocklen);
 int mmc_set_blockcount(struct mmc_card *card, unsigned int blockcount,
 			bool is_rel_write);
 
-void mmc_claim_host(struct mmc_host *host);
+void __mmc_claim_host(struct mmc_host *host, bool is_blkdev);
 void mmc_release_host(struct mmc_host *host);
 void mmc_get_card(struct mmc_card *card);
 void mmc_put_card(struct mmc_card *card);
 
+static inline void mmc_claim_host(struct mmc_host *host)
+{
+	__mmc_claim_host(host, 0);
+}
+
 #endif
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index 8a4131f..7199817 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -347,6 +347,7 @@  struct mmc_host {
 
 	wait_queue_head_t	wq;
 	struct task_struct	*claimer;	/* task that has host claimed */
+	bool			claimer_is_blkdev; /* claimer is blkdev */
 	int			claim_cnt;	/* "claim" nesting count */
 
 	struct delayed_work	detect;