[2/2] net/ixgbe: calculate correct number of received packets for ARM NEON-version vPMD

Message ID 1482127758-4904-2-git-send-email-jianbo.liu@linaro.org
State Superseded

Commit Message

Jianbo Liu Dec. 19, 2016, 6:09 a.m. UTC
vPMD checks four descriptors at a time, but their statuses are not guaranteed
to be consistent, because the memory allocated for the RX descriptors is a
cacheable hugepage.
This patch calculates the number of received packets by scanning the DD bits
sequentially, stopping at the first descriptor whose DD bit is unset.

Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org>

---
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

-- 
2.4.11
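
For readers skimming the diff, the counting logic this patch introduces can
be sketched on its own (a minimal sketch, assuming the driver's convention
that stat packs one status byte per descriptor with the DD flag in bit 0;
the function name is illustrative):

	#include <stdint.h>

	/* Count ready descriptors by scanning DD bits from the lowest
	 * byte upward, stopping at the first descriptor whose DD bit is
	 * unset, so a stale "hole" in the statuses cannot be skipped.
	 */
	static inline uint32_t
	count_contiguous_dd(uint32_t stat)
	{
		uint32_t nb = 0;

		while (stat & 0x01) {	/* DD bit of the current descriptor */
			++nb;
			stat >>= 8;	/* move to the next descriptor's byte */
		}
		return nb;
	}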

Comments

Jerin Jacob Dec. 21, 2016, 10:08 a.m. UTC | #1
On Mon, Dec 19, 2016 at 11:39:18AM +0530, Jianbo Liu wrote:

Hi Jianbo,

> vPMD checks four descriptors at a time, but their statuses are not
> guaranteed to be consistent, because the memory allocated for the RX
> descriptors is a cacheable hugepage.

Is it different in the x86 case? i.e. does x86 create non-cacheable hugepages?
I am just looking at what it takes to fix similar issues for all drivers wrt armv8.

Are you able to reproduce this issue on any armv8 platform? If so, could
you please share the platform details and the commands to reproduce it?

> This patch calculates the number of received packets by scanning the DD
> bits sequentially, stopping at the first descriptor whose DD bit is unset.
>
> Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org>
> ---
>  drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
> index f96cc85..0b1338d 100644
> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
> @@ -196,7 +196,6 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
>  	struct ixgbe_rx_entry *sw_ring;
>  	uint16_t nb_pkts_recd;
>  	int pos;
> -	uint64_t var;
>  	uint8x16_t shuf_msk = {
>  		0xFF, 0xFF,
>  		0xFF, 0xFF,  /* skip 32 bits pkt_type */
> @@ -255,6 +254,7 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
>  		uint64x2_t mbp1, mbp2;
>  		uint8x16_t staterr;
>  		uint16x8_t tmp;
> +		uint32_t var = 0;
>  		uint32_t stat;
>  
>  		/* B.1 load 1 mbuf point */
> @@ -349,11 +349,19 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
>  		vst1q_u8((uint8_t *)&rx_pkts[pos]->rx_descriptor_fields1,
>  			 pkt_mb1);
>  
> +		stat &= IXGBE_VPMD_DESC_DD_MASK;
> +
>  		/* C.4 calc avaialbe number of desc */
> -		var =  __builtin_popcount(stat & IXGBE_VPMD_DESC_DD_MASK);
> -		nb_pkts_recd += var;
> -		if (likely(var != RTE_IXGBE_DESCS_PER_LOOP))
> +		if (likely(stat != IXGBE_VPMD_DESC_DD_MASK)) {
> +			while (stat & 0x01) {
> +				++var;
> +				stat = stat >> 8;
> +			}
> +			nb_pkts_recd += var;
>  			break;
> +		} else {
> +			nb_pkts_recd += RTE_IXGBE_DESCS_PER_LOOP;
> +		}
>  	}
>  
>  	/* Update our internal tail pointer */
> -- 
> 2.4.11
>
Bruce Richardson Dec. 21, 2016, 11:03 a.m. UTC | #2
On Wed, Dec 21, 2016 at 03:38:51PM +0530, Jerin Jacob wrote:
> On Mon, Dec 19, 2016 at 11:39:18AM +0530, Jianbo Liu wrote:
>
> Hi Jianbo,
>
> > vPMD checks four descriptors at a time, but their statuses are not
> > guaranteed to be consistent, because the memory allocated for the RX
> > descriptors is a cacheable hugepage.
> Is it different in the x86 case? i.e. does x86 create non-cacheable hugepages?

This is not a problem on IA, because the instruction ordering rules on
IA guarantee that the reads will be done in the correct program order,
and we never get stale cache data.

/Bruce
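
To make the contrast concrete: on armv8, independent loads can complete out
of order, so a reader that trusts a later descriptor's DD bit has to order
that check against the loads of the earlier descriptors. A schematic sketch,
not the driver's actual fix (rte_rmb() is DPDK's read barrier; rxdp is
assumed to point into the descriptor ring):

	#include <rte_atomic.h>		/* rte_rmb() */

	/* If the DD bit of a later descriptor is observed set, a read
	 * barrier is needed before loading the earlier descriptors'
	 * contents, because independent loads can be reordered and
	 * served from cachelines of different ages.
	 */
	uint32_t stat = rxdp[3].wb.upper.status_error;
	if (stat & IXGBE_RXDADV_STAT_DD) {
		rte_rmb();
		/* ... now safe to read the rxdp[0..3] fields ... */
	}
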
Jianbo Liu Dec. 22, 2016, 1:05 a.m. UTC | #3
Hi Jerin,

On 21 December 2016 at 18:08, Jerin Jacob
<jerin.jacob@caviumnetworks.com> wrote:
> On Mon, Dec 19, 2016 at 11:39:18AM +0530, Jianbo Liu wrote:
>
> Hi Jianbo,
>
>> vPMD checks four descriptors at a time, but their statuses are not
>> guaranteed to be consistent, because the memory allocated for the RX
>> descriptors is a cacheable hugepage.
> Is it different in the x86 case? i.e. does x86 create non-cacheable hugepages?
> I am just looking at what it takes to fix similar issues for all drivers wrt armv8.
>
> Are you able to reproduce this issue on any armv8 platform? If so, could
> you please share the platform details and the commands to reproduce it?
>


I have tested on a Huawei D03 and on a SoftIron box, both with Intel X540
NICs, and see the same issue on both.
The setup is very simple: loop the two ports back to back, then run testpmd.
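
For reference, a reproduction of that shape looks roughly like this (core
mask, memory channels, and the bound driver are illustrative, not the exact
options used on those boards):

	# ports wired back to back and bound to a DPDK-capable driver, then:
	./testpmd -c 0x7 -n 4 -- -i
	testpmd> start tx_first
	testpmd> show port stats all    # miscounted RX bursts show up here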
Jianbo Liu Dec. 22, 2016, 1:18 a.m. UTC | #4
On 21 December 2016 at 19:03, Bruce Richardson
<bruce.richardson@intel.com> wrote:
> On Wed, Dec 21, 2016 at 03:38:51PM +0530, Jerin Jacob wrote:
>> On Mon, Dec 19, 2016 at 11:39:18AM +0530, Jianbo Liu wrote:
>>
>> Hi Jianbo,
>>
>> > vPMD checks four descriptors at a time, but their statuses are not
>> > guaranteed to be consistent, because the memory allocated for the RX
>> > descriptors is a cacheable hugepage.
>> Is it different in the x86 case? i.e. does x86 create non-cacheable hugepages?
>
> This is not a problem on IA, because the instruction ordering rules on
> IA guarantee that the reads will be done in the correct program order,
> and we never get stale cache data.
>

Yes, I think it's an issue for the ARM architecture.
It's because more than one cacheline's worth of data is read at a time in
the bulk-alloc RX path and in vPMD (the 4/8 descriptors involved can sit in
two cachelines).
There is the same issue for i40e; I'll send the same patch for it later.
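
The geometry behind that, as a back-of-the-envelope sketch (the descriptor
type is the driver's, the arithmetic is mine):

	#include <assert.h>

	#define IXGBE_RX_DESC_SIZE 16	/* sizeof(union ixgbe_adv_rx_desc) */

	/* 4 descriptors per vPMD loop cover 64 B and 8 per bulk-alloc scan
	 * cover 128 B, so with 64 B cachelines one scan can span two lines.
	 * The NIC writes the two lines back at different times, so the line
	 * holding the later descriptors can be newer in this core's cache
	 * than the line holding the earlier ones.
	 */
	static_assert(8 * IXGBE_RX_DESC_SIZE == 2 * 64,
		      "an 8-descriptor scan spans two 64 B cachelines");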

Patch

diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
index f96cc85..0b1338d 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
@@ -196,7 +196,6 @@  _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 	struct ixgbe_rx_entry *sw_ring;
 	uint16_t nb_pkts_recd;
 	int pos;
-	uint64_t var;
 	uint8x16_t shuf_msk = {
 		0xFF, 0xFF,
 		0xFF, 0xFF,  /* skip 32 bits pkt_type */
@@ -255,6 +254,7 @@  _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		uint64x2_t mbp1, mbp2;
 		uint8x16_t staterr;
 		uint16x8_t tmp;
+		uint32_t var = 0;
 		uint32_t stat;
 
 		/* B.1 load 1 mbuf point */
@@ -349,11 +349,19 @@  _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		vst1q_u8((uint8_t *)&rx_pkts[pos]->rx_descriptor_fields1,
 			 pkt_mb1);
 
+		stat &= IXGBE_VPMD_DESC_DD_MASK;
+
 		/* C.4 calc avaialbe number of desc */
-		var =  __builtin_popcount(stat & IXGBE_VPMD_DESC_DD_MASK);
-		nb_pkts_recd += var;
-		if (likely(var != RTE_IXGBE_DESCS_PER_LOOP))
+		if (likely(stat != IXGBE_VPMD_DESC_DD_MASK)) {
+			while (stat & 0x01) {
+				++var;
+				stat = stat >> 8;
+			}
+			nb_pkts_recd += var;
 			break;
+		} else {
+			nb_pkts_recd += RTE_IXGBE_DESCS_PER_LOOP;
+		}
 	}
 
 	/* Update our internal tail pointer */
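
To see why the old popcount-based count goes wrong once the statuses are torn
across cachelines, consider a hypothetical staterr snapshot in which
descriptor 1's DD write-back has not reached this core yet (the value is
illustrative):

	/* stat packs one status byte per descriptor, DD in bit 0 of each:
	 *   descriptor:  3 2 1 0
	 *   DD bit:      1 1 0 1   ->  stat = 0x01010001
	 */
	uint32_t stat = 0x01010001;	/* already masked with IXGBE_VPMD_DESC_DD_MASK */

	__builtin_popcount(stat);	/* == 3: reports 3 packets and hands
					 * the application descriptor 1's
					 * stale, not-yet-written mbuf */

	/* The sequential scan stops at the hole after descriptor 0 and
	 * reports 1; descriptors 1-3 are picked up on a later poll.
	 */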