
[3/3] net: hisilicon: new hip04 ethernet driver

Message ID 1396672506-9296-4-git-send-email-zhangfei.gao@linaro.org
State Changes Requested

Commit Message

Zhangfei Gao April 5, 2014, 4:35 a.m. UTC
Support the Hisilicon hip04 ethernet driver, including the 100M / 1000M controller

Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
---
 drivers/net/ethernet/hisilicon/Makefile    |    2 +-
 drivers/net/ethernet/hisilicon/hip04_eth.c |  777 ++++++++++++++++++++++++++++
 2 files changed, 778 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/hisilicon/hip04_eth.c

Comments

David Miller April 7, 2014, 6:53 p.m. UTC | #1
From: Zhangfei Gao <zhangfei.gao@linaro.org>
Date: Sat,  5 Apr 2014 12:35:06 +0800

> +struct tx_desc {
> +	u32 send_addr;
> +	u16 reserved_16;
> +	u16 send_size;
> +	u32 reserved_32;
> +	u32 cfg;
> +	u32 wb_addr;
> +} ____cacheline_aligned;

I do not think that ____cacheline_aligned is appropriate at all here.

First of all, this is a hardware descriptor, so it has a fixed layout
and therefore size.

Secondly, unless you declare this object statically in the data section
of the object file, the alignment doesn't matter.  These descriptors
are always dynamically allocated, rather than instantiated in the
kernel/driver image.

> +	val = (duplex) ? BIT(0) : 0;

Parentheses around duplex are not necessary; please remove them.

> +static void hip04_reset_ppe(struct hip04_priv *priv)
> +{
> +	u32 val, tmp;
> +
> +	do {
> +		regmap_read(priv->map, priv->port * 4 + PPE_CURR_BUF_CNT, &val);
> +		regmap_read(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, &tmp);
> +	} while (val & 0xfff);
> +}

This polling loop can loop forever if the condition never triggers.
You must add some kind of limit or timeout, and subsequent error
handling up the call chain to handle this.
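
A sketch of the kind of bounded loop being asked for; the iteration
limit, the int return type, and the -ETIMEDOUT convention are
illustrative assumptions, not part of the submitted driver:

	static int hip04_reset_ppe(struct hip04_priv *priv)
	{
		u32 val, tmp;
		int timeout = 10000;	/* arbitrary bound, tune to the hardware */

		do {
			regmap_read(priv->map, priv->port * 4 + PPE_CURR_BUF_CNT, &val);
			regmap_read(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, &tmp);
			if (--timeout < 0)
				return -ETIMEDOUT;	/* callers must propagate this */
		} while (val & 0xfff);

		return 0;
	}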

> +	val = readl_relaxed(priv->base + PPE_CFG_STS_MODE);
> +	val |= BIT(12);			/* PPE_HIS_RX_PKT_CNT read clear */
> +	writel_relaxed(val, priv->base + PPE_CFG_STS_MODE);
 ...
> +	/* set bus ctrl */
> +	val = BIT(14);			/* buffer locally release */
> +	val |= BIT(0);			/* big endian */
> +	writel_relaxed(val, priv->base + PPE_CFG_BUS_CTRL_REG);

Instead of having to set only one bit at a time in every register and
adding comments here, just define these register values using macros
properly in a header file or similar.

Then you can go val |= PPE_CFG_BUS_CTRL_VAL_THIS | PPE_CFG_BUS_CTRL_VAL_THAT
on one line.

Document the registers where you define the macros; that way people can learn
what other bits are in these registers and what they mean, even if you don't
currently use them in the driver itself.
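
For example (a sketch only: bits 0 and 14 are taken from the driver's
own comments, and the macro names are invented for illustration):

	/* PPE_CFG_BUS_CTRL_REG: bus control */
	#define PPE_CFG_BUS_BIG_ENDIAN		BIT(0)	/* descriptors are big endian */
	#define PPE_CFG_BUS_LOCAL_RELEASE	BIT(14)	/* buffer locally release */

	val = PPE_CFG_BUS_LOCAL_RELEASE | PPE_CFG_BUS_BIG_ENDIAN;
	writel_relaxed(val, priv->base + PPE_CFG_BUS_CTRL_REG);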

> +static void hip04_tx_reclaim(struct net_device *ndev, bool force)
 ...
> +static void hip04_xmit_timer(unsigned long data)
> +{
> +	struct net_device *ndev = (void *) data;
> +
> +	hip04_tx_reclaim(ndev, false);
> +}
 ...
> +	mod_timer(&priv->txtimer, jiffies + RECLAIM_PERIOD);

And this is where I stop reading your driver; I've stated already that this
kind of reclaim scheme is unacceptable.

The kernel timers lack the granularity necessary to service TX reclaim
with a reasonable amount of latency.

You must use some kind of hardware notification of TX slots becoming
available. I find it totally impossible that a modern ethernet controller
was created without a TX done interrupt.

David Miller April 7, 2014, 6:56 p.m. UTC | #2
From: Zhangfei Gao <zhangfei.gao@linaro.org>
Date: Sat,  5 Apr 2014 12:35:06 +0800

> +#define DESC_DEF_CFG			0x14

You absolutely cannot do this.

You must document what the bits in the TX descriptor config field
mean, all of them.

I bet there is a bit in there somewhere which tells the chip to signal
an interrupt when the packet has been sent.

But since you haven't documented the descriptor fields properly with
a full set of macro defines, we can't know what bit that is.
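
For the record, 0x14 sets bits 2 and 4; a documented version would at
least spell that out (the names below are placeholders, because the real
meanings are exactly what is missing):

	#define TX_DESC_CFG_BIT2	BIT(2)	/* meaning undocumented */
	#define TX_DESC_CFG_BIT4	BIT(4)	/* meaning undocumented */
	#define DESC_DEF_CFG		(TX_DESC_CFG_BIT2 | TX_DESC_CFG_BIT4)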

I really am very disappointed in the quality of this driver, and you
can expect that there will be a lot of push back and requests for
changes before this driver will be even close to being ready for
inclusion.
Zhangfei Gao April 8, 2014, 8:07 a.m. UTC | #3
Dear David

On 04/08/2014 02:53 AM, David Miller wrote:
> From: Zhangfei Gao <zhangfei.gao@linaro.org>
> Date: Sat,  5 Apr 2014 12:35:06 +0800
>
>> +struct tx_desc {
>> +	u32 send_addr;
>> +	u16 reserved_16;
>> +	u16 send_size;
>> +	u32 reserved_32;
>> +	u32 cfg;
>> +	u32 wb_addr;
>> +} ____cacheline_aligned;
>
> I do not think that ____cacheline_aligned is appropriate at all here.
>
> First of all, this is a hardware descriptor, so it has a fixed layout
> and therefore size.
>
> Secondly, unless you declare this object statically in the data section
> of the object file, the alignment doesn't matter.  These descriptors
> are always dynamically allocated, rather than instantiated in the
> kernel/driver image.

The ____cacheline_aligned here is only used to satisfy the alignment 
requirement, together with dma_alloc_coherent; at first a dma_pool was 
used to get the alignment.
Otherwise desc[1] would not be aligned and could not be used directly, 
since the structure is smaller than the alignment.

>
>> +	val = (duplex) ? BIT(0) : 0;
>
> Parentheses around duplex are not necessary; please remove them.
OK
>
>> +static void hip04_reset_ppe(struct hip04_priv *priv)
>> +{
>> +	u32 val, tmp;
>> +
>> +	do {
>> +		regmap_read(priv->map, priv->port * 4 + PPE_CURR_BUF_CNT, &val);
>> +		regmap_read(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, &tmp);
>> +	} while (val & 0xfff);
>> +}
>
> This polling loop can loop forever if the condition never triggers.
> You must add some kind of limit or timeout, and subsequent error
> handling up the call chain to handle this.
OK
>
>> +	val = readl_relaxed(priv->base + PPE_CFG_STS_MODE);
>> +	val |= BIT(12);			/* PPE_HIS_RX_PKT_CNT read clear */
>> +	writel_relaxed(val, priv->base + PPE_CFG_STS_MODE);
>   ...
>> +	/* set bus ctrl */
>> +	val = BIT(14);			/* buffer locally release */
>> +	val |= BIT(0);			/* big endian */
>> +	writel_relaxed(val, priv->base + PPE_CFG_BUS_CTRL_REG);
>
> Instead of having to set only one bit at a time in every register and
> adding comments here, just define these register values using macros
> properly in a header file or similar.
>
> Then you can go val |= PPE_CFG_BUS_CTRL_VAL_THIS | PPE_CFG_BUS_CTRL_VAL_THAT
> on one line.
>
> Document the registers where you define the macros; that way people can learn
> what other bits are in these registers and what they mean, even if you don't
> currently use them in the driver itself.
OK, got it.
However, some bits are not planned to be documented, since they are used 
internally and may be removed later.

>
>> +static void hip04_tx_reclaim(struct net_device *ndev, bool force)
>   ...
>> +static void hip04_xmit_timer(unsigned long data)
>> +{
>> +	struct net_device *ndev = (void *) data;
>> +
>> +	hip04_tx_reclaim(ndev, false);
>> +}
>   ...
>> +	mod_timer(&priv->txtimer, jiffies + RECLAIM_PERIOD);
>
> And this is where I stop reading your driver; I've stated already that this
> kind of reclaim scheme is unacceptable.
>
> The kernel timers lack the granularity necessary to service TX reclaim
> with a reasonable amount of latency.
>
> You must use some kind of hardware notification of TX slots becoming
> available. I find it totally impossible that a modern ethernet controller
> was created without a TX done interrupt.
>

I am sorry, but unfortunately this chip really does NOT have a TX done 
interrupt; I have checked with the hardware designers many times.
The next chip series will add a TX done interrupt according to this 
feedback.

There were two reasons for removing the TX done interrupt when the chip 
was designed:
1. The specific product does not care about latency, only throughput.
2. In many experiments the TX done interrupt hurt throughput, so reclaim 
was moved into xmit as an optimization, and finally the TX done 
interrupt was removed entirely.

Is it acceptable to remove the timer as well as the latency handling, or 
is there any other workaround for this kind of hardware?

Thanks

David Laight April 8, 2014, 8:30 a.m. UTC | #4
From: zhangfei [mailto:zhangfei.gao@linaro.org]
> On 04/08/2014 02:53 AM, David Miller wrote:
> > From: Zhangfei Gao <zhangfei.gao@linaro.org>
> > Date: Sat,  5 Apr 2014 12:35:06 +0800
> >
> >> +struct tx_desc {
> >> +	u32 send_addr;
> >> +	u16 reserved_16;
> >> +	u16 send_size;

The above doesn't look right for endianness independence.
I'd guess the hardware spec shows a 32bit word with the 'send size'
in one half - that is what you need to define.

Since this is a tx descriptor (and written by the host) you
can't have a 'reserved' field - the host has to write it.
Probably these are 'must be zero' fields.
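
In practice that means clearing the whole descriptor before filling it
in; a sketch reusing the driver's own xmit fields:

	memset(desc, 0, sizeof(*desc));	/* reserved words are must-be-zero */
	desc->send_addr = cpu_to_be32(phys);
	desc->send_size = cpu_to_be16(skb->len);
	desc->cfg = cpu_to_be32(DESC_DEF_CFG);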

> >> +	u32 reserved_32;
> >> +	u32 cfg;
> >> +	u32 wb_addr;
> >> +} ____cacheline_aligned;
> >
> > I do not think that ____cacheline_aligned is appropriate at all here.
> >
> > First of all, this is a hardware descriptor, so it has a fixed layout
> > and therefore size.

The structure also isn't even a multiple of a power of two.
So there will be implicit padding at the end.

Since there isn't a 'pointer to next' I presume the hardware accesses
the descriptors from adjacent physical addresses.
So you need to explicitly pad to that size.
If the cache line size were 128 bytes the above wouldn't work at all.

> >
> > Secondly, unless you declare this object statically in the data section
> > of the object file, the alignment doesn't matter.  These descriptors
> > are always dynamically allocated, rather than instantiated in the
> > kernel/driver image.
> 
> The ____cacheline_aligned here is only used to satisfy the alignment
> requirement, together with dma_alloc_coherent; at first a dma_pool was
> used to get the alignment.
> Otherwise desc[1] would not be aligned and could not be used directly,
> since the structure is smaller than the alignment.

It sounds like you should be explicitly padding the structure
to 32 bytes - whether or not that is the cache line size.

...
> I am sorry, but unfortunately this chip really does NOT have a TX done
> interrupt; I have checked with the hardware designers many times.
> The next chip series will add a TX done interrupt according to this
> feedback.
> 
> There were two reasons for removing the TX done interrupt when the chip
> was designed:
> 1. The specific product does not care about latency, only throughput.
> 2. In many experiments the TX done interrupt hurt throughput, so reclaim
> was moved into xmit as an optimization, and finally the TX done
> interrupt was removed entirely.
> 
> Is it acceptable to remove the timer as well as the latency handling,
> or is there any other workaround for this kind of hardware?

If you don't have a global 'TX done' interrupt, you need a per
descriptor one.
Otherwise you cannot send at maximum rate in the absence of
receive traffic.

	David



Arnd Bergmann April 8, 2014, 9:42 a.m. UTC | #5
On Tuesday 08 April 2014 08:30:37 David Laight wrote:
> From: zhangfei [mailto:zhangfei.gao@linaro.org]
> > On 04/08/2014 02:53 AM, David Miller wrote:
> > > From: Zhangfei Gao <zhangfei.gao@linaro.org>
> > > Date: Sat,  5 Apr 2014 12:35:06 +0800
> > >
> > >> +struct tx_desc {
> > >> +  u32 send_addr;
> > >> +  u16 reserved_16;
> > >> +  u16 send_size;
> 
> The above doesn't look right for endianness independence.
> I'd guess the hardware spec shows a 32bit word with the 'send size'
> in one half - that is what you need to define.

It's probably good to use __be32 as the type (or possibly __be16,
depending what the reserved field actually is), to annotate the
fact that the device reads these as big-endian values from memory,
regardless of the CPU endianness.
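
A sketch of the annotated layout (whether reserved_16 stays a separate
__be16 or merges with send_size into a single __be32 depends on what the
hardware actually defines):

	struct tx_desc {
		__be32 send_addr;
		__be16 reserved_16;
		__be16 send_size;
		__be32 reserved_32;
		__be32 cfg;
		__be32 wb_addr;
	};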

> > >> +  u32 reserved_32;
> > >> +  u32 cfg;
> > >> +  u32 wb_addr;
> > >> +} ____cacheline_aligned;
> > >
> > > I do not think that ____cacheline_aligned is appropriate at all here.
> > >
> > > First of all, this is a hardware descriptor, so it has a fixed layout
> > > and therefore size.
> 
> The structure also isn't even a multiple of a power of two.
> So there will be implicit padding at the end.
> 
> Since there isn't a 'pointer to next' I presume the hardware accesses
> the descriptors from adjacent physical addresses.

My understanding is that in this device, the "ppe" gets a pointer
to the bus address of the descriptor through an MMIO write, and
has a hardware FIFO to keep track of the ones it still needs to
manage. The PPE is shared across multiple ethernet devices.

On a related note, it would be good if Zhangfei could add a comment
in the code to explain how the ppe avoids a FIFO overrun, because
it's not clear from the driver source.

> So you need to explicitly pad to that size.
> If the cache line size were 128 bytes the above wouldn't work at all.

I originally recommended doing this, as a simplification from the
prior code that was using dma_pool_create with the cache line size
as the required alignment. While the device is used only in ARM SoCs
and ARM has a fixed cache line size of 32 bytes, you are obviously
correct that this is not something we want to rely on in a driver,
and it should use either explicit padding, or
__attribute__((__aligned__(32))) to enforce the implicit padding.
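
A sketch of the explicitly padded form; the 32-byte size follows the
discussion above and is an assumption about the hardware's real
requirement:

	struct tx_desc {
		u32 send_addr;
		u16 reserved_16;
		u16 send_size;
		u32 reserved_32;
		u32 cfg;
		u32 wb_addr;
		u32 pad[3];	/* explicit pad from 20 to 32 bytes */
	};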

> > I am sorry, but unfortunately this chip really does NOT have a TX done
> > interrupt; I have checked with the hardware designers many times.
> > The next chip series will add a TX done interrupt according to this
> > feedback.
> > 
> > There were two reasons for removing the TX done interrupt when the chip
> > was designed:
> > 1. The specific product does not care about latency, only throughput.
> > 2. In many experiments the TX done interrupt hurt throughput, so reclaim
> > was moved into xmit as an optimization, and finally the TX done
> > interrupt was removed entirely.

Both of these statements are clearly wrong; you have to stop bringing
these up now and work on explaining to the hardware designers what
mistake they made. This cannot be a design decision; it can only be
a bug that has to be fixed before a new version of the chip is rolled
out!

Zhangfei, do not post another version of this driver until you are
sure you have understood the problem and explained it to the hardware
designers!
Please reread what I wrote to you in the past on IRC, and find me
there again if you still have questions.

> > Is it acceptable to remove the timer as well as the latency handling,
> > or is there any other workaround for this kind of hardware?
> 
> If you don't have a global 'TX done' interrupt, you need a per
> descriptor one.
> Otherwise you cannot send at maximum rate in the absence of
> receive traffic.

Zhangfei has already talked to the hardware designers a few times,
and from what I understood, they have just not considered how
software is going to use this device, and they are too shy to
admit their mistake, while Zhangfei is trying to take the blame for
them by claiming that it works as designed. This is a very nice
gesture of him, but I'm afraid it is counterproductive to getting
the driver merged.

	Arnd
Zhangfei Gao April 8, 2014, 2:47 p.m. UTC | #6
Dear David,

On 04/08/2014 04:30 PM, David Laight wrote:
> From: zhangfei [mailto:zhangfei.gao@linaro.org]
>> On 04/08/2014 02:53 AM, David Miller wrote:
>>> From: Zhangfei Gao <zhangfei.gao@linaro.org>
>>> Date: Sat,  5 Apr 2014 12:35:06 +0800
>>>
>>>> +struct tx_desc {
>>>> +	u32 send_addr;
>>>> +	u16 reserved_16;
>>>> +	u16 send_size;
>
> The above doesn't look right for endianness independence.
> I'd guess the hardware spec shows a 32bit word with the 'send size'
> in one half - that is what you need to define.
>
> Since this is a tx descriptor (and written by the host) you
> can't have a 'reserved' field - the host has to write it.
> Probably these are 'must be zero' fields.

Yes, it is not endianness independent.
In fact, we swapped the two u16 fields in the layout to handle the 
endianness conversion.

The reserved_16 field is simply unused, so it is simpler to define a 
single u32 here.
If the upper 16 bits also need to be set, we would usually still use a 
u32 and compose the value dynamically, right?

>
>>>> +	u32 reserved_32;
>>>> +	u32 cfg;
>>>> +	u32 wb_addr;
>>>> +} ____cacheline_aligned;
>>>
>>> I do not think that ____cacheline_aligned is appropriate at all here.
>>>
>>> First of all, this is a hardware descriptor, so it has a fixed layout
>>> and therefore size.
>
> The structure also isn't even a multiple of a power of two.
> So there will be implicit padding at the end.
>
> Since there isn't a 'pointer to next' I presume the hardware accesses
> the descriptors from adjacent physical addresses.
> So you need to explicitly pad to that size.
> If the cache line size were 128 bytes the above wouldn't work at all.
Yes, __aligned(64) could be used here, though I thought using 64 
directly was not good.
The requirement is that the descriptor address be aligned to 0x40, since 
the descriptor's physical address is written to a register of which only 
bits [31:6] are used.

>
>>>
>>> Secondly, unless you declare this object statically in the data section
>>> of the object file, the alignment doesn't matter.  These descriptors
>>> are always dynamically allocated, rather than instantiated in the
>>> kernel/driver image.
>>
>> The ____cacheline_aligned here is only used to satisfy the alignment
>> requirement, together with dma_alloc_coherent; at first a dma_pool was
>> used to get the alignment.
>> Otherwise desc[1] would not be aligned and could not be used directly,
>> since the structure is smaller than the alignment.
>
> It sounds like you should be explicitly padding the structure
> to 32 bytes - whether or not that is the cache line size.
Got it, I understand now.

>
> ...
>> I am sorry, but unfortunately this chip really does NOT have a TX done
>> interrupt; I have checked with the hardware designers many times.
>> The next chip series will add a TX done interrupt according to this
>> feedback.
>>
>> There were two reasons for removing the TX done interrupt when the chip
>> was designed:
>> 1. The specific product does not care about latency, only throughput.
>> 2. In many experiments the TX done interrupt hurt throughput, so reclaim
>> was moved into xmit as an optimization, and finally the TX done
>> interrupt was removed entirely.
>>
>> Is it acceptable to remove the timer as well as the latency handling,
>> or is there any other workaround for this kind of hardware?
>
> If you don't have a global 'TX done' interrupt, you need a per
> descriptor one.
> Otherwise you cannot send at maximum rate in the absence of
> receive traffic.
>

A global 'TX done' interrupt means an interrupt for a descriptor chain 
(several descriptors linked together), right?
There is no interrupt for either a descriptor chain or a single 
descriptor.

By the way, with a per-descriptor interrupt, could it be optimized like 
NAPI: disable the interrupt and only re-enable it after all buffers have 
been reclaimed?
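
What is being asked about is the standard NAPI mitigation; a sketch
assuming a per-descriptor TX done interrupt existed (TX_DONE_INT is a
hypothetical bit, not a real field of this chip's registers):

	static irqreturn_t hip04_tx_done(int irq, void *dev_id)
	{
		struct net_device *ndev = dev_id;
		struct hip04_priv *priv = netdev_priv(ndev);

		/* mask further TX done interrupts; reclaim from poll instead */
		priv->reg_inten &= ~TX_DONE_INT;
		writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
		napi_schedule(&priv->napi);
		return IRQ_HANDLED;
	}

	/* ...and in the poll function, once everything is reclaimed: */
	priv->reg_inten |= TX_DONE_INT;
	writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);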

Thanks
Zhangfei Gao April 18, 2014, 1:17 p.m. UTC | #7
Dear David

On 04/08/2014 02:53 AM, David Miller wrote:

>> +static void hip04_tx_reclaim(struct net_device *ndev, bool force)
>   ...
>> +static void hip04_xmit_timer(unsigned long data)
>> +{
>> +	struct net_device *ndev = (void *) data;
>> +
>> +	hip04_tx_reclaim(ndev, false);
>> +}
>   ...
>> +	mod_timer(&priv->txtimer, jiffies + RECLAIM_PERIOD);
>
> And this is where I stop reading your driver; I've stated already that this
> kind of reclaim scheme is unacceptable.
>
> The kernel timers lack the granularity necessary to service TX reclaim
> with a reasonable amount of latency.
>
> You must use some kind of hardware notification of TX slots becoming
> available. I find it totally impossible that a modern ethernet controller
> was created without a TX done interrupt.
>

There is no tx_done interrupt, so we may need some workaround.

Is it acceptable to reclaim the transmitted buffers from the NAPI poll 
function, with xmit calling napi_schedule?
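
A sketch of that workaround, reusing the driver's own functions (note
this still polls; it does not create a real completion signal):

	/* in hip04_mac_start_xmit(), after queueing the descriptor: */
	napi_schedule(&priv->napi);	/* kick poll so it reclaims TX buffers */

	/* in hip04_rx_poll(), before the RX loop: */
	hip04_tx_reclaim(ndev, false);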

Thanks

Patch

diff --git a/drivers/net/ethernet/hisilicon/Makefile b/drivers/net/ethernet/hisilicon/Makefile
index 1d6eb6e..17dec03 100644
--- a/drivers/net/ethernet/hisilicon/Makefile
+++ b/drivers/net/ethernet/hisilicon/Makefile
@@ -2,4 +2,4 @@ 
 # Makefile for the HISILICON network device drivers.
 #
 
-obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o
+obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o hip04_eth.o
diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
new file mode 100644
index 0000000..29549a5
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -0,0 +1,777 @@ 
+
+/* Copyright (c) 2014 Linaro Ltd.
+ * Copyright (c) 2014 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/etherdevice.h>
+#include <linux/platform_device.h>
+#include <linux/interrupt.h>
+#include <linux/of_address.h>
+#include <linux/phy.h>
+#include <linux/of_mdio.h>
+#include <linux/of_net.h>
+#include <linux/mfd/syscon.h>
+#include <linux/regmap.h>
+
+#define PPE_CFG_RX_ADDR			0x100
+#define PPE_CFG_POOL_GRP		0x300
+#define PPE_CFG_RX_BUF_SIZE		0x400
+#define PPE_CFG_RX_FIFO_SIZE		0x500
+#define PPE_CURR_BUF_CNT		0xa200
+
+#define GE_DUPLEX_TYPE			0x8
+#define GE_MAX_FRM_SIZE_REG		0x3c
+#define GE_PORT_MODE			0x40
+#define GE_PORT_EN			0x44
+#define GE_SHORT_RUNTS_THR_REG		0x50
+#define GE_TX_LOCAL_PAGE_REG		0x5c
+#define GE_TRANSMIT_CONTROL_REG		0x60
+#define GE_CF_CRC_STRIP_REG		0x1b0
+#define GE_MODE_CHANGE_EN		0x1b4
+#define GE_RECV_CONTROL_REG		0x1e0
+#define GE_STATION_MAC_ADDRESS		0x210
+#define PPE_CFG_TX_PKT_BD_ADDR		0x420
+#define PPE_CFG_MAX_FRAME_LEN_REG	0x408
+#define PPE_CFG_BUS_CTRL_REG		0x424
+#define PPE_CFG_RX_CTRL_REG		0x428
+#define PPE_CFG_RX_PKT_MODE_REG		0x438
+#define PPE_CFG_QOS_VMID_GEN		0x500
+#define PPE_CFG_RX_PKT_INT		0x538
+#define PPE_INTEN			0x600
+#define PPE_INTSTS			0x608
+#define PPE_RINT			0x604
+#define PPE_CFG_STS_MODE		0x700
+#define PPE_HIS_RX_PKT_CNT		0x804
+
+/* REG_INTERRUPT */
+#define RCV_INT				BIT(10)
+#define RCV_NOBUF			BIT(8)
+#define DEF_INT_MASK			(RCV_INT | RCV_NOBUF)
+
+#define RX_DESC_NUM			64
+#define TX_DESC_NUM			64
+#define TX_NEXT(N)			(((N) + 1) & (TX_DESC_NUM-1))
+#define RX_NEXT(N)			(((N) + 1) & (RX_DESC_NUM-1))
+
+#define GMAC_PPE_RX_PKT_MAX_LEN		379
+#define GMAC_MAX_PKT_LEN		1516
+#define DESC_DEF_CFG			0x14
+#define RX_BUF_SIZE			1600
+#define RX_PKT_ERR			0x3
+#define TX_TIMEOUT			(6 * HZ)
+#define RECLAIM_PERIOD			(HZ / 15)
+
+#define DRV_NAME			"hip04-ether"
+
+struct tx_desc {
+	u32 send_addr;
+	u16 reserved_16;
+	u16 send_size;
+	u32 reserved_32;
+	u32 cfg;
+	u32 wb_addr;
+} ____cacheline_aligned;
+
+struct rx_desc {
+	u16 reserved_16;
+	u16 pkt_len;
+	u32 reserve1[3];
+	u32 pkt_err;
+	u32 reserve2[4];
+};
+
+struct hip04_priv {
+	void __iomem *base;
+	int phy_mode;
+	int chan;
+	unsigned int port;
+	unsigned int speed;
+	unsigned int duplex;
+	unsigned int reg_inten;
+
+	struct napi_struct napi;
+	struct net_device *ndev;
+
+	struct tx_desc *tx_desc;
+	dma_addr_t tx_desc_dma;
+	struct sk_buff *tx_skb[TX_DESC_NUM];
+	dma_addr_t tx_phys[TX_DESC_NUM];
+	spinlock_t lock;
+	unsigned int tx_head;
+	unsigned int tx_tail;
+	unsigned int tx_count;
+
+	unsigned char *rx_buf[RX_DESC_NUM];
+	dma_addr_t rx_phys[RX_DESC_NUM];
+	unsigned int rx_head;
+	unsigned int rx_buf_size;
+
+	struct device_node *phy_node;
+	struct phy_device *phy;
+	struct regmap *map;
+	struct timer_list txtimer;
+};
+
+static void hip04_config_port(struct net_device *ndev, u32 speed, u32 duplex)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	priv->speed = speed;
+	priv->duplex = duplex;
+
+	switch (priv->phy_mode) {
+	case PHY_INTERFACE_MODE_SGMII:
+		if (speed == SPEED_1000)
+			val = 8;
+		else if (speed == SPEED_100)
+			val = 7;
+		else
+			val = 6;	/* SPEED_10 */
+		break;
+	case PHY_INTERFACE_MODE_MII:
+		if (speed == SPEED_100)
+			val = 1;
+		else
+			val = 0;	/* SPEED_10 */
+		break;
+	default:
+		netdev_warn(ndev, "not supported mode\n");
+		val = 0;
+		break;
+	}
+	writel_relaxed(val, priv->base + GE_PORT_MODE);
+
+	val = (duplex) ? BIT(0) : 0;
+	writel_relaxed(val, priv->base + GE_DUPLEX_TYPE);
+
+	val = BIT(0);
+	writel_relaxed(val, priv->base + GE_MODE_CHANGE_EN);
+}
+
+static void hip04_reset_ppe(struct hip04_priv *priv)
+{
+	u32 val, tmp;
+
+	do {
+		regmap_read(priv->map, priv->port * 4 + PPE_CURR_BUF_CNT, &val);
+		regmap_read(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, &tmp);
+	} while (val & 0xfff);
+}
+
+static void hip04_config_fifo(struct hip04_priv *priv)
+{
+	u32 val;
+
+	val = readl_relaxed(priv->base + PPE_CFG_STS_MODE);
+	val |= BIT(12);			/* PPE_HIS_RX_PKT_CNT read clear */
+	writel_relaxed(val, priv->base + PPE_CFG_STS_MODE);
+
+	val = BIT(priv->port);
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_POOL_GRP, val);
+
+	val = priv->port << 8;
+	val |= BIT(14);
+	writel_relaxed(val, priv->base + PPE_CFG_QOS_VMID_GEN);
+
+	val = RX_BUF_SIZE;
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_BUF_SIZE, val);
+
+	val = RX_DESC_NUM << 16;	/* depth */
+	val |= BIT(11);			/* seq: first set first use */
+	val |= RX_DESC_NUM * priv->chan;	/* start_addr */
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_FIFO_SIZE, val);
+
+	/* pkt store format */
+	val = NET_IP_ALIGN << 11;	/* align */
+	writel_relaxed(val, priv->base + PPE_CFG_RX_CTRL_REG);
+
+	/* following cfg required for 1000M */
+	/* pkt mode */
+	val = BIT(18);			/* align */
+	writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_MODE_REG);
+
+	/* set bus ctrl */
+	val = BIT(14);			/* buffer locally release */
+	val |= BIT(0);			/* big endian */
+	writel_relaxed(val, priv->base + PPE_CFG_BUS_CTRL_REG);
+
+	/* set max pkt len, curtail if exceed */
+	val = GMAC_PPE_RX_PKT_MAX_LEN;	/* max buffer len */
+	writel_relaxed(val, priv->base + PPE_CFG_MAX_FRAME_LEN_REG);
+
+	/* set max len of each pkt */
+	val = GMAC_MAX_PKT_LEN;		/* max buffer len */
+	writel_relaxed(val, priv->base + GE_MAX_FRM_SIZE_REG);
+
+	/* set min len of each pkt */
+	val = 31;			/* min buffer len */
+	writel_relaxed(val, priv->base + GE_SHORT_RUNTS_THR_REG);
+
+	/* tx */
+	val = readl_relaxed(priv->base + GE_TRANSMIT_CONTROL_REG);
+	val |= BIT(5);			/* tx auto neg */
+	val |= BIT(6);			/* tx add crc */
+	val |= BIT(7);			/* tx short pad through */
+	writel_relaxed(val, priv->base + GE_TRANSMIT_CONTROL_REG);
+
+	/* rx crc */
+	val = BIT(0);			/* rx strip crc */
+	writel_relaxed(val, priv->base + GE_CF_CRC_STRIP_REG);
+
+	/* rx */
+	val = readl_relaxed(priv->base + GE_RECV_CONTROL_REG);
+	val |= BIT(3);			/* rx strip pad */
+	val |= BIT(4);			/* run pkt en */
+	writel_relaxed(val, priv->base + GE_RECV_CONTROL_REG);
+
+	/* auto neg control */
+	val = BIT(0);
+	writel_relaxed(val, priv->base + GE_TX_LOCAL_PAGE_REG);
+}
+
+static void hip04_mac_enable(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	/* enable tx & rx */
+	val = readl_relaxed(priv->base + GE_PORT_EN);
+	val |= BIT(1);		/* rx */
+	val |= BIT(2);		/* tx */
+	writel_relaxed(val, priv->base + GE_PORT_EN);
+
+	/* clear rx int */
+	val = RCV_INT;
+	writel_relaxed(val, priv->base + PPE_RINT);
+
+	/* config recv int */
+	val = BIT(6);		/* int threshold 1 package */
+	val |= 0x4;		/* recv timeout */
+	writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_INT);
+
+	/* enable interrupt */
+	priv->reg_inten = DEF_INT_MASK;
+	writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+}
+
+static void hip04_mac_disable(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	/* disable int */
+	priv->reg_inten &= ~(RCV_INT | RCV_NOBUF);
+	writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+
+	/* disable tx & rx */
+	val = readl_relaxed(priv->base + GE_PORT_EN);
+	val &= ~(BIT(1));	/* rx */
+	val &= ~(BIT(2));	/* tx */
+	writel_relaxed(val, priv->base + GE_PORT_EN);
+}
+
+static void hip04_set_xmit_desc(struct hip04_priv *priv, dma_addr_t phys)
+{
+	writel(phys, priv->base + PPE_CFG_TX_PKT_BD_ADDR);
+}
+
+static void hip04_set_recv_desc(struct hip04_priv *priv, dma_addr_t phys)
+{
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, phys);
+}
+
+static u32 hip04_recv_cnt(struct hip04_priv *priv)
+{
+	return readl(priv->base + PPE_HIS_RX_PKT_CNT);
+}
+
+static void hip04_update_mac_address(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+
+	writel_relaxed(((ndev->dev_addr[0] << 8) | (ndev->dev_addr[1])),
+			priv->base + GE_STATION_MAC_ADDRESS);
+	writel_relaxed(((ndev->dev_addr[2] << 24) | (ndev->dev_addr[3] << 16) |
+			(ndev->dev_addr[4] << 8) | (ndev->dev_addr[5])),
+			priv->base + GE_STATION_MAC_ADDRESS + 4);
+}
+
+static int hip04_set_mac_address(struct net_device *ndev, void *addr)
+{
+	eth_mac_addr(ndev, addr);
+	hip04_update_mac_address(ndev);
+	return 0;
+}
+
+static void hip04_tx_reclaim(struct net_device *ndev, bool force)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	unsigned tx_head = priv->tx_head;
+	unsigned tx_tail = priv->tx_tail;
+	struct tx_desc *desc;
+	unsigned int bytes_compl = 0, pkts_compl = 0;
+
+	spin_lock_bh(&priv->lock);
+	while ((tx_tail != tx_head) || (priv->tx_count == TX_DESC_NUM)) {
+		desc = &priv->tx_desc[priv->tx_tail];
+		if (desc->send_addr != 0) {
+			if (force)
+				desc->send_addr = 0;
+			else
+				break;
+		}
+		if (priv->tx_phys[tx_tail]) {
+			dma_unmap_single(&ndev->dev, priv->tx_phys[tx_tail],
+				priv->tx_skb[tx_tail]->len, DMA_TO_DEVICE);
+			priv->tx_phys[tx_tail] = 0;
+		}
+		pkts_compl++;
+		bytes_compl += priv->tx_skb[tx_tail]->len;
+		dev_kfree_skb(priv->tx_skb[tx_tail]);
+		priv->tx_skb[tx_tail] = NULL;
+		tx_tail = TX_NEXT(tx_tail);
+		priv->tx_count--;
+	}
+	priv->tx_tail = tx_tail;
+	spin_unlock_bh(&priv->lock);
+
+	netdev_completed_queue(ndev, pkts_compl, bytes_compl);
+
+	if (priv->tx_count)
+		mod_timer(&priv->txtimer, jiffies + RECLAIM_PERIOD);
+
+	if (unlikely(netif_queue_stopped(ndev)) &&
+		(priv->tx_count < TX_DESC_NUM))
+		netif_wake_queue(ndev);
+}
+
+static void hip04_xmit_timer(unsigned long data)
+{
+	struct net_device *ndev = (void *) data;
+
+	hip04_tx_reclaim(ndev, false);
+}
+
+static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct net_device_stats *stats = &ndev->stats;
+	unsigned int tx_head = priv->tx_head;
+	struct tx_desc *desc = &priv->tx_desc[tx_head];
+	dma_addr_t phys;
+
+	hip04_tx_reclaim(ndev, false);
+	mod_timer(&priv->txtimer, jiffies + RECLAIM_PERIOD);
+
+	if (priv->tx_count >= TX_DESC_NUM) {
+		netif_stop_queue(ndev);
+		return NETDEV_TX_BUSY;
+	}
+
+	phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
+	if (dma_mapping_error(&ndev->dev, phys)) {
+		dev_kfree_skb(skb);
+		return NETDEV_TX_OK;
+	}
+
+	priv->tx_skb[tx_head] = skb;
+	priv->tx_phys[tx_head] = phys;
+	desc->send_addr = cpu_to_be32(phys);
+	desc->send_size = cpu_to_be16(skb->len);
+	desc->cfg = cpu_to_be32(DESC_DEF_CFG);
+	phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
+	desc->wb_addr = cpu_to_be32(phys);
+	skb_tx_timestamp(skb);
+
+	/* Don't wait up for transmitted skbs to be freed. */
+	skb_orphan(skb);
+
+	hip04_set_xmit_desc(priv, phys);
+	priv->tx_head = TX_NEXT(tx_head);
+	netdev_sent_queue(ndev, skb->len);
+
+	stats->tx_bytes += skb->len;
+	stats->tx_packets++;
+	priv->tx_count++;
+
+	return NETDEV_TX_OK;
+}
+
+static int hip04_rx_poll(struct napi_struct *napi, int budget)
+{
+	struct hip04_priv *priv = container_of(napi, struct hip04_priv, napi);
+	struct net_device *ndev = priv->ndev;
+	struct net_device_stats *stats = &ndev->stats;
+	unsigned int cnt = hip04_recv_cnt(priv);
+	struct rx_desc *desc;
+	struct sk_buff *skb;
+	unsigned char *buf;
+	bool last = false;
+	dma_addr_t phys;
+	int rx = 0;
+	u16 len;
+	u32 err;
+
+	while (cnt && !last) {
+		buf = priv->rx_buf[priv->rx_head];
+		skb = build_skb(buf, priv->rx_buf_size);
+		if (unlikely(!skb))
+			net_dbg_ratelimited("build_skb failed\n");
+
+		dma_unmap_single(&ndev->dev, priv->rx_phys[priv->rx_head],
+				RX_BUF_SIZE, DMA_FROM_DEVICE);
+		priv->rx_phys[priv->rx_head] = 0;
+
+		desc = (struct rx_desc *)skb->data;
+		len = be16_to_cpu(desc->pkt_len);
+		err = be32_to_cpu(desc->pkt_err);
+
+		if (len > RX_BUF_SIZE)
+			len = RX_BUF_SIZE;
+
+		if (0 == len) {
+			dev_kfree_skb_any(skb);
+			last = true;
+		} else if (err & RX_PKT_ERR) {
+			dev_kfree_skb_any(skb);
+			stats->rx_dropped++;
+			stats->rx_errors++;
+		} else {
+			skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
+			skb_put(skb, len);
+			skb->protocol = eth_type_trans(skb, ndev);
+			napi_gro_receive(&priv->napi, skb);
+			stats->rx_packets++;
+			stats->rx_bytes += len;
+			rx++;
+		}
+
+		buf = netdev_alloc_frag(priv->rx_buf_size);
+		if (!buf)
+			return -ENOMEM;
+		phys = dma_map_single(&ndev->dev, buf,
+				RX_BUF_SIZE, DMA_FROM_DEVICE);
+		if (dma_mapping_error(&ndev->dev, phys))
+			return -EIO;
+		priv->rx_buf[priv->rx_head] = buf;
+		priv->rx_phys[priv->rx_head] = phys;
+		hip04_set_recv_desc(priv, phys);
+
+		priv->rx_head = RX_NEXT(priv->rx_head);
+		if (rx >= budget)
+			break;
+
+		if (--cnt == 0)
+			cnt = hip04_recv_cnt(priv);
+	}
+
+	if (rx < budget) {
+		napi_complete(napi);
+
+		/* enable rx interrupt */
+		priv->reg_inten |= RCV_INT | RCV_NOBUF;
+		writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+	}
+
+	return rx;
+}
+
+static irqreturn_t hip04_mac_interrupt(int irq, void *dev_id)
+{
+	struct net_device *ndev = (struct net_device *) dev_id;
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 ists = readl_relaxed(priv->base + PPE_INTSTS);
+
+	writel_relaxed(DEF_INT_MASK, priv->base + PPE_RINT);
+
+	if (ists & (RCV_INT | RCV_NOBUF)) {
+		/* disable rx interrupt */
+		priv->reg_inten &= ~(RCV_INT | RCV_NOBUF);
+		writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+		napi_schedule(&priv->napi);
+	}
+
+	return IRQ_HANDLED;
+}
+
+static void hip04_adjust_link(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct phy_device *phy = priv->phy;
+
+	if ((priv->speed != phy->speed) || (priv->duplex != phy->duplex)) {
+		hip04_config_port(ndev, phy->speed, phy->duplex);
+		phy_print_status(phy);
+	}
+}
+
+static int hip04_mac_open(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	priv->rx_head = 0;
+	priv->tx_head = 0;
+	priv->tx_tail = 0;
+	priv->tx_count = 0;
+
+	hip04_reset_ppe(priv);
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		dma_addr_t phys;
+
+		phys = dma_map_single(&ndev->dev, priv->rx_buf[i],
+				RX_BUF_SIZE, DMA_FROM_DEVICE);
+		if (dma_mapping_error(&ndev->dev, phys))
+			return -EIO;
+
+		priv->rx_phys[i] = phys;
+		hip04_set_recv_desc(priv, phys);
+	}
+
+	if (priv->phy)
+		phy_start(priv->phy);
+
+	netif_start_queue(ndev);
+	hip04_mac_enable(ndev);
+	napi_enable(&priv->napi);
+
+	return 0;
+}
+
+static int hip04_mac_stop(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	napi_disable(&priv->napi);
+	netif_stop_queue(ndev);
+	hip04_mac_disable(ndev);
+	hip04_tx_reclaim(ndev, true);
+	hip04_reset_ppe(priv);
+
+	if (priv->phy)
+		phy_stop(priv->phy);
+
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		if (priv->rx_phys[i]) {
+			dma_unmap_single(&ndev->dev, priv->rx_phys[i],
+					RX_BUF_SIZE, DMA_FROM_DEVICE);
+			priv->rx_phys[i] = 0;
+		}
+	}
+
+	return 0;
+}
+
+static void hip04_timeout(struct net_device *ndev)
+{
+	hip04_mac_stop(ndev);
+	hip04_mac_open(ndev);
+	return;
+}
+
+static struct net_device_ops hip04_netdev_ops = {
+	.ndo_open		= hip04_mac_open,
+	.ndo_stop		= hip04_mac_stop,
+	.ndo_start_xmit		= hip04_mac_start_xmit,
+	.ndo_set_mac_address	= hip04_set_mac_address,
+	.ndo_tx_timeout         = hip04_timeout,
+	.ndo_validate_addr	= eth_validate_addr,
+	.ndo_change_mtu		= eth_change_mtu,
+};
+
+static int hip04_alloc_ring(struct net_device *ndev, struct device *d)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	priv->tx_desc = dma_alloc_coherent(d,
+			TX_DESC_NUM * sizeof(struct tx_desc),
+			&priv->tx_desc_dma, GFP_KERNEL);
+	if (!priv->tx_desc)
+		return -ENOMEM;
+
+	priv->rx_buf_size = RX_BUF_SIZE +
+			    SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		priv->rx_buf[i] = netdev_alloc_frag(priv->rx_buf_size);
+		if (!priv->rx_buf[i])
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void hip04_free_ring(struct net_device *ndev, struct device *d)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	for (i = 0; i < RX_DESC_NUM; i++)
+		if (priv->rx_buf[i])
+			put_page(virt_to_head_page(priv->rx_buf[i]));
+
+	for (i = 0; i < TX_DESC_NUM; i++)
+		if (priv->tx_skb[i])
+			dev_kfree_skb_any(priv->tx_skb[i]);
+
+	dma_free_coherent(d, TX_DESC_NUM * sizeof(struct tx_desc),
+			priv->tx_desc, priv->tx_desc_dma);
+}
+
+static int hip04_mac_probe(struct platform_device *pdev)
+{
+	struct device *d = &pdev->dev;
+	struct device_node *node = d->of_node;
+	struct of_phandle_args arg;
+	struct net_device *ndev;
+	struct hip04_priv *priv;
+	struct resource *res;
+	unsigned int irq;
+	int ret;
+
+	ndev = alloc_etherdev(sizeof(struct hip04_priv));
+	if (!ndev)
+		return -ENOMEM;
+
+	priv = netdev_priv(ndev);
+	priv->ndev = ndev;
+	platform_set_drvdata(pdev, ndev);
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	priv->base = devm_ioremap_resource(d, res);
+	if (IS_ERR(priv->base)) {
+		ret = PTR_ERR(priv->base);
+		goto init_fail;
+	}
+
+	ret = of_parse_phandle_with_fixed_args(node, "port-handle", 2, 0, &arg);
+	if (ret < 0) {
+		dev_warn(d, "no port-handle\n");
+		goto init_fail;
+	}
+
+	priv->port = arg.args[0];
+	priv->chan = arg.args[1];
+
+	priv->map = syscon_node_to_regmap(arg.np);
+	if (IS_ERR(priv->map)) {
+		dev_warn(d, "no syscon hisilicon,hip04-ppe\n");
+		ret = PTR_ERR(priv->map);
+		goto init_fail;
+	}
+
+	priv->phy_mode = of_get_phy_mode(node);
+	if (priv->phy_mode < 0) {
+		dev_warn(d, "not find phy-mode\n");
+		ret = -EINVAL;
+		goto init_fail;
+	}
+
+	irq = platform_get_irq(pdev, 0);
+	if (irq <= 0) {
+		ret = -EINVAL;
+		goto init_fail;
+	}
+
+	ret = devm_request_irq(d, irq, hip04_mac_interrupt,
+				0, pdev->name, ndev);
+	if (ret) {
+		netdev_err(ndev, "devm_request_irq failed\n");
+		goto init_fail;
+	}
+
+	priv->phy_node = of_parse_phandle(node, "phy-handle", 0);
+	if (priv->phy_node) {
+		priv->phy = of_phy_connect(ndev, priv->phy_node,
+			&hip04_adjust_link, 0, priv->phy_mode);
+		if (!priv->phy) {
+			ret = -EPROBE_DEFER;
+			goto init_fail;
+		}
+	}
+
+	ether_setup(ndev);
+	ndev->netdev_ops = &hip04_netdev_ops;
+	ndev->watchdog_timeo = TX_TIMEOUT;
+	ndev->priv_flags |= IFF_UNICAST_FLT;
+	ndev->irq = irq;
+	netif_napi_add(ndev, &priv->napi, hip04_rx_poll, RX_DESC_NUM);
+	SET_NETDEV_DEV(ndev, &pdev->dev);
+
+	hip04_reset_ppe(priv);
+	if (priv->phy_mode == PHY_INTERFACE_MODE_MII)
+		hip04_config_port(ndev, SPEED_100, DUPLEX_FULL);
+
+	hip04_config_fifo(priv);
+	random_ether_addr(ndev->dev_addr);
+	hip04_update_mac_address(ndev);
+
+	ret = hip04_alloc_ring(ndev, d);
+	if (ret) {
+		netdev_err(ndev, "alloc ring fail\n");
+		goto alloc_fail;
+	}
+
+	setup_timer(&priv->txtimer, hip04_xmit_timer, (unsigned long) ndev);
+	ret = register_netdev(ndev);
+	if (ret) {
+		free_netdev(ndev);
+		goto alloc_fail;
+	}
+
+	return 0;
+
+alloc_fail:
+	hip04_free_ring(ndev, d);
+init_fail:
+	of_node_put(priv->phy_node);
+	free_netdev(ndev);
+	return ret;
+}
+
+static int hip04_remove(struct platform_device *pdev)
+{
+	struct net_device *ndev = platform_get_drvdata(pdev);
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct device *d = &pdev->dev;
+
+	if (priv->phy)
+		phy_disconnect(priv->phy);
+
+	del_timer_sync(&priv->txtimer);
+	hip04_free_ring(ndev, d);
+	unregister_netdev(ndev);
+	free_irq(ndev->irq, ndev);
+	of_node_put(priv->phy_node);
+	free_netdev(ndev);
+
+	return 0;
+}
+
+static const struct of_device_id hip04_mac_match[] = {
+	{ .compatible = "hisilicon,hip04-mac" },
+	{ }
+};
+
+static struct platform_driver hip04_mac_driver = {
+	.probe	= hip04_mac_probe,
+	.remove	= hip04_remove,
+	.driver	= {
+		.name		= DRV_NAME,
+		.owner		= THIS_MODULE,
+		.of_match_table	= hip04_mac_match,
+	},
+};
+module_platform_driver(hip04_mac_driver);
+
+MODULE_DESCRIPTION("HISILICON P04 Ethernet driver");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS("platform:hip04-ether");