[RFC] ARM: Add generic instruction opcode manipulation helpers

Message ID 1322220493-3251-1-git-send-email-dave.martin@linaro.org
State Superseded

Commit Message

Dave Martin Nov. 25, 2011, 11:28 a.m. UTC
This patch adds some endianness-agnostic helpers to convert machine
instructions between canonical integer form and in-memory
representation, and also provides a transparent way to read a
single Thumb instruction from memory, without the need to know the
size in advance or write explicit condition checks.

A canonical integer form for representing instructions is also
formalised here.

Signed-off-by: Dave Martin <dave.martin@linaro.org>
---
 arch/arm/include/asm/opcodes.h |  162 ++++++++++++++++++++++++++++++++++++++++
 1 files changed, 162 insertions(+), 0 deletions(-)
 create mode 100644 arch/arm/include/asm/opcodes.h
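
As a rough illustration of the canonical form described above, the sketch below mimics the Thumb conversions and size checks in user-space C. The `_sketch` names are hypothetical stand-ins (the patch itself relies on the kernel's swahw32() and swahb32()), and each input is assumed to be a 32-bit value already loaded from memory with the CPU's data endianness.

```c
#include <stdint.h>

/* Hypothetical user-space stand-in for the kernel's swahw32():
 * swap the two 16-bit halves of a 32-bit word. */
static uint32_t swahw32_sketch(uint32_t x)
{
	return (x << 16) | (x >> 16);
}

/* Hypothetical stand-in for swahb32(): swap the bytes within each half. */
static uint32_t swahb32_sketch(uint32_t x)
{
	return ((x & 0x00FF00FFu) << 8) | ((x & 0xFF00FF00u) >> 8);
}

/* Size checks on canonical opcodes, mirroring the patch's definitions. */
static int is_thumb32_sketch(uint32_t op) { return op >= 0xE8000000u; }
static int is_thumb16_sketch(uint32_t op) { return op < 0xE800u; }
```

For the halfword pair 0xF000, 0xF800 stored as bytes 00 F0 00 F8, a little-endian 32-bit load yields 0xF800F000 and swahw32_sketch() gives 0xF000F800. A big-endian data load (as on BE8) yields 0x00F000F8, and swahb32_sketch() gives the same canonical value.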

Comments

Dave Martin Nov. 25, 2011, 11:32 a.m. UTC | #1
On Fri, Nov 25, 2011 at 11:28:13AM +0000, Dave Martin wrote:
> This patch adds some endianness-agnostic helpers to convert machine
> instructions between canonical integer form and in-memory
> representation, and also provides a transparent way to read a
> single Thumb instruction from memory, without the need to know the
> size in advance or write explicit condition checks.
> 
> A canonical integer form for representing instructions is also
> formalised here.
> 
> Signed-off-by: Dave Martin <dave.martin@linaro.org>
> ---

Notes:

 * We don't necessarily need everything that's in this example header

 * A generic instruction writing macro could be added, similar to the
   generic read macro, if this looks useful.

 * We could align the use of undefined instruction encodings across
   the kernel via this header: all instruction sets allow a
   guaranteed undefined instruction with up to 8 choosable bits,
   and we can also define additional generic and "NULL" encodings
   for internal use by kernel code which deals with instruction
   opcodes -- to signal special cases and error values etc.
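
One possible shape for such a write helper, sketched in user-space C rather than as a kernel macro. This is not part of the patch: the function names are hypothetical, and the sketch assumes instruction memory is little-endian (as on little-endian and BE8 kernels), using byte-wise stores so that the host's data endianness does not matter.

```c
#include <stdint.h>

/* Store one 16-bit unit as little-endian bytes and advance the cursor. */
static void put_le16_sketch(uint8_t **outpp, uint16_t v)
{
	(*outpp)[0] = (uint8_t)(v & 0xFF);
	(*outpp)[1] = (uint8_t)(v >> 8);
	*outpp += 2;
}

/*
 * Hypothetical write counterpart: emit one canonical Thumb opcode.
 * Returns the number of bytes written, or -1 if the value lies in the
 * range that is neither a 16-bit nor a 32-bit Thumb opcode.
 */
static int opcode_write_thumb_sketch(uint8_t **outpp, uint32_t opcode)
{
	if (opcode < 0xE800u) {			/* 16-bit Thumb */
		put_le16_sketch(outpp, (uint16_t)opcode);
		return 2;
	}
	if (opcode >= 0xE8000000u) {		/* 32-bit Thumb */
		put_le16_sketch(outpp, (uint16_t)(opcode >> 16));
		put_le16_sketch(outpp, (uint16_t)opcode);
		return 4;
	}
	return -1;				/* nothing written */
}
```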
Will Deacon Dec. 6, 2011, 3:08 p.m. UTC | #2
Hi Dave,

On Fri, Nov 25, 2011 at 11:28:13AM +0000, Dave Martin wrote:
> This patch adds some endianness-agnostic helpers to convert machine
> instructions between canonical integer form and in-memory
> representation, and also provides a transparent way to read a
> single Thumb instruction from memory, without the need to know the
> size in advance or write explicit condition checks.
> 
> A canonical integer form for representing instructions is also
> formalised here.
> 
> Signed-off-by: Dave Martin <dave.martin@linaro.org>
> ---
>  arch/arm/include/asm/opcodes.h |  162 ++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 162 insertions(+), 0 deletions(-)
>  create mode 100644 arch/arm/include/asm/opcodes.h

It looks like I might need to implement a basic disassembler for the
hw_breakpoint code and I would certainly like to reuse as much code as I
can. This header could obviously provide the code to fetch and format the
instruction, but it would be nice to have some extra helpers to aid
decoding.

Tixy - how much work do you reckon it would be to rework your kprobes
decoding code into a generic `here are my callbacks, please decode this
instruction stream for me' type thing?

All I want for hw_breakpoint is to know whether an instruction is a load or
a store, but even for that it looks like I'll need to duplicate a lot of
stuff.

Will
Dave Martin Dec. 6, 2011, 3:20 p.m. UTC | #3
On Tue, Dec 06, 2011 at 03:08:55PM +0000, Will Deacon wrote:
> Hi Dave,
> 
> On Fri, Nov 25, 2011 at 11:28:13AM +0000, Dave Martin wrote:
> > This patch adds some endianness-agnostic helpers to convert machine
> > instructions between canonical integer form and in-memory
> > representation, and also provides a transparent way to read a
> > single Thumb instruction from memory, without the need to know the
> > size in advance or write explicit condition checks.
> > 
> > A canonical integer form for representing instructions is also
> > formalised here.
> > 
> > Signed-off-by: Dave Martin <dave.martin@linaro.org>
> > ---
> >  arch/arm/include/asm/opcodes.h |  162 ++++++++++++++++++++++++++++++++++++++++
> >  1 files changed, 162 insertions(+), 0 deletions(-)
> >  create mode 100644 arch/arm/include/asm/opcodes.h
> 
> It looks like I might need to implement a basic disassembler for the
> hw_breakpoint code and I would certainly like to reuse as much code as I
> can. This header could obviously provide the code to fetch and format the
> instruction, but it would be nice to have some extra helpers to aid
> decoding.
> 
> Tixy - how much work do you reckon it would be to rework your kprobes
> decoding code into a generic `here are my callbacks, please decode this
> instruction stream for me' type thing?
> 
> All I want for hw_breakpoint is to know whether an instruction is a load or
> a store, but even for that it looks like I'll need to duplicate a lot of
> stuff.

Note, I'm currently waiting on Leif to repost his opcodes.h before I
repost my instruction-swabbing additions on top of it, since the swabbing
stuff seems to be strictly non-urgent.

Cheers
---Dave
Bi Junxiao Dec. 7, 2011, 5:22 a.m. UTC | #4
on 12/06/2011 11:20 PM Dave Martin wrote:
> On Tue, Dec 06, 2011 at 03:08:55PM +0000, Will Deacon wrote:
>    
>> Hi Dave,
>>
>> On Fri, Nov 25, 2011 at 11:28:13AM +0000, Dave Martin wrote:
>>      
>>> This patch adds some endianness-agnostic helpers to convert machine
>>> instructions between canonical integer form and in-memory
>>> representation, and also provides a transparent way to read a
>>> single Thumb instruction from memory, without the need to know the
>>> size in advance or write explicit condition checks.
>>>
>>> A canonical integer form for representing instructions is also
>>> formalised here.
>>>
>>> Signed-off-by: Dave Martin<dave.martin@linaro.org>
>>> ---
>>>   arch/arm/include/asm/opcodes.h |  162 ++++++++++++++++++++++++++++++++++++++++
>>>   1 files changed, 162 insertions(+), 0 deletions(-)
>>>   create mode 100644 arch/arm/include/asm/opcodes.h
>>>        
>> It looks like I might need to implement a basic disassembler for the
>> hw_breakpoint code and I would certainly like to reuse as much code as I
>> can. This header could obviously provide the code to fetch and format the
>> instruction, but it would be nice to have some extra helpers to aid
>> decoding.
>>
>> Tixy - how much work do you reckon it would be to rework your kprobes
>> decoding code into a generic `here are my callbacks, please decode this
>> instruction stream for me' type thing?
>>
>> All I want for hw_breakpoint is to know whether an instruction is a load or
>> a store, but even for that it looks like I'll need to duplicate a lot of
>> stuff.
>>      
> Note, I'm currently waiting on Leif to repost his opcodes.h before I
> repost my instruction-swabbing additions on top of it, since the swabbing
> stuff seems to be strictly non-urgent.
>    
I am also waiting for your patch to do my be8 fix.
> Cheers
> ---Dave
>
>
Dave Martin Dec. 7, 2011, 10:42 a.m. UTC | #5
On Wed, Dec 07, 2011 at 01:22:34PM +0800, Bi Junxiao wrote:
> on 12/06/2011 11:20 PM Dave Martin wrote:
> >On Tue, Dec 06, 2011 at 03:08:55PM +0000, Will Deacon wrote:
> >>Hi Dave,
> >>
> >>On Fri, Nov 25, 2011 at 11:28:13AM +0000, Dave Martin wrote:
> >>>This patch adds some endianness-agnostic helpers to convert machine
> >>>instructions between canonical integer form and in-memory
> >>>representation, and also provides a transparent way to read a
> >>>single Thumb instruction from memory, without the need to know the
> >>>size in advance or write explicit condition checks.
> >>>
> >>>A canonical integer form for representing instructions is also
> >>>formalised here.
> >>>
> >>>Signed-off-by: Dave Martin<dave.martin@linaro.org>
> >>>---
> >>>  arch/arm/include/asm/opcodes.h |  162 ++++++++++++++++++++++++++++++++++++++++
> >>>  1 files changed, 162 insertions(+), 0 deletions(-)
> >>>  create mode 100644 arch/arm/include/asm/opcodes.h
> >>It looks like I might need to implement a basic disassembler for the
> >>hw_breakpoint code and I would certainly like to reuse as much code as I
> >>can. This header could obviously provide the code to fetch and format the
> >>instruction, but it would be nice to have some extra helpers to aid
> >>decoding.
> >>
> >>Tixy - how much work do you reckon it would be to rework your kprobes
> >>decoding code into a generic `here are my callbacks, please decode this
> >>instruction stream for me' type thing?
> >>
> >>All I want for hw_breakpoint is to know whether an instruction is a load or
> >>a store, but even for that it looks like I'll need to duplicate a lot of
> >>stuff.
> >Note, I'm currently waiting on Leif to repost his opcodes.h before I
> >repost my instruction-swabbing additions on top of it, since the swabbing
> >stuff seems to be strictly non-urgent.
> I am also waiting for your patch to do my be8 fix.

OK -- in that case I will clean up and repost my patch anyway.

The two proposed bits of functionality in that header are independent,
so the later merge shouldn't affect what you're doing.

Cheers
---Dave

Patch

diff --git a/arch/arm/include/asm/opcodes.h b/arch/arm/include/asm/opcodes.h
new file mode 100644
index 0000000..5d18f92
--- /dev/null
+++ b/arch/arm/include/asm/opcodes.h
@@ -0,0 +1,162 @@ 
+/*
+ * arch/arm/include/asm/opcodes.h
+ *
+ * Copyright (C) 2011 Linaro Limited
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#ifndef __ARM_OPCODES_H
+#define __ARM_OPCODES_H
+
+#include <linux/types.h>
+#include <linux/swab.h>
+
+typedef u32 arm_opcode_t;
+
+/*
+ * Canonical instruction representation (arm_opcode_t):
+ *
+ *	ARM:		0xKKLLMMNN
+ *	Thumb 16-bit:	0x0000KKLL, where KK < 0xE8
+ *	Thumb 32-bit:	0xKKLLMMNN, where KK >= 0xE8
+ *
+ * There is no way to distinguish an ARM instruction in canonical representation
+ * from a Thumb instruction (just as these cannot be distinguished in memory).
+ * Where this distinction is important, it needs to be tracked separately.
+ *
+ * Note that values in the range 0x0000E800..0xE7FFFFFF intentionally do not
+ * represent any valid Thumb-2 instruction.  For this range,
+ * __opcode_is_thumb32() and __opcode_is_thumb16() will both be false.
+ */
+
+#ifdef CONFIG_CPU_ENDIAN_BE8
+#define __opcode_to_mem_arm(x) swab32(x)
+#define __opcode_to_mem_thumb16(x) swab16(x)
+#define __opcode_to_mem_thumb32(x) swahb32(x)
+#else
+#define __opcode_to_mem_arm(x) ((u32)(x))
+#define __opcode_to_mem_thumb16(x) ((u16)(x))
+#define __opcode_to_mem_thumb32(x) swahw32(x)
+#endif
+
+#define __mem_to_opcode_arm(x) __opcode_to_mem_arm(x)
+#define __mem_to_opcode_thumb16(x) __opcode_to_mem_thumb16(x)
+#define __mem_to_opcode_thumb32(x) __opcode_to_mem_thumb32(x)
+
+/* Operations specific to Thumb opcodes */
+
+/* Instruction size checks: */
+#define __opcode_is_thumb32(x) ((u32)(x) >= 0xE8000000UL)
+#define __opcode_is_thumb16(x) ((u32)(x) < 0xE800UL)
+
+/* Operations to construct or split 32-bit Thumb instructions: */
+#define __opcode_thumb32_first(x) ((u16)((x) >> 16))
+#define __opcode_thumb32_second(x) ((u16)(x))
+#define __opcode_thumb32_compose(first, second) \
+	(((u32)(u16)(first) << 16) | (u32)(u16)(second))
+
+/*
+ * int __opcode_read_<isa>(
+ *	arm_opcode_t *outp,
+ *	void const **inpp,
+ *	int (*readfn)(void *dst, void const *src, size_t size)
+ * )
+ *
+ * This helper reads one complete instruction and stores the canonicalised
+ * opcode to *outp.
+ *
+ * For maximum flexibility, the mechanism for reading the instruction is
+ * specified as an argument: <readfn>(dst, src, size) must attempt to copy
+ * <size> bytes from <src> to <dst>.  <readfn>() should return 0 if the copy
+ * was successful, or an error code otherwise.
+ *
+ * Return:
+ *	0	success:
+ *			*outp contains the instruction read
+ *			*inpp points to the next instruction
+ *	!= 0	failure:
+ *			*outp is undefined
+ *			*inpp contains the first address not successfully read
+ *
+ * Writing this as a macro means that <readfn> can also be implemented as a
+ * macro.  This permits the simple case where no error checking is required to
+ * be heavily optimised.
+ */
+#define __opcode_read_thumb(outp, inpp, readfn) ({			\
+	u16 __t;							\
+									\
+	BUILD_BUG_ON(sizeof(*(outp)) != sizeof(arm_opcode_t));		\
+									\
+	___read_advance(&__t, inpp, sizeof(__t), readfn)		\
+	|| (__opcode_is_thumb16(*(outp) = __mem_to_opcode_thumb16(__t)) ? 0 : \
+		___read_advance(&__t, inpp, sizeof(__t), readfn)	\
+		|| (*(outp) = __opcode_thumb32_compose(			\
+					*(outp),			\
+					__mem_to_opcode_thumb16(__t)),	\
+		    0));						\
+})
+#define ___read_advance(outp, inpp, size, readfn) ({			\
+	int __status;							\
+									\
+	__status = readfn(outp, *(inpp), size);				\
+	if (!__status)							\
+		*(inpp) = (typeof(*(inpp)))((uintptr_t)*(inpp) + (size)); \
+									\
+	__status;							\
+})
+
+#define __opcode_read_arm(outp, inpp, readfn) ({			\
+	BUILD_BUG_ON(sizeof(*(outp)) != sizeof(arm_opcode_t));		\
+									\
+	___read_advance(outp, inpp, sizeof(arm_opcode_t), readfn)	\
+	|| (*(outp) = __mem_to_opcode_arm(*(outp)),			\
+	    0);								\
+})
+
+/* __opcode_read_<isa>_simple(
+ *	arm_opcode_t *outp,
+ *	void const **inpp
+ * )
+ *
+ * Reads one instruction from memory, without error checks.
+ * This macro will always succeed and return 0.  Otherwise, it is similar
+ * to __opcode_read_<isa>().
+ */
+#define __opcode_read_thumb_simple(outp, inpp)				\
+	__opcode_read_thumb(outp, inpp, ___read16_simple)
+#define __opcode_read_arm_simple(outp, inpp)				\
+	__opcode_read_arm(outp, inpp, ___read32_simple)
+	
+#define ___read16_simple(outp, inp, size) 				\
+	(*(outp) = *(u16 *)(inp), 0)
+#define ___read32_simple(outp, inp, size) 				\
+	(*(outp) = *(u32 *)(inp), 0)
+
+
+#ifdef CONFIG_THUMB2_KERNEL
+#define __opcode_read(outp, inpp, readfn) \
+	__opcode_read_thumb(outp, inpp, readfn)
+#define __opcode_read_simple(outp, inpp) \
+	__opcode_read_thumb_simple(outp, inpp)
+#else
+#define __opcode_read(outp, inpp, readfn) \
+	__opcode_read_arm(outp, inpp, readfn)
+#define __opcode_read_simple(outp, inpp) \
+	__opcode_read_arm_simple(outp, inpp)
+#endif
+
+/* Maybe add some C static functions here, with proper type annotations */
+
+#endif /* ! __ARM_OPCODES_H */
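
To illustrate how a caller might read a Thumb instruction without knowing its size in advance, here is a minimal user-space sketch of the same control flow as __opcode_read_thumb(). It is not the patch's macro: the function names and the explicit `end` bound are assumptions made for illustration, and instruction memory is assumed to be little-endian. As in the macro, the input pointer is left after the last halfword that was successfully read.

```c
#include <stdint.h>

/* Read one little-endian halfword, with a bounds check, and advance. */
static int read_le16_sketch(uint16_t *outp, const uint8_t **inpp,
			    const uint8_t *end)
{
	if (end - *inpp < 2)
		return -1;
	*outp = (uint16_t)((*inpp)[0] | ((*inpp)[1] << 8));
	*inpp += 2;
	return 0;
}

/*
 * Read one Thumb instruction into canonical form.
 * Returns 0 on success, nonzero if the buffer ends too early.
 */
static int read_thumb_sketch(uint32_t *outp, const uint8_t **inpp,
			     const uint8_t *end)
{
	uint16_t t;

	if (read_le16_sketch(&t, inpp, end))
		return -1;
	*outp = t;
	if (*outp < 0xE800u)		/* complete 16-bit instruction */
		return 0;
	if (read_le16_sketch(&t, inpp, end))
		return -1;
	*outp = (*outp << 16) | t;	/* compose first:second halfwords */
	return 0;
}
```

Calling read_thumb_sketch() repeatedly walks an instruction stream, so the caller needs no explicit 16-bit versus 32-bit checks of its own.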