diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index cc09b78f12..5fac6933ce 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -197,6 +197,62 @@ r_core_wrap.cxx:32103:61: error: assigning to 'RDebugReasonType' from incompatib
 * Never ever use %lld or %llx. This is not portable. Always use the PFMT64x macros. Those are similar to the ones in GLIB.
 
+# Manage Endianness
+
+As hackers, we need to be aware of endianness.
+
+Endianness can become a problem when you try to process buffers or streams
+of bytes and store intermediate values as integers with width larger than
+a single byte.
+
+It can seem very easy to write the following code:
+
+    ut8 opcode[4] = {0x10, 0x20, 0x30, 0x40};
+    ut32 value = *(ut32*)opcode;
+
+... and then continue to use "value" in the code to represent the opcode.
+
+This needs to be avoided!
+
+Why? What is actually happening?
+
+When you cast the opcode stream to an unsigned int, the compiler uses the endianness
+of the host to interpret the bytes and stores it in host endianness. This leads to
+very unportable code, because if you compile on a different endian machine, the
+value stored in "value" might be 0x40302010 instead of 0x10203040.
+
+## Solution
+
+Use bitshifts and OR instructions to interpret bytes in a known endian.
+Instead of casting streams of bytes to larger width integers, do the following:
+
+    ut8 opcode[4] = {0x10, 0x20, 0x30, 0x40};
+    ut32 value = opcode[0] | opcode[1] << 8 | opcode[2] << 16 | opcode[3] << 24;
+
+or if you prefer the other endian:
+
+    ut32 value = opcode[3] | opcode[2] << 8 | opcode[1] << 16 | opcode[0] << 24;
+
+This is much better because you actually know which endian your bytes are stored in
+within the integer value, REGARDLESS of the host endian of the machine.
+
+## Endian helper functions
+
+Radare2 now uses helper functions to interpret all byte streams in a known endian.
+
+Please use these at all times, e.g.:
+
+    val32 = r_read_be32(buffer)         // reads 4 bytes from a stream in BE
+    val32 = r_read_le32(buffer)         // reads 4 bytes from a stream in LE
+    val32 = r_read_ble32(buffer, isbig) // reads 4 bytes from a stream:
+                                        //   if isbig is true, reads in BE
+                                        //   otherwise reads in LE
+
+There are a number of helper functions for 64, 32, 16, and 8 bit reads and writes.
+
+(Note that 8 bit reads are equivalent to casting a single byte of the buffer
+to a ut8 value, i.e. endian is irrelevant).
+
 # Additional resources
 
 * [README.md](https://github.com/radare/radare2/blob/master/README.md)
diff --git a/doc/endian b/doc/endian
deleted file mode 100644
index b71cdb453b..0000000000
--- a/doc/endian
+++ /dev/null
@@ -1,68 +0,0 @@
-Endian issues
-=============
-
-As hackers, we need to be aware of endianness.
-
-Endianness can become a problem when you try to process buffers or streams
-of bytes and store intermediate values as integers with width larger than
-a single byte.
-
-It can seem very easy to write the following code:
-
-    ut8 opcode[4] = {0x10, 0x20, 0x30, 0x40};
-    ut32 value = *(ut32*)opcode;
-
- ... and then continue to use "value" in the code to represent the opcode.
-
-This needs to be avoided!
-
-Why? What is actually happening?
-
-When you cast the opcode stream to a unsigned int, the compiler uses the endianness
-of the host to interpret the bytes and stores it in host endianness. This leads to
-very unportable code, because if you compile on a different endian machine, the
-value stored in "value" might be 0x40302010 instead of 0x10203040.
-
-In the past, radare devs were not as strict about this issue, and as a result,
-needed to swap the endian of values regularly in the code.
-
-Solution
-========
-
-Use bitshifts and OR instructions to interpret bytes in a known endian.
-Instead of casting streams of bytes to larger width integers, do the following:
-
-    ut8 opcode[4] = {0x10, 0x20, 0x30, 0x40};
-    ut32 value = opcode[0] | opcode[1] << 8 | opcode[2] << 16 | opcode[3] << 24;
-
- or if you prefer the other endian:
-
-    ut32 value = opcode[3] | opcode[2] << 8 | opcode[1] << 16 | opcode[0] << 24;
-
-This is much better because you actually know which endian your bytes are stored in
-within the integer value, REGARDLESS of the host endian of the machine.
-
-
-Endian helper functions
-=======================
-
-Radare2 now uses helper functions to interpret all byte streams in a known endian.
-
-Please use these at all times, eg:
-
-    val32 = r_read_be32(buffer)         // reads 4 bytes from a stream in BE
-    val32 = r_read_le32(buffer)         // reads 4 bytes from a stream in LE
-    val32 = r_read_ble32(buffer, isbig) // reads 4 bytes from a stream:
-                                        //   if isbig is true, reads in BE
-                                        //   otherwise reads in LE
-
-There are a number of helper functions for 64, 32, 16, and 8 bit reads and writes.
-
-(Note that 8 bit reads are equivalent to casting a single byte of the buffer
-to a ut8 value, ie endian is irrelevant).
-
-Happy hacking!
-
--- damo22
-
-
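
The shift-and-OR technique this patch documents can be sketched as a few standalone C functions. This is a minimal illustration only, not radare2's actual `r_read_*` implementation: the names `read_le32`, `read_be32` and `read_ble32` are hypothetical stand-ins, and `uint8_t`/`uint32_t` from `<stdint.h>` are assumed here in place of `ut8`/`ut32`. The explicit casts before each shift are an addition in this sketch: without them a byte is promoted to a signed `int`, and shifting a value of 0x80 or above left by 24 would overflow it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: interpret 4 bytes as a little-endian 32-bit value.
 * Each byte is widened to uint32_t before shifting, so the result does not
 * depend on the host's byte order or on signed int promotion. */
static uint32_t read_le32(const uint8_t *b) {
    return (uint32_t)b[0]
         | (uint32_t)b[1] << 8
         | (uint32_t)b[2] << 16
         | (uint32_t)b[3] << 24;
}

/* Hypothetical helper: interpret 4 bytes as a big-endian 32-bit value. */
static uint32_t read_be32(const uint8_t *b) {
    return (uint32_t)b[3]
         | (uint32_t)b[2] << 8
         | (uint32_t)b[1] << 16
         | (uint32_t)b[0] << 24;
}

/* Hypothetical helper: pick the byte order at run time,
 * big endian when isbig is true, little endian otherwise. */
static uint32_t read_ble32(const uint8_t *b, bool isbig) {
    return isbig ? read_be32(b) : read_le32(b);
}
```

For the buffer `{0x10, 0x20, 0x30, 0x40}` used in the text, `read_le32` yields 0x40302010 and `read_be32` yields 0x10203040 on any host, which is the property the pointer cast `*(ut32*)opcode` cannot guarantee.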