Reverse Engineering: An Introduction

Rahul Singh Chauhan
6 min read · Oct 15, 2020

Reverse Engineering

Definition:

Basically, reverse engineering means looking at things from the outside in. You start with a finished piece, such as a software system or a hardware product, and analyze it to trace it back to its roots.

Why do we reverse engineer things?

There can be various reasons behind it. It may be for your own financial gain: for example, a competitor launched a product into the market and you want to see how they built it. Or you may want to fix some bugs in your own product, optimize its performance, audit its security, or prevent copyright infringement.

Example: You might have come across cracked versions of games, video editing software, etc. Cracking is one application of reverse engineering.

Let's Begin:

Programs are written in a high-level language and then converted to machine code so that the machine can understand and execute the instructions specified in the program. The thing with machine code is that we can’t understand it EASILY. You might end up banging your head against a wall before you understand what it is trying to do.

So we came up with assembly language, which is a lot easier to read than machine code. Assembly code deals in mnemonics. For example, a = a + b in Python could be written as add a, b in assembly (here the first operand is the destination).

Types of Syntax:

There are two syntaxes in which you can read the assembly code generated from your program.

1. AT&T Syntax
2. Intel Syntax

I personally prefer AT&T syntax, but it is totally up to you which one you choose. Here are the differences between the two.

a) The source and destination operands are written in the opposite order.

Example: In AT&T syntax the format is:
<instruction/mnemonic> <source operand>, <destination operand>

While in Intel syntax the format is:
<instruction/mnemonic> <destination operand>, <source operand>

b) In AT&T syntax, a percent sign (%) is prepended to register names and a dollar sign ($) is prepended to constants.
Example: %rax (represents a register)
$0x3 (represents the constant 3) [Disassemblers usually print constants in hex format]

c) You’ll often see a character appended to an instruction. It represents the size of the data involved.

Data sizes are described in terms of words. A word is an ordered set of bytes that the computer stores, transmits, or operates on as a unit. In x86 terminology, a word is 16 bits.

Example:
q : Stands for a quad word (64 bits)
s : Stands for single precision (32 bits), used on floating-point instructions
l : Stands for long, i.e. double word (32 bits), on integer instructions; on floating-point instructions it stands for double precision (64 bits)
b : Stands for byte (8 bits)
w : Stands for word (16 bits)

Registers :

Since we are working in assembly, we’ll mostly be dealing with registers. A register is a small, fast storage location inside the processor that holds data for operations.

On the x86-64 architecture, there are 16 general-purpose registers, each 64 bits wide. Although they are 64-bit, we can reference parts of them as needed.

Example: we can reference them as 32 bits, 16 bits, or even 8 bits (more on this ahead). Here are all the 64-bit registers and the names used when referencing their lower 32 bits.

NOTE: As I prefer AT&T syntax, I’ve put a % sign in front of each register. Intel syntax writes the names without the % sign. As you’ll see, %rax is called %eax when referenced as 32 bits. It is the same register; %eax is simply the name for the lower 32 bits of %rax.

64 bit      32 bit
%rax        %eax
%rbx        %ebx
%rcx        %ecx
%rdx        %edx
%rsi        %esi
%rdi        %edi
%rsp        %esp
%rbp        %ebp
%r8         %r8d
%r9         %r9d
%r10        %r10d
%r11        %r11d
%r12        %r12d
%r13        %r13d
%r14        %r14d
%r15        %r15d

The first six registers, up to %rdi, are general-purpose registers. They hold values for temporary storage and computation, and so do %r8 through %r15. %rsp is the stack pointer. It always points to the top of the stack. %rbp is called the frame pointer or the base pointer. It points at the base of the current stack frame.

NOTE: Every function call executes in its own new stack frame.

As we discussed previously, we’ll often see instructions with a size character appended. So let’s discuss some instructions first (though not all).

1. leaq source, destination:
The instruction is actually “lea” (load effective address); the appended “q” means quad word, so a 64-bit register is involved.
It sets the destination to the address denoted by the expression in the source, without reading the memory at that address.

2. subq source, destination:
This is an instruction to perform subtraction.
Equivalent to destination = destination - source

3. addq source, destination:
This is an instruction to perform addition.
Equivalent to destination = destination + source

4. imulq source, destination:
This is an instruction to perform multiplication.
Equivalent to destination = destination * source

There are other instructions as well, such as sar, xor, and, or, and leave, but we won’t need them right now.

Writing and understanding our first C program
This is a C program that does nothing except return a value (as you’ll see).

// Program name: first.c
#include <stdio.h>

int main()
{
    return 123;  // the only thing this program does
}

Let’s compile it with gcc and write the output to first.out.
>> gcc first.c -o first.out

Let’s use gdb to look at the assembly code.
>>gdb first.out

Let’s type disassemble main to look at the assembly code for main.
(gdb) disassemble main
Dump of assembler code for function main:
0x00000000000005fa <+0>: push %rbp
0x00000000000005fb <+1>: mov %rsp,%rbp
0x00000000000005fe <+4>: mov $0x7b,%eax
0x0000000000000603 <+9>: pop %rbp
0x0000000000000604 <+10>: retq
End of assembler dump.

This is my output, but yours might be different (the addresses on the left can vary, and so can the syntax if you use Intel).

Let’s look at the code and analyze it. In unoptimized builds like this one, you’ll typically see the first two and the last two lines in every function. They are called the prologue and the epilogue of the function. They are quite important and are a topic for another time.

By the way, push and pop are used to push elements onto and pop elements off the stack, respectively.

Let’s look at what is left now.
mov $0x7b,%eax. Since we are reading AT&T syntax, we can tell that 0x7b is a constant because it is prepended with a “$” sign, and similarly that eax is a register because of the “%” sign.

The decimal equivalent of 0x7b is 123. Since we return 123 in our source code, %eax is set to 123: on x86-64, a function’s integer return value is passed back to the caller in %eax.

Let’s look at another example.
// Program name: second.c
#include <stdio.h>

int main()
{
    int a = 10;  // a local variable, stored in main's stack frame
    return 0;
}

>>gcc second.c -o second.out
>>gdb second.out
(gdb) disassemble main
Dump of assembler code for function main:
0x00000000000005fa <+0>: push %rbp
0x00000000000005fb <+1>: mov %rsp,%rbp
0x00000000000005fe <+4>: movl $0xa,-0x4(%rbp)
0x0000000000000605 <+11>: mov $0x0,%eax
0x000000000000060a <+16>: pop %rbp
0x000000000000060b <+17>: retq
End of assembler dump.

The first two and last two lines have repeated themselves, as mentioned above.
Let’s look at the third line, movl $0xa,-0x4(%rbp).
-0x4(%rbp) represents a memory location: the address 4 bytes below the one held in %rbp. [We’ll see more of this when we discuss %rbp and %rsp.]
movl stores 0xa (the hex equivalent of 10) at the location -0x4(%rbp), which is where the variable a lives.
This is followed by mov $0x0,%eax, which is for return 0.
retq is the instruction that returns from the function to its caller; here it marks the end of main.

I hope this article was helpful to you.
