Assembly Language and Language C
Write assembly code
First, make sure that both nasm and gcc are installed. There are few more libraries required.
Then, create an assembly code file. For example, test.asm:
global main
main:
mov ebx, [x]
add ebx, [y]
mov eax, ebx
ret
section .data
x dw 2
y dw 12
Then, compile and run this code:
nasm -f elf test.asm -o test.o
gcc -m32 test.o -o test
./test; echo $?
Notice that command the return value of previous main function is automatically stored as “$?”.
From C Language to Assembly Language
We can use gdb debug tool to get assembly code of a program.
First, we have a simple program, clanguage.c:
int main()
{
int x = 1;
x += 2;
return x;
}
Then, compile it with a compiler you like, and let gdb open the generated program:
clang clanguage.c -o assembly
gdb assembly
While GDB is running, use the disassembly command to get assembly code of corresponding function:
set disassembly-flavor intel
disas main
The following program should be shown:
0x0000000000401110 <+0>: push rbp
0x0000000000401111 <+1>: mov rbp,rsp
0x0000000000401114 <+4>: mov DWORD PTR [rbp-0x4],0x0
0x000000000040111b <+11>: mov DWORD PTR [rbp-0x8],0x1
0x0000000000401122 <+18>: mov eax,DWORD PTR [rbp-0x8]
0x0000000000401125 <+21>: add eax,0x2
0x0000000000401128 <+24>: mov DWORD PTR [rbp-0x8],eax
0x000000000040112b <+27>: mov eax,DWORD PTR [rbp-0x8]
0x000000000040112e <+30>: pop rbp
0x000000000040112f <+31>: ret
## “mov” and “lea”
First we introduce a simple program:
global main
main:
add ebx, [y]
lea eax, [x+2]
mov eax, [eax]
lea eax, [eax + 12]
ret
section .data
x dw 2
y dw 12
Try to write down the return value of this program without running it. The result should be 24.
The main problem is to distinguish the following commands:
mov eax, ebx ; eax = ebx
mov eax, [ebx] ; eax = ValueOnAddress(ebx)
lea eax, ebx ; error
lea eax, [ebx+2] ; eax = ebx + 2
; If ebx is a value, eax=ebx+2, eax is also a value
; If ebx is an address, eax=ebx+2, eax is also a value
To conclude, “lea” is a “pure copy”, meanwhile “mov foo [foo]” will automatically analyze the address and get the value on that address.
So we can look closer to the program we gave in the opening of this section:
global main
main:
add ebx, [y]
; get value on address y, add it to ebx
lea eax, [x+2]
; x is an address, so eax stores an address now, which is y(y=x+2)
mov eax, [eax]
; Get the value on address eax and copy it to eax
lea eax, [eax + 12]
; eax = eax + 12
ret
section .data
x dw 2
y dw 12
PTR in C and Assembly
Now we can know what is a “variable”, and what is a “ptr” by looking into the assembly code. Here’s a simple C program and corresponding assembly code:
int main()
{
int x = 1;
x += 2;
int *y = &x;
*y += 2;
return x;
}
push rbp
mov rbp,rsp
mov DWORD PTR [rbp-0x4],0x0
mov DWORD PTR [rbp-0x8],0x1
mov eax,DWORD PTR [rbp-0x8]
add eax,0x2
mov DWORD PTR [rbp-0x8],eax
lea rcx,[rbp-0x8]
mov QWORD PTR [rbp-0x10],rcx
mov rcx,QWORD PTR [rbp-0x10]
mov eax,DWORD PTR [rcx]
add eax,0x2
mov DWORD PTR [rcx],eax
mov eax,DWORD PTR [rbp-0x8]
pop rbp
ret
First, create a variable “x”, set its value and operate on it:
mov DWORD PTR [rbp-0x8],0x1 ; x = 1
mov eax,DWORD PTR [rbp-0x8] ; eax = x
add eax,0x2 ; eax += 2
mov DWORD PTR [rbp-0x8],eax ; x = eax
Then, ser a ptr “y” and point it to x:
lea rcx,[rbp-0x8] ; rcx = address(x)
mov QWORD PTR [rbp-0x10],rcx ; y = rcx = address(x)
mov rcx,QWORD PTR [rbp-0x10]
; rcx = ValueOnAddress[rbp-0x10]=address(x)
mov eax,DWORD PTR [rcx]
; eax = ValueOnAddress(rcx)=x
add eax,0x2 ; eax += 2
mov DWORD PTR [rcx],eax ; rcx = eax
ret ; return eax
Array in C and Assembly
Now let’s check how C create an array:
int main()
{
int x[2];
x[0] = 1;
x[1] = 2;
x[1] += 2;
return x[1];
}
push rbp
mov rbp,rsp
mov DWORD PTR [rbp-0x4],0x0
mov DWORD PTR [rbp-0xc],0x1 ;x[0]=1
mov DWORD PTR [rbp-0x8],0x2 ;x[1]=2
mov eax,DWORD PTR [rbp-0x8]
add eax,0x2
mov DWORD PTR [rbp-0x8],eax ;x[1]+=2
mov eax,DWORD PTR [rbp-0x8]
pop rbp
ret
So an array is simply consecutive addresses.