Yoharol
by Yoharol

Assembly Language and Language C

Write assembly code

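First, make sure that both nasm and gcc are installed. On a 64-bit system, linking with gcc -m32 also requires the 32-bit C libraries (for example, the gcc-multilib package on Debian or Ubuntu).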

Then, create an assembly code file. For example, test.asm:

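global main

section .text
main:
	mov ecx, [x]    ; ecx = 32-bit value stored at x (ecx is a scratch register)
	add ecx, [y]    ; ecx += 32-bit value stored at y
	mov eax, ecx    ; the return value of main goes in eax
	ret

section .data
x dd 2              ; dd defines a 32-bit value, matching the 32-bit loads above
y dd 12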

Then, compile and run this code:

nasm -f elf test.asm -o test.o
gcc -m32 test.o -o test
./test; echo $?

Note that the shell automatically stores the exit status of the previous command, which here is the return value of main, in “$?”. The command above should print 14 (2 + 12).

From C Language to Assembly Language

We can use the gdb debugger to view the assembly code of a compiled program.

First, we have a simple program, clanguage.c:

int main()
{
    int x = 1;
    x += 2;
    return x;
}

Then, compile it with a compiler you like, and let gdb open the generated program:

clang clanguage.c -o assembly
gdb assembly

While GDB is running, use the disassemble command (disas for short) to get the assembly code of the corresponding function:

set disassembly-flavor intel
disas main

Output similar to the following should be shown (the exact addresses depend on the compiler and system):

   0x0000000000401110 <+0>:	push   rbp
   0x0000000000401111 <+1>:	mov    rbp,rsp
   0x0000000000401114 <+4>:	mov    DWORD PTR [rbp-0x4],0x0
   0x000000000040111b <+11>:	mov    DWORD PTR [rbp-0x8],0x1
   0x0000000000401122 <+18>:	mov    eax,DWORD PTR [rbp-0x8]
   0x0000000000401125 <+21>:	add    eax,0x2
   0x0000000000401128 <+24>:	mov    DWORD PTR [rbp-0x8],eax
   0x000000000040112b <+27>:	mov    eax,DWORD PTR [rbp-0x8]
   0x000000000040112e <+30>:	pop    rbp
   0x000000000040112f <+31>:	ret 

“mov” and “lea”

First we introduce a simple program:

global main

main:
	add ebx, [y]
	lea eax, [x+2]
	mov eax, [eax]
	lea eax, [eax + 12]
	ret

section .data
x dw 2
y dw 12

Try to write down the return value of this program without running it. The result should be 24.

The main problem is to distinguish the following instructions:

mov eax, ebx     ; eax = ebx
mov eax, [ebx]   ; eax = ValueOnAddress(ebx), memory is read
lea eax, ebx     ; error, lea needs a memory operand in brackets
lea eax, [ebx+2] ; eax = ebx + 2, memory is NOT read
; If ebx holds a plain value, eax = ebx + 2 is also a plain value
; If ebx holds an address, eax = ebx + 2 is also an address

To conclude, “lea” only computes the expression inside the brackets and copies the result without touching memory, whereas “mov foo, [bar]” follows the address and loads the value stored at that address.

Now we can take a closer look at the program from the beginning of this section:

global main

main:
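	add ebx, [y]
	; read the value stored at address y and add it to ebx
	; (ebx is never used afterwards, so this does not affect the result)
	lea eax, [x+2]
	; x is an address, so eax now holds the address x+2, which is the address of y
	mov eax, [eax]
	; load the value stored at the address held in eax: eax = 12
	; (this reads 32 bits, so it relies on the 2 bytes after y being zero)
	lea eax, [eax + 12]
	; eax = eax + 12 = 24, no memory is accessed
	ret

section .data
x dw 2  ; dw defines a 2-byte value, so y sits exactly 2 bytes after x
y dw 12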

PTR in C and Assembly

Now we can see what a “variable” is, and what a pointer (“ptr”) is, by looking at the assembly code. Here’s a simple C program and the corresponding assembly code:

int main()
{
    int x = 1;
    x += 2;
    int *y = &x;
    *y += 2;
    return x;
}

push   rbp
mov    rbp,rsp
mov    DWORD PTR [rbp-0x4],0x0
mov    DWORD PTR [rbp-0x8],0x1
mov    eax,DWORD PTR [rbp-0x8]
add    eax,0x2
mov    DWORD PTR [rbp-0x8],eax
lea    rcx,[rbp-0x8]
mov    QWORD PTR [rbp-0x10],rcx
mov    rcx,QWORD PTR [rbp-0x10]
mov    eax,DWORD PTR [rcx]
add    eax,0x2
mov    DWORD PTR [rcx],eax
mov    eax,DWORD PTR [rbp-0x8]
pop    rbp
ret   

First, the program creates the variable “x” on the stack at [rbp-0x8], sets its value and operates on it:

mov    DWORD PTR [rbp-0x8],0x1  ; x = 1
mov    eax,DWORD PTR [rbp-0x8]  ; eax = x
add    eax,0x2                  ; eax += 2
mov    DWORD PTR [rbp-0x8],eax  ; x = eax

Then, set a pointer “y” and point it to x:

lea    rcx,[rbp-0x8]            ; rcx = address(x)
mov    QWORD PTR [rbp-0x10],rcx ; y = rcx = address(x)
mov    rcx,QWORD PTR [rbp-0x10] ; rcx = y = address(x)
mov    eax,DWORD PTR [rcx]      ; eax = ValueOnAddress(rcx) = x
add    eax,0x2                  ; eax += 2
mov    DWORD PTR [rcx],eax      ; ValueOnAddress(rcx) = eax, so x = eax
mov    eax,DWORD PTR [rbp-0x8]  ; eax = x, the return value
pop    rbp
ret                             ; return eax

Array in C and Assembly

Now let’s check how C creates an array:

int main()
{
    int x[2];
    x[0] = 1;
    x[1] = 2;
    x[1] += 2;
    return x[1];
}

push   rbp
mov    rbp,rsp
mov    DWORD PTR [rbp-0x4],0x0
mov    DWORD PTR [rbp-0xc],0x1  ;x[0]=1
mov    DWORD PTR [rbp-0x8],0x2  ;x[1]=2
mov    eax,DWORD PTR [rbp-0x8] 
add    eax,0x2
mov    DWORD PTR [rbp-0x8],eax  ;x[1]+=2
mov    eax,DWORD PTR [rbp-0x8]
pop    rbp
ret   

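So an array is simply a block of consecutive memory locations: x[0] is stored at [rbp-0xc] and x[1] at [rbp-0x8], exactly 4 bytes (one int) apart.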