- ELF is the file format used for object files (
.o
s), binaries, shared libraries and core dumps in Linux. - ELF has the same layout for all architectures, however endianness and word size can differ; relocation types, symbol types and the like may have platform-specific values, and of course, the contained code is arch-specific.
- An ELF file provides 2 views on the data it contains: A linking view and an execution view. Those two views can be accessed by two headers: Section header & Program header.
- Before dive in to ELF understanding, let’s clear some basic keywords as follows :
Shared Library(.so)#
- Combination of multiple objects files
- Single Copy loaded in memory shared by multiple processes (that’s why called shared object)
Sections#
- Sections contain information needed during the linking time
Segments#
- Segments contain information needed at run time
- Neither the
Section Header
nor the Program Header
have fixed positions, they can be located anywhere in an ELF file. To find them the ELF header
is used, which is located at the very start of the file. - The first bytes contain the elf magic
\x7fELF
, followed by the class ID (32 or 64 bit ELF file), the data format ID (little-endian/big-endian), the machine type, etc.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
| vishal@linuxmachine:~$ readelf -h /bin/bash
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Intel 80386
Version: 0x1
Entry point address: 0x805be30
Start of program headers: 52 (bytes into file)
Start of section headers: 675344 (bytes into file)
Flags: 0x0
Size of this header: 52
Size of program headers: 32
Number of program headers: 8
Size of section headers: 40
Number of section headers: 26
Section header string table index: 25
|
- Gives a complete overview of the sections contained in the ELF file
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
| vishal@linuxmachine:~$ readelf -S /bin/bash
There are 26 section headers, starting at offset 0xa4e10:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 00000 000000 00 0 0 0
[ 1] .interp PROGBITS 08048134 00134 000013 00 A 0 0 1
[ 2] .note.ABI-tag NOTE 08048148 00148 000020 00 A 0 0 4
[ 3] .hash HASH 08048168 00168 002e48 04 A 4 0 4
[ 4] .dynsym DYNSYM 0804afb0 02fb0 007890 10 A 5 1 4
[ 5] .dynstr STRTAB 08052840 0a840 0074e2 00 A 0 0 1
[ 6] .gnu.version VERSYM 08059d22 11d22 000f12 02 A 4 0 2
[ 7] .gnu.version_r VERNEED 0805ac34 12c34 000090 00 A 5 2 4
[ 8] .rel.dyn REL 0805acc4 12cc4 000040 08 A 4 0 4
[ 9] .rel.plt REL 0805ad04 12d04 0005a8 08 A 4 11 4
[10] .init PROGBITS 0805b2ac 132ac 000017 00 AX 0 0 4
[11] .plt PROGBITS 0805b2c4 132c4 000b60 04 AX 0 0 4
[12] .text PROGBITS 0805be30 13e30 077154 00 AX 0 0 16
[13] .fini PROGBITS 080d2f84 8af84 00001a 00 AX 0 0 4
[14] .rodata PROGBITS 080d2fa0 8afa0 015198 00 A 0 0 32
[15] .eh_frame_hdr PROGBITS 080e8138 a0138 00002c 00 A 0 0 4
[16] .eh_frame PROGBITS 080e8164 a0164 00009c 00 A 0 0 4
[17] .ctors PROGBITS 080e9200 a0200 000008 00 WA 0 0 4
[18] .dtors PROGBITS 080e9208 a0208 000008 00 WA 0 0 4
[19] .jcr PROGBITS 080e9210 a0210 000004 00 WA 0 0 4
[20] .dynamic DYNAMIC 080e9214 a0214 0000d8 08 WA 5 0 4
[21] .got PROGBITS 080e92ec a02ec 000004 04 WA 0 0 4
[22] .got.plt PROGBITS 080e92f0 a02f0 0002e0 04 WA 0 0 4
[23] .data PROGBITS 080e95e0 a05e0 004764 00 WA 0 0 32
[24] .bss NOBITS 080edd60 a4d44 004bc8 00 WA 0 0 32
[25] .shstrtab STRTAB 00000000 a4d44 0000cc 00 0 0 1
|
- Contains information for the kernel on how to start the program
- The
LOAD
directives determinate what parts of the ELF file get mapped into memory. - The
INTERP
the directive specifies an ELF interpreter, which is normally /lib/ld-linux.so.2 on Linux systems. - The DYNAMIC entry points to the .dynamic the section which contains information used by the ELF interpreter to set up the binary.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
| vishal@linuxmachine:~$ readelf -l /bin/bash
Elf file type is EXEC (Executable file)
Entry point 0x805be30
There are 8 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000034 0x08048034 0x08048034 0x00100 0x00100 R E 0x4
INTERP 0x000134 0x08048134 0x08048134 0x00013 0x00013 R 0x1
[Requesting program interpreter: /lib/ld-linux.so.2]
LOAD 0x000000 0x08048000 0x08048000 0xa0200 0xa0200 R E 0x1000
LOAD 0x0a0200 0x080e9200 0x080e9200 0x04b44 0x09728 RW 0x1000
DYNAMIC 0x0a0214 0x080e9214 0x080e9214 0x000d8 0x000d8 RW 0x4
NOTE 0x000148 0x08048148 0x08048148 0x00020 0x00020 R 0x4
GNU_EH_FRAME 0x0a0138 0x080e8138 0x080e8138 0x0002c 0x0002c R 0x4
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x4
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.ABI-tag .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt ...
03 .ctors .dtors .jcr .dynamic .got .got.plt .data .bss
04 .dynamic
05 .note.ABI-tag
06 .eh_frame_hdr
07
|
Relocation Table OR Relocation Section#
- Relocation is the process of connecting symbolic references(functions,variable,etc names) with symbolic definitions(function, variable,etc definitions).
- For example, when a program calls a function(at runtime), the associated call instruction must transfer control to the proper destination address at execution.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
| vishal@linuxmachine:~$ readelf -r /bin/bash
Relocation section '.rel.dyn' at offset 0x12cc4 contains 8 entries:
Offset Info Type Sym.Value Sym. Name
080e92ec 00078006 R_386_GLOB_DAT 00000000 __gmon_start__
080edd68 00035205 R_386_COPY 080edd68 stdout
080edd6c 00035d05 R_386_COPY 080edd6c stderr
080edd70 00046405 R_386_COPY 080edd70 PC
080edd74 00067405 R_386_COPY 080edd74 stdin
080edd78 0006e305 R_386_COPY 080edd78 UP
Relocation section '.rel.plt' at offset 0x12d04 contains 181 entries:
Offset Info Type Sym.Value Sym. Name
080e9368 00012c07 R_386_JUMP_SLOT 00000000 fileno
080e936c 00013807 R_386_JUMP_SLOT 00000000 strcmp
080e9370 00014107 R_386_JUMP_SLOT 0805b4a4 close
080e9374 00015307 R_386_JUMP_SLOT 00000000 dlsym
080e937c 00016a07 R_386_JUMP_SLOT 00000000 fprintf
080e9388 00018307 R_386_JUMP_SLOT 00000000 fflush
080e9390 00019c07 R_386_JUMP_SLOT 0805b524 unlink
080e930c 00003307 R_386_JUMP_SLOT 00000000 regexec
080e9328 00007a07 R_386_JUMP_SLOT 00000000 ferror
080e9330 00008307 R_386_JUMP_SLOT 00000000 readdir64
080e9334 00008f07 R_386_JUMP_SLOT 00000000 strchr
080e9338 0000a507 R_386_JUMP_SLOT 00000000 fdopen
080e9344 0000da07 R_386_JUMP_SLOT 00000000 getpid
080e9360 00012207 R_386_JUMP_SLOT 00000000 write
080e95cc 00078707 R_386_JUMP_SLOT 00000000 strcpy
...
...
|
Symbol Table: Two Use Case#
- Symbol table in object/executable files will contain the symbolic name of functions & variables with addresses which are used by Linker to resolve any unresolved references during linking.
- There’s also the symbol table in a shared library/DLL produced by the linker(at compile time) which is used by the dynamic linker to do run-time linking & resolving open references to those names to the location where the library is loaded in memory.
Note: While reverse engineering an executable, many tools refer to the symbol table to check what addresses have been assigned to global variables and known functions. If the symbol table has been stripped or cleaned out before being converted into an executable, tools will find it harder to determine addresses or understand anything about the program.
Dynamic Section & Dynamic linking with the ELF interpreter#
- First, the dynamic linker (contained within the interpreter) looks at the Dynamic section, whose address is stored in the Program Header.
- There it finds the NEEDED entries determining which libraries have to be loaded before the program can be run, the RELentries giving the address of the relocation tables, the VER entries which contain symbol versioning information, etc.
- So the dynamic linker loads the needed libraries and performs relocations (either directly at program startup or later(lazy resolution)).
- Finally, control is transferred to the address given by the symbol
_start
in the binary. Normally some gcc/glibc
startup code lives there, which in the end calls main()
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
| vishal@linuxmachine:~$ readelf -d /bin/bash
Dynamic section at offset 0xa0214 contains 22 entries:
Tag Type Name/Value
0x00000001 (NEEDED) Shared library: [libncurses.so.5]
0x00000001 (NEEDED) Shared library: [libdl.so.2]
0x00000001 (NEEDED) Shared library: [libc.so.6]
0x0000000a (STRSZ) 29922 (bytes)
0x0000000b (SYMENT) 16 (bytes)
0x00000003 (PLTGOT) 0x80e92f0
0x00000002 (PLTRELSZ) 1448 (bytes)
0x00000014 (PLTREL) REL
0x00000017 (JMPREL) 0x805ad04
0x00000011 (REL) 0x805acc4
0x00000012 (RELSZ) 64 (bytes)
0x6ffffffe (VERNEED) 0x805ac34
0x6fffffff (VERNEEDNUM) 2
0x6ffffff0 (VERSYM) 0x8059d22
0x00000000 (NULL) 0x0
|
Procedure Linkage Table(.plt)#
- is a table of addresses
- resides in the text segment
- used to store the address of all functions needed at runtime (address not known at the time of linking)
- The PLT uses what is called lazy resolution. Means it resolves procedure address once when it calls method
- How PLT works -
- A function func is called and the compiler translates this to a call to func@plt(you can see this in a binary dump).
- The program jumps to the PLT. The PLT points to the GOT. If the function has not been previously called, the GOT points back into the PLT to a resolver routine, otherwise it points to the function itself.
- If the function has not been previously called, PLT resolves routine & update the GOT entry with the actual address of the function.
Global Offset Table(.got)#
- is a table of addresses
- resides in the data segment
- used to store the relative address of variables & procedures which is mapped with the absolute address
- How GOT works -
- If some instruction in the text segment, wants to refer to a variable it must normally use an absolute memory address.
- Instead of referring to the absolute memory address, it refers to the GOT, whose location is known.
NOTE: By this kind of address tables we can effectively use relocating of objects, with just updating entry each time relocation performed
Program loading in the kernel#
- The execution of a program starts inside the kernel, in the exec system call. There the file type is looked up and the appropriate handler is called.
- The
binfmt-elf
the handler then loads the ELF header and the Program Header, followed by lots of sanity checks. - The kernel then loads the parts specified in the LOAD directives in the Program Header into memory. If an INTERP entry is present, the interpreter is loaded too.
- Statically linked binaries can do without an interpreter; dynamically linked programs always need
/lib/ld-linux.so
as an interpreter, because it includes some startup code, loads shared libraries needed by the binary and performs relocations.
Finally, control can be transferred to the program, to the interpreter, if present, otherwise to the binary itself.
1
2
3
4
5
6
7
8
9
10
11
| vishal@linuxmachine:~$ readelf -l /bin/bash
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000034 0x08048034 0x08048034 0x00100 0x00100 R E 0x4
INTERP 0x000134 0x08048134 0x08048134 0x00013 0x00013 R 0x1
[Requesting program interpreter: /lib/ld-linux.so.2]
LOAD 0x000000 0x08048000 0x08048000 0xa0200 0xa0200 R E 0x1000
LOAD 0x0a0200 0x080e9200 0x080e9200 0x04b44 0x09728 RW 0x1000
DYNAMIC 0x0a0214 0x080e9214 0x080e9214 0x000d8 0x000d8 RW 0x4
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x4
...
|