I know it’s long, but please bear with me & have patience.
How do we launch our programs?
- Do you know how a program actually gets run behind the scenes when you double-click it or type `./a.out` in a shell?
- As you know, the standard way to launch an application from a shell is to start a terminal emulator and type the name of the program, optionally followed by its arguments.
Get Into Bash: /dev/tty
Sanity Checks
- So let's start with the `main` function of the `bash` shell. If you look at the source code of `bash`, you will find `main` in the `shell.c` source file. It does many different things before the main loop of bash starts to work. For example, this function:
  - checks for and tries to open `/dev/tty`
  - checks whether the shell is running in debug mode
  - parses command-line arguments
  - reads the shell environment
  - loads `.bashrc`, `.profile` and other configuration files, and much more.
Creating Environment
- After all of these operations, you can see the call to the `reader_loop` function, defined in `eval.c`, which reads the given program name and arguments. It then calls the `execute_command` function from `execute_cmd.c`, which in turn calls a chain of functions that perform various checks, such as whether a subshell must be started and whether the command is a `bash` built-in.
- At the end of this process, the `shell_execve` function calls the `execve` system call, which executes the program named by the given filename with the given arguments and environment variables.
- So a user application (`bash` in our case) makes the system call and, as we already know, the next step is the Linux kernel.
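To make the hand-off concrete, here is a minimal user-space sketch of what a shell roughly does once it has decided to run an external program: fork a child, call `execve` in it, and wait for it to finish. This is not bash's code. `run_program` and `run_shell_command` are hypothetical helpers made up for this illustration, and the only thing taken from the system is the documented `execve(2)` interface.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/*
 * Hypothetical helper (not taken from bash): run `path` with `argv` in a
 * child process and return its exit status, or -1 on failure.
 *
 * The system call used here is documented in execve(2) as:
 *   int execve(const char *pathname, char *const argv[], char *const envp[]);
 */
int run_program(const char *path, char *const argv[])
{
    assert(path != NULL && argv != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: on success execve() never returns to this code. */
        execve(path, argv, environ);
        /* We only get here if execve() failed. */
        perror("execve");
        _exit(127);
    }

    /* Parent: wait for the child and report how it exited. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical convenience wrapper: run `cmd` through `/bin/sh -c`. */
int run_shell_command(const char *cmd)
{
    char *const argv[] = { "sh", "-c", (char *)cmd, NULL };
    return run_program("/bin/sh", argv);
}
```

Calling `run_shell_command("exit 7")` should hand back `7`. Note that the lines after `execve` in the child run only when the call fails, which matches the point made at the end of this post that a successful `execve` does not return to its caller.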
Get Into Kernel: execve System Call
execve System Call Implementation
- This system call is defined in the `fs/exec.c` source code file.
- The implementation of `execve` is pretty simple: it just returns the result of the `do_execve` function, which initializes two pointers to the userspace data holding the given arguments and environment variables and returns the result of `do_execveat_common`.
- The `do_execveat_common` function takes a similar set of arguments, plus two extra ones.
Sanity Checks
- The first argument, `AT_FDCWD`, is a special file descriptor value that stands for the current working directory, and the fifth argument is the flags, which we will see later.
- The `do_execveat_common` function checks the filename pointer and returns early if it is not valid.
- After this, it checks the flags of the current process to make sure that the limit on running processes has not been exceeded.
- If both checks pass, we clear the `PF_NPROC_EXCEEDED` flag in the flags of the current process to prevent `execve` from failing.
- In the next step we call the `unshare_files` function, defined in `kernel/fork.c`, which unshares the files of the current task, and we check the result of this function.
- We need to call this function to eliminate a potential leak of the `execve`'d binary's file descriptor.
- In the next step, we start preparing `bprm`, which is represented by the `struct linux_binprm` structure (defined in the `include/linux/binfmts.h` header file).
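The process-limit check described above can be pictured with a small user-space model. This is only a sketch under stated assumptions, not the kernel's code: `struct model_task`, `model_check_nproc` and the numeric value of `MODEL_PF_NPROC_EXCEEDED` are invented for illustration, and returning `-EAGAIN` follows what `execve(2)` documents for this situation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative flag bit; the real definition lives in the kernel headers. */
#define MODEL_PF_NPROC_EXCEEDED 0x1000u

/* Hypothetical stand-in for the few task fields this check cares about. */
struct model_task {
    unsigned int flags;   /* per-process flags */
    long user_processes;  /* processes currently owned by the user */
    long nproc_limit;     /* stand-in for the RLIMIT_NPROC limit */
};

/*
 * Fail with -EAGAIN only if the "limit exceeded" flag is set AND the user
 * is still over the limit. Otherwise clear the flag and let exec proceed.
 */
int model_check_nproc(struct model_task *t)
{
    assert(t != NULL);

    if ((t->flags & MODEL_PF_NPROC_EXCEEDED) &&
        t->user_processes > t->nproc_limit)
        return -EAGAIN;

    /* Both conditions did not hold together: drop the flag. */
    t->flags &= ~MODEL_PF_NPROC_EXCEEDED;
    return 0;
}
```

The point of the model is the pairing of the two conditions: a stale flag alone does not make the call fail, and once the check passes the flag is cleared so it cannot fail the exec later.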
Preparing Binary Parameter Struct
struct linux_binprm
- The `linux_binprm` structure is used to hold the arguments that are used when loading binaries.
- For example, it contains a `vm_area_struct`, which represents a single memory area over a contiguous interval in the address space where our application will be loaded; the `mm` field, which is the memory descriptor of the binary; a pointer to the top of memory; and many other fields.
Allocating Memory
Preparing Credentials
- Next comes the initialization of the `cred` structure stored inside the `linux_binprm` structure. It contains the security context of a task, for example the real `uid` of the task, the real `gid` of the task, and the `uid` and `gid` used for virtual file system operations.
- In the next step, the call to the `check_unsafe_exec` function puts the current process into the `in_execve` state.
Set-up & Schedule Binary
- After all of these operations, we call the `do_open_execat` function, which:
  - searches for and opens the executable file on disk,
  - checks, with the flags passed as `0`, that we are not loading a binary from a `noexec` mount point (we need to avoid executing binaries from filesystems that do not contain executable binaries, like proc or sysfs),
  - initializes a `file` structure and returns a pointer to it.
- Next comes the call to `sched_exec`. The `sched_exec` function determines the least loaded processor that can execute the new program and migrates the current process to it.
- After this, we need to check the file descriptor of the given executable binary. We check whether the name of our binary file starts with the `/` symbol, or whether the path of the given executable is interpreted relative to the current working directory of the calling process, in other words whether the file descriptor is `AT_FDCWD`.
- If either of these checks succeeds, we set the binary parameter filename to the given name.
- Otherwise, we set the binary parameter filename to `/dev/fd/%d` if the filename is empty, or to `/dev/fd/%d/%s` if it is not, which means that we will execute the file that the file descriptor refers to.
- Note that we set not only `bprm->filename` but also `bprm->interp`, which will contain the name of the program interpreter.
- For now we just write the same name there, but later it will be updated with the real name of the program interpreter, depending on the binary format of the program.
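The filename selection above is easy to mirror in user space. The following is a sketch, not the kernel's implementation: `model_exec_filename` is a hypothetical helper, and it writes into a caller-supplied buffer instead of allocating memory the way kernel code would. `AT_FDCWD` is the real constant from `<fcntl.h>`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>   /* AT_FDCWD */
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: pick the name the binary will be known by.
 * Returns 0 on success, -1 if `out` is too small.
 */
int model_exec_filename(int fd, const char *name, char *out, size_t outlen)
{
    assert(name != NULL && out != NULL);

    int n;
    if (fd == AT_FDCWD || name[0] == '/')
        n = snprintf(out, outlen, "%s", name);          /* use the name as given */
    else if (name[0] == '\0')
        n = snprintf(out, outlen, "/dev/fd/%d", fd);    /* the descriptor itself */
    else
        n = snprintf(out, outlen, "/dev/fd/%d/%s", fd, name); /* relative to fd */

    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}
```

For instance, descriptor `5` with an empty name gives `/dev/fd/5`, while descriptor `5` with the name `prog` gives `/dev/fd/5/prog`. In this model the interpreter name would simply start out as a copy of the same string.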
Preparing Memory Related Info
- The `bprm_mm_init` function, defined in the same source code file, initializes the `mm_struct` structure and populates it with a temporary stack `vm_area_struct`. Both structures are defined in the `include/linux/mm_types.h` header file; `mm_struct` represents the address space of a process.
Counting Command Line Args & Environment Variables
- The command-line arguments and environment variables are counted against `MAX_ARG_STRINGS`, a macro defined in a kernel header file that gives the maximum number of strings that may be passed to the `execve` system call.
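As a rough illustration of this counting step, here is a user-space sketch that walks a `NULL`-terminated string vector and refuses to go past a given maximum. `model_count_strings` is a hypothetical name, and failing with `-E2BIG` is an assumption based on the error that `execve(2)` documents for an argument list that is too large. To the best of my knowledge the kernel defines `MAX_ARG_STRINGS` as `0x7FFFFFFF`, so in practice the limit is rarely the thing that stops you.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical helper: count the entries of a NULL-terminated vector such
 * as argv or envp. Returns the count, or -E2BIG if more than `max`
 * strings are present.
 */
int model_count_strings(char *const strv[], int max)
{
    assert(strv != NULL);

    int i = 0;
    while (strv[i] != NULL) {
        if (i >= max)
            return -E2BIG;  /* one string too many */
        i++;
    }
    return i;
}
```

With `{ "ls", "-l", NULL }` and a generous limit this returns `2`; with a limit of `1` the same vector is rejected.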
Reading Binary(ELF) File
- Now, the call to the `prepare_binprm` function fills the `linux_binprm` structure with the `uid` from the inode and reads `128` bytes from the executable file. We read only the first `128` bytes because we just need to check the type of our executable. We will read the rest of the file in a later step.
- After preparing the `linux_binprm` structure, we copy the filename of the executable binary, the command-line arguments and the environment variables into `linux_binprm` with the `copy_strings_kernel` function.
- We then set the pointer to the top of the new program's stack, which was set up in the `bprm_mm_init` function: `bprm->exec = bprm->p;`
- The top of the stack will contain the program filename, and we store its address in the `exec` field of the `linux_binprm` structure.
Processing Binary Parameter Struct
- The call to the `exec_binprm` function stores the pid of the current task, as seen from its namespace, before it changes, and then calls `search_binary_handler(bprm);`
- `search_binary_handler` goes through the list of handlers for the different binary formats. Currently the Linux kernel supports the following binary formats:
  - `binfmt_script` - support for interpreted scripts that start with a `#!` line;
  - `binfmt_misc` - support for different binary formats, according to the runtime configuration of the Linux kernel;
  - `binfmt_elf` - support for the ELF format;
  - `binfmt_aout` - support for the a.out format;
  - `binfmt_flat` - support for the flat format;
  - `binfmt_elf_fdpic` - support for ELF FDPIC binaries;
  - `binfmt_em86` - support for Intel ELF binaries running on Alpha machines.
- So `search_binary_handler` tries to call the `load_binary` function of each handler and passes `linux_binprm` to it. If the binary handler supports the given executable file format, it starts to prepare the executable binary for execution.
- The `load_binary` function of the ELF handler, for example, checks the magic number (each ELF binary file contains a magic number in its header) in the `linux_binprm` buffer (remember that we read the first `128` bytes from the executable file) and exits if it is not an ELF binary.
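The "try each handler until one accepts the file" idea can be sketched in plain C. This is a toy model, not kernel code: `struct model_binfmt`, the two loaders and `model_search_binary_handler` are hypothetical, only two of the formats listed above are modelled, and using `-ENOEXEC` to mean "not my format" is an assumption made for the sketch. The magic bytes themselves (`0x7f 'E' 'L' 'F'` and `#!`) are the real ones.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define HEADER_SIZE 128  /* bytes read up front, as described in the text */

/* Hypothetical handler record: a name plus a "can I load this?" callback. */
struct model_binfmt {
    const char *name;
    int (*load_binary)(const unsigned char *buf, size_t len);
};

/* Accepts buffers that start with "#!". */
static int load_script(const unsigned char *buf, size_t len)
{
    if (len < 2 || buf[0] != '#' || buf[1] != '!')
        return -ENOEXEC;
    return 0;
}

/* Accepts buffers that start with the ELF magic number. */
static int load_elf(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };
    if (len < sizeof magic || memcmp(buf, magic, sizeof magic) != 0)
        return -ENOEXEC;
    return 0;
}

static const struct model_binfmt formats[] = {
    { "script", load_script },
    { "elf",    load_elf    },
};

/* Returns the name of the first handler that accepts the header, or NULL. */
const char *model_search_binary_handler(const unsigned char *buf, size_t len)
{
    assert(buf != NULL && len <= HEADER_SIZE);

    for (size_t i = 0; i < sizeof formats / sizeof formats[0]; i++) {
        if (formats[i].load_binary(buf, len) != -ENOEXEC)
            return formats[i].name;
    }
    return NULL;
}
```

A buffer beginning with `#!/bin/sh` is claimed by the `script` handler, one beginning with the ELF magic by the `elf` handler, and plain text by neither.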
Executing Binary
Sanity Checks
- If the given executable file is in ELF format, `load_elf_binary` continues and checks the architecture and type of the executable file. It exits if the architecture is wrong or if the file is neither an executable nor a shared object.
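These two checks can be illustrated on a raw header buffer. The sketch below assumes a little-endian ELF file and relies on the layout from the ELF specification, where the 16-bit type field sits at byte offset 16 and the 16-bit machine field at offset 18. `model_check_elf_header` and the `MODEL_` names are hypothetical, and the `-ENOEXEC` return value is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Values from the ELF specification. */
#define MODEL_ET_EXEC   2   /* executable file */
#define MODEL_ET_DYN    3   /* shared object */
#define MODEL_EM_X86_64 62  /* AMD x86-64 */

/* Read a little-endian 16-bit value. */
static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Hypothetical helper: accept the header only if the file type is
 * "executable" or "shared object" and the machine matches `want_machine`.
 */
int model_check_elf_header(const unsigned char *hdr, size_t len,
                           uint16_t want_machine)
{
    assert(hdr != NULL);

    if (len < 20)                 /* need the bytes up to the machine field */
        return -ENOEXEC;

    uint16_t type = le16(hdr + 16);
    uint16_t machine = le16(hdr + 18);

    if (type != MODEL_ET_EXEC && type != MODEL_ET_DYN)
        return -ENOEXEC;          /* e.g. a relocatable object file */
    if (machine != want_machine)
        return -ENOEXEC;          /* built for another architecture */
    return 0;
}
```

A position-independent `x86_64` program (type `3`, machine `62`) passes, while a relocatable object file or a binary built for another architecture is rejected.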
Setup Process Address Space & Dependencies
- It tries to load the `program header` table, which describes `segments`. It reads the program interpreter and the libraries linked with our executable binary file from disk and loads them into memory.
- The program interpreter is specified in the `.interp` section of the executable file (in most cases the linker is `/lib64/ld-linux-x86-64.so.2` for `x86_64`).
- It sets up the stack and maps the ELF binary into the correct location in memory. It maps the bss and brk sections and does many other things to prepare the executable file for execution.
- At the end of `load_elf_binary`, we call the `start_thread` function and pass three arguments to it.
- These arguments are:
  - the set of registers for the new task
  - the address of the entry point of the new task
  - the address of the top of the stack for the new task
- From the function's name you might think that it starts a new thread, but it does not. The `start_thread` function just prepares the new task's registers so that it is ready to run.
- The `start_thread` function simply calls the `start_thread_common` function, which does all the work for us.
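To see why "preparing registers" is enough, here is a toy user-space model of that hand-off. Nothing here touches real CPU state: `struct model_regs` and both functions are hypothetical, the selector values are typical for `x86_64` Linux userspace but should be treated as illustrative, and setting the flags to "interrupts enabled" (`0x200`) is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the saved register set of a task. */
struct model_regs {
    unsigned long ip;     /* instruction pointer */
    unsigned long sp;     /* stack pointer */
    unsigned long cs;     /* code segment selector */
    unsigned long ss;     /* stack segment selector */
    unsigned long flags;  /* CPU flags */
};

/* Illustrative values, not read from any kernel header. */
#define MODEL_USER_CS  0x33UL
#define MODEL_USER_SS  0x2bUL
#define MODEL_FLAGS_IF 0x200UL

/* Fill in the saved registers; the "jump" happens when they are restored. */
static void model_start_thread_common(struct model_regs *regs,
                                      unsigned long new_ip,
                                      unsigned long new_sp,
                                      unsigned long cs, unsigned long ss)
{
    regs->ip = new_ip;
    regs->sp = new_sp;
    regs->cs = cs;
    regs->ss = ss;
    regs->flags = MODEL_FLAGS_IF;
}

/* Takes the same three kinds of arguments as listed above. */
void model_start_thread(struct model_regs *regs,
                        unsigned long new_ip, unsigned long new_sp)
{
    assert(regs != NULL);
    model_start_thread_common(regs, new_ip, new_sp,
                              MODEL_USER_CS, MODEL_USER_SS);
}
```

No code of the new program runs inside these functions. They only record where execution should resume (the entry point) and with which stack, so that the return to userspace lands in the new program.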
Put The Process On-Core
- The `start_thread_common` function fills the `fs` segment register with zero, and `es` and `ds` with the value of the data segment register. After this, we set new values for the instruction pointer, the `cs` segment and so on. At the end of `start_thread_common` we can see the `force_iret` macro, which forces the system call to return via the `iret` instruction.
- OK, we have prepared the new thread to run in userspace, and now we can return from `exec_binprm` back into `do_execveat_common`. After `exec_binprm` finishes, we release the memory for the structures that were allocated earlier and return.
- After we return from the `execve` system call handler, execution of our program starts. This is possible because all the context-related information has already been configured for this purpose.
- As we saw, the `execve` system call does not return control to the calling process. Instead, the code, data and other segments of the caller are overwritten by the segments of the new program.
- The exit from our application is implemented through the `exit` system call.
And we are done with execution