Update: I have another post that covers more ground (ELF loading, kernel module loading etc): http://lastweek.io/notes/dynamic_linking/.
csu/libc-start.c: __libc_start_main() is the libc entry point (called from _start). Inside, it calls __libc_csu_init(), and then it calls the user's main().
- Great reference: Linux x86 Program Start Up. I keep a PDF copy in this repo.
- ELF's .interp section points to the dynamic linker, and here it is.
- Related code: elf/rtld.c, sysdeps/generic, sysdeps/x86_64/, and more.
- Inside dl_main(), you can see how LD_PRELOAD is handled.

GOT[1] contains the address of the link_map data structure. GOT[2] points to _dl_runtime_resolve()! This is the runtime dynamic linker entry point.
The file sysdeps/generic/dl-machine.h populates GOT[1] and GOT[2]:
/* Set up the loaded object described by L so its unrelocated PLT
   entries will jump to the on-demand fixup code in dl-runtime.c. */
static inline int
elf_machine_runtime_setup (struct link_map *l, int lazy)
{
  extern void _dl_runtime_resolve (Elf32_Word);
  if (lazy)
    {
      /* The GOT entries for functions in the PLT have not yet been filled
         in. Their initial contents will arrange when called to push an
         offset into the .rel.plt section, push _GLOBAL_OFFSET_TABLE_[1],
         and then jump to _GLOBAL_OFFSET_TABLE[2]. */
      Elf32_Addr *got = (Elf32_Addr *) D_PTR (l, l_info[DT_PLTGOT]);
      got[1] = (Elf32_Addr) l; /* Identify this shared object. */
      /* This function will get called to fix up the GOT entry indicated by
         the offset on the stack, and then jump to the resolved address. */
      got[2] = (Elf32_Addr) &_dl_runtime_resolve;
    }
  return lazy;
}
_dl_runtime_resolve() is architecture specific and has a mix of assembly and C code.
The flow is similar to syscall handling: it first saves the registers,
then calls the actual resolver, then restores all saved registers.
For 64-bit x86, the source code is in sysdeps/x86_64/dl-trampoline.h:
    .globl _dl_runtime_resolve
    .type _dl_runtime_resolve, @function
_dl_runtime_resolve:
    ...
    ...
    # Copy args pushed by PLT in register.
    # %rdi: link_map, %rsi: reloc_index
    mov (LOCAL_STORAGE_AREA + 8)(%BASE), %RSI_LP
    mov LOCAL_STORAGE_AREA(%BASE), %RDI_LP
    call _dl_fixup # Call resolver.
    mov %RAX_LP, %R11_LP # Save return value
    ...
Bingo, _dl_fixup() is the final piece of the runtime dynamic linker resolver. We can find it in elf/dl-runtime.c, which is the file for on-demand PLT fixup:
/* This function is called through a special trampoline from the PLT the
first time each PLT entry is called. We must perform the relocation
specified in the PLT of the given shared object, and return the resolved
function address to the trampoline, which will restart the original call
to that address. Future calls will bounce directly from the PLT to the
function. */
DL_FIXUP_VALUE_TYPE
attribute_hidden __attribute ((noinline)) ARCH_FIXUP_ATTRIBUTE
_dl_fixup (
# ifdef ELF_MACHINE_RUNTIME_FIXUP_ARGS
ELF_MACHINE_RUNTIME_FIXUP_ARGS,
# endif
struct link_map *l, ElfW(Word) reloc_arg)
{
...
}
Understanding this piece of code requires some effort. Happy hacking!
Recent ELF files produced by GCC are slightly different from
the ones described in older textbooks or papers.
The difference is small, though. You should use man elf
to check the latest layout.
- When a program imports a certain function or variable, the linker will include a string with the function or variable's name in the .dynstr section.
- The linker also adds a symbol (Elf_Sym) that refers to the function or variable's name in the .dynsym section, and a relocation (Elf_Rel) pointing to that symbol in the .rela.plt section.
- .rela.dyn and .rela.plt are for imported variables and functions, respectively.
- .plt is the normal one; it holds instructions.
- .got and .got.plt: maybe the first is for variables and the latter is for functions, but both provide essentially the same global offset table functionality.
Relationship among .dynstr, .dynsym, and .rela.dyn or .rela.plt. Credit: link.

PIC Lazy Binding. Credit: link.
Also note that nowadays, even a non-PIC binary will always have GOT and PLT sections. In theory, it probably should use load-time relocation. I suspect GOT and PLT are adopted for the following two reasons: a) load-time relocation needs to modify code, which is not good at run time, especially considering that the code section is probably not writable; b) GOT/PLT's lazy binding gives a performance win at start-up time. However, keep in mind that GOT/PLT's lazy binding pays an extra runtime cost!
Reading: