So that we're on the same page, I'll try to explain what I think I know about Symbol Relocation and ask questions as they come up.
Symbol Relocation:
- Absolute addresses – replace the reference with the actual address. This happens when a global variable we need to reference is in the .data section of an executable object file. The downside is that when the reference is to a variable or function in a static library (.a file), we have to re-link all of our object files whenever we update the static library, so that the new address ends up in the executable. (Essentially, parts of the .a file are carved into the executable, hence the need to recreate the executable, i.e. relink.)
- PC-relative address – tells the program counter to jump a certain number of bytes. This is useful when functions are near each other, and in creating position-independent code (PIC). The idea is that things in the .data section and things in the .text section stay the same distance apart, as do the two sections themselves, so we can reach the code or data we need by moving a fixed number of bytes from the current instruction.
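To make the difference concrete, here is a minimal sketch of the arithmetic as I understand it, assuming 32-bit relocation fields and an x86-64 style call whose displacement is added to the address of the next instruction. The function names and all addresses are hypothetical values picked for illustration, not output from a real linker.

```python
MASK32 = 0xFFFFFFFF  # relocation fields are assumed to be 32 bits wide


def reloc_abs32(sym_addr, addend):
    """Absolute relocation: the patched field holds the symbol's address."""
    return (sym_addr + addend) & MASK32


def reloc_pc32(sym_addr, addend, ref_addr):
    """PC-relative relocation: the patched field holds a signed distance."""
    return (sym_addr + addend - ref_addr) & MASK32


def call_target(next_insn_addr, disp32):
    """What a PC-relative call does at run time: next instruction + displacement."""
    if disp32 & 0x80000000:  # reinterpret the 32-bit field as a signed value
        disp32 -= 1 << 32
    return next_insn_addr + disp32


# Hypothetical layout: a 5-byte call (1 opcode byte + 4-byte field) at 0x4004de
call_addr = 0x4004DE
ref_addr = call_addr + 1   # address of the 4-byte field the linker patches
sum_addr = 0x4004E8        # made-up address of the callee

# The addend of -4 compensates for the field sitting 4 bytes before the next instruction
disp = reloc_pc32(sum_addr, -4, ref_addr)
target = call_target(call_addr + 5, disp)

print(hex(disp))    # 0x5
print(hex(target))  # 0x4004e8
```

In this model the field stores 5, yet the call still lands on the callee's absolute address, because the CPU adds the displacement to the address of the instruction that follows the call.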
Question 1: Couldn't we simply use absolute addressing in such cases if we’re just using static libraries? For example: callq 0xaddress.
Question 2: Say a PC-relative address of 5 is calculated. The assembly increments the PC by 5 at a certain point. Does this mean we cannot use absolute addresses, say 0x480480, because then the PC would get incremented by 0x480480? In other words, do function calls need PC-relative instructions to call them?
Question 3: But doesn’t the assembly instruction callq take the address of a function?
(Potential) Answers for 1-3: Absolute addresses are computed at link time, while PC-relative addresses are computed at compile time, which saves time. We use PC-relative addresses only for functions that stay near other functions, not anywhere else. We could use absolute addressing here, but that might be worse (in terms of time).
- GOT – included in any module that references variables in a .so file (a shared library). This is "position-independent code" (PIC) because the .text section refers to entries in the GOT, and those entries are filled in with absolute addresses at load time. It is used specifically for shared libraries because we don’t want to keep the library's contents inside the executable, so that we can update the shared library as we wish.
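Here is a toy model of the indirection as I picture it, assuming the loader patches only the GOT and never the code itself. The instruction name "load_via_got", the helper names, and the addresses are all hypothetical; a real loader works on ELF relocation records, not Python tuples.

```python
# The "text" is immutable and could be shared by every process.
# It never contains an address, only the index of a GOT slot.
TEXT = (("load_via_got", 0),)


def load_module(lib_base, var_offset):
    """Loader step: compute the variable's absolute address and store it in the GOT."""
    return [lib_base + var_offset]


def run(text, got, memory):
    """Execute the toy text: each access goes through the GOT slot it names."""
    results = []
    for op, slot in text:
        if op == "load_via_got":
            results.append(memory[got[slot]])
    return results


# Two hypothetical processes map the same library at different base addresses.
got_a = load_module(0x7F0000000000, 0x1010)
got_b = load_module(0x7F5500000000, 0x1010)
memory_a = {got_a[0]: 42}
memory_b = {got_b[0]: 42}

print(run(TEXT, got_a, memory_a))  # [42]
print(run(TEXT, got_b, memory_b))  # [42]
```

The point of the sketch is that TEXT is identical in both runs while the two GOTs differ, which is what I take "position-independent" to mean here.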
Question 4: Isn’t this the same as if we did not use a GOT? Since entries in the GOT get relocated anyway, what is the purpose of using it? Our code still gets changed. If the benefit is that we do not have to recreate our executables and can defer everything to load time, what’s the benefit of static linking?
- PLT+GOT – included in any module that refers to procedures (functions) in .so files. The idea is to “defer binding of each procedure address until the first time the procedure is called” (i.e., the address of the procedure is not put in the GOT until the first time the procedure is called). For example, if we have a call to a function defined in a .so, we call into the PLT. The first time*, this invokes the dynamic linker, which puts the address of the function in the GOT. On succeeding calls we still call into the PLT, but it now uses the real address stored in the GOT. (*The first time: the GOT holds a special address that starts the execution of instructions that will find the address of the function we need and put it in the GOT.)
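The lazy-binding steps above can be sketched roughly as follows. This is only a toy model under my own assumptions: the class LazyBinder and its methods are hypothetical names, a dictionary stands in for the .so, and a closure stands in for the PLT stub that hands control to the dynamic linker.

```python
class LazyBinder:
    """Toy PLT+GOT: each GOT entry starts as a resolver stub and is patched on first call."""

    def __init__(self, library):
        self.library = library   # name -> function; stands in for the shared library
        self.got = {}            # name -> callable; stands in for the GOT
        self.resolve_count = 0   # how many times the "dynamic linker" ran

    def _resolver_stub(self, name):
        def stub(*args):
            # First call only: look the symbol up and overwrite the GOT entry
            self.resolve_count += 1
            self.got[name] = self.library[name]
            return self.got[name](*args)
        return stub

    def declare(self, name):
        """Load time: the GOT entry points at the stub, not at the real function."""
        self.got[name] = self._resolver_stub(name)

    def plt_call(self, name, *args):
        """Every call goes through the same indirection via the GOT entry."""
        return self.got[name](*args)


binder = LazyBinder({"addvec": lambda x, y: x + y, "unused": lambda: 0})
binder.declare("addvec")
binder.declare("unused")

print(binder.plt_call("addvec", 1, 2))  # 3, resolved on this first call
print(binder.plt_call("addvec", 3, 4))  # 7, goes straight through the patched entry
print(binder.resolve_count)             # 1
```

In this model "unused" is never resolved at all, which seems to be the saving that lazy binding is after when a library exports many functions and a program calls only a few of them.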
Question 5: Why don’t we just keep the addresses of the functions in the GOT, as we did for variables? Is it just to save a little bit of OS time?
Question 6: And if the GOT+PLT combination is good, why not use it for all our purposes?