This is an ongoing series of posts on ELF Binary Relocations and Thread Local Storage. This article covers only Thread Local Storage and assumes the reader has had a primer in ELF Relocations. If not, please start with my previous article, ELF Binaries and Relocation Entries.
This is the third part in an illustrated 3 part series covering:
- ELF Binaries and Relocation Entries
- Thread Local Storage
- How Relocations and Thread Local Store are implemented
In the last article we covered how Thread Local Storage (TLS) works at runtime, but how do we get there? How do the compiler and linker create the memory structures and code fragments described in the previous article?
In this article we will discuss how TLS relocations are implemented. Our outline follows the toolchain: the compiler, the assembler, the linker and the runtime linker.
As before, the examples in this article can be found in my tls-examples project. Please check it out.
The GNU toolchain
I will assume here that most people understand what a compiler and assembler basically do, in the sense that a compiler will compile routines written in C code or something similar to assembly language. It is then up to the assembler to turn that assembly code into machine code to run on a CPU.
That is a big part of what a toolchain does, and it’s pretty much that simple if we have a single file of source code. But usually we don’t have a single file; we have multiple files, the C runtime (crt0) and other libraries like libc. These all need to be put together into our final program, and that is where the complexities of the linker come in.
In this article I will cover how variables in our source code (symbols) traverse the toolchain from code to the memory in our final running program. A picture that looks something like this:
The Compiler
First we start off with how relocations are created and emitted in the compiler.
As I work primarily on the GNU toolchain with its GCC compiler we will look at that. Let’s get started.
GCC Legitimize Address
To start we define a symbol as a named address in memory. This address can be a program variable where data is stored or a function reference to where a subroutine starts.
In GCC we have TARGET_LEGITIMIZE_ADDRESS, the OpenRISC implementation being or1k_legitimize_address(). It takes a symbol (memory address) and makes it usable in our CPU by generating RTX sequences that are possible on our CPU to load that address into a register.
RTX represents a tree node in GCC’s register transfer language (RTL). The RTL Expression is used to express our algorithm as a series of register transfers. This is used as register transfer is basically what a CPU does.
A snippet from the or1k_legitimize_address() function is below. The argument x represents our input symbol (memory address) that we need to make usable by our CPU. This code uses GCC internal APIs to emit RTX code sequences.
static rtx
or1k_legitimize_address (rtx x, rtx /* unused */, machine_mode /* unused */)
...
case TLS_MODEL_NONE:
t1 = can_create_pseudo_p () ? gen_reg_rtx (Pmode) : scratch;
if (!flag_pic)
{
emit_insn (gen_rtx_SET (t1, gen_rtx_HIGH (Pmode, x)));
return gen_rtx_LO_SUM (Pmode, t1, x);
}
else if (is_local)
{
crtl->uses_pic_offset_table = 1;
t2 = gen_sym_unspec (x, UNSPEC_GOTOFF);
emit_insn (gen_rtx_SET (t1, gen_rtx_HIGH (Pmode, t2)));
emit_insn (gen_add3_insn (t1, t1, pic_offset_table_rtx));
return gen_rtx_LO_SUM (Pmode, t1, copy_rtx (t2));
}
else
{
...
We can read the code snippet above as follows:
- This is for the non TLS case, as we see TLS_MODEL_NONE.
- We reserve a temporary register t1.
- If not using position-independent code (flag_pic) we do:
  - Emit an instruction to put the high bits of x into our temporary register t1.
  - Return the sum of t1 and the low bits of x.
- Otherwise, if the symbol is static (is_local) we do:
  - Mark the global state that this object file uses the PIC offset table (uses_pic_offset_table).
  - We create a Global Offset Table offset variable t2.
  - Emit an instruction to put the high bits of t2 (the GOT offset) into our temporary register t1.
  - Emit an instruction to put the sum of t1 (high bits of t2) and the GOT into t1.
  - Return the sum of t1 and the low bits of t2.
You may have noticed that the local symbol still used the global offset table (GOT). This is because position-independent code cannot contain absolute addresses, so even a local symbol is referenced as an offset from the GOT base.
An example, from nontls.c:
static int x;
int *get_x_addr() {
return &x;
}
For the non-PIC case above, the assembly code generated by GCC looks like the following:
.file "nontls.c"
.section .text
.local x
.comm x,4,4
.align 4
.global get_x_addr
.type get_x_addr, @function
get_x_addr:
l.addi r1, r1, -8 # \
l.sw 0(r1), r2 # | function prologue
l.addi r2, r1, 8 # |
l.sw 4(r1), r9 # /
l.movhi r17, ha(x) # \__ legitimize address of x into r17
l.addi r17, r17, lo(x) # /
l.or r11, r17, r17 # } place result in return register r11
l.lwz r2, 0(r1) # \
l.lwz r9, 4(r1) # | function epilogue
l.addi r1, r1, 8 # |
l.jr r9 # |
l.nop # /
.size get_x_addr, .-get_x_addr
.ident "GCC: (GNU) 9.0.1 20190409 (experimental)"
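The l.movhi and l.addi pair in the listing above splits a 32-bit address into two 16-bit immediates. As a rough sketch of that arithmetic, assuming ha(x) is the high half adjusted for the sign extension of the low half (the same + 0x8000 adjustment that shows up in the linker's PLT code later in this article), the split and reassembly can be modelled in C. The toy_ function names are made up for illustration and are not part of GCC or binutils.

```c
#include <assert.h>
#include <stdint.h>

/* High 16 bits, rounded up when the low half will be seen as negative.
   Assumption: this models what ha(x) resolves to. */
static uint32_t toy_ha (uint32_t addr)
{
  return ((addr + 0x8000u) >> 16) & 0xffffu;
}

/* Low 16 bits as the signed immediate that l.addi sign-extends. */
static int32_t toy_lo (uint32_t addr)
{
  int32_t lo = (int32_t) (addr & 0xffffu);
  return lo >= 0x8000 ? lo - 0x10000 : lo;
}

/* Model of: l.movhi rD, ha(x) followed by l.addi rD, rD, lo(x). */
static uint32_t toy_movhi_addi (uint32_t addr)
{
  uint32_t reg = toy_ha (addr) << 16;  /* l.movhi */
  reg += (uint32_t) toy_lo (addr);     /* l.addi, wraps modulo 2^32 */
  assert (reg == addr);                /* the two halves must rebuild addr */
  return reg;
}
```

For example, toy_ha (0x12348765) is 0x1235 rather than 0x1234, because the low half 0x8765 is treated as a negative immediate and the high half has to compensate.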
For the local PIC case above, the same code compiled with the -fPIC GCC option looks like the following:
.file "nontls.c"
.section .text
.local x
.comm x,4,4
.align 4
.global get_x_addr
.type get_x_addr, @function
get_x_addr:
l.addi r1, r1, -8 # \
l.sw 0(r1), r2 # | function prologue
l.addi r2, r1, 8 # |
l.sw 4(r1), r9 # /
l.jal 8 # \
l.movhi r19, gotpchi(_GLOBAL_OFFSET_TABLE_-4) # | PC relative, put
l.ori r19, r19, gotpclo(_GLOBAL_OFFSET_TABLE_+0) # | GOT into r19
l.add r19, r19, r9 # /
l.movhi r17, gotoffha(x) # \
l.add r17, r17, r19 # | legitimize address of x into r17
l.addi r17, r17, gotofflo(x) # /
l.or r11, r17, r17 # } place result in return register r11
l.lwz r2, 0(r1) # \
l.lwz r9, 4(r1) # | function epilogue
l.addi r1, r1, 8 # |
l.jr r9 # |
l.nop # /
.size get_x_addr, .-get_x_addr
.ident "GCC: (GNU) 9.0.1 20190409 (experimental)"
TLS and Addend cases are also handled by or1k_legitimize_address().
GCC Print Operand
Once RTX is generated by legitimize address and GCC passes run all of their optimizations the RTX needs to be printed out as assembly code. During this process relocations are printed by the GCC macros TARGET_PRINT_OPERAND_ADDRESS and TARGET_PRINT_OPERAND. In OpenRISC these are defined by or1k_print_operand_address() and or1k_print_operand().
Let us have a look at or1k_print_operand_address().
/* Worker for TARGET_PRINT_OPERAND_ADDRESS.
Prints the argument ADDR, an address RTX, to the file FILE. The output is
formed as expected by the OpenRISC assembler. Examples:
RTX OUTPUT
(reg:SI 3) 0(r3)
(plus:SI (reg:SI 3) (const_int 4)) 0x4(r3)
(lo_sum:SI (reg:SI 3) (symbol_ref:SI ("x"))) lo(x)(r3) */
static void
or1k_print_operand_address (FILE *file, machine_mode, rtx addr)
{
rtx offset;
switch (GET_CODE (addr))
{
case REG:
fputc ('0', file);
break;
case ...
case LO_SUM:
offset = XEXP (addr, 1);
addr = XEXP (addr, 0);
print_reloc (file, offset, 0, RKIND_LO);
break;
default: ...
}
fprintf (file, "(%s)", reg_names[REGNO (addr)]);
}
The above code snippet can be read as we explain below, but let’s first make some notes:
- The input RTX addr for TARGET_PRINT_OPERAND_ADDRESS will usually contain a register and an offset; typically this is used for LOAD and STORE operations.
- Think of the RTX addr as a node in an AST.
- The RTX nodes with code REG and SYMBOL_REF are always leaf nodes.
With that, and if we use the or1k_print_operand_address() C comments above as examples of some RTX addr input, we will have:
 RTX       |  (reg:SI 3)            (lo_sum:SI (reg:SI 3) (symbol_ref:SI("x")))
-----------+--------------------------------------------------------------------
 TREE      |
   (code)  |  (code:REG regno:3)            (code:LO_SUM)
   /   \   |                                 /        \
 (0)   (1) |                (code:REG regno:3)   (code:SYMBOL_REF "x")
We can now read the above snippet as:
- First get the CODE of the RTX.
  - If CODE is REG (a register) then our offset can be 0.
  - If CODE is LO_SUM (an addition operation) then we need to break it down to:
    - Arg 0 is our new addr RTX (which we assume is a register)
    - Arg 1 is an offset (which we then print with print_reloc())
- Second print out the register name now in addr, i.e. “r3”.
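The walk described above can be sketched outside of GCC with a toy node type. This is only an illustration: struct toy_rtx, the TOY_ codes and both functions are hypothetical stand-ins, and GCC's real rtx type and print_reloc() do considerably more. The three output shapes are the ones listed in the comment on or1k_print_operand_address().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum toy_code { TOY_REG, TOY_CONST_INT, TOY_SYMBOL_REF, TOY_PLUS, TOY_LO_SUM };

/* A toy stand-in for an RTX node; GCC's real rtx is far richer. */
struct toy_rtx
{
  enum toy_code code;
  int regno;                    /* used by TOY_REG */
  long value;                   /* used by TOY_CONST_INT */
  const char *name;             /* used by TOY_SYMBOL_REF */
  const struct toy_rtx *op[2];  /* operands of TOY_PLUS and TOY_LO_SUM */
};

/* Format ADDR as "offset(register)" into BUF.  Returns 0 on success. */
static int
toy_print_operand_address (char *buf, size_t len, const struct toy_rtx *addr)
{
  int n;

  switch (addr->code)
    {
    case TOY_REG:
      n = snprintf (buf, len, "0");
      break;
    case TOY_PLUS:
      n = snprintf (buf, len, "0x%lx", (unsigned long) addr->op[1]->value);
      addr = addr->op[0];
      break;
    case TOY_LO_SUM:
      n = snprintf (buf, len, "lo(%s)", addr->op[1]->name);
      addr = addr->op[0];
      break;
    default:
      return -1;
    }
  if (n < 0 || (size_t) n >= len)
    return -1;

  /* Whatever is left in addr is assumed to be a register.  */
  assert (addr->code == TOY_REG);
  n += snprintf (buf + n, len - (size_t) n, "(r%d)", addr->regno);
  return (size_t) n < len ? 0 : -1;
}

/* Convenience helper: does ADDR print as EXPECTED?  */
static int
toy_prints_as (const struct toy_rtx *addr, const char *expected)
{
  char buf[64];
  return toy_print_operand_address (buf, sizeof buf, addr) == 0
         && strcmp (buf, expected) == 0;
}
```

A LO_SUM node whose operands are register 3 and the symbol x formats as lo(x)(r3), matching the last example in the comment.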
The code of or1k_print_operand() is similar and the reader may be inclined to read it for more details. With that we can move on to the assembler.
TLS cases are also handled inside of the print_reloc() function.
The Assembler
In the GNU Toolchain our assembler is GAS, part of binutils.
The code that handles relocations is found in the function parse_reloc() in opcodes/or1k-asm.c. The function parse_reloc() is the direct counterpart of GCC’s print_reloc() discussed above. This is actually part of or1k_cgen_parse_operand(), which is wired into our assembler generator CGEN and used for parsing operands.
If we are parsing a relocation like the one from above, lo(x), then we can isolate the code that processes that relocation.
static const bfd_reloc_code_real_type or1k_imm16_relocs[][6] = {
{ BFD_RELOC_LO16,
BFD_RELOC_OR1K_SLO16,
...
BFD_RELOC_OR1K_TLS_LE_AHI16 },
};
static int
parse_reloc (const char **strp)
{
const char *str = *strp;
enum or1k_rclass cls = RCLASS_DIRECT;
enum or1k_rtype typ;
...
else if (strncasecmp (str, "lo(", 3) == 0)
{
str += 3;
typ = RTYPE_LO;
}
...
*strp = str;
return (cls << RCLASS_SHIFT) | typ;
}
This uses strncasecmp to match our "lo(" string pattern. The returned result is a relocation type and relocation class which are used to look up the relocation BFD_RELOC_LO16 in the or1k_imm16_relocs[][] table, which is indexed by relocation class and relocation type.
The assembler will encode that into the ELF binary. For TLS relocations the exact same pattern is used.
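To make the class and type lookup concrete, here is a cut-down sketch of the same pattern. It is not the binutils code: the TOY_ enums, the table contents and the shift and mask values are invented for illustration, and only the lo( and ha( operators and the gotoff prefix seen in the assembly listings earlier are recognised. It relies on the POSIX strncasecmp() from strings.h.

```c
#include <assert.h>
#include <strings.h>

/* Hypothetical classes, types and relocation codes for the sketch.  */
enum toy_rclass { TOY_RCLASS_DIRECT, TOY_RCLASS_GOTOFF, TOY_RCLASS_COUNT };
enum toy_rtype { TOY_RTYPE_LO, TOY_RTYPE_AHI, TOY_RTYPE_COUNT };
enum toy_reloc
{
  TOY_RELOC_LO16, TOY_RELOC_AHI16,
  TOY_RELOC_GOTOFF_LO16, TOY_RELOC_GOTOFF_AHI16
};

#define TOY_RCLASS_SHIFT 3
#define TOY_RTYPE_MASK 7

/* Indexed by relocation class, then relocation type.  */
static const enum toy_reloc toy_imm16_relocs[TOY_RCLASS_COUNT][TOY_RTYPE_COUNT] = {
  { TOY_RELOC_LO16, TOY_RELOC_AHI16 },
  { TOY_RELOC_GOTOFF_LO16, TOY_RELOC_GOTOFF_AHI16 },
};

/* Consume a relocation operator from *STRP.  Returns the packed class
   and type, or -1 (leaving *STRP untouched) if none is present.  */
static int
toy_parse_reloc (const char **strp)
{
  const char *str = *strp;
  enum toy_rclass cls = TOY_RCLASS_DIRECT;
  enum toy_rtype typ;

  if (strncasecmp (str, "gotoff", 6) == 0)
    {
      str += 6;
      cls = TOY_RCLASS_GOTOFF;
    }

  if (strncasecmp (str, "lo(", 3) == 0)
    {
      str += 3;
      typ = TOY_RTYPE_LO;
    }
  else if (strncasecmp (str, "ha(", 3) == 0)
    {
      str += 3;
      typ = TOY_RTYPE_AHI;
    }
  else
    return -1;

  *strp = str;
  return ((int) cls << TOY_RCLASS_SHIFT) | (int) typ;
}

/* Unpack the class and type and index the table with them.  */
static enum toy_reloc
toy_lookup_reloc (int parsed)
{
  int cls = parsed >> TOY_RCLASS_SHIFT;
  int typ = parsed & TOY_RTYPE_MASK;

  assert (cls >= 0 && cls < TOY_RCLASS_COUNT && typ < TOY_RTYPE_COUNT);
  return toy_imm16_relocs[cls][typ];
}
```

Packing both values into one integer keeps the parser's return type simple, at the cost of the caller having to unpack it again before the table lookup.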
The Linker
In the GNU Toolchain our object linker is the GNU linker LD, also part of the binutils project.
The GNU linker uses the framework BFD or Binary File Descriptor which is a beast. It is not only used in the linker but also used in GDB, the GNU Simulator and the objdump tool.
What makes this possible is a rather complex API.
BFD Linker API
The BFD API is a generic binary file access API. It has been designed to support multiple file formats and architectures via an object oriented, polymorphic API all written in C. It supports file formats including a.out, COFF and ELF as well as unexpected file formats like verilog hex memory dumps.
Here we will concentrate on the BFD ELF implementation.
The API definition is split across multiple files which include:
- bfd/bfd-in.h - top level generic APIs including bfd_hash_table
- bfd/bfd-in2.h - top level binary file APIs including bfd and asection
- include/bfdlink.h - generic bfd linker APIs including bfd_link_info and bfd_link_hash_table
- bfd/elf-bfd.h - extensions to the APIs for ELF binaries including elf_link_hash_table
- bfd/elf{wordsize}-{architecture}.c - architecture specific implementations
For each architecture, implementations are defined in bfd/elf{wordsize}-{architecture}.c. For example, for OpenRISC we have bfd/elf32-or1k.c.
Throughout the linker code we see access to the BFD Linker and ELF APIs. Some key symbols to watch out for include:
- info - A reference to bfd_link_info, the top level reference to all linker state.
- htab - A pointer to elf_or1k_link_hash_table from or1k_elf_hash_table (info), a hash table on steroids which stores generic link state and arch specific state. It is also a hash table of all global symbols by name. It contains:
  - htab->root.splt - the output .plt section
  - htab->root.sgot - the output .got section
  - htab->root.srelgot - the output .rela.got section (relocations against the got)
  - htab->root.sgotplt - the output .got.plt section
  - htab->root.dynobj - a special bfd to which sections are added (created in or1k_elf_check_relocs)
- sym_hashes - From elf_sym_hashes (abfd), a list of hash entries for global symbols in abfd, looked up using the symbol index ELF32_R_SYM (rel->r_info).
- h - A pointer to a struct elf_link_hash_entry, represents link state of a global symbol. It contains:
  - h->got - A union of different attributes with different roles based on link phase.
    - h->got.refcount - used during phase 1 to count the symbol .got section references
    - h->got.offset - used during phase 2 to record the symbol .got section offset
  - h->plt - A union with the same function as h->got but used for the .plt section.
  - h->root.root.string - The symbol name
- local_got - an array of unsigned long from elf_local_got_refcounts (ibfd) with the same function as h->got but for local symbols. The function of the unsigned long changes based on the link phase. Ideally this should also be a union.
- tls_type - Retrieved by ((struct elf_or1k_link_hash_entry *) h)->tls_type, used to store the tls_type of a global symbol.
- local_tls_type - Retrieved by elf_or1k_local_tls_type (abfd), an array used to store tls_type for local symbols, when h is NULL.
- root - The struct field root is used in subclasses to represent the parent class, similar to how super is used in other languages.
Putting it all together we have a diagram like the following:
Now that we have a bit of understanding of the data structures we can look to the link algorithm.
The link process in the GNU Linker can be thought of in phases.
Phase 1 - Book Keeping (check_relocs)
The or1k_elf_check_relocs() function is called during the first phase to do book keeping on relocations. The function signature looks like:
static bfd_boolean
or1k_elf_check_relocs (bfd *abfd,
struct bfd_link_info *info,
asection *sec,
const Elf_Internal_Rela *relocs)
#define elf_backend_check_relocs or1k_elf_check_relocs
The arguments being:
- abfd - The current elf object file we are working on
- info - The BFD API
- sec - The current elf section we are working on
- relocs - The relocations from the current section
It does the book keeping by looping over relocations for the provided section and updating the local and global symbol properties.
For local symbols:
...
else
{
unsigned char *local_tls_type;
/* This is a TLS type record for a local symbol. */
local_tls_type = (unsigned char *) elf_or1k_local_tls_type (abfd);
if (local_tls_type == NULL)
{
bfd_size_type size;
size = symtab_hdr->sh_info;
local_tls_type = bfd_zalloc (abfd, size);
if (local_tls_type == NULL)
return FALSE;
elf_or1k_local_tls_type (abfd) = local_tls_type;
}
local_tls_type[r_symndx] |= tls_type;
}
...
else
{
bfd_signed_vma *local_got_refcounts;
/* This is a global offset table entry for a local symbol. */
local_got_refcounts = elf_local_got_refcounts (abfd);
if (local_got_refcounts == NULL)
{
bfd_size_type size;
size = symtab_hdr->sh_info;
size *= sizeof (bfd_signed_vma);
local_got_refcounts = bfd_zalloc (abfd, size);
if (local_got_refcounts == NULL)
return FALSE;
elf_local_got_refcounts (abfd) = local_got_refcounts;
}
local_got_refcounts[r_symndx] += 1;
}
The above is pretty straightforward and we can read it as:
- First part is for storing local symbol TLS type information:
  - If the local_tls_type array is not initialized:
    - Allocate it, 1 entry for each local variable
  - Record the TLS type in local_tls_type for the current symbol
- Second part is for recording .got section references:
  - If the local_got_refcounts array is not initialized:
    - Allocate it, 1 entry for each local variable
  - Record a reference by incrementing local_got_refcounts for the current symbol
For global symbols it is much easier, as we see:
...
if (h != NULL)
((struct elf_or1k_link_hash_entry *) h)->tls_type |= tls_type;
else
...
if (h != NULL)
h->got.refcount += 1;
else
...
As the tls_type and refcount fields are available directly on each hash_entry, handling global symbols is much easier.
- First part is for storing TLS type information:
  - Record the TLS type in tls_type for the current hash_entry
- Second part is for recording .got section references:
  - Record a reference by incrementing got.refcount for the hash_entry
The above is repeated for all relocations and all input sections. A few other things are also done, including accounting for .plt entries.
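The book keeping for both cases can be condensed into a small standalone model. This is a sketch under simplifying assumptions: the toy_ structs and function are hypothetical, plain calloc() stands in for bfd_zalloc(), and only GOT references are counted. It does show the asymmetry described above, where global symbols carry their own fields while local symbols need lazily allocated per-object arrays.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a global symbol's hash entry.  */
struct toy_global_sym
{
  long got_refcount;
  unsigned char tls_type;
};

/* Stand-in for the per input object state used for local symbols.  */
struct toy_input_object
{
  size_t local_sym_count;          /* plays the role of symtab_hdr->sh_info */
  long *local_got_refcounts;       /* allocated on first use */
  unsigned char *local_tls_type;   /* allocated on first use */
};

/* Record one GOT reference.  H is NULL for local symbols, in which case
   R_SYMNDX selects the slot.  Returns 1 on success, 0 if allocation fails. */
static int
toy_check_got_reloc (struct toy_input_object *obj, struct toy_global_sym *h,
                     size_t r_symndx, unsigned char tls_type)
{
  if (h != NULL)
    {
      h->tls_type |= tls_type;
      h->got_refcount += 1;
      return 1;
    }

  assert (r_symndx < obj->local_sym_count);

  if (obj->local_tls_type == NULL)
    {
      obj->local_tls_type = calloc (obj->local_sym_count, sizeof (unsigned char));
      if (obj->local_tls_type == NULL)
        return 0;
    }
  if (obj->local_got_refcounts == NULL)
    {
      obj->local_got_refcounts = calloc (obj->local_sym_count, sizeof (long));
      if (obj->local_got_refcounts == NULL)
        return 0;
    }

  obj->local_tls_type[r_symndx] |= tls_type;
  obj->local_got_refcounts[r_symndx] += 1;
  return 1;
}
```

Calling it twice for local symbol 2 leaves a count of 2 in that slot and zeros everywhere else, which is what phase 2 later uses to decide which symbols need GOT space.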
Phase 2 - creating space (size_dynamic_sections + _bfd_elf_create_dynamic_sections)
The or1k_elf_size_dynamic_sections() function iterates over all input object files to calculate the size required for output sections. The _bfd_elf_create_dynamic_sections() function does the actual section allocation; we use the generic version.
Setting up the sizes of the .got section (global offset table) and .plt section (procedure linkage table) is done here.
The definition is as below:
static bfd_boolean
or1k_elf_size_dynamic_sections (bfd *output_bfd ATTRIBUTE_UNUSED,
struct bfd_link_info *info)
#define elf_backend_size_dynamic_sections or1k_elf_size_dynamic_sections
#define elf_backend_create_dynamic_sections _bfd_elf_create_dynamic_sections
The arguments to or1k_elf_size_dynamic_sections() being:
- output_bfd - Unused, the output elf object
- info - the BFD API which provides access to everything we need
Internally the function uses:
- htab - from or1k_elf_hash_table (info)
  - htab->root.dynamic_sections_created - true if sections like .interp have been created by the linker
- ibfd - a bfd pointer from info->input_bfds, represents an input object when iterating.
- s->size - represents the output .got section size, which we will be incrementing.
- srel->size - represents the output .rela.got section size, which will contain relocations against the .got section
During the first part of phase 2 we set .got and .rela.got section sizes for local symbols with this code:
/* Set up .got offsets for local syms, and space for local dynamic
relocs. */
for (ibfd = info->input_bfds; ibfd != NULL; ibfd = ibfd->link.next)
{
...
local_got = elf_local_got_refcounts (ibfd);
if (!local_got)
continue;
symtab_hdr = &elf_tdata (ibfd)->symtab_hdr;
locsymcount = symtab_hdr->sh_info;
end_local_got = local_got + locsymcount;
s = htab->root.sgot;
srel = htab->root.srelgot;
local_tls_type = (unsigned char *) elf_or1k_local_tls_type (ibfd);
for (; local_got < end_local_got; ++local_got)
{
if (*local_got > 0)
{
unsigned char tls_type = (local_tls_type == NULL)
? TLS_UNKNOWN
: *local_tls_type;
*local_got = s->size;
or1k_set_got_and_rela_sizes (tls_type, bfd_link_pic (info),
&s->size, &srel->size);
}
else
*local_got = (bfd_vma) -1;
if (local_tls_type)
++local_tls_type;
}
}
Here, for example, we can see that we iterate over each input elf object ibfd and each local symbol (local_got), updating s->size and srel->size to account for the required size.
The above can be read as:
- For each local_got entry:
  - If the local symbol is used in the .got section:
    - Get the tls_type byte stored in the local_tls_type array
    - Set the offset local_got to the section offset s->size, which is used in phase 3 to tell us where we need to write the symbol into the .got section.
    - Update s->size and srel->size using or1k_set_got_and_rela_sizes()
  - If the local symbol is not used in the .got section:
    - Set the offset local_got to -1, to indicate not used
In the next part of phase 2 we allocate space for all global symbols by iterating through symbols in htab with the allocate_dynrelocs iterator. To do that we call:
elf_link_hash_traverse (&htab->root, allocate_dynrelocs, info);
Inside allocate_dynrelocs() we record the space used for relocations and the .got and .plt sections. Example:
if (h->got.refcount > 0)
{
asection *sgot;
bfd_boolean dyn;
unsigned char tls_type;
...
sgot = htab->root.sgot;
h->got.offset = sgot->size;
tls_type = ((struct elf_or1k_link_hash_entry *) h)->tls_type;
dyn = htab->root.dynamic_sections_created;
dyn = WILL_CALL_FINISH_DYNAMIC_SYMBOL (dyn, bfd_link_pic (info), h);
or1k_set_got_and_rela_sizes (tls_type, dyn,
&sgot->size, &htab->root.srelgot->size);
}
else
h->got.offset = (bfd_vma) -1;
The above, with h being our global symbol, a pointer to struct elf_link_hash_entry, can be read as:
- If the symbol will be in the .got section:
  - Get the global reference to the .got section and put it in sgot
  - Set the got location h->got.offset for the symbol to the current got section size sgot->size.
  - Set dyn to true if we will be doing a dynamic link.
  - Call or1k_set_got_and_rela_sizes() to update the sizes for the .got and .rela.got sections.
- If the symbol is not going to be in the .got section:
  - Set the got location h->got.offset to -1
The function or1k_set_got_and_rela_sizes() used above increments the .got and .rela.got section sizes, accounting for whether these are TLS symbols, which need additional entries and relocations.
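The way phase 2 turns reference counts into offsets can be modelled in a few lines. This sketch makes several assumptions that are not taken from the binutils source: the TOY_TLS_ flag values are invented, a GOT entry is taken to be 4 bytes, a dynamic relocation record is taken to be 12 bytes, and General Dynamic and Initial Exec are given 2 and 1 entries respectively, as in the relocate_section walkthrough later in this article.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_TLS_GD 1u          /* hypothetical flag values */
#define TOY_TLS_IE 2u
#define TOY_GOT_ENTRY_SIZE 4ul /* assumed size of one 32-bit GOT entry */
#define TOY_RELA_SIZE 12ul     /* assumed size of one relocation record */

/* Grow the GOT size, and the relocation size when linking dynamically,
   by what one symbol of the given TLS type needs.  */
static void
toy_set_got_and_rela_sizes (unsigned tls_type, int dynamic,
                            unsigned long *got_size, unsigned long *rela_size)
{
  unsigned long entries = 0;

  assert (got_size != NULL && rela_size != NULL);
  if (tls_type & TOY_TLS_GD)
    entries += 2;
  if (tls_type & TOY_TLS_IE)
    entries += 1;
  if (entries == 0)
    entries = 1;               /* ordinary, non TLS symbol */

  *got_size += entries * TOY_GOT_ENTRY_SIZE;
  if (dynamic)
    *rela_size += entries * TOY_RELA_SIZE;
}

/* Turn phase 1 reference counts into GOT offsets, in place.
   Unused symbols get -1.  TLS_TYPES may be NULL.  */
static void
toy_assign_local_got_offsets (long *local_got, const unsigned char *tls_types,
                              size_t count, int dynamic,
                              unsigned long *got_size, unsigned long *rela_size)
{
  size_t i;

  for (i = 0; i < count; i++)
    {
      if (local_got[i] > 0)
        {
          local_got[i] = (long) *got_size;
          toy_set_got_and_rela_sizes (tls_types ? tls_types[i] : 0, dynamic,
                                      got_size, rela_size);
        }
      else
        local_got[i] = -1;
    }
}
```

With reference counts {2, 0, 1} and the first symbol marked General Dynamic, the array becomes offsets {0, -1, 8} and the GOT grows to 12 bytes. Reusing the array for two meanings is the same trick the real code plays with local_got.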
Phase 3 - linking (relocate_section)
The or1k_elf_relocate_section() function is called to fill in the relocation holes in the output binary .text section. It does this by looping over relocations and writing to the .text section the correct symbol value (memory address). It also updates other output binary sections like the .got section. Also, for dynamic executables and libraries new relocations may be written to .rela sections.
The function signature looks as follows:
static bfd_boolean
or1k_elf_relocate_section (bfd *output_bfd,
struct bfd_link_info *info,
bfd *input_bfd,
asection *input_section,
bfd_byte *contents,
Elf_Internal_Rela *relocs,
Elf_Internal_Sym *local_syms,
asection **local_sections)
#define elf_backend_relocate_section or1k_elf_relocate_section
The arguments to or1k_elf_relocate_section() being:
- output_bfd - the output elf object we will be writing to
- info - the BFD API which provides access to everything we need
- input_bfd - the current input elf object being iterated over
- input_section - the current .text section in the input elf object being iterated over. From here we get .text section output details for pc relative relocations:
  - input_section->output_section->vma - the location of the output section.
  - input_section->output_offset - the output offset
- contents - the output file buffer we will write to
- relocs - relocations from the current input section
- local_syms - an array of local symbols used to get the relocation value for local symbols
- local_sections - an array of input sections for local symbols, used to get the relocation value for local symbols
Internally the function uses:
- or1k_elf_howto_table - not mentioned until now, but an array of howto structs indexed by relocation enum. The howto struct expresses the algorithm required to update the relocation.
- relocation - a bfd_vma, the value of the relocation symbol (memory address) to be written to the place in the output file that needs to be updated for the relocation.
- value - the value that needs to be written to the relocation location.
During the first part of relocate_section we see:
if (r_symndx < symtab_hdr->sh_info)
{
sym = local_syms + r_symndx;
sec = local_sections[r_symndx];
relocation = _bfd_elf_rela_local_sym (output_bfd, sym, &sec, rel);
name = bfd_elf_string_from_elf_section
(input_bfd, symtab_hdr->sh_link, sym->st_name);
name = name == NULL ? bfd_section_name (sec) : name;
}
else
{
bfd_boolean unresolved_reloc, warned, ignored;
RELOC_FOR_GLOBAL_SYMBOL (info, input_bfd, input_section, rel,
r_symndx, symtab_hdr, sym_hashes,
h, sec, relocation,
unresolved_reloc, warned, ignored);
name = h->root.root.string;
}
This can be read as:
- If the current symbol is a local symbol:
  - We initialize relocation to the local symbol value using _bfd_elf_rela_local_sym().
- Otherwise the current symbol is global:
  - We use the RELOC_FOR_GLOBAL_SYMBOL() macro to initialize relocation.
During the next part we use the howto information to update the relocation value, and also add relocations to the output file. For example:
case R_OR1K_TLS_GD_HI16:
case R_OR1K_TLS_GD_LO16:
case R_OR1K_TLS_GD_PG21:
case R_OR1K_TLS_GD_LO13:
case R_OR1K_TLS_IE_HI16:
case R_OR1K_TLS_IE_LO16:
case R_OR1K_TLS_IE_PG21:
case R_OR1K_TLS_IE_LO13:
case R_OR1K_TLS_IE_AHI16:
{
bfd_vma gotoff;
Elf_Internal_Rela rela;
asection *srelgot;
bfd_byte *loc;
bfd_boolean dynamic;
int indx = 0;
unsigned char tls_type;
srelgot = htab->root.srelgot;
/* Mark as TLS related GOT entry by setting
bit 2 to indicate TLS and bit 1 to indicate GOT. */
if (h != NULL)
{
gotoff = h->got.offset;
tls_type = ((struct elf_or1k_link_hash_entry *) h)->tls_type;
h->got.offset |= 3;
}
else
{
unsigned char *local_tls_type;
gotoff = local_got_offsets[r_symndx];
local_tls_type = (unsigned char *) elf_or1k_local_tls_type (input_bfd);
tls_type = local_tls_type == NULL ? TLS_NONE
: local_tls_type[r_symndx];
local_got_offsets[r_symndx] |= 3;
}
/* Only process the relocation once. */
if ((gotoff & 1) != 0)
{
gotoff += or1k_initial_exec_offset (howto, tls_type);
/* The PG21 and LO13 relocs are pc-relative, while the
rest are GOT relative. */
relocation = got_base + (gotoff & ~3);
if (!(r_type == R_OR1K_TLS_GD_PG21
|| r_type == R_OR1K_TLS_GD_LO13
|| r_type == R_OR1K_TLS_IE_PG21
|| r_type == R_OR1K_TLS_IE_LO13))
relocation -= got_sym_value;
break;
}
...
/* Static GD. */
else if ((tls_type & TLS_GD) != 0)
{
bfd_put_32 (output_bfd, 1, sgot->contents + gotoff);
bfd_put_32 (output_bfd, tpoff (info, relocation, dynamic),
sgot->contents + gotoff + 4);
}
gotoff += or1k_initial_exec_offset (howto, tls_type);
...
/* Static IE. */
else if ((tls_type & TLS_IE) != 0)
bfd_put_32 (output_bfd, tpoff (info, relocation, dynamic),
sgot->contents + gotoff);
/* The PG21 and LO13 relocs are pc-relative, while the
rest are GOT relative. */
relocation = got_base + gotoff;
if (!(r_type == R_OR1K_TLS_GD_PG21
|| r_type == R_OR1K_TLS_GD_LO13
|| r_type == R_OR1K_TLS_IE_PG21
|| r_type == R_OR1K_TLS_IE_LO13))
relocation -= got_sym_value;
}
break;
Here we process the relocation for TLS General Dynamic and Initial Exec relocations. I have trimmed out the shared cases to save space.
This can be read as:
- Get a reference to the output relocation section srelgot.
- Get the got offset which we set up during phase 2 for global or local symbols.
- Mark the symbol as using a TLS got entry. This offset |= 3 trick is possible because got offsets are 4-byte aligned on 32-bit machines, so we have 2 lower bits free. This is used during phase 4.
- If we have already processed this symbol once:
  - Update relocation to the location in the output .got section and break, we only need to create .got entries 1 time
- Otherwise populate .got section entries
  - For General Dynamic
    - Put 2 entries into the output elf object .got section, a literal 1 and the thread pointer offset
  - For Initial Exec
    - Put 1 entry into the output elf object .got section, the thread pointer offset
- Finally update the relocation to the location in the output .got section
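The low-bit marking used above is easy to try in isolation. The helper below is a hypothetical sketch rather than linker code: it takes a pointer to a stored GOT offset, reports whether this is the first time the entry has been seen, marks it by setting the two low bits, and returns the clean offset. It assumes, as the text does, that fresh offsets are multiples of 4.

```c
#include <assert.h>

/* Claim the GOT entry whose offset is stored in *SLOT.
   Sets *FIRST_USE to 1 the first time and 0 afterwards, and returns
   the offset with the marker bits stripped.  */
static unsigned long
toy_claim_got_entry (unsigned long *slot, int *first_use)
{
  unsigned long gotoff = *slot;

  *first_use = (gotoff & 1ul) == 0;
  if (*first_use)
    assert ((gotoff & 3ul) == 0);  /* unmarked offsets must be aligned */

  *slot |= 3ul;                    /* remember that we have been here */
  return gotoff & ~3ul;
}
```

Starting from a stored offset of 16, the first call reports first use and leaves 19 in the slot; every later call still returns 16 but reports that the entry was already written.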
In the last part of the loop we write the relocation value to the output .text section. This is done with the or1k_final_link_relocate() function.
r = or1k_final_link_relocate (howto, input_bfd, input_section, contents,
rel->r_offset, relocation + rel->r_addend);
With this the .text section is complete.
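As a rough picture of what filling a relocation hole means, the sketch below patches a single instruction word. It assumes the simplest possible howto: the low 16 bits of the value go into the low 16 bits of a big-endian 32-bit instruction, with no shift and no overflow check. The real howto table describes many more shapes, and the function names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 32-bit word.  */
static uint32_t
toy_get_be32 (const unsigned char *p)
{
  return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
         | ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

/* Write a big-endian 32-bit word.  */
static void
toy_put_be32 (unsigned char *p, uint32_t v)
{
  p[0] = (unsigned char) (v >> 24);
  p[1] = (unsigned char) (v >> 16);
  p[2] = (unsigned char) (v >> 8);
  p[3] = (unsigned char) v;
}

/* Patch the 16-bit immediate of the instruction at R_OFFSET in CONTENTS
   with the low half of RELOCATION + ADDEND, keeping the opcode bits.  */
static void
toy_relocate_lo16 (unsigned char *contents, size_t contents_size,
                   size_t r_offset, uint32_t relocation, int32_t addend)
{
  uint32_t value = relocation + (uint32_t) addend;
  uint32_t insn;

  assert (r_offset + 4 <= contents_size);
  insn = toy_get_be32 (contents + r_offset);
  insn = (insn & 0xffff0000u) | (value & 0xffffu);
  toy_put_be32 (contents + r_offset, insn);
}
```

Only the immediate field changes; the upper 16 bits holding the opcode and register numbers are preserved, which is why the assembler can leave the field as zero and let the linker complete it.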
Phase 4 - finishing up (finish_dynamic_symbol + finish_dynamic_sections)
During phase 3 above we wrote the .text section out to file. During the final finishing up phase we need to write the remaining sections. This includes the .plt section and more writes to the .got section.
This also includes the .rela.plt and .rela.got sections which contain dynamic relocation entries.
Writing of the data sections is handled by or1k_elf_finish_dynamic_sections() and writing of the relocation sections is handled by or1k_elf_finish_dynamic_symbol(). These are defined as below.
static bfd_boolean
or1k_elf_finish_dynamic_sections (bfd *output_bfd,
struct bfd_link_info *info)
static bfd_boolean
or1k_elf_finish_dynamic_symbol (bfd *output_bfd,
struct bfd_link_info *info,
struct elf_link_hash_entry *h,
Elf_Internal_Sym *sym)
#define elf_backend_finish_dynamic_sections or1k_elf_finish_dynamic_sections
#define elf_backend_finish_dynamic_symbol or1k_elf_finish_dynamic_symbol
A snippet from or1k_elf_finish_dynamic_sections() shows how, when writing to the .plt section, assembly code needs to be injected. This is where the first entry in the .plt section is written.
else if (bfd_link_pic (info))
{
plt0 = OR1K_LWZ(15, 16) | 8; /* .got+8 */
plt1 = OR1K_LWZ(12, 16) | 4; /* .got+4 */
plt2 = OR1K_NOP;
}
else
{
unsigned ha = ((got_addr + 0x8000) >> 16) & 0xffff;
unsigned lo = got_addr & 0xffff;
plt0 = OR1K_MOVHI(12) | ha;
plt1 = OR1K_LWZ(15,12) | (lo + 8);
plt2 = OR1K_LWZ(12,12) | (lo + 4);
}
or1k_write_plt_entry (output_bfd, splt->contents,
plt0, plt1, plt2, OR1K_JR(15));
elf_section_data (splt->output_section)->this_hdr.sh_entsize = 4;
Here we see a write to output_bfd, which represents the output object file we are writing to. The argument splt->contents points to the .plt section contents that we write into. Next we see the line elf_section_data (splt->output_section)->this_hdr.sh_entsize = 4, which records the entry size in the section header of the .plt section.
A snippet from the or1k_elf_finish_dynamic_symbol() function shows where we write out the code and dynamic relocation entries for each symbol to the .plt section.
splt = htab->root.splt;
sgot = htab->root.sgotplt;
srela = htab->root.srelplt;
...
else
{
unsigned ha = ((got_addr + 0x8000) >> 16) & 0xffff;
unsigned lo = got_addr & 0xffff;
plt0 = OR1K_MOVHI(12) | ha;
plt1 = OR1K_LWZ(12,12) | lo;
plt2 = OR1K_ORI0(11) | plt_reloc;
}
or1k_write_plt_entry (output_bfd, splt->contents + h->plt.offset,
plt0, plt1, plt2, OR1K_JR(12));
/* Fill in the entry in the global offset table. We initialize it to
point to the top of the plt. This is done to lazy lookup the actual
symbol as the first plt entry will be setup by libc to call the
runtime dynamic linker. */
bfd_put_32 (output_bfd, plt_base_addr, sgot->contents + got_offset);
/* Fill in the entry in the .rela.plt section. */
rela.r_offset = got_addr;
rela.r_info = ELF32_R_INFO (h->dynindx, R_OR1K_JMP_SLOT);
rela.r_addend = 0;
loc = srela->contents;
loc += plt_index * sizeof (Elf32_External_Rela);
bfd_elf32_swap_reloca_out (output_bfd, &rela, loc);
Here we can see we write 3 things to output_bfd for the single .plt entry. We write:
- The assembly code to the .plt section.
- The plt_base_addr (the first entry in the .plt for runtime lookup) to the .got section.
- And finally a dynamic relocation for our symbol to the .rela.plt section.
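The reason the GOT slot initially points back at the top of the .plt is lazy binding. The effect can be imitated in plain C with a function pointer, as in the hypothetical sketch below. No real PLT or dynamic linker is involved: toy_got_slot plays the GOT entry, toy_plt0_resolver plays the first .plt entry plus the runtime linker, and toy_real_function is the symbol being looked up.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*toy_fn) (int);

static int toy_resolver_calls;            /* how often the slow path ran */
static int toy_plt0_resolver (int arg);   /* stands in for the first .plt entry */

/* The "GOT slot": it starts out pointing at the resolver, just as the
   linker initializes the real slot to point at the top of the .plt.  */
static toy_fn toy_got_slot = toy_plt0_resolver;

/* The function the program actually wants to call.  */
static int
toy_real_function (int arg)
{
  return arg + 1;
}

/* Slow path: look up the symbol, patch the slot, then finish the call.  */
static int
toy_plt0_resolver (int arg)
{
  toy_resolver_calls += 1;
  toy_got_slot = toy_real_function;
  return toy_got_slot (arg);
}

/* What a per-symbol .plt entry does: jump through the GOT slot.  */
static int
toy_call_through_plt (int arg)
{
  assert (toy_got_slot != NULL);
  return toy_got_slot (arg);
}
```

The first call through toy_call_through_plt() takes the slow path once; later calls go straight to toy_real_function() because the slot has been overwritten.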
With that we have written all of the sections out to our final elf object, and it’s ready to be used.
GLIBC Runtime Linker
The runtime linker, also referred to as the dynamic linker, will do the final linking as we load our program and shared libraries into memory. It can process a limited set of relocation entries that were set up above during phase 4 of linking.
The runtime linker implementation is found mostly in the elf/dl-* GLIBC source files. Dynamic relocation processing is handled by the _dl_relocate_object() function in the elf/dl-reloc.c file. The back end macro used for relocation, ELF_DYNAMIC_RELOCATE, is defined across several files including elf/dynamic-link.h and elf/do-rel.h. Architecture specific relocations are handled by the function elf_machine_rela(), the implementation for OpenRISC being in sysdeps/or1k/dl-machine.h.
In summary from top down:
- elf/rtld.c - implements dl_main() the top level entry for the dynamic linker.
- elf/dl-open.c - function dl_open_worker() calls _dl_relocate_object(), you may also recognize this from dlopen(3).
- elf/dl-reloc.c - function _dl_relocate_object() calls ELF_DYNAMIC_RELOCATE
- elf/dynamic-link.h - defined macro ELF_DYNAMIC_RELOCATE calls elf_dynamic_do_Rel() via several macros
- elf/do-rel.h - function elf_dynamic_do_Rel() calls elf_machine_rela()
- sysdeps/or1k/dl-machine.h - architecture specific function elf_machine_rela() implements dynamic relocation handling
It supports relocations for:
- R_OR1K_NONE - do nothing
- R_OR1K_COPY - used to copy initial values from shared objects to process memory.
- R_OR1K_32 - a 32-bit value
- R_OR1K_GLOB_DAT - aligned 32-bit values for GOT entries
- R_OR1K_JMP_SLOT - aligned 32-bit values for PLT entries
- R_OR1K_TLS_DTPMOD/R_OR1K_TLS_DTPOFF - for shared TLS GD GOT entries
- R_OR1K_TLS_TPOFF - for shared TLS IE GOT entries
A snippet of the OpenRISC implementation of elf_machine_rela() can be seen below. It is pretty straightforward.
/* Perform the relocation specified by RELOC and SYM (which is fully resolved).
MAP is the object containing the reloc. */
auto inline void
__attribute ((always_inline))
elf_machine_rela (struct link_map *map, const Elf32_Rela *reloc,
const Elf32_Sym *sym, const struct r_found_version *version,
void *const reloc_addr_arg, int skip_ifunc)
{
struct link_map *sym_map = RESOLVE_MAP (&sym, version, r_type);
Elf32_Addr value = SYMBOL_ADDRESS (sym_map, sym, true);
...
switch (r_type)
{
...
case R_OR1K_32:
/* Support relocations on mis-aligned offsets. */
value += reloc->r_addend;
memcpy (reloc_addr_arg, &value, 4);
break;
case R_OR1K_GLOB_DAT:
case R_OR1K_JMP_SLOT:
*reloc_addr = value + reloc->r_addend;
break;
...
}
}
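The same switch can be exercised against a small in-memory image. This is a simplified model, not the GLIBC code: the toy_ types are invented, symbol resolution is replaced by a value passed in by the caller, and the image is an array of 32-bit words so that the aligned cases can store directly while the R_32 style case uses memcpy() to tolerate misaligned offsets.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical relocation kinds mirroring the cases shown above.  */
enum toy_rtype { TOY_R_NONE, TOY_R_32, TOY_R_GLOB_DAT, TOY_R_JMP_SLOT };

struct toy_rela
{
  uint32_t r_offset;     /* byte offset into the loaded image */
  enum toy_rtype r_type;
  int32_t r_addend;
};

/* Apply RELOC to IMAGE using the already resolved SYM_VALUE.  */
static void
toy_machine_rela (uint32_t *image, const struct toy_rela *reloc,
                  uint32_t sym_value)
{
  uint32_t value = sym_value + (uint32_t) reloc->r_addend;

  switch (reloc->r_type)
    {
    case TOY_R_NONE:
      break;
    case TOY_R_32:
      /* May land on any byte offset, so copy byte-wise.  */
      memcpy ((unsigned char *) image + reloc->r_offset, &value, sizeof value);
      break;
    case TOY_R_GLOB_DAT:
    case TOY_R_JMP_SLOT:
      /* GOT and PLT slots are word aligned, so a plain store is enough.  */
      assert (reloc->r_offset % 4 == 0);
      image[reloc->r_offset / 4] = value;
      break;
    }
}
```

The split between memcpy() and a direct store mirrors the comment in the snippet above about supporting relocations on misaligned offsets.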
Handling TLS
The complicated part of the runtime linker is how it handles TLS variables.
This is done in the following files and functions.
- elf/rtld.c - implements init_tls() which initializes the TLS data structures.
- elf/dl-tls.c - The runtime linker TLS implementation, the top level initialization code including _dl_allocate_tls_storage() and _dl_allocate_tls_init().
The reader can read through the initialization code which is pretty straightforward, except for the macros. Like most GNU code, the code relies heavily on untyped macros. These macros are defined in the architecture specific implementation files. For OpenRISC this is:
- sysdeps/or1k/nptl/tls.h - contains the definition of the TLS structures used for OpenRISC.
From the previous article on TLS we have the TLS data structure that looks as follows:
  dtv[]   [ dtv[0], dtv[1], dtv[2], .... ]
            counter ^ |       \
               ----/  /        \________
              /      V                  V
/------TCB-------\/----TLS[1]----\   /----TLS[2]----\
| pthread tcbhead | tbss   tdata  |   | tbss   tdata  |
\----------------/\--------------/   \--------------/
          ^
          |
   TP-----/
The symbols and macros defined in sysdeps/or1k/nptl/tls.h are:
- __thread_self - a symbol always representing the current thread
- TLS_DTV_AT_TP - used throughout the TLS code to adjust offsets
- TLS_TCB_AT_TP - used throughout the TLS code to adjust offsets
- TLS_TCB_SIZE - used during init_tls() to allocate memory for TLS
- TLS_PRE_TCB_SIZE - used during init_tls() to allocate space for the pthread struct
- INSTALL_DTV - used during initialization to update a new dtv pointer into the given tcb
- GET_DTV - gets dtv via the provided tcb pointer
- INSTALL_NEW_DTV - used during resizing to update the dtv into the current runtime __thread_self
- TLS_INIT_TP - sets __thread_self, this is the final step in init_tls()
- THREAD_DTV - gets dtv via __thread_self
- THREAD_SELF - get the pthread pointer via __thread_self
Implementations for OpenRISC are:
register tcbhead_t *__thread_self __asm__("r10");
#define TLS_DTV_AT_TP 1
#define TLS_TCB_AT_TP 0
#define TLS_TCB_SIZE sizeof (tcbhead_t)
#define TLS_PRE_TCB_SIZE sizeof (struct pthread)
#define INSTALL_DTV(tcbp, dtvp) (((tcbhead_t *) (tcbp))->dtv = (dtvp) + 1)
#define GET_DTV(tcbp) (((tcbhead_t *) (tcbp))->dtv)
#define TLS_INIT_TP(tcbp) ({__thread_self = ((tcbhead_t *)tcbp + 1); NULL;})
#define THREAD_DTV() ((((tcbhead_t *)__thread_self)-1)->dtv)
#define INSTALL_NEW_DTV(dtv) (THREAD_DTV() = (dtv))
#define THREAD_SELF \
((struct pthread *) ((char *) __thread_self - TLS_INIT_TCB_SIZE \
- TLS_PRE_TCB_SIZE))
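The pointer arithmetic hidden in these macros is easier to follow as ordinary functions. The sketch below is a hypothetical model only: a global variable stands in for register r10, toy_tcbhead_t holds just the dtv pointer, and the pthread struct is left out. It shows the two adjustments visible in the macros above, that the installed dtv pointer is one element past the start of the array, and that the thread pointer is one tcbhead past the TCB, so THREAD_DTV has to step back by one to find it.

```c
#include <assert.h>
#include <stddef.h>

/* Toy versions of the dtv element and TCB head.  */
typedef union { size_t counter; void *pointer; } toy_dtv_t;
typedef struct { toy_dtv_t *dtv; } toy_tcbhead_t;

/* Stands in for the thread pointer register (r10 on OpenRISC).  */
static toy_tcbhead_t *toy_thread_self;

/* Like INSTALL_DTV: store a pointer one past the first dtv element.  */
static void
toy_install_dtv (toy_tcbhead_t *tcbp, toy_dtv_t *dtvp)
{
  tcbp->dtv = dtvp + 1;
}

/* Like TLS_INIT_TP: the thread pointer is set just past the TCB head.  */
static void
toy_tls_init_tp (toy_tcbhead_t *tcbp)
{
  toy_thread_self = tcbp + 1;
}

/* Like THREAD_DTV: step back from the thread pointer to reach the TCB.  */
static toy_dtv_t *
toy_thread_dtv (void)
{
  assert (toy_thread_self != NULL);
  return (toy_thread_self - 1)->dtv;
}
```

In the diagram above this corresponds to TP pointing at the boundary between the TCB and TLS[1], with the dtv pointer found by looking just behind it.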
Summary
We have looked at how symbols move from the Compiler, to Assembler, to Linker to Runtime linker.
This has ended up being a long article to explain a rather complicated subject. Let’s hope it helps provide a good reference for others who want to work on the GNU toolchain in the future.
Further Reading
- GCC Passes - My blog entry on GCC passes
- bfdint - The BFD developer’s manual
- ldint - The LD developer’s manual
- LD and BFD Gist - Dump of notes I collected while working on this article.