shorne in japan

blog archive about resume

How Relocations and Thread Local Store are Implemented

21 Jul 2020

This is an ongoing series of posts on ELF Binary Relocations and Thread Local Storage. This article covers only Thread Local Storage and assumes the reader has had a primer in ELF Relocations, if not please start with my previous article ELF Binaries and Relocation Entries.

This is the third part in an illustrated 3 part series covering:

In the last article we covered how Thread Local Storage (TLS) works at runtime, but how do we get there? How does the compiler and linker create the memory structures and code fragments described in the previous article?

In this article we will discuss how TLS relocations are is implemented. Our outline:

As before, the examples in this article can be found in my tls-examples project. Please check it out.

The GNU toolchain

I will assume here that most people understand what a compiler and assembler basically do. In the sense that compiler will compile routines written C code or something similar to assembly language. It is then up to the assembler to turn that assembly code into machine code to run on a CPU.

That is a big part of what a toolchain does, and it’s pretty much that simple if we have a single file of source code. But usually we don’t have a single file, we have the multiple files, the c runtime, crt0 and other libraries like libc. These all need to be put together into our final program, that is where the complexities of the linker comes in.

In this article I will cover how variables in our source code (symbols) traverse the toolchain from code to the memory in our final running program. A picture that looks something like this:

GCC and Linker

The Compiler

First we start off with how relocations are created and emitted in the compiler.

As I work primarily on the GNU toolchain with it’s GCC compiler we will look at that, let’s get started.

GCC Legitimize Address

To start we define a symbol as named address in memory. This address can be a program variable where data is stored or function reference to where a subroutine starts.

In GCC we have have TARGET_LEGITIMIZE_ADDRESS, the OpenRISC implementation being or1k_legitimize_address(). It takes a symbol (memory address) and makes it usable in our CPU by generating RTX sequences that are possible on our CPU to load that address into a register.

RTX represents a tree node in GCC’s register transfer language (RTL). The RTL Expression is used to express our algorithm as a series of register transfers. This is used as register transfer is basically what a CPU does.

A snippet from legitimize_address() function is below. The argument x represents our input symbol (memory address) that we need to make usable by our CPU. This code uses GCC internal API’s to emit RTX code sequences.

static rtx
or1k_legitimize_address (rtx x, rtx /* unused */, machine_mode /* unused */)

    ...

	case TLS_MODEL_NONE:
	  t1 = can_create_pseudo_p () ? gen_reg_rtx (Pmode) : scratch;
	  if (!flag_pic)
	    {
	      emit_insn (gen_rtx_SET (t1, gen_rtx_HIGH (Pmode, x)));
	      return gen_rtx_LO_SUM (Pmode, t1, x);
	    }
	  else if (is_local)
	    {
	      crtl->uses_pic_offset_table = 1;
	      t2 = gen_sym_unspec (x, UNSPEC_GOTOFF);
	      emit_insn (gen_rtx_SET (t1, gen_rtx_HIGH (Pmode, t2)));
	      emit_insn (gen_add3_insn (t1, t1, pic_offset_table_rtx));
	      return gen_rtx_LO_SUM (Pmode, t1, copy_rtx (t2));
	    }
	  else
	    {
              ...

We can read the code snippet above as follows:

You may have noticed that the local symbol still used the global offset table (GOT). This is because Position-idependent code requires using the GOT to reference symbols.

An example, from nontls.c:

static int x;

int *get_x_addr() {
   return &x;
}

Example of the non pic case above, when we look at the assembly code generated by GCC we can see the following:

	.file	"nontls.c"
	.section	.text
	.local	x
	.comm	x,4,4
	.align 4
	.global	get_x_addr
	.type	get_x_addr, @function
get_x_addr:
	l.addi	r1, r1, -8       # \
	l.sw	0(r1), r2        # | function prologue
	l.addi	r2, r1, 8        # |
	l.sw	4(r1), r9        # /
	l.movhi	r17, ha(x)       # \__ legitimize address of x into r17
	l.addi	r17, r17, lo(x)  # /
	l.or	r11, r17, r17    # } place result in return register r11
	l.lwz	r2, 0(r1)        # \
	l.lwz	r9, 4(r1)        # | function epilogue
	l.addi	r1, r1, 8        # |
	l.jr	r9               # |
	 l.nop                   # /

	.size	get_x_addr, .-get_x_addr
	.ident	"GCC: (GNU) 9.0.1 20190409 (experimental)"

Example of the local pic case above the same code compiled with the -fPIC GCC option looks like the following:

	.file	"nontls.c"
	.section	.text
	.local	x
	.comm	x,4,4
	.align 4
	.global	get_x_addr
	.type	get_x_addr, @function
get_x_addr:
	l.addi	r1, r1, -8      # \
	l.sw	0(r1), r2       # | function prologue
	l.addi	r2, r1, 8       # |
	l.sw	4(r1), r9       # /
	l.jal	8                                             # \
	 l.movhi	r19, gotpchi(_GLOBAL_OFFSET_TABLE_-4) # | PC relative, put
	l.ori	r19, r19, gotpclo(_GLOBAL_OFFSET_TABLE_+0)    # | GOT into r19
	l.add	r19, r19, r9                                  # /
	l.movhi	r17, gotoffha(x)        # \
	l.add	r17, r17, r19           # | legitimize address of x into r17
	l.addi	r17, r17, gotofflo(x)   # /
	l.or	r11, r17, r17   # } place result in return register r11
	l.lwz	r2, 0(r1)       # \
	l.lwz	r9, 4(r1)       # | function epilogue
	l.addi	r1, r1, 8       # |
	l.jr	r9              # |
	 l.nop                  # /

	.size	get_x_addr, .-get_x_addr
	.ident	"GCC: (GNU) 9.0.1 20190409 (experimental)"

TLS and Addend cases are also handled by or1k_legitimize_address().

GCC Print Operand

Once RTX is generated by legitimize address and GCC passes run all of their optimizations the RTX needs to be printed out as assembly code. During this process relocations are printed by GCC macros TARGET_PRINT_OPERAND_ADDRESS and TARGET_PRINT_OPERAND. In OpenRISC these defined by or1k_print_operand_address() and or1k_print_operand().

Let us have a look at or1k_print_operand_address().

/* Worker for TARGET_PRINT_OPERAND_ADDRESS.
   Prints the argument ADDR, an address RTX, to the file FILE.  The output is
   formed as expected by the OpenRISC assembler.  Examples:
     RTX							      OUTPUT
     (reg:SI 3)							       0(r3)
     (plus:SI (reg:SI 3) (const_int 4))				     0x4(r3)
     (lo_sum:SI (reg:SI 3) (symbol_ref:SI ("x")))		   lo(x)(r3)  */

static void
or1k_print_operand_address (FILE *file, machine_mode, rtx addr)
{
  rtx offset;

  switch (GET_CODE (addr))
    {
    case REG:
      fputc ('0', file);
      break;
    case ...
    case LO_SUM:
      offset = XEXP (addr, 1);
      addr = XEXP (addr, 0);
      print_reloc (file, offset, 0, RKIND_LO);
      break;
    default: ...
    }

  fprintf (file, "(%s)", reg_names[REGNO (addr)]);
}

The above code snippet can be read as we explain below, but let’s first make some notes:

With that, and if we use the or1k_print_operand_address() c comments above as examples of some RTX addr input we will have:

    RTX     |    (reg:SI 3)          (lo_sum:SI (reg:SI 3) (symbol_ref:SI("x")))
 -----------+--------------------------------------------------------------------
    TREE    |
   (code)   |  (code:REG regno:3)                (code:LO_SUM)
   /    \   |                                      /        \
  (0)   (1) |                         (code:REG regno:3)  (code:SYMBOL_REF "x")

We can now read the above snippet as:

The code of or1k_print_operand() is similar and the reader may be inclined to read more details. With that we can move on to the assembler.

TLS cases are also handled inside of the print_reloc() function.

The Assembler

In the GNU Toolchain our assembler is GAS, part of binutils.

The code that handles relocations is found in the function parse_reloc() found in opcodes/or1k-asm.c. The function parse_reloc() is the direct counterpart of GCC’s print_reloc() discussed above. This is actually part of or1k_cgen_parse_operand() which is wired into our assembler generator CGEN used for parsing operands.

If we are parsing a relocation like the one from above lo(x) then we can isolate the code that processes that relocation.

static const bfd_reloc_code_real_type or1k_imm16_relocs[][6] = {
  { BFD_RELOC_LO16,
    BFD_RELOC_OR1K_SLO16,
...
    BFD_RELOC_OR1K_TLS_LE_AHI16 },
};

static int
parse_reloc (const char **strp)
{
    const char *str = *strp;
    enum or1k_rclass cls = RCLASS_DIRECT;
    enum or1k_rtype typ;

    ...
    else if (strncasecmp (str, "lo(", 3) == 0)
      {
	str += 3;
	typ = RTYPE_LO;
      }
    ...

    *strp = str;
    return (cls << RCLASS_SHIFT) | typ;
}

This uses strncasecmp to match our "lo(" string pattern. The returned result is a relocation type and relocation class which are use to lookup the relocation BFD_RELOC_LO16 in the or1k_imm16_relocs[][] table which is indexed by relocation class and relocation class.

The assembler will encode that into the ELF binary. For TLS relocations the exact same pattern is used.

The Linker

In the GNU Toolchain our object linker is the GNU linker LD, also part of the binutils project.

The GNU linker uses the framework BFD or Binary File Descriptor which is a beast. It is not only used in the linker but also used in GDB, the GNU Simulator and the objdump tool.

What makes this possible is a rather complex API.

BFD Linker API

The BFD API is a generic binary file access API. It has been designed to support multiple file formats and architectures via an object oriented, polymorphic API all written in c. It supports file formats including a.out, COFF and ELF as well as unexpected file formats like verilog hex memory dumps.

Here we will concentrate on the BFD ELF implementation.

The API definition is split across multiple files which include:

For each architecture implementations are defined in bfd/elf{wordsize}-{architecture}.c. For example for OpenRISC we have bfd/elf32-or1k.c.

Throughout the linker code we see access to the BFD Linker and ELF APIs. Some key symbols to watch out for include:

Putting it all together we have a diagram like the following:

The BFD API

Now that we have a bit of understanding of the data structures we can look to the link algorithm.

The link process in the GNU Linker can be thought of in phases.

Phase 1 - Book Keeping (check_relocs)

The or1k_elf_check_relocs() function is called during the first phase to do book keeping on relocations. The function signature looks like:

static bfd_boolean
or1k_elf_check_relocs (bfd *abfd,
                       struct bfd_link_info *info,
                       asection *sec,
                       const Elf_Internal_Rela *relocs)

#define elf_backend_check_relocs        or1k_elf_check_relocs

The arguments being:

It does the book keeping by looping over relocations for the provided section and updating the local and global symbol properties.

For local symbols:

      ...
      else
	{
	  unsigned char *local_tls_type;

	  /* This is a TLS type record for a local symbol.  */
	  local_tls_type = (unsigned char *) elf_or1k_local_tls_type (abfd);
	  if (local_tls_type == NULL)
	    {
	      bfd_size_type size;

	      size = symtab_hdr->sh_info;
	      local_tls_type = bfd_zalloc (abfd, size);
	      if (local_tls_type == NULL)
		return FALSE;
	      elf_or1k_local_tls_type (abfd) = local_tls_type;
	    }
	  local_tls_type[r_symndx] |= tls_type;
	}

	      ...
	      else
		{
		  bfd_signed_vma *local_got_refcounts;

		  /* This is a global offset table entry for a local symbol.  */
		  local_got_refcounts = elf_local_got_refcounts (abfd);
		  if (local_got_refcounts == NULL)
		    {
		      bfd_size_type size;

		      size = symtab_hdr->sh_info;
		      size *= sizeof (bfd_signed_vma);
		      local_got_refcounts = bfd_zalloc (abfd, size);
		      if (local_got_refcounts == NULL)
			return FALSE;
		      elf_local_got_refcounts (abfd) = local_got_refcounts;
		    }
		  local_got_refcounts[r_symndx] += 1;
		}

The above is pretty straight forward and we can read as:

For global symbols, it’s much more easy we see:

      ...
      if (h != NULL)
	  ((struct elf_or1k_link_hash_entry *) h)->tls_type |= tls_type;
      else
      ...

	      if (h != NULL)
		h->got.refcount += 1;
	      else
		...

As the tls_type and refcount fields are available directly on each hash_entry handling global symbols is much easier.

The above is repeated for all relocations and all input sections. A few other things are also done including accounting for .plt entries.

Phase 2 - creating space (size_dynamic_sections + _bfd_elf_create_dynamic_sections)

The or1k_elf_size_dynamic_sections() function iterates over all input object files to calculate the size required for output sections. The _bfd_elf_create_dynamic_sections() function does the actual section allocation, we use the generic version.

Setting up the sizes of the .got section (global offset table) and .plt section (procedure link table) is done here.

The definition is as below:

static bfd_boolean
or1k_elf_size_dynamic_sections (bfd *output_bfd ATTRIBUTE_UNUSED,
                                struct bfd_link_info *info)

#define elf_backend_size_dynamic_sections       or1k_elf_size_dynamic_sections
#define elf_backend_create_dynamic_sections     _bfd_elf_create_dynamic_sections

The arguments to or1k_elf_size_dynamic_sections() being:

Internally the function uses:

During the first part of phase 2 we set .got and .got.rela section sizes for local symbols with this code:

  /* Set up .got offsets for local syms, and space for local dynamic
     relocs.  */
  for (ibfd = info->input_bfds; ibfd != NULL; ibfd = ibfd->link.next)
    {
      ...
      local_got = elf_local_got_refcounts (ibfd);
      if (!local_got)
	continue;

      symtab_hdr = &elf_tdata (ibfd)->symtab_hdr;
      locsymcount = symtab_hdr->sh_info;
      end_local_got = local_got + locsymcount;
      s = htab->root.sgot;
      srel = htab->root.srelgot;
      local_tls_type = (unsigned char *) elf_or1k_local_tls_type (ibfd);
      for (; local_got < end_local_got; ++local_got)
	{
	  if (*local_got > 0)
	    {
	      unsigned char tls_type = (local_tls_type == NULL)
					? TLS_UNKNOWN
					: *local_tls_type;

	      *local_got = s->size;
	      or1k_set_got_and_rela_sizes (tls_type, bfd_link_pic (info),
					   &s->size, &srel->size);
	    }
	  else
	    *local_got = (bfd_vma) -1;

	  if (local_tls_type)
	    ++local_tls_type;
	}
    }

Here, for example, we can see we iterate over each input elf object ibfd and each local symbol (local_got) we try and update s->size and srel->size to account for the required size.

The above can be read as:

In the next part of phase 2 we allocate space for all global symbols by iterating through symbols in htab with the allocate_dynrelocs iterator. To do that we call:

  elf_link_hash_traverse (&htab->root, allocate_dynrelocs, info);

Inside allocate_dynrelocs() we record the space used for relocations and the .got and .plt sections. Example:

  if (h->got.refcount > 0)
    {
      asection *sgot;
      bfd_boolean dyn;
      unsigned char tls_type;

      ...
      sgot = htab->root.sgot;

      h->got.offset = sgot->size;

      tls_type = ((struct elf_or1k_link_hash_entry *) h)->tls_type;

      dyn = htab->root.dynamic_sections_created;
      dyn = WILL_CALL_FINISH_DYNAMIC_SYMBOL (dyn, bfd_link_pic (info), h);
      or1k_set_got_and_rela_sizes (tls_type, dyn,
				   &sgot->size, &htab->root.srelgot->size);
    }
  else
    h->got.offset = (bfd_vma) -1;

The above, with h being our global symbol, a pointer to struct elf_link_hash_entry, can be read as:

The function or1k_set_got_and_rela_sizes() used above is used to increment .got and .rela section sizes accounting for if these are TLS symbols, which need additional entries and relocations.

Phase 3 - linking (relocate_section)

The or1k_elf_relocate_section() function is called to fill in the relocation holes in the output binary .text section. It does this by looping over relocations and writing to the .text section the correct symbol value (memory address). It also updates other output binary sections like the .got section. Also, for dynamic executables and libraries new relocations may be written to .rela sections.

The function signature looks as follows:

static bfd_boolean
or1k_elf_relocate_section (bfd *output_bfd,
                           struct bfd_link_info *info,
                           bfd *input_bfd,
                           asection *input_section,
                           bfd_byte *contents,
                           Elf_Internal_Rela *relocs,
                           Elf_Internal_Sym *local_syms,
                           asection **local_sections)

#define elf_backend_relocate_section    or1k_elf_relocate_section

The arguments to or1k_elf_relocate_sectioni() being:

Internally the function uses:

During the first part of relocate_section we see:

      if (r_symndx < symtab_hdr->sh_info)
	{
	  sym = local_syms + r_symndx;
	  sec = local_sections[r_symndx];
	  relocation = _bfd_elf_rela_local_sym (output_bfd, sym, &sec, rel);

	  name = bfd_elf_string_from_elf_section
	    (input_bfd, symtab_hdr->sh_link, sym->st_name);
	  name = name == NULL ? bfd_section_name (sec) : name;
	}
      else
	{
	  bfd_boolean unresolved_reloc, warned, ignored;

	  RELOC_FOR_GLOBAL_SYMBOL (info, input_bfd, input_section, rel,
				   r_symndx, symtab_hdr, sym_hashes,
				   h, sec, relocation,
				   unresolved_reloc, warned, ignored);
	  name = h->root.root.string;
	}

This can be read as:

During the next part we use the howto information to update the relocation value, and also add relocations to the output file. For example:

	case R_OR1K_TLS_GD_HI16:
	case R_OR1K_TLS_GD_LO16:
	case R_OR1K_TLS_GD_PG21:
	case R_OR1K_TLS_GD_LO13:
	case R_OR1K_TLS_IE_HI16:
	case R_OR1K_TLS_IE_LO16:
	case R_OR1K_TLS_IE_PG21:
	case R_OR1K_TLS_IE_LO13:
	case R_OR1K_TLS_IE_AHI16:
	  {
	    bfd_vma gotoff;
	    Elf_Internal_Rela rela;
	    asection *srelgot;
	    bfd_byte *loc;
	    bfd_boolean dynamic;
	    int indx = 0;
	    unsigned char tls_type;

	    srelgot = htab->root.srelgot;

	    /* Mark as TLS related GOT entry by setting
	       bit 2 to indcate TLS and bit 1 to indicate GOT.  */
	    if (h != NULL)
	      {
		gotoff = h->got.offset;
		tls_type = ((struct elf_or1k_link_hash_entry *) h)->tls_type;
		h->got.offset |= 3;
	      }
	    else
	      {
		unsigned char *local_tls_type;

		gotoff = local_got_offsets[r_symndx];
		local_tls_type = (unsigned char *) elf_or1k_local_tls_type (input_bfd);
		tls_type = local_tls_type == NULL ? TLS_NONE
						  : local_tls_type[r_symndx];
		local_got_offsets[r_symndx] |= 3;
	      }

	    /* Only process the relocation once.  */
	    if ((gotoff & 1) != 0)
	      {
		gotoff += or1k_initial_exec_offset (howto, tls_type);

		/* The PG21 and LO13 relocs are pc-relative, while the
		   rest are GOT relative.  */
		relocation = got_base + (gotoff & ~3);
		if (!(r_type == R_OR1K_TLS_GD_PG21
		    || r_type == R_OR1K_TLS_GD_LO13
		    || r_type == R_OR1K_TLS_IE_PG21
		    || r_type == R_OR1K_TLS_IE_LO13))
		  relocation -= got_sym_value;
		break;
	      }

           ...

	    /* Static GD.  */
	    else if ((tls_type & TLS_GD) != 0)
	      {
		bfd_put_32 (output_bfd, 1, sgot->contents + gotoff);
		bfd_put_32 (output_bfd, tpoff (info, relocation, dynamic),
		    sgot->contents + gotoff + 4);
	      }

	    gotoff += or1k_initial_exec_offset (howto, tls_type);

	    ...

	    /* Static IE.  */
	    else if ((tls_type & TLS_IE) != 0)
	      bfd_put_32 (output_bfd, tpoff (info, relocation, dynamic),
			  sgot->contents + gotoff);

	    /* The PG21 and LO13 relocs are pc-relative, while the
	       rest are GOT relative.  */
	    relocation = got_base + gotoff;
	    if (!(r_type == R_OR1K_TLS_GD_PG21
		  || r_type == R_OR1K_TLS_GD_LO13
		  || r_type == R_OR1K_TLS_IE_PG21
		  || r_type == R_OR1K_TLS_IE_LO13))
	      relocation -= got_sym_value;
	  }
	  break;

Here we process the relocation for TLS General Dynamic and Initial Exec relocations. I have trimmed out the shared cases to save space.

This can be read as:

In the last part of the loop we write the relocation value to the output .text section. This is done with the or1k_final_link_relocate() function.

      r = or1k_final_link_relocate (howto, input_bfd, input_section, contents,
				    rel->r_offset, relocation + rel->r_addend);

With this the .text section is complete.

Phase 4 - finishing up (finish_dynamic_symbol + finish_dynamic_sections)

During phase 3 above we wrote the .text section out to file. During the final finishing up phase we need to write the remaining sections. This includes the .plt section an more writes to the .got section.

This also includes the .plt.rela and .got.rela sections which contain dynamic relocation entries.

Writing of the data sections is handled by or1k_elf_finish_dynamic_sections() and writing of the relocation sections is handled by or1k_elf_finish_dynamic_symbol(). These are defined as below.

static bfd_boolean
or1k_elf_finish_dynamic_sections (bfd *output_bfd,
                                  struct bfd_link_info *info)

static bfd_boolean
or1k_elf_finish_dynamic_symbol (bfd *output_bfd,
                                struct bfd_link_info *info,
                                struct elf_link_hash_entry *h,
                                Elf_Internal_Sym *sym)

#define elf_backend_finish_dynamic_sections     or1k_elf_finish_dynamic_sections
#define elf_backend_finish_dynamic_symbol       or1k_elf_finish_dynamic_symbol

A snippet for the or1k_elf_finish_dynamic_sections() shows how when writing to the .plt section assembly code needs to be injected. This is where the first entry in the .plt section is written.

          else if (bfd_link_pic (info))
            {
              plt0 = OR1K_LWZ(15, 16) | 8;      /* .got+8 */
              plt1 = OR1K_LWZ(12, 16) | 4;      /* .got+4 */
              plt2 = OR1K_NOP;
            }
          else
            {
              unsigned ha = ((got_addr + 0x8000) >> 16) & 0xffff;
              unsigned lo = got_addr & 0xffff;
              plt0 = OR1K_MOVHI(12) | ha;
              plt1 = OR1K_LWZ(15,12) | (lo + 8);
              plt2 = OR1K_LWZ(12,12) | (lo + 4);
            }

          or1k_write_plt_entry (output_bfd, splt->contents,
                                plt0, plt1, plt2, OR1K_JR(15));

          elf_section_data (splt->output_section)->this_hdr.sh_entsize = 4;

Here we see a write to output_bfd, this represents the output object file which we are writing to. The argument splt->contents represents the object file offset to write to for the .plt section. Next we see the line elf_section_data (splt->output_section)->this_hdr.sh_entsize = 4 this allows the linker to calculate the size of the section.

A snippet from the or1k_elf_finish_dynamic_symbol() function shows where we write out the code and dynamic relocation entries for each symbol to the .plt section.

      splt = htab->root.splt;
      sgot = htab->root.sgotplt;
      srela = htab->root.srelplt;
      ...

      else
        {
          unsigned ha = ((got_addr + 0x8000) >> 16) & 0xffff;
          unsigned lo = got_addr & 0xffff;
          plt0 = OR1K_MOVHI(12) | ha;
          plt1 = OR1K_LWZ(12,12) | lo;
          plt2 = OR1K_ORI0(11) | plt_reloc;
        }

      or1k_write_plt_entry (output_bfd, splt->contents + h->plt.offset,
                            plt0, plt1, plt2, OR1K_JR(12));

      /* Fill in the entry in the global offset table.  We initialize it to
         point to the top of the plt.  This is done to lazy lookup the actual
         symbol as the first plt entry will be setup by libc to call the
         runtime dynamic linker.  */
      bfd_put_32 (output_bfd, plt_base_addr, sgot->contents + got_offset);

      /* Fill in the entry in the .rela.plt section.  */
      rela.r_offset = got_addr;
      rela.r_info = ELF32_R_INFO (h->dynindx, R_OR1K_JMP_SLOT);
      rela.r_addend = 0;
      loc = srela->contents;
      loc += plt_index * sizeof (Elf32_External_Rela);
      bfd_elf32_swap_reloca_out (output_bfd, &rela, loc);

Here we can see we write 3 things to output_bfd for the single .plt entry. We write:

With that we have written all of the sections out to our final elf object, and it’s ready to be used.

GLIBC Runtime Linker

The runtime linker, also referred to as the dynamic linker, will do the final linking as we load our program and shared libraries into memory. It can process a limited set of relocation entries that were setup above during phase 4 of linking.

The runtime linker implementation is found mostly in the elf/dl-* GLIBC source files. Dynamic relocation processing is handled in by the _dl_relocate_object() function in the elf/dl-reloc.c file. The back end macro used for relocation ELF_DYNAMIC_RELOCATE is defined across several files including elf/dynamic-link.h and elf/do-rel.h Architecture specific relocations are handled by the function elf_machine_rela(), the implementation for OpenRISC being in sysdeps/or1k/dl-machine.h.

In summary from top down:

It supports relocations for:

A snippet of the OpenRISC implementation of elf_machine_rela() can be seen below. It is pretty straight forward.

/* Perform the relocation specified by RELOC and SYM (which is fully resolved).
   MAP is the object containing the reloc.  */

auto inline void
__attribute ((always_inline))
elf_machine_rela (struct link_map *map, const Elf32_Rela *reloc,
                  const Elf32_Sym *sym, const struct r_found_version *version,
                  void *const reloc_addr_arg, int skip_ifunc)
{

      struct link_map *sym_map = RESOLVE_MAP (&sym, version, r_type);
      Elf32_Addr value = SYMBOL_ADDRESS (sym_map, sym, true);

     ...
      switch (r_type)
        {
          ...
          case R_OR1K_32:
            /* Support relocations on mis-aligned offsets.  */
            value += reloc->r_addend;
            memcpy (reloc_addr_arg, &value, 4);
            break;
          case R_OR1K_GLOB_DAT:
          case R_OR1K_JMP_SLOT:
            *reloc_addr = value + reloc->r_addend;
            break;
          ...
        }
}

Handling TLS

The complicated part of the runtime linker is how it handles TLS variables.

This is done in the following files and functions.

The reader can read through the initialization code which is pretty straight forward, except for the macros. Like most GNU code the code relies heavily on untyped macros. These macros are defined in the architecture specific implementation files. For OpenRISC this is:

From the previous article on TLS we have the TLS data structure that looks as follows:

  dtv[]   [ dtv[0], dtv[1], dtv[2], .... ]
            counter ^ |      \
               ----/ /        \________
              /     V                  V
/------TCB-------\/----TLS[1]----\   /----TLS[2]----\
| pthread tcbhead | tbss   tdata |   | tbss   tdata |
\----------------/\--------------/   \--------------/
          ^
          |
   TP-----/

The symbols and macros defined in sysdeps/or1k/nptl/tls.h are:

Implementations for OpenRISC are:

register tcbhead_t *__thread_self __asm__("r10");

#define TLS_DTV_AT_TP  1
#define TLS_TCB_AT_TP  0

#define TLS_TCB_SIZE             sizeof (tcbhead_t)
#define TLS_PRE_TCB_SIZE         sizeof (struct pthread)

#define INSTALL_DTV(tcbp, dtvp)  (((tcbhead_t *) (tcbp))->dtv = (dtvp) + 1)
#define GET_DTV(tcbp)            (((tcbhead_t *) (tcbp))->dtv)

#define TLS_INIT_TP(tcbp)        ({__thread_self = ((tcbhead_t *)tcbp + 1); NULL;})

#define THREAD_DTV()             ((((tcbhead_t *)__thread_self)-1)->dtv)
#define INSTALL_NEW_DTV(dtv)     (THREAD_DTV() = (dtv))

#define THREAD_SELF \
  ((struct pthread *) ((char *) __thread_self - TLS_INIT_TCB_SIZE \
    - TLS_PRE_TCB_SIZE))

Summary

We have looked at how symbols move from the Compiler, to Assembler, to Linker to Runtime linker.

This has ended up being a long article to explain a rather complicated subject. Let’s hope it helps provide a good reference for others who want to work on the GNU toolchain in the future.

Further Reading