In the last article we introduced the OpenRISC glibc FPU port and the effort required to get user space FPU support into OpenRISC linux user applications. We explained how the FPU port is a fullstack project covering:
- Architecture Specification
- Simulators and CPU implementations
- Linux Kernel support
- GCC Instructions and Soft FPU
- Binutils/GDB Debugging Support
- glibc support
In this entry we will cover updating Simulators and CPU implementations to support the architecture changes which are called for as per the previous article.
- Allowing usermode programs to update the FPCSR register
- Detecting tininess before rounding
Simulator Updates
The simulators used for testing OpenRISC software without hardware are QEMU and or1ksim. They both needed to be updated to cohere to the specification updates discussed above.
Or1ksim Updates
The OpenRISC architectue simulator or1ksim has been updated with the single patch: cpu: Allow FPCSR to be read/written in user mode.
The softfloat FPU implementation was already configured to detect tininess before rounding.
If you are interested you can download and run the simulator and test this out with a docker image pulled from docker hub using the following:
# using podman instead of docker, you can use docker here too
podman pull stffrdhrn/or1k-sim-env:latest
podman run -it --rm stffrdhrn/or1k-sim-env:latest
root@9a4a52eec8ee:/tmp# or1k-elf-sim -version
Seeding random generator with value 0x4a3c2bbd
OpenRISC 1000 Architectural Simulator, version 2023-08-20
This starts up an environment which has access to the OpenRISC architecture simulator and a GNU compiler toolchain. While still in the container can run a quick test using the FPU as follows:
# Create a test program using OpenRISC FPU
cat > fpee.c <<EOF
#include <float.h>
#include <stdio.h>
#include <or1k-sprs.h>
#include <or1k-support.h>
static void enter_user_mode() {
int32_t sr = or1k_mfspr(OR1K_SPR_SYS_SR_ADDR);
sr &= ~OR1K_SPR_SYS_SR_SM_MASK;
or1k_mtspr(OR1K_SPR_SYS_SR_ADDR, sr);
}
static void enable_fpu_exceptions() {
unsigned long fpcsr = OR1K_SPR_SYS_FPCSR_FPEE_MASK;
or1k_mtspr(OR1K_SPR_SYS_FPCSR_ADDR, fpcsr);
}
static void fpe_handler() {
printf("Got FPU Exception, PC: 0x%lx\n", or1k_mfspr(OR1K_SPR_SYS_EPCR_BASE));
}
int main() {
float result;
or1k_exception_handler_add(0xd, fpe_handler);
#ifdef USER_MODE
/* Note, printf here also allocates some memory allowing user mode runtime to
work. */
printf("Enabling user mode\n");
enter_user_mode();
#endif
enable_fpu_exceptions();
printf("Exceptions enabled, now DIV 3.14 / 0!\n");
result = 3.14f / 0.0f;
/* Verify we see infinity. */
printf("Result: %f\n", result);
/* Verify we see DZF set. */
printf("FPCSR: %x\n", or1k_mfspr(OR1K_SPR_SYS_FPCSR_ADDR));
#ifdef USER_MODE
asm volatile("l.movhi r3, 0; l.nop 1"); /* Exit sim, now */
#endif
return 0;
}
EOF
# Compile the program
or1k-elf-gcc -g -O2 -mhard-float fpee.c -o fpee
or1k-elf-sim -f /opt/or1k/sim.cfg ./fpee
# Expected results
# Program Header: PT_LOAD, vaddr: 0x00000000, paddr: 0x0 offset: 0x00002000, filesz: 0x000065ab, memsz: 0x000065ab
# Program Header: PT_LOAD, vaddr: 0x000085ac, paddr: 0x85ac offset: 0x000085ac, filesz: 0x000000c8, memsz: 0x0000046c
# WARNING: sim_init: Debug module not enabled, cannot start remote service to GDB
# Exceptions enabled, now DIV 3.14 / 0!
# Got FPU Exception, PC: 0x2068
# Result: f
# FPCSR: 801
# Compile the program to run in USER_MODE
or1k-elf-gcc -g -O2 -mhard-float -DUSER_MODE fpee.c -o fpee
or1k-elf-sim -f /opt/or1k/sim.cfg ./fpee
# Expected results with USER_MODE
# Program Header: PT_LOAD, vaddr: 0x00000000, paddr: 0x0 offset: 0x00002000, filesz: 0x000065ab, memsz: 0x000065ab
# Program Header: PT_LOAD, vaddr: 0x000085ac, paddr: 0x85ac offset: 0x000085ac, filesz: 0x000000c8, memsz: 0x0000046c
# WARNING: sim_init: Debug module not enabled, cannot start remote service to GDB
# Enabling user mode
# Exceptions enabled, now DIV 3.14 / 0!
# Got FPU Exception, PC: 0x2068
# Result: f
# FPCSR: 801
# exit(0)
In the above we can see how to compile and run a simple FPU test program and run it on or1ksim. The program set’s up an FPU exception handler, enables exceptions then does a divide by zero to produce an exception. This program uses the OpenRISC newlib (baremetal) toolchain to compile a program that can run directly on the simulator, as oppposed to a program running in an OS on a simulator or hardware.
Note, that normally newlib programs expect to run in supervisor mode, when our program switches to user mode we need to take some precautions to ensure it can run correctly. As noted in the comments, usually when allocating and exiting the newlib runtime will do things like disabling/enabling interrupts which will fail when running in user mode.
QEMU Updates
The QEMU update was done in my OpenRISC user space FPCSR qemu patch series. The series was merged for the qemu 8.1 release.
The updates were split it into three changes:
- Allowing FPCSR access in user mode.
- Properly set the exception PC address on floating point exceptions.
- Configuring the QEMU softfloat implementation to perform tininess check before rounding.
QEMU Patch 1
The first patch to allow FPCSR access in user mode was trivial, but required some code structure changes making the patch look bigger than it really was.
QEMU Patch 2
The next patch to properly set the exception PC address fixed a long existing
bug where the EPCR
was not properly updated after FPU exceptions. Up until now
OpenRISC userspace did not support FPU instructions and this code path had not
been tested.
To explain why this fix is important let us look at the EPCR
and what it is used for
in a bit more detail.
In general, when an exception occurs an OpenRISC CPU will store the program counter (PC
)
of the instruction that caused the exception into the exeption program counter address
(EPCR
). Floating point exceptions are a special case in that the EPCR
is
actually set to the next instruction to be executed, this is to avoid looping.
When the linux kernel handles a floating point exception it follows the path
0xd00 > fpe_trap_handler > do_fpe_trap. This will setup a
signal to be delivered to the user process.
The Linux OS uses the EPCR
to report the exception instruction address to
userspace via a signal
which we can see being done in do_fpe_trap
which
we can see below:
asmlinkage void do_fpe_trap(struct pt_regs *regs, unsigned long address)
{
int code = FPE_FLTUNK;
unsigned long fpcsr = regs->fpcsr;
if (fpcsr & SPR_FPCSR_IVF)
code = FPE_FLTINV;
else if (fpcsr & SPR_FPCSR_OVF)
code = FPE_FLTOVF;
else if (fpcsr & SPR_FPCSR_UNF)
code = FPE_FLTUND;
else if (fpcsr & SPR_FPCSR_DZF)
code = FPE_FLTDIV;
else if (fpcsr & SPR_FPCSR_IXF)
code = FPE_FLTRES;
/* Clear all flags */
regs->fpcsr &= ~SPR_FPCSR_ALLF;
force_sig_fault(SIGFPE, code, (void __user *)regs->pc);
}
Here we see the excption becomes a SIGFPE
signal and the exception address in
regs->pc
is passed to force_sig_fault. The PC
will be used to set the
si_addr
field of the siginfo_t
structure.
Next upon return from kernel space to user space the path is do_fpe_trap
>
_fpe_trap_handler
> ret_from_exception > resume_userspace >
work_pending > do_work_pending > restore_all.
Inside of do_work_pending
with there the signal handling is done. In explain a bit
about this in the article Unwinding a Bug - How C++ Exceptions Work.
In restore_all
we see EPCR
is returned to when exception handling is
complete. A snipped of this code is show below:
#define RESTORE_ALL \
DISABLE_INTERRUPTS(r3,r4) ;\
l.lwz r3,PT_PC(r1) ;\
l.mtspr r0,r3,SPR_EPCR_BASE ;\
l.lwz r3,PT_SR(r1) ;\
l.mtspr r0,r3,SPR_ESR_BASE ;\
l.lwz r3,PT_FPCSR(r1) ;\
l.mtspr r0,r3,SPR_FPCSR ;\
l.lwz r2,PT_GPR2(r1) ;\
l.lwz r3,PT_GPR3(r1) ;\
l.lwz r4,PT_GPR4(r1) ;\
l.lwz r5,PT_GPR5(r1) ;\
l.lwz r6,PT_GPR6(r1) ;\
l.lwz r7,PT_GPR7(r1) ;\
l.lwz r8,PT_GPR8(r1) ;\
l.lwz r9,PT_GPR9(r1) ;\
l.lwz r10,PT_GPR10(r1) ;\
l.lwz r11,PT_GPR11(r1) ;\
l.lwz r12,PT_GPR12(r1) ;\
l.lwz r13,PT_GPR13(r1) ;\
l.lwz r14,PT_GPR14(r1) ;\
l.lwz r15,PT_GPR15(r1) ;\
l.lwz r16,PT_GPR16(r1) ;\
l.lwz r17,PT_GPR17(r1) ;\
l.lwz r18,PT_GPR18(r1) ;\
l.lwz r19,PT_GPR19(r1) ;\
l.lwz r20,PT_GPR20(r1) ;\
l.lwz r21,PT_GPR21(r1) ;\
l.lwz r22,PT_GPR22(r1) ;\
l.lwz r23,PT_GPR23(r1) ;\
l.lwz r24,PT_GPR24(r1) ;\
l.lwz r25,PT_GPR25(r1) ;\
l.lwz r26,PT_GPR26(r1) ;\
l.lwz r27,PT_GPR27(r1) ;\
l.lwz r28,PT_GPR28(r1) ;\
l.lwz r29,PT_GPR29(r1) ;\
l.lwz r30,PT_GPR30(r1) ;\
l.lwz r31,PT_GPR31(r1) ;\
l.lwz r1,PT_SP(r1) ;\
l.rfe
Here we can see how l.mtspr r0,r3,SPR_EPCR_BASE
restores the EPCR
to the pc
address stored in pt_regs
when we entered the exception handler. All
other register are restored and finally the l.rfe
instruction is issued to
return from the exception which affectively jumps to EPCR
.
The reason QEMU was not setting the correct exception address is due to the way
qemu is implemented which optimizes performance. QEMU executes target code
basic blocks that are translated to host native instructions, during runtime
all PC
addresses are those of the host, for example x86-64 64-bit
addresses. When an exception occurs, updating the target PC
address from the host PC
need to be explicityly requested.
QEMU Patch 3
The next patch to implement tininess before rouding was also trivial but brought up a conversation about default NaN payloads.
QEMU Patch 4
Wait, there is more. During writing this article I realized that if QEMU
was setting the ECPR
to the FPU instruction causing the exception then
we would end up in an endless loop.
Luckily the arcitecture anticipated this calling for FPU exceptions to set the next
instruction to be executed to EPCR
. QEMU was missing this logic.
The patch target/openrisc: Set EPCR to next PC on FPE exceptions fixes this up.
RTL Updates
Updating the actual verilog RTL CPU implementations also needed to be done. Updates have been made to both the mor1kx and the or1k_marocchino implementations.
mor1kx Updates
Updates to the mor1kx to support user mode reads and write to the FPCSR
were done in the patch:
Make FPCSR is R/W accessible for both user- and supervisor- modes.
The full patch is:
@@ -618,7 +618,7 @@ module mor1kx_ctrl_cappuccino
spr_fpcsr[`OR1K_FPCSR_FPEE] <= 1'b0;
end
else if ((spr_we & spr_access[`OR1K_SPR_SYS_BASE] &
- (spr_sr[`OR1K_SPR_SR_SM] & padv_ctrl | du_access)) &&
+ (padv_ctrl | du_access)) &&
`SPR_OFFSET(spr_addr)==`SPR_OFFSET(`OR1K_SPR_FPCSR_ADDR)) begin
spr_fpcsr <= spr_write_dat[`OR1K_FPCSR_WIDTH-1:0]; // update all fields
`ifdef OR1K_FPCSR_MASK_FLAGS
The change to verilog shows that before when writng (spr_we
) to the FPCSR (OR1K_SPR_FPCSR_ADDR
) register
we used to check that the supervisor bit (OR1K_SPR_SR_SM
) bit of the sr spr (spr_sr
) is set. That check
enforced supervisor mode only write access, removing this allows user space to write to the regsiter.
Updating mor1kx to support tininess checking before rounding was done in the change Refactoring and implementation tininess detection before rounding. I will not go into the details of these patches as I don’t understand them so much.
Marocchino Updates
Updates to the or1k_marocchino to support user mode reads and write to the FPCSR
were done in the patch:
Make FPCSR is R/W accessible for both user- and supervisor- modes.
The full patch is:
@@ -714,7 +714,7 @@ module or1k_marocchino_ctrl
assign except_fpu_enable_o = spr_fpcsr[`OR1K_FPCSR_FPEE];
wire spr_fpcsr_we = (`SPR_OFFSET(({1'b0, spr_sys_group_wadr_r})) == `SPR_OFFSET(`OR1K_SPR_FPCSR_ADDR)) &
- spr_sys_group_we & spr_sr[`OR1K_SPR_SR_SM];
+ spr_sys_group_we; // FPCSR is R/W for both user- and supervisor- modes
`ifdef OR1K_FPCSR_MASK_FLAGS
reg [`OR1K_FPCSR_ALLF_SIZE-1:0] ctrl_fpu_mask_flags_r;
Updating the marocchino to support dttectig tininess before rounding was done in the patch: Refactoring FPU Implementation for tininess detection BEFORE ROUNDING. I will not go into details of the patch as I didn’t write them. In general it is a medium size refactoring of the floating point unit.
Summary
We discussed updates to the architecture simulators and verilog CPU implementations to allow supporting user mode floating point programs. These updates will now allow us to port Linux and glibc to the OpenRISC floating point unit.
Further Reading
- QEMU Translation Internals - details on how QEMU works