This is a beginner-level 5-day workshop, “RISC-V based MYTH” (Microprocessor for You in Thirty Hours). The workshop covers the RISC-V specification, the RISC-V software toolchain, implementing the basic RISC-V specification in TL-Verilog, and simulating your own RISC-V core. The final objective by day 5 is to write the RTL for, and build, a RISC-V core on my own.
To run a C program on a specific hardware layout, such as the interior of a chip in your laptop, a certain flow has to be followed. First, the C program is compiled into an assembly language program, which here consists of instructions from the RISC-V ISA (Reduced Instruction Set Computer - V Instruction Set Architecture). The assembly language program is then converted into a machine language program: the binary pattern of 0s and 1s that the computer hardware understands. Next, we have to implement the RISC-V specification as RTL (register-transfer level) code written in a hardware description language. Finally, going from RTL to layout is a standard PnR (place and route), or RTL-to-GDSII, flow.
Several steps take place before application software can run on hardware. To begin with, the application enters a block called system software, which converts the application program into binary language. System software has various layers, of which the major components are the OS (Operating System), the compiler and the assembler. First, the OS hands over small functions written in C, C++, VB or Java to the respective compiler, which converts them into instructions; the syntax of these instructions depends on the hardware architecture on which the system is implemented. Next, the assembler takes these instructions and converts them into their binary format, which is called a machine language program. Finally, this binary is fed to the hardware, which performs specific functions based on the binary code it receives.
For example, for a stopwatch app running on a RISC-V core, the output of the OS could be a small C function. This function enters the compiler, which outputs RISC-V instructions; the assembler then outputs the binary code that enters your chip layout.
For the above stopwatch, the following are the inputs and outputs of the compiler and assembler.
The output of the compiler is a set of instructions and the output of the assembler is a binary pattern. Now we need some RTL, written in a hardware description language, that understands and implements these instructions. This RTL is then synthesised into a netlist of gates, which is fabricated into the chip through a physical design implementation.
This course has three main parts:
- RISC-V ISA
- RTL and synthesis of RISC-V based CPU core - picorv32
- Physical design implementation of picorv32
Contents:
- Introduction to RISC-V basic keywords
- Labwork for RISC-V software toolchain
- Integer number representation
- Signed and unsigned arithmetic operations
The following are basic C programs that perform integer addition, multiplication and division.
The compiler output for these programs is a set of RISC-V instructions that perform the integer addition, multiplication and division.
The following are the different sets of instructions which form the RISC-V ISA:
The following are basic C programs that perform floating point addition, multiplication and division.
The compiler output for these programs is a set of RISC-V instructions that perform the floating point addition, multiplication and division.
The following are the rest of the different sets of instructions which form the RISC-V ISA:
All of the instructions shown here form an interface through which the user can access the RISC-V registers.
The highlighted instructions below transfer data between memory, the stack pointer and registers.
C Program sum1ton.c:
#include <stdio.h>
int main()
{
    int i, sum = 0, n = 9;
    for (i = 1; i <= n; i++)
    {
        sum += i;
    }
    printf("The sum of numbers from 1 to %d is %d\n", n, sum);
    return 0;
}
Commands to compile and execute the C program and generate its output:
# Compiling the code
gcc sum1ton.c
# Executing the code
./a.out
Commands to compile & execute the same C program using RISC-V simulator:
Details of the optimization options -O1 and -Ofast can be found in the GCC documentation on optimization options.
Details of the remaining options, such as -mabi and -march, can be found in the GCC documentation on RISC-V options.
# Compiling the code for RISC-V with -O1 optimization
riscv64-unknown-elf-gcc -O1 -mabi=lp64 -march=rv64i -o sum1ton_O1.o sum1ton.c
# Disassembling compiled code
riscv64-unknown-elf-objdump -d sum1ton_O1.o > sum1ton_O1_d.txt
# Compiling the code for RISC-V with -Ofast optimization
riscv64-unknown-elf-gcc -Ofast -mabi=lp64 -march=rv64i -o sum1ton_Ofast.o sum1ton.c
# Disassembling compiled code
riscv64-unknown-elf-objdump -d sum1ton_Ofast.o > sum1ton_Ofast_d.txt
Screenshots: instruction count in the main() function for the -O1 and -Ofast optimizations
# Executing the code
spike pk sum1ton_O1.o
# To debug type
spike -d pk sum1ton_O1.o
# In debug mode
# Run until program counter "10184"
(spike) until pc 0 10184
# Display value of stack pointer "sp"
(spike) reg 0 sp
# Just press "Enter" to run the next instruction
(spike)
# Checking stack pointer "sp" value again
(spike) reg 0 sp
# Just press "Enter" to run the next instruction
(spike)
(spike)
(spike)
# Display value of register "a0"
(spike) reg 0 a0
# Just press "Enter" to run the next instruction
(spike)
# Checking register "a0" value again
(spike) reg 0 a0
# Just press "Enter" to run the next instruction
(spike)
# Checking register "a0" value again
(spike) reg 0 a0
# Exiting debug mode
(spike) q
# Executing the code
spike pk sum1ton_Ofast.o
# To debug type
spike -d pk sum1ton_Ofast.o
# In debug mode
# Run until program counter "100b0"
(spike) until pc 0 100b0
# Display value of register "a0"
(spike) reg 0 a0
# Just press "Enter" to run the next instruction
(spike)
# Checking register "a0" value again
(spike) reg 0 a0
# Display value of stack pointer "sp"
(spike) reg 0 sp
# Just press "Enter" to run the next instruction
(spike)
# Checking stack pointer "sp" value again
(spike) reg 0 sp
# Just press "Enter" to run the next instruction
(spike)
(spike)
# Display value of register "a0"
(spike) reg 0 a0
# Just press "Enter" to run the next instruction
(spike)
# Checking register "a0" value again
(spike) reg 0 a0
# Exiting debug mode
(spike) q
lui command
addi command
Important Terms
Number of distinct values that can be represented in 64 bits
Unsigned 64-bit limits
Signed positive representation, and signed negative representation in two's complement
Signed 64-bit limits
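As a rough sketch of the limits named above, the following C helpers derive the 64-bit bounds from bit patterns instead of floating-point pow(). The function names are illustrative assumptions and are not part of the workshop material; the fixed-width types come from the standard <stdint.h> header.

```c
#include <stdint.h>

/* Largest unsigned 64-bit value: all 64 bits set, i.e. 2^64 - 1. */
uint64_t unsigned_max(void) {
    return ~(uint64_t)0;
}

/* Largest signed 64-bit value: sign bit 0, other 63 bits set, i.e. 2^63 - 1. */
int64_t signed_max(void) {
    return (int64_t)(~(uint64_t)0 >> 1);
}

/* Smallest signed 64-bit value, -(2^63), written so nothing overflows. */
int64_t signed_min(void) {
    return -signed_max() - 1;
}

/* Two's complement bit pattern of -x: invert every bit of x, then add 1. */
uint64_t negate_bits(uint64_t x) {
    return ~x + 1;
}
```

Under these assumptions, negate_bits(1) gives the all-ones pattern 0xFFFFFFFFFFFFFFFF, which is how -1 is stored in a signed 64-bit register.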
C Program unshighlow.c:
#include <stdio.h>
#include <math.h>
int main()
{
    unsigned long long int max = (unsigned long long int)(pow(2,64)-1);
    unsigned long long int maxover = (unsigned long long int)(pow(2,127));
    unsigned long long int min = (unsigned long long int)(0);
    unsigned long long int minover = (unsigned long long int)(pow(2,64)*-1);
    printf("Highest number represented by unsigned long long int is %llu which is calculated by (2^64 - 1)\n", max);
    printf("Proving by overflowing above limit (2^127) value of variable still is %llu\n", maxover);
    printf("Lowest number represented by unsigned long long int is %llu\n", min);
    printf("Proving by overflowing below limit (2^64 * -1) value of variable still is %llu\n", minover);
    return 0;
}
Commands to compile & execute the same C program using RISC-V simulator:
# Compiling the code for RISC-V with -Ofast optimization
riscv64-unknown-elf-gcc -Ofast -mabi=lp64 -march=rv64i -o unshighlow_Ofast.o unshighlow.c
# Executing the code
spike pk unshighlow_Ofast.o
C Program signhighlow.c:
#include <stdio.h>
#include <math.h>
int main()
{
    long long int max = (long long int)(pow(2,63)-1);
    long long int maxover = (long long int)(pow(2,127));
    long long int min = (long long int)(pow(2,63)*-1);
    long long int minover = (long long int)(pow(2,127)*-1);
    printf("Highest positive number represented by signed long long int is %lld which is calculated by (2^63 - 1)\n", max);
    printf("Proving by overflowing above limit (2^127) value of variable still is %lld\n", maxover);
    printf("Lowest negative number represented by signed long long int is %lld which is calculated by (2^63 * -1)\n", min);
    printf("Proving by overflowing below limit (2^127 * -1) value of variable still is %lld\n", minover);
    return 0;
}
Commands to compile & execute the same C program using RISC-V simulator:
# Compiling the code for RISC-V with -Ofast optimization
riscv64-unknown-elf-gcc -Ofast -mabi=lp64 -march=rv64i -o signhighlow_Ofast.o signhighlow.c
# Executing the code
spike pk signhighlow_Ofast.o
Contents:
- Application Binary Interface (ABI)
- Lab work using ABI function calls
- Basic verification flow using iverilog
Some portions of the ISA, namely the registers, are directly accessible to the user (a programmer) and to the operating system through a system call, which gives direct access to the hardware of the system. The system call is performed through the Application Binary Interface (ABI), also called the System Call Interface. The portion of the ISA available to the user is called the User ISA, and the portion available to both the system and the user is called the User and System ISA.
Both the riscv64 and riscv32 architectures have only 32 registers. Each riscv64 register is 64 bits wide, but each memory location stores only one byte (8 bits), so 8 memory locations are needed to store the contents of one riscv64 register. The data is split into 8 bytes and stored so that the least significant byte goes to the lowest memory address, with each more significant byte at the next higher address. This scheme, in which the least significant byte is stored at the lowest address, is called little-endian memory addressing.
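The byte ordering described above can be sketched in C as follows. This is only an illustration: the names store_le64 and load_le64 are hypothetical, and the 8-element array stands in for 8 consecutive byte-wide memory locations.

```c
#include <stdint.h>

/* Store a 64-bit register value into 8 consecutive bytes, little-endian:
   mem[0] receives the least significant byte, mem[7] the most significant. */
void store_le64(uint8_t mem[8], uint64_t value) {
    for (int i = 0; i < 8; i++) {
        mem[i] = (uint8_t)(value >> (8 * i));
    }
}

/* Rebuild the 64-bit value from the same 8 bytes. */
uint64_t load_le64(const uint8_t mem[8]) {
    uint64_t value = 0;
    for (int i = 0; i < 8; i++) {
        value |= (uint64_t)mem[i] << (8 * i);
    }
    return value;
}
```

For example, storing 0x1122334455667788 places 0x88 at the lowest address and 0x11 at the highest.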
All instructions in both the riscv64 and riscv32 base architectures are 32 bits wide. Within an instruction, each register is identified by a 5-bit field, which can represent the numbers 0 to (2^5 - 1), that is 0 to 31. This is why there are only 32 registers in both riscv64 and riscv32.
The ld instruction loads data from memory into a register. Of its 32 bits, 10 identify the instruction: 3 bits of funct3 and 7 bits of opcode.
The add instruction adds the contents of 2 registers and stores the result in another register. Of its 32 bits, 17 identify the instruction: 3 bits of funct3, 7 bits of funct7 and 7 bits of opcode.
The sd instruction stores data from a register into memory. Of its 32 bits, 10 identify the instruction: 3 bits of funct3 and 7 bits of opcode.
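To make the field widths above concrete, here is a small C sketch that splits a 32-bit instruction word into the fields of the R-type layout used by add. The bit positions follow the RISC-V base instruction formats; the struct and function names are assumptions made for this example.

```c
#include <stdint.h>

/* Fields of a 32-bit R-type instruction such as add. */
typedef struct {
    uint32_t opcode; /* bits 6:0   (7 bits) */
    uint32_t rd;     /* bits 11:7  (5 bits) */
    uint32_t funct3; /* bits 14:12 (3 bits) */
    uint32_t rs1;    /* bits 19:15 (5 bits) */
    uint32_t rs2;    /* bits 24:20 (5 bits) */
    uint32_t funct7; /* bits 31:25 (7 bits) */
} RTypeFields;

/* Split an instruction word into its R-type fields by shifting and masking. */
RTypeFields decode_rtype(uint32_t insn) {
    RTypeFields f;
    f.opcode = insn & 0x7f;
    f.rd     = (insn >> 7) & 0x1f;
    f.funct3 = (insn >> 12) & 0x07;
    f.rs1    = (insn >> 15) & 0x1f;
    f.rs2    = (insn >> 20) & 0x1f;
    f.funct7 = (insn >> 25) & 0x7f;
    return f;
}
```

For instance, the word 0x00c58533 (add a0, a1, a2) decodes to opcode 0x33, rd 10, rs1 11 and rs2 12, with funct3 and funct7 both 0. The three 5-bit register fields are what limit the register numbers to 0 through 31.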
The addi instruction adds the contents of a register to an immediate (constant) value and stores the result in a register. The basic syntax is addi rd, rs1, imm
where rd is the destination register, rs1 is the source register and imm is the immediate or constant value.
The blt instruction compares two registers and, if the content of the first is less than the content of the second, branches to the address pointed to by the label. The basic syntax is blt rs1, rs2, label
where rs1 and rs2 are the two source registers being compared and label is a user-defined label that points to the address of a specific line of code.
The lui instruction loads an immediate (constant) value into the upper bits (bits 12 to 31) of a register and clears the lower 12 bits. The basic syntax is lui rd, imm
where rd is the destination register and imm is the immediate or constant value.
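The way lui and addi combine to build a constant can be modelled with a short C sketch. The function names lui and addi are hypothetical stand-ins for the instructions and model only the value written to rd; the sign extension to 64 bits reflects RV64 behaviour, and wraparound on overflow is not modelled.

```c
#include <stdint.h>

/* Model of "lui rd, imm20": the 20-bit immediate lands in bits 31:12,
   bits 11:0 are cleared, and the 32-bit result is sign-extended to 64 bits. */
int64_t lui(uint32_t imm20) {
    uint32_t word = (imm20 & 0xfffff) << 12;
    return (int64_t)(word ^ 0x80000000u) - 0x80000000LL;
}

/* Model of "addi rd, rs1, imm12": the 12-bit immediate is sign-extended
   and added to rs1. */
int64_t addi(int64_t rs1, uint32_t imm12) {
    int64_t imm = (int64_t)((imm12 & 0xfff) ^ 0x800) - 0x800;
    return rs1 + imm;
}
```

With this model, addi(lui(0x12345), 0x678) yields 0x12345678: lui supplies the upper 20 bits and addi fills in the lower 12.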
The ret instruction returns control to the parent (calling) program; the value being returned is passed back in a register, which by convention is a0.
Instructions that operate only on registers are called R-type instructions.
Instructions that operate on registers and an immediate are called I-type instructions.
Instructions that operate only on source registers and an immediate, and are generally used for storing data, are called S-type instructions.
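One practical difference between the I-type and S-type formats is where the 12-bit immediate sits in the instruction word. The sketch below extracts it for both; the bit positions are taken from the RISC-V base instruction formats rather than from the text above, and the function names are assumptions for illustration.

```c
#include <stdint.h>

/* Sign-extend a 12-bit value to a signed 32-bit integer. */
static int32_t sign_extend12(uint32_t v) {
    v &= 0xfff;
    return (int32_t)(v ^ 0x800) - 0x800;
}

/* I-type (e.g. ld, addi): the immediate occupies bits 31:20. */
int32_t imm_itype(uint32_t insn) {
    return sign_extend12(insn >> 20);
}

/* S-type (e.g. sd): there is no rd field, so the immediate is split:
   imm[11:5] is in bits 31:25 and imm[4:0] is in bits 11:7. */
int32_t imm_stype(uint32_t insn) {
    uint32_t hi = (insn >> 25) & 0x7f;
    uint32_t lo = (insn >> 7) & 0x1f;
    return sign_extend12((hi << 5) | lo);
}
```

As a usage example, imm_itype applied to 0x00813503 (ld a0, 8(sp)) and imm_stype applied to 0x00a13423 (sd a0, 8(sp)) both return the offset 8.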
C Program 1to9_custom.c:
#include <stdio.h>
extern int load(int x, int y);
int main()
{
    int result = 0;
    int count = 2;
    result = load(0x0, count+1);
    printf("Sum of number from 1 to %d is %d\n", count, result);
}
ASM Program load.S:
.section .text
.global load
.type load, @function
load:
        add     a4, a0, zero    // Initialize sum register a4 with 0x0 (a0 holds 0x0 from main)
        add     a2, a0, a1      // Store the loop limit in a2; a1 holds count+1 passed from the main program
        add     a3, a0, zero    // Initialize intermediate register a3 with 0
loop:   add     a4, a3, a4      // Incremental addition
        addi    a3, a3, 1       // Increment intermediate register by 1
        blt     a3, a2, loop    // If a3 is less than a2, branch to label named <loop>
        add     a0, a4, zero    // Store final result in register a0 so that it can be read by the main program
        ret
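For readers more comfortable with C, the assembly routine above can be read as the following line-by-line equivalent. This is a sketch for comparison only: load_c is a hypothetical name, and the int variables stand in for the registers noted in the comments.

```c
/* C rendering of load.S. Register mapping: a0 -> x, a1 -> y,
   a2 -> limit, a3 -> i, a4 -> sum. */
int load_c(int x, int y) {
    int sum = x;           /* add  a4, a0, zero */
    int limit = x + y;     /* add  a2, a0, a1   */
    int i = x;             /* add  a3, a0, zero */
    do {
        sum = i + sum;     /* add  a4, a3, a4   */
        i = i + 1;         /* addi a3, a3, 1    */
    } while (i < limit);   /* blt  a3, a2, loop */
    return sum;            /* add  a0, a4, zero ; ret */
}
```

Because the branch test comes after the loop body, the loop behaves like a do-while. With the arguments used in main, load_c(0x0, 3) adds 0, 1 and 2 and returns 3, the sum of the numbers from 1 to 2.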
Commands to compile & execute the same C program using RISC-V simulator:
# Compiling the code for RISC-V with -Ofast optimization
riscv64-unknown-elf-gcc -Ofast -mabi=lp64 -march=rv64i -o 1to9_custom_Ofast.o 1to9_custom.c load.S
# Executing the code
spike pk 1to9_custom_Ofast.o
# Disassembling compiled code
riscv64-unknown-elf-objdump -d 1to9_custom_Ofast.o > 1to9_custom_Ofast_d.txt
Running 1to9_custom.c on the picorv32.v microprocessor Verilog code by passing the C code as a .hex file.
Commands to compile & execute the same C program on picorv32.v microprocessor code:
# List all files
ls
# Change permission of the script rv32im.sh to 755 (make it executable)
chmod 755 rv32im.sh
# Execute the bash script
./rv32im.sh
# List all files
ls
Contents:
- Combinational logic in TL-Verilog using Makerchip
- Sequential and pipelined logic
- Validity
- Hierarchy
Contents:
- Microarchitecture and testbench for a simple RISC-V CPU
- Fetch, decode, and execute logic
- RISC-V control logic
Contents:
- Pipelining the CPU
- Load and store instructions and memory
- Completing the RISC-V CPU
- Wrap-up and future opportunities