Showing posts with label floating point. Show all posts

Wednesday, 22 April 2020

Homework on Textbook Sections 2.9 to 2.14, 2.17, 2.19, 3.1 to 3.7, 3.9

Q1: Write a C function equivalent to the following RISC-V assembly language code.

clear: addi  x5, x0, 0
       addi  x7, x0, '-'
       jal   x0, L2
L1:    sb    x7, 0(x6)
       addi  x5, x5, 1
L2:    add   x6, x10, x5
       lbu   x28, 0(x6)
       bne   x28, x0, L1
       jalr  x0, 0(x1)

Solution:
void clear(char s[]) {
    int i;
    i = 0;
    while (s[i] != '\0') {
        s[i] = '-';
        i++;
    }
}

Q2: What doubleword, in hexadecimal, is placed in register x8 by the following instructions?

    lui  x8, 0xA9344
    ori  x8, x8, 0x01C

Solution: 0xFFFFFFFFA934401C
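
The upper 32 bits are all ones because, on RV64, lui builds a 32-bit value and then sign-extends it, and bit 31 of 0xA9344000 is set. A minimal C sketch of the two instructions, assuming an RV64 target; the helper names rv64_lui and rv64_ori are illustrative and not part of any toolchain:

```c
#include <stdint.h>

/* lui: put the 20-bit immediate in bits 31..12, then sign-extend to 64 bits. */
uint64_t rv64_lui(uint32_t imm20) {
    uint32_t word = (imm20 & 0xFFFFFu) << 12;
    int64_t value = (int64_t)word;
    if (word & 0x80000000u)
        value -= (int64_t)1 << 32;      /* bit 31 set: extend the sign */
    return (uint64_t)value;
}

/* ori: OR with a 12-bit immediate that is sign-extended to 64 bits. */
uint64_t rv64_ori(uint64_t rs1, uint32_t imm12) {
    int64_t imm = (int64_t)(imm12 & 0xFFFu);
    if (imm & 0x800)
        imm -= 0x1000;                  /* bit 11 set: extend the sign */
    return rs1 | (uint64_t)imm;
}
```

Calling rv64_ori(rv64_lui(0xA9344), 0x01C) reproduces the doubleword given in the solution.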

Q3: What word, in hexadecimal, encodes the branch instruction in the following code sequence?

    beq   x9, x24, L1
    slli  x8, x7, 2
    add   x8, x8, x15
    ld    x8, 0(x8)
    sd    x8, 0(x23)
L1: addi  x7, x7, 1


Solution: 0x01848A63
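
L1 is five instructions after the beq, so the branch offset is 5 × 4 = 20 bytes. The sketch below packs the B-type fields in C to show where each piece of the immediate lands; the function name encode_beq is an assumption made for illustration:

```c
#include <stdint.h>

/* Encode "beq rs1, rs2, offset" (B-type). offset is the byte distance from
   the branch to its target; it must be even and fit in 13 signed bits. */
uint32_t encode_beq(unsigned rs1, unsigned rs2, int32_t offset) {
    uint32_t imm  = (uint32_t)offset & 0x1FFEu;   /* keep bits 12..1          */
    uint32_t word = 0x63u;                        /* opcode 1100011, funct3 0 */
    word |= ((imm >> 11) & 0x1u)  << 7;           /* imm[11]   -> bit 7       */
    word |= ((imm >> 1)  & 0xFu)  << 8;           /* imm[4:1]  -> bits 11..8  */
    word |= (rs1 & 0x1Fu) << 15;                  /* rs1       -> bits 19..15 */
    word |= (rs2 & 0x1Fu) << 20;                  /* rs2       -> bits 24..20 */
    word |= ((imm >> 5)  & 0x3Fu) << 25;          /* imm[10:5] -> bits 30..25 */
    word |= ((imm >> 12) & 0x1u)  << 31;          /* imm[12]   -> bit 31      */
    return word;
}
```

With rs1 = 9, rs2 = 24 and offset = 20, encode_beq returns 0x01848A63.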

Q4: Suppose the following code sequence is executed on two processors in parallel without synchronization:

ld   x6, 0(x10)  // load x
add  x6, x6, x6  // double x
sd   x6, 0(x10)  // store x

If the variable x is initially 8, what are the possible final values for x? Assume that memory only does one load or store at a time, and that a pending load or store waits until the memory is not busy.


Solution:  16 or 32
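
The two outcomes can be checked by brute force. The sketch below enumerates every interleaving of the two three-instruction sequences, assuming (as the question states) that each load and store is atomic; run_schedule and final_values are hypothetical helper names:

```c
/* Run one interleaving: bit k of schedule selects which processor (0 or 1)
   executes global step k. Each processor runs load, double, store in order. */
static long run_schedule(unsigned schedule, long x) {
    long reg[2] = {0, 0};   /* each processor's private x6 */
    int  pc[2]  = {0, 0};   /* next step of each processor */
    for (int k = 0; k < 6; k++) {
        int p = (int)((schedule >> k) & 1u);
        switch (pc[p]++) {
        case 0: reg[p] = x;       break;   /* ld  x6, 0(x10) */
        case 1: reg[p] += reg[p]; break;   /* add x6, x6, x6 */
        case 2: x = reg[p];       break;   /* sd  x6, 0(x10) */
        }
    }
    return x;
}

/* Collect the distinct final values over all schedules in which each
   processor executes exactly three steps. Returns how many were found. */
int final_values(long initial, long out[], int cap) {
    int n = 0;
    for (unsigned s = 0; s < 64; s++) {
        int ones = 0;
        for (int k = 0; k < 6; k++)
            ones += (int)((s >> k) & 1u);
        if (ones != 3)
            continue;                      /* not a valid interleaving */
        long v = run_schedule(s, initial);
        int seen = 0;
        for (int i = 0; i < n; i++)
            if (out[i] == v)
                seen = 1;
        if (!seen && n < cap)
            out[n++] = v;
    }
    return n;
}
```

For an initial value of 8, final_values finds exactly two results, 16 and 32: x ends at 16 when both processors load before either stores, and at 32 when one processor's store completes before the other's load.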
 
Q5: The swap procedure on page 135 of the textbook is called by the sort procedure in Figure 2.25 on page 139. By how much would the dynamic instruction count change per iteration of the inner loop if the swap procedure were inlined? 

Solution: 4 fewer instructions
 
Q6: Measurements of programs running on a processor show the following relative frequencies of execution and CPI values:
  • Arithmetic instructions: 50%, 1 cycle
  • Load instructions: 15%, 3 cycles
  • Store instructions: 10%, 2 cycles
  • Branch instructions: 25%, 2 cycles
The clock frequency of the processor is 2 GHz.
Suppose we augment a processor’s instruction set by adding a load indexed instruction that forms the effective address by adding two register values. The new instruction allows a sequence such as:
add  rtmp, rs1, rs2
ld   rd, 0(rtmp)
to be replaced by a single instruction
ldx  rd, (rs1+rs2)   // Load from address rs1+rs2 to rd
20% of the loads in the original processor are preceded by add instructions such that the pair can be replaced by an ldx instruction in the new processor. The CPI for the ldx instruction is 3 cycles, but its inclusion slows down the clock frequency of the processor.
What is the minimum clock frequency in GHz required for the new processor to ensure its performance is at least that of the original processor? 


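Solution: 1.9636 GHz, that is 2 × 1.62 / 1.65, where 1.65 is the original average CPI and 1.62 is the cycle count per original instruction once the add instructions absorbed into ldx are removed.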
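
The arithmetic can be laid out step by step. This is a sketch using only the instruction mix and CPI values from the question; the function name min_clock_ghz is made up for illustration:

```c
/* Minimum clock rate (GHz) at which the ldx machine matches the original. */
double min_clock_ghz(void) {
    const double f_old = 2.0;   /* GHz */
    const double arith = 0.50, load = 0.15, store = 0.10, branch = 0.25;

    /* Average cycles per instruction on the original machine: 1.65. */
    double cycles_old = arith * 1 + load * 3 + store * 2 + branch * 2;

    /* 20% of loads fuse with a preceding add. The add (1 cycle) disappears;
       the ldx costs 3 cycles, the same as the load it replaces. */
    double fused = 0.20 * load;                  /* 0.03 of all instructions */
    double cycles_new = cycles_old - fused * 1;  /* 1.62 per original instr. */

    /* Equal run time: cycles_new / f_new = cycles_old / f_old. */
    return f_old * cycles_new / cycles_old;
}
```

The result is 2 × 1.62 / 1.65, a little under 1.964 GHz.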

Q7: Consider a 32-bit multiplier organized in a similar way to Figure 3.7 in the textbook. Suppose the time required for each adder is 4ns. How long (in ns) does the multiplier take to multiply two 32-bit operands? 
Solution: 20
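
The 20 ns figure follows if the adders are arranged as a binary tree, so that each level halves the number of partial sums. A small sketch under that assumption (tree_multiplier_delay_ns is an illustrative name):

```c
/* Delay of a tree-structured multiplier: the 'bits' partial products are
   summed pairwise, so the depth of the tree is ceil(log2(bits)) adders. */
int tree_multiplier_delay_ns(int bits, int adder_ns) {
    int levels = 0;
    for (int terms = bits; terms > 1; terms = (terms + 1) / 2)
        levels++;                 /* one more level of adders */
    return levels * adder_ns;
}
```

For 32-bit operands the tree has 5 levels (32, 16, 8, 4, 2 terms), giving 5 × 4 ns = 20 ns.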

Q8: Use the RISC-V instructions described in the textbook (Sections 3.3 and 3.4 and the green card) to write assembly language code for the following C statement. Assume all variables are of type int, with a, b, c and d in x10, x11, x12 and x18, respectively.
d = (a * b) % c;


Solution: mul x18, x10, x11
rem x18, x18, x12

Q9: Write the hexadecimal word for the IEEE 754 single-precision representation of the decimal +81.75. 

Solution: 0x42A38000
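
By hand: 81.75 = 1010001.11 in binary = 1.01000111 × 2^6, so the sign bit is 0, the biased exponent is 6 + 127 = 133 = 10000101, and the fraction is 01000111 followed by zeros. The result can be cross-checked in C by copying the bytes of a float into an integer; this sketch assumes float is the 32-bit IEEE 754 format, as on mainstream platforms, and float_to_bits is an illustrative name:

```c
#include <stdint.h>
#include <string.h>

/* Return the raw bit pattern of a single-precision value.
   memcpy avoids the aliasing problems of a pointer cast. */
uint32_t float_to_bits(float value) {
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

float_to_bits(81.75f) yields 0x42A38000.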

Q10: What decimal number does the hexadecimal word 0xBFF00000 represent as an IEEE 754 single-precision floating-point value?

Solution: -1.875
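
The word splits into sign 1, exponent 01111111 (127, so the true exponent is 0) and fraction 111 followed by zeros, giving -(1 + 0.5 + 0.25 + 0.125) × 2^0. The same decoding as a C sketch; decode_single is a hypothetical helper that handles normalized values only:

```c
#include <stdint.h>

/* Decode a normalized single-precision word field by field:
   value = (-1)^sign * (1 + fraction / 2^23) * 2^(exponent - 127).
   Zero, subnormals, infinities and NaN are not handled in this sketch. */
double decode_single(uint32_t word) {
    unsigned sign     = word >> 31;
    int      exponent = (int)((word >> 23) & 0xFFu);
    uint32_t fraction = word & 0x7FFFFFu;

    double magnitude = 1.0 + (double)fraction / 8388608.0;   /* 2^23 */
    int e = exponent - 127;
    for (; e > 0; e--) magnitude *= 2.0;   /* scale up for positive exponents   */
    for (; e < 0; e++) magnitude /= 2.0;   /* scale down for negative exponents */
    return sign ? -magnitude : magnitude;
}
```

decode_single(0xBFF00000) returns -1.875.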

Q11: Write RISC-V instructions for the following C statement, assuming the variables y and a are of type double and are in floating-point registers, x is an array of double with the base address in x9, and i is of type int in x18.
y = y + a * x[i];


Solution: 
Assumption: a is in f9, y is in f8 and f0 is used as temporary storage
slli x5, x18, 3
add x5, x5, x9
fld f0, 0(x5)
fmul.d f0, f9, f0
fadd.d f8, f8, f0

Quiz on Textbook Sections 3.1 to 3.7, 3.9

Q1: For what combinations of operands can signed addition overflow? 
Solution: Both operands positive or both operands negative
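
In other words, overflow occurs exactly when the operands share a sign and the sum's sign differs from it. A C sketch of that test for 32-bit values; the name add_overflows is illustrative, and the sum is formed in unsigned arithmetic because signed overflow is undefined behavior in C:

```c
#include <stdbool.h>
#include <stdint.h>

/* True when a + b does not fit in 32-bit two's complement. */
bool add_overflows(int32_t a, int32_t b) {
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t sum = ua + ub;                 /* wraps modulo 2^32 */
    /* Bit 31 of ~(ua ^ ub) is set when the operand signs agree;
       bit 31 of (ua ^ sum) is set when the sum's sign differs from a's. */
    return ((~(ua ^ ub) & (ua ^ sum)) >> 31) != 0;
}
```

For example, add_overflows(INT32_MAX, 1) and add_overflows(INT32_MIN, -1) are true, while any pair with opposite signs returns false.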
 
Q2: What usually happens when an overflow occurs during addition to calculate an address in a program? 
Solution: Nothing - the overflow is ignored.
 
Q3: How many clock cycles would the sequential multiplier shown in Figure 3.5 of the textbook take to multiply 64-bit operands? 
Solution: 64

Q4: A parallel multiplier using a tree-structured stack of adders is much faster than a sequential multiplier, but at the cost of significantly more hardware resources.
Solution: True

Q5: Can we construct a fast parallel divider in a similar way to the way we make a fast parallel multiplier?
Solution: No, because we need to use the sign of the difference calculated at each step in order to perform the next step.

Q6: How many bits are used for single precision floating-point values? 
Solution:  32 bits: 1 sign bit, 8 exponent bits, and 23 fraction bits

Q7: The revised IEEE 754-2008 standard added a 16-bit floating-point format with five exponent bits. What do you think is the likely range of numbers it could represent?
 Solution: ±1.0000 0000 00×2^−14 to ±1.1111 1111 11×2^15, ±0, ±∞, NaN
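
With 1 sign bit and 5 exponent bits, 10 bits remain for the fraction. The limits above then follow from the usual IEEE 754 conventions, which the sketch below assumes: bias = 2^(e-1) - 1, with the all-zeros and all-ones exponent patterns reserved. The function names are illustrative:

```c
/* 2^n for small integer n, computed without the math library. */
static double two_to_the(int n) {
    double r = 1.0;
    for (; n > 0; n--) r *= 2.0;
    for (; n < 0; n++) r /= 2.0;
    return r;
}

/* Smallest normalized magnitude: 1.0 x 2^(1 - bias). */
double min_normal(int exp_bits) {
    int bias = (1 << (exp_bits - 1)) - 1;
    return two_to_the(1 - bias);
}

/* Largest finite magnitude: (2 - 2^-frac_bits) x 2^(max exponent). */
double max_normal(int exp_bits, int frac_bits) {
    int bias = (1 << (exp_bits - 1)) - 1;
    int max_exp = ((1 << exp_bits) - 2) - bias;   /* all-ones is reserved */
    return (2.0 - two_to_the(-frac_bits)) * two_to_the(max_exp);
}
```

For the 16-bit format, min_normal(5) is 2^-14 (about 0.000061) and max_normal(5, 10) is 65504.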

Q8: The hardware for floating-point operations is significantly more complex than that for integer operations.
Solution: True



Q9: From the statements below, select those that correctly describe floating point instructions in the RISC-V instruction set. 
Solution: Floating-point arithmetic instructions operate on different registers to integer instructions

Q10: The Intel SSE2 instruction extensions provide subword parallelism by what means? 
Solution: By having 128-bit wide registers and floating-point operations on short vectors of 4 single-precision or 2 double-precision elements.