Random Questioning: floating point

Q1: Write a C function equivalent to the following RISC-V assembly language code.

clear: addi x5, x0, 0
       addi x7, x0, '-'
       jal  x0, L2
L1:    sb   x7, 0(x6)
       addi x5, x5, 1
L2:    add  x6, x10, x5
       lbu  x28, 0(x6)
       bne  x28, x0, L1
       jalr x0, 0(x1)
Solution:

void clear(char s[]) {
    int i;
    i = 0;
    while (s[i] != '\0') {
        s[i] = '-';
        i++;
    }
}

Q2: What doubleword, in hexadecimal, is placed in register x8 by the following instructions?

    lui x8, 0xA9344
    ori x8, x8, 0x01C
Solution: 0xFFFFFFFFA934401C

Q3: What word, in hexadecimal, encodes the branch instruction in the following code sequence?

    beq  x9, x24, L1
    slli x8, x7, 2
    add  x8, x8, x15
    ld   x8, 0(x8)
    sd   x8, 0(x23)
L1: addi x7, x7, 1

Solution: 0x01848A63

Q4: Suppose the following code sequence is executed on two processors in parallel without synchronization:

ld  x6, 0(x10)  // load x
add x6, x6, x6  // double x
sd  x6, 0(x10)  // store x

If the variable x is initially 8, what are the possible final values for x? Assume that memory only does one load or store at a time, and that a pending load or store waits until the memory is not busy.

Solution: 16 or 32

Q5: The swap procedure on page 135 of the textbook is called by the sort procedure in Figure 2.25 on page 139. By how much would the dynamic instruction count change per iteration of the inner loop if the swap procedure were inlined?

Solution: 4 fewer instructions

Q6: Measurements of programs running on a processor show the following relative frequencies of execution and CPI values:

Arithmetic instructions: 50%, 1 cycle

Load instructions: 15%, 3 cycles

Store instructions: 10%, 2 cycles

Branch instructions: 25%, 2 cycles

The clock frequency of the processor is 2 GHz.
Suppose we augment a processor’s instruction set by adding a load indexed instruction that forms the effective address by adding two register values. The new instruction allows a sequence such as:
add rtmp, rs1, rs2
ld rd, 0(rtmp)
to be replaced by a single instruction
ldx rd, (rs1+rs2) // Load from address rs1+rs2 to rd
20% of the loads in the original processor are preceded by add instructions such that the pair can be replaced by a ldx instruction in the new processor. The CPI for the ldx instruction is 3 cycles, but its inclusion slows down the clock frequency of the processor.
What is the minimum clock frequency in GHz required for the new processor to ensure its performance is at least that of the original processor?

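Solution: 1.964. Per original instruction, the original processor needs 0.50×1 + 0.15×3 + 0.10×2 + 0.25×2 = 1.65 cycles. In the new processor, 0.03 of the adds and 0.03 of the loads are replaced by 0.03 ldx instructions, giving 0.47×1 + 0.12×3 + 0.03×3 + 0.10×2 + 0.25×2 = 1.62 cycles. The new clock must therefore be at least 2 × 1.62 / 1.65 ≈ 1.9636 GHz, rounded up so the performance target is still met.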

Q7: Consider a 32-bit multiplier organized in a similar way to Figure 3.7 in the textbook. Suppose the time required for each adder is 4ns. How long (in ns) does the multiplier take to multiply two 32-bit operands?
Solution: 20

Q8: Use the RISC-V instructions described in the textbook (Sections 3.3 and 3.4 and the green card) to write assembly language code for the following C statement. Assume all variables are of type int, with a, b, c and d in x10, x11, x12 and x18, respectively.
d = (a * b) % c;

Solution:

mul x18, x10, x11
rem x18, x18, x12

Q9: Write the hexadecimal word for the IEEE 754 single-precision representation of the decimal +81.75.

Solution: 0x42A38000

Q10: What decimal number does the hexadecimal word 0xBFF00000 represent as an IEEE 754 single-precision floating-point value?

Solution: -1.875

Q11: Write RISC-V instructions for the following C statement, assuming the variables y and a are of type double and are in floating-point registers, x is an array of double with the base address in x9, and i is of type int in x18.
y = y + a * x[i];

Solution:

Assumption: a is in f9, y is in f8 and f0 is used as temporary storage
slli x5, x18, 3
add x5, x5, x9
fld f0, 0(x5)
fmul.d f0, f9, f0
fadd.d f8, f8, f0

Q1: For what combinations of operands can signed addition overflow?
Solution: Both operands positive or both operands negative

Q2: What usually happens when an overflow occurs during addition to calculate an address in a program?
Solution: Nothing - the overflow is ignored.

Q3: How many clock cycles would the sequential multiplier shown in Figure 3.5 of the textbook take to multiply 64-bit operands?
Solution: 64

Q4: A parallel multiplier using a tree-structured stack of adders is much faster than a sequential multiplier, but at the cost of significantly more hardware resources.
Solution: True

Q5: Can we construct a fast parallel divider in a similar way to how we make a fast parallel multiplier?
Solution: No, because we need to use the sign of the difference calculated at each step in order to perform the next step.

Q6: How many bits are used for single precision floating-point values?
Solution: 32 bits: 1 sign bit, 8 exponent bits, and 23 fraction bits

Q7: The revised IEEE 754-2008 standard added a 16-bit floating-point format with five exponent bits. What do you think is the likely range of numbers it could represent?
Solution: ±1.0000 0000 00×2^−14 to ±1.1111 1111 11×2^15, ±0, ±∞, NaN

Q8: The hardware for floating-point operations is significantly more complex than that for integer operations.
Solution: True

Q9: From the statements below, select those that correctly describe floating point instructions in the RISC-V instruction set.
Solution: Floating-point arithmetic instructions operate on different registers to integer instructions

Q10: The Intel SSE2 instruction extensions provide subword parallelism by what means?
Solution: By having 128-bit wide registers and floating-point operations on short vectors of 4 single-precision or 2 double-precision elements.


Wednesday, 22 April 2020

Homework on Textbook Sections 2.9 to 2.14, 2.17, 2.19, 3.1 to 3.7, 3.9

Quiz on Textbook Sections 3.1 to 3.7, 3.9