Showing posts with label microprocessor. Show all posts

Thursday, 14 May 2020

Quiz on Textbook Sections 4.7 to 4.11

Question 1: In the following RISC-V instruction sequence executed in the 5-stage pipeline, which instructions use forwarded data?

i1:  sub  x2, x1, x3    # Register x2 written by sub
i2:  and  x12, x2, x5   # 1st operand (x2) depends on sub
i3:  or   x13, x6, x2   # 2nd operand (x2) depends on sub
i4:  add  x14, x2, x2   # 1st (x2) and 2nd (x2) depend on sub
i5:  sd   x15, 100(x2)  # Base (x2) depends on sub

Solution:


i2 and i3
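
The answer can be checked with a small timing model. The Python sketch below is an illustration under stated assumptions, not the textbook's procedure: it assumes a 5-stage pipeline with no stalls, one instruction fetched per cycle, and a register file that is written in the first half of a cycle and read in the second half. The function name `needs_forwarding` is hypothetical.

```python
def needs_forwarding(producer_fetch, consumer_fetch):
    """Return True if the consumer must get the producer's result by forwarding.

    Assumes stages IF, ID, EX, MEM, WB with no stalls, so an instruction
    fetched in cycle c is in ID in cycle c+1 and in WB in cycle c+4.
    Assumes the register file writes in the first half of a cycle and reads
    in the second half, so a read in the producer's WB cycle sees the new value.
    """
    producer_wb = producer_fetch + 4
    consumer_id = consumer_fetch + 1
    # If the consumer reads registers before the producer's WB cycle,
    # the register file still holds the stale value of x2.
    return consumer_id < producer_wb

# i1 (sub) is fetched in cycle 1; i2..i5 follow in cycles 2..5.
consumers = {"i2": 2, "i3": 3, "i4": 4, "i5": 5}
forwarded = [name for name, cycle in consumers.items()
             if needs_forwarding(1, cycle)]
print(forwarded)  # ['i2', 'i3']
```

Under these assumptions i4 reads x2 in the same cycle that i1 writes it back, so it gets the value from the register file, and i5 reads it one cycle later.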

Question 2: In the case of a load-use data hazard, how does the pipeline stall the instruction using the loaded data?

Solution:


It prevents update of the PC and IF/ID pipeline registers, and sets the control values for EX, MEM and WB to 0 in the ID/EX pipeline register.
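
The stall condition can be sketched in Python. This is a simplified model of the hazard detection unit, assuming the pipeline register field names used in the textbook (ID/EX.MemRead, ID/EX.RegisterRd, IF/ID.RegisterRs1, IF/ID.RegisterRs2). The function name and the returned dictionary keys are hypothetical.

```python
def hazard_detection(id_ex_mem_read, id_ex_rd, if_id_rs1, if_id_rs2):
    """Model the load-use check made while the dependent instruction is in ID.

    A stall is needed when the instruction in EX is a load (MemRead set)
    and its destination register matches a source register of the
    instruction in ID.
    """
    stall = bool(id_ex_mem_read) and id_ex_rd in (if_id_rs1, if_id_rs2)
    return {
        "pc_write": not stall,         # hold the PC when stalling
        "if_id_write": not stall,      # hold the IF/ID register when stalling
        "zero_id_ex_controls": stall,  # insert a bubble (nop) into ID/EX
    }

# ld x2, 20(x1) followed by and x4, x2, x5: the load's rd (x2) matches rs1.
signals = hazard_detection(id_ex_mem_read=True, id_ex_rd=2,
                           if_id_rs1=2, if_id_rs2=5)
print(signals)
```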

Question 3: If branch computation is moved from the EX stage to the ID stage, forwarding paths are required from the EX/MEM and MEM/WB pipeline registers to the branch comparison logic in the ID stage.

Solution:


True

Question 4: Match the following descriptions to the correct terms.

Solution:

Prediction of branches at runtime using runtime information.

dynamic branch prediction

A small memory that is indexed using the address of the branch instruction and that contains bits indicating whether the branch was recently taken or not.
branch prediction buffer

A structure that caches the destination PC or destination instruction for a branch.

branch target buffer

A branch predictor with multiple predictions for each branch and a selection mechanism that chooses which predictor to enable for a given branch.

tournament branch predictor

Question 5: Which of the following events would cause an exception or interrupt in a RISC-V computer system?

Solution:


A request from an I/O device
An undefined instruction
An operating system request from a user program

Question 6: In a static dual-issue processor with 5 pipeline stages, what is the maximum number of instructions that can be in progress at any time?

Solution:


10

Question 7: Loop unrolling is a technique to get more performance from loops that access arrays, in which multiple copies of the loop body are made and instructions from different iterations are scheduled together.

Solution:


True

Question 8: Match the following descriptions to the defined terms.

Solution:


Hardware support for reordering the order of instruction execution so as to avoid stalls.
dynamic scheduling

A situation in pipelined execution when an instruction blocked from executing does not cause the following instructions to wait.
out-of-order execution

A commit in which the results of pipelined execution are written to the programmer visible state in the same order that instructions are fetched.
in-order commit

The buffer that holds results in a dynamically scheduled processor until it is safe to store the results to memory or a register.
reorder buffer

Question 9: Which of the following correctly describes the ARM Cortex-A8 processor?

Solution:


Dynamic multiple-issue, static in-order pipeline scheduling

Question 10: Which of the following correctly describes the Intel Core i7 920 processor?

Solution:


Dynamic multiple-issue, dynamic out-of-order pipeline scheduling

Wednesday, 22 April 2020

Homework on Textbook Sections 2.9 to 2.14, 2.17, 2.19, 3.1 to 3.7, 3.9

Q1: Write a C function equivalent to the following RISC-V assembly language code.

clear: addi  x5, x0, 0
       addi  x7, x0, '-'
       jal   x0, L2
L1:    sb    x7, 0(x6)
       addi  x5, x5, 1
L2:    add   x6, x10, x5
       lbu   x28, 0(x6)
       bne   x28, x0, L1
       jalr  x0, 0(x1)

Solution:
void clear(char s[]) {
    int i;
    i = 0;
    while (s[i] != '\0') {
        s[i] = '-';
        i++;
    }
}

Q2: What doubleword, in hexadecimal, is placed in register x8 by the following instructions?

    lui  x8, 0xA9344
    ori  x8, x8, 0x01C

Solution: 0xFFFFFFFFA934401C
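
The upper 32 bits are all ones because, on a 64-bit RISC-V register, lui sign-extends its 32-bit result. A minimal Python sketch of the two instructions, assuming RV64 semantics (the helper names are hypothetical):

```python
MASK64 = (1 << 64) - 1

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)

def lui(imm20):
    # Place the 20-bit immediate in bits 31:12, then sign-extend to 64 bits.
    return sign_extend((imm20 << 12) & 0xFFFFFFFF, 32) & MASK64

def ori(rs1_value, imm12):
    # The 12-bit immediate is sign-extended before the bitwise OR.
    return (rs1_value | sign_extend(imm12, 12)) & MASK64

x8 = ori(lui(0xA9344), 0x01C)
print(f"0x{x8:016X}")  # 0xFFFFFFFFA934401C
```

Bit 31 of 0xA9344000 is 1, so the sign extension fills bits 63:32 with ones.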

Q3: What word, in hexadecimal, encodes the branch instruction in the following code sequence?

    beq   x9, x24, L1
    slli  x8, x7, 2
    add   x8, x8, x15
    ld    x8, 0(x8)
    sd    x8, 0(x23)
L1: addi  x7, x7, 1


Solution: 0x01848A63
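
L1 is five instructions after the branch, so the byte offset is 20. The Python sketch below packs the B-type fields for beq, assuming the standard RISC-V encoding (opcode 1100011, funct3 000); the function name `encode_beq` is hypothetical.

```python
def encode_beq(rs1, rs2, offset):
    """Encode beq rs1, rs2, offset as a 32-bit B-type instruction word."""
    if offset % 2 != 0 or not -4096 <= offset < 4096:
        raise ValueError("offset must be even and fit in 13 signed bits")
    imm = offset & 0x1FFF          # 13-bit two's-complement immediate
    opcode, funct3 = 0b1100011, 0b000
    return (
        ((imm >> 12) & 0x1) << 31    # imm[12]
        | ((imm >> 5) & 0x3F) << 25  # imm[10:5]
        | rs2 << 20
        | rs1 << 15
        | funct3 << 12
        | ((imm >> 1) & 0xF) << 8    # imm[4:1]
        | ((imm >> 11) & 0x1) << 7   # imm[11]
        | opcode
    )

word = encode_beq(rs1=9, rs2=24, offset=5 * 4)
print(f"0x{word:08X}")  # 0x01848A63
```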

Q4: Suppose the following code sequence is executed on two processors in parallel without synchronization:

ld   x6, 0(x10)  // load x
add  x6, x6, x6  // double x
sd   x6, 0(x10)  // store x

If the variable x is initially 8, what are the possible final values for x? Assume that memory only does one load or store at a time, and that a pending load or store waits until the memory is not busy.


Solution:  16 or 32
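
Both outcomes can be confirmed by enumerating every interleaving of the two instruction streams. The Python sketch below is a simplified model that assumes each instruction is atomic and memory operations are serialized, as the question states; the function name `final_values` is hypothetical.

```python
from itertools import combinations

def final_values(initial):
    """Return the set of final values of x over all interleavings."""
    results = set()
    # Choose which 3 of the 6 time slots belong to processor 0.
    for slots in combinations(range(6), 3):
        memory = initial
        reg = [0, 0]   # each processor's private x6
        step = [0, 0]  # index of the next instruction on each processor
        for t in range(6):
            p = 0 if t in slots else 1
            if step[p] == 0:
                reg[p] = memory         # ld  x6, 0(x10)
            elif step[p] == 1:
                reg[p] = reg[p] + reg[p]  # add x6, x6, x6
            else:
                memory = reg[p]         # sd  x6, 0(x10)
            step[p] += 1
        results.add(memory)
    return results

print(sorted(final_values(8)))  # [16, 32]
```

The result is 32 when one processor's store completes before the other's load, and 16 when both loads read the original value.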
 
Q5: The swap procedure on page 135 of the textbook is called by the sort procedure in Figure 2.25 on page 139. By how much would the dynamic instruction count change per iteration of the inner loop if the swap procedure were inlined? 

Solution: 4 fewer instructions
 
Q6: Measurements of programs running on a processor show the following relative frequencies of execution and CPI values:
  • Arithmetic instructions: 50%, 1 cycle
  • Load instructions: 15%, 3 cycles
  • Store instructions: 10%, 2 cycles
  • Branch instructions: 25%, 2 cycles
The clock frequency of the processor is 2 GHz.
Suppose we augment a processor’s instruction set by adding a load indexed instruction that forms the effective address by adding two register values. The new instruction allows a sequence such as:
add  rtmp, rs1, rs2
ld   rd, 0(rtmp)
to be replaced by a single instruction
ldx  rd, (rs1+rs2)   // Load from address rs1+rs2 to rd
20% of the loads in the original processor are preceded by add instructions such that the pair can be replaced by an ldx instruction in the new processor. The CPI for the ldx instruction is 3 cycles, but its inclusion slows down the clock frequency of the processor.
What is the minimum clock frequency in GHz required for the new processor to ensure its performance is at least that of the original processor? 


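Solution: 1.964 (2 × 1.62 / 1.65 ≈ 1.9636, where 1.65 and 1.62 are the original and new cycle counts per original instruction)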
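
The calculation behind this answer can be sketched in Python. It assumes that each replaced pair removes one arithmetic instruction (1 cycle) and one load (3 cycles) and adds one ldx (3 cycles); all variable names are hypothetical.

```python
# Instruction mix of the original processor: (fraction, cycles per instruction).
mix = {
    "arithmetic": (0.50, 1),
    "load": (0.15, 3),
    "store": (0.10, 2),
    "branch": (0.25, 2),
}
old_freq_ghz = 2.0

# Average cycles per original instruction.
old_cycles = sum(fraction * cpi for fraction, cpi in mix.values())

# 20% of loads (3% of all instructions) pair with an add and become ldx.
replaced = 0.20 * mix["load"][0]
ldx_cpi = 3
saved_per_pair = mix["arithmetic"][1] + mix["load"][1] - ldx_cpi
new_cycles = old_cycles - replaced * saved_per_pair

# Equal execution time: new_cycles / new_freq == old_cycles / old_freq.
min_freq_ghz = old_freq_ghz * new_cycles / old_cycles
print(round(old_cycles, 2), round(new_cycles, 2), round(min_freq_ghz, 4))
```

The original program needs 1.65 cycles per instruction on average and the new one needs 1.62 cycles per original instruction, so the new clock must run at about 1.9636 GHz or faster.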

Q7: Consider a 32-bit multiplier organized in a similar way to Figure 3.7 in the textbook. Suppose the time required for each adder is 4ns. How long (in ns) does the multiplier take to multiply two 32-bit operands? 
Solution: 20
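
This answer follows if the multiplier in Figure 3.7 is taken to be a parallel tree of adders, so that the delay is the number of tree levels times the adder delay. A minimal Python sketch under that assumption (the function name is hypothetical):

```python
import math

def tree_multiplier_delay_ns(operand_bits, adder_delay_ns):
    """Critical-path delay of a balanced binary adder tree.

    Combining `operand_bits` partial products pairwise takes
    ceil(log2(operand_bits)) levels of adders.
    """
    levels = math.ceil(math.log2(operand_bits))
    return levels * adder_delay_ns

delay = tree_multiplier_delay_ns(32, 4)
print(delay)  # 20
```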

Q8: Use the RISC-V instructions described in the textbook (Sections 3.3 and 3.4 and the green card) to write assembly language code for the following C statement. Assume all variables are of type int, with a, b, c and d in x10, x11, x12 and x18, respectively.
d = (a * b) % c;


Solution: mul x18, x10, x11
rem x18, x18, x12

Q9: Write the hexadecimal word for the IEEE 754 single-precision representation of the decimal +81.75. 

Solution: 0x42A38000
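
In binary, 81.75 is 1010001.11, or 1.01000111 × 2^6, so the sign bit is 0, the biased exponent is 6 + 127 = 133, and the fraction field starts with 01000111. The Python sketch below checks this with the standard `struct` module; the function name is hypothetical.

```python
import struct

def float_to_single_bits(value):
    """Return the IEEE 754 single-precision bit pattern of value as an int."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    return bits

bits = float_to_single_bits(81.75)
sign = bits >> 31
exponent = (bits >> 23) & 0xFF   # biased exponent
fraction = bits & 0x7FFFFF       # 23-bit fraction field
print(f"0x{bits:08X}", sign, exponent, f"{fraction:023b}")
```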

Q10: What decimal number does the hexadecimal word 0xBFF00000 represent as an IEEE 754 single precision floating point value? 

Solution: -1.875
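
The word splits into sign 1, exponent field 01111111 (127, so the true exponent is 0), and fraction .111 (0.875), which gives -(1 + 0.875) × 2^0. A short Python sketch that decodes the fields by hand and compares the result with the `struct` module; it handles normalized values only, and the function names are hypothetical.

```python
import struct

def decode_single(bits):
    """Decode a normalized IEEE 754 single-precision word field by field."""
    sign = -1.0 if bits >> 31 else 1.0
    exponent = ((bits >> 23) & 0xFF) - 127        # remove the bias
    significand = 1.0 + (bits & 0x7FFFFF) / 2**23  # implicit leading 1
    return sign * significand * 2.0**exponent

def single_bits_to_float(bits):
    """Let the struct module reinterpret the 32-bit pattern as a float."""
    (value,) = struct.unpack(">f", struct.pack(">I", bits))
    return value

manual = decode_single(0xBFF00000)
library = single_bits_to_float(0xBFF00000)
print(manual, library)  # -1.875 -1.875
```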

Q11: Write RISC-V instructions for the following C statement, assuming the variables y and a are of type double and are in floating-point registers, x is an array of double with the base address in x9, and i is of type int in x18.
y = y + a * x[i];


Solution: 
Assumption: a is in f9, y is in f8 and f0 is used as temporary storage
slli x5, x18, 3
add x5, x5, x9
fld f0, 0(x5)
fmul.d f0, f9, f0
fadd.d f8, f8, f0

Quiz on Textbook Sections 1.1 to 1.4, 1.6 to 1.9

Q1: Match each description with the class of computer. 
(i) General purpose, run a variety of software, subject to cost/performance tradeoff
Solution: Personal computers
(ii) Network based, high capacity, high performance, high reliability, range from small to building sized
Solution: Server computers
(iii) High-end scientific and engineering calculations, highest capability but represent a small fraction of the overall computer market
Solution: Supercomputers
(iv) Hidden as components of systems, stringent power/performance/cost constraints
Solution: Embedded computers

Q2: Which kind of computer can best be described as:
  • Battery operated 
  • Connects to the Internet 
  • Costs a few hundred dollars
  • Has touch screen
Solution: Personal mobile device




Q3: Which of the following are input devices? 
LCD display
Keyboard
Loudspeaker
Touchscreen
Pushbutton
Radio transmitter

Solution:
  1. Keyboard
  2. Touchscreen
  3. Pushbutton

Q4: Which of the following are output devices?
Loudspeaker
Temperature sensor
LED indicator light
Pushbutton
Mouse
LCD display 

Solution:
  1. Loudspeaker
  2. LED indicator light
  3. LCD display  

Q5: Match the following descriptions to the types of memory:
(i) The storage area in which programs are kept when they are running and that contains the data needed by the running programs.
Solution: Main memory
(ii) Memory built as an integrated circuit; it provides random access to any location. Access times are 50 nanoseconds and cost per gigabyte in 2012 was $5 to $10.
Solution: Dynamic random access memory (DRAM)
(iii) A small, fast memory that acts as a buffer for a slower, larger memory.
Solution: Cache memory
(iv) Memory built as an integrated circuit, but faster and less dense than DRAM.
Solution: Static random access memory (SRAM)
(v) A form of nonvolatile secondary memory composed of rotating platters coated with a magnetic recording material. Because they are rotating mechanical devices, access times are about 5 to 20 milliseconds and cost per gigabyte in 2012 was $0.05 to $0.10.
Solution: Magnetic disk memory
(vi) A nonvolatile semiconductor memory. It is cheaper and slower than DRAM but more expensive per bit and faster than magnetic disks. Access times are about 5 to 50 microseconds and cost per gigabyte in 2012 was $0.75 to $1.00.
Solution: Flash memory

Q6: Which of the following best defines the term "instruction set architecture"? 
Solution: An abstract interface between the hardware and the lowest-level software that encompasses all the information necessary to write a machine language program that will run correctly, including instructions, registers, memory access, I/O, and so on.

Q8: Computer C’s performance is 4 times as fast as the performance of computer B, which runs a given application in 28 seconds. How many seconds will computer C take to run that application? 
Solution: 7

Q9: A given application written in Java runs in 15 seconds on a desktop processor. A new Java compiler is released that requires only 0.6 as many instructions as the old compiler. Unfortunately, it increases the CPI by a factor of 1.1. How fast can we expect the application to run using this new compiler? Pick the right answer from the three choices below:
Solution: 15 × 0.6 × 1.1 = 9.9 seconds

Q10: If we increase the clock frequency of a microprocessor, what will happen to the power consumption?
Solution: Power will increase