Thursday, 14 May 2020

Quiz on Textbook Sections 5.4 and 5.10

Question 2: The average memory access time (AMAT) is the average time to access memory, considering both hits and misses, for various kinds of memory accesses.

Solution:


True

Question 4: Match the following descriptions to the terms defined:

Solution:

A cache structure in which a block can be placed in any location in the cache

fully associative cache

A cache that has a fixed number of locations (at least two) where each block can be placed.
set-associative cache

A cache that has a single location in which each block can be placed.
direct-mapped cache

Question 6: Least recently used (LRU) replacement refers to a scheme in which the entry in a set selected for replacement on a miss is the one that has been unused for the least time.

Solution:

False

Question 7: In a multi-level cache system, the miss penalty for the first-level cache is:

Solution:


The number of cycles required to access the second-level cache

Question 8: A dynamically-scheduled processor with out-of-order execution can hide cache-miss latency by executing other independent instructions while the missed word is read from memory.

Solution:


True

Question 9: Match the following descriptions to the terms defined:

Solution:

The property of a memory system that any read of a data item returns the most recently written value of that data item.

coherence

The property of a memory system that determines when a written value will be returned by a read.
consistency

Homework on Textbook Sections 4.7 to 4.11, 5.1 to 5.3

For Questions 1 to 5: In the following code sequence, we need to stall the RISC-V pipeline to resolve the load-use data hazard:

i1: ld   x8, 0(x$5)
i2: add  x9, x8, x10
i3: addi x9, x9, -1
i4: sd   x9, 0(x5)

Consider the cycle when i1 is in the EX stage, i2 is in the ID stage, and i3 is in the IF stage.

Question 1: What is the value of ID/EX.MemRead in this cycle?

Solution:


1

Question 2: What is the value of ID/EX.RegisterRd in this cycle?

Solution:


8

Question 3: What is the value of IF/ID.RegisterRs1 in this cycle?

Solution:


8

Question 4: What is the value of IF/ID.RegisterRs2 in this cycle?

Solution:


10

Question 5: Which instruction (i1, i2, i3, or i4) will be fetch by the IF stage in the next cycle?

Solution:


i3

Question 8: For the 2-bit branch predictor shown in Figure 4.61, label the states anti-clockwise from the bottom left as 0, 1, 2 and 3, as shown below.

Assume the initial state for a branch is state 0. A given branch has the following history (T = taken, N = not taken):

N, N, T, T, T, N, N

What is the state of the predictor after this history?

Solution:


1

For Questions 11 to 15: Consider a system using 28-bit addresses, with a direct-mapped cache of 32Kbytes and a block size of 64 bytes. The cache uses a write-through/no-allocate strategy without a write buffer.

Question 11: How many bits are required for the offset field of an address?

Solution:


6

Question 12: How many bits are required for the index field of an address?

Solution:


9

Question 13:

How many bits are required for the tag field of an address?

Solution:


13

Question 14: How many bits of storage are required for the cache, in addition to those for data storage?

Solution:

7,168


Quiz on Textbook Sections 5.1 to 5.3

Question 1: Match the following descriptions to the defined terms.

Solution:


The principle stating that if a data location is referenced then it will tend to be referenced again soon.
temporal locality

The locality principle stating that if a data location is referenced, data locations with nearby addresses will tend to be referenced soon.
spatial locality

Question 3: For a given level of a memory hierarchy, miss rate + hit rate = 1.

Solution:


True

Question 5: Match the following descriptions to the terms defined.

Solution:

One of thousands of concentric circles that makes up the surface of a magnetic disk.

track

One of the segments that make up a track on a magnetic disk.
sector

All of the tracks under the heads at a given point on all surfaces of a disk drive.
cylinder

The process of positioning a read/write head over the proper track on a disk.

seek

The time required for the desired sector of a disk to rotate under the read/write head.
rotational latency

Question 6: In a direct-mapped cache structure, each memory location is mapped to exactly one location in the cache.

Solution:


True

Question 10:  If a miss occurs in a direct-mapped write-back cache and the block to be replaced is not dirty, the block must still be written back to memory.

Solution:


False

Quiz on Textbook Sections 4.7 to 4.11

Question 1: In the following RISC-V instruction sequence executed in the 5-stage pipeline, which instructions use forwarded data?

i1:  sub  x2, x1, x3    # Register x2 written by sub
i2:  and  x12, x2, x5   # 1st operand (x2) depends on sub
i3:  or   x13, x6, x2   # 2nd operand (x2) depends on sub
i4:  add  x14, x2, x2   # 1st (x2) and 2nd (x2) depend on sub
i5:  sd   x15, 100(x2)  # Base (x2) depends on sub

Solution:


i2 and i3

Question 2: In the case of a load-use data hazard, how does the pipeline stall the instruction using the loaded data?

Solution:


It prevents update of the PC and IF/ID pipeline registers, and sets the control values for EX, MEM and WB to 0 in the ID/EX pipeline register.

Question 3: If branch computation is moved from the EX stage to the ID stage, forwarding paths are required from the EX/MEM and MEM/WB pipeline registers to the branch comparison logic in the ID stage.

Solution:


True

Question 4: Match the following descriptions to the correct terms.

Solution:

Prediction of branches at runtime using runtime information.

dynamic branch prediction

A small memory that is indexed using the address of the branch instruction and that contains bits indicating whether the branch was recently taken or not.
branch prediction buffer

A structure that caches the destination PC or destination instruction for a branch.

branch target buffer

A branch predictor with multiple predictions for each branch and a selection mechanism that chooses which predictor to enable for a given branch.

tournament branch predictor

Question 5: Which of the following events would cause an exception or interrupt in a RISC-V computer system?

Solution:


A request from an I/O device
An undefined instruction
An operating system request from a user program

Question 6: In a static dual-issue processor with 5 pipeline stages, what is the maximum number of instructions that can be in progress at any time?

Solution:


10

Question 7: Loop unrolling is a technique to get more performance from loops that access arrays, in which multiple copies of the loop body are made and instructions from different iterations are scheduled together.

Solution:


True

Question 8: Match the following descriptions to the defined terms.

Solution:


Hardware support for reordering the order of instruction execution so as to avoid stalls.
dynamic scheduling

A situation in pipelined execution when an instruction blocked from executing does not cause the following instructions to wait.
out-of-order execution

A commit in which the results of pipelined execution are written to the programmer visible state in the same order that instructions are fetched.
in-order commit

The buffer that holds results in a dynamically scheduled processor until it is safe to store the results to memory or a register.
reorder buffer

Question 9: Which of the following correctly describes the ARM Cortex-A8 processor?

Solution:


Dynamic multiple-issue, static in-order pipeline scheduling

Question 10: Which of the following correctly describes the Intel Core i7 920 processor?

Solution:


Dynamic multiple-issue, dynamic out-of-order pipeline scheduling

Homework on Textbook Sections 4.1 to 4.6

For Questions 1 to 11: Questions 1 to 11 deal with the processor datapath and control shown in Figure 4.17 of the textbook, executing the following instruction, located at address 0x0000000000204014 in the instruction memory:
sd x7, 0x18(x24)

Question 1: What is the value of the Branch control signal?

Solution:


0

Question 2:  What is the value of the MemRead control signal?

Solution:


0

Question 3: What is the value of the MemtoReg control signal?

Solution:


X

Question 4: What is the value of the ALUOp control signal, expressed in binary?

Solution:


00

Question 5: What is the value of the MemWrite control signal?

Solution:


1

Question 6: What is the value of the ALUSrc control signal?

Solution:


1

Question 7: What is the value of the RegWrite control signal?

Solution:


0

Question 8: What is the output value of the ALU Control block, expressed in binary?

Solution:


0010

Question 9: What is the output value of the ALU source multiplexer, expressed in hex?

Solution:


0X0000000000000018

Question 10: What is the value of the Zero output flag of the ALU?

Solution:

0

Question 11: What is the output value of the branch target adder, expressed in hex?

Solution:


0X0000000000204044

For Questions 12 and 13: Suppose the blocks in Figure 4.17 of the textbook have latencies shown in the following table.

 Block Latency
 PC read 20ps
 PC setup 15ps
 I-Mem 250ps
 Add 70ps
 Shift-left-1
 5ps
 Mux 20ps
 Regs read 150ps
 Regs setup 15ps
 Imm Gen 10ps
 ALU 100ps
 D-Mem 350ps
 Control 80ps
 ALU Ctrl 30ps

Question 12: What is the minimum clock period for this processor (in ps)?

Solution:


905

Question 13: Adding a multiplier to the ALU results in 100ps additional latency. However, it reduces the number of instructions needed in a program, since multiplications no longer need to be emulated in software. What fraction of the original instruction count must the reduced instruction count be to match the performance of the original processor?

Solution:


0.9005

For Questions 14 to 16: A 5-stage pipelined version of the RISC-V processor has the following latencies for the stages, including the overhead for pipeline registers between stages:

 IF ID EX MEM WB
 500ps 300ps 400ps 600ps 100ps

Question 14: What is the maximum clock frequency (in GHz) for this processor?

Solution:


1.67

Question 15: Suppose you can divide the IF stage into two stages, each with latency 300ps, but with a cost increase of 5%. Similarly, you can divide the MEM stage into two stages, each with 350ps latency, but with cost increase of 5%. If you are only allowed to chose one of these options, what would the resulting maximum clock frequency be (in GHz)?

Solution:


2

Question 16: If you are allowed to chose both options from Question 15 based on best cost/performance, decide whether to chose one or both options. What is the maximum clock frequency (in GHz) based on your choice?

Solution:


2.5

Question 20: In Section 4.5 of the textbook, on page 272, it is suggested that the hardware for branch processing (register comparison, target address calculation and PC update) all be included in the second stage of the pipeline (the ID stage). What is the speedup of this hardware change compared to the pipelined datapath using the ALU and adder in the EX stage and completing the branch in the MEM stage? Assume that branches account for 17% of instructions, as suggested on page 270, that no branch prediction is used, that the average CPI for all other instructions is 1.2, and that there is no effect on clock period.

Solution:


1.25

Quiz on Textbook Sections 4.5 and 4.6

Question 2: Suppose a single-cycle datapath has 4 major function units each with a latency of 200ps. A pipelined version of this datapath has one stage for each of the function units, with the same latencies. What is the speed up of the pipelined version compared to the single-cycle datapath?

Solution:


4

Question 3: Match the following descriptions to the defined terms.

Solution:


When a planned instruction cannot execute in the proper clock cycle because the hardware does not support the combination of instructions that are set to execute.

structural hazard

When a planned instruction cannot execute in the proper clock cycle because data that is needed to execute the instruction is not yet available.

data hazard

When the proper instruction cannot execute in the proper pipeline clock cycle because the instruction that was fetched is not the one that is needed; that is, the flow of instruction addresses is not what the pipeline expected.

control hazard

Question 4: For the following RISC-V code sequence:
ld    x7, 0(x3)
addi  x8, x7, 1
the RISC-V pipeline can use forwarding to completely eliminate stall cycles.

Solution:


False

Question 5: How many stall cycles are required the following code sequence executing in the RISC-V pipeline, assuming all required forwarding paths are included?
slli  x6, x10, 3
add   x6, x6, x18
ld    x20, 0(x6)
addi  x28, x20, -1

Solution:


1

Question 6:  Branch prediction reduces the effect of branch hazards by assuming a given outcome for a branch and proceeding from that assumption, rather than waiting to ascertain the actual outcome.

Solution: 


True

Question 7: What is the latency, in clock cycles, for the following instruction in the RISC-V pipeline, assuming no stall cycles?
andi  x30, x8, 0x0ff

Solution:


5

Question 9: Consider execution of a sd instruction in the RISC-V pipeline, with the instruction word being fetched in cycle n, and no stalls. In which cycle is the value of the MemWrite control signal determined, and in which cycle is it used?

Solution:


Determined in cycle n + 1, used in cycle n + 3

Question 10: The pipeline registers in the RISC-V pipeline contain control signal values for use in subsequent pipeline stages.

Solution:


True