Exercise 1.9
1.9.1 Find the percentage of the total dissipated power comprised by static power.
1.9.2 If the total dissipated power is reduced by 10% while maintaining the static-to-total power ratio of problem 1.9.1, how much should the voltage be reduced to maintain the same leakage current?
1.9.3 Determine the ratio of static power to dynamic power for each technology.
1.9.4 Determine the static power for each version at 0.8 V, assuming a static-to-dynamic power ratio of 0.6.
Powerst/Powerdyn = 0.6 => Powerst = 0.6 × Powerdyn
a) Powerst = 0.6 × 35 W = 21 W b) Powerst = 0.6 × 30 W = 18 W
1.9.5 Determine the static power and dynamic power dissipation assuming the rates obtained in problem 1.9.1.
Ilk = Powerst/Voltage
a) Ilk = 21/0.8 = 26.25 A
b) Ilk = 18/0.8 = 22.5 A
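A minimal Python sketch of the 1.9.4 and 1.9.5 arithmetic, assuming the dynamic powers (35 W and 30 W), the 0.6 static-to-dynamic ratio and the 0.8 V supply used above. The function names are illustrative, not taken from the exercise.

```python
def static_power(dynamic_w: float, ratio: float = 0.6) -> float:
    """Static power = ratio x dynamic power (watts)."""
    return ratio * dynamic_w


def leakage_current(static_w: float, voltage_v: float) -> float:
    """Leakage current I = P_static / V (amperes)."""
    return static_w / voltage_v


for label, p_dyn in (("a", 35.0), ("b", 30.0)):
    p_st = static_power(p_dyn)
    i_lk = leakage_current(p_st, 0.8)
    print(f"{label}) P_static = {p_st:.1f} W, I_leak = {i_lk:.2f} A")
```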
Exercise 1.10
1.10.1 The table above shows the number of instructions required per processor to complete a program on a multiprocessor with 1, 2, 4, or 8 processors. What is the total number of instructions executed per processor? What is the aggregate number of instructions executed across all processors?
Instructions per processor = ArithInst + LoadInst + BranchInst
Total instructions = Instructions per processor × processors

Case  Processors  Instructions per processor  Total instructions
a     1           4096                        4096
a     2           2048                        4096
a     4           1024                        4096
a     8            512                        4096
b     1           4096                        4096
b     2           2278                        4556
b     4           1464                        5856
b     8           1132                        9056

1.10.2 Given the CPI values on the right of the table above, find the total execution time for this program on 1, 2, 4, and 8 processors. Assume that each processor has a 2 GHz clock frequency.
1.10.3 If the CPI of arithmetic instructions were doubled, what would the impact be on the execution time of the program on 1, 2, 4, or 8 processors?
1.10.4 Assuming a 3 GHz clock frequency, what is the execution time of the program using 1, 2, 4, or 8 cores?
1.10.5 Assuming that the power consumption of a processor core can be described by the following equation:
Power = (5.0 mA/MHz) × Voltage^2
where the operating voltage of the processor is described by the following equation:
Voltage = (1/5) × Frequency + 0.4
with the frequency measured in GHz. So, at 5 GHz, the voltage would be 1.4 V. Find the power consumption of the program executing on 1, 2, 4, and 8 cores assuming that each core is operating at a 3 GHz clock frequency. Likewise, find the power consumption of the program executing on 1, 2, 4, or 8 cores assuming that each core is operating at 500 MHz.
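As a quick check of the 1.10.1 totals and of the 1.10.5 voltage model, the sketch below multiplies each per-processor count by the processor count and evaluates Voltage = (1/5) × Frequency + 0.4. The dictionary layout and function names are assumptions made for illustration; the power equation itself is not evaluated here.

```python
# Instructions per processor, copied from the 1.10.1 table (cases a and b).
PER_PROCESSOR = {
    "a": {1: 4096, 2: 2048, 4: 1024, 8: 512},
    "b": {1: 4096, 2: 2278, 4: 1464, 8: 1132},
}


def total_instructions(per_processor: int, processors: int) -> int:
    """Aggregate count = instructions per processor x number of processors."""
    return per_processor * processors


def core_voltage(freq_ghz: float) -> float:
    """Operating voltage from the 1.10.5 model, frequency in GHz."""
    return freq_ghz / 5 + 0.4


for case, counts in PER_PROCESSOR.items():
    for procs, per_proc in counts.items():
        print(f"{case}: {procs} processor(s) -> {total_instructions(per_proc, procs)} total")

for f_ghz in (5.0, 3.0, 0.5):
    print(f"{f_ghz} GHz -> {core_voltage(f_ghz):.2f} V")
```

The 5 GHz line reproduces the 1.4 V stated in the exercise.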
Exercise 1.11
1.11.1 Find the yield.
Wafer area = π × (d/2)^2
a. Wafer area = π × 7.5^2 = 176.7 cm^2 b. Wafer area = π × 12.5^2 = 490.9 cm^2
Die area = wafer area/dies per wafer
a. Die area = 176.7/90 = 1.96 cm^2 b. Die area = 490.9/140 = 3.51 cm^2
Yield = 1/(1 + (defects per area × die area)/2)^2
a. Yield = 0.97 b. Yield = 0.92
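The yield calculation above can be sketched in Python as follows. The wafer diameters (15 cm and 25 cm) follow from the radii 7.5 cm and 12.5 cm, and the defect densities (0.018 and 0.024 defects/cm^2) are the base values that appear in 1.11.3; the function names are illustrative.

```python
import math


def wafer_area(diameter_cm: float) -> float:
    """Wafer area = pi * (d/2)^2, in cm^2."""
    return math.pi * (diameter_cm / 2) ** 2


def die_yield(defects_per_cm2: float, die_area_cm2: float) -> float:
    """Yield = 1 / (1 + defects per area * die area / 2)^2."""
    return 1.0 / (1.0 + defects_per_cm2 * die_area_cm2 / 2) ** 2


# (wafer diameter in cm, dies per wafer, defects per cm^2) for cases a and b
CASES = {"a": (15.0, 90, 0.018), "b": (25.0, 140, 0.024)}

for label, (diameter, dies, defects) in CASES.items():
    die_area = wafer_area(diameter) / dies
    print(f"{label}: die area = {die_area:.2f} cm^2, yield = {die_yield(defects, die_area):.2f}")
```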
1.11.2 Find the cost per die. Cost per die = cost per wafer/ (dies per wafer × yield)
a. Cost per die = 0.12 b. Cost per die = 0.16
1.11.3 If the number of dies per wafer is increased by 10% and the defects per area unit increases by 15%, find the die area and yield.
a. Dies per wafer = 1.1 × 90 = 99
Defects per area = 1.15 × 0.018 = 0.021 defects/cm^2
Die area = wafer area/Dies per wafer = 176.7/99 = 1.78 cm^2 Yield = 0.96
b. Dies per wafer = 1.1 × 140 = 154
Defects per area = 1.15 × 0.024 = 0.028 defects/cm^2
Die area = wafer area/Dies per wafer = 490.9/154 = 3.19 cm^2 Yield = 0.92
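The same yield formula applied to the scaled parameters of 1.11.3 (10% more dies per wafer, 15% higher defect density) is sketched below. It is a minimal check that reuses the 1.11.1 base values; the names are illustrative.

```python
import math


def die_yield(defects_per_cm2: float, die_area_cm2: float) -> float:
    """Yield = 1 / (1 + defects per area * die area / 2)^2."""
    return 1.0 / (1.0 + defects_per_cm2 * die_area_cm2 / 2) ** 2


def scaled_case(diameter_cm: float, dies: int, defects_per_cm2: float):
    """Return (die area, yield) after +10% dies per wafer and +15% defect density."""
    new_dies = round(1.10 * dies)
    new_defects = 1.15 * defects_per_cm2
    die_area = math.pi * (diameter_cm / 2) ** 2 / new_dies
    return die_area, die_yield(new_defects, die_area)


for label, params in {"a": (15.0, 90, 0.018), "b": (25.0, 140, 0.024)}.items():
    area, y = scaled_case(*params)
    print(f"{label}: die area = {area:.2f} cm^2, yield = {y:.2f}")
```

Because the defect density grows by 15% while the die area shrinks by only about 9%, the product of the two rises and the yield drops slightly in both cases.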
1.11.4 Find the defects per area unit for each technology given a die area of 200 mm^2.
Yield = 1/(1 + (defects per area × die area)/2)^2
Defects per area = (2/die area) × (Yield^(-1/2) - 1)
Replacing the values for T1 through T4, we get:
T1: defects per area = 0.00085 defects/mm^2 = 0.085 defects/cm^2
T2: defects per area = 0.00060 defects/mm^2 = 0.060 defects/cm^2
T3: defects per area = 0.00043 defects/mm^2 = 0.043 defects/cm^2
T4: defects per area = 0.00026 defects/mm^2 = 0.026 defects/cm^2
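Inverting the yield model gives the defects-per-area expression used above. Since the per-technology yields are not reproduced in this excerpt, the sketch below only round-trips the listed densities through the model at 200 mm^2 to show that the inversion is consistent; the names are illustrative.

```python
def yield_from_defects(defects_per_mm2: float, die_area_mm2: float) -> float:
    """Yield = 1 / (1 + D * A / 2)^2."""
    return 1.0 / (1.0 + defects_per_mm2 * die_area_mm2 / 2) ** 2


def defects_from_yield(yield_: float, die_area_mm2: float) -> float:
    """D = (2 / A) * (Yield^(-1/2) - 1), the inverse of yield_from_defects."""
    return (2.0 / die_area_mm2) * (yield_ ** -0.5 - 1.0)


DIE_AREA_MM2 = 200.0
for name, d in (("T1", 0.00085), ("T2", 0.00060), ("T3", 0.00043), ("T4", 0.00026)):
    y = yield_from_defects(d, DIE_AREA_MM2)
    d_back = defects_from_yield(y, DIE_AREA_MM2)
    # 1 cm^2 = 100 mm^2, so defects/cm^2 = 100 * defects/mm^2
    print(f"{name}: yield = {y:.3f}, D = {d_back:.5f}/mm^2 = {d_back * 100:.3f}/cm^2")
```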
Exercise 1.12
1.12.1 Find the CPI if the clock cycle time is 0.333 ns. CPI = clock rate × CPU time/instr. count
Clock rate = 1/cycle time = 3 GHz
a. CPI (perl) = 3 × 10^9 × 500/(2118 × 10^9) = 0.7 b. CPI (mcf) = 3 × 10^9 × 1200/(336 × 10^9) = 10.7
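A minimal Python version of the CPI calculation, assuming the CPU times and instruction counts used above (500 s and 2118 × 10^9 instructions for perl, 1200 s and 336 × 10^9 for mcf):

```python
def cpi(clock_hz: float, cpu_time_s: float, instructions: float) -> float:
    """CPI = clock rate x CPU time / instruction count."""
    return clock_hz * cpu_time_s / instructions


CLOCK_HZ = 3e9  # 1 / 0.333 ns, rounded to 3 GHz as in the text

for name, time_s, instr in (("perl", 500, 2118e9), ("mcf", 1200, 336e9)):
    print(f"CPI ({name}) = {cpi(CLOCK_HZ, time_s, instr):.2f}")
```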
1.12.2 Find the SPEC ratio.
SPECratio = ref. time/execution time. a. SPECratio (perl) = 9770/500 = 19.54 b. SPECratio (mcf) = 9120/1200 = 7.6
1.12.3 For these two benchmarks, find the geometric mean.
(19.54 × 7.6)^(1/2) = 12.19
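The SPECratio and geometric-mean steps can be sketched as below, using the reference and execution times quoted above. `math.prod` requires Python 3.8 or later; the helper names are illustrative.

```python
import math


def spec_ratio(reference_time_s: float, execution_time_s: float) -> float:
    """SPECratio = reference time / execution time."""
    return reference_time_s / execution_time_s


def geometric_mean(values) -> float:
    """n-th root of the product of n values."""
    return math.prod(values) ** (1 / len(values))


ratios = [spec_ratio(9770, 500), spec_ratio(9120, 1200)]  # perl, mcf
print([round(r, 2) for r in ratios], round(geometric_mean(ratios), 2))
```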
1.12.4 Find the increase in CPU time if the number of instructions of the benchmark is increased by 10% without affecting the CPI.
CPU time = number of instructions × CPI/clock rate
If CPI and clock rate do not change, then the CPU time increases in proportion to the number of instructions, i.e., by 10%.
1.12.5 Find the increase in CPU time if the number of instructions of the benchmark is increased by 10% and the CPI is increased by 5%.
CPU time (before) = number of instructions × CPI/clock rate
CPU time (after) = 1.1 × number of instructions × 1.05 × CPI/clock rate
CPU time (after)/CPU time (before) = 1.1 × 1.05 = 1.155, so CPU time is increased by 15.5%.
1.12.6 Find the change in the SPECratio for the change described in 1.12.5. SPECratio = reference time/CPU time
SPECratio (after)/SPECratio (before) = CPU time (before)/CPU time (after) = 1/1.155 = 0.866. The SPECratio decreases by about 13.4%.
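Because CPU time is the product of instruction count and CPI divided by clock rate, the 1.12.4 to 1.12.6 results depend only on the scale factors. A small sketch with a normalized baseline (an illustrative choice; any positive baseline gives the same ratios):

```python
def cpu_time(instructions: float, cpi: float, clock_hz: float) -> float:
    """CPU time = instructions x CPI / clock rate."""
    return instructions * cpi / clock_hz


# Normalized baseline: the absolute values cancel in the ratios below.
base = cpu_time(1.0, 1.0, 1.0)
only_instr = cpu_time(1.10, 1.0, 1.0)      # 1.12.4: +10% instructions
instr_and_cpi = cpu_time(1.10, 1.05, 1.0)  # 1.12.5: +10% instructions, +5% CPI

time_ratio = instr_and_cpi / base
spec_change = base / instr_and_cpi  # 1.12.6: SPECratio is inversely proportional to CPU time
print(f"{only_instr / base:.3f} {time_ratio:.3f} {spec_change:.3f}")
```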
Exercise 1.13
1.13.1 Find the new CPI. CPI = (CPU time × clock rate)/Number of instr.
a. CPI = 450 × 4 × 10^9/(0.85 × 2118 × 10^9) = 1.00 b. CPI = 1150 × 4 × 10^9/(0.85 × 336 × 10^9) = 16.11
1.13.2 In general, these CPI values are larger than those obtained in previous exercises for the same benchmarks. This is due mainly to the clock rates used in the two cases, 3 GHz and 4 GHz. Determine whether the increase in the CPI is similar to that of the clock rate. If they are dissimilar, why?
Clock rate ratio = 4 GHz/3 GHz = 1.33
a. CPI at 4 GHz = 1.00, CPI at 3 GHz = 0.71, ratio = 1.41 b. CPI at 4 GHz = 16.1, CPI at 3 GHz = 10.7, ratio = 1.50
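They are different because the CPU time has been reduced by a smaller percentage (10% for a, 4.2% for b) than the number of instructions (15%). Since CPI = CPU time × clock rate/number of instructions, the CPI ratio equals CPU time ratio × clock rate ratio/instruction ratio, e.g., 0.90 × 1.33/0.85 = 1.41 for a.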
1.13.3 How much has the CPU time been reduced? a. 450/500 = 0.90
CPU time reduction = 10%
b. 1150/1200 = 0.958
CPU time reduction = 4.2%
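The 1.13.1 to 1.13.3 numbers can be reproduced with the sketch below, which assumes the CPU times and instruction counts quoted above and a 15% instruction reduction at 4 GHz. It also makes the 1.13.2 argument concrete: the CPI ratio is the CPU time ratio times the clock ratio divided by the instruction ratio, not the clock ratio alone. Names are illustrative.

```python
def cpi(cpu_time_s: float, clock_hz: float, instructions: float) -> float:
    """CPI = CPU time x clock rate / instruction count."""
    return cpu_time_s * clock_hz / instructions


def cpi_growth(t_new: float, t_old: float, instr_old: float) -> float:
    """New CPI (4 GHz, 85% of the instructions) divided by old CPI (3 GHz)."""
    return cpi(t_new, 4e9, 0.85 * instr_old) / cpi(t_old, 3e9, instr_old)


# (new CPU time, old CPU time, old instruction count)
BENCHMARKS = {"a": (450, 500, 2118e9), "b": (1150, 1200, 336e9)}

for name, (t_new, t_old, n_old) in BENCHMARKS.items():
    new_cpi = cpi(t_new, 4e9, 0.85 * n_old)
    reduction = 1 - t_new / t_old
    print(f"{name}: CPI = {new_cpi:.2f}, CPI ratio = {cpi_growth(t_new, t_old, n_old):.2f}, "
          f"CPU time reduced by {reduction:.1%}")
```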
1.13.4 If the execution time is reduced by an additional 10% without affecting the CPI and with a clock rate of 4 GHz, determine the number of instructions.
Number of instructions = CPU time × clock rate/CPI
a. Number of instructions = 820 × 0.9 × 4 × 10^9/0.96 = 3075 × 10^9 b. Number of instructions = 580 × 0.9 × 4 × 10^9/2.94 = 710 × 10^9
1.13.5 Determine the clock rate required to give a further 10% reduction in CPU time while maintaining the number of instructions and CPI unchanged.
Clock rate = Number of instructions × CPI/CPU time
Clock rate (new) = Number of instructions × CPI/(0.9 × CPU time) = clock rate (old)/0.9 = 3 GHz/0.9 = 3.33 GHz
1.13.6 Determine the clock rate if the CPI is reduced by 15% and the CPU time by 20% while the number of instructions is unchanged. Clock rate = Number of instructions × CPI/CPU time.
Clock rate (new) = Number of instructions × 0.85 × CPI/(0.80 × CPU time) = (0.85/0.80) × clock rate (old) = 1.0625 × 3 GHz = 3.19 GHz
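The 1.13.4 to 1.13.6 rearrangements of the CPU time equation can be sketched as below. The 820 s, 580 s, 0.96 and 2.94 values are the ones used in the 1.13.4 answer, and the 3 GHz baseline clock is the one the 1.13.5 and 1.13.6 answers assume; the baseline instruction count and CPI are normalized to 1 because they cancel.

```python
def instruction_count(cpu_time_s: float, clock_hz: float, cpi: float) -> float:
    """Instructions = CPU time x clock rate / CPI."""
    return cpu_time_s * clock_hz / cpi


def clock_rate(instructions: float, cpi: float, cpu_time_s: float) -> float:
    """Clock rate = instructions x CPI / CPU time."""
    return instructions * cpi / cpu_time_s


# 1.13.4: execution time cut by a further 10% at 4 GHz with unchanged CPI
n_a = instruction_count(820 * 0.9, 4e9, 0.96)
n_b = instruction_count(580 * 0.9, 4e9, 2.94)

# 1.13.5 and 1.13.6: normalized instruction count and CPI, 3 GHz baseline
N, CPI, OLD_CLOCK = 1.0, 1.0, 3e9
t_old = N * CPI / OLD_CLOCK
clock_1135 = clock_rate(N, CPI, 0.9 * t_old)          # 10% less CPU time
clock_1136 = clock_rate(N, 0.85 * CPI, 0.80 * t_old)  # CPI -15%, CPU time -20%

print(f"{n_a / 1e9:.0f}e9 {n_b / 1e9:.0f}e9 {clock_1135 / 1e9:.2f} GHz {clock_1136 / 1e9:.4f} GHz")
```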
Exercise 1.14
1.14.1 One usual fallacy is to consider the computer with the largest clock rate as having the highest performance. Check if this is true for P1 and P2.
Number of instructions = 10^6
Tcpu (P1) = 10^6 × 1.25/(4 × 10^9) = 0.3125 × 10^-3 s Tcpu (P2) = 10^6 × 0.75/(3 × 10^9) = 0.25 × 10^-3 s
Clock rate (P1) > clock rate (P2), but performance (P1) < performance (P2)
1.14.2 Another fallacy is to consider that the processor executing the largest number of instructions will need a larger CPU time. Considering that processor P1 is executing a sequence of 10^6 instructions and that the CPI of processors P1 and P2 do not change, determine the number of instructions that P2 can execute in the same time that P1 needs to execute 10^6 instructions.
P1: 10^6 instructions, Tcpu (P1) = 0.3125 × 10^-3 s
P2: Tcpu (P2) = N × 0.75/(3 × 10^9) = Tcpu (P1)
N = 0.3125 × 10^-3 × 3 × 10^9/0.75 = 1.25 × 10^6
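A minimal sketch of 1.14.1 and 1.14.2, assuming P1 has CPI 1.25 at 4 GHz and P2 has CPI 0.75 at 3 GHz, as used above:

```python
def cpu_time(instructions: float, cpi: float, clock_hz: float) -> float:
    """CPU time = instructions x CPI / clock rate."""
    return instructions * cpi / clock_hz


N = 1e6
t_p1 = cpu_time(N, 1.25, 4e9)  # P1: CPI 1.25 at 4 GHz
t_p2 = cpu_time(N, 0.75, 3e9)  # P2: CPI 0.75 at 3 GHz

# 1.14.2: instructions P2 completes in t_p1, solving t_p1 = N2 * CPI / clock for N2
n_p2 = t_p1 * 3e9 / 0.75
print(t_p1, t_p2, n_p2)
```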
1.14.3 A common fallacy is to use MIPS (millions of instructions per second) to compare the performance of two different processors, and consider that the processor with the largest MIPS has the largest performance. Check if this is true for P1 and P2.
MIPS = Clock rate × 10^-6/CPI
MIPS (P1) = 4 × 10^9 × 10^-6/1.25 = 3200 MIPS (P2) = 3 × 10^9 × 10^-6/0.75 = 4000
MIPS (P1) < MIPS (P2) and performance (P1) < performance (P2), so in this case the processor with the larger MIPS figure does have the higher performance.
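The MIPS comparison can be sketched as below, with the same assumed clock rates and CPIs. For a fixed instruction count, CPU time = instructions/(MIPS × 10^6), so here a higher MIPS figure necessarily means a shorter time.

```python
def mips(clock_hz: float, cpi: float) -> float:
    """MIPS = clock rate x 10^-6 / CPI."""
    return clock_hz * 1e-6 / cpi


def cpu_time(instructions: float, cpi: float, clock_hz: float) -> float:
    """CPU time = instructions x CPI / clock rate."""
    return instructions * cpi / clock_hz


PROCESSORS = {"P1": {"clock": 4e9, "cpi": 1.25}, "P2": {"clock": 3e9, "cpi": 0.75}}
for name, p in PROCESSORS.items():
    print(name, mips(p["clock"], p["cpi"]), cpu_time(1e6, p["cpi"], p["clock"]))
```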
Another common performance figure is MFLOPS (millions of floating-point operations per second), defined as
MFLOPS = Number of FP operations/(execution time × 10^6)
1.14.4 Find the MFLOPS figures for the programs.
a. FP op = 10^6 × 0.4 = 4 × 10^5, clock cycles (FP) = CPI × Number of FP instr. = 4 × 10^5
Tfp = 4 × 10^5 × 0.33 × 10^-9 = 1.32 × 10^-4 s, then MFLOPS = 3.03 × 10^3
b. FP op = 3 × 10^6 × 0.4 = 1.2 × 10^6, clock cycles (FP) = CPI × Number of FP instr. = 0.70 × 1.2 × 10^6 = 0.84 × 10^6
Tfp = 0.84 × 10^6 × 0.33 × 10^-9 = 2.77 × 10^-4 s, then MFLOPS = 4.33 × 10^3
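A sketch of the MFLOPS calculation, assuming the 0.33 ns cycle time, the 40% FP fraction and the FP CPIs implied above (1.0 for program a, since its FP cycles equal its FP instruction count, and 0.70 for program b). Like the worked answer, it divides by the FP execution time; names are illustrative.

```python
CYCLE_S = 0.33e-9  # cycle time used in the solutions (about 3 GHz)


def mflops(fp_ops: float, exec_time_s: float) -> float:
    """MFLOPS = FP operations / (execution time x 10^6)."""
    return fp_ops / (exec_time_s * 1e6)


def fp_time(fp_instr: float, cpi_fp: float) -> float:
    """Time spent on FP instructions = FP cycles x cycle time."""
    return fp_instr * cpi_fp * CYCLE_S


fp_a = 1e6 * 0.4  # program a: 10^6 instructions, 40% FP
fp_b = 3e6 * 0.4  # program b: 3 x 10^6 instructions, 40% FP
mflops_a = mflops(fp_a, fp_time(fp_a, 1.0))
mflops_b = mflops(fp_b, fp_time(fp_b, 0.70))
print(f"a: {mflops_a:.0f} MFLOPS, b: {mflops_b:.0f} MFLOPS")
```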
1.14.5 Find the MIPS figures for the programs.
CPU clock cycles = FP cycles + CPI (L/S) × Number of instr. (L/S) + CPI (Branch) × Number of instr. (Branch)
a. 5 × 10^5 L/S instr., 4 × 10^5 FP instr. and 10^5 Branch instr.
CPU clock cycles = 4 × 10^5 + 0.75 × 5 × 10^5 + 1.5 × 10^5 = 9.25 × 10^5; Tcpu = 9.25 × 10^5 × 0.33 × 10^-9 = 3.05 × 10^-4 s; MIPS = 10^6/(3.05 × 10^-4 × 10^6) = 3.28 × 10^3
b. 1.2 × 10^6 L/S instr., 1.2 × 10^6 FP instr. and 0.6 × 10^6 Branch instr.
CPU clock cycles = 0.84 × 10^6 + 1.25 × 1.2 × 10^6 + 1.25 × 0.6 × 10^6 = 3.09 × 10^6; Tcpu = 3.09 × 10^6 × 0.33 × 10^-9 = 1.02 × 10^-3 s; MIPS = 3 × 10^6/(1.02 × 10^-3 × 10^6) = 2.94 × 10^3
1.14.6 Find the performance for the programs and compare with MIPS and MFLOPS.
a. Performance = 1/Tcpu = 3.28 × 10^3
b. Performance = 1/Tcpu = 9.8 × 10^2
The first program has the higher performance and the higher MIPS figure, but the second program has the higher MFLOPS figure.
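The 1.14.5 and 1.14.6 figures can be reproduced with the sketch below. The instruction mixes and CPIs are read off the 1.14.5 equations (FP CPI 1.0 for program a, as implied in 1.14.4), with the same 0.33 ns cycle time; the data layout is an illustrative choice.

```python
CYCLE_S = 0.33e-9  # cycle time used in the solutions


def cpu_cycles(mix) -> float:
    """Sum of count x CPI over the instruction classes."""
    return sum(count * cpi for count, cpi in mix)


# (total instructions, [(count, CPI) for FP, L/S, branch]) as listed in 1.14.5
PROGRAMS = {
    "a": (1e6, [(4e5, 1.0), (5e5, 0.75), (1e5, 1.5)]),
    "b": (3e6, [(1.2e6, 0.70), (1.2e6, 1.25), (0.6e6, 1.25)]),
}

results = {}
for name, (instructions, mix) in PROGRAMS.items():
    t_cpu = cpu_cycles(mix) * CYCLE_S
    results[name] = {
        "mips": instructions / (t_cpu * 1e6),
        "performance": 1 / t_cpu,
    }
    print(name, {k: round(v) for k, v in results[name].items()})
```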
Exercise 1.15
1.15.1 By how much is the total time reduced if the time for FP operations is reduced by 20%?
a. Tfp = 35 × 0.8 = 28 s; Tp1 = 28 + 85 + 50 + 30 = 193 s (from 200 s); reduction of 3.5%
b. Tfp = 50 × 0.8 = 40 s; Tp4 = 40 + 80 + 50 + 30 = 200 s (from 210 s); reduction of 4.8%
1.15.3 Can the total time be reduced by 20% by reducing only the time for branch instructions?
a. Tp1 = 200 × 0.8 = 160 s; Tfp + Tint + Tl/s = 170 s. No, the total time cannot be reduced by 20%: even with zero branch time, the other classes take 170 s > 160 s.
b. Tp4 = 210 × 0.8 = 168 s; Tfp + Tint + Tl/s = 180 s. No, the total time cannot be reduced by 20%: even with zero branch time, the other classes take 180 s > 168 s.
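The 1.15.1 and 1.15.3 reasoning can be sketched as below, assuming the per-class times (in seconds) that appear in the answers above; the dictionary keys and function names are illustrative.

```python
def fp_reduction(times: dict, fp_factor: float = 0.8) -> float:
    """Fractional drop in total time when only the FP time is scaled by fp_factor."""
    before = sum(times.values())
    after = before - times["fp"] * (1 - fp_factor)
    return 1 - after / before


def branch_only_can_cut(times: dict, target_cut: float = 0.2) -> bool:
    """True if reducing branch time alone could reach the target cut.

    Even with branch time driven to zero, the other classes must fit in the target.
    """
    total = sum(times.values())
    return total - times["branch"] <= (1 - target_cut) * total


# Per-class times in seconds, taken from the 1.15.1 and 1.15.3 answers
CASES = {
    "a": {"fp": 35, "int": 85, "ls": 50, "branch": 30},
    "b": {"fp": 50, "int": 80, "ls": 50, "branch": 30},
}
for name, t in CASES.items():
    print(f"{name}: FP -20% -> {fp_reduction(t):.1%} less time; "
          f"20% via branches alone possible: {branch_only_can_cut(t)}")
```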
Assume that each processor has a 2 GHz clock rate.
1.15.4 By how much must we improve the CPI of FP instructions if we want the program to run two times faster?
Clock cycles = CPIfp × Number of FP instr. + CPIint × Number of INT instr. + CPIl/s × Number of L/S instr. + CPIbranch × Number of branch instr.
Tcpu = clock cycles/clock rate = clock cycles/(2 × 10^9)
a. 1 processor => clock cycles = 8192 × 10^6; Tcpu = 4.096 s b. 8 processors => clock cycles = 1024 × 10^6; Tcpu = 0.512 s
To halve the number of clock cycles by improving the CPI of FP instructions:
CPIimproved fp × Number of FP instr. + CPIint × Number of INT instr. + CPIl/s × Number of L/S instr. + CPIbranch × Number of branch instr. = clock cycles/2
CPIimproved fp = (clock cycles/2 - (CPIint × Number of INT instr. + CPIl/s × Number of L/S instr. + CPIbranch × Number of branch instr.))/Number of FP instr.
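a. 1 processor: CPIimproved fp = (4096 – 7632)/560 < 0 => not possible b. 8 processors: CPIimproved fp = (512 – 944)/80 < 0 => not possible
In both cases the non-FP cycles alone exceed half of the original cycle count, so no FP CPI can make the program run two times faster.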
1.15.5 By how much must we improve the CPI of L/S instructions if we want the program to run two times faster? Using the clock cycles from 1.15.4:
To halve the number of clock cycles by improving the CPI of L/S instructions:
CPIfp × Number of FP instr. + CPIint × Number of INT instr. + CPIimproved l/s × Number of L/S instr. + CPIbranch × Number of branch instr. = clock cycles/2
CPIimproved l/s = (clock cycles/2 - (CPIfp × Number of FP instr. + CPIint × Number of INT instr. + CPIbranch × Number of branch instr.))/Number of L/S instr.
a. 1 processor: CPIimproved l/s = (4096 – 3072)/1280 = 0.8 b. 8 processors: CPIimproved l/s = (512 – 384)/160 = 0.8
The CPI of L/S instructions must therefore drop from 4 to 0.8, a 5× improvement.
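The 1.15.4 and 1.15.5 calculations both solve for the CPI of one instruction class that halves the total cycle count. A minimal sketch using the cycle totals quoted above (the units, millions of cycles, cancel in the division); the function returns None when no positive CPI can reach the target. The function name is illustrative.

```python
from typing import Optional


def required_cpi(total_cycles: float, other_cycles: float, class_count: float) -> Optional[float]:
    """CPI for one class that makes the new total equal half of total_cycles.

    other_cycles: cycles spent in all other classes (left unchanged).
    class_count: number of instructions in the class being improved.
    Returns None when the other classes alone already use up the target.
    """
    budget = total_cycles / 2 - other_cycles
    return budget / class_count if budget > 0 else None


# (total cycles, cycles outside the class, instructions in the class)
print("FP,  1 proc :", required_cpi(8192, 7632, 560))
print("FP,  8 procs:", required_cpi(1024, 944, 80))
print("L/S, 1 proc :", required_cpi(8192, 3072, 1280))
print("L/S, 8 procs:", required_cpi(1024, 384, 160))
```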
1.15.6 By how much is the execution time of the program improved if the CPI of INT and FP instructions is reduced by 40% and the CPI of L/S and branch instructions is reduced by 30%?
Clock cycles = CPIfp × Number of FP instr. + CPIint × Number of INT instr. + CPIl/s × Number of L/S instr. + CPIbranch × Number of branch instr.
Tcpu = clock cycles/clock rate = clock cycles/(2 × 10^9)
CPIint = 0.6 × 1 = 0.6; CPIfp = 0.6 × 1 = 0.6; CPIl/s = 0.7 × 4 = 2.8; CPIbranch = 0.7 × 2 = 1.4
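a. 1 processor => Tcpu (before improvement) = 4.096 s; Tcpu (after improvement) = 2.739 s b. 8 processors => Tcpu (before improvement) = 0.512 s; Tcpu (after improvement) = 0.342 s
In both cases the execution time drops by about 33%, i.e., the program runs about 1.5 times faster.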
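As a sanity check on 1.15.6: every CPI shrinks to between 60% and 70% of its old value, so the new execution time must fall between 0.6 and 0.7 of the old one. The sketch below applies per-class CPI factors to a cycle breakdown and checks the reported times against that band. The equal-cycle breakdown in the last line is hypothetical, chosen only to exercise the function.

```python
# CPI scale factors from 1.15.6: INT and FP -40%, L/S and branch -30%
SCALE = {"fp": 0.6, "int": 0.6, "ls": 0.7, "branch": 0.7}


def scaled_cycles(cycles_by_class: dict, scale: dict = SCALE) -> float:
    """New cycle count when each class's CPI (hence its cycles) is multiplied by its factor."""
    return sum(cycles * scale[cls] for cls, cycles in cycles_by_class.items())


def speedup(t_before: float, t_after: float) -> float:
    """Speedup = old execution time / new execution time."""
    return t_before / t_after


# Times reported in the 1.15.6 answer (seconds)
REPORTED = {"1 processor": (4.096, 2.739), "8 processors": (0.512, 0.342)}
for label, (before, after) in REPORTED.items():
    in_band = 0.6 * before <= after <= 0.7 * before
    print(f"{label}: speedup {speedup(before, after):.2f}x, "
          f"time reduced {1 - after / before:.0%}, within 0.6-0.7 band: {in_band}")

# Hypothetical breakdown with equal cycles per class, only to exercise scaled_cycles
print(scaled_cycles({"fp": 100, "int": 100, "ls": 100, "branch": 100}))
```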