
Presentation: "TECH Computer Science". File: "TECH Computer Science.ppt". Zip archive size: 9495 KB.

Contents of the presentation "TECH Computer Science.ppt", slide by slide:
1 TECH Computer Science

Superscalar Processors

7.1 Introduction
7.2 Parallel decoding
7.3 Superscalar instruction issue
7.4 Shelving
7.5 Register renaming
7.6 Parallel execution
7.7 Preserving the sequential consistency of instruction execution
7.8 Preserving the sequential consistency of exception processing
7.9 Implementation of superscalar CISC processors using a superscalar RISC core
7.10 Case studies of superscalar processors

CH01

2 Superscalar Processors vs. VLIW

3 Superscalar Processor: Intro

Parallel issue, parallel execution {hardware}, dynamic instruction scheduling. Currently the predominant class of processors: Pentium, PowerPC, UltraSparc, AMD K5, HP PA7100, DEC ?

4 Emergence and spread of superscalar processors

5 Evolution of superscalar processors

6 Specific tasks of superscalar processing

7 Parallel decoding {and dependency checking}

What needs to be done.

8 Decoding and Pre-decoding

Superscalar processors tend to use 2, and sometimes even 3 or more, pipeline cycles for decoding and issuing instructions. Pre-decoding shifts a part of the decode task up into the cache-loading phase. The results of pre-decoding are: the instruction class; the type of resources required for execution; and in some processors (e.g. UltraSparc) branch target address calculation as well. These results are stored by attaching 4-7 extra bits to each instruction. This shortens the overall cycle time or reduces the number of cycles needed.

9 The principle of predecoding

10 Number of predecode bits used

11 Specific tasks of superscalar processing: Issue

12 7.3 Superscalar instruction issue

How and when to send the instruction(s) to the EU(s).

13 Issue policies

14 Instruction issue policies of superscalar processors: performance, trend

15 Issue rate {how many instructions/cycle}

CISC: about 2. RISC:

16 Issue policies: Handling Issue Blockages

17 Issue stopped by True dependency

True dependency? (Blocked: need to wait.)

18 Issue order of instructions

19 Aligned vs. unaligned issue

20 Issue policies: Use of Shelving

21 Direct Issue

22 The principle of shelving: Indirect Issue

23 Design space of shelving

24 Scope of shelving

25 Layout of shelving buffers

26 Implementation of shelving buffers

27 Basic variants of shelving buffers

28 Using a combined buffer for shelving, renaming, and reordering

29 Number of shelving buffer entries

30 Number of read and write ports

How many instructions may be written into (input ports) or read out of (output ports) a particular shelving buffer in a cycle depends on whether individual, group, or central reservation stations are used.

31 Shelving: Operand fetch policy

32 7.4.4 Operand fetch policies

33 Operand fetch during instruction issue

34 Operand fetch during instruction dispatch

35 Shelving: Instruction dispatch scheme

36 7.4.5 Instruction dispatch scheme

37 Dispatch policy

Selection rule: specifies when instructions are considered executable. E.g. the dataflow principle of operation: those instructions whose operands are available are executable.
Arbitration rule: needed when more instructions are eligible for execution than can be dispatched. E.g. choose the 'oldest' instruction.
Dispatch order: determines whether a non-executable instruction prevents all subsequent instructions from being dispatched.

38 Dispatch policy: Dispatch order

39 Trend of dispatch order

40 Dispatch rate (instructions/cycle)

41 Maximum issue rate <= maximum dispatch rate; the issue rate reaches its maximum more often than the dispatch rate does

42 Scheme for checking the availability of operands: the principle of scoreboarding

43 Schemes for checking the availability of operands

44 Operands fetched during dispatch or during issue

45 Use of multiple buses for updating multiple individual reservation stations

46 Internal data paths of the PowerPC 604

47 Treatment of an empty reservation station

48 7.4.6 Detailed example of shelving

Issuing the following instructions:
cycle i: mul r1, r2, r3
cycle i+1: ad r2, r3, r5; ad r3, r4, r6
format: Rs1, Rs2, Rd

49 Example overview

50 Cycle i: Issue of the 'mul' instruction into the reservation station and fetching of the corresponding operands

51 Cycle i+1: Checking for executable instructions and dispatching of the 'mul' instruction

52 Cycle i+1 (2nd phase): Issue of the subsequent two 'ad' instructions into the reservation station

53 Cycle i+2: Checking for executable instructions ('mul' not yet completed)

54 Cycle i+3: Updating the FX register file with the result of the 'mul' instruction

55 Cycle i+3 (2nd phase): Checking for executable instructions and dispatching the 'older' 'ad' instruction
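The cycle-by-cycle walkthrough above can be mimicked with a small simulation. This is a sketch under invented simplifying assumptions: a single common reservation station, operand fetch during issue, one dispatch per cycle with dataflow selection and "oldest first" arbitration, dispatch and completion collapsed into one step (so the two 'ad' instructions are issued before the 'mul' result is broadcast, to show the wait-and-forward behaviour), and made-up initial register values.

```python
# Minimal shelving (reservation station) simulation of the example:
#   cycle i   : mul r1, r2, r3      (format: Rs1, Rs2, Rd)
#   cycle i+1 : ad r2, r3, r5 ; ad r3, r4, r6
# Assumptions invented for the sketch: single common reservation
# station, operand fetch during issue, one dispatch per cycle,
# dispatch/completion collapsed into one step.

regs = {"r%d" % i: i + 10 for i in range(1, 7)}  # r1=11 ... r6=16 (invented)
valid = {r: True for r in regs}                  # scoreboard valid bits
station = []                                     # shelved instructions
OPS = {"mul": lambda a, b: a * b, "ad": lambda a, b: a + b}

def issue(op, s1, s2, d):
    """Shelve an instruction: fetch operands that are valid now,
    otherwise record which register is still being computed."""
    fetch = lambda r: ("val", regs[r]) if valid[r] else ("wait", r)
    station.append({"op": op, "a": fetch(s1), "b": fetch(s2), "d": d})
    valid[d] = False                             # result of d is pending

def dispatch():
    """Dispatch the oldest executable entry (dataflow selection,
    oldest-first arbitration) and broadcast its result."""
    for e in station:
        if e["a"][0] == "val" and e["b"][0] == "val":
            station.remove(e)
            res = OPS[e["op"]](e["a"][1], e["b"][1])
            regs[e["d"]], valid[e["d"]] = res, True
            for w in station:                    # forward to waiting entries
                for k in ("a", "b"):
                    if w[k] == ("wait", e["d"]):
                        w[k] = ("val", res)
            return e["d"], res
    return None                                  # everything blocked

issue("mul", "r1", "r2", "r3")   # cycle i
issue("ad", "r2", "r3", "r5")    # must wait for r3 (true dependency)
issue("ad", "r3", "r4", "r6")    # must wait for r3 as well
results = [dispatch() for _ in range(3)]
```

Both 'ad' instructions shelve with a "wait on r3" marker and become executable only once the 'mul' result is broadcast, which is exactly the blocking-then-dispatching pattern of slides 50-55.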

56 Instruction issue policies: Register Renaming

57 Register Renaming and dependency

Three-operand instruction format, e.g. Rd, Rs1, Rs2.
False dependency (WAW): mul r2, …, … / add r2, …, …. Two different rename buffers have to be allocated.
True data dependency (RAW): mul r2, …, … / ad …, r2, …. Renamed to e.g. mul p12, …, … / ad …, p12, ….

58 Chronology of the introduction of renaming (high complexity: the Sparc64 used 371K transistors for it, more than the whole i386)
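The renaming idea of slide 57 can be sketched as a map from architectural to physical registers, applied to the exact sequence used in the detailed example of slide 68. The physical register names (p10, p11, ...) and the free-list discipline are invented for the illustration.

```python
# Sketch of register renaming: WAW/WAR are removed by giving every new
# destination a fresh physical register, while RAW is preserved by
# translating sources through the current mapping. Physical register
# names and the free list are invented.

free_list = ["p%d" % i for i in range(10, 20)]  # free physical registers
rename_map = {}                                 # architectural -> physical

def rename(op, dest, *sources):
    """Rename sources through the current map first (keeps RAW),
    then give the destination a fresh physical register (breaks WAW)."""
    srcs = tuple(rename_map.get(s, s) for s in sources)
    rename_map[dest] = free_list.pop(0)         # new instance of dest
    return (op, rename_map[dest]) + srcs

code = [("mul", "r2", "r0", "r1"),
        ("ad",  "r3", "r1", "r2"),   # RAW on r2: must read mul's p-reg
        ("sub", "r2", "r0", "r1")]   # WAW on r2: gets a different p-reg
renamed = [rename(op, d, s1, s2) for op, d, s1, s2 in code]
```

Note that sources are translated before the destination is reallocated, so an instruction that both reads and writes the same register would still read the older instance.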

59 Static or Dynamic Renaming

60 Design space of register renaming

61 Scope of register renaming

62 Layout of rename buffers

63 Type of rename buffers

64 Rename buffers hold intermediate results

Each time a destination register is referred to, a new rename register is allocated to it. Final results are stored in the architectural register file. Both the rename buffers and the architectural register file are accessed to find the latest data; if an entry is found in both, the content of the rename buffer (the intermediate result) is chosen. When an instruction completes (retires), and the ROB retires only in strict program sequence, the corresponding rename buffer entry is written into the architectural register file (as a result modifying the actual program state) and can then be de-allocated.

65 Number of rename buffers

66 Basic mechanisms used for accessing rename buffers

Rename buffers with associative access (used in the later example). Rename buffers with indexed access (an entry always corresponds to the most recent instance of renaming).

67 Operand fetch policies and rename rate

Rename bound: fetch operands during renaming (during instruction issue). Dispatch bound: fetch operands during dispatching. Rename rate: the maximum number of renames per cycle; it equals the issue rate, to avoid bottlenecks.

68 7.5.8 Detailed example of renaming

Renaming the sequence:
mul r2, r0, r1
ad r3, r1, r2
sub r2, r0, r1
format: op Rd, Rs1, Rs2
Assume a separate rename register file, associative access, and operand fetching during renaming.

69 Structure of the rename buffers and their supposed initial contents

Latest bit: 1 for the most recent rename, 0 for previous ones.

70 Renaming steps

1. Allocation of a free rename register to a destination register
2. Accessing a valid source register value, or a register value that is not yet available
3. Re-allocation of a destination register
4. Updating a particular rename buffer with a computed result
5. De-allocation of a rename buffer that is no longer needed
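The five steps above, and the head/tail discipline of slides 71-77, can be sketched with a circular buffer of rename entries. The buffer size and the field names are invented for the sketch.

```python
# Circular rename buffer with a head pointer (next allocation) and a
# tail pointer (next deallocation), following the five renaming steps.
# Size and field layout are invented for the sketch.

SIZE = 4
EMPTY = {"dest": None, "value": None, "valid": False, "latest": False}
buf = [dict(EMPTY) for _ in range(SIZE)]
head = tail = count = 0

def allocate(dest):
    """Steps 1/3: allocate the entry at head to a destination register;
    any previous instance of the same register is no longer 'latest'."""
    global head, count
    assert count < SIZE, "rename buffer full: issue must stall"
    for e in buf:
        if e["dest"] == dest:
            e["latest"] = False
    idx = head
    buf[idx] = {"dest": dest, "value": None, "valid": False, "latest": True}
    head = (head + 1) % SIZE
    count += 1
    return idx

def lookup(reg):
    """Step 2: associative access; return the value if available,
    otherwise the buffer index the consumer must wait on."""
    for i, e in enumerate(buf):
        if e["dest"] == reg and e["latest"]:
            return ("value", e["value"]) if e["valid"] else ("index", i)
    return ("architectural", reg)     # not renamed: read the register file

def update(idx, value):
    """Step 4: a functional unit writes its computed result."""
    buf[idx]["value"], buf[idx]["valid"] = value, True

def deallocate():
    """Step 5: retire the oldest entry (the ROB retires in program
    order) and advance the tail pointer."""
    global tail, count
    entry, buf[tail] = buf[tail], dict(EMPTY)
    tail = (tail + 1) % SIZE
    count -= 1
    return entry["dest"], entry["value"]
```

A consumer that gets `("index", i)` back is in exactly the slide-74 situation: the value is not yet available, so it waits on rename buffer index i.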

71 Allocation of a new rename buffer to a destination register (circular buffer: head and tail) (before allocation)

72 After allocation of a destination register

73 Accessing available register values

74 Accessing a register value that is not yet available (3 is the index)

75 Re-allocation of r2 (a destination register)

76 Updating the rename buffers with the computed result of {mul r2, r0, r1} (register 2 with the result 0)

77 Deallocation of rename buffer no. 0 (the ROB retires instructions; update the tail pointer)

78 7.6 Parallel Execution

Executing several instructions in parallel means instructions will generally be finished in out-of-program order.
To finish: the operation of the instruction is accomplished, except for writing back the result into the architectural register or memory location specified, and/or updating the status bits.
To complete: writing back the results.
To retire (ROB): write back the results, and delete the completed instruction from the last ROB entry.

79 7.7 Preserving the Sequential Consistency of instruction execution

With multiple EUs operating in parallel, the overall instruction execution should mimic sequential execution: the order in which instructions are completed, and the order in which memory is accessed.

80 Sequential consistency models

81 Consistency relates to instruction completion or memory access

82 Trend and performance

83 Allowing the reordering of memory accesses

It permits load/store reordering: either loads can be performed before pending stores, or vice versa; a load can be performed before pending stores only if none of the preceding stores has the same target address as the load.
It makes speculative loads or stores feasible: when the addresses of pending stores are not yet available, a speculative load avoids delaying memory accesses by performing the load anyway. When the store addresses have been computed, they are compared against the addresses of all younger loads; a re-load is needed if any hit is found.
It allows cache misses to be hidden: on a cache miss, other loads may be performed before the missed load, or other stores before the missed store.
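The two address checks described above can be sketched as follows. The data structures (a flat list of resolved pending-store addresses and a log of speculative loads) are invented for the illustration.

```python
# Sketch of the reordering checks above: a load may bypass pending
# stores only if none of them targets the same address, and a
# speculative load must be replayed if a later-resolved store address
# matches it. The data structures are invented for the sketch.

pending_stores = []      # addresses already computed for pending stores
speculative_loads = []   # loads performed while store addresses unknown

def load_may_bypass(addr):
    """A load can be performed before pending stores only if none of
    them has the same target address."""
    return addr not in pending_stores

def speculative_load(addr):
    """Perform the load anyway while store addresses are still unknown,
    remembering it for the later address comparison."""
    speculative_loads.append(addr)

def resolve_store_address(addr):
    """Once a store address is computed, compare it against all younger
    speculative loads; any hit must be re-loaded."""
    pending_stores.append(addr)
    return [a for a in speculative_loads if a == addr]  # loads to replay
```

The replay list returned by `resolve_store_address` corresponds to the "re-load is needed if any hit is found" rule.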

84 Using a Re-Order Buffer (ROB) for preserving the order in which instructions are <completed>

1. Instructions are written into the ROB in strict program order: one new entry is allocated for each active instruction.
2. Each entry indicates the status of the corresponding instruction: issued (i), in execution (x), already finished (f).
3. An instruction is allowed to retire only if it has finished and all previous instructions have already retired; instructions retire in strict program order. Only retiring instructions are permitted to complete, that is, to update the program state by writing their result into the referenced architectural register or memory.
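Rules 1-3 above can be sketched as a tiny list-based ROB. This is a sketch, not any particular processor's implementation; the entry fields and instruction names are invented.

```python
# Sketch of ROB retirement under rules 1-3: entries are allocated in
# program order, each carries a status ('i' issued, 'x' executing,
# 'f' finished), and an entry may retire only when it is the oldest
# and has finished. The list-based ROB is invented for the sketch.

rob = []                 # oldest entry first (strict program order)

def rob_allocate(name):
    """Rule 1: one new entry per active instruction, in program order."""
    rob.append({"name": name, "status": "i"})

def rob_set_status(name, status):
    """Rule 2: track issued / in execution / finished per entry."""
    assert status in ("i", "x", "f")
    next(e for e in rob if e["name"] == name)["status"] = status

def rob_retire():
    """Rule 3: retire (update program state) only from the head, and
    only finished instructions; everything behind a non-finished
    instruction has to wait."""
    retired = []
    while rob and rob[0]["status"] == "f":
        retired.append(rob.pop(0)["name"])
    return retired

for name in ("i1", "i2", "i3"):
    rob_allocate(name)
rob_set_status("i2", "f")       # i2 finished out of program order
first_try = rob_retire()        # nothing retires: i1 is not finished
rob_set_status("i1", "f")
rob_set_status("i3", "f")
retired = rob_retire()          # now all three retire, in program order
```

Note how i2 finishing early changes nothing until i1 finishes, which is precisely how out-of-order finish is reconciled with in-order completion.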

85 Principle of the ROB {Circular Buffer}

86 Introduction of ROBs in commercial superscalar processors

87 Use of the ROB for speculative execution

Guess the outcome of a branch and execute along the guessed path before the condition is resolved.
1. Each ROB entry is extended with a speculative status field indicating whether the corresponding instruction has been executed speculatively.
2. Speculatively executed instructions are not allowed to retire before the related condition is resolved.
3. After the condition is resolved: if the guess turns out to be right, the instructions can retire in order; if the guess is wrong, the speculative instructions are marked to be cancelled, and instruction execution continues with the correct instructions.
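The three rules above can be sketched by extending an ROB entry with a speculative flag. The entry layout, instruction names, and the squash-by-removal shortcut are invented for the sketch.

```python
# Sketch of speculative execution with an ROB: speculative entries may
# not retire until the branch resolves; on a wrong guess they are
# cancelled (squashed). Entry layout is invented for the sketch.

rob = []   # oldest first; fields: name, finished, speculative

def enter(name, speculative=False):
    rob.append({"name": name, "finished": False, "speculative": speculative})

def finish(name):
    next(e for e in rob if e["name"] == name)["finished"] = True

def resolve_branch(guess_correct):
    """Rule 3: confirm speculative entries on a correct guess;
    squash (cancel) them on a wrong one."""
    if guess_correct:
        for e in rob:
            e["speculative"] = False
    else:
        rob[:] = [e for e in rob if not e["speculative"]]

def retire():
    """Rules 2-3: only finished, non-speculative head entries retire."""
    out = []
    while rob and rob[0]["finished"] and not rob[0]["speculative"]:
        out.append(rob.pop(0)["name"])
    return out

enter("cmp")                     # the branch's condition computation
enter("s1", speculative=True)    # executed down the guessed path
finish("cmp"); finish("s1")
before = retire()                # only "cmp": s1 may not retire yet
resolve_branch(False)            # the guess was wrong: s1 is squashed
after = retire()                 # nothing of the wrong path survives
```

A finished speculative instruction thus never updates the program state, so a wrong guess leaves the architectural registers untouched.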

88 Design space of ROBs

89 Basic layout of ROBs

90 ROB implementation details

91 7.8 Preserving the Sequential consistency of exception processing

When instructions are executed in parallel, interrupt requests, which are caused by exceptions arising in instruction <execution>, are also generated out of order. If the requests are acted upon immediately, they are handled in a different order than on a sequential processor; this is called imprecise interrupts. Precise interrupts: handling the interrupts consistently with the state of a sequential processor.

92 Sequential consistency of exception processing

93 Use of the ROB for preserving the sequential order of interrupt requests

Interrupts generated in connection with instruction execution can be handled at the correct point in the execution by accepting interrupt requests only when the related instruction becomes the next to retire.

94 7.9 Implementation of superscalar CISC processors using a superscalar RISC core

CISC instructions are first converted into RISC-like instructions <during decoding>.
Simple CISC register-to-register instructions are converted to a single RISC operation (1-to-1).
CISC ALU instructions referring to memory are converted to two or more RISC operations (1-to-(2-4)); e.g. SUB EAX, [EDI] is converted to MOV EBX, [EDI] followed by SUB EAX, EBX.
More complex CISC instructions are converted to long sequences of RISC operations (1-to-(more than 4)).
On average, one CISC instruction is converted to 1.5-2 RISC operations.
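The 1-to-n conversion described above can be sketched as a small expansion function. The SUB EAX, [EDI] expansion and the use of EBX as the scratch register follow the slide's own example; everything else (the tuple encoding, the second test instruction) is invented for the sketch.

```python
# Sketch of CISC -> RISC-like operation conversion, following the
# slide's SUB EAX, [EDI] example. The tuple encoding and the choice
# of cases are otherwise invented for the sketch.

def convert(instr):
    """Expand one two-operand CISC instruction into RISC-like ops."""
    op, dst, src = instr
    if src.startswith("["):                 # ALU op with memory operand
        return [("MOV", "EBX", src),        # load the memory operand first
                (op, dst, "EBX")]           # then a register-to-register op
    return [(op, dst, src)]                 # simple reg-to-reg: 1-to-1

program = [("SUB", "EAX", "[EDI]"), ("ADD", "ECX", "EDX")]
uops = [u for instr in program for u in convert(instr)]
```

The superscalar RISC core then issues, shelves, renames, and retires these converted operations exactly as in the preceding sections.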

95 The principle of superscalar CISC execution using a superscalar RISC core

96 PentiumPro: Decoding/converting CISC instructions to RISC operations (done in program order)

97 Case Studies: R10000. Core part of the micro-architecture of the R10000

98 Case Studies: PowerPC 620

99 Case Studies: PentiumPro. Core part of the micro-architecture

100 PentiumPro long pipeline: Layout of the FX and load pipelines

Source: http://900igr.net/prezentacija/anglijskij-jazyk/tech-computer-science-136795.html