Advanced Computer Architecture (English Edition)

Foreword

Preface

PART I THEORY OF PARALLELISM

Chapter 1 Parallel Computer Models

1.1 The State of Computing

1.1.1 Computer Development Milestones

1.1.2 Elements of Modern Computers

1.1.3 Evolution of Computer Architecture

1.1.4 System Attributes to Performance

1.2 Multiprocessors and Multicomputers

1.2.1 Shared-Memory Multiprocessors

1.2.2 Distributed-Memory Multicomputers

1.2.3 A Taxonomy of MIMD Computers

1.3 Multivector and SIMD Computers

1.3.1 Vector Supercomputers

1.3.2 SIMD Supercomputers

1.4 PRAM and VLSI Models

1.4.1 Parallel Random-Access Machines

1.4.2 VLSI Complexity Model

1.5 Architectural Development Tracks

1.5.1 Multiple-Processor Tracks

1.5.2 Multivector and SIMD Tracks

1.5.3 Multithreaded and Dataflow Tracks

1.6 Bibliographic Notes and Exercises

Chapter 2 Program and Network Properties

2.1 Conditions of Parallelism

2.1.1 Data and Resource Dependences

2.1.2 Hardware and Software Parallelism

2.1.3 The Role of Compilers

2.2 Program Partitioning and Scheduling

2.2.1 Grain Sizes and Latency

2.2.2 Grain Packing and Scheduling

2.2.3 Static Multiprocessor Scheduling

2.3 Program Flow Mechanisms

2.3.1 Control Flow Versus Data Flow

2.3.2 Demand-Driven Mechanisms

2.3.3 Comparison of Flow Mechanisms

2.4 System Interconnect Architectures

2.4.1 Network Properties and Routing

2.4.2 Static Connection Networks

2.4.3 Dynamic Connection Networks

2.5 Bibliographic Notes and Exercises

Chapter 3 Principles of Scalable Performance

3.1 Performance Metrics and Measures

3.1.1 Parallelism Profile in Programs

3.1.2 Harmonic Mean Performance

3.1.3 Efficiency, Utilization, and Quality

3.1.4 Standard Performance Measures

3.2 Parallel Processing Applications

3.2.1 Massive Parallelism for Grand Challenges

3.2.2 Application Models of Parallel Computers

3.2.3 Scalability of Parallel Algorithms

3.3 Speedup Performance Laws

3.3.1 Amdahl's Law for a Fixed Workload

3.3.2 Gustafson's Law for Scaled Problems

3.3.3 Memory-Bounded Speedup Model

3.4 Scalability Analysis and Approaches

3.4.1 Scalability Metrics and Goals

3.4.2 Evolution of Scalable Computers

3.4.3 Research Issues and Solutions

3.5 Bibliographic Notes and Exercises

PART II HARDWARE TECHNOLOGIES

Chapter 4 Processors and Memory Hierarchy

4.1 Advanced Processor Technology

4.1.1 Design Space of Processors

4.1.2 Instruction-Set Architectures

4.1.3 CISC Scalar Processors

4.1.4 RISC Scalar Processors

4.2 Superscalar and Vector Processors

4.2.1 Superscalar Processors

4.2.2 The VLIW Architecture

4.2.3 Vector and Symbolic Processors

4.3 Memory Hierarchy Technology

4.3.1 Hierarchical Memory Technology

4.3.2 Inclusion, Coherence, and Locality

4.3.3 Memory Capacity Planning

4.4 Virtual Memory Technology

4.4.1 Virtual Memory Models

4.4.2 TLB, Paging, and Segmentation

4.4.3 Memory Replacement Policies

4.5 Bibliographic Notes and Exercises

Chapter 5 Bus, Cache, and Shared Memory

5.1 Backplane Bus Systems

5.1.1 Backplane Bus Specification

5.1.2 Addressing and Timing Protocols

5.1.3 Arbitration, Transaction, and Interrupt

5.1.4 The IEEE Futurebus+ Standards

5.2 Cache Memory Organizations

5.2.1 Cache Addressing Models

5.2.2 Direct Mapping and Associative Caches

5.2.3 Set-Associative and Sector Caches

5.2.4 Cache Performance Issues

5.3 Shared-Memory Organizations

5.3.1 Interleaved Memory Organization

5.3.2 Bandwidth and Fault Tolerance

5.3.3 Memory Allocation Schemes

5.4 Sequential and Weak Consistency Models

5.4.1 Atomicity and Event Ordering

5.4.2 Sequential Consistency Model

5.4.3 Weak Consistency Models

5.5 Bibliographic Notes and Exercises

Chapter 6 Pipelining and Superscalar Techniques

6.1 Linear Pipeline Processors

6.1.1 Asynchronous and Synchronous Models

6.1.2 Clocking and Timing Control

6.1.3 Speedup, Efficiency, and Throughput

6.2 Nonlinear Pipeline Processors

6.2.1 Reservation and Latency Analysis

6.2.2 Collision-Free Scheduling

6.2.3 Pipeline Schedule Optimization

6.3 Instruction Pipeline Design

6.3.1 Instruction Execution Phases

6.3.2 Mechanisms for Instruction Pipelining

6.3.3 Dynamic Instruction Scheduling

6.3.4 Branch Handling Techniques

6.4 Arithmetic Pipeline Design

6.4.1 Computer Arithmetic Principles

6.4.2 Static Arithmetic Pipelines

6.4.3 Multifunctional Arithmetic Pipelines

6.5 Superscalar and Superpipeline Design

6.5.1 Superscalar Pipeline Design

6.5.2 Superpipelined Design

6.5.3 Supersymmetry and Design Tradeoffs

6.6 Bibliographic Notes and Exercises

PART III PARALLEL AND SCALABLE ARCHITECTURES

Chapter 7 Multiprocessors and Multicomputers

7.1 Multiprocessor System Interconnects

7.1.1 Hierarchical Bus Systems

7.1.2 Crossbar Switch and Multiport Memory

7.1.3 Multistage and Combining Networks

7.2 Cache Coherence and Synchronization Mechanisms

7.2.1 The Cache Coherence Problem

7.2.2 Snoopy Bus Protocols

7.2.3 Directory-Based Protocols

7.2.4 Hardware Synchronization Mechanisms

7.3 Three Generations of Multicomputers

7.3.1 Design Choices in the Past

7.3.2 Present and Future Development

7.3.3 The Intel Paragon System

7.4 Message-Passing Mechanisms

7.4.1 Message-Routing Schemes

7.4.2 Deadlock and Virtual Channels

7.4.3 Flow Control Strategies

7.4.4 Multicast Routing Algorithms

7.5 Bibliographic Notes and Exercises

Chapter 8 Multivector and SIMD Computers

8.1 Vector Processing Principles

8.1.1 Vector Instruction Types

8.1.2 Vector-Access Memory Schemes

8.1.3 Past and Present Supercomputers

8.2 Multivector Multiprocessors

8.2.1 Performance-Directed Design Rules

8.2.2 Cray Y-MP, C-90, and MPP

8.2.3 Fujitsu VP2000 and VPP500

8.2.4 Mainframes and Minisupercomputers

8.3 Compound Vector Processing

8.3.1 Compound Vector Operations

8.3.2 Vector Loops and Chaining

8.3.3 Multipipeline Networking

8.4 SIMD Computer Organizations

8.4.1 Implementation Models

8.4.2 The CM-2 Architecture

8.4.3 The MasPar MP-1 Architecture

8.5 The Connection Machine CM-5

8.5.1 A Synchronized MIMD Machine

8.5.2 The CM-5 Network Architecture

8.5.3 Control Processors and Processing Nodes

8.5.4 Interprocessor Communications

8.6 Bibliographic Notes and Exercises

Chapter 9 Scalable, Multithreaded, and Dataflow Architectures

9.1 Latency-Hiding Techniques

9.1.1 Shared Virtual Memory

9.1.2 Prefetching Techniques

9.1.3 Distributed Coherent Caches

9.1.4 Scalable Coherence Interface

9.1.5 Relaxed Memory Consistency

9.2 Principles of Multithreading

9.2.1 Multithreading Issues and Solutions

9.2.2 Multiple-Context Processors

9.2.3 Multidimensional Architectures

9.3 Fine-Grain Multicomputers

9.3.1 Fine-Grain Parallelism

9.3.2 The MIT J-Machine

9.3.3 The Caltech Mosaic C

9.4 Scalable and Multithreaded Architectures

9.4.1 The Stanford Dash Multiprocessor

9.4.2 The Kendall Square Research KSR-1

9.4.3 The Tera Multiprocessor System

9.5 Dataflow and Hybrid Architectures

9.5.1 The Evolution of Dataflow Computers

9.5.2 The ETL/EM-4 in Japan

9.5.3 The MIT/Motorola *T Prototype

9.6 Bibliographic Notes and Exercises

PART IV SOFTWARE FOR PARALLEL PROGRAMMING

Chapter 10 Parallel Models, Languages, and Compilers

10.1 Parallel Programming Models

10.1.1 Shared-Variable Model

10.1.2 Message-Passing Model

10.1.3 Data-Parallel Model

10.1.4 Object-Oriented Model

10.1.5 Functional and Logic Models

10.2 Parallel Languages and Compilers

10.2.1 Language Features for Parallelism

10.2.2 Parallel Language Constructs

10.2.3 Optimizing Compilers for Parallelism

10.3 Dependence Analysis of Data Arrays

10.3.1 Iteration Space and Dependence Analysis

10.3.2 Subscript Separability and Partitioning

10.3.3 Categorized Dependence Tests

10.4 Code Optimization and Scheduling

10.4.1 Scalar Optimization with Basic Blocks

10.4.2 Local and Global Optimizations

10.4.3 Vectorization and Parallelization Methods

10.4.4 Code Generation and Scheduling

10.4.5 Trace Scheduling Compilation

10.5 Loop Parallelization and Pipelining

10.5.1 Loop Transformation Theory

10.5.2 Parallelization and Wavefronting

10.5.3 Tiling and Localization

10.5.4 Software Pipelining

10.6 Bibliographic Notes and Exercises

Chapter 11 Parallel Program Development and Environments

11.1 Parallel Programming Environments

11.1.1 Software Tools and Environments

11.1.2 Y-MP, Paragon, and CM-5 Environments

11.1.3 Visualization and Performance Tuning

11.2 Synchronization and Multiprocessing Modes

11.2.1 Principles of Synchronization

11.2.2 Multiprocessor Execution Modes

11.2.3 Multitasking on Cray Multiprocessors

11.3 Shared-Variable Program Structures

11.3.1 Locks for Protected Access

11.3.2 Semaphores and Applications

11.3.3 Monitors and Applications

11.4 Message-Passing Program Development

11.4.1 Distributing the Computation

11.4.2 Synchronous Message Passing

11.4.3 Asynchronous Message Passing

11.5 Mapping Programs onto Multicomputers

11.5.1 Domain Decomposition Techniques

11.5.2 Control Decomposition Techniques

11.5.3 Heterogeneous Processing

11.6 Bibliographic Notes and Exercises

Chapter 12 UNIX, Mach, and OSF/1 for Parallel Computers

12.1 Multiprocessor UNIX Design Goals

12.1.1 Conventional UNIX Limitations

12.1.2 Compatibility and Portability

12.1.3 Address Space and Load Balancing

12.1.4 Parallel I/O and Network Services

12.2 Master-Slave and Multithreaded UNIX

12.2.1 Master-Slave Kernels

12.2.2 Floating-Executive Kernels

12.2.3 Multithreaded UNIX Kernel

12.3 Multicomputer UNIX Extensions

12.3.1 Message-Passing OS Models

12.3.2 Cosmic Environment and Reactive Kernel

12.3.3 Intel NX/2 Kernel and Extensions

12.4 Mach/OS Kernel Architecture

12.4.1 Mach/OS Kernel Functions

12.4.2 Multithreaded Multitasking

12.4.3 Message-Based Communications

12.4.4 Virtual Memory Management

12.5 OSF/1 Architecture and Applications

12.5.1 The OSF/1 Architecture

12.5.2 The OSF/1 Programming Environment

12.5.3 Improving Performance with Threads

12.6 Bibliographic Notes and Exercises

Bibliography

Index

Answers to Selected Problems

The 1999 edition of Advanced Computer Architecture (English) is an older title that has gone out of print, so a physical copy is almost impossible to buy. If you genuinely need it for study, you may ask the blogger for its electronic PDF (the edition by Kai Hwang (USA), published in 1999 by China Machine Press, Beijing). Legitimate and lawful requests will be handled promptly, and a download link will be sent to you.
