Scalable Parallel Computing: Technology, Architecture, Programming (English edition)

Part Ⅰ Scalability and Clustering

Chapter 1 Scalable Computer Platforms and Models

1.1 Evolution of Computer Architecture

1.1.1 Computer Generations

1.1.2 Scalable Computer Architectures

1.1.3 Converging System Architectures

1.2 Dimensions of Scalability

1.2.1 Resource Scalability

1.2.2 Application Scalability

1.2.3 Technology Scalability

1.3 Parallel Computer Models

1.3.1 Semantic Attributes

1.3.2 Performance Attributes

1.3.3 Abstract Machine Models

1.3.4 Physical Machine Models

1.4 Basic Concepts of Clustering

1.4.1 Cluster Characteristics

1.4.2 Architectural Comparisons

1.4.3 Benefits and Difficulties of Clusters

1.5 Scalable Design Principles

1.5.1 Principle of Independence

1.5.2 Principle of Balanced Design

1.5.3 Design for Scalability

1.6 Bibliographic Notes and Problems

Chapter 2 Basics of Parallel Programming

2.1 Parallel Programming Overview

2.1.1 Why Is Parallel Programming Difficult?

2.1.2 Parallel Programming Environments

2.1.3 Parallel Programming Approaches

2.2 Processes, Tasks, and Threads

2.2.1 Definitions of an Abstract Process

2.2.2 Execution Mode

2.2.3 Address Space

2.2.4 Process Context

2.2.5 Process Descriptor

2.2.6 Process Control

2.2.7 Variations of Process

2.3 Parallelism Issues

2.3.1 Homogeneity in Processes

2.3.2 Static versus Dynamic Parallelism

2.3.3 Process Grouping

2.3.4 Allocation Issues

2.4 Interaction/Communication Issues

2.4.1 Interaction Operations

2.4.2 Interaction Modes

2.4.3 Interaction Patterns

2.4.4 Cooperative versus Competitive Interactions

2.5 Semantic Issues in Parallel Programs

2.5.1 Program Termination

2.5.2 Determinacy of Programs

2.6 Bibliographic Notes and Problems

Chapter 3 Performance Metrics and Benchmarks

3.1 System and Application Benchmarks

3.1.1 Micro Benchmarks

3.1.2 Parallel Computing Benchmarks

3.1.3 Business and TPC Benchmarks

3.1.4 SPEC Benchmark Family

3.2 Performance versus Cost

3.2.1 Execution Time and Throughput

3.2.2 Utilization and Cost-Effectiveness

3.3 Basic Performance Metrics

3.3.1 Workload and Speed Metrics

3.3.2 Caveats in Sequential Performance

3.4 Performance of Parallel Computers

3.4.1 Computational Characteristics

3.4.2 Parallelism and Interaction Overheads

3.4.3 Overhead Quantification

3.5 Performance of Parallel Programs

3.5.1 Performance Metrics

3.5.2 Available Parallelism in Benchmarks

3.6 Scalability and Speedup Analysis

3.6.1 Amdahl's Law: Fixed Problem Size

3.6.2 Gustafson's Law: Fixed Time

3.6.3 Sun and Ni's Law: Memory Bounding

3.6.4 Isoperformance Models

3.7 Bibliographic Notes and Problems

Part Ⅱ Enabling Technologies

Chapter 4 Microprocessors as Building Blocks

4.1 System Development Trends

4.1.1 Advances in Hardware

4.1.2 Advances in Software

4.1.3 Advances in Applications

4.2 Principles of Processor Design

4.2.1 Basics of Instruction Pipeline

4.2.2 From CISC to RISC and Beyond

4.2.3 Architectural Enhancement Approaches

4.3 Microprocessor Architecture Families

4.3.1 Major Architecture Families

4.3.2 Superscalar versus Superpipelined Processors

4.3.3 Embedded Microprocessors

4.4 Case Studies of Microprocessors

4.4.1 Digital's Alpha 21164 Microprocessor

4.4.2 Intel Pentium Pro Processor

4.5 Post-RISC, Multimedia, and VLIW

4.5.1 Post-RISC Processor Features

4.5.2 Multimedia Extensions

4.5.3 The VLIW Architecture

4.6 The Future of Microprocessors

4.6.1 Hardware Trends and Physical Limits

4.6.2 Future Workloads and Challenges

4.6.3 Future Microprocessor Architectures

4.7 Bibliographic Notes and Problems

Chapter 5 Distributed Memory and Latency Tolerance

5.1 Hierarchical Memory Technology

5.1.1 Characteristics of Storage Devices

5.1.2 Memory Hierarchy Properties

5.1.3 Memory Capacity Planning

5.2 Cache Coherence Protocols

5.2.1 Cache Coherency Problem

5.2.2 Snoopy Coherency Protocols

5.2.3 The MESI Snoopy Protocol

5.3 Shared-Memory Consistency

5.3.1 Memory Event Ordering

5.3.2 Memory Consistency Models

5.3.3 Relaxed Memory Models

5.4 Distributed Cache/Memory Architecture

5.4.1 NORMA, NUMA, COMA, and DSM Models

5.4.2 Directory-Based Coherency Protocol

5.4.3 The Stanford Dash Multiprocessor

5.4.4 Directory-Based Protocol in Dash

5.5 Latency Tolerance Techniques

5.5.1 Latency Avoidance, Reduction, and Hiding

5.5.2 Distributed Coherent Caches

5.5.3 Data Prefetching Strategies

5.5.4 Effects of Relaxed Memory Consistency

5.6 Multithreaded Latency Hiding

5.6.1 Multithreaded Processor Model

5.6.2 Context-Switching Policies

5.6.3 Combining Latency Hiding Mechanisms

5.7 Bibliographic Notes and Problems

Chapter 6 System Interconnects and Gigabit Networks

6.1 Basics of Interconnection Network

6.1.1 Interconnection Environments

6.1.2 Network Components

6.1.3 Network Characteristics

6.1.4 Network Performance Metrics

6.2 Network Topologies and Properties

6.2.1 Topological and Functional Properties

6.2.2 Routing Schemes and Functions

6.2.3 Networking Topologies

6.3 Buses, Crossbar, and Multistage Switches

6.3.1 Multiprocessor Buses

6.3.2 Crossbar Switches

6.3.3 Multistage Interconnection Networks

6.3.4 Comparison of Switched Interconnects

6.4 Gigabit Network Technologies

6.4.1 Fiber Channel and FDDI Rings

6.4.2 Fast Ethernet and Gigabit Ethernet

6.4.3 Myrinet for SAN/LAN Construction

6.4.4 HiPPI and SuperHiPPI

6.5 ATM Switches and Networks

6.5.1 ATM Technology

6.5.2 ATM Network Interfaces

6.5.3 Four Layers of ATM Architecture

6.5.4 ATM Internetwork Connectivity

6.6 Scalable Coherence Interface

6.6.1 SCI Interconnects

6.6.2 Implementation Issues

6.6.3 SCI Coherence Protocol

6.7 Comparison of Network Technologies

6.7.1 Standard Networks and Perspectives

6.7.2 Network Performance and Applications

6.8 Bibliographic Notes and Problems

Chapter 7 Threading, Synchronization, and Communication

7.1 Software Multithreading

7.1.1 The Thread Concept

7.1.2 Threads Management

7.1.3 Thread Synchronization

7.2 Synchronization Mechanisms

7.2.1 Atomicity versus Mutual Exclusion

7.2.2 High-Level Synchronization Constructs

7.2.3 Low-Level Synchronization Primitives

7.2.4 Fast Locking Mechanisms

7.3 The TCP/IP Communication Protocol Suite

7.3.1 Features of the TCP/IP Suite

7.3.2 UDP, TCP, and IP

7.3.3 The Sockets Interface

7.4 Fast and Efficient Communication

7.4.1 Key Problems in Communication

7.4.2 The LogP Communication Model

7.4.3 Low-Level Communications Support

7.4.4 Communication Algorithms

7.5 Bibliographic Notes and Problems

Part Ⅲ Systems Architecture

Chapter 8 Symmetric and CC-NUMA Multiprocessors

8.1 SMP and CC-NUMA Technology

8.1.1 Multiprocessor Architecture

8.1.2 Commercial SMP Servers

8.1.3 The Intel SHV Server Board

8.2 Sun Ultra Enterprise 10000 System

8.2.1 The Ultra E-10000 Architecture

8.2.2 System Board Architecture

8.2.3 Scalability and Availability Support

8.2.4 Dynamic Domains and Performance

8.3 HP/Convex Exemplar X-Class

8.3.1 The Exemplar X System Architecture

8.3.2 Exemplar Software Environment

8.4 The Sequent NUMA-Q 2000

8.4.1 The NUMA-Q 2000 Architecture

8.4.2 Software Environment of NUMA-Q

8.4.3 Performance of the NUMA-Q

8.5 The SGI/Cray Origin 2000 Superserver

8.5.1 Design Goals of Origin 2000 Series

8.5.2 The Origin 2000 Architecture

8.5.3 The Cellular IRIX Environment

8.5.4 Performance of the Origin 2000

8.6 Comparison of CC-NUMA Architectures

8.7 Bibliographic Notes and Problems

Chapter 9 Support of Clustering and Availability

9.1 Challenges in Clustering

9.1.1 Classification of Clusters

9.1.2 Cluster Architectures

9.1.3 Cluster Design Issues

9.2 Availability Support for Clustering

9.2.1 The Availability Concept

9.2.2 Availability Techniques

9.2.3 Checkpointing and Failure Recovery

9.3 Support for Single System Image

9.3.1 Single System Image Layers

9.3.2 Single Entry and Single File Hierarchy

9.3.3 Single I/O, Networking, and Memory Space

9.4 Single System Image in Solaris MC

9.4.1 Global File System

9.4.2 Global Process Management

9.4.3 Single I/O System Image

9.5 Job Management in Clusters

9.5.1 Job Management System

9.5.2 Survey of Job Management Systems

9.5.3 Load-Sharing Facility (LSF)

9.6 Bibliographic Notes and Problems

Chapter 10 Clusters of Servers and Workstations

10.1 Cluster Products and Research Projects

10.1.1 Supporting Trend of Cluster Products

10.1.2 Cluster of SMP Servers

10.1.3 Cluster Research Projects

10.2 Microsoft Wolfpack for NT Clusters

10.2.1 Microsoft Wolfpack Configurations

10.2.2 Hot Standby Multiserver Clusters

10.2.3 Active Availability Clusters

10.2.4 Fault-Tolerant Multiserver Cluster

10.3 The IBM SP System

10.3.1 Design Goals and Strategies

10.3.2 The SP2 System Architecture

10.3.3 I/O and Internetworking

10.3.4 The SP System Software

10.3.5 The SP2 and Beyond

10.4 The Digital TruCluster

10.4.1 The TruCluster Architecture

10.4.2 The Memory Channel Interconnect

10.4.3 Programming the TruCluster

10.4.4 The TruCluster System Software

10.5 The Berkeley NOW Project

10.5.1 Active Messages for Fast Communication

10.5.2 GLUnix for Global Resource Management

10.5.3 The xFS Serverless Network File System

10.6 TreadMarks: A Software-Implemented DSM Cluster

10.6.1 Boundary Conditions

10.6.2 User Interface for DSM

10.6.3 Implementation Issues

10.7 Bibliographic Notes and Problems

Chapter 11 MPP Architecture and Performance

11.1 An Overview of MPP Technology

11.1.1 MPP Characteristics and Issues

11.1.2 MPP Systems: An Overview

11.2 The Cray T3E System

11.2.1 The System Architecture of T3E

11.2.2 The System Software in T3E

11.3 New Generation of ASCI/MPPs

11.3.1 ASCI Scalable Design Strategy

11.3.2 Hardware and Software Requirements

11.3.3 Contracted ASCI/MPP Platforms

11.4 Intel/Sandia ASCI Option Red

11.4.1 The Option Red Architecture

11.4.2 Option Red System Software

11.5 Parallel NAS Benchmark Results

11.5.1 The NAS Parallel Benchmarks

11.5.2 Superstep Structure and Granularity

11.5.3 Memory, I/O, and Communications

11.6 MPI and STAP Benchmark Results

11.6.1 MPI Performance Measurements

11.6.2 MPI Latency and Aggregate Bandwidth

11.6.3 STAP Benchmark Evaluation of MPPs

11.6.4 MPP Architectural Implications

11.7 Bibliographic Notes and Problems

Part Ⅳ Parallel Programming

Chapter 12 Parallel Paradigms and Programming Models

12.1 Paradigms and Programmability

12.1.1 Algorithmic Paradigms

12.1.2 Programmability Issues

12.1.3 Parallel Programming Examples

12.2 Parallel Programming Models

12.2.1 Implicit Parallelism

12.2.2 Explicit Parallel Models

12.2.3 Comparison of Four Models

12.2.4 Other Parallel Programming Models

12.3 Shared-Memory Programming

12.3.1 The ANSI X3H5 Shared-Memory Model

12.3.2 The POSIX Threads (Pthreads) Model

12.3.3 The OpenMP Standard

12.3.4 The SGI Power C Model

12.3.5 C//: A Structured Parallel C Language

12.4 Bibliographic Notes and Problems

Chapter 13 Message-Passing Programming

13.1 The Message-Passing Paradigm

13.1.1 Message-Passing Libraries

13.1.2 Message-Passing Modes

13.2 Message-Passing Interface (MPI)

13.2.1 MPI Message

13.2.2 Message Envelope in MPI

13.2.3 Point-to-Point Communications

13.2.4 Collective MPI Communications

13.2.5 The MPI-2 Extensions

13.3 Parallel Virtual Machine (PVM)

13.3.1 Virtual Machine Construction

13.3.2 Process Management in PVM

13.3.3 Communication with PVM

13.4 Bibliographic Notes and Problems

Chapter 14 Data-Parallel Programming

14.1 The Data-Parallel Model

14.2 The Fortran 90 Approach

14.2.1 Parallel Array Operations

14.2.2 Intrinsic Functions in Fortran 90

14.3 High-Performance Fortran

14.3.1 Support for Data Parallelism

14.3.2 Data Mapping in HPF

14.3.3 Summary of Fortran 90 and HPF

14.4 Other Data-Parallel Approaches

14.4.1 Fortran 95 and Fortran 2001

14.4.2 The pC++ and Nesl Approaches

14.5 Bibliographic Notes and Problems

Bibliography

Web Resources List

Subject Index

Author Index

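Scalable Parallel Computing: Technology, Architecture, Programming (English edition), by Kai Hwang (黄凯) and Zhiwei Xu (徐志伟), was published in 1999 by China Machine Press (机械工业出版社), Beijing. The edition is old and out of print, so physical copies are almost impossible to buy.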
