CS704: Advanced Computer Architecture-II

Course Overview

Course Synopsis

This graduate-level course builds on the concepts presented in the undergraduate computer architecture course. It emphasizes advances in the field, examined through cost-performance-power trade-offs and sound engineering design of computers. The course introduces the quantitative principles of computer design, performance enhancement methodologies, static and dynamic exploitation of instruction-level parallelism in high-performance processors, and performance enhancement of memory and input/output systems.

Course Learning Outcomes

Upon successful completion of this course, students should be able to:

  • Understand the quantitative principles of computer design and metrics for performance measurement.
  • Use benchmarks to analyze the performance of different architectures.
  • Exploit instruction-level parallelism using static and dynamic techniques in high-performance processors, including superscalar execution.
  • Recognize centralized and distributed shared-memory multiprocessor architectures.
  • Design memory hierarchies and storage systems for optimum performance.
  • Be acquainted with the design of input/output systems and their performance benchmarks.

Course Calendar

Topics | Lecture | Reference
History and Introduction | 1 | Hennessy and Patterson (2002)
Growth in Processor Performance, Price-Performance Design, CPU Performance Metrics, CPU Benchmark Suites | 2 | Hennessy and Patterson (2002)
I/O Performance, Performance Enhancement, Concluding: Quantitative Principles | 3 | Hennessy and Patterson (2002)
ISA Taxonomy, Memory Addressing Modes, Types of Operands, Types of Operations | 4 | Hennessy and Patterson (2002)
Instruction Set Encoding, MIPS Instruction Set | 5 | Hennessy and Patterson (2002)
DSP Media Operations, ISA Performance: Putting It All Together | 6 | Hennessy and Patterson (2002)
Basics of Computer Hardware Design, Single Cycle Design: Data Path Design, Control Design | 7 | Hennessy and Patterson (1998)
Example of Single Cycle Design, Multi Cycle Design: Datapath | 8 | Stallings (2003)
Assignment No. 1 | - | -
Features of Multi Cycle Design, Multi Cycle Control Design, Introduction to Pipeline Datapath | 9 | Stallings (2003)
Key Components of Pipeline Datapath, Performance Enhancement due to Pipeline, Hazards in Pipelined Datapath | 10 | Hennessy and Patterson (2002)
Structural Hazards, Data Hazards, Control Hazards | 11 | Hennessy and Patterson (2002)
Longer Pipelines - FP Instructions, Loop Level Parallelism, FP Loop Hazards | 12 | Hennessy and Patterson (2002)
In-order Execution, Out-of-Order Execution, Scoreboard Technique | 13 | Hennessy and Patterson (2002)
Tomasulo's Approach | 14 | Hennessy and Patterson (2002)
Dynamic Branch Prediction, Branch Prediction Buffer | 15 | Hennessy and Patterson (2002)
Correlating Branch Predictors, Tournament Predictor, High Performance Instruction Delivery | 16 | Hennessy and Patterson (2002)
Assignment No. 2 | - | -
Superscalar Processors | 17 | Hennessy and Patterson (2002)
Hardware-based Speculation, Speculating on the Outcome of Branches, Extension of Tomasulo's Hardware | 18 | Hennessy and Patterson (2002)
Limitations of ILP | 19 | Hennessy and Patterson (2002)
Software Approaches to Exploit ILP | 20 | Hennessy and Patterson (2002)
Static Multiple Issue: VLIW Approach, Detecting and Enhancing Loop Level Parallelism | 21 | Hennessy and Patterson (2002)
Eliminating Dependent Computations, Software Pipelining, Trace Scheduling, Superblocks | 22 | Hennessy and Patterson (2002)
Mid-term Examination | - | -
H/W Support at Compile Time | 23 | Hennessy and Patterson (2002)
H/W Support at Compile Time (Cont.), Speculation Mechanism: H/W vs. S/W | 24 | Hennessy and Patterson (2002)
Storage Technologies, RAM and Enhanced DRAM, Disk Storage | 25 | Hennessy and Patterson (2002)
Concept of Cache Memory, Principle of Locality, Cache Addressing Techniques, RAM vs. Cache Transaction | 26 | Hennessy and Patterson (2002)
Cache Performance Metrics, Cache Designs, Addressing Techniques | 27 | Hennessy and Patterson (2002)
Placement and Replacement Policies, Cache Write Strategy, Cache Performance Enhancement | 28 | Hennessy and Patterson (2002)
Assignment No. 3 | - | -
Cache Performance, Reducing Miss Penalty | 29 | Hennessy and Patterson (2002)
Classification of Cache Misses, Reducing Cache Miss Rate | 30 | Hennessy and Patterson (2002)
Reducing Miss Penalty or Miss Rate using Parallelism, Reducing Hit Time | 31 | Hennessy and Patterson (2002)
Course Viva | - | -
Main Memory Performance, Virtual Memory Performance | 32 | Hennessy and Patterson (2002)
Virtual Memory Address Translation, Protection of Multiple Processes Sharing Memory | 33 | Hennessy and Patterson (2002)
Parallel Processing, Parallel Processing Architectures | 34 | Hennessy and Patterson (2002)
Multiprocessor Cache Coherence, Enforcing Coherence, Performance of Cache Coherence Schemes | 35 | Hennessy and Patterson (2002)
Example of Invalidation Scheme, Coherence in Distributed Memory Architecture | 36 | Hennessy and Patterson (2002)
Performance of Symmetric Shared-Memory and Distributed Shared-Memory Multiprocessors | 37 | Hennessy and Patterson (2002)
Disk Storage Systems, Interfacing Storage Devices | 38 | Hennessy and Patterson (2002)
I/O Interconnect Trends, Bus-based Interconnect, Bus Standards | 39 | Hennessy and Patterson (2002)
Redundant Array of Inexpensive Disks, I/O Benchmarks | 40 | Hennessy and Patterson (2002)
A Simple Network, Network Topology, Internetworking | 41 | Hennessy and Patterson (2002)
Switch Topologies, Clusters | 42 | Hennessy and Patterson (2002)
Internetworks, Clusters | 43 | Hennessy and Patterson (2002)
Case Studies | 44 | -
Review Lecture | 45 | -
Final-term Examination | - | -