PARALLEL COMPUTING
Course Code: BCS702
Teaching Hours/Week (L:T:P:S): 3:0:2:0
Total Hours of Pedagogy: 40 hours Theory + 8-10 Lab slots
Credits: 04
CIE Marks: 50
SEE Marks: 50
Total Marks: 100
Exam Hours: 03
Examination nature (SEE): Theory/Practical
MODULE-1
Introduction to parallel programming, Parallel hardware and parallel software –
Classifications of parallel computers, SIMD systems, MIMD systems, Interconnection networks,
Cache coherence, Shared-memory vs. distributed-memory, Coordinating the processes/threads,
Shared-memory, Distributed-memory.
MODULE-2
GPU programming, Programming hybrid systems, MIMD systems, GPUs, Performance –
Speedup and efficiency in MIMD systems, Amdahl’s law, Scalability in MIMD systems, Taking
timings of MIMD programs, GPU performance.
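For example, Amdahl's law bounds the speedup of a program whose parallelizable fraction is p at S(n) = 1 / ((1 - p) + p/n) on n cores: with p = 0.9 and n = 8 this gives S = 1 / (0.1 + 0.1125) ≈ 4.7, and no number of cores can push the speedup past 1 / (1 - p) = 10.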
MODULE-3
Distributed memory programming with MPI – MPI functions, The trapezoidal rule in MPI,
Dealing with I/O, Collective communication, MPI-derived datatypes, Performance evaluation of
MPI programs, A parallel sorting algorithm.
MODULE-4
Shared-memory programming with OpenMP – OpenMP pragmas and directives, The trapezoidal
rule, Scope of variables, The reduction clause, Loop-carried dependencies, Scheduling, Producers
and consumers, Caches, cache coherence and false sharing in OpenMP, Tasking, Thread safety.
MODULE-5
GPU programming with CUDA – GPUs and GPGPU, GPU architectures, Heterogeneous
computing, Threads, blocks, and grids, Nvidia compute capabilities and device architectures,
Vector addition, Returning results from CUDA kernels, CUDA trapezoidal rule I, CUDA
trapezoidal rule II: improving performance, CUDA trapezoidal rule III: blocks with more than one
warp.
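The vector-addition topic above is the usual first CUDA kernel. A minimal sketch, assuming a float array of 2^20 elements and 256-thread blocks (all names and sizes are illustrative choices, not prescribed by the syllabus):

    #include <stdio.h>
    #include <stdlib.h>
    #include <cuda_runtime.h>

    /* Each thread computes one element, selected by its global index. */
    __global__ void vecAdd(const float *x, const float *y, float *z, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) z[i] = x[i] + y[i];
    }

    int main(void) {
        int n = 1 << 20;
        size_t bytes = n * sizeof(float);
        float *x = (float *)malloc(bytes), *y = (float *)malloc(bytes);
        float *z = (float *)malloc(bytes);
        for (int i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }

        float *dx, *dy, *dz;
        cudaMalloc(&dx, bytes); cudaMalloc(&dy, bytes); cudaMalloc(&dz, bytes);
        cudaMemcpy(dx, x, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dy, y, bytes, cudaMemcpyHostToDevice);

        /* 256 threads per block = 8 warps of 32 threads each. */
        int threads = 256, blocks = (n + threads - 1) / threads;
        vecAdd<<<blocks, threads>>>(dx, dy, dz, n);
        cudaMemcpy(z, dz, bytes, cudaMemcpyDeviceToHost);  /* implicit sync */

        printf("z[0] = %f\n", z[0]);   /* expect 3.0 */
        cudaFree(dx); cudaFree(dy); cudaFree(dz);
        free(x); free(y); free(z);
        return 0;
    }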
PRACTICAL COMPONENT OF IPCC
Experiments (a minimal, illustrative sketch for each experiment follows this list)
1 Write an OpenMP program to sort an array of n elements using both sequential and parallel
mergesort (using the sections construct). Record the difference in execution time.
2 Write an OpenMP program that divides the iterations of a parallelized for loop into chunks
of 2 iterations each (OMP_SCHEDULE=static,2). Its input should be the number of iterations,
and its output should show which iterations are executed by which thread.
For example, if there are two threads and four iterations, the output might be the following:
a. Thread 0 : Iterations 0 -- 1
b. Thread 1 : Iterations 2 -- 3
3 Write an OpenMP program to calculate the first n Fibonacci numbers using tasks.
4 Write an OpenMP program to find the prime numbers from 1 to n using the parallel for
directive. Record both serial and parallel execution times.
5 Write an MPI program to demonstrate MPI_Send and MPI_Recv.
6 Write an MPI program to demonstrate deadlock using point-to-point communication, and its
avoidance by altering the call sequence.
7 Write an MPI program to demonstrate the broadcast operation (MPI_Bcast).
8 Write an MPI program to demonstrate MPI_Scatter and MPI_Gather.
9 Write an MPI program to demonstrate MPI_Reduce and MPI_Allreduce with the MPI_MAX,
MPI_MIN, MPI_SUM, and MPI_PROD operations.
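Sketch for experiment 1: parallel mergesort using OpenMP sections to sort the two halves concurrently, assuming a random int array and an illustrative recursion cutoff of 1000.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <omp.h>

    /* Merge the sorted halves a[lo..mid] and a[mid+1..hi] through tmp. */
    static void merge(int a[], int tmp[], int lo, int mid, int hi) {
        int i = lo, j = mid + 1, k = lo;
        while (i <= mid && j <= hi) tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
        while (i <= mid) tmp[k++] = a[i++];
        while (j <= hi)  tmp[k++] = a[j++];
        memcpy(a + lo, tmp + lo, (hi - lo + 1) * sizeof(int));
    }

    static void msort_serial(int a[], int tmp[], int lo, int hi) {
        if (lo >= hi) return;
        int mid = lo + (hi - lo) / 2;
        msort_serial(a, tmp, lo, mid);
        msort_serial(a, tmp, mid + 1, hi);
        merge(a, tmp, lo, mid, hi);
    }

    /* Sort the two halves in parallel sections, then merge them. */
    static void msort_parallel(int a[], int tmp[], int lo, int hi) {
        if (hi - lo < 1000) { msort_serial(a, tmp, lo, hi); return; }
        int mid = lo + (hi - lo) / 2;
        #pragma omp parallel sections
        {
            #pragma omp section
            msort_parallel(a, tmp, lo, mid);
            #pragma omp section
            msort_parallel(a, tmp, mid + 1, hi);
        }
        merge(a, tmp, lo, mid, hi);
    }

    int main(void) {
        int n = 1000000;
        int *a = malloc(n * sizeof(int)), *b = malloc(n * sizeof(int));
        int *tmp = malloc(n * sizeof(int));
        for (int i = 0; i < n; i++) a[i] = b[i] = rand();

        double t = omp_get_wtime();
        msort_serial(a, tmp, 0, n - 1);
        printf("serial:   %.4f s\n", omp_get_wtime() - t);

        t = omp_get_wtime();
        msort_parallel(b, tmp, 0, n - 1);
        printf("parallel: %.4f s\n", omp_get_wtime() - t);

        free(a); free(b); free(tmp);
        return 0;
    }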
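Sketch for experiment 2: schedule(runtime) defers the schedule choice to the OMP_SCHEDULE environment variable. This prints one line per iteration; grouping the lines into ranges like the sample output is a small post-processing step left to the student.

    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    int main(int argc, char *argv[]) {
        int n = (argc > 1) ? atoi(argv[1]) : 4;   /* number of iterations */
        /* schedule(runtime) reads OMP_SCHEDULE, e.g. OMP_SCHEDULE="static,2" */
        #pragma omp parallel for schedule(runtime)
        for (int i = 0; i < n; i++)
            printf("Thread %d : iteration %d\n", omp_get_thread_num(), i);
        return 0;
    }

Run, for example, as: OMP_NUM_THREADS=2 OMP_SCHEDULE="static,2" ./schedule 4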
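Sketch for experiment 3: task-parallel Fibonacci. The if(k > 20) clause (an illustrative threshold) keeps tiny subproblems from being deferred as tasks.

    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    long fib(int k) {
        if (k < 2) return k;
        long a, b;
        #pragma omp task shared(a) if(k > 20)
        a = fib(k - 1);
        #pragma omp task shared(b) if(k > 20)
        b = fib(k - 2);
        #pragma omp taskwait          /* wait for both child tasks */
        return a + b;
    }

    int main(int argc, char *argv[]) {
        int n = (argc > 1) ? atoi(argv[1]) : 10;
        #pragma omp parallel
        #pragma omp single            /* one thread creates the root tasks */
        for (int k = 0; k < n; k++)
            printf("fib(%d) = %ld\n", k, fib(k));
        return 0;
    }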
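Sketch for experiment 4: primes by trial division with a reduction. The dynamic schedule (chunk size illustrative) balances the uneven cost of testing large numbers.

    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    static int is_prime(int k) {
        if (k < 2) return 0;
        for (int d = 2; (long)d * d <= k; d++)
            if (k % d == 0) return 0;
        return 1;
    }

    int main(int argc, char *argv[]) {
        int n = (argc > 1) ? atoi(argv[1]) : 1000000;
        int count = 0;

        double t = omp_get_wtime();
        for (int k = 1; k <= n; k++) count += is_prime(k);
        printf("serial:   %d primes, %.4f s\n", count, omp_get_wtime() - t);

        count = 0;
        t = omp_get_wtime();
        #pragma omp parallel for reduction(+:count) schedule(dynamic, 1000)
        for (int k = 1; k <= n; k++) count += is_prime(k);
        printf("parallel: %d primes, %.4f s\n", count, omp_get_wtime() - t);
        return 0;
    }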
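Sketch for experiment 5: rank 0 sends one int (the value 42 is an illustrative payload) to rank 1. Run with mpirun -np 2.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int rank, msg;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            msg = 42;
            MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            printf("rank 0 sent %d\n", msg);
        } else if (rank == 1) {
            MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", msg);
        }
        MPI_Finalize();
        return 0;
    }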
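Sketch for experiment 6: compile with -DDEADLOCK to watch both ranks block inside MPI_Send. The message size is an assumption; it must exceed the implementation's eager threshold so that MPI_Send cannot complete by internal buffering.

    #include <stdio.h>
    #include <mpi.h>

    #define N (1 << 20)            /* big enough to force blocking sends */
    static double sendbuf[N], recvbuf[N];

    int main(int argc, char *argv[]) {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        int other = 1 - rank;       /* run with exactly 2 processes */

    #ifdef DEADLOCK
        /* Both ranks send first: each waits for a receive that is
           never posted, a classic point-to-point deadlock. */
        MPI_Send(sendbuf, N, MPI_DOUBLE, other, 0, MPI_COMM_WORLD);
        MPI_Recv(recvbuf, N, MPI_DOUBLE, other, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    #else
        /* Altered call sequence: rank 0 sends then receives,
           rank 1 receives then sends. */
        if (rank == 0) {
            MPI_Send(sendbuf, N, MPI_DOUBLE, other, 0, MPI_COMM_WORLD);
            MPI_Recv(recvbuf, N, MPI_DOUBLE, other, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else {
            MPI_Recv(recvbuf, N, MPI_DOUBLE, other, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(sendbuf, N, MPI_DOUBLE, other, 0, MPI_COMM_WORLD);
        }
    #endif
        printf("rank %d done\n", rank);
        MPI_Finalize();
        return 0;
    }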
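Sketch for experiment 7: the root's value reaches every rank after the broadcast.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int rank, value = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) value = 100;   /* only the root holds the data */
        MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
        printf("rank %d has value %d\n", rank, value);
        MPI_Finalize();
        return 0;
    }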
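Sketch for experiment 8: the root deals one element to each rank, every rank transforms its piece, and the root collects the results. The fixed 64-slot buffers are an assumption (at most 64 processes).

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int data[64], part, result[64];   /* assumes size <= 64 */
        if (rank == 0)
            for (int i = 0; i < size; i++) data[i] = i * i;

        MPI_Scatter(data, 1, MPI_INT, &part, 1, MPI_INT, 0, MPI_COMM_WORLD);
        part *= 2;                        /* each rank works on its piece */
        MPI_Gather(&part, 1, MPI_INT, result, 1, MPI_INT, 0, MPI_COMM_WORLD);

        if (rank == 0)
            for (int i = 0; i < size; i++)
                printf("result[%d] = %d\n", i, result[i]);
        MPI_Finalize();
        return 0;
    }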
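Sketch for experiment 9: MPI_Reduce leaves the result only on the root, while MPI_Allreduce leaves it on every rank; here each rank contributes rank + 1.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        int x = rank + 1;                 /* ranks contribute 1, 2, ..., p */
        int max, min, sum, prod;

        MPI_Reduce(&x, &max, 1, MPI_INT, MPI_MAX, 0, MPI_COMM_WORLD);
        MPI_Reduce(&x, &min, 1, MPI_INT, MPI_MIN, 0, MPI_COMM_WORLD);
        MPI_Reduce(&x, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("reduce: max=%d min=%d sum=%d\n", max, min, sum);

        MPI_Allreduce(&x, &prod, 1, MPI_INT, MPI_PROD, MPI_COMM_WORLD);
        printf("rank %d: allreduce prod=%d\n", rank, prod);

        MPI_Finalize();
        return 0;
    }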
Suggested Learning Resources:
1. Peter S. Pacheco, Matthew Malensek – An Introduction to Parallel Programming, Second Edition, Morgan Kaufmann.
2. Michael J. Quinn – Parallel Programming in C with MPI and OpenMP, McGraw-Hill.
Reference Books:
1. Calvin Lin, Lawrence Snyder – Principles of Parallel Programming, Pearson.
2. Barbara Chapman – Using OpenMP: Portable Shared Memory Parallel Programming,
Scientific and Engineering Computation series, MIT Press.
3. William Gropp, Ewing Lusk – Using MPI: Portable Parallel Programming, Third Edition,
Scientific and Engineering Computation series, MIT Press.
