This page is the syllabus for the NVIDIA/UIUC Accelerated Computing Teaching Kit. It outlines how each module is organized in the downloaded Teaching Kit .zip file, listing the content and associated file names for every module along with links to the suggested online Deep Learning Institute (DLI) content. It also provides links to stream the lecture videos (.mp4 files).

Module 1: Course Introduction |
|||||

In this module we review course goals and syllabus and introduce the concepts of heterogeneous and parallel programming. | |||||

Lectures: | |||||

1.1 Course Introduction and Overview | |||||

Lecture-1-1-overview.pdf | |||||

pptx | Lecture-1-1-overview.pptx | ||||

Video Lecture | |||||

1.2 Introduction to Heterogeneous Parallel Computing | |||||

Lecture-1-2-heterogeneous-computing.pdf | |||||

pptx | Lecture-1-2-heterogeneous-computing.pptx | ||||

Video Lecture | |||||

1.3 Portability and Scalability in Heterogeneous Parallel Computing | |||||

Lecture-1-3-portability-scalability.pdf | |||||

pptx | Lecture-1-3-portability-scalability.pptx | ||||

Video Lecture | |||||

Book Chapters: | |||||

Chapter 1 - Introduction: 3rd-Edition-Chapter01-introduction.pdf | |||||

Module 2: Introduction to CUDA C |
|||||

In this module we cover the basic API functions in CUDA host code and introduce CUDA threads, the main mechanism for exploiting data parallelism. | |||||

Lectures: | |||||

2.1 CUDA C vs. Thrust vs. CUDA Libraries | |||||

Lecture-2-1-cuda-thrust-libs.pdf | |||||

pptx | Lecture-2-1-cuda-thrust-libs.pptx | ||||

Video Lecture | |||||

2.2 Memory Allocation and Data Movement API Functions | |||||

Lecture-2-2-cuda-data-allocation-API.pdf | |||||

pptx | Lecture-2-2-cuda-data-allocation-API.pptx | ||||

Video Lecture | |||||

2.3 Threads and Kernel Functions | |||||

Lecture-2-3-cuda-parallelism-threads.pdf | |||||

pptx | Lecture-2-3-cuda-parallelism-threads.pptx | ||||

Video Lecture | |||||

2.4 Introduction to the CUDA Toolkit | |||||

Lecture-2-4-cuda-toolkit.pdf | |||||

pptx | Lecture-2-4-cuda-toolkit.pptx | ||||

Video Lecture | |||||

2.5 Nsight Compute and Nsight Systems | |||||

Lecture-2-5-nsight-systems-compute.pdf | |||||

pptx | Lecture-2-5-nsight-systems-compute.pptx | ||||

Video Lecture | |||||

2.6 Unified Memory | |||||

Lecture-2-6-unified-memory.pdf | |||||

pptx | Lecture-2-6-unified-memory.pptx | ||||

Video Lecture | |||||

Labs: | |||||

Device Query: Module[2]-DeviceQuery.pdf | |||||

CUDA Toolkit: Lab-2.4.cuda-toolkit.zip | |||||

Quiz: | |||||

Module 2 Quiz.pdf | |||||

Book Chapters: | |||||

Chapter 2 - Data Parallel Computing: 3rd-Edition-Chapter02-data-parallel-computing.pdf | |||||

Module 3: CUDA Parallelism Model |
|||||

In this module we introduce the CUDA kernel, efficient memory access patterns, and thread scheduling. | |||||

Lectures: | |||||

3.1 Kernel-Based SPMD Parallel Programming | |||||

Lecture-3-1-kernel-SPMD-parallelism.pdf | |||||

pptx | Lecture-3-1-kernel-SPMD-parallelism.pptx | ||||

Video Lecture | |||||

3.2 Multidimensional Kernel Configuration | |||||

Lecture-3-2-kernel-multidimension.pdf | |||||

pptx | Lecture-3-2-kernel-multidimension.pptx | ||||

Video Lecture | |||||

3.3 Color-to-Grayscale Image Processing Example | |||||

Lecture-3-3-color-to-greyscale-image-processing-example.pdf | |||||

pptx | Lecture-3-3-color-to-greyscale-image-processing-example.pptx | ||||

Video Lecture | |||||

3.4 Image Blur Example | |||||

Lecture-3-4-blur-kernel.pdf | |||||

pptx | Lecture-3-4-blur-kernel.pptx | ||||

Video Lecture | |||||

3.5 Thread Scheduling | |||||

Lecture-3-5-transparent-scaling.pdf | |||||

pptx | Lecture-3-5-transparent-scaling.pptx | ||||

Video Lecture | |||||

Labs: | |||||

NVIDIA DLI Online Course: Fundamentals of Accelerated Computing with CUDA C/C++, Section 1: Accelerating Applications with CUDA C/C++ | |||||

CUDA Image Blur: Module[3]-ImageBlur.pdf | |||||

CUDA Image Color to Grayscale: Module[3]-ImageColorToGrayscale.pdf | |||||

CUDA Thrust Vector Add: Module[3]-ThrustVectorAdd.pdf | |||||

CUDA Vector Add: Module[3]-VectorAdd.pdf | |||||

Quiz: | |||||

Module 3 Quiz.pdf | |||||

Book Chapters: | |||||

Chapter 3 - Scalable Parallel Execution: 3rd-Edition-Chapter03-scalable-parallel-execution.pdf | |||||

Module 4: Memory and Data Locality |
|||||

In this module we introduce the CUDA memory types and explore their effective use in tiled parallel algorithms. | |||||

Lectures: | |||||

4.1 CUDA Memories | |||||

Lecture-4-1-cuda-memories.pdf | |||||

pptx | Lecture-4-1-cuda-memories.pptx | ||||

Video Lecture | |||||

4.2 Tiled Parallel Algorithms | |||||

Lecture-4-2-tiled-algorithms.pdf | |||||

pptx | Lecture-4-2-tiled-algorithms.pptx | ||||

Video Lecture | |||||

4.3 Tiled Matrix Multiplication | |||||

Lecture-4-3-tiled-matrix-multiplication.pdf | |||||

pptx | Lecture-4-3-tiled-matrix-multiplication.pptx | ||||

Video Lecture | |||||

4.4 Tiled Matrix Multiplication Kernel | |||||

Lecture-4-4-tiled-matrix-multiplication-kernel.pdf | |||||

pptx | Lecture-4-4-tiled-matrix-multiplication-kernel.pptx | ||||

Video Lecture | |||||

4.5 Handling Arbitrary Matrix Sizes in Tiled Algorithms | |||||

Lecture-4-5-tile-boundary-condition.pdf | |||||

pptx | Lecture-4-5-tile-boundary-condition.pptx | ||||

Video Lecture | |||||

Labs: | |||||

NVIDIA DLI Online Course: Fundamentals of Accelerated Computing with CUDA C/C++, Section 2: Managing Accelerated Application Memory with CUDA Unified Memory and nsys | |||||

Basic Matrix Multiplication: Module[4]-BasicMatrixMultiplication.pdf | |||||

CUDA Tiled Matrix Multiplication: Module[4]-TiledMatrixMultiplication.pdf | |||||

Quiz: | |||||

Module 4 Quiz.pdf | |||||

Book Chapters: | |||||

Chapter 4 - Memory and Data Locality: 3rd-Edition-Chapter04-memory-and-data-locality.pdf | |||||

Module 5: Thread Execution Efficiency |
|||||

In this module we explore how CUDA threads execute on SIMD hardware and how to analyze the performance impact of control divergence. | |||||

Lectures: | |||||

5.1 Warps and SIMD Hardware | |||||

Lecture-5-1-warps-simd.pdf | |||||

pptx | Lecture-5-1-warps-simd.pptx | ||||

Video Lecture | |||||

5.2 Performance Impact of Control Divergence | |||||

Lecture-5-2-control-divergence.pdf | |||||

pptx | Lecture-5-2-control-divergence.pptx | ||||

Video Lecture | |||||

Quiz: | |||||

Module 5 Quiz.pdf | |||||

Book Chapters: | |||||

Chapter 5 - Performance Considerations: 3rd-Edition-Chapter05-performance-considerations.pdf | |||||

Module 6: Memory Access Performance |
|||||

In this module we explore the significance of memory coalescing to effectively utilize memory bandwidth in CUDA. | |||||

Lectures: | |||||

6.1 DRAM Bandwidth | |||||

Lecture-6-1-dram-bandwidth.pdf | |||||

pptx | Lecture-6-1-dram-bandwidth.pptx | ||||

Video Lecture | |||||

6.2 Memory Coalescing in CUDA | |||||

Lecture-6-2-memory-coalescing.pdf | |||||

pptx | Lecture-6-2-memory-coalescing.pptx | ||||

Video Lecture | |||||

Quiz: | |||||

Module 6 Quiz.pdf | |||||

Book Chapters: | |||||

Chapter 5 - Performance Considerations: 3rd-Edition-Chapter05-performance-considerations.pdf | |||||

Module 7: Parallel Computation Patterns (Histogram) |
|||||

In this module we introduce the parallel histogram computation pattern and learn to write a high-performance kernel by privatizing outputs. | |||||

Lectures: | |||||

7.1 Histogramming | |||||

Lecture-7-1-histogram.pdf | |||||

pptx | Lecture-7-1-histogram.pptx | ||||

Video Lecture | |||||

7.2 Introduction to Data Races | |||||

Lecture-7-2-data-race.pdf | |||||

pptx | Lecture-7-2-data-race.pptx | ||||

Video Lecture | |||||

7.3 Atomic Operations in CUDA | |||||

Lecture-7-3-CUDA-Atomic.pdf | |||||

pptx | Lecture-7-3-CUDA-Atomic.pptx | ||||

Video Lecture | |||||

7.4 Atomic Operation Performance | |||||

Lecture-7-4-atomic-performance.pdf | |||||

pptx | Lecture-7-4-atomic-performance.pptx | ||||

Video Lecture | |||||

7.5 Privatization Technique for Improved Throughput | |||||

Lecture-7-5-privatized-histogram.pdf | |||||

pptx | Lecture-7-5-privatized-histogram.pptx | ||||

Video Lecture | |||||

Labs: | |||||

Histogram: Module[7]-Histogram.pdf | |||||

Text Histogram: Module[7]-TextHistogram.pdf | |||||

Thrust Histogram Sort: Module[7]-ThrustHistogramSort.pdf | |||||

Quiz: | |||||

Module 7 Quiz.pdf | |||||

Book Chapters: | |||||

Chapter 11 - Parallel Patterns: Parallel Histogram Computation: 3rd-Edition-Chapter09-parallel-histogram-conputation.pdf | |||||

Module 8: Parallel Computation Patterns (Stencil) |
|||||

In this module we introduce the tiled convolution pattern. We will learn to analyze the cost and benefit of tiled parallel convolution algorithms. | |||||

Lectures: | |||||

8.1 Convolution | |||||

Lecture-8-1-convolution.pdf | |||||

pptx | Lecture-8-1-convolution.pptx | ||||

Video Lecture | |||||

8.2 Tiled Convolution | |||||

Lecture-8-2-tiled-convolution.pdf | |||||

pptx | Lecture-8-2-tiled-convolution.pptx | ||||

Video Lecture | |||||

8.3 Tile Boundary Conditions | |||||

Lecture-8-3-tile-boundary-condition.pdf | |||||

pptx | Lecture-8-3-tile-boundary-condition.pptx | ||||

Video Lecture | |||||

8.4 Analyzing Data Reuse in Tiled Convolution | |||||

Lecture-8-4-convolution-reuse.pdf | |||||

pptx | Lecture-8-4-convolution-reuse.pptx | ||||

Video Lecture | |||||

Labs: | |||||

Convolution: Module[8]-Convolution.pdf | |||||

Stencil: Module[8]-Stencil.pdf | |||||

Quiz: | |||||

Module 8 Quiz.pdf | |||||

Book Chapters: | |||||

Chapter 7 - Parallel Patterns: Convolution: 3rd-Edition-Chapter07-convolution.pdf | |||||

Module 9: Parallel Computation Patterns (Reduction) |
|||||

In this module we introduce the parallel reduction pattern. | |||||

Lectures: | |||||

9.1 Parallel Reduction | |||||

Lecture-9-1-reduction.pdf | |||||

pptx | Lecture-9-1-reduction.pptx | ||||

Video Lecture | |||||

9.2 A Basic Reduction Kernel | |||||

Lecture-9-2-reduction-kernel.pdf | |||||

pptx | Lecture-9-2-reduction-kernel.pptx | ||||

Video Lecture | |||||

9.3 A Better Reduction Kernel | |||||

Lecture-9-3-better-reduction-kernel.pdf | |||||

pptx | Lecture-9-3-better-reduction-kernel.pptx | ||||

Video Lecture | |||||

Labs: | |||||

Reduction: Module[9]-Reduction.pdf | |||||

Thrust Reduction: Module[9]-ThrustReduction.pdf | |||||

Quiz: | |||||

Module 9 Quiz.pdf | |||||

Book Chapters: | |||||

Chapter 5 - Performance Considerations: 3rd-Edition-Chapter05-performance-considerations.pdf | |||||

Module 10: Parallel Computation Patterns (Scan) |
|||||

In this module we introduce the parallel scan (prefix sum) pattern. | |||||

Lectures: | |||||

10.1 Prefix Sum | |||||

Lecture-10-1-scan-parallel-prefix-sum.pdf | |||||

pptx | Lecture-10-1-scan-parallel-prefix-sum.pptx | ||||

Video Lecture | |||||

10.2 A Work-Inefficient Scan Kernel | |||||

Lecture-10-2-work-inefficient-scan-kernel.pdf | |||||

pptx | Lecture-10-2-work-inefficient-scan-kernel.pptx | ||||

Video Lecture | |||||

10.3 A Work-Efficient Parallel Scan Kernel | |||||

Lecture-10-3-work-efficient-scan-kernel.pdf | |||||

pptx | Lecture-10-3-work-efficient-scan-kernel.pptx | ||||

Video Lecture | |||||

10.4 More on Parallel Scan | |||||

Lecture-10-4-more-on-parallel-scan.pdf | |||||

pptx | Lecture-10-4-more-on-parallel-scan.pptx | ||||

Video Lecture | |||||

Labs: | |||||

List Scan: Module[10]-ListScan.pdf | |||||

Thrust List Scan: Module[10]-ThrustListScan.pdf | |||||

Quiz: | |||||

Module 10 Quiz.pdf | |||||

Book Chapters: | |||||

Chapter 9 - Parallel Patterns: Prefix Sum: 3rd-Edition-Chapter08-prefix-sum.pdf | |||||

Module 11: Breadth-First Search (BFS) Queue |
|||||

In this module we cover the breadth-first search (BFS) queue. | |||||

Labs: | |||||

Breadth-First Search Queue: Module[11]-BfsQueue.pdf | |||||

Module 12: Floating-Point Considerations |
|||||

In this module we introduce the fundamentals of floating-point representation. | |||||

Lectures: | |||||

12.1 Floating-Point Precision and Accuracy | |||||

Lecture-12-1-floating-point-basics.pdf | |||||

pptx | Lecture-12-1-floating-point-basics.pptx | ||||

Video Lecture | |||||

12.2 Numerical Stability | |||||

Lecture-12-2-numerical-stability.pdf | |||||

pptx | Lecture-12-2-numerical-stability.pptx | ||||

Video Lecture | |||||

Book Chapters: | |||||

Chapter 6 - Numerical Considerations: 3rd-Edition-Chapter06-numerical-considerations.pdf | |||||

Module 13: GPU as Part of the PC Architecture |
|||||

In this module we introduce how GPUs fit in the PC architecture. | |||||

Lectures: | |||||

13.1 GPU as Part of the PC Architecture | |||||

Lecture-13-GPU-in-PC-Architecture.pdf | |||||

pptx | Lecture-13-GPU-in-PC-Architecture.pptx | ||||

Video Lecture | |||||

Book Chapters: | |||||

Chapter 18 - Programming a Heterogeneous Computing Cluster: 3rd-Edition-Chapter18-heterogeneous-cluster.pdf | |||||

Module 14: Efficient Host-Device Data Transfer |
|||||

In this module we discuss important concepts involved in copying (transferring) data between host and device. | |||||

Lectures: | |||||

14.1 Pinned Host Memory | |||||

Lecture-14-1-Data-Transfer.pdf | |||||

pptx | Lecture-14-1-Data-Transfer.pptx | ||||

Video Lecture | |||||

14.2 Task Parallelism in CUDA | |||||

Lecture-14-2-CUDA-Streams.pdf | |||||

pptx | Lecture-14-2-CUDA-Streams.pptx | ||||

Video Lecture | |||||

14.3 Overlapping Data Transfer with Computation | |||||

Lecture-14-3-Overlap-Transfer.pdf | |||||

pptx | Lecture-14-3-Overlap-Transfer.pptx | ||||

Video Lecture | |||||

14.4 CUDA Unified Memory | |||||

Lecture-14-4-cuda-unified-memory.pdf | |||||

pptx | Lecture-14-4-cuda-unified-memory.pptx | ||||

Video Lecture | |||||

Labs: | |||||

NVIDIA DLI Online Course: Fundamentals of Accelerated Computing with CUDA C/C++, Section 3: Asynchronous Streaming, and Visual Profiling for Accelerated Applications with CUDA C/C++ | |||||

Vector Addition Using CUDA Streams: Module[14]-VectorAdd_Stream.pdf | |||||

Vector Addition Using Pinned Memory: Module[14]-PinnedMemoryStreamsVectorAdd.pdf | |||||

CUDA Unified Memory Matrix Multiplication: Module[14]-UMMatrixMultiplication.pdf | |||||

Quiz: | |||||

Module 14 Quiz.pdf | |||||

Book Chapters: | |||||

Chapter 18 - Programming a Heterogeneous Computing Cluster: 3rd-Edition-Chapter18-heterogeneous-cluster.pdf | |||||

Chapter 20 - More on CUDA and Graphics Processing Unit Computing: 3rd-Edition-Chapter20-more-cuda-gpu-computing.pdf | |||||

Module 15: Application Case Study: Advanced MRI Reconstruction |
|||||

In this module we introduce the MRI Reconstruction case study. | |||||

Lectures: | |||||

15.1 Advanced MRI Reconstruction | |||||

Lecture-15-1-MRI-reconstruction.pdf | |||||

pptx | Lecture-15-1-MRI-reconstruction.pptx | ||||

Video Lecture | |||||

15.2 Kernel Optimizations | |||||

Lecture-15-2-MRI-kernel-optimization.pdf | |||||

pptx | Lecture-15-2-MRI-kernel-optimization.pptx | ||||

Video Lecture | |||||

Book Chapters: | |||||

Chapter 14 - Application Case Study - Non-Cartesian Magnetic Resonance Imaging: 3rd-Edition-Chapter14-case-study-MRI.pdf | |||||

Module 16: Application Case Study: Electrostatic Potential Calculation |
|||||

In this module we introduce the Electrostatic Potential Calculation case study. | |||||

Lectures: | |||||

16.1 Electrostatic Potential Calculation - Part 1 | |||||

Lecture-16-1-VMD-case-study-Part1.pdf | |||||

pptx | Lecture-16-1-VMD-case-study-Part1.pptx | ||||

Video Lecture | |||||

16.2 Electrostatic Potential Calculation - Part 2 | |||||

Lecture-16-2-VMD-case-study-Part2.pdf | |||||

pptx | Lecture-16-2-VMD-case-study-Part2.pptx | ||||

Video Lecture | |||||

Module 17: Computational Thinking for Parallel Programming |
|||||

In this module we provide a framework for thinking about the problems of parallel programming. | |||||

Lectures: | |||||

17.1 Introduction to Computational Thinking | |||||

Lecture-17-1-Computational-Thinking.pdf | |||||

pptx | Lecture-17-1-Computational-Thinking.pptx | ||||

Book Chapters: | |||||

Chapter 17 - Parallel Programming and Computational Thinking: 3rd-Edition-Chapter17-computational-thinking.pdf | |||||

Module 18: Related Programming Models: MPI |
|||||

In this module we introduce the MPI programming model. | |||||

Lectures: | |||||

18.1 Introduction to Heterogeneous Supercomputing and MPI | |||||

Lecture-18-MPI-CUDA-intro.pdf | |||||

pptx | Lecture-18-MPI-CUDA-intro.pptx | ||||

Book Chapters: | |||||

Chapter 18 - Programming a Heterogeneous Computing Cluster: 3rd-Edition-Chapter18-heterogeneous-cluster.pdf | |||||

Module 19: CUDA Python using Numba |
|||||

In this module we introduce CUDA Python using Numba. | |||||

Labs: | |||||

NVIDIA DLI Online Course: Fundamentals of Accelerated Computing with CUDA Python | |||||

Module 20: Related Programming Models: OpenCL |
|||||

In this module we introduce the OpenCL programming model. | |||||

Lectures: | |||||

20.1 OpenCL Data Parallelism Model | |||||

Lecture-20-1-opencl-parallelism.pdf | |||||

pptx | Lecture-20-1-opencl-parallelism.pptx | ||||

20.2 OpenCL Device Architecture | |||||

Lecture-20-2-opencl-architecture.pdf | |||||

pptx | Lecture-20-2-opencl-architecture.pptx | ||||

20.3 OpenCL Host Code | |||||

Lecture-20-3-opencl-host-code.pdf | |||||

pptx | Lecture-20-3-opencl-host-code.pptx | ||||

Labs: | |||||

OpenCL Vector Addition: Module[20]-OpenCLVectorAddition.pdf | |||||

Quiz: | |||||

Module 20 Quiz.pdf | |||||

Book Chapters: | |||||

Appendix - An Introduction to OpenCL: 3rd-Edition-AppendixA-intro-to-OpenCL.pdf | |||||

Module 21: Related Programming Models: OpenACC |
|||||

In this module we introduce the OpenACC programming model. | |||||

Lectures: | |||||

21.1 Introduction to OpenACC | |||||

Lecture-21-1-openACC-intro.pdf | |||||

pptx | Lecture-21-1-openACC-intro.pptx | ||||

Video Lecture | |||||

21.2 OpenACC Subtleties | |||||

Lecture-21-2-openACC-subtleties.pdf | |||||

pptx | Lecture-21-2-openACC-subtleties.pptx | ||||

Video Lecture | |||||

Labs: | |||||

NVIDIA DLI Online Course: Fundamentals of Accelerated Computing with OpenACC | |||||

OpenACC CUDA Vector Add: Module[21]-OpenACCVectorAdd.pdf | |||||

Quiz: | |||||

Module 21 Quiz.pdf | |||||

Book Chapters: | |||||

Chapter 15 - Parallel Programming with OpenACC: 3rd-Edition-Chapter19-programming-with-OpenACC.pdf | |||||

Module 22: Related Programming Models: OpenGL |
|||||

In this module we introduce the OpenGL programming model. | |||||

(Module scheduled for a future release of the teaching kit.) |
|||||

Module 23: Dynamic Parallelism |
|||||

In this module we introduce dynamic parallelism. | |||||

Lectures: | |||||

23.1 Dynamic Parallelism | |||||

Lecture-23-Dynamic-parallelism.pdf | |||||

pptx | Lecture-23-Dynamic-parallelism.pptx | ||||

Video Lecture | |||||

Labs: | |||||

Dynamic Parallelism: Module[23]-DynamicParallelism.pdf | |||||

Book Chapters: | |||||

Chapter 13 - CUDA Dynamic Parallelism: 3rd-Edition-Chapter13-cuda-dynamic-parallelism | |||||

Module 24: Multi-GPU |
|||||

In this module we discuss programming with multiple GPUs. | |||||

Lectures: | |||||

24.1 OpenMP | |||||

Lecture-24-1-openmp.pdf | |||||

pptx | Lecture-24-1-openmp.pptx | ||||

24.2 Multi-GPU Introduction I | |||||

Lecture-24-2-multi-gpu-introduction-i.pdf | |||||

pptx | Lecture-24-2-multi-gpu-introduction-i.pptx | ||||

24.3 Multi-GPU Introduction II | |||||

Lecture-24-3-multi-gpu-introduction-ii.pdf | |||||

pptx | Lecture-24-3-multi-gpu-introduction-ii.pptx | ||||

24.4 OpenMP and Cooperative Groups | |||||

Lecture-24-4-openmp-and-cooperative-groups.pdf | |||||

pptx | Lecture-24-4-openmp-and-cooperative-groups.pptx | ||||

24.5 Multi-GPU Heat Equation | |||||

Lecture-24-5-multi-gpu-heat-equation.pdf | |||||

pptx | Lecture-24-5-multi-gpu-heat-equation.pptx | ||||

Labs: | |||||

Multi-GPU Heat Equation: Module[24]-HeatEquation.pdf | |||||

Quiz: | |||||

Module 24 Quiz.pdf | |||||

Module 25: Using CUDA Libraries |
|||||

In this module we introduce the effective use of CUDA libraries. | |||||

Lectures: | |||||

25.1 cuBLAS | |||||

Lecture-25-1-cublas.pdf | |||||

pptx | Lecture-25-1-cublas.pptx | ||||

25.2 cuSOLVER | |||||

Lecture-25-2-cusolver.pdf | |||||

pptx | Lecture-25-2-cusolver.pptx | ||||

25.3 cuFFT | |||||

Lecture-25-3-cufft.pdf | |||||

pptx | Lecture-25-3-cufft.pptx | ||||

25.4 Thrust | |||||

Lecture-25-4-thrust.pdf | |||||

pptx | Lecture-25-4-thrust.pptx | ||||

Labs: | |||||

Heat Equation with NVIDIA libraries: Module[25]-HeatEquationLibs.pdf | |||||

Quiz: | |||||

Module 25 Quiz.pdf | |||||

Book Chapters: | |||||

Thrust: A Productivity-Oriented Library for CUDA: 3rd-Edition-AppendixB-thrust | |||||

Module 26: Advanced Thrust |
|||||

In this module we discuss advanced Thrust topics. | |||||

(Module scheduled for a future release of the teaching kit.) |
|||||

© 2024 - NVIDIA & University of Illinois