GPUs
What is a GPU?
A GPU (Graphics Processing Unit) is a hardware device optimized for performing computations on large quantities of data in parallel. A GPU contains many Streaming Multiprocessors (SMs), and each SM contains many Arithmetic Units (AUs), which perform the actual operations. Even though the clock speed of a CPU core is often roughly twice that of an Arithmetic Unit (~4GHz vs ~2GHz), GPUs often have an order of magnitude more Arithmetic Units than CPUs have cores (~1,000 vs ~100), which allows GPUs to perform operations at an order of magnitude higher throughput than CPUs. GPUs have a SIMD (single instruction, multiple data) architecture, which means that all of the Arithmetic Units on a Streaming Multiprocessor must perform the same operation on the same clock cycle.
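To make the SIMD execution model concrete, here is a minimal CUDA sketch (the kernel name `scale`, the array size, and the launch configuration are arbitrary choices for illustration): many threads all run the same instruction stream, each on a different element of an array.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Every thread runs this same function; the only difference between
// threads is the index they compute, so each one touches different data.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = data[i] * factor;   // same instruction, different element
    }
}

int main() {
    const int n = 1 << 20;            // ~1M elements (illustrative size)
    float *data;

    // Managed memory is visible to both the CPU and the GPU,
    // which keeps this sketch short.
    cudaMallocManaged(&data, n * sizeof(float));
    for (int i = 0; i < n; i++) data[i] = 1.0f;

    // Launch enough threads that every element gets its own thread.
    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    scale<<<blocks, threadsPerBlock>>>(data, 2.0f, n);
    cudaDeviceSynchronize();

    printf("data[0] = %f\n", data[0]);  // prints 2.0
    cudaFree(data);
    return 0;
}
```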
A thread is the smallest unit of work that can be performed on a GPU; it is a sequence of instructions. A Thread Block is a group of threads that gets assigned to an SM. A Thread Block is divided into Warps, which are fixed-size groups of threads, typically 32. All threads in a warp execute the same instruction at the same time, but on different data.
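A short sketch of how that hierarchy appears in CUDA code; the kernel name `whoami` and the launch configuration of 2 blocks with 64 threads each are made-up choices for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread can see where it sits in the hierarchy: its index within
// the block, its block's index within the grid, and (derived from those)
// which warp it belongs to.
__global__ void whoami() {
    int global_id = blockIdx.x * blockDim.x + threadIdx.x;
    int warp_id   = threadIdx.x / warpSize;   // warpSize is 32 on current GPUs
    int lane      = threadIdx.x % warpSize;   // position within the warp
    if (lane == 0) {  // print once per warp to keep the output readable
        printf("block %d, warp %d, first thread's global id %d\n",
               blockIdx.x, warp_id, global_id);
    }
}

int main() {
    // 2 thread blocks of 64 threads each -> each block holds 2 warps.
    whoami<<<2, 64>>>();
    cudaDeviceSynchronize();
    return 0;
}
```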
GPUs have their own memory. Each SM has its own L1 cache, and all the SMs share an L2 cache. To perform a computation on the GPU, you first have to transfer the data from main memory to the GPU's memory. This is typically done over the PCIe bus by issuing a command to the DMA controller. After the GPU has finished the computation, you have to transfer the results back to main memory.
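The host side of that workflow might look roughly like the following sketch, where `square` is a stand-in kernel: allocate a buffer in GPU memory, copy the input from main memory to the device, run the kernel, and copy the results back.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void square(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * x[i];
}

int main() {
    const int n = 1024;
    const size_t bytes = n * sizeof(float);

    // Input lives in main (host) memory.
    float host[n];
    for (int i = 0; i < n; i++) host[i] = (float)i;

    // Allocate a buffer in the GPU's own memory.
    float *device;
    cudaMalloc(&device, bytes);

    // Copy host -> device (the transfer over PCIe, typically carried out
    // by the DMA engine rather than the CPU itself).
    cudaMemcpy(device, host, bytes, cudaMemcpyHostToDevice);

    square<<<(n + 255) / 256, 256>>>(device, n);

    // Copy the results back device -> host once the kernel has finished.
    cudaMemcpy(host, device, bytes, cudaMemcpyDeviceToHost);

    printf("host[3] = %f\n", host[3]);  // prints 9.0
    cudaFree(device);
    return 0;
}
```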
What is a device driver?
A device driver is a piece of code that tells the operating system how to communicate with hardware devices. On Linux, drivers are implemented as kernel modules that get loaded into the operating system.
- Architecture
- Interfacing with CPU and Memory
- Driver
- Code