CUDA 2.2
The CUDA Toolkit is a C language development environment for CUDA-enabled GPUs.
In a matter of a few years, the programmable graphics processor unit has developed into a computing workhorse. With multiple cores driven by very high memory bandwidth, today's GPUs offer substantial resources for both graphics and non-graphics processing.
The main reason behind this evolution is that the GPU is specialized for compute-intensive, highly parallel computation, which is exactly what graphics rendering requires. It is therefore designed so that more transistors are devoted to data processing than to data caching and flow control.
The CUDA development environment includes:
- nvcc C compiler
- CUDA FFT and BLAS libraries for the GPU
- Profiler
- gdb debugger for the GPU (alpha available in March, 2008)
- CUDA runtime driver (now also available in the standard NVIDIA GPU driver)
- CUDA programming manual
The CUDA Developer SDK provides examples with source code to help you get started with CUDA. Examples include:
- Parallel bitonic sort
- Matrix multiplication
- Matrix transpose
- Performance profiling using timers
- Parallel prefix sum (scan) of large arrays
- Image convolution
- 1D DWT using Haar wavelet
- OpenGL and Direct3D graphics interoperation examples
- CUDA BLAS and FFT library usage examples
- CPU-GPU C- and C++-code integration
- Binomial Option Pricing
- Black-Scholes Option Pricing
- Monte-Carlo Option Pricing
- Parallel Mersenne Twister (random number generation)
- Parallel Histogram
- Image Denoising
- Sobel Edge Detection Filter
- MathWorks MATLAB Plug-in
Main features:
- Standard C programming language enabled on a GPU.
- Unified hardware and software solution for parallel computing on CUDA-enabled NVIDIA GPUs.
- CUDA-compatible GPUs range from low-power notebook GPUs to high-performance, multi-GPU systems.
- CUDA-enabled GPUs support the Parallel Data Cache and Thread Execution Manager.
- Standard numerical libraries for FFT (Fast Fourier Transform) and BLAS (Basic Linear Algebra Subprograms).
- Dedicated CUDA driver for computing.
- Optimized direct upload and download path from the CPU to CUDA-enabled GPU.
- CUDA driver interoperates with OpenGL and DirectX graphics drivers.
- Support for Linux 32/64-bit and Windows XP 32/64-bit operating systems.
- Direct driver and assembly level access through CUDA for research and language development.
Enhancements
- Visual Profiler for the GPU - The most common step in tuning application performance is profiling the application and then modifying the code. The CUDA Visual Profiler is a graphical tool that enables the profiling of C applications running on the GPU. This latest release of the CUDA Visual Profiler includes metrics for memory transactions, giving developers visibility into one of the most important areas they can tune to get better performance.
- Improved OpenGL Interop - Delivers improved performance for medical imaging and other OpenGL applications running on Quadro GPUs when CUDA computation and OpenGL rendering are performed on different GPUs.
- Texture from Pitch Linear Memory - Delivers up to 2x bandwidth savings for video processing applications.
- Zero-copy - Enables streaming media, video transcoding, image processing and signal processing applications to realize significant performance improvements by allowing CUDA functions to read from and write to pinned system memory directly. This reduces the frequency and amount of data copied back and forth between GPU and CPU memory. Supported on MCP7x and GT200 and later GPUs.
- Pinned Shared Sysmem - Enables applications that use multiple GPUs to achieve better performance and use less total system memory by allowing multiple GPUs to access the same data in system memory. Typical multi-GPU systems include Tesla servers, Tesla Personal Supercomputers, workstations using QuadroPlex deskside units and consumer systems with multiple GPUs.
- Asynchronous memcopy on Vista - Allows applications to realize significant performance improvements by copying memory asynchronously. This feature was already available on other supported platforms but is now available on Vista.
- Hardware Debugger for the GPU - Developers can now use a hardware-level debugger on CUDA-enabled GPUs that offers the simplicity of the popular open-source GDB debugger, yet lets a developer debug a program that is running thousands of threads on the GPU. This CUDA GDB debugger for Linux has the features required to debug directly on the GPU, including the ability to set breakpoints, watch variables and inspect state.
- Exclusive Device Mode - This system configuration option allows an application to get exclusive use of a GPU, guaranteeing that 100% of the processing power and memory of the GPU will be dedicated to that application. Multiple applications can still be run concurrently on the system, but only one application can make use of each GPU at a time. This configuration is particularly useful on Tesla cluster systems where large applications may require dedicated use of one or more GPUs on each node of a Linux cluster.
Pinned Memory Support:
- These new memory management functions (cuMemHostAlloc() and cudaHostAlloc()) enable pinned memory to be made "portable" (available to all CUDA contexts), "mapped" (mapped into the CUDA address space), and/or "write combined" (not cached and faster for the GPU to access).
- cuMemHostAlloc
- cuMemHostGetDevicePointer
- cudaHostAlloc
- cudaHostGetDevicePointer
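As a rough illustration of the runtime-API calls listed above, the following host-side sketch allocates pinned memory that is both "portable" and "mapped" and then asks for the device-side address of the same buffer. It assumes device 0, an arbitrary buffer size, and a CUDA toolkit and GPU that support mapped host memory; the helper fail() is hypothetical and only shortens the error handling.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

/* Hypothetical helper: report a failed runtime call, return an exit code. */
static int fail(const char *what, cudaError_t err)
{
    fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
    return 1;
}

int main(void)
{
    const size_t bytes = 1024 * sizeof(float);  /* arbitrary example size */
    float *host_buf = NULL;  /* pinned host allocation */
    float *dev_view = NULL;  /* device-side address of the same memory */
    struct cudaDeviceProp prop;
    cudaError_t err;

    /* Mapping only works on devices that report support for it. */
    err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) return fail("cudaGetDeviceProperties", err);
    if (!prop.canMapHostMemory) {
        fprintf(stderr, "device 0 cannot map host memory\n");
        return 1;
    }

    /* Host-memory mapping must be requested before a context exists. */
    err = cudaSetDeviceFlags(cudaDeviceMapHost);
    if (err != cudaSuccess) return fail("cudaSetDeviceFlags", err);

    /* The flags are bit masks and can be combined with bitwise OR. */
    err = cudaHostAlloc((void **)&host_buf, bytes,
                        cudaHostAllocPortable | cudaHostAllocMapped);
    if (err != cudaSuccess) return fail("cudaHostAlloc", err);

    /* Translate the host pointer into a pointer usable by device code. */
    err = cudaHostGetDevicePointer((void **)&dev_view, host_buf, 0);
    if (err != cudaSuccess) {
        cudaFreeHost(host_buf);
        return fail("cudaHostGetDevicePointer", err);
    }

    printf("host %p is visible to the device as %p\n",
           (void *)host_buf, (void *)dev_view);

    cudaFreeHost(host_buf);
    return 0;
}
```

A kernel could then take dev_view in place of a pointer obtained from cudaMalloc(), which is what removes the explicit copy in the zero-copy case.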
Function attribute query:
- This function allows applications to query properties of a compiled kernel function, such as its register count and shared memory usage.
- cuFuncGetAttribute
2D Texture reads from pitch linear memory:
- You can bind linear memory that you get from cuMemAlloc() or cudaMalloc() directly to a 2D texture. In previous releases, only arrays created with cuArrayCreate() or cudaMallocArray() could be bound to 2D textures.
- cuTexRefSetAddress2D
- cudaBindTexture2D
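"Pitch linear" means that each row of the 2D image starts at a fixed byte stride (the pitch) from the previous one, and the pitch may be larger than the visible row width. The plain C sketch below shows only that addressing arithmetic on ordinary host memory; it does not call CUDA. The function names and the 256-byte alignment used in the usage note are assumptions for illustration, since on a real device the pitch is reported by the allocator (for example cudaMallocPitch()).

```c
#include <stddef.h>

/* Round a row width in bytes up to a multiple of `align` bytes.
   The alignment is a made-up parameter; real hardware dictates its own. */
static size_t padded_pitch(size_t width_bytes, size_t align)
{
    return (width_bytes + align - 1) / align * align;
}

/* Address of element (row, col) in a pitched 2D buffer of floats.
   `pitch` is the row stride in bytes, not in elements. */
static float *pitched_elem(void *base, size_t pitch, size_t row, size_t col)
{
    return (float *)((char *)base + row * pitch) + col;
}
```

For a row of 100 floats (400 bytes with 4-byte floats) and an assumed 256-byte alignment, padded_pitch() gives a pitch of 512 bytes, so element (2, 3) sits 2 * 512 + 3 * 4 = 1036 bytes from the base address.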
Flags for event creation:
- Applications can now create events that use blocking synchronization.
- cudaEventCreateWithFlags
New device management and context creation flags:
- The function cudaSetDeviceFlags() allows the application to specify attributes such as mapping host memory and support for blocking synchronization.
- cudaSetDeviceFlags
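The blocking-synchronization options for events and for the device can be sketched together as follows. This is a minimal host-side example, assuming a working CUDA toolkit and GPU and using the flag names from this release (cudaDeviceBlockingSync and cudaEventBlockingSync); no kernel is launched, so the event completes almost immediately.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    cudaEvent_t done;
    cudaError_t err;

    /* Ask the runtime to block the host thread, rather than spin,
       while it waits for the device. Set before a context exists. */
    err = cudaSetDeviceFlags(cudaDeviceBlockingSync);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaSetDeviceFlags: %s\n", cudaGetErrorString(err));
        return 1;
    }

    /* Create an event whose synchronize call also blocks. */
    err = cudaEventCreateWithFlags(&done, cudaEventBlockingSync);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaEventCreateWithFlags: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    /* Record the event in stream 0; it completes once all earlier
       work queued in that stream has finished. */
    err = cudaEventRecord(done, 0);

    /* The calling thread sleeps here instead of busy-waiting. */
    if (err == cudaSuccess)
        err = cudaEventSynchronize(done);

    cudaEventDestroy(done);

    if (err != cudaSuccess) {
        fprintf(stderr, "event wait failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("event completed\n");
    return 0;
}
```

Blocking synchronization trades a little wake-up latency for lower CPU usage, which tends to matter when several host threads each drive their own GPU.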
Improved runtime device management:
- If context creation fails on one device, the runtime now tries the other devices in the system before reporting a failure. The new call cudaSetValidDevices() allows the application to specify a list of acceptable devices for use.
- cudaSetValidDevices
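A possible use of cudaSetValidDevices() is sketched below. The preference order (device 1, then device 0) is purely hypothetical, and the example assumes a system with at least two CUDA devices; cudaFree(0) is used here only as a common idiom for forcing the runtime to create its context.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    int candidates[2] = { 1, 0 };  /* hypothetical preference order */
    int count = 0;
    int chosen = -1;
    cudaError_t err;

    err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess || count < 2) {
        fprintf(stderr, "this sketch assumes at least two CUDA devices\n");
        return 1;
    }

    /* Restrict the runtime to the listed devices, in priority order. */
    err = cudaSetValidDevices(candidates, 2);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaSetValidDevices: %s\n", cudaGetErrorString(err));
        return 1;
    }

    /* Force context creation so that a device is actually selected. */
    err = cudaFree(0);
    if (err != cudaSuccess) {
        fprintf(stderr, "context creation: %s\n", cudaGetErrorString(err));
        return 1;
    }

    err = cudaGetDevice(&chosen);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDevice: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("runtime selected device %d\n", chosen);
    return 0;
}
```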
Driver/runtime version query functions:
- Applications can now directly query version information about the underlying driver/runtime.
- cuDriverGetVersion
- cudaDriverGetVersion
- cudaRuntimeGetVersion
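The version query functions write a single integer through an int pointer, for example cudaRuntimeGetVersion(&v). That integer encodes the release as 1000 * major + 10 * minor, so CUDA 2.2 is reported as 2020. The helper below is a small plain C sketch of the decoding step only; the name decode_cuda_version is an assumption for illustration and is not part of the CUDA API.

```c
/* Split a CUDA version integer, encoded as 1000 * major + 10 * minor,
   into its two components. Hypothetical helper, not a CUDA function. */
static void decode_cuda_version(int version, int *major, int *minor)
{
    *major = version / 1000;
    *minor = (version % 1000) / 10;
}
```

An application might compare the decoded driver and runtime versions at startup and refuse to run when the driver is older than the runtime it was built against.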
New device attribute queries:
- CU_DEVICE_ATTRIBUTE_INTEGRATED
- CU_DEVICE_ATTRIBUTE_CAN_MAP_HOST_MEMORY
- CU_DEVICE_ATTRIBUTE_COMPUTE_MODE
Documentation:
- Doxygen-generated, cross-referenced HTML, PDF, and man pages for:
- Runtime API
- Driver API