An AI accelerator is a class of microprocessor or computer system designed to accelerate artificial neural networks, machine vision and other machine learning algorithms for robotics, internet of things and other data-intensive or sensor-driven tasks. They are often manycore designs and generally focus on low-precision arithmetic. A number of vendor-specific terms exist for devices in this space.
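Concretely, "low-precision arithmetic" means storing weights and activations as narrow integers and accumulating at a wider width. The following C sketch shows the idea with a symmetric int8 quantization scheme and an int32 accumulator; the function names and quantization scheme are illustrative assumptions, not any particular vendor's design.

```c
/* Minimal sketch of accelerator-style low-precision arithmetic:
 * quantize float vectors to int8, take the dot product in int32,
 * then rescale. Illustrative only; real hardware adds saturation,
 * per-channel scales, and fused rescaling. */
#include <math.h>
#include <stdint.h>
#include <stdio.h>

/* Symmetric quantization: map [-max|x|, +max|x|] onto [-127, 127]. */
static float quantize(const float *x, int8_t *q, int n) {
    float maxabs = 1e-12f;
    for (int i = 0; i < n; i++)
        if (fabsf(x[i]) > maxabs) maxabs = fabsf(x[i]);
    float scale = maxabs / 127.0f;
    for (int i = 0; i < n; i++)
        q[i] = (int8_t)lrintf(x[i] / scale);
    return scale; /* needed later to rescale integer results */
}

int main(void) {
    float a[4] = {0.5f, -1.0f, 0.25f, 2.0f};
    float b[4] = {1.0f, 0.5f, -2.0f, 0.125f};
    int8_t qa[4], qb[4];
    float sa = quantize(a, qa, 4);
    float sb = quantize(b, qb, 4);

    int32_t acc = 0; /* int8 x int8 products, accumulated in int32 */
    for (int i = 0; i < 4; i++)
        acc += (int32_t)qa[i] * qb[i];

    /* Rescale back to real units; compare with the exact result. */
    printf("int8 dot = %f, exact = %f\n", acc * sa * sb, -0.25);
    return 0;
}
```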
History of AI acceleration
Computer systems have frequently complemented the CPU with special-purpose accelerators for specialized tasks, most notably video cards for graphics, but also sound cards for sound. As deep learning and AI workloads rose in prominence, specialized hardware was created or adapted from existing products to accelerate these tasks.
Early attempts
As early as 1993, DSPs were used as neural network accelerators, e.g. to accelerate OCR software. In the 1990s there were also attempts to create parallel high-throughput systems for workstations aimed at various applications, including neural network simulations. FPGA-based accelerators were also first explored in the 1990s for both inference and training. ANNA was a neural network CMOS accelerator developed by Yann LeCun.
Heterogeneous computing
Heterogeneous computing refers to incorporating a number of specialized processors in a single system, or even a single chip, each optimized for a specific type of task. Architectures such as the Cell microprocessor have features that significantly overlap with AI accelerators, including support for packed low-precision arithmetic, a dataflow architecture, and prioritizing throughput over latency. The Cell microprocessor was subsequently applied to a number of tasks, including AI.
CPUs themselves also gained increasingly wide SIMD units (driven by video and gaming workloads) and support for packed low-precision data types.
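On x86, this packed low-precision support surfaces as SIMD intrinsics. The C sketch below (an illustration, compiled with -mavx2; the wrapper function and its name are not from any library) uses AVX2 to compute the unsigned-by-signed int8 dot product that quantized inference kernels on CPUs are commonly built on.

```c
/* int8 dot product with AVX2 packed low-precision instructions.
 * _mm256_maddubs_epi16 multiplies 32 unsigned-by-signed byte pairs
 * and adds adjacent products into 16-bit lanes (with saturation);
 * _mm256_madd_epi16 then widens the sums to 32 bits. */
#include <immintrin.h>
#include <stdint.h>

/* a: unsigned bytes, b: signed bytes, n: a multiple of 32. */
int32_t dot_u8s8(const uint8_t *a, const int8_t *b, int n) {
    __m256i acc = _mm256_setzero_si256();
    const __m256i ones = _mm256_set1_epi16(1);
    for (int i = 0; i < n; i += 32) {
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
        __m256i p16 = _mm256_maddubs_epi16(va, vb);     /* 16 x int16 */
        acc = _mm256_add_epi32(acc, _mm256_madd_epi16(p16, ones));
    }
    /* Horizontal sum of the eight 32-bit lanes. */
    __m128i s = _mm_add_epi32(_mm256_castsi256_si128(acc),
                              _mm256_extracti128_si256(acc, 1));
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, _MM_SHUFFLE(1, 0, 3, 2)));
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, _MM_SHUFFLE(2, 3, 0, 1)));
    return _mm_cvtsi128_si32(s);
}
```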
Use of GPU
Graphics processing units or GPUs are specialized hardware for the manipulation of images. Because neural networks and image manipulation share a similar mathematical basis (embarrassingly parallel operations on matrices), GPUs became increasingly used for machine learning tasks, and as of 2016 they are popular for AI work. They continue to evolve in a direction that facilitates deep learning, both for training and for inference in devices such as self-driving cars, and they are gaining additional connective capability for the kind of dataflow workloads AI benefits from (e.g. Nvidia NVLink). As GPUs have been increasingly applied to AI acceleration, GPU manufacturers have incorporated neural-network-specific hardware to further accelerate these tasks; tensor cores, for example, are intended to speed up the training of neural networks.
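The fit is easy to see in code: a fully connected layer is a matrix-vector product in which every output element can be computed independently. In the illustrative C sketch below, an OpenMP pragma (compile with -fopenmp) stands in for the thousands of parallel lanes a GPU provides; it is a sketch of the pattern, not a tuned kernel.

```c
/* One fully connected layer with a ReLU: y = max(0, W x).
 * Every output neuron is independent, so the outer loop is
 * embarrassingly parallel; a GPU runs such loops across
 * thousands of hardware threads. */
void fc_layer(const float *w,  /* out x in weights, row-major */
              const float *x,  /* input activations, length in */
              float *y,        /* output activations, length out */
              int out, int in)
{
    #pragma omp parallel for
    for (int o = 0; o < out; o++) {
        float acc = 0.0f;
        for (int i = 0; i < in; i++)
            acc += w[o * in + i] * x[i];
        y[o] = acc > 0.0f ? acc : 0.0f; /* ReLU */
    }
}
```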
Use of FPGA
Deep learning frameworks are still evolving, making it hard to design custom hardware. Reconfigurable devices such as field-programmable gate arrays (FPGAs) make it easier to evolve hardware, frameworks and software alongside each other.
Microsoft has used FPGA chips to accelerate inference. The application of FPGAs to AI acceleration also motivated Intel to purchase Altera, with the aim of integrating FPGAs into server CPUs that would be capable of accelerating AI as well as general-purpose tasks.
Emergence of dedicated AI accelerator ASICs
While GPUs and FPGAs perform far better than CPUs for these AI-related tasks, a further factor of up to 10 in efficiency can be gained with a more specific design, via an application-specific integrated circuit (ASIC). Such designs gain their advantage through differences in memory use and through the use of lower-precision numbers.
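As a rough, illustrative calculation: a model with 100 million parameters stored as 32-bit floats must move 400 MB of weights through the memory system per full pass, while an 8-bit quantized version moves 100 MB, a 4x reduction in memory traffic before counting any further savings from keeping intermediate data in on-chip buffers instead of DRAM.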
Nomenclature
As of 2016, the field is still in flux and vendors are pushing their own marketing terms for what amounts to an "AI accelerator", in the hope that their designs and APIs will dominate. There is no consensus on the boundary between these devices, nor on the exact form they will take; however, several examples clearly aim to fill this new space, with a fair amount of overlap in capabilities.
In the past, when consumer graphics accelerators emerged, the industry eventually adopted Nvidia's self-assigned term "GPU" as the collective noun for graphics accelerators, which had taken many forms before settling on an overall pipeline implementing the model presented by Direct3D.
Examples
Stand-alone products
- Google's Tensor Processing Unit is an accelerator specifically designed by Google for its TensorFlow framework, which is extensively used for convolutional neural networks. It focuses on a high volume of 8-bit precision arithmetic. The first generation focused on inference, while the second generation added capability for neural network training as well.
- Adapteva Epiphany is a many-core coprocessor featuring a network-on-chip scratchpad memory model suited to dataflow programming, which should fit many machine learning tasks.
- Intel Nervana NNP (Neural Network Processor) (a.k.a. "Lake Crest") is, Intel claims, the first commercially available chip with a purpose-built architecture for deep learning. Facebook was a partner in the design process.
- Movidius Myriad 2 is a many-core VLIW AI accelerator complemented by video fixed-function units.
- Mobileye EyeQ is a processor specialized for vision processing in self-driving cars.
GPU based products
- Nvidia Tesla is Nvidia's line of GPU-derived products marketed for GPGPU and AI tasks.
- Nvidia Volta is a microarchitecture that augments the graphics processing unit with additional 'tensor units' targeted specifically at accelerating calculations for neural networks.
- Nvidia DGX-1 is an Nvidia workstation/server product that incorporates Nvidia-brand GPUs for GPGPU tasks including machine learning.
- Radeon Instinct is AMD's line of GPU-derived products for AI acceleration.
AI accelerating co-processors
- Qualcomm's Snapdragon 845 mobile platform contains a Hexagon 685 DSP core for AI processing in camera, voice, XR and gaming applications.
- PowerVR 2NX NNA (Neural Net Accelerator) is an IP core from Imagination Technologies licensed for integration into chips.
- Neural Engine is an AI accelerator core within the Apple A11 Bionic SoC.
- Cadence Tensilica Vision C5 is a neural-network-optimized DSP IP core.
- The Neural Processing Unit is a neural network accelerator within the HiSilicon Kirin 970.
- In January 2018, CEVA, Inc. launched a family of four AI processors called NeuPro, each containing one programmable vector DSP and one hardwired implementation of 8-bit or 16-bit neural network layers, with performance ranging from 2 TOPS to 12.5 TOPS (see the worked example after this list).
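A TOPS (tera operations per second) figure is typically the product of the number of MAC units, two operations per MAC (a multiply and an add), and the clock frequency. Using assumed figures for illustration, not CEVA's disclosed configuration: an engine with 2,000 8-bit MACs clocked at 1 GHz would deliver 2,000 × 2 × 10^9 = 4 × 10^12 operations per second, i.e. 4 TOPS, and scaling the MAC array and clock across a core family yields a range such as 2 to 12.5 TOPS.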
Research and unreleased products
- In December 2017, Tesla Motors confirmed rumours that it is developing an AI chip for autonomous driving. Jim Keller has been working on this project since at least early 2016.
- Eyeriss is an accelerator design aimed explicitly at convolutional neural networks, using a scratchpad memory and an on-chip network architecture.
- Kalray MPPA is a manycore accelerator for convolutional neural nets.
- SpiNNaker is a many-core design specialized for simulating a large neural network.
- Graphcore IPU is a graph-based AI accelerator.
- DPU, by Wave Computing, is a dataflow architecture.
- At the start of 2017, STMicroelectronics presented a demonstrator SoC manufactured in a 28 nm process containing a deep CNN accelerator.
- NM500 is, as of 2016, the latest in a series of accelerator chips for radial basis function neural nets from General Vision.
- TrueNorth is a manycore design based on spiking neurons rather than traditional arithmetic.
- Intel Loihi is an experimental neuromorphic chip.
- In September 2017, BrainChip introduced a commercial PCI Express card with a Xilinx Kintex UltraScale FPGA running neuromorphic neural cores, applying pattern recognition to 600 video images per second using 16 watts of power.
- IIT Madras is designing a spiking neuron accelerator for big-data analytics.
Potential applications
- Autonomous cars: Nvidia has targeted its Drive PX-series boards at this space.
- Military robots
- Agricultural robots, for example chemical-free weed control.
- Voice control, e.g. in mobile phones, a target for Qualcomm Zeroth.
- Machine translation
- Unmanned aerial vehicles, e.g. navigation systems: the Movidius Myriad 2 has been demonstrated successfully guiding autonomous drones.
- Industrial robots, increasing the range of tasks that can be automated, by adding adaptability to variable situations.
- Healthcare, assisting with diagnoses.
- Search engines, increasing the energy efficiency of data centres and ability to use increasingly advanced queries.
- Natural language processing
See also
- Neuromorphic engineering
External links
- http://www.nextplatform.com/2016/04/05/nvidia-puts-accelerator-metal-pascal/
- http://eyeriss.mit.edu