Hardware for machine learning: the 5 best hardware options in comparison
For artificial intelligence systems that learn via machine learning, choosing the right hardware is crucial for optimal and efficient training. In this guest post, our AI Engineer Melvin Klein explains why, outlines the advantages and disadvantages of each option, and shows which hardware is best suited for which artificial intelligence workload.
These 5 types of hardware are suitable for machine learning
Machine Learning (ML) tasks are diverse, and so is the hardware on which they can be computed. To better understand how to speed up the calculations, it is important to know what kind of calculations are performed. Calculations used by ML algorithms are largely vector and matrix calculations. For this reason, a suitable computational accelerator for ML applications must be able to perform these calculations.
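To make this concrete, here is a minimal sketch (pure Python, with made-up sizes and values) of why ML workloads reduce to vector and matrix arithmetic: a dense neural-network layer is just a matrix-vector product plus a bias.

```python
# Hypothetical example: the forward pass of one dense layer, y = W @ x + b.
# Every hidden layer of a neural network repeats this pattern, which is why
# ML accelerators focus on vector and matrix operations.

def dense_layer(weights, bias, x):
    """Compute y = W @ x + b for a single dense layer."""
    return [
        sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
        for row, b_i in zip(weights, bias)
    ]

# 2 outputs, 3 inputs (values chosen for illustration only)
W = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
b = [0.5, -0.5]
x = [1.0, 0.0, 2.0]

print(dense_layer(W, b, x))  # → [7.5, 15.5]
```

A real model stacks many such layers, so almost all of its runtime is spent in exactly this kind of multiply-accumulate arithmetic.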
Central Processing Unit (CPU)
The conventional CPU is the heart of every desktop computer. Since most computer programs can use only a few cores, CPU core counts are comparatively low. Although they have increased significantly in recent years, even the maximum of 96 cores in an AMD EPYC is low compared to the other options. CPUs must handle a wide variety of tasks and therefore come with additional instruction set extensions that accelerate certain types of calculations. All modern x86/amd64 CPUs support AVX (Advanced Vector Extensions), which, as the name suggests, accelerates vector calculations. The fourth generation of Intel Xeon Scalable processors (Sapphire Rapids) can additionally use Intel AMX (Advanced Matrix Extensions) to further accelerate matrix calculations.
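The idea behind SIMD extensions such as AVX can be illustrated with a short sketch (pure Python, simulating the hardware rather than using it): instead of one scalar operation per instruction, a single vector instruction applies the same operation to a whole block of lanes at once.

```python
# Illustrative simulation of SIMD execution. A lane width of 8 corresponds
# to eight 32-bit floats in one 256-bit AVX register; the real speedup comes
# from the CPU doing all 8 additions in a single instruction.

LANES = 8

def simd_add(a, b):
    """Simulate one vector instruction: 8 additions in one step."""
    assert len(a) == len(b) == LANES
    return [x + y for x, y in zip(a, b)]

def vector_add(a, b):
    """Add two equally sized arrays, one lane block per 'instruction'."""
    out = []
    for i in range(0, len(a), LANES):
        out.extend(simd_add(a[i:i + LANES], b[i:i + LANES]))
    return out

a = list(range(16))      # [0, 1, ..., 15]
b = [10] * 16
print(vector_add(a, b))  # → [10, 11, ..., 25]
```

With AVX the 16-element addition above takes 2 vector instructions instead of 16 scalar ones; AMX extends the same idea from vectors to small matrix tiles.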
Despite numerous alternatives, CPUs will continue to play a major role in computing ML algorithms. This is not only because they are already available to almost everyone (no expensive new hardware required), but also because of the numerous new developments that steadily improve performance and reduce power consumption. The two major manufacturers, Intel and AMD, both rely on chiplet-based CPUs in the server sector; AMD does so in the desktop sector as well. Here, a CPU is assembled from several separate chips. In the future, this will also make it possible to replace individual chips with specialized hardware, and it is to be expected that specialized ML chips will become a standard offering in server CPUs, especially since, according to Intel, 70% of all AI inferencing takes place on Intel Xeon server processors.
One disadvantage of CPUs is their low compute density: compared to the other options, CPUs require significantly more space for the same computing power. Whether this matters depends both on where the application is to run and on how much computing capacity is actually required. In a data center, for example, compute density matters far more than it does when running on existing desktop computers. Another important factor is the power consumption of a CPU under full load: besides the direct electricity costs, the costs of cooling must also be considered. In addition, modern high-performance server CPUs must be liquid-cooled, which can cause additional costs, especially if the required infrastructure is not yet in place.
Graphics Processing Unit (GPU)
Another hardware option is computing on a GPU. GPUs were originally developed as accelerators for graphical calculations (graphics processors). Since graphics workloads consist largely of vector calculations, graphics cards are also very well suited for ML calculations.
Intel and Nvidia, two of the three major graphics chip manufacturers, use the same architecture for their desktop and server GPUs and include special circuitry in their designs to speed up ML computations. Only AMD uses two different architectures for compute and graphics, namely CDNA and RDNA; its dedicated ML accelerators are offered exclusively in the CDNA-based server GPUs. But even without these special accelerators, GPUs are vastly superior to CPUs in most ML applications.
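Why matrix workloads map so well onto GPUs can be sketched in a few lines (pure Python here, no GPU involved): every element of the result C = A · B can be computed independently, so a GPU can assign one thread to each output element and run thousands of them in parallel. In this sketch, each call to `matmul_element` plays the role of one such thread.

```python
# Conceptual sketch of GPU-style parallelism in matrix multiplication.

def matmul_element(A, B, i, j):
    """Work done by a single 'thread': one dot product for C[i][j]."""
    return sum(A[i][k] * B[k][j] for k in range(len(B)))

def matmul(A, B):
    rows, cols = len(A), len(B[0])
    # On a GPU, all rows * cols of these calls would run concurrently;
    # here they simply run one after another.
    return [[matmul_element(A, B, i, j) for j in range(cols)]
            for i in range(rows)]

A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]
print(matmul(A, B))  # → [[19, 22], [43, 50]]
```

Because none of the output elements depend on each other, the computation scales almost perfectly with the number of parallel cores, which is exactly what a GPU provides in abundance.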
This high compute density comes at the cost of significantly higher power consumption than CPUs, which is the big disadvantage of GPUs. The power supply and cooling requirements of GPU servers are considerably higher than those of servers without GPUs, and these increased requirements, especially for cooling, make GPU servers in data centers significantly more expensive.
Field Programmable Gate Array (FPGA)
FPGAs are programmable circuits. While CPUs and GPUs have fixed circuits, the circuitry of an FPGA can be freely programmed. This makes FPGAs the most flexible type of computing accelerator – but also the most difficult to implement. Most FPGAs are sold as bare chips; the complete circuit needed to actually use them must be designed by the buyer. FPGAs should therefore be considered early in product design and integrated into the product's circuitry from the start. This means FPGAs are only suitable when the ML algorithm is to be integrated into a new product with its own hardware. Another drawback is the small number of available ML libraries compatible with FPGAs, which severely limits their usability. On the other hand, FPGAs also have great advantages: because the circuit can be changed, it can be optimized for the intended use and later adapted and updated again. This allows not only great flexibility but also good power efficiency at high compute density. Optimized for their intended use, high-end FPGAs can even keep up with GPUs in terms of computing power.
TinyML
TinyML does not directly describe hardware. Rather, the concept is to optimize and adapt ML tasks so that they can also run on microcontrollers, whose power consumption is often in the mW range and which have only limited memory and computing capacity. Often these optimizations shrink ML models drastically with very little loss of accuracy. The resulting models can then be executed on commercially available microcontrollers. Smartphones, with their limited battery and processing power, are also a good target for TinyML.
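One of the most common TinyML optimizations is post-training quantization: 32-bit float weights are mapped to 8-bit integers, shrinking the model to roughly a quarter of its size at a small accuracy cost. The sketch below shows the standard affine quantization scheme in pure Python; the concrete weight values are made up for illustration.

```python
# Hedged sketch of int8 post-training quantization (affine scheme).

def quantize(weights):
    """Map float weights to int8 values in [-128, 127]."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255.0 or 1.0            # float step per int8 step
    zero_point = round(-128 - lo / scale)       # int8 value representing 0.0
    q = [max(-128, min(127, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float weights from the int8 representation."""
    return [(qi - zero_point) * scale for qi in q]

w = [-0.51, 0.02, 0.25, 0.49]                   # hypothetical layer weights
q, scale, zp = quantize(w)
w_approx = dequantize(q, scale, zp)

# The reconstruction error stays below one quantization step (= scale),
# which is why accuracy usually degrades only slightly.
print(max(abs(a - b) for a, b in zip(w, w_approx)))
```

On a microcontroller, the int8 values are not only smaller to store but also cheaper to compute with, since integer multiply-accumulate units need far less silicon and energy than floating-point ones.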
Specialized ML hardware
One well-known type of specialized hardware is Google's TPU (Tensor Processing Unit). These computational accelerators were built by Google specifically for calculations with the ML library TensorFlow and are comparable to graphics cards consisting solely of matrix computation units. TPUs are only available through Google's cloud computing service.
It is hard to make a general statement about custom ML hardware because the range and intended uses of the available hardware are too diverse. It starts with small microcontrollers such as the MAX78000 and extends to entire clusters of ML-specialized hardware with millions of cores, such as Cerebras' Andromeda supercomputer.
Conclusion
CPUs and GPUs are by far the most well-known and widespread means of ML computation, but they are not the only ones. Their big advantages are availability and versatility: they are not limited to ML applications and can be used for other purposes as well. In addition, almost all ML libraries are built with a focus on CPU and GPU computation, so this is where you will find both the most features and the greatest reliability.
For applications that run on customers' desktop computers, GPU computation is particularly interesting. These GPUs often have special ML circuits that most graphical applications leave unused, so the utilization and utility of the existing hardware can be increased with little effort. FPGAs and microcontrollers, on the other hand, are only suitable for very specific fields of application in which custom hardware is designed and can therefore be fully integrated into the design from the beginning. Otherwise, the additional effort required to design, test, and build the hardware, along with the ongoing support for custom hardware, would be too great.