The Intel 8080, an early CPU design from Intel in the 1970s, powered one of the first personal computers, the Altair 8800. The commercial success of desktop computers that followed created a whole new industry of software development. Significant evolutions in hardware design, such as SIMD processors, VLIW architectures and multi-core CPUs, have over time influenced the way efficient software is designed and deployed. The appearance of multi-core CPUs such as the AMD Athlon 64 X2 and IBM POWER4, along with a huge increase in the number of supercomputers in the last decade, resulted in the growth of the field of parallel computing. RISC-based low-power architectures from ARM were perhaps the major reason for the boom in handheld devices and the associated software innovations and services. In the last few years, we have seen the advent of GPUs in general purpose computing, changing the way applications are developed for HPC.
In essence, advancements in the hardware world profoundly influence the trends of Computer Science and Engineering research. The recent breakthroughs and surge in popularity of Deep Learning clearly manifest this. The development of next generation computing infrastructure will therefore be based on today's needs, and will define the way next generation software is developed, making it relevant to the larger CS research community.
There are a few factors that require us to bring a major change to the way we compute.
The ol' times! [ref]
Why do you think there is a revolution coming?
- Moore's Law Uncertainty: We have seen process nodes shrinking over time, but we are now dangerously close to atomic dimensions. Moore's-law scaling may not continue to push the Perf/Watt of CPUs any further. I spoke to many researchers working on sub-10nm technologies, and all of them expressed skepticism about going further. They explained how challenging scaling has become in terms of transistor density, and how low the yield could be.
- Increase in Performance Demand: Fast growing fields such as Deep Learning demand more compute power than ever, because for them the availability of more compute is the key to better results. The usual routine of a ~10% performance improvement per CPU design cycle isn't going to cut it.
- Parallelism is limited: The multi-core era helped us push through the performance barriers posed by saturating single-thread performance. However, as per Amdahl's law, the fraction of a program that can be parallelized limits the performance growth achievable on multi-core systems (see the quick calculation after this list). This is true of most day-to-day raw compute requirements. It is clear that throwing in thousands of cores isn't a great solution for many applications.
- Power Wall: There are two extremes here.
- Low-end devices: The growth in handheld devices and the expected boom of IoT devices restrict us in terms of power and silicon area budget. Many applications that were restricted to high-end computers are coming down to handheld devices, which requires performance scaling within the same power envelope.
- High performance computing: The dream of Exascale computing would remain a dream unless we figure out how to power such a system without needing a thermal power plant for the computer itself [ref]! (At roughly 10 GFLOPS/W, around the best efficiency on today's Green500 list, an exaflop machine would draw on the order of 100 MW.)
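To make the Amdahl's law point concrete, here is a quick back-of-the-envelope calculation. The 90%/95%/99% parallel fractions are just illustrative numbers:

```python
# Amdahl's law: speedup on n cores when a fraction p of the work is parallel.
def speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

for p in (0.90, 0.95, 0.99):
    # Even with infinitely many cores, speedup is capped at 1 / (1 - p).
    print(f"p = {p:.2f}: 64 cores -> {speedup(p, 64):5.1f}x, "
          f"1024 cores -> {speedup(p, 1024):5.1f}x, cap = {1/(1-p):4.0f}x")
```

Even a program that is 95% parallel tops out at a 20x speedup, no matter how many cores you throw at it.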
Towards Heterogeneous Compute
If we look back at recent trends in SoCs released for applications ranging from sensors to servers, one trend is quite common. Consider the specs of an early ARM-based Qualcomm SoC for mobile devices, the Snapdragon S1: it has an ARM CPU, a GPU and a small video processor besides the modem. However, if we look at the recent Snapdragon 835 SoC, along with the CPU, GPU and video blocks there is a dedicated image processor, a display control processor, a dedicated audio codec, a powerful DSP in three variants and a security subsystem. There is also talk of adding a dedicated Machine Learning processor, an NPU (Neural Processing Unit). The same can be seen in desktop-class SoCs from Intel and AMD. There is one key takeaway here: Things are getting Heterogeneous.
Overview of the Snapdragon 820 featuring various special purpose hardware components
To give some perspective: the silicon area occupied by the CPU is less than 15% of a modern mobile SoC!
What's the next big thing?
Well, you must be thinking, "Of course ASICs are more efficient, since they sacrifice flexibility". Yes, that is the point: they are orders of magnitude more efficient, and silicon area isn't a concern if the power consumption is low. Most systems today are deployed for specific tasks: IoT systems, automotive infotainment, specialised servers for ML, Big Data, genome sequencing and so on. When we know what a system will be used for, there isn't a compelling necessity to offer the "flexibility" that CPUs provide (though it should still be programmable for that specific application). Moreover, specialised hardware blocks in the SoC can be power gated and brought up only when needed.
How efficient are ASICs? |
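As a rough illustration of why the efficiency gap matters more than the area, here is a toy energy comparison. The 100x efficiency factor and the workload size are assumptions for the sake of the example, not measurements of any real chip:

```python
# Toy comparison: energy to run a fixed workload on a general purpose CPU
# versus a dedicated accelerator. All numbers are illustrative assumptions.
WORKLOAD_OPS   = 1e12   # operations in the task (assumed)
CPU_PJ_PER_OP  = 100.0  # CPU energy per op in picojoules (assumed)
ASIC_PJ_PER_OP = 1.0    # ASIC energy per op: ~100x better (assumed)

cpu_joules  = WORKLOAD_OPS * CPU_PJ_PER_OP  * 1e-12
asic_joules = WORKLOAD_OPS * ASIC_PJ_PER_OP * 1e-12
print(f"CPU:  {cpu_joules:6.1f} J")   # 100.0 J
print(f"ASIC: {asic_joules:6.1f} J")  #   1.0 J
# When idle, the ASIC is power gated, so the extra silicon costs ~nothing.
```

The extra silicon sits dark when gated off, while the energy win shows up every single time the task runs.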
Next generation computers are going to get a lot more heterogeneous. There will be plenty of special purpose hardware blocks, each optimised for a different task. To put it another way, the CPU will no longer be a Central "Processing" Unit, but rather a Central "Control" Unit. CPUs are built to handle tasks with lots of conditional branches, not for sheer compute horsepower. The heavy processing will be lifted by the special purpose hardware, while the CPU runs the OS and the control algorithms.
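Here is a minimal sketch of what that Central "Control" Unit role could look like in software. The device names and the run_on helper are hypothetical, purely to illustrate the dispatch pattern, not any real offload API:

```python
# Hypothetical sketch: the CPU only decides *where* work runs; the heavy
# lifting happens on special purpose blocks. Device names are made up.
def run_on(device, task):
    # Stand-in for a real offload API (OpenCL, a vendor SDK, etc.).
    print(f"dispatching {task['kind']!r} to {device}")

def control_loop(tasks):
    for task in tasks:
        if task["kind"] == "matrix":      # dense math -> NPU / GPU
            run_on("npu0", task)
        elif task["kind"] == "signal":    # filters, FFTs -> DSP
            run_on("dsp0", task)
        elif task["kind"] == "video":     # codecs -> video block
            run_on("venc0", task)
        else:                             # branchy control code stays here
            run_on("cpu", task)

control_loop([{"kind": "matrix"}, {"kind": "signal"}, {"kind": "misc"}])
```

The CPU's job shrinks to scheduling, I/O and the branchy glue logic, which is exactly what it is good at.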
The Heterogeneous Revolution has already begun, and like all revolutions, we only realise it once it is over! It is high time for computer architecture researchers to explore other fields, offer ASIC solutions, and integrate them into the system. System integration, in terms of I/O and software, is really the key, and in the coming days we will only see SoCs becoming larger and larger, with more and more specialised functionality.