Hardware Acceleration for AI Applications and Neuromorphic Computing

Artificial intelligence (AI) algorithms are exerting a transformative influence on human society across a wide spectrum of applications, including image recognition, speech understanding, robot intelligence, traffic control, and data analytics. The improving performance of AI algorithms demands high-performance computation and memory resources. Domain-specific architectures, hardware-friendly algorithms, and emerging technologies are all necessary to fuel the development of AI applications. Our research focuses on addressing the challenges of AI hardware acceleration and neuromorphic computing in the following three aspects:

A. Solving the Computing Challenges for AI Applications

We address the computing challenges of AI hardware acceleration through several approaches. (1) Instruction set architecture design for neural networks. In collaboration with Cambricon Inc. and the Chinese Academy of Sciences, we propose a novel domain-specific Instruction Set Architecture (ISA) for neural network (NN) accelerators, based on a comprehensive analysis of existing NN techniques. (2) FPGA-based deep neural network accelerator design for high-performance, low-power implementation. Software optimization techniques from DeePhi, such as model compression, lead to significant power savings and performance improvements. Our Deep Learning Accelerator Unit (DLAU) employs three pipelined processing units to improve throughput and uses tiling techniques to exploit locality in deep learning applications. (3) Heterogeneous computing for neural network and robotics applications. We propose CNNLab, a novel deep learning framework using GPU- and FPGA-based accelerators. CNNLab provides a uniform programming model, so the hardware implementation and scheduling are invisible to programmers. For robotic workloads, we propose a heterogeneous architecture named HEMERA to achieve the required performance and energy efficiency, together with a runtime framework that efficiently manages the acceleration of different workloads.
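As an illustration of the tiling idea behind DLAU, the sketch below computes a matrix-vector product block by block, so that each tile of weights and inputs can be staged in a small on-chip buffer and reused. The function name and tile size are hypothetical choices for illustration, not details of the DLAU design.

```python
import numpy as np

def tiled_matvec(W, x, tile=64):
    """Tiled matrix-vector product: process W in tile-by-tile blocks,
    mimicking how an accelerator streams small blocks of weights and
    inputs through on-chip buffers to exploit locality."""
    n, m = W.shape
    y = np.zeros(n)
    for i in range(0, n, tile):          # tile over output rows
        for j in range(0, m, tile):      # tile over input columns
            # Each block is small enough to fit on chip; the x-tile
            # is reused across all row-tiles that need it.
            y[i:i+tile] += W[i:i+tile, j:j+tile] @ x[j:j+tile]
    return y
```

The result is identical to a full `W @ x`; the benefit on hardware is that each small block fits in fast local memory, avoiding repeated off-chip accesses.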

B. Solving the Memory Challenges for AI Hardware Acceleration

We propose various methods to boost memory performance and energy efficiency for AI acceleration. (1) PRIME, a processing-in-memory accelerator using emerging non-volatile devices. Processing-in-memory (PIM) is a promising solution to the "memory wall" challenge of future computer systems, and the emerging metal-oxide resistive random access memory (ReRAM) has shown its potential to serve as main memory. (2) DRISA, a DRAM-based reconfigurable in-situ processing architecture. DRISA consists mainly of DRAM memory arrays in which each memory bitline is designed to perform bitwise Boolean logic operations (such as NOR), supporting in-situ computing. (3) HBM-enabled GPUs for data-intensive applications. We propose a software pipeline to alleviate the capacity limitation of HBM for CNNs, and design two programming techniques to improve memory-bandwidth utilization for the BFS application.
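Because NOR is functionally complete, a bitline that supports only NOR can synthesize any Boolean function. The sketch below emulates this composition in Python over bit-vectors; the function names are illustrative stand-ins, not primitives from the DRISA design.

```python
import numpy as np

def bitline_nor(a, b):
    """Emulate an in-situ bitline NOR over bit-vectors (one bit per cell)."""
    return ~(a | b) & 1

def bitline_not(a):
    # NOT(a) = NOR(a, a)
    return bitline_nor(a, a)

def bitline_and(a, b):
    # AND(a, b) = NOR(NOT a, NOT b)
    return bitline_nor(bitline_not(a), bitline_not(b))

def bitline_xor(a, b):
    # XOR built from five NOR operations
    g1 = bitline_nor(a, b)
    g2 = bitline_nor(a, g1)      # a' AND b
    g3 = bitline_nor(b, g1)      # a AND b'
    return bitline_not(bitline_nor(g2, g3))
```

On real bitlines each call would be one row-activation-style operation applied to thousands of bits in parallel, which is where the in-situ throughput comes from.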

C. Novel Architecture Design for Neuromorphic Computing with Emerging Technologies

Neuromorphic computing draws on the highly energy-efficient information processing of the human brain. We propose several novel architectures for neuromorphic computing enabled by emerging device and circuit technologies. (1) A simulation platform for memristor-based neuromorphic computing systems, called MNSIM. MNSIM proposes a hierarchical structure for memristor-based neuromorphic accelerators and provides flexible interfaces for customization. (2) TIME, a training-in-memory architecture for RRAM-based multilayer neural networks. We propose the TIME architecture and peripheral circuit designs to enable training NNs directly in RRAM. (3) Neural network transformation and co-design under neuromorphic hardware constraints. We propose a toolset called NEUTRAMS (Neural network Transformation, Mapping and Simulation), which includes three key components: a neural network (NN) transformation algorithm, a configurable clock-driven simulator of neuromorphic chips, and an optimized runtime tool that maps NNs onto the target hardware for better resource utilization. (4) We also study how 3D technology can better support neural networks.
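To make the memristor-crossbar computation concrete, the sketch below models an analog matrix-vector product using two crossbars (one for positive and one for negative weights), a limited number of discrete conductance levels, and differential current sensing. The device parameters, level count, and function names are purely illustrative assumptions, not values from MNSIM or TIME.

```python
import numpy as np

G_MIN, G_MAX, LEVELS = 1e-6, 1e-4, 16   # illustrative device parameters

def quantize(w_norm):
    """Quantize normalized weights in [0, 1] to LEVELS conductance states."""
    steps = np.round(w_norm * (LEVELS - 1)) / (LEVELS - 1)
    return G_MIN + steps * (G_MAX - G_MIN)

def crossbar_matvec(W, v):
    """Approximate y = W @ v on a memristor crossbar: weights become
    conductances, inputs become voltages, and each bitline current
    sums G * V products in the analog domain."""
    scale = np.abs(W).max()
    g_pos = quantize(np.clip(W, 0, None) / scale)    # positive-weight array
    g_neg = quantize(np.clip(-W, 0, None) / scale)   # negative-weight array
    i_out = g_pos @ v - g_neg @ v                    # differential sensing
    # The G_MIN offset cancels in the difference; rescale back to weights.
    return i_out / (G_MAX - G_MIN) * scale
```

A simulator in the spirit of MNSIM would replace the ideal rescaling here with models of ADCs, wire resistance, and device variation, which is exactly why such platforms are needed to evaluate accuracy/efficiency trade-offs.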