2.5D Chiplet Architecture for Embedded Processing of High Velocity Streaming Data
Johns Hopkins University
This dissertation presents an energy efficient 2.5D chiplet-based architecture for real-time probabilistic processing of high-velocity sensor data, from an autonomous real-time ubiquitous surveillance imaging system. This work addresses problems at all levels of description. At the lowest physical level, new standard cell libraries have been developed for ultra-low voltage CMOS synthesis, as well as custom SRAM memory blocks, and mixed-signal physical true random number generators based on the perturbation of Sigma-Delta structures using random telegraph noise (RTN) in single transistor devices. At the chip level architecture, an innovative compact buffer-less switched circuit mesh network on chip (NoC) capable of reaching very high throughput (1.6Tbps), finite packet delay delivery, free from packet dropping, and free from dead-locks and live-locks, was designed for this chiplet-based solution. Additionally, a second NoC connecting processors in the network, was implemented based on token-rings, allowing access to external DDR memory. Furthermore, a new clock tree distribution network, and a wide bandwidth DRAM physical interface have been designed to address the data flow requirements within and across chiplets. At the algorithm and representation levels, the Online Change Point Detection (CPD) algorithm has been implemented for on-line learning of background-foreground segmentation. Instead of using traditional binary representation of numbers, this architecture relies on unconventional processing of signals using a bio-inspired (spike-based) unary representation of numbers, where these numbers are represented in a stochastic stream of Bernoulli random variables. By using this representation, probabilistic algorithms can be executed in a native architecture with precision on demand, where if more accuracy is required, more computational time and power can be allocated. The SoC chiplet architecture has been extensively simulated and validated using state of the art CAD methodology, and has been submitted to fabrication in a dedicated 55nm GF CMOS technology wafer run. Experimental results from fabricated test chips in the same technology are also presented.
Neuromorphic Engineering, ASIC, SoC, NoC, interposer, H-tree, Fishbone, Conical-Fishbone, Ultra-low skew, bufferless routing, multi-core, on-chip networks, deflection routing, CMP, Programmable delay, monotonically increasing delay, high speed digital delay, low voltage standard cell library, low power, ultra-low voltage SRAM, Bayesian inference, Online Changepoint Detection, stochastic computing, stochastic processing, stochastic neural computation, Bernstein polynomials, TRNG (true random number generator), RTN (Random Telegraph Noise), Sigma-Delta, High speed memory interface