Projects

Current Projects




Multi-FPGA system for quantum error correction

Highlights

A practical decoder for quantum error correction must decode, in real time, a dynamic decoding graph comprising all logical qubits in the system, a challenging task due to resource limitations. Helios-net is a first-of-its-kind decoding system that overcomes resource constraints through a distributed multi-FPGA architecture. It employs a hybrid tree-grid topology to minimize latency for lattice surgery operations distributed across multiple FPGAs. Furthermore, Helios-net introduces fusion-Union-Find, a novel approach to decoding merged logical qubits that avoids the redundant computations associated with traditional window decoders. Additionally, we designed the Helios-net architecture to overcome the I/O limitation and minimize data movement latency when integrating with existing quantum control systems. Our exploratory prototype of Helios-net consists of five Xilinx VMK-180 FPGAs and can decode 100 logical qubits (d=5) faster than the rate of measurement.

FPGA-based Distributed Union-Find Decoder for Surface Codes

Relevant papers

FPGA-based Distributed Union-Find Decoder for Surface Codes (Accepted for IEEE TQE)
Scalable Quantum Error Correction for Surface Codes using FPGA (IEEE QCE 2023)

Highlights


A fault-tolerant quantum computer must decode and correct errors faster than they appear to prevent exponential slowdown due to error correction. The Union-Find (UF) decoder is promising, with an average time complexity slightly higher than O(d^3). We report a distributed version of the UF decoder that exploits parallel computing resources for further speedup. Using an FPGA-based implementation, we empirically show that this distributed UF decoder has a sublinear average time complexity with regard to d, given O(d^3) parallel computing resources. The decoding time per measurement round decreases as d increases, a first for a quantum error decoder. The implementation employs a scalable architecture called Helios that organizes parallel computing resources into a hybrid tree-grid structure. Using a Xilinx VCU129 FPGA, we successfully implement d up to 21 with an average decoding time of 11.5 ns per measurement round under 0.1% phenomenological noise, and 23.7 ns for d=17 under equivalent circuit-level noise. This performance is significantly faster than any existing decoder implementation. Furthermore, we show that Helios can optimize for resource efficiency by decoding d=51 on a Xilinx VCU129 FPGA with an average latency of 544 ns per measurement round.
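For readers unfamiliar with the Union-Find decoder, the sketch below illustrates only the cluster bookkeeping it relies on: a disjoint-set forest in which each cluster root records whether the cluster contains an odd number of defects. It is a sequential, software-only illustration and not the Helios hardware design. The class name `ClusterUnionFind`, its methods, and the four-vertex example are hypothetical.

```python
class ClusterUnionFind:
    """Disjoint-set forest that tracks the defect parity of each cluster.

    Hypothetical illustration; not the Helios implementation.
    """

    def __init__(self, num_vertices, defects):
        self.parent = list(range(num_vertices))
        self.size = [1] * num_vertices
        # parity[root] is True when the cluster holds an odd number of defects.
        self.parity = [v in defects for v in range(num_vertices)]

    def find(self, v):
        # Path halving: point every visited vertex at its grandparent.
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]
            v = self.parent[v]
        return v

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        # Union by size: attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        # Merged parity is the XOR of the two cluster parities.
        self.parity[ra] = self.parity[ra] != self.parity[rb]
        return ra

    def is_odd(self, v):
        return self.parity[self.find(v)]


# Four vertices in a line, with defects at both ends.
uf = ClusterUnionFind(4, defects={0, 3})
uf.union(0, 1)
uf.union(2, 3)
odd_before = uf.is_odd(0)  # the cluster {0, 1} holds one defect
uf.union(1, 2)
odd_after = uf.is_odd(0)   # the merged cluster holds two defects
```

In a Union-Find decoder, odd clusters keep growing and merging until every cluster is even (or touches a boundary), after which a correction is chosen inside each cluster. The growth and correction steps are omitted here.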

Previous Projects




Fault Recovery in Theseus OS

In this project we implement and evaluate fault recovery in the Theseus Operating System (OS), a modern OS written from scratch in Rust that explores intralingual design, novel OS structure, and state management. Fault recovery is essential in Theseus because, in the absence of hardware-provided isolation, a faulty task can potentially corrupt any OS structure.

We implement a series of fault recovery mechanisms on Theseus that take increasingly drastic measures when recovery at the previous stage is unsuccessful. First, we fully unwind and restart the faulty task. If the fault persists, we replace potentially corrupted modules by loading fresh copies of them from disk into a different location in memory.
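The escalation described above can be sketched as a small control loop. This is a conceptual model only: Theseus is written in Rust, and the function `recover`, its parameters, the `max_restarts` limit, and the use of `RuntimeError` to stand in for a fault are all assumptions made for illustration.

```python
def recover(run_task, reload_modules, max_restarts=2):
    """Try progressively more drastic recovery stages.

    Returns the name of the stage after which the task ran cleanly.
    Hypothetical model; a RuntimeError stands in for a fault.
    """
    # Stage 1: restart the faulty task a bounded number of times.
    for _ in range(max_restarts):
        try:
            run_task()
            return "restart"
        except RuntimeError:
            continue
    # Stage 2: the fault is persistent, so swap in fresh module copies.
    reload_modules()
    try:
        run_task()
        return "module_reload"
    except RuntimeError:
        return "unrecovered"


# A persistent fault that only disappears once modules are reloaded.
state = {"corrupted": True}

def task():
    if state["corrupted"]:
        raise RuntimeError("fault")

def reload():
    state["corrupted"] = False

outcome = recover(task, reload)
```

Here `outcome` is `"module_reload"`, because restarting alone never clears the simulated corruption.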

We evaluate Theseus’s ability to recover from faults by stress testing our fault recovery implementation in the presence of hardware faults. Furthermore, we show that Theseus can recover from faults occurring in core OS components, e.g., those that necessarily exist within a microkernel, which goes beyond the capabilities of existing works.





Minion: Retrofitting Mobile Sensing Systems with Analog Datapath

In continuous sensing systems, the analog readout circuit, especially the analog-to-digital converter (ADC), dominates the power consumption. Yet most of the energy used by the ADC is wasted, as these systems are always on and discard most data after simple processing.

In this work we aim to achieve minimal runtime energy overhead while preserving general-purpose expressiveness when using analog circuits for sensor data processing. Our key idea is a mixed-signal, programmable analog processor placed between the sensors and the digital processor, which processes and discards data early, before it reaches the ADC.

I worked on designing the digital control path of the project and emulating it on an FPGA.



FPGA based architectures for Screen Content Coding

The Screen Content Coding (SCC) extension to High Efficiency Video Coding (HEVC) offers substantially higher compression efficiency than existing video coding standards for computer-generated content. This gain comes from exploiting features such as repeated patterns, limited colors, and highly saturated areas, which are common in computer-generated content. However, it is achieved at the expense of additional computational complexity, with several resource-hungry coding tools. Hence, extending SCC to hardware encoders can be challenging.

In this work we designed resource-efficient hardware architectures for two key SCC tools, Intra Block Copy and Palette Coding. These architectures were emulated on a Virtex-7 FPGA.




FPGA based HEVC encoder

Higher compression efficiency in HEVC encoders comes with increased computational complexity, making real-time encoding of high-resolution videos a challenging task. This challenge can be addressed in software, yet hardware solutions are more appealing due to their superior performance and low power consumption.

In this work we designed an FPGA-based implementation of an all-intra HEVC encoder that can encode raw video at 8 bits per sample, 1920x1080 resolution, and 30 frames per second in real time, even at low operating frequencies. A major obstacle to real-time encoding in available architectures is the dependency created by reference generation.

We designed a new three-stage architecture to reduce these dependencies and increase parallelism. All modules can operate at up to 200 MHz, and the encoder achieves real-time encoding with a minimum operating frequency of 140 MHz on a Xilinx Zynq ZC706 FPGA.
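To put the operating frequencies in perspective, the short calculation below estimates how many clock cycles are available per luma sample at 1920x1080 and 30 frames per second. It counts luma samples only and ignores chroma, blanking, and pipeline overheads, all of which are simplifying assumptions; the function name `cycles_per_luma_sample` is hypothetical.

```python
WIDTH, HEIGHT, FPS = 1920, 1080, 30


def cycles_per_luma_sample(clock_hz):
    """Clock cycles available per luma sample at the given frequency."""
    samples_per_second = WIDTH * HEIGHT * FPS  # luma only; chroma ignored
    return clock_hz / samples_per_second


budget_min = cycles_per_luma_sample(140e6)  # minimum real-time frequency
budget_max = cycles_per_luma_sample(200e6)  # maximum module frequency
```

Under these assumptions the encoder has roughly 2.25 cycles per luma sample at 140 MHz and about 3.2 at 200 MHz, which gives a sense of how much parallelism the architecture needs to sustain.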

A talk I gave on this work is available here