Reconfigurable architectures are architectures whose functionality is plastic rather than fixed. Unlike, for example, a general-purpose processor -- whose functionality (number of ALUs, number of cores, etc.) is fixed --
a reconfigurable architecture is flexible. One example of a modern reconfigurable architecture is the Field-Programmable Gate Array (FPGA). I have been working with FPGAs for a long time,
particularly on how they can complement or compete with existing compute devices with respect to performance and power efficiency. My research has included both the use of
High-Level Synthesis (HLS) tools and their development.
Some exciting papers of mine on the subject:
- Accelerating Parallel Computations with OpenMP-Driven System-on-Chip Generation for FPGAs, Artur Podobas, 2014, MCSoC (Link)
- Empowering OpenMP with Automatically Generated Hardware, Podobas et al., 2016, SAMOS (Link)
- Combined Spatial and Temporal Blocking for High-Performance Stencil Computation on FPGAs using OpenCL, Zohouri and Podobas et al., 2018, FPGA (Link)
- Hardware Implementation of POSITs and their Application in FPGAs, Podobas et al., 2018, RAW (Link)
- Scaling Performance for N-Body Stream Computation with a Ring of FPGAs, Huthmann, Shin, Podobas et al., 2019, HEART (Link)
A recent interest of mine is how to build neuromorphic -- biologically inspired -- systems, preferably using reconfigurable architectures. Here I am looking at how to map different neuron and synapse models
to modern hardware in order to execute (or "simulate") systems that are as numerous and as large as possible. It turns out that reconfigurable architectures are a good match for bio-inspired systems, and even more
abstract programming methods such as High-Level Synthesis (HLS) can yield many times better execution performance than general-purpose systems.
Two papers on the subject:
- Designing and Accelerating Spiking Neural Networks using OpenCL for FPGAs, Artur Podobas and Satoshi Matsuoka, 2017, FPT (Link)
- Accelerating Spiking Neural Networks on FPGAs using OpenCL, Artur Podobas and Satoshi Matsuoka 2017, IPSJ ARC Tech.Report (Link)
My research interest in parallel computing began even before my PhD studies, when I investigated the strengths and weaknesses of different parallel programming libraries (e.g., Cilk-5, TBB, OpenMP, GCD).
I have since done research on both homogeneous and heterogeneous systems, with a particular focus on the concept of tasks. During my PhD studies I also created a prototype system called BLYSK.
BLYSK is a task-based runtime system I built for experimentation with runtime-system schedulers. It is intended to be an API-compatible
replacement for GCC's OpenMP runtime system (libgomp), but significantly faster and more versatile. BLYSK was also used in the PaPP project (https://artemis-ia.eu/project/44-papp.html), in which
my colleague Lars Bonnichsen (PhD, DTU) and I included support for speculation in OpenMP.
You can find a version of BLYSK at: Github
Some selected and exciting publications on the subject of parallel computing, performance visualization, and performance prediction:
- TurboBLYSK: scheduling for improved data-driven task performance with fast dependency resolution, Podobas et al., 2014, IWOMP (BEST PAPER AWARD) (Link)
- Using transactional memory to avoid blocking in OpenMP synchronization directives, Bonnichsen and Podobas, 2015, IWOMP (Link)
- A comparative performance study of common and popular task-centric programming frameworks, Podobas et al., Concurrency and Computation, 2015 (Link)
- Grain Graphs: OpenMP Performance Made Easy, Muddukrishna and Podobas et al., PPoPP, 2016 (Link)
- MACC: An OpenACC Transpiler for Automatic Multi-GPU Use, Matsumura and Podobas et al., SCAsia, 2018 (Link)
- Learning Neural Representations for Predicting GPU Performance, Shweta, Drozd, Podobas, et al. , ISC, 2019 (Link)