Commit 3b55249

Add NDSEG and Columbia essays
0 parents  commit 3b55249

9 files changed

+345
-0
lines changed

LICENSE

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
1+
DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
2+
Version 2, December 2004
3+
4+
Copyright (C) 2022 Haoda Wang <[email protected]>
5+
6+
Everyone is permitted to copy and distribute verbatim or modified
7+
copies of this license document, and changing it is allowed as long
8+
as the name is changed.
9+
10+
DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
11+
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
12+
13+
0. You just DO WHAT THE FUCK YOU WANT TO.

README.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
# Some guy's grad school stuff

fellowships/ndseg/README.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
# NDSEG
2+
3+
DoD-funded 3-year fellowship - [Link](https://ndseg.org/)

fellowships/ndseg/essay.md

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
1+
# Personal Essay
2+
3+
"But what if it was... faster?" This single sentence somehow accurately sums up the entirety of my research direction. I focused on this idea throughout my work across various fields including binary analysis, networking and operating systems, though these projects have also helped me realize that "faster" must also include "reliable" and "secure." Over the course of my research career, I have had the privilege to find new ways to build "faster" software systems across both government and academic research labs.
4+
5+
I started my research career as a first-year undergrad in a cybersecurity lab, where I was able to collaborate with PhD students on projects with topics ranging from network security to binary analysis and symbolic execution. These projects have resulted in two publications submitted to NDSS and USENIX Security.
6+
7+
This work then led me to a Department of Energy lab, where I built a line-rate packet capture system on commodity hardware to support 100Gbps network cards. This enabled high-speed packet capture for network testbeds without the need for expensive hardware. This was my first independent research project, and I was able to undertake every part of the research project lifecycle. My research there also provided me with a strong understanding of computer systems at the OS level and the many limits modern computers still face.
8+
9+
I then joined a project on simulation and verification for the Mars 2020 project, where I built tools that ensured the integrity of the mission, centered around the rover’s flight software. One of these tools was a fuzzer which found multiple fatal bugs within the flight software of the Mars 2020 rover.
10+
11+
Through my work on Mars 2020, I also developed an interest in FPGAs and heterogeneous computing. This interest brought me to an academic networking lab this summer, where I worked on an FPGA-accelerated intrusion detection system. Through this work, I was also able to appreciate the value of FPGAs in this post-Moore era of computing.
12+
13+
Currently, I am working on a DARPA-funded project attempting to extract mathematical expressions from cyber-physical binaries. This project will significantly advance the field of reverse engineering by developing new methods to understand the execution of an unknown binary. I am also building a line-rate FPGA SmartNIC simulation tool for testbeds as part of my senior thesis, which will enhance HDL testing for FPGA developers and testbed users interested in FPGAs.
14+
15+
I hope to continue working in this field by entering a PhD program next Fall, where I can explore the interplay between software and hardware in computing systems. I am especially interested in finding novel ways to build secure, reliable, and performant systems.
16+
17+
After my PhD, I wish to continue finding ways to improve systems as a research scientist in academia or a national lab. The NDSEG Fellowship will enable me to work on my research ideas while also allowing me to find connections within the DoD research directorate that may help point me in new directions for my research in the future.

fellowships/ndseg/proposal.pdf

111 KB
Binary file not shown.

fellowships/ndseg/src/main.tex

Lines changed: 154 additions & 0 deletions
@@ -0,0 +1,154 @@
1+
\documentclass[11pt]{article}
2+
\usepackage[letterpaper, margin=1in]{geometry}
3+
\usepackage[utf8]{inputenc}
4+
\usepackage[T1]{fontenc}
5+
\usepackage{listings}
6+
\usepackage{xcolor}
7+
\usepackage{inconsolata}
8+
\usepackage{graphicx}
9+
\usepackage{enumitem}
10+
\usepackage{amsfonts}
11+
\usepackage{amsmath}
12+
\usepackage{indentfirst}
13+
\usepackage[backend=biber, citestyle=ieee]{biblatex}
14+
15+
\addbibresource{references.bib}
16+
17+
\lstset{
18+
language=C,
19+
belowcaptionskip=1\baselineskip,
20+
breaklines=true,
21+
frame=L,
22+
xleftmargin=\parindent,
23+
showstringspaces=false,
24+
basicstyle=\ttfamily,
25+
keywordstyle=\bfseries\color{green!40!black},
26+
commentstyle=\itshape\color{purple!40!black},
27+
identifierstyle=\color{blue},
28+
stringstyle=\color{orange},
29+
}
30+
31+
\title{\vspace{-2cm} {Automating FPGA Offload with Symbolic Execution}}
32+
\date{\vspace{-2cm}}
33+
34+
\begin{document}
35+
36+
\maketitle
37+
38+
\section{Background}
39+
40+
Field-programmable gate arrays (FPGAs) are an essential component of the latest datacenters and compute clusters.
41+
They are especially valuable as compute accelerators since they can often outperform CPUs at specific tasks \cite{ma2017optimizing}.
42+
However, their design also presents a substantial challenge to the users of these clusters and datacenters.
43+
FPGAs require the user to write program descriptions in a hardware description language (HDL) such as Verilog, which has a significant learning curve compared to high-level languages such as C.
44+
45+
While solutions such as high-level synthesis (HLS) can transpile a high-level language such as C to HDL, integrating the FPGA into a compute workflow still requires significant effort.
46+
Since not all logic is suitable to be run on an FPGA, some code segments still have to be run on the host.
47+
Thus, communication libraries between the CPU and FPGA need to be written carefully to ensure maximum performance and often require domain-specific knowledge on interconnects such as PCIe \cite{sommer2017openmp}.
48+
Validation of FPGA code is also extremely time-consuming due to the inability of CPUs to effectively emulate FPGAs \cite{simpson2010fpga}.
49+
Thus, it may often be impractical to spend time integrating FPGAs into a workflow, despite their substantial performance benefits.
50+
A potential solution may be to find a way to automatically select which components of a program to offload to an FPGA.
51+
52+
\section{Proposal}
53+
54+
This project will address the difficulty in creating performant and reliable FPGA offloading solutions by creating a compiler that will automatically convert code segments into HDL.
55+
This automation can work since the semantics of a program can reflect its suitability for offloading.
56+
57+
The primary requirement for logic that is suitable for offload to an FPGA is that the code can be pipelined \cite{simpson2010fpga}.
58+
Namely, the code must have minimal overlapping operations, and side effects cannot modify the execution of the function.
59+
Both of these requirements can be tested for through binary analysis.
60+
61+
Symbolic execution is especially suitable for checking code segments for offload suitability.
62+
A symbolic execution engine such as \texttt{angr} can generate an abstract syntax tree (AST) from a given binary \cite{wang2017angr}.
63+
The AST represents the set of operations that will define the value of a variable at some time during the program’s execution.
64+
Heuristics to determine if the function is pipelineable can be created using this data.
65+
66+
The data dependency graph (DDG) will provide an important metric for determining offload suitability.
67+
Each node in this graph is a variable in the function, and each directed edge indicates that one variable influences the final state of another.
68+
A cycle in such a graph may indicate that the function cannot be efficiently pipelined.
69+
The control flow graph (CFG) of the function can also be used to identify loop conditions that may not be efficiently pipelined.
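
As a minimal sketch of the cycle heuristic described above, the data dependency graph can be modeled as a mapping from each variable to the variables it influences and searched depth-first for a back edge. The graph encoding, the function name, and the two example graphs are illustrative assumptions; they are not output of \texttt{angr} or part of the proposed compiler.

```python
from typing import Dict, Hashable, Iterable


def has_cycle(ddg: Dict[Hashable, Iterable[Hashable]]) -> bool:
    """Return True if the dependency graph contains a directed cycle.

    ddg maps each variable to the variables it influences (assumed encoding).
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = {}

    def visit(node) -> bool:
        color[node] = GRAY
        for succ in ddg.get(node, ()):
            state = color.get(succ, WHITE)
            if state == GRAY:  # back edge: succ is an ancestor of node
                return True
            if state == WHITE and visit(succ):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(ddg))


# Hypothetical graphs: a feed-forward chain and a loop-carried accumulator.
feed_forward = {"a": ["b"], "b": ["c"], "c": []}
loop_carried = {"x": ["acc"], "acc": ["acc", "out"], "out": []}
```

Under this sketch the feed-forward chain is reported as acyclic, while the accumulator that depends on its own previous value is flagged for closer inspection rather than rejected outright.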
70+
71+
Other suitability analyses will need to be made as well to ensure that the FPGA can implement the required design.
72+
For example, the total size of all variables cannot exceed the total number of registers available on the FPGA, and allocated memory cannot exceed the total memory on the FPGA.
73+
Other considerations may include identifying code regions that may work well on specialized components within FPGA accelerators.
74+
For example, many FPGA vendors offer intellectual property cores (IPs) that provide better performance than custom implementations of the same function.
75+
The compiler may need to take these IPs into account by generating profiles for each FPGA.
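
The register and memory constraints mentioned above could be checked against a per-device profile roughly as follows. This is only a sketch: the profile fields, the choice to measure variables in bits, and the example limits are assumptions made for illustration and do not describe any real FPGA.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class FpgaProfile:
    """Hypothetical per-device profile; real limits would come from vendor data."""
    register_bits: int
    memory_bytes: int


def fits_on_device(var_bits: Dict[str, int], alloc_bytes: int,
                   profile: FpgaProfile) -> bool:
    """Check the two constraints named above: register use and memory use."""
    total_bits = sum(var_bits.values())
    return (total_bits <= profile.register_bits
            and alloc_bytes <= profile.memory_bytes)


# Illustrative numbers only, not taken from any real device.
small_device = FpgaProfile(register_bits=1024, memory_bytes=4096)
```

A function that fails this check would simply remain on the host; a fuller profile could also list the vendor IPs available on each device.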
76+
77+
This project will create a compiler that can compile C source code into both a CPU portion and an FPGA portion.
78+
The compiler will compile the input C program into a binary and then analyze each function of the binary as described above.
79+
This allows the analysis to make use of compiler optimizations such as code inlining and loop unrolling to detect code regions that may not otherwise be identified as offloadable.
80+
81+
The compiler will then automatically determine which sections of the program can be offloaded to FPGAs through the heuristics above.
82+
It will extract the largest selection of code regions that was found to be offloadable and decompile those regions to HLS C.
83+
This preserves the semantics of the compiled program and allows conversion to the subset of C used by the HLS suite while keeping compiler optimizations.
84+
It will also add functions for communication between the host and the FPGA, and finally flash the compiled HLS project to the FPGA.
85+
86+
This project will also include a library for both the FPGA and the CPU which will enable communication between them.
87+
This can be done by building a kernel driver and a Verilog \texttt{module} that will interface between the offloaded code segments and the vendor's PCIe IP.
88+
89+
\section{Evaluation}
90+
91+
This project can be evaluated on any computer with an FPGA card supporting PCIe.
92+
Real-world programs making heavy use of parallelization will be evaluated.
93+
This will include applications such as the NAMD molecular dynamics library \cite{phillips2005scalable} or the Stockfish chess engine.
94+
These programs are highly parallelized and will test the pipeline heuristic of the prototype.
95+
Furthermore, some of these projects also support common APIs such as OpenMP and MPI, which this project should also support.
96+
97+
Benchmarks such as those in NAMD and Stockfish also have custom metrics that can be compared between the two versions.
98+
These results could take into account side effects that may not be apparent when looking solely at latency and throughput.
99+
100+
Performance comparisons can then be made between the host-only version and the offloaded version.
101+
Significant metrics to test include the latency and throughput of the application when offloaded, as well as total runtime of each version.
102+
The performance of the communication library will play a substantial role in the test results as well.
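
The comparison between the host-only and offloaded versions could be summarized with two simple ratios. The helper names and the sample measurements below are hypothetical and only illustrate how the metrics named above might be computed.

```python
def speedup(host_runtime_s: float, offload_runtime_s: float) -> float:
    """Ratio of host-only runtime to offloaded runtime (values > 1 favor offload)."""
    if host_runtime_s <= 0 or offload_runtime_s <= 0:
        raise ValueError("runtimes must be positive")
    return host_runtime_s / offload_runtime_s


def throughput(items_processed: int, runtime_s: float) -> float:
    """Work items completed per second over a measured run."""
    if runtime_s <= 0:
        raise ValueError("runtime must be positive")
    return items_processed / runtime_s


# Hypothetical measurements, not results from this project.
example_speedup = speedup(10.0, 4.0)
example_throughput = throughput(1000, 4.0)
```

Because the offloaded runtime includes transfers over PCIe, a ratio below 1 would point to communication overhead outweighing the gain from the FPGA.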
103+
104+
\section{Related Work}
105+
106+
Sommer et al. presented an addition for the OpenMP API that runs user-specified blocks on an FPGA using Vivado HLS \cite{sommer2017openmp}. While this presents an option to write FPGA offloads without writing any communication libraries, the project cannot automatically select code locations to offload. Instead, they are specified using a \texttt{\#pragma} directive, like a standard OpenMP call. Notably, the project also implemented a communication library between the host and the FPGA using PCIe.
107+
108+
Yamato presented a project that allowed for automatic offloading of specific function blocks \cite{yamato2021automatic}.
109+
This project relies on static code analysis on the source code and therefore cannot account for compiler optimizations that may unroll loops and present new opportunities for offloading.
110+
Use of symbolic execution on a compiled binary will take advantage of compiler optimizations such as loop unrolling to identify offloadable blocks that may not be identifiable in the source.
111+
Furthermore, Yamato's project also requires the source code of the original program and thus cannot support offloading from a binary.
112+
113+
\section{Relevance to the Department of Defense}
114+
115+
This project will enable software developers to deploy their applications to FPGA accelerators with minimal effort.
116+
If successful, the compiler developed by this project may significantly increase the performance of many common computing operations.
117+
118+
This is especially relevant for ARL's research focus on supercomputing technologies, under section KCI-CS-1 of BAA W911NF-17-S-0003.
119+
In particular, this work will allow users to leverage large-scale on-chip parallelism without the specialist knowledge in RTL programming usually required.
120+
121+
This project may also be applicable to ARL's dynamic binary translation focus, by allowing developers to target FPGAs with pre-existing binaries.
122+
While the proposed project will work with source code, it can also be used to translate binaries as it operates at an intermediate representation level.
123+
124+
\section{Societal Impacts}
125+
126+
Highly parallel processing is an essential part of many scientific discoveries that may prove to be societally significant.
127+
For example, the SUMMIT supercomputer was used to model the SARS-CoV-2 spike protein and to identify a set of molecules that could disrupt the virus's infectivity \cite{smith2020repurposing}.
128+
Similar techniques are used to model the Earth's climate and present possible futures \cite{mizielinski2014high}.
129+
However, these calculations require considerable amounts of compute time, and the clusters often draw a large amount of power while doing so.
130+
131+
FPGAs also often offer better power efficiency and can potentially speed up computations by orders of magnitude \cite{asano2009performance}.
132+
Accelerating similar scientific projects through FPGA offloading could benefit other societally significant work by decreasing computation time and power usage.
133+
Having their results sooner also allows scientists to spend less time waiting for their experiments to finish, or allows previously infeasible calculations to be undertaken.
134+
135+
\section{Relevant Qualifications}
136+
137+
A significant portion of my undergraduate experience focused on building high-performance distributed systems for various networked use cases.
138+
My research on defenses against denial of service attacks familiarized me with the Linux kernel.
139+
Similarly, my project at an NNSA laboratory on building network testbed technology provided me with a solid background in building high-performance software systems.
140+
Furthermore, my participation in the Mars 2020 project gave me software development experience that will be essential for building a prototype with real-world applicability.
141+
142+
I also have extensive experience working with FPGAs in a high-performance setting.
143+
My work with FPGA SmartNICs this past summer provided me with the experience needed to build designs for such accelerators.
144+
I am also currently working on a related project that is developing a high performance Verilog-to-C transpiler for networking tasks.
145+
These experiences have provided me with additional insights into designing for FPGAs with both HDL and Verilog.
146+
147+
I also have significant experience working with the \texttt{angr} binary analysis toolkit, including on a DARPA-funded project to extract mathematical expressions from cyber-physical binaries.
148+
This work familiarized me with symbolic execution and static analysis methods, and will be invaluable for the success of this project.
149+
150+
\newpage
151+
152+
\printbibliography
153+
154+
\end{document}

fellowships/ndseg/src/references.bib

Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
1+
@inproceedings{asano2009performance,
2+
title={Performance comparison of FPGA, GPU and CPU in image processing},
3+
author={Asano, Shuichi and Maruyama, Tsutomu and Yamaguchi, Yoshiki},
4+
booktitle={2009 international conference on field programmable logic and applications},
5+
pages={126--131},
6+
year={2009},
7+
organization={IEEE}
8+
}
9+
10+
@inproceedings{ma2017optimizing,
11+
title={Optimizing loop operation and dataflow in FPGA acceleration of deep convolutional neural networks},
12+
author={Ma, Yufei and Cao, Yu and Vrudhula, Sarma and Seo, Jae-sun},
13+
booktitle={Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays},
14+
pages={45--54},
15+
year={2017}
16+
}
17+
18+
@inproceedings{sommer2017openmp,
19+
title={OpenMP device offloading to FPGA accelerators},
20+
author={Sommer, Lukas and Korinth, Jens and Koch, Andreas},
21+
booktitle={2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP)},
22+
pages={201--205},
23+
year={2017},
24+
organization={IEEE}
25+
}
26+
27+
@book{simpson2010fpga,
28+
title={FPGA design},
29+
author={Simpson, Philip},
30+
year={2010},
31+
publisher={Springer}
32+
}
33+
34+
@inproceedings{wang2017angr,
35+
title={Angr-the next generation of binary analysis},
36+
author={Wang, Fish and Shoshitaishvili, Yan},
37+
booktitle={2017 IEEE Cybersecurity Development (SecDev)},
38+
pages={8--9},
39+
year={2017},
40+
organization={IEEE}
41+
}
42+
43+
@article{phillips2005scalable,
44+
title={Scalable molecular dynamics with NAMD},
45+
author={Phillips, James C and Braun, Rosemary and Wang, Wei and Gumbart, James and Tajkhorshid, Emad and Villa, Elizabeth and Chipot, Christophe and Skeel, Robert D and Kale, Laxmikant and Schulten, Klaus},
46+
journal={Journal of computational chemistry},
47+
volume={26},
48+
number={16},
49+
pages={1781--1802},
50+
year={2005},
51+
publisher={Wiley Online Library}
52+
}
53+
54+
@article{yamato2021automatic,
55+
title={Automatic offloading method of loop statements of software to FPGA},
56+
author={Yamato, Yoji},
57+
journal={International Journal of Parallel, Emergent and Distributed Systems},
58+
pages={1--13},
59+
year={2021},
60+
publisher={Taylor \& Francis}
61+
}
62+
63+
@article{smith2020repurposing,
64+
title={Repurposing therapeutics for COVID-19: Supercomputer-based docking to the SARS-CoV-2 viral spike protein and viral spike protein-human ACE2 interface},
65+
author={Smith, Micholas and Smith, Jeremy C},
66+
year={2020}
67+
}
68+
69+
@article{mizielinski2014high,
70+
title={High-resolution global climate modelling: the UPSCALE project, a large-simulation campaign},
71+
author={Mizielinski, MS and Roberts, MJ and Vidale, PL and Schiemann, R and Demory, M-E and Strachan, J and Edwards, T and Stephens, A and Lawrence, BN and Pritchard, M and others},
72+
journal={Geoscientific Model Development},
73+
volume={7},
74+
number={4},
75+
pages={1629--1640},
76+
year={2014},
77+
publisher={Copernicus GmbH}
78+
}

schools/columbia/columbia_sop.pdf

33.9 KB
Binary file not shown.
