Runyu Lu’s Personal Page
Biography
- 20 years old now. Undergrad@HUST -> PhD@UMich (Ang Chen & Mosharaf Chowdhury).
- I am very fortunate to have been advised by Hao Zhang (UCSD), Lingming Zhang (UIUC), and Zhiyuan Shao (HUST) in my undergraduate research.
- I was involved in the development of ColossalAI@HPCAI-Tech, LLVM@SenseTime, and NCNN@Tencent.
Btw, my WeChat ID is Runyu_Lu; please feel free to chat with me or drop me an email!
Education
- B.S., Huazhong University of Science and Technology, Wuhan; elite program in the Department of Computer Science; GPA 3.95; 2020.9-2024.6
- PhD, University of Michigan, Ann Arbor; Department of Computer Science & Engineering; Advisors: Ang Chen & Mosharaf Chowdhury; 2024.9-
Publications
Authors: Chenyuan Yang, Yinlin Deng, Runyu Lu, Jiayi Yao, Jiawei Liu, Reyhaneh Jabbarvand, Lingming Zhang
(Preprint), Oct 2023, arXiv available
* denotes joint first authors
Academic Experiences
UCSD lmsys Lab La Jolla, CA; Aug. 2023 - Present
- Role: Research Intern, Second author, paper already submitted to xxxx’24, arXiv available soon
- Mentor: Jiangfei Duan, Hao Zhang
- Advisor: Hao Zhang
- Profiled the bottlenecks of current SOTA LLM serving frameworks (e.g., vLLM, ppl.llm).
- Improved GPU SM utilization to increase the serving throughput of LLMs.
UIUC PLSE Champaign, IL; June 2023 - Sept. 2023
- Role: Research Intern, Third author, paper already submitted to xxxx’24, arXiv available
- Mentors: Chenyuan Yang, Yinlin Deng, Jiawei Liu
- Advisor: Lingming Zhang
- Duties included: Responsible for the LLVM part of this project.
- Tested compiler optimizations with a white-box fuzzing technique that leverages LLMs.
- Detected 96 bugs in PyTorch, TensorFlow XLA, TensorFlow Lite, and LLVM, based on the optimization source code.
HUST CGCL Wuhan, China; Oct. 2022 - June 2023
- Role: Research Intern, Co-first author, paper submitted to xxxx’24
- Mentor: Hongru Gao
- Advisor: Zhiyuan Shao, Hai Jin
- Duties included: Motivated by the memory-bound behavior of SOTA graph processing algorithms, proposed a more efficient, dynamic-graph-friendly data storage format, which involves modifying the page table in the operating system kernel (Linux).
- Remapped the page table of the OS kernel to accelerate the dynamic graph processing system.
- Sped up existing SOTA algorithms by more than 10x.
Industrial Experiences
@SenseTime Company Shanghai, China; April 2023 - Aug. 2023
- Role: LLVMer
- Mentor: Wenqiang Yin
- Duties included: Optimizing the backend of LLVM based on the SenseTime GPU.
- 4000+ lines of LLVM GPU backend optimization code
- Instruction selection and instruction pattern matching (e.g., optimizing ld/st into async_ld/st), CodeGen emitter
- Optimized threadidx/blockDim based on their value ranges.
- If there’s one thing LLVM has taught me, it’s that patience is a virtue. A virtue I never knew I had until I spent countless hours debugging its intricacies :(
@Tencent Company Shenzhen, China; June 2022 - Nov. 2022
- Project: ncnn, an open source project with 18k+ stars on GitHub
- Role: Top 15 committer (until Nov. 2022) of 269 committers in total
- Mentor: nihui (6k followers on GitHub)
- Duties included: Writing and optimizing (e.g., with SIMD) operators for ncnn, mainly aligned with PyTorch. Some examples I built:
- GridSample: Given an input and a flow-field grid, computes the output using input values and pixel locations from the grid.
- To be noted, the PNNX of ncnn, a new PyTorch Neural Network eXchange, draws on the design concept of MLIR.
- GELU: Implemented sse/avx/avx512 versions of gelu, with a fast version of erfc.
Hobbies
- Football: crazy fan of Lionel Messi, FC Barcelona, and the Argentina national team.
- F1: crazy fan of Charles Leclerc (racing in Formula One for Scuderia Ferrari) and Guanyu Zhou (a Chinese racing driver who currently competes in Formula One for Alfa Romeo).