Runyu Lu


  • 20 years old now. Undergrad@HUST -> PhD@UMich (Ang Chen & Mosharaf Chowdhury).
  • My primary research area is machine learning systems. In the past I mainly worked on high-performance computing; in the future I want to dive into networking and cloud computing.
  • I have been very fortunate to be advised by Hao Zhang (UCSD), Lingming Zhang (UIUC), and Zhiyuan Shao (HUST) during my undergraduate research.
  • I was involved in the development of ColossalAI@HPCAI-Tech (37k+ stars), LLVM@SenseTime (not open source), and ncnn@Tencent (18k+ stars).

Btw, my WeChat ID is Runyu_Lu. Please feel free to chat with me or drop me an email!



Under Review, xxx/xxxx’24

Authors: Chenyuan Yang, Yinlin Deng, Runyu Lu, Jiayi Yao, Jiawei Liu, Reyhaneh Jabbarvand, Lingming Zhang
(Preprint), Oct. 2023, arXiv available

Authors: Jiangfei Duan, Runyu Lu, Haojie Duanmu, Xiuhong Li, Xingcheng Zhang, Dahua Lin, Ion Stoica, Hao Zhang
The 41st International Conference on Machine Learning (system track)
(ICML 2024), Vienna, Austria, arXiv available

Academic Experiences

UCSD lmsys Lab                                                       La Jolla, CA; Aug. 2023 - Present
  • Role: Research Intern
  • Mentor: Jiangfei Duan, Hao Zhang
  • Advisor: Hao Zhang
  • Profiled the bottlenecks of current SOTA LLM serving frameworks (e.g., vLLM, ppl.llm).
  • Improved GPU SM utilization to increase the serving throughput of LLMs.
UIUC PLSE                                                       Champaign, IL; June 2023 - Sept. 2023
  • Role: Research Intern
  • Mentors: Chenyuan Yang, Yinlin Deng, Jiawei Liu
  • Advisor: Lingming Zhang
  • Duties included: the LLVM part of this project.
  • Tested compiler optimizations with a white-box fuzzing technique that leverages LLMs.
  • Detected 96 bugs in PyTorch, TensorFlow XLA, TensorFlow Lite, and LLVM, based on the optimization source code.
HUST CGCL                                                  Wuhan, China; Oct. 2022 - June 2023
  • Role: Research Intern
  • Mentor: Hongru Gao
  • Advisors: Zhiyuan Shao, Hai Jin
  • Duties included: proposing a more efficient, dynamic-graph-friendly data storage format, motivated by the memory-bound behavior of SOTA graph processing algorithms; this involved modifying the page table code in the operating system kernel (Linux).
  • Remapped the page tables of the OS kernel to accelerate the dynamic graph processing system.
  • Sped up existing SOTA algorithms by more than 10x.

Industrial Experiences

@HPCAI-Tech Company                                      Beijing, China; March 2024 - Present
@SenseTime Company                                      Shanghai, China; April 2023 - Aug. 2023
  • Project: SenseTime's internal version of LLVM
  • Role: LLVMer
  • Mentor: Wenqiang Yin
  • Duties included: optimizing the LLVM backend for the SenseTime GPU.
    • 4000+ lines of LLVM GPU backend optimization code
    • Instruction selection and instruction pattern matching (e.g., optimizing ld/st into async_ld/st), plus the CodeGen emitter
    • Optimized threadidx/blockDim based on their value ranges.
    • If there’s one thing LLVM has taught me, it’s that patience is a virtue. A virtue I never knew I had until I spent countless hours debugging its intricacies :(
@Tencent Company                                          Shenzhen, China; June 2022 - Nov. 2022
  • Project: ncnn, an open-source project with 18k+ stars on GitHub
  • Role: Top-15 committer (until Nov. 2022) out of 269 committers in total
  • Mentor: nihui (6k GitHub followers), and she’s cute :)
  • Duties included: writing and optimizing (e.g., with SIMD) operators for ncnn, mainly aligned with PyTorch. Some examples I built:
    • GridSample: Given an input and a flow-field grid, computes the output using input values and pixel locations from the grid.
      • Of note, ncnn's PNNX (PyTorch Neural Network eXchange) draws on the design concepts of MLIR.
    • GELU: Implemented SSE/AVX/AVX-512 versions of GELU, with a fast version of erfc.


  • Football, crazy fan of Lionel Messi, FC Barcelona, and Argentina National Team.
  • F1, crazy fan of Charles Leclerc (racing in Formula One for Scuderia Ferrari) and Guanyu Zhou (a Chinese racing driver who currently competes in Formula One for Stake F1 Team Kick Sauber).
  • A little dance, like jazz/hip-hop.