Matthias Diener
CV
mdiener@illinois.edu • matthias.diener@gmail.com
https://matthiasdiener.github.io
https://github.com/matthiasdiener
+1 (413) 317-1713
National Center for Supercomputing Applications (NCSA)
University of Illinois Urbana-Champaign
Key Skills
- High-Performance Computing (HPC): Expertise in developing, optimizing, and maintaining HPC software for large-scale simulations. Emphasis on performance, memory efficiency, reproducibility, and cross-platform portability. Strong understanding of low-level computer architecture, as well as hands-on experience with Linux kernel programming.
- Scientific Software Engineering: Proficient in Python, C/C++, OpenMP, and MPI; significant experience developing and maintaining complex codebases such as MIRGE-Com and Charm++; experience in development for accelerators (Nvidia GPUs, AMD GPUs, and Intel Xeon Phi).
- Open Source Contributor: Active contributor to major HPC, scientific computing, and Python ecosystem projects (PoCL, Spack, conda-forge, vmprof, pyinstrument, among others).
Work Experience
- Research Scientist in Computer Science, August 2021 – present.
National Center for Supercomputing Applications (NCSA), University of Illinois Urbana-Champaign.
- Research in scientific and research computing: development and implementation of software, tools, and computational algorithms for current and forthcoming HPC platforms.
- Determinism and reproducibility: Developed data structures and algorithms for the deterministic and reproducible execution of distributed applications.
- Performance: Developed methods to profile, track, and improve the performance of distributed applications.
- Memory: Developed methods to detect, track, and fix memory leaks in distributed applications.
- Postdoctoral Researcher in Computer Science, January 2017 – July 2021.
Coordinated Science Laboratory (CSL) and National Center for Supercomputing Applications (NCSA), University of Illinois Urbana-Champaign.
- Memory affinity improvements in parallel systems: Developed automatic mapping mechanisms to detect and optimize the memory access behavior of parallel applications, drastically improving performance and energy efficiency.
- Heterogeneous computing: Designed a model for heterogeneous CPU+GPU computing with an adaptive load balancing scheme to distribute work, based on the OpenMP framework.
- Postdoctoral Researcher in Computer Science, November 2015 – December 2016.
Federal University of Rio Grande do Sul (UFRGS), Brazil.
- Performance portability of parallel applications in the cloud: Improved scheduling of large parallel applications by accounting for network speed, compute variability, and tenant interference, leading to significantly better execution time and portability across cloud platforms.
- Memory tracing: Designed fast memory tracing techniques capable of tracking physical addresses to distinguish between co-located applications, outperforming existing solutions.
Education
- Ph.D. in Computer Science, October 2015.
Federal University of Rio Grande do Sul (UFRGS), Brazil, and Berlin University of Technology (TU Berlin), Germany.
Summa cum laude.
- Diploma (equivalent to M.Sc.) in Computer Engineering, November 2010.
Berlin University of Technology (TU Berlin), Germany.
Technical Proficiencies
- Programming languages and runtimes: Python, C, C++, OpenMP, OpenCL, CUDA, MPI, Charm++, Shell (Bash).
- Operating systems: Linux (kernel programming in the scheduling and virtual memory subsystems).
- Tools:
  - Performance analysis: pyinstrument, vmprof, perf, Intel VTune/PCM, gprof, nvprof, rocprof.
  - Benchmark suites: NAS-NPB, PARSEC, Rodinia, SPEC OMP2001/2012, SPEC CPU2006.
  - Applications: MIRGE-Com, PlasCom2, PlasComCM.
  - Simulation: Intel Pin, Simics, gem5.
  - CI/testing/deployment: GitHub Actions, GitLab CI; testing frameworks using pytest; automated deployment workflows to PyPI.
  - LLM-assisted development: Integrated large language models (LLMs) into development workflows for pair programming, rapid prototyping, and automated Python type annotations, among others.
Awards
- Best paper award: International Symposium on Benchmarking, Measuring and Optimizing, 2020.
- Best paper award: International Conference on Parallel, Distributed, and Network-Based Processing (PDP), 2015.
- Distinction (summa cum laude): Ph.D. dissertation, 2015.
Languages
- Native German speaker.
- Fluent in English and Portuguese.
- Working knowledge in Spanish and French.
Grants
- “Efficient Smart Memories for Data Intensive Computing.” (researcher, 3.6% of proposals accepted, Brazil), 2017.
- “High Performance Computing for Energy (HPC4E).” (researcher, joint project of the European Union and Brazil), 2015.
- Intel Modern Code, technical lead, 2016.
- Hewlett Packard Enterprise (HPC-ELO project), technical lead, 2016.
Reviewer
A verified record of reviews is available at https://publons.com/author/1341957/.
Journals
- ACM Computing Surveys (CSUR).
- ACM Journal on Emerging Technologies in Computing Systems (JETC).
- ACM Transactions on Architecture and Code Optimization (TACO).
- ACM Transactions on Emerging Topics in Computing (TETCSI).
- ACM Transactions on Modeling and Performance Evaluation of Computing Systems (ToMPECS).
- ACM Transactions on Parallel Computing (TOPC).
- Concurrency and Computation: Practice and Experience.
- Elsevier Computers & Electrical Engineering.
- Elsevier Future Generation Computer Systems (FGCS).
- Elsevier Journal of Parallel and Distributed Computing (JPDC).
- Elsevier Journal of Systems Architecture (JSA).
- Elsevier Microelectronic Engineering.
- Elsevier Microprocessors and Microsystems.
- Elsevier Parallel Computing (Parco).
- IEEE Access.
- IEEE Transactions on Parallel and Distributed Systems (TPDS).
- International Journal of Computational Science and Engineering.
- Springer Computing.
Conferences & Workshops
- IEEE International Parallel & Distributed Processing Symposium (IPDPS).
- International Conference on Computational Science (ICCS).
- International European Conference on Parallel and Distributed Computing (Euro-Par).
- International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD).
- International Symposium on Parallel and Distributed Processing with Applications (ISPA).
- International Conference on Performance Evaluation Methodologies and Tools (ValueTools).
- International Workshop on OpenMP (IWOMP).
- International Heterogeneity in Computing Workshop (HCW).
Program Committee Memberships
Conferences
- International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) 2019, 2024, 2025.
- IEEE Cluster 2020, 2024, 2025.
Workshops & Tutorials
- Heterogeneity in Computing Workshop (HCW, co-located with IPDPS) 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2025.
- Open Workshop on Data Locality (COLOC, co-located with Euro-Par) 2017, 2018, 2019, 2021.
- Tutorials Program Committee for Supercomputing 2018, 2019.
Advised Students
- Douglas Pereira Pasqualin, “Thread and Data Mapping in STM Architectures.” (Ph.D. thesis, co-advised with André Rauber Du Bois), 2021.
- João Paulo Tarasconi Ruschel, “Parallel Implementations of the Cholesky Decomposition on CPUs and GPUs.” (Undergraduate thesis, co-advised with Philippe O. A. Navaux), 2016.
- Supervised 4 students through research internships at NCSA.
Participation in Examination Committees
- Simone Dominico (Ph.D. thesis, 2022).
- Douglas Pereira Pasqualin (Ph.D. thesis, 2021).
- Charles Cardoso de Oliveira (Master’s thesis, 2019).
- Tiago Rodrigo Kepe (Ph.D. thesis, 2017).
- João Paulo Tarasconi Ruschel (Undergraduate thesis, 2016).
- Guilherme Grunewald de Magalhães (Undergraduate thesis, 2016).
Invited Talks
- “Thread and Data Mapping in Shared Memory Architectures.” University of Darmstadt (Germany), February 2016.
- “Thread and Data Mapping in NUMA Architectures: An operating system perspective.” Inria, Grenoble (France), December 2013.
Publications
More than 60 peer-reviewed publications with over 1200 citations (h-index: 20). A list of publications and citations is available on: