Control Flow Duplication for Columnar Arrays in a Dynamic Compiler (‹Programming› 2023 - Research Papers)

Who

Sebastian Kloibhofer, Lukas Makor, David Leopoldseder, Daniele Bonetta, Lukas Stadler, Hanspeter Mössenböck

Track

‹Programming› 2023 Research Papers

When

Thu 16 Mar 2023 09:00 - 09:30 at Faculty of Engineering Building 2, Room 212 - Research Papers 4 Chair(s): Joel Jakubovic

Abstract

Columnar databases are an established way to speed up online analytical processing (OLAP) queries. Nowadays, data processing (e.g., storage, visualization, and analytics) is often performed at the programming language level, hence it is desirable to also adopt columnar data structures for common language runtimes.

While there are frameworks, libraries, and APIs to enable columnar data stores in programming languages, their integration into applications typically requires developer intervention. In prior work, researchers implemented an approach for automated transformation of arrays into columnar arrays in the GraalVM JavaScript runtime. However, this approach suffers from performance issues on smaller workloads as well as on more complex nested data structures. We find that the key to optimizing accesses to columnar arrays is to identify queries and apply specific optimizations to them.
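To make the row-versus-column distinction concrete, the following is a minimal Python sketch, not the GraalVM implementation. The function name `to_columnar` and the sample records are hypothetical. It turns an array of uniform records into one list per field, so that a query touching a single field scans only that list.

```python
def to_columnar(rows):
    """Convert a list of dicts with identical keys into a dict of per-field lists."""
    if not rows:
        return {}
    fields = list(rows[0].keys())
    # Assumption for this sketch: only uniform records are converted.
    # Non-uniform input is rejected instead of being handled another way.
    if any(list(r.keys()) != fields for r in rows):
        raise ValueError("rows are not uniform")
    return {f: [r[f] for r in rows] for f in fields}


# Hypothetical sample data in the usual row-wise (array of objects) layout.
rows = [
    {"id": 1, "price": 10.0, "qty": 3},
    {"id": 2, "price": 4.5, "qty": 10},
    {"id": 3, "price": 7.25, "qty": 2},
]
columns = to_columnar(rows)

# A query over one field reads only that column.
total_qty = sum(columns["qty"])
```

In the work described here the transformation happens automatically inside the runtime; the explicit call above only illustrates the resulting layout.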

In this paper, we describe novel compiler optimizations in the GraalVM Compiler that optimize queries on columnar arrays. At JIT compile time, we identify loops that access potentially columnar arrays and duplicate them in order to specifically optimize accesses to columnar arrays. Additionally, we describe a new approach for creating columnar arrays from arrays consisting of complex objects by performing multi-level storage transformation. We demonstrate our approach via an implementation for JavaScript Date objects.
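The loop duplication idea can be sketched in Python under strong simplifying assumptions. The class `ColumnarArray` and both functions are hypothetical illustrations, and the real optimization operates on compiler IR at JIT compile time rather than on source code. The sketch contrasts a single generic loop with a version where the loop exists twice and the storage check runs once, outside both copies.

```python
class ColumnarArray:
    """Hypothetical stand-in for an array whose storage was transformed into columns."""

    def __init__(self, rows):
        self._length = len(rows)
        self.columns = {k: [r[k] for r in rows] for k in rows[0]} if rows else {}

    def __len__(self):
        return self._length

    def __getitem__(self, i):
        # Generic element access has to rebuild a row from the columns.
        return {k: col[i] for k, col in self.columns.items()}


def sum_field_generic(arr, field):
    """One loop for both layouts; every iteration goes through generic access."""
    total = 0
    for i in range(len(arr)):
        total += arr[i][field]
    return total


def sum_field_duplicated(arr, field):
    """Two copies of the loop; the layout is checked once before entering either."""
    total = 0
    if isinstance(arr, ColumnarArray):
        column = arr.columns[field]  # column lookup hoisted out of the loop
        for value in column:
            total += value
    else:
        for row in arr:
            total += row[field]
    return total


# Hypothetical sample data.
rows = [{"id": 1, "qty": 3}, {"id": 2, "qty": 10}, {"id": 3, "qty": 2}]
columnar = ColumnarArray(rows)
```

Both functions return the same result for either layout; the duplicated version simply lets the columnar copy of the loop avoid per-iteration layout handling, which is the kind of specialization the abstract describes.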

Our work shows that automatic transformation of arrays to columnar storage is feasible even for small workloads and that more complex arrays of objects could benefit from a multi-level transformation. Furthermore, we show how we can optimize methods that handle arrays in different states by the use of duplication. We evaluated our work on microbenchmarks and established data analytics workloads (TPC-H) to demonstrate that it significantly outperforms previous efforts, with speedups of up to 10x for particular queries. Queries additionally benefit from multi-level transformation, reaching speedups of around 2x. Additionally, we show that we do not cause significant overhead on workloads not suitable for storage transformation.

We argue that automatically created columnar arrays could aid developers in data-centric applications as an alternative approach to using dedicated APIs on manually created columnar arrays. Via automatic detection and optimization of queries on potentially columnar arrays, we can improve performance of data processing and further enable its use in common—particularly dynamic—programming languages.

Link to Publication

https://programming-journal.org/2023/7/9

Sebastian Kloibhofer, Johannes Kepler University Linz

Lukas Makor, JKU Linz, Austria

David Leopoldseder, Oracle Labs, Austria

Daniele Bonetta, Oracle Labs, United States

Lukas Stadler, Oracle Labs, Austria

Hanspeter Mössenböck, JKU Linz, Austria

Session Program

Thu 16 Mar
Displayed time zone: (GMT+09:00) Osaka, Sapporo, Tokyo

09:00 - 10:30	Research Papers 4 (Research Papers) at Faculty of Engineering Building 2, Room 212. Chair(s): Joel Jakubovic (University of Kent)

09:00 30m Talk		Control Flow Duplication for Columnar Arrays in a Dynamic Compiler (Vol. 7, Research Papers). Sebastian Kloibhofer (Johannes Kepler University Linz), Lukas Makor (JKU Linz), David Leopoldseder (Oracle Labs), Daniele Bonetta (Oracle Labs), Lukas Stadler (Oracle Labs, Austria), Hanspeter Mössenböck (JKU Linz)
09:30 30m Talk		Gradual Soundness: Lessons from Static Python (Vol. 7, remote, Research Papers). Kuang-Chen Lu (Brown University, USA), Ben Greenman (Brown University, USA), Carl Meyer (Meta), Dino Viehland (Meta), Aniket Panse (Meta), Shriram Krishnamurthi (Brown University, United States)
10:00 30m Talk		Symphony: Expressive Secure Multiparty Computation with Coordination (Vol. 7, remote, Research Papers). Ian Sweet (Galois, Inc.), David Darais (Galois), David Heath (University of Illinois Urbana-Champaign), Ryan Estes (University of Vermont), William Harris (Galois, Inc.), Michael Hicks (University of Maryland; Amazon)