‹Programming› 2023
Mon 13 - Fri 17 March 2023 Tokyo, Japan

Columnar databases are an established way to speed up online analytical processing (OLAP) queries. Nowadays, data processing (e.g., storage, visualization, and analytics) is often performed at the programming language level, hence it is desirable to also adopt columnar data structures for common language runtimes.

While there are frameworks, libraries, and APIs to enable columnar data stores in programming languages, their integration into applications typically requires developer interference. In prior work, researchers implemented an approach for automated transformation of arrays into columnar arrays in the GraalVM JavaScript runtime. However, this approach suffers from performance issues on smaller workloads as well as on more complex nested data structures. We find that the key to optimizing accesses to columnar arrays is to identify queries and apply specific optimizations to them.

In this paper, we describe novel compiler optimizations in the GraalVM Compiler that optimize queries on columnar arrays. At JIT compile time, we identify loops that access potentially columnar arrays and duplicate them in order to specifically optimize accesses to columnar arrays. Additionally, we describe a new approach for creating columnar arrays from arrays consisting of complex objects by performing multi-level storage transformation. We demonstrate our approach via an implementation for JavaScript Date objects.

Our work shows that automatic transformation of arrays to columnar storage is feasible even for small workloads and that more complex arrays of objects could benefit from a multi-level transformation. Furthermore, we show how we can optimize methods that handle arrays in different states by the use of duplication. We evaluated our work on microbenchmarks and established data analytics workloads (TPC-H) to demonstrate that it significantly outperforms previous efforts, with speedups of up to 10x for particular queries. Queries additionally benefit from multi-level transformation, reaching speedups of around 2x. Additionally, we show that we do not cause significant overhead on workloads not suitable for storage transformation.

We argue that automatically created columnar arrays could aid developers in data-centric applications as an alternative approach to using dedicated APIs on manually created columnar arrays. Via automatic detection and optimization of queries on potentially columnar arrays, we can improve performance of data processing and further enable its use in common—particularly dynamic—programming languages.

Thu 16 Mar

Displayed time zone: Osaka, Sapporo, Tokyo change

09:00 - 10:30
Research Papers 4Research Papers at Faculty of Engineering Building 2, Room 212
Chair(s): Joel Jakubovic University of Kent
09:00
30m
Talk
Control Flow Duplication for Columnar Arrays in a Dynamic CompilerVol. 7
Research Papers
Sebastian Kloibhofer Johannes Kepler University Linz, Lukas Makor JKU Linz, David Leopoldseder Oracle Labs, Daniele Bonetta Oracle Labs, Lukas Stadler Oracle Labs, Austria, Hanspeter Mössenböck JKU Linz
Link to publication
09:30
30m
Talk
Gradual Soundness: Lessons from Static PythonVol. 7remote
Research Papers
Kuang-Chen Lu Brown University, USA, Ben Greenman Brown University, USA, Carl Meyer Meta, Dino Viehland Meta, Aniket Panse Meta, Shriram Krishnamurthi Brown University, United States
Link to publication
10:00
30m
Talk
Symphony: Expressive Secure Multiparty Computation with CoordinationVol. 7remote
Research Papers
Ian Sweet Galois, Inc., David Darais Galois, David Heath University of Illinois Urbana-Champaign, Ryan Estes University of Vermont, William Harris Galois, Inc., Michael Hicks University of Maryland; Amazon
Link to publication