JSON Tiles: Fast Analytics on Semi-Structured Data

Dominik Durner, Viktor Leis, Thomas Neumann
SIGMOD Honorable Mention Award
ACM SIGMOD 2021 International Conference on Management of Data (SIGMOD 2021)

Developers often prefer flexibility over upfront schema design, making semi-structured data formats such as JSON increasingly popular. Large amounts of JSON data are therefore stored and analyzed by relational database systems. In existing systems, however, JSON's lack of a fixed schema results in slow analytics. In this paper, we present JSON tiles, which, without losing the flexibility of JSON, enables relational systems to perform analytics on JSON data at native speed. JSON tiles automatically detects the most important keys and extracts them transparently -- often achieving scan performance similar to columnar storage. At the same time, JSON tiles is capable of handling heterogeneous and changing data. Furthermore, we automatically collect statistics that enable the query optimizer to find good execution plans. Our experimental evaluation compares against state-of-the-art systems and research proposals and shows that our approach is both robust and efficient.
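
The key-extraction idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a tile is a small batch of documents, that "most important" means "most frequent", and that only top-level keys are considered. The function name `build_tile` and the `threshold` parameter are hypothetical and chosen only for illustration.

```python
import json
from collections import Counter


def build_tile(docs, threshold=0.6):
    """Split one batch of JSON documents into extracted columns plus leftovers.

    Assumption: a key counts as important when it appears in at least
    `threshold` of the documents in the batch. The paper's actual
    detection method may differ.
    """
    # Count how many documents in the batch contain each top-level key.
    counts = Counter(key for doc in docs for key in doc)
    # Keys that are frequent enough are extracted into columns.
    frequent = sorted(k for k, c in counts.items() if c / len(docs) >= threshold)
    # One column per frequent key; None marks a document that lacks the key
    # (a simplification: it cannot be told apart from an explicit JSON null).
    columns = {k: [doc.get(k) for doc in docs] for k in frequent}
    # Infrequent keys stay in a per-document remainder, so no data is lost.
    remainder = [{k: v for k, v in doc.items() if k not in columns} for doc in docs]
    return columns, remainder


raw = [
    '{"id": 1, "user": "ann", "lang": "en"}',
    '{"id": 2, "user": "bob"}',
    '{"id": 3, "user": "cat", "geo": {"lat": 48.1}}',
]
docs = [json.loads(line) for line in raw]
columns, remainder = build_tile(docs)
print(columns)
print(remainder)
```

In this sketch, a scan over `id` or `user` reads a plain list, much like a column store would, while the rarely used `lang` and `geo` keys remain available in the remainder. That mirrors, in a simplified form, the combination of columnar scan speed and schema flexibility that the abstract describes.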

