E-Locus - Institutional Repository of the University of Crete

Identifier

000463037

Title

TeraHeap for G1 efficient caching for latency-sensitive applications

Alternative Title

TeraHeap in G1 for efficient caching in latency-sensitive applications

Author

Χαραλάμπους, Μαρία Χ.

Thesis advisor

Πρατικάκης, Πολύβιος

Reviewer

Μπίλας, Άγγελος
Μαγκούτης, Κωνσταντίνος

Abstract

Big data analytics frameworks such as Apache Spark handle vast amounts of data by moving objects outside the JVM managed heap (off-heap) onto a fast storage device. However, this strategy incurs high serialization/deserialization (S/D) costs and high garbage collection (GC) overhead when off-heap objects are moved back into the managed heap for processing. TeraHeap eliminates these overheads by extending the JVM with a second, high-capacity heap (H2) that is memory-mapped over a fast storage device and coexists with the regular heap (H1). TeraHeap eliminates the S/D cost through memory-mapped I/O and reduces the GC cost by avoiding GC scans over the secondary heap. It achieves this by (1) marking candidate objects for placement in H2 and indicating when to move them, (2) tracking live objects in H1 that are referenced from H2, and (3) reclaiming dead objects in H2. TeraHeap was originally implemented in the Parallel Scavenge collector, which tolerates long GC pauses because its main concern is application throughput. Because of these long pauses, Parallel Scavenge does not perform well with real-time applications. The Garbage-First (G1) collector targets latency-sensitive applications: it keeps GC pauses short, meeting a soft real-time goal with high probability while achieving high throughput. In this thesis, we ported the TeraHeap mechanism to the G1 GC. We aim to solve the off-heap problem of big data for latency-sensitive applications that need quick responses without long GC pauses. Porting TeraHeap to G1 introduces challenges not encountered with Parallel Scavenge, highlighting the design differences between the two collectors. These challenges are (1) concurrent heap marking alongside the application threads, (2) G1's use of evacuation rather than compaction to keep pauses short during heap collection, and (3) the incremental collection approach applied to the old generation.
Our evaluation shows that, for the same DRAM size, TeraHeap improves performance by up to 72% compared to native Spark. However, there is still room for further work in refining this port, given its demonstrated complexity.

Language

English

Subject

Big data

Garbage collection

JVM

TeraHeap

Large volumes of data

Issue date

2024-03-22

Collection

School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses