Identifier 000438707
Title TeraCache: efficient Spark caching over fast storage devices
Alternative Title TeraCache: efficient storage of intermediate data in Spark on fast storage devices
Author Κολοκάσης, Ιάκωβος
Thesis advisor Πρατικάκης, Πολύβιος
Ζακκάκ, Φοίβος
Reviewer Μπίλας, Άγγελος
Μαγκούτης, Κώστας
Abstract Many analytics computations are dominated by iterative processing stages that execute until a convergence condition is met. To accelerate such workloads while keeping up with the exponential growth of data and the slow scaling of DRAM capacity, Spark employs off-heap caching of intermediate results. However, off-heap caching requires serialization and deserialization (serdes) of data, which adds significant overhead, especially as datasets grow. This thesis proposes TeraCache, an extension of the Spark data cache that avoids the need for serdes by keeping all cached data on-heap but off-memory, using memory-mapped I/O (mmio). To achieve this, TeraCache extends the original JVM heap with a managed heap that resides on a memory-mapped fast storage device and is used exclusively for cached data. Preliminary results show that the TeraCache prototype can speed up Machine Learning (ML) workloads that cache intermediate results by up to 37% compared to the state-of-the-art serdes approach.
Language English
Issue date 2021-03-26
Collection   School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses
  Type of Work--Post-graduate theses
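
The abstract contrasts serdes-based caching with access through memory-mapped I/O. The sketch below illustrates only that contrast, using standard JDK classes; it is not TeraCache itself, which according to the abstract extends the JVM heap rather than using application-level buffers. The class name MmioCacheSketch, its method names, and the temporary file name are hypothetical, chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.DoubleBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration; not the TeraCache implementation.
public class MmioCacheSketch {

    // Serdes path: caching a partition encodes it into a byte blob.
    public static byte[] serialize(double[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(values);
        }
        return bytes.toByteArray();
    }

    // Serdes path: every reuse decodes the blob back into a fresh array.
    public static double[] deserialize(byte[] blob)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(blob))) {
            return (double[]) in.readObject();
        }
    }

    // mmio path: map a file region and view it as doubles. Reads and writes
    // go through the page cache to the device, with no encode/decode step.
    public static DoubleBuffer mapDoubles(Path file, int count) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            MappedByteBuffer mapped = channel.map(
                FileChannel.MapMode.READ_WRITE, 0, (long) count * Double.BYTES);
            // The mapping stays valid after the channel is closed.
            return mapped.asDoubleBuffer();
        }
    }

    public static void main(String[] args) throws Exception {
        double[] partition = {1.0, 2.5, -3.0};

        // Serdes round trip: two copies plus encoding work per reuse.
        double[] decoded = deserialize(serialize(partition));
        System.out.println("serdes elements: " + decoded.length);

        // mmio: write once, then read elements in place on later iterations.
        Path file = Files.createTempFile("cache-sketch", ".bin");
        file.toFile().deleteOnExit();
        DoubleBuffer cached = mapDoubles(file, partition.length);
        cached.put(partition);
        double sum = 0.0;
        for (int i = 0; i < cached.capacity(); i++) {
            sum += cached.get(i);
        }
        System.out.println("mmio sum: " + sum);
    }
}
```

In an iterative workload the serdes cost is paid on every reuse of the cached data, whereas the mapped buffer is read in place; this is the overhead the abstract reports TeraCache avoiding.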
