Domain Specific Memory Management for Large Scale Data Analytics
Shyamshankar, Panchapakesan Chitra
Hardware trends over the last several decades have led to shifting priorities with respect to performance bottlenecks in the implementations of dataflows typically present in large-scale data analytics applications. In particular, efficient use of main memory has emerged as a critical aspect of dataflow implementation, due to the proliferation of multi-core architectures and the rapid development of faster-than-disk storage media. At the same time, the wealth of static domain-specific information about applications remains an untapped resource when it comes to optimizing the use of memory in a dataflow application. We propose a compilation-based approach to the synthesis of memory-efficient dataflow implementations, using static analysis to extract and leverage domain-specific information about the application. Our program transformations use the combined results of type, effect, and provenance analyses to infer time- and space-effective placement of primitive memory operations, precluding the need for dynamic memory management and its attendant costs. The experimental evaluation of implementations synthesized with our framework shows both the importance of optimizing for memory performance and the significant benefits of our approach along multiple dimensions. Finally, we demonstrate a framework for formally verifying the soundness of these transformations, laying the foundation for their use as a component of a more general implementation synthesis ecosystem.