Loading…

Object Intersection Captures on Interactive Apps to Drive a Crowd-sourced Replay-based Compiler Optimization

Traditional offline optimization frameworks rely on representative hardware, software, and inputs to compare different optimizations on. With application-specific optimization for mobile systems though, the idea of a representative testbench is unrealistic while creating offline inputs is non-trivia...

Full description

Saved in:
Bibliographic Details
Published in:ACM transactions on architecture and code optimization 2022-09, Vol.19 (3), p.1-25
Main Authors: Mpeis, Paschalis, Petoumenos, Pavlos, Hazelwood, Kim, Leather, Hugh
Format: Article
Language:English
Citations: Items that this one cites
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Traditional offline optimization frameworks rely on representative hardware, software, and inputs to compare different optimizations on. With application-specific optimization for mobile systems though, the idea of a representative testbench is unrealistic while creating offline inputs is non-trivial. Online approaches partially overcome these problems but they might expose users to suboptimal or even erroneous code. Therefore, our mobile code is poorly optimized, resulting in wasted performance and energy and user frustration. In this article, we introduce a novel compiler optimization approach designed for mobile applications. It requires no developer effort, it tunes applications for the user’s device and usage patterns, and it has no negative impact on the user experience. It is based on a lightweight capture and replay mechanism. Our previous work [ 46 ] captures the state accessed by any targeted code region during its online stage. By repurposing existing OS capabilities, it keeps the overhead low. In its offline stage, it replays the code region but under different optimization decisions to enable sound comparisons of different optimizations under realistic conditions. In this article, we propose a technique that further decreases the storage sizes without any additional overhead. It captures only the intersection of reachable objects and accessed heap pages. We compare this with another new approach that has minimal runtime overheads at the cost of higher capture sizes. Coupled with a search heuristic for the compiler optimization space, our capture and replay mechanism allows us to discover optimization decisions that improve performance without testing these decisions directly on the user. Finally, with crowd-sourcing we split this offline evaluation effort between several users, allowing us to discover better code in less time. We implemented a prototype system in Android based on LLVM combined with a genetic search engine and a crowd-sourcing architecture. We evaluated it on both benchmarks and real Android applications. Online captures are infrequent and introduce ~5 ms or 15 ms on average, depending on the approach used. For this negligible effect on user experience, we achieve speedups of 44%  on average over the Android compiler and 35% over LLVM -O3. Our collaborative search is just 5% short of that speedup, which is impressive given the acceleration gains. The user with the highest workload concluded the search 7 \( \times \) faster.
ISSN:1544-3566
1544-3973
DOI:10.1145/3517338