Leveraging Cloud Infrastructure for Troubleshooting Edge Computing Systems

Bibliographic Details
Main Authors: Fagan, Michael; Khan, Mohammad Maifi Hasan; Wang, Bing
Format: Conference Proceeding
Language: English
Description
Summary: Modern cloud-based applications (e.g., Facebook, Dropbox) serve a wide range of edge clients (e.g., laptops, smartphones). The clients' characteristics vary significantly in terms of hardware (e.g., high-end desktops vs. resource-constrained smartphones), operating systems (e.g., Linux, Android, Mac OS, Windows), network connections (e.g., wireless vs. wired, 3G vs. 2G), and software versions (e.g., Firefox 12 vs. Firefox 13), to name a few. Unfortunately, due to misconfiguration, outdated software, faulty hardware, or other reasons, many edge systems operate at suboptimal performance. Detecting poor performance and identifying its root cause is extremely challenging for the client of the cloud system. To address this challenge, the troubleshooting service presented in this paper leverages this heterogeneity to identify and debug performance problems on edge devices. First, by looking at many runs across many different clients, the service groups clients into clusters based on performance. Next, the service enables logging on remote clients to collect runtime traces, and subsequently identifies the root cause by analyzing the logs automatically. We leverage high-level features such as machine and OS type along with lower-level kernel statistics such as I/O rate and system calls. To demonstrate our system, we first present a configuration bug that was artificially injected into a recently built cluster by changing the TCP buffer size. Next, we present two real-life bugs that were identified using our tool: an I/O inefficiency bug relating to network transfers on Android and a misconfiguration bug in VirtualBox.
ISSN: 1521-9097, 2690-5965
DOI: 10.1109/ICPADS.2012.67