Dependability analysis for characterizing Google cluster reliability
Published in: | International journal of communication systems 2019-11, Vol.32 (16), p.n/a |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Summary: |
Cloud solutions are emerging as a suitable way of transforming traditional IT data centers into highly available and reliable computing resources for hosting critical applications and data. However, software and hardware failures are a common problem in cloud datacenters and can cause serious damage. In this paper, we analyze physical server failures in the Google cloud datacenter. We study the Google cluster properties to investigate the relationship between the failure rate of physical servers and job failure events. The failure rates of the jobs executed on the Google cluster and of its servers are examined over a 29‐day period.
Based on these observations, we present a reliability model for the Google cluster's physical machines using continuous‐time Markov chains. We analyze the resulting model with the SHARPE software package to improve the understanding of failure events in the Google cloud cluster. We also explore cluster availability in terms of steady‐state availability, steady‐state unavailability, mean time to failure, and mean time to repair.
The objective of this paper is to study the Google cluster properties to investigate the relationship between physical server failures and job failures. A reliability model for the Google cluster's physical machines is presented. We analyze the resulting model with the SHARPE software package to improve the understanding of failure events in the Google cluster. The results show a strong correlation between machine and job failure rates. |
---|---|
ISSN: | 1074-5351, 1099-1131 |
DOI: | 10.1002/dac.4127 |
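
The availability measures named in the summary (steady‐state availability, steady‐state unavailability, mean time to failure, and mean time to repair) can be illustrated with a minimal sketch of a two-state (up/down) continuous-time Markov chain. The two-state simplification, the function name `two_state_metrics`, and the example rates are assumptions made for illustration only; they are not taken from the paper, whose model and SHARPE analysis may be more detailed.

```python
def two_state_metrics(failure_rate, repair_rate):
    """Steady-state measures for a hypothetical two-state (up/down) CTMC.

    failure_rate: transitions per hour from 'up' to 'down' (lambda).
    repair_rate:  transitions per hour from 'down' to 'up' (mu).
    Both values are illustrative assumptions, not data from the paper.
    """
    if failure_rate <= 0 or repair_rate <= 0:
        raise ValueError("rates must be positive")

    # With exponentially distributed sojourn times, the mean time in
    # each state is the reciprocal of the rate of leaving it.
    mttf = 1.0 / failure_rate
    mttr = 1.0 / repair_rate

    # Solving the balance equation pi_up * lambda = pi_down * mu together
    # with pi_up + pi_down = 1 gives the steady-state probabilities.
    availability = repair_rate / (failure_rate + repair_rate)
    unavailability = failure_rate / (failure_rate + repair_rate)

    return {
        "mttf": mttf,
        "mttr": mttr,
        "availability": availability,
        "unavailability": unavailability,
    }


if __name__ == "__main__":
    # Hypothetical rates: one failure per 100 hours, one repair per 2 hours.
    for name, value in two_state_metrics(0.01, 0.5).items():
        print(f"{name}: {value:.6f}")
```

In this simplified setting the steady-state availability equals MTTF / (MTTF + MTTR), which is why the four measures are usually reported together.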