World Scientific

Improving Resilience of Software Systems: A Case Study in 3D-Online Game System

    https://doi.org/10.1142/S0218194017500012
    Cited by: 1 (Source: Crossref)

    Resilience is the property that enables a system to continue operating properly when one or more faults occur. As software systems grow more complex, their hardware execution platforms also become more heterogeneous and larger in scale. Software systems may fail because of faults such as node breakdown, communication failure, or data processing failure. In this paper, we propose a ring-based resilience mechanism that implements fault detection and recovery. (1) To avoid placing a heavy network traffic burden on a central server, we design a ring-based heartbeat algorithm for crash fault detection. (2) We also design a recovery mechanism for crash faults that is lightweight compared with current system-specific mechanisms. To evaluate our mechanism, we use a 3D-online game system as a case study. By injecting faults, we test the effectiveness and overhead of the proposed mechanism. The experimental results show that, compared with other mechanisms, our mechanism supports resilience well and, with acceptable overhead, is better at dealing with crash faults caused by high cluster workload.
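
The abstract does not spell out the heartbeat algorithm, so the following is only an illustrative sketch of the general idea of ring-based crash detection, not the authors' implementation. It assumes that nodes are arranged in a logical ring, that each node watches only its successor, and that a node is suspected once no heartbeat has arrived within a fixed timeout. The class name `RingMonitor`, its methods, and the timeout value are all hypothetical, and time is passed in explicitly so that the simulation is deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class RingMonitor:
    """Simulated logical ring (hypothetical): each node watches only its successor."""
    nodes: list[str]
    timeout: float = 3.0  # assumed value: seconds of silence before suspicion
    last_seen: dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Treat every node as last heard from at time 0.
        for node in self.nodes:
            self.last_seen.setdefault(node, 0.0)

    def successor(self, node: str) -> str:
        """Return the next node in ring order, wrapping around at the end."""
        i = self.nodes.index(node)
        return self.nodes[(i + 1) % len(self.nodes)]

    def heartbeat(self, node: str, now: float) -> None:
        """Record that `node` emitted a heartbeat at time `now`."""
        self.last_seen[node] = now

    def suspects(self, now: float) -> dict[str, str]:
        """Map each watcher to its successor if that successor has timed out."""
        if len(self.nodes) < 2:
            return {}
        result = {}
        for watcher in self.nodes:
            target = self.successor(watcher)
            if now - self.last_seen[target] > self.timeout:
                result[watcher] = target
        return result

    def remove(self, node: str) -> None:
        """Drop a crashed node so its predecessor starts watching the next one."""
        self.nodes.remove(node)
        self.last_seen.pop(node, None)


ring = RingMonitor(["a", "b", "c", "d"])
for t in (1.0, 2.0, 3.0, 4.0, 5.0):
    for node in ("a", "b", "d"):  # "c" has crashed and stays silent
        ring.heartbeat(node, t)

print(ring.suspects(5.0))   # {'b': 'c'}
ring.remove("c")
print(ring.successor("b"))  # d
```

In this sketch each node exchanges heartbeats with a single neighbour, so monitoring traffic is spread around the ring rather than converging on one central server, which is the motivation the abstract gives for a ring-based design.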