Multi-Threaded Servers with High Service Time Variation for Layered Queueing Networks
Distributed application systems are often implemented as layers of software services. The services provide functions, which we refer to as service entries, that can differ significantly in their demands on resources such as CPUs and in their requests for service from other service entries. Such differences among the entries of a service lead to high service time variation for that service. These services are typically deployed within application server containers, each of which has a bound on its level of concurrency, referred to as its maximum threading level. We have modeled such systems using a layered queueing network approach. Each queue represents a first-come-first-served multi-threaded server with multiple entries that may have high service time variation. This chapter describes a simple and intuitive residence time expression for such queues. Simulation results show that the technique is both fast and accurate when compared with other techniques from the literature.
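To illustrate why differing entry demands produce high service time variation, the following minimal sketch computes the squared coefficient of variation (SCV) of a server's service time when each request is routed to one of several entries. It assumes, purely for illustration, that each entry's service time is exponentially distributed; the function name `mixture_scv`, the entry probabilities, and the mean demands are hypothetical and are not taken from the chapter.

```python
def mixture_scv(probs, means):
    """Squared coefficient of variation of a server's service time.

    Assumption (illustrative only): a request invokes entry i with
    probability probs[i], and that entry's service time is exponential
    with mean means[i].
    """
    if len(probs) != len(means):
        raise ValueError("probs and means must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("entry probabilities must sum to 1")

    # First moment of the mixture: weighted average of the entry means.
    mean = sum(p * s for p, s in zip(probs, means))
    # Second moment: an exponential with mean s has E[X^2] = 2 * s^2.
    second_moment = sum(p * 2.0 * s * s for p, s in zip(probs, means))

    variance = second_moment - mean * mean
    return variance / (mean * mean)


# One entry only: the service time is a plain exponential.
single_entry = mixture_scv([1.0], [5.0])

# Two entries with very different demands (hypothetical values).
two_entries = mixture_scv([0.9, 0.1], [1.0, 20.0])

print(single_entry, two_entries)
```

Under these assumptions a single entry gives an SCV of 1, while a rarely invoked but expensive second entry pushes the SCV well above 1, which is the kind of variation the queues described here are intended to handle.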