  Bestsellers

  • A Comprehensive Survey on IoT-Aided Pest Detection and Classification in Agriculture Using Different Image Processing Techniques

    Insects and rodents constantly trouble farmers and lead to various kinds of crop disease, so pest control and crop maintenance are essential tasks for keeping crops healthy. However, pesticides cause various social and environmental problems: excessive use can contaminate soil and water and can be highly toxic to plants. Moreover, with constant exposure, bugs and insects become more resistant, which pushes farmers toward heavier pesticide use. Genetically modified seeds are mainly used to provide high robustness against pest attacks, but they are expensive to deploy in practice. Applying the Internet of Things (IoT) in agriculture has improved on-field pest management, and prior works have implemented several pest detection and classification models based on effective techniques. The main purpose of this survey is to review the literature on IoT-aided pest detection and classification using different image processing techniques. The datasets, simulation platforms, and performance measures used in pest detection and classification are analyzed. Further, recent trends in machine learning and deep learning methods in this field are reviewed and categorized. By supporting early-stage pest detection, the survey can benefit the economy through better crop production and maximal crop protection. It can also help minimize human error and advance automated monitoring systems for large fields.

  • MULTITASKING ON THE CRAY Y-MP: AN EXPERIMENT WITH A 2-D NAVIER-STOKES CODE

    This paper presents the results of an experiment to study the performance of multitasking techniques on the CRAY Y-MP using a single algorithm. The algorithm is a compact difference scheme for the solution of the incompressible, two-dimensional, time-dependent Navier-Stokes equations. Three implementations of multitasking on the CRAY Y-MP are considered: macrotasking (parallelism at the subroutine level), microtasking (parallelism at the do-loop level), and autotasking (automatic multitasking). The three techniques are briefly described. The implementation of the algorithm is discussed in relation to these techniques, and results for four problem sizes are presented. With few processors, the timing results of the three techniques are generally comparable. In the eight-processor case, the algorithm achieved a speedup of seven and a processing rate of over one GFLOPS using macrotasking. The results are also compared with earlier results on the CRAY-2.
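
    The reported figures can be put in perspective with simple arithmetic. The sketch below computes parallel efficiency from the speedup and, assuming Amdahl's law holds (an assumption made only for illustration, not a model used in the paper), the serial fraction that a speedup of seven on eight processors would imply.

```python
def efficiency(speedup: float, processors: int) -> float:
    """Parallel efficiency E = S / p."""
    return speedup / processors


def amdahl_serial_fraction(speedup: float, processors: int) -> float:
    """Invert Amdahl's law S = 1 / (f + (1 - f) / p) for the serial fraction f."""
    p = processors
    return (p / speedup - 1.0) / (p - 1.0)


def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Amdahl's law: predicted speedup for serial fraction f on p processors."""
    f = serial_fraction
    return 1.0 / (f + (1.0 - f) / processors)


if __name__ == "__main__":
    s, p = 7.0, 8  # speedup and processor count reported for macrotasking
    f = amdahl_serial_fraction(s, p)
    print(f"efficiency = {efficiency(s, p):.3f}")   # efficiency = 0.875
    print(f"implied serial fraction = {f:.4f}")     # 0.0204, i.e. 1/49
    print(f"round-trip speedup = {amdahl_speedup(f, p):.2f}")
```

    Under this assumption, only about 2% of the work would have to remain serial to cap eight-processor speedup at seven.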

  • A HIERARCHICAL MULTIPROCESSOR STRUCTURE BASED ON MULTISTAGE NETWORKS

    In this paper, a class of hierarchically structured multistage interconnection networks is proposed, and the routing schemes of the proposed networks are shown to be simple. In environments with high access locality, performance analysis shows that the proposed networks increase the acceptance (collision-free) probability and reduce the average path length (delay) relative to conventional multistage interconnection networks such as delta networks. In general, the proposed networks outperform conventional multistage interconnection networks when the access locality is at least 0.5. The cost of the proposed networks is analyzed and found to be comparable to that of conventional ones.
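
    For context on the baseline, the acceptance probability of a conventional unbuffered delta network under uniform random traffic is commonly computed with Patel's recurrence: if each input of a k x k crossbar switch carries a request with probability m, each output carries one with probability 1 - (1 - m/k)^k. The sketch below applies that recurrence stage by stage. It models only the conventional delta network with no access locality, not the hierarchical networks proposed in the paper.

```python
def delta_acceptance(k: int, stages: int, request_rate: float = 1.0) -> float:
    """Acceptance probability of an unbuffered delta network of k x k switches.

    Assumes independent, uniformly distributed requests (no locality) and
    that a request losing a conflict at any stage is dropped.
    """
    m = request_rate
    for _ in range(stages):
        # Probability that a given switch output receives at least one request.
        m = 1.0 - (1.0 - m / k) ** k
    return m / request_rate


if __name__ == "__main__":
    for n in range(1, 7):
        # n stages of 2 x 2 switches connect 2**n inputs to 2**n outputs.
        print(n, 2 ** n, round(delta_acceptance(2, n), 4))
```

    Acceptance falls with every added stage, which is the collision effect that locality-aware structures aim to reduce.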

  • STUDIES ON THE PERFORMANCE OF A PARALLEL ITERATIVE ALGORITHM ON TRANSPUTER ARRAYS

    This paper discusses studies on the performance of a parallel iterative algorithm implemented on an array of transputers connected in a mesh configuration. The iterative algorithm under consideration is the finite difference method for the solution of partial differential equations. Analytical expressions for the execution times of the various steps of the algorithm are derived by studying its computation and communication characteristics. These expressions are validated by comparing the theoretical performance results with experimental values obtained on a transputer array. The analytical model is then used to estimate the performance of the algorithm for varying numbers of transputers and varying grid sizes. An important objective of this paper is to study the influence of convergence detection overhead on the performance of the algorithm, and we present an approach to minimize this overhead. Convergence detection is one of the dominant factors affecting performance, since it involves a substantial amount of computation and communication. To reduce this overhead, the proposed algorithm checks convergence only once every kc iterations, and a method for determining the optimal value of kc is given. Further, the time taken for convergence detection is estimated for the best-case, worst-case, and average-case situations.
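
    The idea of checking convergence only every kc iterations can be illustrated with a serial Jacobi solver for the one-dimensional Laplace equation. This is a sketch under simplifying assumptions: on the transputer array the check would require a global reduction across all processors, which is the overhead being amortized, whereas here it is a local loop. The function name and parameters are hypothetical.

```python
def jacobi_laplace_1d(n, left, right, tol=1e-6, kc=10, max_iter=100_000):
    """Solve u'' = 0 on n grid points with fixed end values by Jacobi iteration.

    Convergence (largest change between sweeps below tol) is tested only on
    every kc-th sweep. Returns (solution, sweeps performed, checks performed).
    """
    u = [0.0] * n
    u[0], u[-1] = left, right
    checks = 0
    for sweep in range(1, max_iter + 1):
        new = u[:]
        for i in range(1, n - 1):
            new[i] = 0.5 * (u[i - 1] + u[i + 1])
        if sweep % kc == 0:
            # In a parallel code this max would be a global reduction.
            checks += 1
            change = max(abs(a - b) for a, b in zip(new, u))
            u = new
            if change < tol:
                return u, sweep, checks
        else:
            u = new
    return u, max_iter, checks


if __name__ == "__main__":
    for kc in (1, 10, 50):
        _, sweeps, checks = jacobi_laplace_1d(11, 0.0, 1.0, kc=kc)
        print(f"kc={kc:3d}: {sweeps} sweeps, {checks} convergence checks")
```

    A larger kc means fewer checks but up to kc - 1 extra sweeps past the point of convergence; choosing kc balances those two costs.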

  • AN OPTICALLY INTERCONNECTED DISTRIBUTED SHARED MEMORY SYSTEM: ARCHITECTURE AND PERFORMANCE ANALYSIS

    This paper introduces an optically interconnected distributed shared memory system. The distributed shared memory approach integrates ideas from both shared memory and distributed memory systems to combine the strengths of each while balancing their respective weaknesses. The proposed system is built on a photonic network to support the high communication requirements of distributed shared memory. It employs wavelength division multiple access on the photonic network, enabling multiple channels to be formed on a single optical fiber. One consequence of this high communication capacity is a simpler global address mapping problem, and a simplified uniform address allocation scheme is introduced. The advantages of the proposed approach are examined through a performance analysis based on a closed queueing network that has been validated through extensive simulation. The performance of the system is evaluated in terms of the transaction time of a memory request and system throughput. The impact of varying the number of channels and processors on these metrics is studied, and the effects of variations in memory and channel service times are also evaluated.
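
    Closed queueing networks of this kind are commonly solved with exact Mean Value Analysis (MVA). The sketch below is the textbook single-class MVA recursion, assuming product-form, single-server FCFS stations with exponential service; it is not the paper's specific model, and the station demands and think time in the example are hypothetical. Processors play the role of customers: each alternates between local computation (think time) and memory requests that visit channel and memory stations.

```python
from typing import List, Tuple


def mva(demands: List[float], customers: int,
        think_time: float = 0.0) -> Tuple[float, float, List[float]]:
    """Exact single-class Mean Value Analysis for a closed queueing network.

    demands[i] is the total service demand per request at station i.
    Returns (throughput, response time per request, mean queue lengths).
    """
    queue = [0.0] * len(demands)
    throughput, response = 0.0, 0.0
    for n in range(1, customers + 1):
        # Arrival theorem: an arriving request sees the queue lengths for n - 1 customers.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        response = sum(residence)
        throughput = n / (think_time + response)
        queue = [throughput * r for r in residence]  # Little's law per station
    return throughput, response, queue


if __name__ == "__main__":
    # Hypothetical demands (channel, memory) and think time, in arbitrary time units.
    for p in (1, 4, 16, 64):
        x, r, _ = mva([0.2, 0.5], p, think_time=2.0)
        print(f"{p:3d} processors: throughput={x:.3f}, transaction time={r:.3f}")
```

    Throughput saturates at 1 / (largest demand) as processors are added, while transaction time keeps growing; adding channels would correspond to splitting the channel demand across more stations.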

  • DESIGN AND PERFORMANCE ANALYSIS OF MULTISTAGE INTERCONNECTION NETWORKS USING A RECURSIVE MULTICAST ALGORITHM

    In this paper, we study multicast communication in multistage interconnection networks (MINs) for large-scale multicomputers. In addition to point-to-point communication among processing nodes, efficient collective communication is critical to the performance of multicomputers. Multicast communication, in which the same packet is delivered from a source node to an arbitrary number of destination nodes, is fundamental to supporting collective communication primitives such as broadcast, reduction, and barrier synchronization. This paper presents a new approach to multicast communication based on two components: a restricted address encoding scheme that constructs a short, fixed-size multicast header, and a recursive scheme that recycles a multicast packet one or more times through the MIN to deliver it to the desired destination nodes. We propose a recursive multicast algorithm that provides deadlock freedom for multiple multicast packets in MIN-based multicomputers. We also present a performance model for the unbuffered MIN using the multicast algorithm and analyze its network throughput under several multicast communication scenarios. The proposed algorithm can easily be applied to wormhole or virtual cut-through MIN-based multicomputers.
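
    As a purely hypothetical illustration of how a short, fixed-size header can name a whole set of destinations (not necessarily the encoding the paper uses), a header can carry an n-bit address plus an n-bit mask whose set bits mean "either value". A switch at stage i that sees a masked bit forwards copies on both outputs; otherwise it routes on the address bit. Such a header can only name subcube-shaped sets, so an arbitrary destination set has to be split into several headers, one plausible role for sending a packet through the network more than once.

```python
from typing import List, Set, Tuple

Header = Tuple[int, int]  # (address bits, mask bits); a set mask bit means "0 or 1"


def reached(address: int, mask: int, n_bits: int) -> Set[int]:
    """Destinations reached by an (address, mask) header in an n-stage binary MIN.

    Stage i inspects bit (n_bits - 1 - i), most significant first, as in
    destination-tag routing. A mask bit of 1 sends copies to both outputs.
    """
    prefixes = [0]
    for i in range(n_bits):
        bit = n_bits - 1 - i
        if (mask >> bit) & 1:
            prefixes = [(p << 1) | b for p in prefixes for b in (0, 1)]
        else:
            prefixes = [(p << 1) | ((address >> bit) & 1) for p in prefixes]
    return set(prefixes)


def split_into_headers(dests: Set[int], n_bits: int) -> List[Header]:
    """Cover an arbitrary destination set with (address, mask) headers.

    Sibling subcubes are merged bit by bit; this is a simple greedy
    heuristic, not an optimal cover.
    """
    cubes = {(d, 0) for d in dests}
    for bit in range(n_bits):
        step = 1 << bit
        merged, used = set(), set()
        for addr, mask in cubes:
            if mask & step or addr & step:
                continue  # handle each sibling pair once, from its 0-side member
            partner = (addr | step, mask)
            if partner in cubes:
                merged.add((addr, mask | step))
                used.update({(addr, mask), partner})
        cubes = (cubes - used) | merged
    return sorted(cubes)
```

    For example, the destination set {1, 4, 5, 6, 7} on an 8-node network splits into the two headers (001, 000) and (100, 011), so this simplified model would need two deliveries.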

  • PERFORMANCE ANALYSIS OF GRID ARCHITECTURE VIA QUEUEING THEORY

    With the remarkable development of grid technologies, many grid architectures have been proposed and have evolved greatly in recent years. One of the major challenges for grid technologies is to build the scientific and technological base for sharing and collaboration in large-scale distributed systems. Theories and models of grid architectures are crucial to this effort and to laying the foundations for constructing grid systems that work effectively. Based on a coarse-grain classification of grid applications, we present a novel taxonomy of grid architectures: interaction-intensive architectures and computation-intensive architectures. In this paper, two new grid performance metrics are proposed to measure the performance of grid systems. We then model grid architectures mathematically as M/M/1 and M/M/1/K queueing systems and derive the mean waiting time and the total time spent in the system. Furthermore, we investigate the performance of the two grid architectures and compare them with each other, and a series of simulations is carried out to confirm the analytical results. Finally, we derive several principles to guide grid architecture design and optimization.
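
    The closed-form results for the two queueing models are standard. The sketch below computes the mean waiting time in queue and the mean total time in system for M/M/1 (which requires arrival rate lam below service rate mu) and for M/M/1/K, where arrivals that find K requests already present are lost. How the paper maps grid architectures onto these parameters is not reproduced here, and the example rates are hypothetical.

```python
def mm1(lam: float, mu: float) -> tuple:
    """M/M/1: return (mean wait in queue Wq, mean time in system W)."""
    if lam >= mu:
        raise ValueError("M/M/1 is unstable unless lam < mu")
    w = 1.0 / (mu - lam)
    return w - 1.0 / mu, w


def mm1k(lam: float, mu: float, k: int) -> tuple:
    """M/M/1/K with capacity k (including the request in service).

    Returns (Wq, W, blocking probability); times refer to admitted requests.
    """
    rho = lam / mu
    if rho == 1.0:
        probs = [1.0 / (k + 1)] * (k + 1)
    else:
        norm = (1.0 - rho) / (1.0 - rho ** (k + 1))
        probs = [norm * rho ** n for n in range(k + 1)]
    mean_in_system = sum(n * p for n, p in enumerate(probs))
    lam_eff = lam * (1.0 - probs[k])      # only admitted arrivals
    w = mean_in_system / lam_eff          # Little's law
    return w - 1.0 / mu, w, probs[k]


if __name__ == "__main__":
    lam, mu = 0.8, 1.0   # hypothetical arrival and service rates
    print("M/M/1 Wq, W:", mm1(lam, mu))
    for k in (2, 5, 20):
        print(f"M/M/1/{k} Wq, W, P(block):", mm1k(lam, mu, k))
```

    As K grows, the M/M/1/K results approach the M/M/1 ones; a small K trades lower delay for lost requests.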

  • STORAGE OVERHEAD OF O-TREES, B-TREES AND PREFIX B-TREES: A COMPARATIVE ANALYSIS

    The O-tree is an indexing mechanism closely related to B-trees and Prefix B-trees. In this paper we derive the expected storage overhead of O-trees under the standard Bernoulli model of randomness, compare the results with the expected overhead of B-trees and Prefix B-trees under the same conditions, and assess the expected relative improvements.

  • DETECTING SECONDARY BOTTLENECKS IN PARALLEL QUANTUM CHEMISTRY APPLICATIONS USING MPI

    Profiling tools such as gprof and ssrun are used to analyze the run-time performance of a scientific application. Profiling is done in both serial and parallel mode, using MPI as the communication interface. The application is a quantum chemistry program using Hartree-Fock theory and Pulay's DIIS method. An extensive set of test cases is considered in order to reach uniform conclusions. A known problem of decreased parallel scalability can thus be narrowed down to a single subroutine responsible for the reduction in speedup. The critical module is analyzed, and a typical pitfall with triple matrix multiplications is identified. After the critical subroutine is overhauled, re-examination of the run-time behavior shows significantly improved performance and markedly improved parallel scalability. The lessons learned here may be of interest to others facing similar problems in related fields.
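
    The abstract does not say which pitfall was found. One well-known pitfall with triple matrix products, offered here only as an illustration and not as the authors' finding, is evaluating each element of D = A B C as a double sum over both inner indices, which costs O(n^4), instead of forming an intermediate product first, which costs O(n^3). The sketch counts multiplications for both versions on square matrices.

```python
import random


def triple_naive(a, b, c):
    """D[i][j] = sum_k sum_m A[i][k] * B[k][m] * C[m][j]; O(n^4) work."""
    n = len(a)
    mults = 0
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                for m in range(n):
                    s += a[i][k] * b[k][m] * c[m][j]
                    mults += 2
            d[i][j] = s
    return d, mults


def matmul(x, y):
    """Plain product of square matrices; returns (result, n**3 multiplications)."""
    n = len(x)
    out = [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
           for i in range(n)]
    return out, n ** 3


def triple_two_step(a, b, c):
    """D = (A B) C via an intermediate product; O(n^3) work."""
    ab, m1 = matmul(a, b)
    d, m2 = matmul(ab, c)
    return d, m1 + m2


if __name__ == "__main__":
    n = 12
    rng = random.Random(0)
    a, b, c = ([[rng.random() for _ in range(n)] for _ in range(n)] for _ in range(3))
    d1, m_naive = triple_naive(a, b, c)
    d2, m_two = triple_two_step(a, b, c)
    err = max(abs(d1[i][j] - d2[i][j]) for i in range(n) for j in range(n))
    print(f"naive: {m_naive} mults, two-step: {m_two} mults, max diff {err:.2e}")
```

    Both versions give the same matrix up to rounding, but the naive one does n times more multiplications, a gap that grows with basis size.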

  • BENCHMARKING CLUSTERS OF WORKSTATIONS THROUGH PARALLEL SORTING AND BSP LIBRARIES

    We present our experiences in developing and tuning, at the user level, the performance of in-core parallel sorting on homogeneous and non-homogeneous clusters using the two available BSP (Bulk Synchronous Parallel model) libraries: BSPLib from Oxford University (UK) and PUB from the University of Paderborn (Germany). The paper focuses mainly on the communication performance of these two libraries and, more generally, compares and summarizes their programming facilities and the differences between them.
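
    Communication performance in BSP is usually reasoned about with the standard BSP cost model: a superstep costs w + h*g + l, where w is the largest local computation on any processor, h is the largest number of words any processor sends or receives, g is the per-word communication cost, and l is the barrier latency (named `barrier` below). The sketch evaluates that model; g and l are machine parameters that would have to be measured for each cluster and library, and the values shown are hypothetical.

```python
from typing import Sequence


def superstep_cost(work: Sequence[float], sent: Sequence[int],
                   received: Sequence[int], g: float, barrier: float) -> float:
    """Standard BSP cost of one superstep: max work + h * g + l.

    h is the h-relation: the largest number of words any processor
    sends or receives in this superstep.
    """
    h = max(max(s, r) for s, r in zip(sent, received))
    return max(work) + h * g + barrier


def program_cost(supersteps, g: float, barrier: float) -> float:
    """Total cost is the sum of the superstep costs."""
    return sum(superstep_cost(w, s, r, g, barrier) for w, s, r in supersteps)


if __name__ == "__main__":
    # Hypothetical 4-processor superstep, e.g. the data exchange of a sample sort.
    step = ([100.0, 120.0, 90.0, 110.0], [30, 25, 40, 20], [35, 30, 20, 30])
    for g, l in ((1.0, 500.0), (4.0, 2000.0)):   # hypothetical (g, l) pairs
        print(f"g={g}, l={l}: cost = {program_cost([step], g, l)}")
```

    Comparing measured g and l across libraries is one way to explain why the same sorting code communicates faster with one library than the other.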

  • PERFORMANCE ANALYSIS OF MULTI-STAGE INTERCONNECTION NETWORKS WITH DETERMINISTIC SERVICE TIMES

    In this paper we present a performance model for a globally synchronized multistage interconnection network with constant service times that improves on earlier work in this field. Events at the network inputs are assumed to be geometrically distributed in time. As events are combined within the network, two events can arrive simultaneously at one of the network's server elements, called a double arrival rate stream (DAR) server. Analytical methods are described to derive the response time and inter-departure time distribution of this double arrival rate, constant service time server, which behaves as a DAR/D/1 queue. Because of the queueing effect of the servers in the first stage of the network, their outputs show a "distorted" geometric distribution, which is known to introduce analytical difficulties in evaluating the performance of subsequent stages. A method is presented to derive the response time of servers in the second stage when the network has a uniform workload distribution. It is shown that knowing the response times of the first and second stages is sufficient to predict the total response time of a large network with an accuracy better than 3%, even under heavy workload.
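
    The simplest instance of a double-arrival-rate server is the output queue of a first-stage 2 x 2 switch: in each slot, each of the two inputs independently delivers a packet to that output with probability q, and one packet is served per slot. The sketch below simulates that queue and compares the mean waiting time with the known closed form for output queueing in a 2 x 2 switch, W = p / (4(1 - p)) slots with p = 2q. It covers only this first-stage case under these Bernoulli assumptions; the distorted output process feeding later stages, which the paper addresses, is not modeled.

```python
import random


def simulate_dar_d1(q: float, slots: int, seed: int = 1) -> float:
    """Mean waiting time (in slots) of a discrete-time queue with up to two
    simultaneous arrivals per slot (each with probability q) and a
    deterministic service time of one slot."""
    rng = random.Random(seed)
    backlog = 0          # packets still queued after the previous slot's service
    total_wait = 0
    packets = 0
    for _ in range(slots):
        arrivals = int(rng.random() < q) + int(rng.random() < q)
        # The j-th packet of this slot's batch waits for the backlog plus j batch-mates.
        for j in range(arrivals):
            total_wait += backlog + j
        packets += arrivals
        backlog = max(backlog + arrivals - 1, 0)   # one packet served per slot
    return total_wait / packets


def analytic_wait(q: float) -> float:
    """Closed form for a 2 x 2 output queue: W = p / (4 (1 - p)), p = 2q < 1."""
    p = 2.0 * q
    return p / (4.0 * (1.0 - p))


if __name__ == "__main__":
    for q in (0.1, 0.25, 0.4):
        sim = simulate_dar_d1(q, 200_000)
        print(f"q={q}: simulated {sim:.4f}, analytic {analytic_wait(q):.4f}")
```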

  • PREEMPTIVE JOB SCHEDULING POLICY FOR DISTRIBUTIVELY-OWNED WORKSTATION CLUSTERS

    Cluster computing has come to prominence as a cost-effective parallel processing tool for solving many complex computational problems. In this paper, we propose a new timesharing opportunistic scheduling policy that supports remote batch job execution over networked clusters and is used in conjunction with the Condor Up-Down scheduling algorithm. We show that timesharing approaches can be used in an opportunistic setting to improve both mean job slowdown and mean response time with little or no reduction in throughput. We also show that the proposed algorithm achieves significant improvements in job response time and slowdown compared with existing approaches and some recently proposed ones.
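
    The benefit of timesharing for mean slowdown can be seen even in a toy setting. The sketch below is a generic illustration, not the proposed policy or Condor's Up-Down algorithm: a batch of jobs all arriving at time 0 runs on one machine, first in FCFS order and then round-robin with a fixed quantum, and the sketch reports mean response time, mean slowdown (response time divided by job length), and makespan. Context-switch and preemption costs are assumed to be zero.

```python
from collections import deque
from typing import Dict, List


def fcfs(lengths: List[float]) -> List[float]:
    """Completion times when jobs run to completion in arrival order."""
    t, done = 0.0, []
    for length in lengths:
        t += length
        done.append(t)
    return done


def round_robin(lengths: List[float], quantum: float) -> List[float]:
    """Completion times under round-robin timesharing with the given quantum."""
    queue = deque(range(len(lengths)))
    remaining = list(lengths)
    done = [0.0] * len(lengths)
    t = 0.0
    while queue:
        job = queue.popleft()
        run = min(quantum, remaining[job])
        t += run
        remaining[job] -= run
        if remaining[job] > 0:
            queue.append(job)      # preempted: back of the queue
        else:
            done[job] = t
    return done


def metrics(lengths: List[float], completions: List[float]) -> Dict[str, float]:
    """All jobs arrive at time 0, so response time equals completion time."""
    n = len(lengths)
    return {
        "mean_response": sum(completions) / n,
        "mean_slowdown": sum(c / s for c, s in zip(completions, lengths)) / n,
        "makespan": max(completions),
    }


if __name__ == "__main__":
    jobs = [10.0, 1.0, 1.0]
    print("FCFS:", metrics(jobs, fcfs(jobs)))
    print("RR:  ", metrics(jobs, round_robin(jobs, quantum=1.0)))
```

    The job order is chosen to make the contrast visible: short jobs stuck behind a long one dominate FCFS slowdown, while the makespan, a proxy for throughput here, is unchanged because no overhead is modeled.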

  • A PERFORMANCE EVALUATION OF THE NEHALEM QUAD-CORE PROCESSOR FOR SCIENTIFIC COMPUTING

    In this work we present an initial performance evaluation of Intel's latest, second-generation quad-core processor, Nehalem, and compare it with the first-generation AMD and Intel quad-core processors, Barcelona and Tigerton. Nehalem is the first Intel processor to implement a NUMA architecture, incorporating the QuickPath Interconnect to connect processors within a node, and the first to incorporate an integrated memory controller. We evaluate the suitability of these processors in quad-socket compute nodes as building blocks for large-scale scientific computing clusters. Our analysis of intra-processor and intra-node scalability for microbenchmarks and a range of large-scale scientific applications indicates that quad-core processors can deliver up to a 4x performance improvement over a single core, depending on the workload. However, scalability can be lower across a full node. We show that Nehalem outperforms Barcelona on memory-intensive codes by a factor of two when comparing a Nehalem node with 8 cores to a Barcelona node with 16 cores. Further optimizations are possible with Nehalem, including Simultaneous Multithreading, which improves the performance of some applications by up to 50%.

  • MODELING THE PERFORMANCE OF DIRECT NUMERICAL SIMULATION ON PARALLEL SYSTEMS

    Direct Numerical Simulation (DNS) is an important application area that is expected to use large fractions of future large-scale systems. In this work we develop, validate, and use a performance model of the combustion code DNS3D to explore achieved performance on current parallel systems. The performance model is developed from a thorough analysis of the application, and its key computational characteristics are coupled with the performance characteristics of the system through a parameterized analytical model. The model is validated on three parallel systems: a multi-core AMD Opteron-based system with an InfiniBand fat-tree network, an IBM Power5+ system with an HPS fat-tree network, and an IBM Power7 system with a direct-connect network. The performance model is shown to achieve high prediction accuracy on all three systems. We illustrate how the model can be used to explore the impact of changes in either the system or the application. It is used both to analyze the achieved performance on these systems and to explore the possible benefits of further optimizing DNS3D's main computational kernel, one-dimensional FFTs, or of overlapping communication with computation.
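
    To show the general shape of such a parameterized model, the sketch below estimates the time of one 3-D FFT built from 1-D FFTs, with and without overlapping communication and computation. It is a hypothetical model in that spirit, not the DNS3D model from the paper: it assumes the conventional estimate of about 5 n log2(n) flops per length-n 1-D FFT, a slab decomposition (procs at most n) with one all-to-all transpose of 16-byte complex values, and a simple latency-plus-bandwidth communication cost. All machine parameters in the example are hypothetical.

```python
import math


def fft3d_time(n: int, procs: int, flop_rate: float, latency: float,
               bandwidth: float, overlap: bool = False) -> float:
    """Hypothetical time for one 3-D FFT of an n^3 complex grid on procs processes.

    Computation: 3 passes of n^2 one-dimensional FFTs, each ~5 n log2(n) flops.
    Communication: one all-to-all transpose in which each process sends
    (procs - 1)/procs of its n^3/procs complex values (16 bytes each),
    modeled as (procs - 1) message latencies plus volume / bandwidth.
    """
    flops = 3 * n * n * 5 * n * math.log2(n)
    t_comp = flops / procs / flop_rate
    bytes_sent = 16 * n ** 3 / procs * (procs - 1) / procs
    t_comm = (procs - 1) * latency + bytes_sent / bandwidth
    # With perfect overlap only the longer of the two phases is exposed.
    return max(t_comp, t_comm) if overlap else t_comp + t_comm


if __name__ == "__main__":
    # Hypothetical machine: 1 GFLOP/s per process, 5 us latency, 1 GB/s per process.
    for p in (64, 128, 256, 512):
        t = fft3d_time(512, p, 1e9, 5e-6, 1e9)
        t_ov = fft3d_time(512, p, 1e9, 5e-6, 1e9, overlap=True)
        print(f"P={p:4d}: {t:.4f} s, with overlap {t_ov:.4f} s")
```

    Changing one parameter at a time (a faster FFT kernel lowers flop cost, a better network raises bandwidth) is the kind of what-if exploration such a model supports.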

  • ASYMPTOTIC PEAK UTILISATION IN HETEROGENEOUS PARALLEL CPU/GPU PIPELINES: A DECENTRALISED QUEUE MONITORING STRATEGY

    Widespread heterogeneous parallelism is unavoidable given the emergence of general-purpose computing on graphics processing units (GPGPU). The characteristics of a graphics processing unit (GPU), including significant memory transfer latency and complex performance behaviour, demand new approaches to ensuring that all available computational resources are efficiently utilised. This paper considers the simple case of a divisible workload based on widely used numerical linear algebra routines, and the challenges that prevent a naive SPMD application using the GPU as an accelerator from efficiently using all available resources. We suggest a queue monitoring strategy that facilitates resource usage with a view to balancing CPU/GPU utilisation for applications that fit the pipeline-parallel architectural pattern on heterogeneous multicore/multi-node CPU and GPU systems. We propose a stochastic allocation technique that may serve as a foundation for heuristic approaches to balancing CPU/GPU workloads.

  • A PERFORMANCE ANALYSIS OF THREE GENERATIONS OF BLUE GENE

    IBM's Blue Gene supercomputer architecture has evolved through three successive generations, each providing greater power efficiency and system density. From the original Blue Gene/L to P to Q, a higher level of integration has enabled higher single-core performance, greater concurrency per compute node, and a higher level of system integration. Although these changes have brought higher overall peak system performance, no study has examined in detail the evolution of performance across system generations. In this work we make two significant contributions: a comparative performance analysis across Blue Gene generations using a consistent set of tests, and a validated performance model of the NEK-Bone proxy application from the DOE CESAR Exascale Co-Design Center. The combination of empirical analysis and the predictive capabilities of the NEK-Bone performance model enables us not only to compare measured performance directly but also to compare system configurations that cannot currently be measured. We provide insights into how the changing architectural performance characteristics of Blue Gene have affected application performance, as well as into what future systems may be able to achieve.

  • Iso-Quality of Service: Fairly Ranking Servers for Real-Time Data Analytics

    We present a mathematically rigorous iso-Quality-of-Service (iso-QoS) metric that relates the achievable QoS of a real-time analytics service, with workload-specific and use-case-specific performance and output-quality requirements, to the energy cost of offering the service on different server architectures. Using a new iso-QoS evaluation methodology, we scale server resources to meet QoS targets and directly rank the servers by energy efficiency and, by extension, cost of ownership. Our metric and method are platform-independent and enable fair comparison of datacenter compute servers with significant architectural diversity, including micro-servers. We apply our metric and methodology to compare three servers running financial option pricing workloads on real-life market data. We find that server ranking is sensitive to data inputs and the desired QoS level, and that although scale-out micro-servers can be up to two times more energy-efficient than conventional heavyweight servers for the same target QoS, they are still six times less energy-efficient than high-performance computational accelerators.
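
    The core of the iso-QoS idea, scaling each server configuration until it meets the same QoS target and then comparing energy, can be sketched as follows. This simplified version assumes that throughput scales linearly with the number of nodes and that QoS reduces to a single throughput target; the paper's metric also accounts for output quality and workload-specific requirements, and the server names and figures here are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Server:
    name: str
    throughput_per_node: float   # e.g. option prices per second (hypothetical)
    power_per_node: float        # watts (hypothetical)


def iso_qos_rank(servers: List[Server],
                 target_throughput: float) -> List[Tuple[str, int, float]]:
    """Scale each server out to meet the same target, then rank by power drawn.

    Returns (name, nodes needed, total watts), most energy-efficient first.
    At a fixed delivered throughput, lower power means lower energy per result.
    """
    results = []
    for s in servers:
        nodes = math.ceil(target_throughput / s.throughput_per_node)
        results.append((s.name, nodes, nodes * s.power_per_node))
    return sorted(results, key=lambda r: r[2])


if __name__ == "__main__":
    servers = [Server("heavyweight", 10.0, 100.0), Server("micro", 2.0, 15.0),
               Server("accelerator", 60.0, 250.0)]
    for target in (25.0, 100.0):
        print(target, iso_qos_rank(servers, target))
```

    Because nodes come in whole units, the power needed is not strictly proportional to the target, so the comparison has to be repeated at each QoS level of interest.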

  • PERFORMANCE EVALUATION OF MULTICAST WORMHOLE ROUTING IN 2D-TORUS MULTICOMPUTERS

    Multicast communication has recently been incorporated into multicomputer systems. Multicast communication refers to the delivery of the same message from a source node to an arbitrary number of destination nodes. The performance of multicomputers is highly dependent on the message routing scheme and the communication mechanism, and wormhole routing is the most promising switching technique in new-generation multicomputers. In this paper, we propose a multicast wormhole routing algorithm for 2D-torus multicomputers. The algorithm is based on a path-like routing scheme and provides deadlock-free routing. Our simulation, which compares against 2D-mesh multicomputers, shows that the multipath routing algorithm for 2D-torus multicomputers can reduce communication traffic and the maximum distance in multicast communication.
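
    The distance advantage of the torus over the mesh comes from its wraparound links. The small sketch below compares minimal hop counts from a source to a set of multicast destinations in a k x k mesh and a k x k torus. It illustrates only this topological effect, not the proposed multicast routing algorithm or its deadlock freedom, and the destination set is arbitrary.

```python
from typing import Iterable, Tuple

Node = Tuple[int, int]


def mesh_hops(a: Node, b: Node) -> int:
    """Minimal hops in a 2D mesh (Manhattan distance)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def torus_hops(a: Node, b: Node, k: int) -> int:
    """Minimal hops in a k x k 2D torus: each dimension may wrap around."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return min(dx, k - dx) + min(dy, k - dy)


def max_distance(src: Node, dests: Iterable[Node], k: int) -> Tuple[int, int]:
    """Largest source-to-destination distance in the mesh and in the torus."""
    dests = list(dests)
    return (max(mesh_hops(src, d) for d in dests),
            max(torus_hops(src, d, k) for d in dests))


if __name__ == "__main__":
    dests = [(7, 7), (0, 6), (4, 1), (6, 2)]
    print(max_distance((0, 0), dests, 8))   # (14, 5)
```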

  • A GENERALIZED STOCHASTIC PETRI-NET MODEL FOR PERFORMANCE ANALYSIS AND ALLOCATION OPTIMIZATION OF A PARTICULAR REPAIR SYSTEM

    This paper presents a generalized stochastic Petri-net model for performance analysis and allocation optimization of a repair system with interchangeable inventory. A repair system with interchangeable inventory is an important type of repair service in which components are assembled circularly using an interchangeable-component inventory. Although such systems improve efficiency, major difficulties in model formulation and performance analysis arise from a complex fork/join structure and the presence of the interchangeable-component inventory. In this study, a generalized stochastic Petri net is applied to model the fork/join structure and the interchangeable inventory, and a system with one overhaul center and one interchangeable-component repair shop is defined in the Petri net by places, transitions, and tokens. Performance analyses with single and multiple parameters, together with simulation experiments, were performed using real field data from the overhaul of high-speed railway locomotives. By assigning appropriate weights to each portion of the system resources, an optimal allocation design for a specific repair system was obtained that controls overall expenditure.
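
    In a Petri net, a fork is a transition with several output places and a join is a transition with several input places. The sketch below implements the basic token-firing rule on a tiny, hypothetical repair net: a failed unit is disassembled into a frame and a component (fork), the component goes to the repair shop, and a frame is reassembled with any serviceable component from the interchangeable inventory (join). It omits the exponential firing delays and immediate-transition priorities that make a net a generalized stochastic Petri net, and all place and transition names are invented for illustration.

```python
from typing import Dict

Marking = Dict[str, int]


class PetriNet:
    """Place/transition net with arc weights; transitions fire atomically."""

    def __init__(self, marking: Marking):
        self.marking = dict(marking)
        self.transitions: Dict[str, tuple] = {}

    def add_transition(self, name: str, inputs: Marking, outputs: Marking) -> None:
        self.transitions[name] = (inputs, outputs)

    def enabled(self, name: str) -> bool:
        inputs, _ = self.transitions[name]
        return all(self.marking.get(p, 0) >= w for p, w in inputs.items())

    def fire(self, name: str) -> None:
        if not self.enabled(name):
            raise RuntimeError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for p, w in inputs.items():
            self.marking[p] -= w
        for p, w in outputs.items():
            self.marking[p] = self.marking.get(p, 0) + w


if __name__ == "__main__":
    net = PetriNet({"failed_units": 2, "spare_components": 1})
    # Fork: one failed unit yields a frame awaiting assembly and a component to repair.
    net.add_transition("disassemble", {"failed_units": 1},
                       {"frames": 1, "components_in_repair": 1})
    net.add_transition("repair", {"components_in_repair": 1}, {"spare_components": 1})
    # Join: any serviceable component (interchangeable) completes any frame.
    net.add_transition("assemble", {"frames": 1, "spare_components": 1},
                       {"serviceable_units": 1})
    for t in ("disassemble", "assemble", "disassemble", "repair", "assemble", "repair"):
        net.fire(t)
    print(net.marking)
```

    The spare component lets the first frame leave before its own component is repaired, which is the efficiency gain interchangeable inventory provides.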

  • Performance Analysis of Joining the Shortest Queue Model Among a Large Number of Queues

    Consider a queueing network with a large number N of nodes, in which each queue has a dedicated input stream and, in addition, an extra input stream balances the network load by directing its arrivals to the shortest queue(s). A mean-field interaction model is set up to study the performance of this network through limiting results. One of our results shows that, when the queue length is around zero, the stationary behavior of any queue is approximated by that of an M/M/1 queue with a modified arrival rate.
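
    The model can be explored numerically with a continuous-time Markov chain simulation. The sketch below assumes each of the N queues has a dedicated Poisson stream of rate lam and an exponential server of rate mu, and that the extra stream has total rate N * gamma (so its load per queue stays fixed as N grows) and joins a shortest queue, breaking ties uniformly at random. These parameter names and the scaling are assumptions for illustration, and the sketch does not reproduce the paper's mean-field limit. By flow balance, the pooled fraction of time a queue is empty should approach 1 - (lam + gamma) / mu, which serves as a sanity check.

```python
import random
from typing import List


def simulate_jsq(n: int, lam: float, gamma: float, mu: float,
                 events: int, seed: int = 0) -> List[float]:
    """Time-averaged queue-length distribution, pooled over all n queues.

    Returns frac[i], the long-run fraction of (queue, time) pairs with
    length i, up to the largest length observed. Requires lam + gamma < mu.
    """
    rng = random.Random(seed)
    q = [0] * n
    time_at = {}           # queue length -> accumulated queue-time
    total_time = 0.0
    for _ in range(events):
        busy = [i for i in range(n) if q[i] > 0]
        rate = n * lam + n * gamma + mu * len(busy)
        dt = rng.expovariate(rate)
        for length in q:
            time_at[length] = time_at.get(length, 0.0) + dt
        total_time += dt
        u = rng.random() * rate
        if u < n * lam:
            q[rng.randrange(n)] += 1                   # dedicated stream
        elif u < n * lam + n * gamma:
            shortest = min(q)                          # extra stream joins a shortest queue
            q[rng.choice([i for i in range(n) if q[i] == shortest])] += 1
        else:
            q[rng.choice(busy)] -= 1                   # departure
    return [time_at.get(i, 0.0) / (total_time * n) for i in range(max(time_at) + 1)]


if __name__ == "__main__":
    frac = simulate_jsq(20, lam=0.5, gamma=0.3, mu=1.0, events=100_000)
    print([round(f, 4) for f in frac[:6]])
```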