Initially introduced as special-purpose accelerators for graphics applications, graphics processing units (GPUs) have now emerged as general purpose computing platforms for a wide range of applications. To address the requirements of these applications, modern GPUs include sizable hardware-managed caches. However, several factors, such as the unique architecture of GPUs and the rise of CPU–GPU heterogeneous computing, demand effective management of caches to achieve high performance and energy efficiency. Recently, several techniques have been proposed for this purpose. In this paper, we survey several architectural and system-level techniques proposed for managing and leveraging GPU caches. We also discuss the importance and challenges of cache management in GPUs. The aim of this paper is to provide readers with insights into cache management techniques for GPUs and motivate them to propose even better techniques for leveraging the full potential of caches in the GPUs of tomorrow.
Aggressive miniaturization has led to severe performance and signal integrity issues in copper-based interconnects in the nanometric regime. As a consequence, the development of a proper analytical model for such interconnects is extremely important. In this work, an ABCD parameter matrix-based model is presented for fast and accurate estimation of crosstalk delay and noise for identically coupled copper-based nanointerconnect systems. Using the proposed model, the crosstalk delay and noise are estimated in copper-based nanointerconnects for intermediate and global interconnects at the future integrated circuit technology nodes of 21 and 15 nm, respectively. The proposed model has been compared with SPICE and is found to match SPICE almost exactly with respect to both crosstalk delay and noise, while being as much as ~63 and ~155 times faster, respectively. From the crosstalk delay and noise analysis of unrepeated interconnects, it is observed that both the delay and noise contributions will increase at scaled technology nodes. The same trend is also observed for repeated interconnects. Moreover, more repeaters and larger repeater sizes will be needed for delay minimization as technology scales deeper. Thus, as far as crosstalk-induced effects are concerned, copper interconnects will face a major challenge at nanometer technology nodes.
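The abstract does not reproduce the model itself; as a rough, hedged illustration of the ABCD-parameter idea only, the Python sketch below cascades 2×2 ABCD matrices for lumped RC sections of a single (uncoupled) line and evaluates the voltage transfer to a capacitive load. The per-unit-length parasitics, wire length, load and frequency are invented for demonstration and are not the authors' 21/15 nm coupled-line model.

```python
# Illustrative ABCD (transmission-matrix) cascade for a distributed RC wire.
# NOT the paper's coupled-line model; all element values are assumptions.
import numpy as np

def series_R(R):
    """ABCD matrix of a series resistance."""
    return np.array([[1.0, R], [0.0, 1.0]], dtype=complex)

def shunt_C(C, omega):
    """ABCD matrix of a shunt capacitance at angular frequency omega."""
    return np.array([[1.0, 0.0], [1j * omega * C, 1.0]], dtype=complex)

def line_abcd(r_per_um, c_per_um, length_um, n_seg, omega):
    """Approximate a distributed RC line by cascading n_seg lumped sections."""
    dR = r_per_um * length_um / n_seg
    dC = c_per_um * length_um / n_seg
    section = series_R(dR) @ shunt_C(dC, omega)
    M = np.eye(2, dtype=complex)
    for _ in range(n_seg):
        M = M @ section
    return M

# Assumed (illustrative) parasitics for a 1 mm global-style wire.
r, c = 5.0, 0.2e-15          # ohm/um and F/um
length, CL = 1000.0, 10e-15  # um and F (load capacitance)
omega = 2 * np.pi * 1e9      # evaluate at 1 GHz

M = line_abcd(r, c, length, 100, omega)
A, B = M[0]
# Two-port terminated by a capacitive load: Vout/Vin = 1 / (A + B * jw * CL)
H = 1.0 / (A + B * 1j * omega * CL)
print(f"|Vout/Vin| at 1 GHz: {abs(H):.4f}")
```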
In this paper, we extend an information-theoretic approach of computer performance evaluation to supercomputers. This approach is based on the notion of computer capacity, which can be estimated relying solely on the description of the computer architecture. We describe the method of calculating computer capacity for supercomputers, including the influence of the architecture of the communication network. The suggested approach is applied to estimate the performance of three of the top 10 supercomputers (according to the TOP500 June 2016 list) which are based on Haswell processors. For greater objectivity of the results, we compared them with the values obtained for another supercomputer based on Ivy Bridge processors (a microarchitecture that differs from Haswell). The obtained results are compared with the values of the TOP500 LINPACK benchmark and the theoretical peak performance, and we arrive at conclusions about the applicability of the presented theoretical (non-experimental) approach for the performance evaluation of real supercomputers. In particular, this means that computer-capacity estimates can be used at the design stage of supercomputer development.
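One common formulation of the computer-capacity idea computes the capacity as log2 of the largest real root of Σ_i X^(−τ_i) = 1, where τ_i are instruction latencies; whether this matches the authors' exact method for supercomputers (including the communication network) is not confirmed by the abstract. As a hedged illustration of that formulation only, the sketch below solves the equation numerically for an invented toy instruction set.

```python
# Hedged sketch: compute a toy "computer capacity" (bits per cycle) as
# log2(X0), where X0 is the largest real root of sum_i X**(-tau_i) = 1 and
# tau_i are instruction latencies in cycles.  The instruction mix below is
# invented for illustration; it is not a Haswell/Ivy Bridge model, and the
# paper's treatment of the communication network is not reproduced here.
from math import log2

def capacity(latencies, lo=1.0 + 1e-9, hi=1e6, iters=200):
    """Solve sum(X**-tau) = 1 by bisection; the capacity is log2(X)."""
    f = lambda x: sum(x ** (-t) for t in latencies) - 1.0
    for _ in range(iters):       # f(x) is decreasing, with f(lo) > 0 > f(hi)
        mid = (lo + hi) / 2.0
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return log2((lo + hi) / 2.0)

# Toy instruction set: assumed latencies in clock cycles.
toy_isa = [1, 1, 1, 1, 3, 3, 4, 5, 20]
print(f"toy capacity ~ {capacity(toy_isa):.3f} bits/cycle")
```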
The process of locating objects in an image passes through different phases. At the forefront of these phases, and most importantly, is edge detection. If the edges in an image are identified accurately, all of the objects can be located correctly for further processing phases. Noisy images contain high-frequency content that may interfere with image edges, making edge detection more difficult. In this paper, a software comparative analysis of the performance of three different edge detectors, namely Roberts, Prewitt and Sobel, is presented. The comparative analysis is performed to check the robustness of the edge detectors when the noise level fluctuates in the image. In addition, an embedded hardware (HW) system is developed to implement the three detectors on the Zedboard FPGA prototyping board. The purpose of this implementation is to have an embedded system for on-the-move applications where portability is desired. To exploit the new features of the Xilinx Zynq-7000 series, we partition the implementation into (1) a hardware part (running on the FPGA logic) and (2) a software (SW) part (running on the ARM processor of the device). This heterogeneous HW/SW implementation allows for highly accurate results at high speed with efficient area usage. Furthermore, a hardware comparative analysis of the speed and area of the detectors is presented. The evaluation is performed using different images (with their ground truths) downloaded from the BSDS500 dataset. The tools used for the FPGA implementation are MATLAB and Microsoft Visual Studio (as software tools), and Vivado High-Level Synthesis (HLS) and the Software Development Kit (SDK) (as hardware tools). The experimental results show that the Roberts detector achieves better edge detection when the noise level is higher than 40%. It is also the fastest and requires the fewest logic resources among the detectors employed in this study.
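For reference, the software side of such a comparison can be prototyped in a few lines; the sketch below applies the three gradient operators and a simple global threshold on the gradient magnitude. The thresholding scheme is an assumed simplification, since the paper's exact thresholding and noise-evaluation procedure is not given in the abstract.

```python
# Minimal software sketch of the three gradient-based detectors compared in
# the paper (Roberts, Prewitt, Sobel), assuming a grayscale float image.
import numpy as np
from scipy.ndimage import convolve

KERNELS = {
    "roberts": (np.array([[1, 0], [0, -1]], float),
                np.array([[0, 1], [-1, 0]], float)),
    "prewitt": (np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], float),
                np.array([[-1, -1, -1], [0, 0, 0], [1, 1, 1]], float)),
    "sobel":   (np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float),
                np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], float)),
}

def detect_edges(image, detector="sobel", threshold=0.25):
    """Return a binary edge map using the chosen gradient operator."""
    kx, ky = KERNELS[detector]
    gx = convolve(image, kx, mode="nearest")
    gy = convolve(image, ky, mode="nearest")
    magnitude = np.hypot(gx, gy)
    magnitude /= magnitude.max() + 1e-12      # normalize to [0, 1]
    return magnitude > threshold

# Example on a synthetic image: a bright square on a dark background.
img = np.zeros((64, 64)); img[16:48, 16:48] = 1.0
for name in KERNELS:
    edges = detect_edges(img, name)
    print(name, "edge pixels:", int(edges.sum()))
```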
Although dynamic programming (DP) is an optimization approach for solving complex problems efficiently, its running time still grows polynomially with the size of the input. In this contribution, we improve the computation time of dynamic programming based algorithms by proposing a novel technique called “SDP: Segmented Dynamic Programming”. SDP finds the best way of splitting the compared sequences into segments and then applies the dynamic programming algorithm to each segment individually, which reduces the computation time dramatically. SDP may be applied to any dynamic programming based algorithm to improve its computation time. As case studies, we apply the SDP technique to two different dynamic programming based algorithms: “Needleman–Wunsch (NW)”, the widely used program for optimal sequence alignment, and the LCS algorithm, which finds the “Longest Common Subsequence” between two input strings. The results show that applying the SDP technique in conjunction with the DP-based algorithms improves the computation time by up to 80% in comparison to the plain DP algorithms, with only a small or negligible degradation in the quality of the results. This degradation is controllable and depends on the number of segments, which is an input parameter. We also compare our results with the well-known heuristic FASTA sequence alignment program, “GGSEARCH”, and show that our results are much closer to the optimal results than those of “GGSEARCH”. The results hold independently of the sequence lengths and their level of similarity. To show the functionality of our technique in hardware and to verify the results, we implement it on the Xilinx Zynq-7000 FPGA.
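The abstract does not specify how SDP chooses the split; as a hedged illustration of the segmented idea only, the sketch below splits both strings into k equal segments, runs the classic LCS DP on each corresponding pair, and sums the lengths. The equal-length split is an assumption; SDP as described searches for the best split.

```python
# Illustrative segmented LCS: split both sequences into k segments, run the
# classic O(n*m) LCS DP on each pair of corresponding segments, and sum the
# lengths.  The equal-length split here is an assumption for illustration;
# SDP as described in the paper finds the best way to split the sequences.

def lcs_length(a: str, b: str) -> int:
    """Classic dynamic-programming LCS length."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a, 1):
        for j, cb in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if ca == cb else max(dp[i-1][j], dp[i][j-1])
    return dp[len(a)][len(b)]

def segmented_lcs(a: str, b: str, k: int) -> int:
    """Approximate LCS length using k independent segment pairs."""
    def split(s):
        step = -(-len(s) // k)          # ceiling division
        return [s[i:i + step] for i in range(0, len(s), step)]
    return sum(lcs_length(sa, sb) for sa, sb in zip(split(a), split(b)))

a, b = "GATTACAGATTACA", "GCATGCAGATTACA"
print("exact LCS:", lcs_length(a, b))
print("segmented (k=2):", segmented_lcs(a, b, 2))
print("segmented (k=4):", segmented_lcs(a, b, 4))
```

As the example suggests, larger k reduces the work per DP table but can only underestimate the exact LCS, which mirrors the controllable accuracy/time trade-off described above.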
In this paper, we propose a resource/performance tradeoff algorithm for task scheduling on parallel reconfigurable architectures. First, it assumes unlimited resources to generate an optimal schedule. Then, a relaxation algorithm is applied to meet the resource constraints while preserving a minimum level of performance. To demonstrate the performance of the proposed algorithm, we not only compare it with existing methods on standard benchmarks but also implement it on physical systems. The experimental results show that the proposed algorithm satisfies the requirements of systems with limited resources.
In deep learning, convolutional neural networks (CNNs) are a class of artificial neural networks (ANNs) most commonly applied to analyze visual imagery. They are also known as Shift-Invariant or Space-Invariant Artificial Neural Networks (SIANNs), based on the shared-weight architecture of the convolution kernels or filters that slide along input features and provide translation-equivariant responses known as feature maps. Recently, various FPGA-based CNN architectures have been proposed, because FPGAs offer high performance and a fast development cycle. However, several key issues remain, including how to optimize the performance of CNN layers with different structures, how to design high-performance heterogeneous accelerators, and how to reduce the overhead of integrating with neural network frameworks. To address these problems, we propose dynamic cycle pipeline tiling, data layout optimization, and a pipelined, flexible software–hardware (SW–HW) integrated architecture. Several benchmarks have been implemented and tested on an FPGA board for the proposed architecture. The proposed dynamic tiling and data layout transformation improve performance by 2.3 times. Moreover, with two-level pipelining, we achieve up to a five-times speedup, and the proposed system is 3.8 times more energy-efficient than the GPU.
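The proposed tiling and layout transforms are hardware-level, but the loop structure can be illustrated in software; the sketch below computes a direct convolution output tile by tile, touching only the input window each tile needs, which is the property an FPGA accelerator exploits for on-chip buffering and pipelining. Shapes, tile sizes and the CHW layout are assumptions for illustration, not the paper's design.

```python
# Software analogy of output-tiled convolution loops, the kind of loop
# restructuring that FPGA CNN accelerators pipeline.  Shapes, tile sizes and
# the CHW layout are illustrative assumptions, not the paper's architecture.
import numpy as np

def conv2d_tiled(x, w, tile_h=8, tile_w=8):
    """Direct convolution (stride 1, no padding), computed tile by tile."""
    C_in, H, W = x.shape
    C_out, _, K, _ = w.shape
    OH, OW = H - K + 1, W - K + 1
    y = np.zeros((C_out, OH, OW), dtype=x.dtype)
    for oh0 in range(0, OH, tile_h):            # iterate over output tiles
        for ow0 in range(0, OW, tile_w):
            oh1, ow1 = min(oh0 + tile_h, OH), min(ow0 + tile_w, OW)
            # Only the input window feeding this tile is touched, which is
            # what makes on-chip buffering of one tile at a time practical.
            x_tile = x[:, oh0:oh1 + K - 1, ow0:ow1 + K - 1]
            for co in range(C_out):
                for i in range(oh1 - oh0):
                    for j in range(ow1 - ow0):
                        y[co, oh0 + i, ow0 + j] = np.sum(
                            w[co] * x_tile[:, i:i + K, j:j + K])
    return y

x = np.random.rand(3, 32, 32).astype(np.float32)
w = np.random.rand(16, 3, 3, 3).astype(np.float32)
print("output shape:", conv2d_tiled(x, w).shape)   # (16, 30, 30)
```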
Managers recognize that software development teams need to be developed. Although technical skills are necessary, non-technical (NT) skills are equally necessary, if not more so, for project success. Currently, there are no proven tools to measure the NT skills of software developers or software development teams. Behavioral markers (observable behaviors that have positive or negative impacts on individual or team performance) are successfully used by the airline and medical industries to measure NT skill performance. This research developed and validated a behavior marker system through an empirical study conducted at the Software Factory, where users of the behavior marker tool rated video clips of software development teams. The initial results show that the behavior marker tool can be used reliably with minimal training.
Changes in the deployment of SaaS (Software as a Service) software influence its response time, which is an important performance metric. Therefore, studying the impact of deployment changes on the response time of SaaS software can contribute to improving the software's performance. However, few performance analysis methods can directly analyze the relationship between deployment changes and the response time of SaaS software. In this paper, we propose an approach that explicitly analyzes the impact of specific deployment change operations on the response time of SaaS software. Specifically, we present an evaluation method for the response time of SaaS software under a specific deployment scheme by leveraging queueing theory. Through mathematical derivation based on the proposed evaluation method, we qualitatively analyze the variation trend of response time with respect to deployment changes. Furthermore, we study the relationship between two specific types of deployment change operations and the response time variation of SaaS software, which we use to propose a response time improvement method based on deployment change. Finally, the effectiveness of the analysis conclusions and the proposed method is validated by practical cases, which indicate that adjusting the deployment scheme according to the conclusions obtained in this paper can help improve the response time of SaaS software.
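The abstract does not give the queueing model; as a generic, hedged illustration of how a deployment change can be mapped to response time with queueing theory, the sketch below treats each replica of a service as an M/M/1 queue with the arrival stream split evenly, so adding replicas (one possible deployment change operation) lowers the mean response time 1/(μ − λ/n). All rates and the candidate schemes are invented for demonstration.

```python
# Generic illustration (not the paper's model): treat each deployed replica
# of a SaaS service as an M/M/1 queue with even load balancing, so the mean
# response time per request is 1 / (mu - lambda/n).  All rates below are
# assumed values chosen only to show the trend under deployment changes.

def mm1_response_time(arrival_rate, service_rate, replicas):
    """Mean response time of one replica under even load balancing."""
    lam = arrival_rate / replicas
    if lam >= service_rate:
        return float("inf")            # unstable: the queue grows without bound
    return 1.0 / (service_rate - lam)

arrival_rate = 90.0    # requests/s arriving at the service (assumed)
service_rate = 50.0    # requests/s one replica can serve (assumed)

for replicas in (2, 3, 4):             # candidate deployment change operations
    rt = mm1_response_time(arrival_rate, service_rate, replicas)
    print(f"{replicas} replicas -> mean response time {rt * 1000:.1f} ms")
```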
Background: The current primary focus of programming language benchmarking studies in the literature is performance, with less attention paid to safety. This leaves a research gap, because the software industry has focused more on software safety than on performance in order to safeguard clients. This paper attempts to address this gap by benchmarking languages on both safety and performance. Furthermore, this study includes Rust, a relatively new language with promising safety and performance features. Methods: This paper compares six prominent programming languages (in alphabetical order: C, C++, Go, Java, Python and Rust) to determine which is the best in terms of safety and performance, using quantitative and qualitative methods through actual testing of code and analysis of existing information. Results: The comparisons show that Rust was the safest language, outperforming all the other languages. Regarding performance, Rust, C and C++ performed comparably to each other and generally outperformed Go, Java and Python. Conclusion: It is possible to achieve a superior balance of software safety and performance with, at worst, a minimal performance drop, as Rust clearly demonstrates.
With the aging of the population, the prevalence of eye diseases and thus of vision impairment is increasing. The TV watching habits of people with vision impairments are comparable to those of normally sighted people [1]; however, their vision loss prevents them from fully benefiting from this medium. For over 20 years we have been developing video image-enhancement techniques designed to assist people with visual impairments, particularly those due to central retinal vision loss. A major difficulty in this endeavor is the lack of evaluation techniques to assess and compare the effectiveness of various enhancement methods. This paper reviews our approaches to image enhancement and the results we have obtained, with special emphasis on the difficulties encountered in evaluating the benefits of enhancement and the solutions we have developed to date.
Overworking behaviors appear in multi-agent systems especially when there are unbalanced communication patterns. This occurs when an agent receives many messages in a short period of time in comparison to other agents. As the agent attends to this large volume of messages, its performance degrades, which in turn degrades the system's performance, while other agents offering similar services may remain idle. The reasons for these behaviors are varied and depend on the nature of the messages. This article presents a measurement approach that detects unbalanced overworking situations in multi-agent systems with a new metric. Moreover, this approach includes a set of recommendations that identify possible common causes of this defective behavior depending on the relations among different measurement values. These recommendations also propose a solution for repairing each cause, suppressing these unbalanced overworking situations. A tool has been developed that allows designers to measure the agents' communications, obtain the diagnosis, and consult the recommended solutions. The current work has been applied in two case studies, whose results indicate that the suppression of unbalanced overworking situations is strongly related to the improvement of performance in multi-agent systems. Furthermore, the experience of applying this approach in nine different problem domains is presented as support for the causes and solutions in the set of recommendations.
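The paper's metric is not described in the abstract; the sketch below is only a hypothetical illustration of detecting unbalanced overworking: it counts each agent's incoming messages in a time window and flags agents whose count exceeds the mean of agents offering the same service by a configurable factor. The grouping, window and threshold are assumptions, not the authors' metric.

```python
# Hypothetical illustration (not the paper's metric): flag an agent as
# "overworked" when its incoming-message count in a time window exceeds the
# mean of agents offering the same service by a configurable factor.
from collections import Counter

def overworked_agents(messages, service_of, window, now, factor=2.0):
    """messages: list of (timestamp, receiver_agent_id) pairs."""
    recent = Counter(agent for t, agent in messages if now - window <= t <= now)
    flagged = []
    # Group agents by the service they provide and compare within the group.
    by_service = {}
    for agent, service in service_of.items():
        by_service.setdefault(service, []).append(agent)
    for service, agents in by_service.items():
        mean_load = sum(recent[a] for a in agents) / len(agents)
        for a in agents:
            if mean_load > 0 and recent[a] > factor * mean_load:
                flagged.append((a, recent[a], mean_load))
    return flagged

service_of = {"a1": "search", "a2": "search", "a3": "search", "b1": "billing"}
messages = [(t, "a1") for t in range(90)] + [(5, "a2"), (7, "a3"), (3, "b1")]
print(overworked_agents(messages, service_of, window=100, now=100))
```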
Big Data analytics is presently one of the fastest-emerging areas of research for both organizations and enterprises. The requirement for deploying efficient machine learning algorithms over huge amounts of data led to the development of parallelization frameworks and of specialized libraries (like Mahout and MLlib) which implement the most important of these algorithms. Moreover, recent advances in storage technology resulted in the introduction of high-performing devices, broadly known as Solid State Drives (SSDs). Compared to traditional Hard Disk Drives (HDDs), SSDs offer considerably higher performance and lower power consumption. Motivated by these appealing features and the growing need for efficient large-scale data processing, we compared the performance of several machine learning algorithms on MapReduce clusters whose nodes are equipped with HDDs, SSDs, and devices which implement the latest 3D XPoint technology. In particular, we evaluate several dataset preprocessing methods such as vectorization and dimensionality reduction, two supervised learning algorithms, Naive Bayes and Linear Regression, and the popular k-Means clustering algorithm. We use an experimental cluster equipped with the three aforementioned storage devices under different configurations, and two large datasets, Wikipedia and HIGGS. The experiments showed that the benefits derived from the use of SSDs depend on the cluster setup and the nature of the applied algorithms.
Performance, reliability and safety are relevant factors when analyzing or designing a computer system. Many studies on performance are based on monitoring and analyzing data from a computer system. One of the most useful pieces of data is the Load Average (LA), which reports the average load of the system over the last minute, the last five minutes and the last fifteen minutes. There are many studies of system performance based on the load average. It is usually obtained by means of operating-system monitoring commands, but their output is sometimes difficult to understand and far removed from human intuition. The aim of this paper is to present a new procedure that allows us to determine the stability of a computer system from a list of load average sample data. The idea is presented as an algorithm based on statistical analysis, the aggregation of information and its formal specification. The result is an evaluation of the stability of the load and of the computer system through monitoring, but without adding any overhead to the system. In addition, the procedure can be used as a software monitor for risk prevention on any vulnerable system.
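The paper's statistical procedure is not given in the abstract; a hypothetical sketch in the same spirit could aggregate load-average samples and judge stability from their dispersion, for example via the coefficient of variation, as below. The aggregation window and threshold are assumptions, not the authors' algorithm.

```python
# Hypothetical sketch along similar lines (not the paper's algorithm):
# aggregate 1-minute load-average samples and classify the system as stable
# when the dispersion of the aggregated values is low.  The aggregation
# window and the coefficient-of-variation threshold are assumptions.
from statistics import mean, pstdev

def aggregate(samples, window=5):
    """Average consecutive samples in non-overlapping windows."""
    return [mean(samples[i:i + window]) for i in range(0, len(samples), window)]

def is_stable(la_samples, cv_threshold=0.25):
    """Stable if the coefficient of variation of the aggregated load is small."""
    agg = aggregate(la_samples)
    m = mean(agg)
    return m == 0 or pstdev(agg) / m <= cv_threshold

steady = [0.9, 1.0, 1.1, 1.0, 0.95] * 6        # simulated 1-min samples
bursty = ([0.2] * 5 + [5.0] * 5) * 3
print("steady load stable?", is_stable(steady))   # expected: True
print("bursty load stable?", is_stable(bursty))   # expected: False
```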
One critical proposition in normative strategic management research is that an accurate perception of the environment by top managers is a prerequisite to attaining better organizational performance. However, recent entrepreneurship studies suggest that entrepreneurs are often leading or even causing environmental changes, and thus they may perceive greater industrial instability than there actually is. In this project, we examine whether an over-perception of industrial instability exists among entrepreneurs and, if it does, which perceptual mode (accurate perception versus over-perception) benefits firm performance. We conducted the study in a highly volatile environment — China — and found that entrepreneurial orientation (EO) had an inverted U-shaped relationship with perceptual acuity of industrial instability, indicating that a greater level of EO indeed led to an over-perception of industrial instability. However, we found that although perceptual acuity of industrial instability improved firm sales, it was negatively associated with organizational effectiveness as evaluated by top managers. Additional analyses were conducted, and implications are provided at the end.
Apart from starting, growing and/or sustaining a business, owner-managers in small firms have the responsibility of balancing business goals and managerial priorities with ownership control tendencies and family values (as in the case of the prolific family enterprise) in a fashion that satisfies all business stakeholders. Understanding the inter-relationship among the owner-manager's leadership style, intra-organisational management systems, and business growth can enable us to develop insights into how small business leaders masterfully construct a management approach that is conducive to sustainable performance. This empirical paper draws evidence from a large-scale postal survey (5710 respondents) of small and medium-sized enterprises (SMEs) in the UK and explores the association between small business managerial style and performance. Logistic regression analysis reveals that the managerial style of entrepreneurs is influenced by a series of demographic and situational factors. Moreover, owner-managed businesses characterised by delegation of authority appear to achieve higher growth in sales and to operate in a more professional way. The paper concludes with a discussion of the implications. The examination of the role of managerial style in explaining businesses' growth performance complements the leadership literature.
This paper examines the performance of Silicon Valley ventures with Asian-American founding teams. We review some challenges faced by these ventures, compare their performance with that of other ventures, and analyze the impact of strategic partnerships on their performance. Our results indicate that firms founded by Asian American entrepreneurs tend to require more time to reach initial public offering (IPO) status than do other ventures in Silicon Valley. Our results further show that, despite needing this extra time, Asian American-founded ventures significantly outperformed their counterparts in 12-month post-IPO share price gain. This superior short-term post-IPO performance suggests that Asian American firms, particularly those that lacked relationships with U.S.-based strategic investors, might have been undervalued prior to and at IPO.
In this study, we examine the ways in which three strategic orientations associated with the firm — entrepreneurial orientation (EO), market orientation (MO), and learning orientation (LO) — affect performance. We hypothesize and test a moderated-mediation framework relating these constructs using a panel dataset of large US retail firms over five years. Our primary finding is that LO mediates the impact of MO on performance and this relationship is strengthened under the moderating influence of EO. In other words, LO acts as a conduit through which the performance effect of MO arises and EO complements MO in terms of impact on performance.
This explorative study examines the performance, socio-demographics and sustainability orientation of owners of micro, small and medium-sized enterprises (MSMEs) in Tanzania, a Least Developed Country (LDC). Based on a literature review, a pre-study conducted with experts, and a main study of 168 MSME owners in Morogoro, the analyses show that sustainability orientation is made up of four factors, of which one is clearly a social factor and one an environmental factor. Furthermore, in light of the literature, performance and socio-demographic factors are regressed on sustainability orientation and appear not to have a significant influence. However, at a more granular level, certain effects of performance and socio-demographics on sustainability orientation are observed. In contrast to previous research, this study presents fine-grained insights into how performance and sustainability orientation are developed and how the former determines the latter. In doing so, this study sheds light on entrepreneurship in the informal economy of an LDC, refines the understanding of the sustainability orientation of MSME owners, and endorses the position that “one size does not fit all” regarding the applicability of Western constructs to LDC settings.
This research investigates the simultaneous interaction of the firm's entrepreneurial orientation dimensions with the leader's psychological factors to better explain SME performance. The universal and contingency approaches are reductionist and do not allow us to achieve this research objective. Therefore, to study the simultaneous interaction between a large number of variables, we mobilize the configurational approach, and particularly the perspective of alignment (fit) as “configuration”. We pursue a quantitative methodological approach, conducting a field survey through a research questionnaire distributed to 100 industrial Tunisian SMEs.
The results of this research reveal a taxonomy of four configurations of industrial Tunisian SMEs that are significantly different from each other and have different effects on performance. The configurations are named the “Leader”, the “Creative”, the “Ambitious” and the “Conservative”.