
  Bestsellers

  • Article (No Access)

    Power Grid Service Data Management Model Based on Four-Cross Tree Method

    Quadtree is a widely used data structure for representing and managing two-dimensional spatial data. This paper proposes a grid business data management model utilizing the Quadtree method to address the challenges in handling grid data. The model integrates Quadtree’s spatial data structure with the specific needs of grid business data management to enhance storage efficiency and query performance. The implementation involves several key steps: First, the grid data is partitioned into distinct regions based on spatial characteristics. Second, a Quadtree is constructed to organize these regions hierarchically. Third, efficient data storage and querying mechanisms are developed based on this structure. Experimental results indicate that the proposed model significantly improves data management for grid systems, providing enhanced support for grid operation and management through increased efficiency in data storage and retrieval.
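
    To make the partition-then-index idea above concrete, the sketch below shows a minimal point quadtree with insertion and rectangular range queries. It only illustrates the general Quadtree technique: the class names, the node capacity of four, and the region representation are assumptions, not details taken from the paper.

```python
# Minimal illustrative quadtree sketch (not the paper's implementation).
# Region/QuadTree names, the node capacity, and the payload field are hypothetical.
from dataclasses import dataclass

@dataclass
class Region:
    x: float   # lower-left corner
    y: float
    w: float   # width
    h: float   # height

    def contains(self, px, py):
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h

    def intersects(self, other):
        return not (other.x >= self.x + self.w or other.x + other.w <= self.x or
                    other.y >= self.y + self.h or other.y + other.h <= self.y)

class QuadTree:
    def __init__(self, boundary, capacity=4):
        self.boundary = boundary   # Region covered by this node
        self.capacity = capacity   # points stored here before splitting
        self.points = []           # (x, y, payload) records
        self.children = None       # four child nodes after a split

    def insert(self, x, y, payload=None):
        if not self.boundary.contains(x, y):
            return False
        if self.children is None:
            if len(self.points) < self.capacity:
                self.points.append((x, y, payload))
                return True
            self._split()
        return any(child.insert(x, y, payload) for child in self.children)

    def _split(self):
        b, hw, hh = self.boundary, self.boundary.w / 2, self.boundary.h / 2
        self.children = [QuadTree(Region(b.x,      b.y,      hw, hh), self.capacity),
                         QuadTree(Region(b.x + hw, b.y,      hw, hh), self.capacity),
                         QuadTree(Region(b.x,      b.y + hh, hw, hh), self.capacity),
                         QuadTree(Region(b.x + hw, b.y + hh, hw, hh), self.capacity)]
        for x, y, payload in self.points:          # push stored points into children
            any(child.insert(x, y, payload) for child in self.children)
        self.points = []

    def query(self, rng, found=None):
        """Collect all records whose coordinates fall inside the Region rng."""
        found = [] if found is None else found
        if self.boundary.intersects(rng):
            found.extend(p for p in self.points if rng.contains(p[0], p[1]))
            for child in (self.children or []):
                child.query(rng, found)
        return found

tree = QuadTree(Region(0, 0, 100, 100))
tree.insert(10, 20, "feeder-1")
tree.insert(60, 80, "feeder-2")
print(tree.query(Region(0, 0, 50, 50)))   # [(10, 20, 'feeder-1')]
```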

  • Article (Open Access)

    TECHNOLOGIES FOR LARGE DATA MANAGEMENT IN SCIENTIFIC COMPUTING

    In recent years, intensive use of computing has been the main strategy of investigation in several scientific research projects. Progress in computing technology has opened unprecedented opportunities for the systematic collection of experimental data and the associated analysis, which were considered impossible only a few years ago.

    This paper focuses on the strategies in use: it reviews the various components necessary for an effective solution ensuring the storage, long-term preservation, and worldwide distribution of the large quantities of data required by a large scientific research project.

    The paper also mentions several examples of data management solutions used in High Energy Physics for the CERN Large Hadron Collider (LHC) experiments in Geneva, Switzerland, which generate more than 30,000 terabytes of data every year that need to be preserved, analyzed, and made available to a community of several tens of thousands of scientists worldwide.

  • Article (No Access)

    ON UPDATE-LAST SCHEMES

    We introduce the notion of an Update-Last scheme as a distributed method of storing an index, and derive exact bounds on the space complexity of such schemes.

  • Article (No Access)

    The New EventStore Data Management System For The CLEO-c Experiment

    We discuss the new CLEO-c EventStore: a pluggable system that scales from personal needs up to collaboration-wide managed data. Its indexing and versioning features allow physicists to quickly access raw, reconstructed and MC data. It does simple bookkeeping of desired datasets and guarantees the reproducibility of various details of physics analysis. A wide range of physics queries and a variety of file formats are supported.
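
    The abstract above describes indexing, versioning and dataset bookkeeping only at a high level; the toy sketch below shows what such a versioned dataset index could look like in principle. It is not the actual CLEO-c EventStore API: the class, method names and sample dataset/file names are all hypothetical.

```python
# Illustrative sketch only (not the CLEO-c EventStore): a versioned dataset index
# that supports the kind of bookkeeping described in the abstract.
from collections import defaultdict

class EventIndex:
    def __init__(self):
        # (dataset, grade) -> {version: [file paths]}
        self._index = defaultdict(dict)

    def register(self, dataset, grade, version, files):
        """Record the files making up one version of a dataset (raw, reco, MC, ...)."""
        self._index[(dataset, grade)][version] = list(files)

    def lookup(self, dataset, grade, version="latest"):
        """Resolve a dataset/grade to a concrete, reproducible list of files."""
        versions = self._index[(dataset, grade)]
        if version == "latest":
            version = max(versions)          # newest registered version
        return versions[version]

idx = EventIndex()
idx.register("psi2S-scan", "raw",  20040101, ["run101.pds", "run102.pds"])
idx.register("psi2S-scan", "reco", 20040115, ["run101-r1.pds"])
print(idx.lookup("psi2S-scan", "reco"))      # ['run101-r1.pds']
```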

  • Article (No Access)

    WHAT YOUR NEXT EXPERIMENT'S DATA WILL LOOK LIKE: EVENT STORES IN THE LARGE HADRON COLLIDER ERA

    Each new generation of collider experiments confronts the challenge of delivering an event store having at least the performance and functionality of current-generation stores, in the presence of an order of magnitude more data and new computing paradigms (object orientation just a few years ago; grid and service-based computing today). The ATLAS experiment at the Large Hadron Collider, for example, will produce 1.6-megabyte events at 200 Hz, an annual raw data volume of 3.2 petabytes. With derived and simulated data, the total volume may approach 10 petabytes per year.
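
    A quick back-of-the-envelope check of the quoted numbers, assuming roughly 10^7 seconds of data taking per year (a common convention the abstract does not state explicitly):

```python
# Back-of-the-envelope check of the quoted rates.
event_size_mb = 1.6          # megabytes per event
rate_hz = 200                # events per second
seconds_per_year = 1e7       # assumed live time per year, not stated in the abstract

throughput_mb_s = event_size_mb * rate_hz                      # ~320 MB/s
raw_pb_per_year = throughput_mb_s * seconds_per_year / 1e9     # ~3.2 PB/year
print(throughput_mb_s, raw_pb_per_year)
```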

    Scale, however, is not the only challenge. In the Large Hadron Collider (LHC) experiments, the preponderance of computing power will come from outside the host laboratory. More significantly, no single site will host a complete copy of the event store: data will be distributed, not simply replicated for convenience, and many physics analyses will routinely require distributed (grid) computing.

    This paper uses the emerging ATLAS computing model to provide a glimpse of how next-generation event stores are taking shape, touching on key issues in navigation, distribution, scale, coherence, data models and representation, metadata infrastructure, and the role(s) of databases in event store management.

  • Article (No Access)

    Queueing Analysis of Continuous Queries for Uncertain Data Streams Over Sliding Windows

    With the rapid development of data collection methods and their practical applications, the management of uncertain data streams has drawn wide attention in both academia and industry. System capacity planning and quality-of-service (QoS) metrics are two very important problems for data stream management systems (DSMSs), which must process streams efficiently despite unpredictable input characteristics and limited memory resources. Motivated by this, we explore an effective approach to estimate the memory requirement, data loss ratio, and tuple latency of continuous queries for uncertain data streams over sliding windows in a DSMS. More specifically, we propose a queueing model to address these problems. Under Poisson arrivals of the input data streams, we study the average number of tuples in the queue, the average tuple latency, and the distributions of both quantities. Furthermore, we determine the maximum capacity of the queueing system based on the data loss ratio. The solutions to these problems help researchers design, manage, and optimize a DSMS, including allocating the buffer needed for a queue and admitting a continuous uncertain query to the system without violating the pre-specified QoS requirements.
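
    The abstract does not spell out the exact queueing model, so the sketch below uses a plain M/M/1 (and M/M/1/K) queue with Poisson arrivals purely to illustrate the kinds of quantities involved: average queue length, average tuple latency, and the largest arrival rate that keeps the loss (blocking) ratio below a target. All rates and capacities are made up for the example.

```python
# Illustrative sketch only: not the paper's model, just standard M/M/1 formulas.
def mm1_metrics(lam, mu):
    """Average number of tuples in the system and average tuple latency
    for Poisson arrivals (rate lam) and exponential service (rate mu)."""
    assert lam < mu, "queue is unstable unless arrival rate < service rate"
    rho = lam / mu                         # utilization
    avg_in_system = rho / (1 - rho)        # average number of tuples
    avg_latency = 1 / (mu - lam)           # average latency (Little's law)
    return avg_in_system, avg_latency

def max_arrival_rate_for_loss(mu, capacity, max_loss_ratio):
    """Largest Poisson arrival rate an M/M/1/K queue (K = capacity tuples in the
    system) can admit while its blocking/data-loss probability stays below
    max_loss_ratio; found by bisection on the arrival rate."""
    def loss(lam):
        rho = lam / mu
        if abs(rho - 1) < 1e-12:
            return 1 / (capacity + 1)
        return (1 - rho) * rho**capacity / (1 - rho**(capacity + 1))
    lo, hi = 0.0, 10 * mu
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if loss(mid) <= max_loss_ratio else (lo, mid)
    return lo

print(mm1_metrics(lam=80, mu=100))   # (4.0, 0.05): 4 tuples, 0.05 time units latency
print(max_arrival_rate_for_loss(mu=100, capacity=50, max_loss_ratio=0.01))
```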

  • Article (No Access)

    Identifying Most Preferential Skyline Product Combinations

    Nowadays, department stores and online merchants usually develop price promotion strategies to attract customers and increase their purchase intention. It is therefore valuable for customers to pick out attractive products and obtain the maximum discount rate. The skyline query is a very useful tool for finding attractive products, but it does little to help select the product combinations with the maximum discount rate. Motivated by this, we identify, for the first time in the literature, the most preferential skyline product (MPSP) combination discovery problem, which is NP-hard. The problem aims to report all skyline product combinations having the maximum discount rate. Since an exact algorithm for the MPSP problem does not scale to large or high-dimensional datasets, we design an approximate algorithm with accuracy guarantees on the results. The experimental results demonstrate the efficiency and effectiveness of our proposed algorithms.
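
    The MPSP combination search itself is NP-hard and the paper's algorithms are not given here, but the dominance test underlying any skyline computation is simple to illustrate. The attributes (price, discount) and the brute-force filter below are assumptions made only for the example.

```python
# Illustrative sketch only: a basic skyline (Pareto-optimal) filter over products.
def dominates(a, b):
    """a dominates b if it is no worse on every attribute and strictly better on
    at least one (here: lower price is better, higher discount rate is better)."""
    no_worse = a["price"] <= b["price"] and a["discount"] >= b["discount"]
    better = a["price"] < b["price"] or a["discount"] > b["discount"]
    return no_worse and better

def skyline(products):
    return [p for p in products
            if not any(dominates(q, p) for q in products if q is not p)]

catalog = [{"name": "A", "price": 90,  "discount": 0.10},
           {"name": "B", "price": 80,  "discount": 0.25},   # dominates A
           {"name": "C", "price": 120, "discount": 0.40}]
print([p["name"] for p in skyline(catalog)])   # ['B', 'C']
```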

  • Article (No Access)

    NEW CHALLENGES FACING INTEGRATIVE BIOLOGICAL SCIENCE IN THE POST-GENOMIC ERA

    The future of biology will be increasingly driven by the fundamental paradigm shift from hypothesis-driven research to data-driven discovery research that employs the growing volume of biological data coupled with experimental testing of new discoveries. But hardware and software limitations in the current workflow infrastructure make it impossible or intractable to use real data from disparate sources for large-scale biological research. We identify key technological developments needed to enable this paradigm shift: (1) the ability to store and manage extremely large datasets dispersed over a wide geographical area, (2) novel analysis and visualization tools capable of operating on enormous data resources without overwhelming researchers with unusable information, and (3) formalisms for integrating mathematical models of biosystems from the molecular level to the organism population level. This will require the development of algorithms and tools that efficiently utilize high-performance compute power and large storage infrastructures. The end result will be the ability of a researcher to integrate complex data from many different sources with simulations to analyze a given system at a wide range of temporal and spatial scales in a single conceptual model.

  • Article (Free Access)

    Computer Network Data Management Model Based on Edge Computing

    Data reliability and confidence in the data are very important issues, especially when a system ingests fraudulent or false information; misuse of the collected data may create serious problems. With the fast development of computing techniques, large amounts of data are collected from various terminals and industrial devices. Edge computing moves data, software and compute resources from the centralized network out to its extremes, allowing pieces of knowledge to reside on distributed cloud networks while end users continue to use commercial Internet application software. Edge computing is used to provide delay-free customer-experience support for Internet of Things (IoT) services at the edge of the user network. This paper presents an IoT computing platform that works with an edge-based competitive data management latency (CDML) tool. The approach separately categorizes edge-layer requests and response data over time using demand-density-driven optimization, and a difference-based optimization tunes the frame limits for simultaneous request processing and exact allocation of data. The architectural efficiency of edge computing is assessed by comparing latency, bandwidth usage, and overhead. Furthermore, by estimating the availability, credibility and confidentiality of the security solutions of each party, the paper takes security concerns in edge computing into consideration and proposes a safety assessment process for IoT networks with edge computing. The procedure is validated with appropriate tests, and the findings are examined to demonstrate the method's accuracy. Experimental data are used to validate request maintenance and processing, response time, resource utilization and contract period. In comparison to current approaches, the proposed CDML achieves an overall result of 97.90%.

    The proposed system achieves a request and response comparison ratio of 97.5%, a request performance analysis ratio of 98.1%, a response time analysis ratio of 98.3%, and a data allocation approach analysis ratio of 97.7%.

  • Article (No Access)

    SUPPORTING MULTI-SYNCHRONOUS GROUPWARE: DATA MANAGEMENT PROBLEMS AND A SOLUTION

    It is common that, in a long-term asynchronous collaborative activity, groups of users engage in occasional synchronous sessions. In this paper, we analyze the data management requirements for supporting this common work practice in typical collaborative activities and applications; we call the applications that support such a work practice multi-synchronous applications. The analysis shows that, as users interact in different ways in each setting, some applications have different requirements and need to rely on different data sharing techniques in synchronous and asynchronous settings. We present a data management system that makes it possible to integrate a synchronous session into the context of a long-term asynchronous interaction, using the data sharing technique suited to each setting and an automatic mechanism that converts the long sequence of small updates produced in a synchronous session into a large asynchronous contribution. We exemplify the use of our approach with two multi-synchronous applications.
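
    As a purely illustrative sketch of the conversion step mentioned above (not the paper's actual mechanism), the snippet below coalesces a long log of small synchronous updates into a single batched contribution by keeping only the last write per item; the item identifiers and values are hypothetical.

```python
# Illustrative sketch only: coalesce a synchronous-session update log into one
# asynchronous contribution (last write per item wins).
def coalesce(session_log):
    """session_log: ordered list of (item_id, new_value) updates from a
    synchronous session. Returns one batched contribution per item."""
    contribution = {}
    for item_id, new_value in session_log:
        contribution[item_id] = new_value      # later edits supersede earlier ones
    return contribution

log = [("par-3", "draft"), ("par-7", "todo"), ("par-3", "draft v2"), ("par-3", "final")]
print(coalesce(log))   # {'par-3': 'final', 'par-7': 'todo'}
```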

  • Article (No Access)

    Mapping Discovery Over Revealing Schemas

    In a world of wide-scale information sharing, data are described in different formats, i.e. different data structures, values and schemas. Querying such sources requires techniques that can bridge these data formats. Some existing techniques deal with schema mapping and address complementary aspects of the problem; important ones consider producing all the possible mappings for a pair of schemas, embedding accompanying semantics in the mappings, and adapting correct mappings as schemas evolve. In this work, we consider the problem of discovering mappings as the schemas of autonomous sources are gradually revealed. Using an overlay of peer databases as an example setting, we present a schema mapping solution that discovers correct mappings as peer schemas are gradually revealed to remote peers. Mapping discovery is schema-centric and incorporates new semantics as they are unveiled. Mapping experience is reused, and possible mappings are ranked so that the best choice is presented to the user. The experimental study confirms the suitability of the proposed solution to dynamic settings of heterogeneous sources.

  • Article (No Access)

    Cooperative Data Management and its Application to Mobile Computing

    Mobile computing and workgroup computing are emerging technologies which have so far been treated independently. Current approaches to support cooperative work neglect the special characteristics of mobile environments, such as limited bandwidth or temporary disconnection. On the other hand, approaches to support disconnected operation rely on the assumption that the degree of data sharing is low, which is obviously not appropriate for cooperative work. In this paper, we utilize the COACT cooperative transaction model to provide support for parallel activities in mobile environments. We present a system architecture that is able to cope with the special characteristics of mobile environments and a formal framework for consistent information exchange between mobile users. The paper shows how the COACT history merge algorithm reduces conflicts by exploiting operation semantics and offering consistent sequences of operations. We believe that our new approach is a viable solution to the growing demand for cooperation in mobile environments.
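
    The COACT history merge algorithm itself is not reproduced here; the snippet below only illustrates the general idea of exploiting operation semantics when merging two users' histories, using a toy commutativity rule (operations on different objects, or two increments of the same counter, never conflict). The operation kinds and object names are assumptions.

```python
# Illustrative sketch only, not the COACT algorithm: merge two operation histories
# and flag only the operations that do not commute with the other history.
def commutes(op1, op2):
    if op1["obj"] != op2["obj"]:
        return True                       # disjoint objects always commute
    return op1["kind"] == op2["kind"] == "increment"   # increments commute

def merge(history_a, history_b):
    merged, conflicts = list(history_a), []
    for op in history_b:
        if all(commutes(op, prev) for prev in history_a):
            merged.append(op)             # safe to append in a consistent order
        else:
            conflicts.append(op)          # needs user/application resolution
    return merged, conflicts

a = [{"obj": "counter", "kind": "increment"}, {"obj": "title", "kind": "set"}]
b = [{"obj": "counter", "kind": "increment"}, {"obj": "title", "kind": "set"}]
merged, conflicts = merge(a, b)
print(len(merged), len(conflicts))        # 3 1 -> the second title edit conflicts
```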

  • Article (No Access)

    BIG DATA PROCESSING: BIG CHALLENGES AND OPPORTUNITIES

    With the rapid growth of emerging applications such as social networks, the semantic web, sensor networks and location-based service (LBS) applications, the volume and variety of data to be processed continue to increase rapidly. Effective management and processing of large-scale data poses an interesting but critical challenge. Recently, big data has attracted a lot of attention from academia, industry and government. This paper introduces several big data processing techniques from the system and application perspectives. First, from the view of cloud data management and big data processing mechanisms, we present the key issues of big data processing, including the definition of big data, big data management platforms, big data service models, distributed file systems, data storage, data virtualization platforms and distributed applications. We then turn to the MapReduce parallel processing framework and introduce some MapReduce optimization strategies reported in the literature. Finally, we discuss the open issues and challenges, and explore future research directions for big data processing in cloud computing environments.
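
    For readers unfamiliar with the MapReduce framework referenced above, the following toy word count shows the map, shuffle and reduce phases in a single process; it is a sketch of the programming model, not of any particular optimization strategy discussed in the paper.

```python
# Toy, single-process illustration of the MapReduce programming model.
from collections import defaultdict

def map_phase(documents):
    for doc in documents:
        for word in doc.split():
            yield word.lower(), 1          # emit (key, value) pairs

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)          # group values by key
    return groups

def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data processing", "big data challenges"]
print(reduce_phase(shuffle(map_phase(docs))))
# {'big': 2, 'data': 2, 'processing': 1, 'challenges': 1}
```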

  • Article (No Access)

    AI-Assisted Dynamic Modeling for Data Management in a Distributed System

    Data management is an administrative mechanism that involves the acquisition, validation, storage, protection, and processing of the data needed by users, ensuring that data are accessible, reliable, and timely. Managing the protection of information assets is a challenging task, and with the emphasis on distributed and Internet-accessible systems, the need for efficient information security management is increasingly important. In this paper, artificial intelligence-assisted dynamic modeling (AI-DM) is used for data management in a distributed system. Distributed processing is an effective way to enhance the efficiency of database systems, so the functionality of each distributed database structure depends significantly on a proper architecture for its fragmentation, allocation, and replication processes. The proposed model is a dynamically distributed Internet database architecture that enables complex decision-making on fragmentation, distribution, and duplication, and gives users access to the distributed database from anywhere. AI-DM provides an improved allocation and replication strategy for the initial stage of distributed database design, when no query performance information is yet available. The findings show that the proposed database model improves the reliability and efficiency of the system, with a dynamic modeling ratio of 87.6%, a decision support ratio of 88.7%, a logistic regression ratio of 84.5%, a data reliability ratio of 82.2%, and a system ratio of 93.8%.
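
    AI-DM's own fragmentation, allocation and replication decisions are not described in enough detail here to reproduce, so the sketch below only illustrates the underlying ideas with simple hash-based horizontal fragmentation and a fixed replication factor; the site names and replication factor are assumptions, not part of the paper.

```python
# Illustrative sketch only (not the AI-DM method): hash-based horizontal
# fragmentation across sites, with each fragment also replicated on the next site.
import hashlib

SITES = ["site-a", "site-b", "site-c"]
REPLICAS = 2                               # primary copy plus one replica

def placement(row_key):
    """Map a row to the sites that should store it."""
    h = int(hashlib.sha1(row_key.encode()).hexdigest(), 16)
    primary = h % len(SITES)
    return [SITES[(primary + i) % len(SITES)] for i in range(REPLICAS)]

print(placement("customer:42"))   # a deterministic pair of sites for this key
```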

  • Article (No Access)

    A Descriptive Framework for Information Retrieval Using Crawler Based Clustering and Effective Search Algorithm

    Information retrieval is a predominant topic in the field of information systems, as the data generated across various systems and channels grows every day. The proposed system, which works on an offline basis, is designed to organize data in a defined manner and to increase the relevance of the information retrieved from this large volume of generated data. In the proposed system, two novel algorithms are introduced to overcome the difficulties in data organization and search: the Dynamic Path Selection Clustering (DPSC) algorithm for clustering and the Rearward Binary Window Match (RBWM) algorithm for the search process. The entire system is evaluated and the results are compared with those of existing techniques.

  • Article (No Access)

    A Comprehensive Data and Project Management Platform: A Case Study of a Central Bank

    This study considers an efficient way to manage different aspects of a Central Bank’s Data Warehousing project using a custom-built data and project management platform. Although the key aspect of managing a Data Warehouse is of course data management, the management of ongoing projects related to data integration, the website, and internal and external human resources is also important. Data Warehousing in Central Banks differs strategically from Commercial Bank or Corporate Data Warehousing projects in terms of functionality and purpose. A typical Data Warehouse of a Commercial Bank or Corporate body primarily consists of the data generated by its business operations. A Central Bank, on the other hand, collects data from such Commercial Banks, Corporate bodies or Regulatory bodies for regulatory or supervisory purposes, along with data on various subject domains allied to central banking compiled within the organisation. As a result, managing such diverse data sources is a major challenge for every central bank. Further, traditional project management concepts are not applicable as-is in the context of a Central Bank. Keeping in view these particularities, and the shortcomings of available commercial project management tools, a comprehensive Information Management System has been built at the Reserve Bank of India. The development was done in-house, based on an open source platform. The salient features of this system are:

    • (1) An all-in-one monitoring platform to monitor different aspects of the Data Warehousing and Data Management project through a single window.
    • (2) Scalability.
    • (3) Cross-platform usability.

    This study presents the background, design and development of the system in detail.

  • Article (No Access)

    Towards a Simplified View of Data Management Maturity Models

    During the last 30 years, a proliferation of Data Management Maturity Models has been observed. Much of this proliferation was driven by consulting companies that took existing models and tried to differentiate their offerings by renaming the concepts. The resulting confusion makes it difficult for organisations to select a Maturity Model and apply it. This paper proposes a simpler and more integrative framework to help organisations assess and sustainably enhance their data management. To do so, in-depth academic papers and professional documents were gathered following a structured approach. The outcome of the paper is an Integrative Framework and a Data Maturity Evaluation and Enhancement Process that aim to simplify Data Management Maturity Models.

  • Article (No Access)

    The Main Big Data Solution Pillars: How to Effectively Model and Manage the Massive Data Deluge?

    In today’s data-driven world, the volume of information produced daily is staggering. Without a robust data engineering strategy, companies face the risk of prolonged delays, decreased productivity, dissatisfied customers, and strained business relationships. Effective data management and data modelling are critical for transforming this vast amount of information into valuable insights that drive business growth and provide a competitive edge. By gathering and analysing data through these methods, businesses can make informed decisions that significantly impact their growth and success. Data modelling and data management are both critical components of working with data, but they focus on different aspects of handling and utilising data within an organisation. Understanding the distinction between these two areas is crucial for effectively managing data within an organisation and ensuring that data systems are well-designed and properly maintained. By delving into the specifics of data modelling and data management, this paper aims to provide a comprehensive understanding of how these practices can be leveraged to enhance organisational efficiency, productivity, and decision-making. We provide a comprehensive and insightful exploration of data modelling and data management while highlighting their critical roles in modern business environments.

  • Article (No Access)

    An Adaptive Parallel PI-Skyline Query for Probabilistic and Incomplete Database

    In the context of large quantities of information, the skyline query is a particularly useful tool for data mining and decision-making. However, the massive amounts of information on the Internet are frequently incomplete and uncertain due to data randomness, transmission errors, and many other reasons. Therefore, an efficient skyline query algorithm over incomplete, uncertain databases is imperative. To address this issue, this paper proposes an efficient algorithm that applies the skyline query to probabilistic incomplete data. The algorithm is based on the U-Skyline model to avoid the disadvantages of the traditional P-Skyline model. The proposed method introduces several novel concepts, including transferred tuples, leading tuples and a new dominance relationship between probabilistic incomplete data, and it is designed as a parallel processing algorithm. Extensive experiments demonstrate the effectiveness and efficiency of the proposed algorithms.
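
    The paper's transferred tuples, leading tuples and parallel machinery are not detailed here, but a common way of defining dominance over incomplete tuples, comparing only the dimensions both tuples actually have, is easy to sketch; treat the snippet below as an illustration rather than the paper's exact dominance relationship.

```python
# Illustrative sketch only: dominance over incomplete tuples, comparing only the
# dimensions present in both (None marks a missing value); smaller is assumed better.
def dominates_incomplete(a, b):
    common = [i for i in range(len(a)) if a[i] is not None and b[i] is not None]
    if not common:
        return False                       # nothing comparable, so no dominance
    no_worse = all(a[i] <= b[i] for i in common)
    strictly_better = any(a[i] < b[i] for i in common)
    return no_worse and strictly_better

print(dominates_incomplete((1, None, 3), (2, 5, 3)))   # True: better on dim 0, equal on dim 2
print(dominates_incomplete((1, None, 9), (2, 5, 3)))   # False: worse on dim 2
```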

  • Article (No Access)

    Spatial Data Management in IoT Systems: Solutions and Evaluation

    As Internet of Things (IoT) systems gain in popularity, an increasing number of Big Data sources become available. Ranging from small sensor networks designed for household use to large, fully automated industrial environments, IoT systems create billions of measurements each second, making traditional storage and indexing solutions obsolete. While research on Big Data has focused on scalable solutions that can support the datasets produced by these systems, the focus has mainly been on managing the volume and velocity of the data, rather than providing efficient solutions for their retrieval and analysis. A key, and often overlooked, characteristic of these data is the spatial information that can be used to integrate data from multiple sources and conduct multi-dimensional analysis of the collected information. We present the solutions currently available for the storage and indexing of spatial datasets produced by IoT systems, and we discuss their applicability in real-world scenarios.
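
    As a small illustration of the kind of spatial indexing such solutions build on (not a description of any specific system surveyed in the paper), the sketch below keeps sensor readings in a uniform grid and answers bounding-box queries by scanning only the covered cells; the cell size and record layout are assumptions.

```python
# Illustrative sketch only: a uniform grid index over geo-tagged sensor readings.
from collections import defaultdict

CELL = 0.01                                  # grid cell size in degrees (assumed)

def cell_of(lat, lon):
    return (int(lat // CELL), int(lon // CELL))

class GridIndex:
    def __init__(self):
        self.cells = defaultdict(list)       # cell -> list of readings

    def insert(self, lat, lon, reading):
        self.cells[cell_of(lat, lon)].append((lat, lon, reading))

    def query(self, lat_min, lat_max, lon_min, lon_max):
        """Return readings inside a bounding box by scanning only the covered cells."""
        results = []
        for ci in range(int(lat_min // CELL), int(lat_max // CELL) + 1):
            for cj in range(int(lon_min // CELL), int(lon_max // CELL) + 1):
                for lat, lon, reading in self.cells.get((ci, cj), []):
                    if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max:
                        results.append(reading)
        return results

idx = GridIndex()
idx.insert(48.8584, 2.2945, {"sensor": "s1", "temp": 21.5})
print(idx.query(48.85, 48.86, 2.29, 2.30))   # [{'sensor': 's1', 'temp': 21.5}]
```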