The harmonic balance (HB) method is efficient in predicting limit cycle oscillations for many dynamic systems, including very high-dimensional systems in fluid dynamics. Recently, several variations and extensions of the HB method have been developed and employed in the computation of large systems. This study focuses on a new development of the HB method. The new approach, the hybrid harmonic balance (HHB) method, is developed from the ideas behind the conventional HB approach and the high dimensional harmonic balance (HDHB) approach. To demonstrate its applications and advantages, the new method is applied to a prototypical dynamic system and compared with results from both the HB and HDHB approaches. It is theoretically proved that, when twice the number of harmonics used in HB are included in the HHB derivation, the HHB results achieve the accuracy of the HB method for the cubic nonlinearity. Numerical simulations reveal that HHB combines the advantages of both the conventional HB and HDHB methods: ease of implementation for very high-dimensional systems regardless of the complexity of the nonlinearities, and generation of meaningful solutions while the spurious solutions produced by HDHB are diminished. The new method is more accurate than the HDHB method and much more computationally efficient than the conventional HB method.
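As a hedged illustration of the harmonic balance idea underlying this abstract (the specific oscillator and truncation order below are assumptions for exposition, not necessarily the paper's prototypical system), the ansatz for a Duffing-type equation with cubic nonlinearity can be sketched as:

```latex
% Illustrative only: a Duffing-type oscillator with cubic nonlinearity,
% not necessarily the prototypical system used in the paper.
\begin{align}
  \ddot{x} + 2\zeta\dot{x} + x + \alpha x^{3} &= F\cos(\omega t), \\
  % Truncated Fourier (harmonic balance) ansatz with N harmonics:
  x(t) &\approx \hat{x}_{0} + \sum_{n=1}^{N}\bigl(\hat{x}_{2n-1}\cos(n\omega t) + \hat{x}_{2n}\sin(n\omega t)\bigr).
\end{align}
% Substituting the ansatz and balancing the coefficients of each retained
% harmonic yields 2N+1 nonlinear algebraic equations for the Fourier
% coefficients; the cubic term x^3 generates harmonics up to 3N, which is why
% retaining extra harmonics (as in HHB) can recover the accuracy of HB.
```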
Parkinson’s disease (PD) is a progressive neurodegenerative illness that frequently affects phonation, articulation, fluency, and prosody of speech. Speech impairment is a major sign of PD that can be exploited for earlier identification of the disease and timely treatment. Machine learning (ML) models are commonly employed for PD detection and classification using speech data. Since speech data exhibit high redundancy, high aliasing, and small sample sizes, dimensionality reduction (DR) techniques become essential for effective PD diagnosis. Therefore, this paper presents a new DR with weighted voting ensemble classification (DR-WVEC) model for PD diagnosis. The presented DR-WVEC model operates in several stages: pre-processing, DR, classification, and voting. Primarily, the speech data undergo a min–max normalization process. Then, the linear discriminant analysis (LDA) technique is applied to reduce the dimensionality of the features. In addition, an ensemble of two ML models, namely the extreme learning machine (ELM) and Adaboost models, is employed for classification. Finally, a weighted voting-based classification process is carried out, in which the two ML models are integrated and the highest-weighted outcome is chosen as the final result. To assess the PD diagnostic performance, an extensive set of simulations was carried out on the Parkinson’s telemonitoring dataset. The experimental results demonstrate the superiority of the DR-WVEC technique over the other compared methods across different measures.
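A minimal sketch of the kind of pipeline the abstract describes (min–max normalization, LDA, an ELM plus AdaBoost ensemble, and a weighted vote) is given below, assuming a toy dataset, a hand-rolled ELM, and equal voting weights; it is not the authors' implementation.

```python
# Hedged sketch of a DR + weighted-voting ensemble pipeline similar in spirit
# to DR-WVEC: min-max scaling -> LDA -> ELM + AdaBoost -> weighted vote.
# The ELM implementation, the voting weights and the toy data are assumptions.
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

class SimpleELM:
    """Minimal extreme learning machine: random hidden layer + ridge readout."""
    def __init__(self, n_hidden=50, alpha=1e-3, seed=0):
        self.n_hidden, self.alpha, self.rng = n_hidden, alpha, np.random.default_rng(seed)
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        T = np.eye(len(self.classes_))[np.searchsorted(self.classes_, y)]  # one-hot targets
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)
        # Ridge-regularized least squares for the output weights
        self.beta = np.linalg.solve(H.T @ H + self.alpha * np.eye(self.n_hidden), H.T @ T)
        return self
    def predict_proba(self, X):
        H = np.tanh(X @ self.W + self.b)
        logits = H @ self.beta
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

# Toy stand-in for speech features (the paper uses Parkinson's telemonitoring data).
X, y = make_classification(n_samples=300, n_features=20, n_informative=8, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

scaler = MinMaxScaler().fit(X_tr)                  # min-max normalization
lda = LinearDiscriminantAnalysis(n_components=1)   # dimensionality reduction
Z_tr = lda.fit_transform(scaler.transform(X_tr), y_tr)
Z_te = lda.transform(scaler.transform(X_te))

elm = SimpleELM().fit(Z_tr, y_tr)
ada = AdaBoostClassifier(random_state=1).fit(Z_tr, y_tr)

w_elm, w_ada = 0.5, 0.5                            # voting weights (assumed, not from the paper)
proba = w_elm * elm.predict_proba(Z_te) + w_ada * ada.predict_proba(Z_te)
pred = proba.argmax(axis=1)
print("weighted-vote accuracy:", (pred == y_te).mean())
```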
This paper shows the added value of using existing domain-specific knowledge to generate new derived variables that complement a target dataset, and the benefits of including these new variables in further data analysis methods. The main contribution of the paper is a methodology for generating these new variables as part of preprocessing, under a twofold approach: creating 2nd-generation knowledge-driven variables, which capture the criteria experts use when reasoning about the field, or 3rd-generation data-driven indicators, created by clustering original variables. Data Mining and Artificial Intelligence techniques such as Clustering or Traffic Light Panels help to obtain successful results. Some results of the INSESS-COVID19 project are presented: basic descriptive analysis gives simple results that, although useful to support basic policy-making, especially in health, are complemented by a much richer global perspective once the derived variables are included. When 2nd-generation variables are available and can be introduced in the method for creating 3rd-generation data, added value is obtained from both the basic analysis and the construction of new data-driven indicators.
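A small hypothetical illustration of the two kinds of derived variables described above follows; the column names, the expert rule, and the clustering setup are invented, not taken from INSESS-COVID19.

```python
# Hedged illustration of the two kinds of derived variables described above.
# Column names, thresholds and data are hypothetical, not from INSESS-COVID19.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "income": [1200, 800, 2500, 600, 1800, 950],
    "household_size": [4, 2, 1, 5, 3, 4],
    "days_without_work": [30, 0, 0, 90, 10, 45],
})

# 2nd-generation (knowledge-driven): an expert rule coded as a new variable.
df["economic_vulnerability"] = (
    (df["income"] / df["household_size"] < 400) & (df["days_without_work"] > 14)
).astype(int)

# 3rd-generation (data-driven): a cluster label built from the original variables.
Z = StandardScaler().fit_transform(df[["income", "household_size", "days_without_work"]])
df["profile_cluster"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)

print(df)
```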
Complex systems, as interwoven miscellaneous interacting entities that emerge and evolve through self-organization in a myriad of spiraling contexts, exhibit subtleties on a global scale while steering the way toward understanding complexity, which has been under evolutionary processes of an unfolding cumulative nature wherein order is viewed as the unifying framework. Indicating the striking feature of non-separability of components, a complex system cannot be understood in terms of the properties of its individual isolated constituents per se; rather, it can be comprehended through a multilevel approach to systems behavior, with systems whose emergent behavior and pattern transcend the characteristics of the ubiquitous units composing the system itself. This observation signals a change of scientific paradigm, showing that a reductionist perspective does not by any means imply a constructionist view; in that vein, complex systems science, associated with multiscale problems, is regarded as the ascendancy of emergence over reductionism and of mechanistic insight evolving into complex systems. While evolvability relates to species and humans owing their existence to their ancestors’ capability to adapt, emerge and evolve, and to the relation between the complexity of models, designs, visualization and optimality, complexity must entail a horizon that can take into account the subtleties that make its own means of solution applicable. Such views attach germane importance to the future science of complexity, which may best be regarded as a minimal history congruent with observable variations, namely the most parallelizable or symmetric process which can turn random inputs into regular outputs. Interestingly enough, chaos and nonlinear systems come into this picture as cousins of complexity, whose many components are involved in hectic interaction with one another in a nonlinear fashion amongst the other related systems and fields. Relation, in mathematics, is a way of connecting two or more things, which is to say numbers, sets or other mathematical objects, and it is a relation that describes the way the things are interrelated so as to facilitate making sense of complex mathematical systems. Accordingly, mathematical modeling and scientific computing have proven to be principal tools toward the solution of problems arising in the exploration of complex systems, with sound, stimulating and innovative aspects attributed to data science as a tailor-made discipline for making sense of voluminous (big) data. Regarding the computation of the complexity of any mathematical model, conducting the analyses over the run time is related to the sort of data determined and employed along with the methods. This enables examination of the data applied in the study, which is dependent on the capacity of the computer at work. Besides this, the varying capacities of computers have an impact on the results; nevertheless, the step-by-step application of the method in code must be taken into consideration. In this sense, a definition of complexity evaluated over different data lends a broader applicability range with more realism and convenience, since the process rests on concrete mathematical foundations. All of these indicate that the methods need to be investigated on the basis of their mathematical foundations together with the data employed. In that way, the level of complexity that will emerge for any data one wishes to employ becomes foreseeable.
With relation to fractals, fractal theory and analysis are geared toward assessing the fractal characteristics of data, with several methods available for assigning fractal dimensions to datasets; within that perspective, fractal analysis expands knowledge of the functions and structures of complex systems while acting as a potential means to evaluate novel areas of research and to capture the roughness of objects, their nonlinearity, randomness, and so on. The idea of fractional-order integration and differentiation, as well as the inverse relationship between them, lends fractional calculus to applications in various fields spanning science, medicine and engineering, among others. The approach of fractional calculus, within mathematics-informed frameworks employed to enable reliable comprehension of complex processes that encompass an array of temporal and spatial scales, notably provides novel applicable models, from fractional-order calculus to optimization methods. Computational science and modeling, for their part, are oriented toward the simulation and investigation of complex systems through the use of computers, drawing on domains ranging from mathematics and physics to computer science. A computational model consisting of numerous variables that characterize the system under consideration allows many simulated experiments to be performed by computerized means. Furthermore, Artificial Intelligence (AI) techniques, whether or not combined with fractal and fractional analysis as well as mathematical models, have enabled various applications, including the prediction of mechanisms ranging extensively from living organisms to other interactions across incredible spectra, besides providing solutions to real-world complex problems on both local and global scales. While enabling the maximization of model accuracy, AI can also ensure the minimization of costs such as computational burden. Relatedly, the level of complexity, often employed in computer science for decision-making and problem-solving processes, aims to evaluate the difficulty of algorithms, and by so doing it helps to determine the resources and time required for task completion. Computational (algorithmic) complexity, referring to the measure of the amount of computing resources (memory and storage) which a specific algorithm consumes when it is run, essentially signifies the complexity of an algorithm, yielding an approximate sense of the volume of computing resources and probing the input data with different values and sizes. Computational complexity, with search algorithms and solution landscapes, eventually points toward reductions vis-à-vis universality to explore varying degrees of problems with different ranges of predictability. Taken together, this line of sophisticated and computer-assisted proof approaches can fulfill the requirements of accuracy, interpretability, predictability and reliance on mathematical sciences, with the assistance of AI and machine learning standing at the plinth of, and at the intersection with, different domains, among many other related points in line with the concurrent technical analyses, computing processes, computational foundations and mathematical modeling.
Consequently, as distinct from other issues, our special issue series provides a novel direction for stimulating, refreshing and innovative interdisciplinary, multidisciplinary and transdisciplinary understanding and research in model-based and data-driven modes, so as to obtain feasible, accurate solutions, designed simulations and optimization processes, among many more. Hence, we address theoretical reflections on how all these processes are modeled, merging together advanced methods, mathematical analyses, computational technologies and quantum means, while elaborating and exhibiting the implications of applicable approaches in real-world systems and other related domains.
Mixed Methods for Establishing Content Validity: A Value-Added Approach?
Data Science Driving Innovation.
In the context of computing and informatics, Cognitive Diversity (CD) has been proposed to characterize the degree of dissimilarity between multiple scoring systems (MSS). As such, CD serves a role in informatics analogous to that of Pearson’s Correlation in classical statistics. Here we review MSS and explore CD’s utility in relation to the notions of correlation and distance in machine learning, ensemble methods, rank aggregation, and combinatorial fusion in both parametric score space and non-parametric rank space. Finally, we survey applications of CD in combining MSS in a variety of domains in science, technology, society, business, and management. Our study provides a new data science framework for discovery in data-rich environments.
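One common formulation of cognitive diversity in combinatorial fusion compares the rank-score characteristic functions of two scoring systems; the sketch below assumes that formulation, with scores already normalized to [0, 1], and is intended only as an illustration of the concept, not as the authors' definition or code.

```python
# Hedged sketch: cognitive diversity as the distance between rank-score
# characteristic functions of two scoring systems A and B (one common
# formulation in combinatorial fusion, assumed here; scores normalized to [0, 1]).
import numpy as np

def rank_score_function(scores):
    """f(i) = score of the item ranked i-th (rank 1 = highest score)."""
    return np.sort(scores)[::-1]

def cognitive_diversity(scores_a, scores_b):
    fa, fb = rank_score_function(scores_a), rank_score_function(scores_b)
    n = len(fa)
    return np.sqrt(np.sum((fa - fb) ** 2) / n)

# Two scoring systems over the same five items.
A = np.array([0.9, 0.7, 0.4, 0.2, 0.1])
B = np.array([0.6, 0.55, 0.5, 0.45, 0.4])
print("CD(A, B) =", cognitive_diversity(A, B))
```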
In recent times, the healthcare industry has been generating a significant amount of data in distinct formats, such as electronic health records (EHR), clinical trials, genetic data, payments, scientific articles, wearables, and care management databases. Data science is useful for analysis (pattern recognition, hypothesis testing, risk valuation) and prediction. The primary usage of data science in the healthcare domain is in medical imaging. At the same time, lung cancer diagnosis has become a hot research topic, as automated disease detection offers numerous benefits. Although numerous approaches exist in the literature for lung cancer diagnosis, the design of a novel model to automatically identify lung cancer remains a challenging task. In this view, this paper designs an automated machine learning (ML) with data science-enabled lung cancer diagnosis and classification (MLDS-LCDC) model using computed tomography (CT) images. The presented model initially employs a Gaussian filtering (GF)-based pre-processing technique on the CT images collected from the lung cancer database. The pre-processed images are then fed into the normalized cuts (Ncuts) technique, which determines the nodule in the pre-processed image. Moreover, the oriented FAST and rotated BRIEF (ORB) technique is applied as a feature extractor. At last, a sunflower optimization-based wavelet neural network (SFO-WNN) model is employed for the classification of lung cancer. To examine the diagnostic outcome of the MLDS-LCDC model, a set of experiments was carried out and the results were investigated from different aspects. The resultant values demonstrated the effectiveness of the MLDS-LCDC model over other state-of-the-art methods, with a maximum sensitivity of 97.01%, specificity of 98.64%, and accuracy of 98.11%.
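Only the Gaussian filtering and ORB feature-extraction stages lend themselves to a generic sketch with standard OpenCV calls; the Ncuts segmentation and the SFO-WNN classifier are specific to the paper and are left as placeholders below. A synthetic image stands in for a CT slice.

```python
# Hedged sketch of the pre-processing and feature-extraction stages only:
# Gaussian filtering and ORB descriptors via OpenCV. The Ncuts segmentation
# and the SFO-WNN classifier from the paper are not reproduced here.
import cv2
import numpy as np

# Synthetic stand-in for a CT slice (the paper uses a lung cancer CT database).
ct_slice = (np.random.rand(512, 512) * 255).astype(np.uint8)

# 1) Gaussian filtering (GF) for noise suppression.
smoothed = cv2.GaussianBlur(ct_slice, ksize=(5, 5), sigmaX=1.0)

# 2) (placeholder) the nodule region would be obtained here via normalized cuts.

# 3) ORB feature extraction on the (placeholder) region of interest.
orb = cv2.ORB_create(nfeatures=500)
keypoints, descriptors = orb.detectAndCompute(smoothed, None)
print("keypoints:", len(keypoints),
      "descriptor shape:", None if descriptors is None else descriptors.shape)
```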
Recent advances in data science are opening up new research fields and broadening the range of applications of stochastic dynamical systems. Considering the complexities in real-world systems (e.g., noisy data sets and high dimensionality) and challenges in mathematical foundation of machine learning, this review presents two perspectives in the interaction between stochastic dynamical systems and data science.
On the one hand, deep learning helps to improve first-principles-based methods for stochastic dynamical systems. AI for science, combining machine learning methods with available scientific understanding, is becoming a valuable approach for studying stochastic dynamical systems with the help of observational data. On the other hand, a challenge is the theoretical explanation of deep learning. It is crucial to build explainable deep learning structures with the help of stochastic dynamical systems theory in order to demonstrate how and why deep learning works.
In this review, we seek better understanding of the mathematical foundation of the state-of-the-art techniques in data science, with the help of stochastic dynamical systems, and we further apply machine learning tools for studying stochastic dynamical systems. This is achieved through stochastic analysis, algorithm development, and computational implementation. Topics involved with this review include Stochastic Analysis, Dynamical Systems, Inverse Problems, Data Assimilation, Numerical Analysis, Optimization, Nonparametric Statistics, Uncertainty Quantification, Deep Learning, and Deep Reinforcement Learning. Moreover, we emphasize available analytical tools for non-Gaussian fluctuations in scientific and engineering modeling.
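As a minimal concrete instance of the stochastic dynamical systems discussed in this review, the sketch below simulates a scalar SDE with the Euler-Maruyama scheme; the double-well drift, noise level, and step size are arbitrary illustrative choices, and non-Gaussian noise would replace the Gaussian increments.

```python
# Hedged sketch: Euler-Maruyama simulation of a scalar SDE
#   dX_t = f(X_t) dt + sigma dW_t,
# as a minimal concrete instance of a stochastic dynamical system.
# Drift, noise level and discretization are illustrative assumptions;
# non-Gaussian (e.g. Levy) noise would replace the Gaussian increments below.
import numpy as np

def euler_maruyama(f, sigma, x0, dt, n_steps, seed=0):
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        dW = rng.normal(scale=np.sqrt(dt))          # Brownian increment
        x[k + 1] = x[k] + f(x[k]) * dt + sigma * dW
    return x

# Double-well drift f(x) = x - x^3: a classic metastable example.
path = euler_maruyama(f=lambda x: x - x**3, sigma=0.5, x0=-1.0, dt=1e-2, n_steps=5000)
print("final state:", path[-1], "fraction of time in right well:", np.mean(path > 0))
```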
The availability of massive data about sports activities nowadays offers the opportunity to quantify the relation between performance and success. In this study, we analyze more than 6000 games and 10 million events in six European leagues and investigate this relation in soccer competitions. We discover that a team’s position in a competition’s final ranking is significantly related to its typical performance, as described by a set of technical features extracted from the soccer data. Moreover, we find that, while victories and defeats can be explained by the team’s performance during a game, it is difficult to detect draws by using a machine learning approach. We then simulate the outcomes of an entire season of each league relying only on technical data and exploiting a machine learning model trained on data from past seasons. The simulation produces a team ranking that is similar to the actual ranking, suggesting that a complex-systems view of soccer has the potential of revealing hidden patterns regarding the relation between performance and success.
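The general idea of training on past seasons and predicting outcomes for a later season can be sketched as follows; the random stand-in features, class balance, and choice of Random Forest are assumptions and do not reflect the authors' features, model, or data.

```python
# Hedged sketch of the general idea only (not the authors' features, model or data):
# train a classifier on technical features from past seasons, then predict
# win/draw/loss labels for held-out matches of a later season.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
n_past, n_new, n_feat = 2000, 380, 12            # e.g. 380 matches in a league season

X_past = rng.normal(size=(n_past, n_feat))       # stand-in technical features
y_past = rng.choice(["win", "draw", "loss"], size=n_past, p=[0.45, 0.25, 0.30])
X_new = rng.normal(size=(n_new, n_feat))
y_new = rng.choice(["win", "draw", "loss"], size=n_new, p=[0.45, 0.25, 0.30])

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_past, y_past)
print(classification_report(y_new, model.predict(X_new), zero_division=0))
# On random stand-in data the scores are meaningless; with real technical
# features, draws typically remain the hardest class, as the study reports.
```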
Some prediction methods use probability to rank their predictions, while some other prediction methods do not rank their predictions and instead use p-values to support their predictions. This disparity renders direct cross-comparison of these two kinds of methods difficult. In particular, approaches such as the Bayes Factor upper Bound (BFB) for p-value conversion may not make correct assumptions for this kind of cross-comparisons. Here, using a well-established case study on renal cancer proteomics and in the context of missing protein prediction, we demonstrate how to compare these two kinds of prediction methods using two different strategies. The first strategy is based on false discovery rate (FDR) estimation, which does not make the same naïve assumptions as BFB conversions. The second strategy is a powerful approach which we colloquially call “home ground testing”. Both strategies perform better than BFB conversions. Thus, we recommend comparing prediction methods by standardization to a common performance benchmark such as a global FDR. And where this is not possible, we recommend reciprocal “home ground testing”.
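For concreteness, the commonly cited Bayes Factor upper Bound conversion and a Benjamini-Hochberg global FDR estimate can be sketched as below; this is a generic illustration with made-up p-values, not the paper's renal cancer data or its home ground testing protocol.

```python
# Hedged sketch: the commonly cited Bayes Factor upper Bound (BFB) conversion
# versus a Benjamini-Hochberg global FDR procedure. Generic illustration only;
# the p-values are invented and this is not the paper's protocol.
import numpy as np

def bayes_factor_upper_bound(p):
    """BFB(p) = 1 / (-e * p * ln p), valid for p < 1/e; returns 1 otherwise."""
    p = np.asarray(p, dtype=float)
    return np.where(p < 1 / np.e, 1.0 / (-np.e * p * np.log(p)), 1.0)

def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean mask of discoveries at FDR level alpha."""
    p = np.asarray(pvals, dtype=float)
    order = np.argsort(p)
    m = len(p)
    thresh = alpha * np.arange(1, m + 1) / m
    passed = p[order] <= thresh
    k = np.max(np.nonzero(passed)[0]) + 1 if passed.any() else 0
    mask = np.zeros(m, dtype=bool)
    mask[order[:k]] = True
    return mask

pvals = np.array([1e-5, 3e-4, 0.002, 0.01, 0.04, 0.2, 0.6])
print("BFB:", np.round(bayes_factor_upper_bound(pvals), 1))
print("BH discoveries at FDR 5%:", benjamini_hochberg(pvals))
```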
Communities globally experience devastating effects, high monetary losses and loss of lives due to incidents of flood and other hazards. Inadequate information and awareness of flood hazard make the management of flood risks arduous and challenging. This paper proposes a hybridized analytic approach, via unsupervised and supervised learning methodologies, for the discovery of pieces of knowledge, clustering, and prediction of flood severity levels (FSL). A two-staged unsupervised learning based on k-means and self-organizing maps (SOM) was performed on the unlabeled flood dataset. K-means based on the silhouette criterion discovered the top three candidates for the optimal number of clusters inherent in the flood dataset. Experts’ judgment favored four clusters, while squared Euclidean distance was the best-performing distance measure. SOM provided cluster visuals of the input attributes within the four different groups and transformed the dataset into a labeled one. A 5-layered Adaptive Neuro Fuzzy Inference System (ANFIS) driven by a hybrid learning algorithm was applied to classify and predict FSL. ANFIS optimized by a Genetic Algorithm (GA) produced a root mean squared error (RMSE) of 0.323 and an error standard deviation of 0.408, while the Particle Swarm Optimized ANFIS model produced an RMSE of 0.288, an 11% improvement over the GA-optimized model. The results show significant improvement in the classification and prediction of flood risks compared with the use of a single ML tool.
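The first unsupervised stage, selecting candidate cluster counts for k-means via the silhouette criterion, can be sketched generically as follows; the synthetic data stand in for the flood dataset, and the SOM and ANFIS stages are not reproduced.

```python
# Hedged sketch of the first unsupervised stage only: choosing candidate
# cluster counts for k-means via the silhouette criterion. The flood dataset,
# the SOM visualization and the ANFIS classifier are not reproduced here.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=400, centers=4, n_features=6, random_state=0)

scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)        # Euclidean distance by default

top3 = sorted(scores, key=scores.get, reverse=True)[:3]
print("silhouette scores:", {k: round(v, 3) for k, v in scores.items()})
print("top-3 candidate cluster counts:", top3)     # expert judgment then picks the final k
```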
As an emergent field of inquiry, Data Science serves both the information technology world and the applied sciences. Data Science is a well-known term that tends to be treated as synonymous with Big-Data; however, Data Science is the application of solutions found through mathematical and computational research, while Big-Data Science describes problems concerning the analysis of data with respect to volume, variation, and velocity (3V). Even though little theory has been developed for Data Science from a scientific perspective, there is still great opportunity for tremendous growth. Data Science is proving to be of paramount importance to the IT industry due to the increased need to understand the immense amount of data being produced that requires analysis. In short, data is everywhere and comes in various formats. Scientists are currently using statistical and AI analysis techniques, like machine learning methods, to understand massive sets of data, and naturally they attempt to find relationships among datasets. In the past 10 years, the development of software systems within the cloud computing paradigm, using tools like Hadoop and Apache Spark, has aided in making tremendous advances to Data Science as a discipline [Z. Sun, L. Sun and K. Strang, Big data analytics services for enhancing business intelligence, Journal of Computer Information Systems (2016), doi: 10.1080/08874417.2016.1220239]. These advances enabled both scientists and IT professionals to use cloud computing infrastructure to process petabytes of data on a daily basis. This is especially true for large private companies such as Walmart, Nvidia, and Google. This paper seeks to address pragmatic ways of looking at how Data Science — with respect to Big-Data Science — is practiced in the modern world. We also examine how mathematics and computer science help shape Big-Data Science’s terrain. We highlight how mathematics and computer science have significantly impacted the development of Data Science approaches and tools, and how those approaches pose new questions that can drive new research areas within these core disciplines involving data analysis, machine learning, and visualization.
I will make an argument about who will benefit from this special issue on data science and related topics.
As the capabilities of machine learning have developed, more researchers and health care providers are beginning to consider applications for health informatics to improve health care outcomes and address issues of health equity. The Centers for Medicare and Medicaid Services considers 30-day readmission rates to the Emergency Department (ED) to be an “outcome of care” measure. Such measures show how well a hospital is doing in preventing complications, educating patients about their care needs, and helping patients make a smooth transition from the hospital to home or other care facilities. While certain readmissions are medically necessary, hospitals usually aim to decrease the rate of 30-day ED readmissions by decreasing the number of avoidable unplanned revisits. This work is an evidential study that demonstrates the positive impact of integrating Community Health Workers (CHWs) and Social Determinants of Health (SDoH) in decreasing the 30-day unplanned hospital ED readmissions at Sinai Chicago. Using data from the Sinai Urban Health Institute, we compare predicting the readmissions of patients with and without data pertaining to CHWs and SDoH, characterize the improvement in predictions, and discuss lessons learned in the process. We show that when CHWs engage with patients, the predictive accuracy of the classifier is higher by 13.0%–15.2%. Importantly, we show that the features (data characteristics) related to the CHWs are important to the classification, pointing to the importance of the program. We optimize the classifier for engaged patients and demonstrate the improvement in its predictive capabilities using multiple metrics and sets of features. We use our results to make recommendations for improving patient care and discuss limitations and future work. Notably, our work points directly to the human connection between patients and CHWs as an important feature predictive of readmission rate.
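A generic, hypothetical illustration of the comparison described above (the same classifier trained with and without CHW/SDoH-related columns, plus feature importances) is sketched below; the feature names, synthetic data, and model choice are assumptions, not the Sinai study's variables.

```python
# Hedged, generic illustration of the comparison described above: train the
# same classifier with and without CHW/SDoH-related columns and inspect
# importances. Feature names, data and model choice are hypothetical.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "age": rng.integers(18, 90, n),
    "prior_ed_visits": rng.poisson(2, n),
    "chw_engagement_visits": rng.poisson(1, n),        # CHW-related (hypothetical)
    "housing_instability": rng.integers(0, 2, n),      # SDoH-related (hypothetical)
    "food_insecurity": rng.integers(0, 2, n),          # SDoH-related (hypothetical)
})
# Synthetic outcome loosely tied to the CHW/SDoH columns, for illustration only.
logit = 0.4 * df.prior_ed_visits - 0.5 * df.chw_engagement_visits + 0.8 * df.housing_instability
y = (rng.random(n) < 1 / (1 + np.exp(-(logit - 0.5)))).astype(int)

base_cols = ["age", "prior_ed_visits"]
full_cols = base_cols + ["chw_engagement_visits", "housing_instability", "food_insecurity"]

clf = RandomForestClassifier(n_estimators=200, random_state=0)
acc_base = cross_val_score(clf, df[base_cols], y, cv=5).mean()
acc_full = cross_val_score(clf, df[full_cols], y, cv=5).mean()
print(f"without CHW/SDoH: {acc_base:.3f}   with CHW/SDoH: {acc_full:.3f}")

importances = clf.fit(df[full_cols], y).feature_importances_
print(dict(zip(full_cols, np.round(importances, 3))))
```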
This paper introduces natural language processing into the study of central banking. It studies the evolution of the ECB’s communication through time, considering its three successive presidents (W. Duisenberg, J. C. Trichet and M. Draghi) and the pre- and post-2008 financial crisis eras. It helps understand the history of the ECB since its inception. From a methodological standpoint, we study the evolution of the ECB’s speeches. The speech analysis is based on text classification and sentiment/polarity analyses. For that purpose, we have built a unique dataset of the ECB’s speeches. We have coded algorithms to run the text analysis through time. They help us capture the evolution in the ECB’s understanding of the current economic situation and also measure — for instance — the stress level at the ECB through a polarity analysis over time.
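To make the polarity analysis step concrete, a toy lexicon-based polarity score over speech text is sketched below; the mini-lexicon and example sentences are invented, and the authors' dataset and algorithms are not used.

```python
# Hedged toy illustration of a lexicon-based polarity score over speeches,
# to make the "polarity analysis through time" concrete. The mini-lexicon and
# example sentences are invented; the authors' dataset and algorithms are not used.
import re

POSITIVE = {"growth", "stability", "confidence", "recovery", "improve"}
NEGATIVE = {"crisis", "risk", "stress", "uncertainty", "decline"}

def polarity(text):
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)

speeches = {
    "2006": "Price stability supports confidence and steady growth.",
    "2009": "The crisis has raised uncertainty and financial stress.",
    "2014": "Signs of recovery should improve confidence despite remaining risk.",
}
for year, text in sorted(speeches.items()):
    print(year, round(polarity(text), 2))
```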
In network telemetry systems, nodes produce a vast number of configuration files based on how they are configured. Steps were taken to process these files into databases to help developers, testers and customer support focus on development and testing, and to be able to advise customers on how to configure the nodes. However, processing these data in a relational database management system is slow, the data are hard to query, and the storage takes up huge disk space.
In this paper, we present a way to store the data produced by these nodes in a graph database, moving from a relational database to a NoSQL environment. With our approach, one can easily represent and visualize the network of machines.
In the end, we compare insertion time, query time and storage size across different database management systems. The results could also be used for other types of configuration data from other kinds of machines, to show the connections between them and query them easily.
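A sketch of the graph-database approach, inserting a node configuration and querying it with Cypher through the Neo4j Python driver, is given below; the connection details, labels, and properties are placeholders, and the paper does not necessarily target Neo4j specifically.

```python
# Hedged sketch of storing configuration data in a graph database via Cypher,
# here using the Neo4j Python driver. Connection details, labels and properties
# are placeholders; the paper does not necessarily use Neo4j.
from neo4j import GraphDatabase

uri, auth = "bolt://localhost:7687", ("neo4j", "password")  # placeholders

insert_config = """
MERGE (n:Node {name: $node_name})
MERGE (p:Parameter {key: $key})
MERGE (n)-[:CONFIGURED_WITH {value: $value}]->(p)
"""

find_nodes_by_param = """
MATCH (n:Node)-[c:CONFIGURED_WITH]->(p:Parameter {key: $key})
RETURN n.name AS node, c.value AS value
"""

driver = GraphDatabase.driver(uri, auth=auth)
with driver.session() as session:
    session.run(insert_config, node_name="node-01", key="telemetry.interval", value="15s")
    for record in session.run(find_nodes_by_param, key="telemetry.interval"):
        print(record["node"], record["value"])
driver.close()
```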
In intelligent information systems, data play a critical role. The issue of missing data is one of the commonplace problems occurring in data collected in the real world. The problem stems directly from the very nature of data collection. In this paper, the notion of handling missing values in a real-world application of computational intelligence is considered. Two experimental campaigns were conducted, evaluating different approaches to missing values imputation on Random Forest-based classifiers trained using modern cybersecurity benchmark datasets: CICIDS2017 and IoT-23. The experiments revealed that the chosen data imputation algorithm has a severe impact on the results of the classifier used for network intrusion detection. It also comes to light that one of the most popular approaches to handling missing data — complete case analysis — should never be used in cybersecurity.
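A generic comparison of complete-case analysis against simple imputation strategies for a Random Forest classifier can be sketched as follows; synthetic data with artificially removed values stand in for the CICIDS2017 and IoT-23 benchmarks.

```python
# Hedged sketch comparing complete-case analysis with simple imputation
# strategies for a Random Forest classifier. Synthetic data stand in for the
# CICIDS2017 / IoT-23 benchmarks used in the paper.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.05] = np.nan      # knock out ~5% of values

clf = RandomForestClassifier(n_estimators=100, random_state=0)

# Complete-case analysis: drop every row containing a missing value.
keep = ~np.isnan(X_missing).any(axis=1)
acc_cca = cross_val_score(clf, X_missing[keep], y[keep], cv=5).mean()

results = {"complete-case": acc_cca}
for strategy in ("mean", "median", "most_frequent"):
    X_imp = SimpleImputer(strategy=strategy).fit_transform(X_missing)
    results[strategy] = cross_val_score(clf, X_imp, y, cv=5).mean()

print({k: round(v, 3) for k, v in results.items()})
```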
In this article, I explore the synergy between Large Language Models (LLMs) and computational chemistry in the context of digital reticular chemistry and propose a workflow leveraging these technologies to advance research and discovery in the field. I argue that understanding the intricacies of new tools is imperative before integrating them into applications, and that the proposed workflow, though robust, merely offers a glimpse into the expansive potential and applications of this field.
The first chapter provides an overview of our research on connecting the various aspects of technology that now shape the Sharing and Digital Economies. It underscores the integration of IoT, blockchain, and AI into a decentralized intelligence system that offers profound possibilities for employing data in innovative ways. The explosion of data has the potential to transform how we view our current processes and possibilities. How we use and analyze data has become significantly important to our economies, and we discuss its transformative adoption in our industries and policy to enable its development. As we grapple with technology’s inherent risks, we describe how it can be used as a moral agent. The chapter also describes the structure of the book, where each chapter elaborates on a specific application in a particular economic sector.
Metrologists are increasingly being faced with challenges in statistical data analysis and modeling, data reduction, and uncertainty evaluation, that require an ever more demanding and comprehensive analytical and computational toolkit as well as a strategy for communication of more complex results. For example, conventional assumptions of Gaussian (or normal) measurement errors may not apply, which then necessitates alternative procedures for uncertainty evaluation.
This contribution, aimed at metrologists whose specialized knowledge is in a particular area of science, and whose prior study of topics in probability or statistics will have been merely introductory, provides illustrative examples and suggestions for self-study. These examples aim to empower metrologists to attain a working level of concepts, statistical methods, and computational techniques from these particular areas, to become self-sufficient in the statistical analysis and modeling of their measurement data, and to feel comfortable evaluating, propagating, and communicating the associated measurement uncertainty.
The contribution also addresses modern computational requirements in measurement science. Since it is becoming clear to many metrologists that tools like Microsoft Excel, LibreOffice Calc, or Apple’s Numbers often are insufficiently flexible to address emerging needs, or simply fail to provide required specialized tools, this contribution includes accompanying R code with detailed explanations that will guide scientists through the use of a new computing tool.
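The contribution's accompanying code is in R; purely as a language-agnostic illustration of the kind of Monte Carlo propagation of distributions that becomes useful when Gaussian assumptions fail, a small Python sketch with invented input distributions is shown below.

```python
# The contribution's accompanying code is in R; this is merely a small,
# language-agnostic illustration (in Python) of Monte Carlo propagation of
# distributions for a measurement model, useful when Gaussian assumptions fail.
# The measurand model and input distributions are invented for illustration.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Invented inputs: a resistance from a skewed (lognormal) calibration and a
# current with rectangular (uniform) digitization error.
R = rng.lognormal(mean=np.log(100.0), sigma=0.02, size=n)       # ohm
I = rng.uniform(low=0.995, high=1.005, size=n)                  # ampere

V = R * I                                                       # measurand: V = R * I

estimate = np.mean(V)
std_uncertainty = np.std(V, ddof=1)
lo, hi = np.percentile(V, [2.5, 97.5])                          # 95% coverage interval
print(f"V = {estimate:.2f} V, u(V) = {std_uncertainty:.2f} V, "
      f"95% interval = [{lo:.2f}, {hi:.2f}] V")
```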