Linking named entities to structured knowledge sources paves the way for state-of-the-art Web intelligence applications that assign sentiment to the correct entities, identify trends, and reveal relations between organizations, persons, and products. To this end, this paper introduces Recognyze, a named entity linking component that uses background knowledge obtained from linked data repositories, and outlines the process of transforming heterogeneous data silos within an organization into a linked enterprise data repository which draws upon popular linked open data vocabularies to foster interoperability with public data sets. The presented examples use comprehensive real-world data sets from Orell Füssli Business Information, Switzerland's largest business information provider. The linked data repository created from these data sets comprises more than nine million triples on companies, their contact information, key people, products, and brands. We identify the major challenges of tapping into such sources for named entity linking and describe the data pre-processing techniques required to use and integrate such data sets, with a special focus on disambiguation and ranking algorithms. Finally, we conduct a comprehensive evaluation based on business news from the Neue Zürcher Zeitung and AWP Financial News to illustrate how these techniques improve the performance of the Recognyze named entity linking component.
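To illustrate the kind of candidate generation and ranking involved in linking a company mention against a knowledge base, the following minimal Python sketch matches a mention against labels and aliases and ranks candidates by a weighted mix of string similarity and popularity. The toy knowledge base, field names, thresholds, and weights are illustrative assumptions, not Recognyze's actual implementation or data.

```python
# Hedged sketch of name-based candidate generation and ranking for company
# mentions; KB entries, fields, and scoring weights are assumed for illustration.
from difflib import SequenceMatcher

# Toy stand-in for a linked enterprise data repository of companies.
KNOWLEDGE_BASE = [
    {"uri": "http://example.org/company/1", "label": "Acme Holding AG",
     "aliases": ["Acme", "Acme Holding"], "popularity": 0.9},
    {"uri": "http://example.org/company/2", "label": "Acme Logistics GmbH",
     "aliases": ["Acme Logistics"], "popularity": 0.4},
]

def candidates(mention):
    """Return KB entries whose label or alias roughly matches the mention."""
    found = []
    for entry in KNOWLEDGE_BASE:
        names = [entry["label"]] + entry["aliases"]
        best = max(SequenceMatcher(None, mention.lower(), n.lower()).ratio()
                   for n in names)
        if best > 0.6:
            found.append((best, entry))
    return found

def link(mention):
    """Rank candidates by combining string similarity and popularity."""
    ranked = sorted(candidates(mention),
                    key=lambda c: 0.7 * c[0] + 0.3 * c[1]["popularity"],
                    reverse=True)
    return ranked[0][1]["uri"] if ranked else None

print(link("Acme Holding"))  # -> http://example.org/company/1
```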
This paper first performed exploratory data visualization on the data set to determine the nature and representation of the input data and to conduct a preliminary feature selection. It then performed data preprocessing and feature engineering, which are of critical importance to the accuracy of the prediction results, and built multiple regression models to predict the missing values in the test set. Various data mining algorithms were implemented to build predictive models, including the Gaussian Naive Bayes classifier, K-Nearest Neighbors (K-NN), Multi-layer Perceptron (MLP), logistic regression, random forest, and XGBoost. In the experiments, the XGBoost classifier gave the best result among all the models.
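The core of the described workflow, regression-based imputation of missing values followed by an XGBoost classifier, could look roughly like the sketch below. The synthetic data, column names, train/test split, and hyperparameters are assumptions for illustration only.

```python
# Hedged sketch: regression-based imputation of missing values, then XGBoost.
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Toy data standing in for the paper's data set.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 5)), columns=[f"f{i}" for i in range(5)])
y = (X["f0"] + X["f1"] > 0).astype(int)
X.iloc[rng.choice(500, 50), 2] = np.nan          # inject missing values

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Predict missing values from the other features with regression models.
imputer = IterativeImputer(random_state=0)
X_train_imp = imputer.fit_transform(X_train)
X_test_imp = imputer.transform(X_test)

model = XGBClassifier(n_estimators=200, eval_metric="logloss")
model.fit(X_train_imp, y_train)
print("test accuracy:", model.score(X_test_imp, y_test))
```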
Diabetes is a chronic disease characterized by elevated blood glucose levels. According to the World Health Organization (WHO), 422 million people were diabetic as of 2014. This paper develops an accurate machine learning classification model and an efficient data pre-processing pipeline to improve overall accuracy. For this purpose, six algorithms are used for classification and their accuracies are compared: Support Vector Machine with a linear kernel (Linear-SVM), Support Vector Machine with an RBF kernel (RBF-SVM), K-Nearest Neighbor (KNN), Artificial Neural Network (ANN), Decision Tree, and Random Forest. Data imputation, oversampling, and feature scaling form the data preprocessing pipeline. Experiments are performed on a well-known dataset from the National Institute of Diabetes and Digestive and Kidney Diseases, the PIMA diabetes dataset. The data preprocessing techniques, data imputation and Synthetic Minority Oversampling Technique (SMOTE) analysis, improved classification accuracy from 77% on raw data to 88.12% (Random Forest classifier) and 91% (ANN classifier), respectively. Furthermore, a new feature generation approach is applied and its performance is analyzed using the SVM model: the original attributes BMI and Insulin are replaced with new features BMI_NORMAL and INSULIN_NORMAL, respectively. The significant improvement achieved by the proposed technique is confirmed by statistical testing followed by post-hoc analysis.
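A minimal sketch of the described preprocessing pipeline on the PIMA data set follows: median imputation of the physiologically impossible zero values, SMOTE oversampling, feature scaling, a Random Forest classifier, and the derived binary features BMI_NORMAL and INSULIN_NORMAL. The thresholds, file name, and hyperparameters are assumptions; the paper's exact settings may differ.

```python
# Hedged sketch of imputation + SMOTE + scaling + Random Forest on PIMA data,
# with assumed thresholds for the derived BMI_NORMAL / INSULIN_NORMAL features.
import numpy as np
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("diabetes.csv")          # PIMA diabetes data set (assumed file name)

# Treat zeros in selected columns as missing and impute with the median.
for col in ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]:
    s = df[col].replace(0, np.nan)
    df[col] = s.fillna(s.median())

# Derived binary features replacing the raw attributes (assumed normal ranges).
df["BMI_NORMAL"] = df["BMI"].between(18.5, 25).astype(int)
df["INSULIN_NORMAL"] = df["Insulin"].between(16, 166).astype(int)
X = df.drop(columns=["Outcome", "BMI", "Insulin"])
y = df["Outcome"]

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
X_train, y_train = SMOTE(random_state=0).fit_resample(X_train, y_train)

scaler = StandardScaler().fit(X_train)
clf = RandomForestClassifier(n_estimators=300, random_state=0)
clf.fit(scaler.transform(X_train), y_train)
print("test accuracy:", clf.score(scaler.transform(X_test), y_test))
```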
Big Data is a popular cutting-edge technology, and its techniques and algorithms are expanding into different areas including engineering, biomedicine, and business. Due to the high volume and complexity of Big Data, data pre-processing is necessary before data mining. The pre-processing methods include data cleaning, data integration, data reduction, and data transformation. Data clustering is the most important step of data reduction: with data clustering, mining on the reduced data set should be more efficient while still producing quality analytical results. This paper presents the different data clustering methods and related algorithms for data mining with Big Data. Data clustering can increase both the efficiency and the accuracy of data mining.
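As a concrete illustration of clustering as a data-reduction step, the sketch below uses k-means to replace a large set of records with size-weighted cluster centroids that can then be mined more efficiently. The data, cluster count, and weighting scheme are illustrative assumptions, not tied to any specific Big Data system.

```python
# Hedged sketch: k-means clustering as data reduction before mining.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
data = rng.normal(size=(20_000, 8))       # toy stand-in for a large data set

kmeans = KMeans(n_clusters=200, n_init=10, random_state=0).fit(data)
reduced = kmeans.cluster_centers_                         # 200 representatives
weights = np.bincount(kmeans.labels_, minlength=200)      # records per centroid

print(f"reduced {data.shape[0]} records to {reduced.shape[0]} weighted representatives")
```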
Convolutional Neural Networks (CNNs) have made incredible progress in numerous research areas. However, the exponential growth of digital images causes over-burdening due to irrelevant features, heavy redundancy, and noisy data, which affects both the processing speed of the CNN and its classification accuracy. In this study, a novel reduction algorithm based on rough set theory, with no information loss, is proposed as a data pre-processor for CNNs. The proposed algorithm reduces the data through feature reduction and noisy-sample reduction: the rough set identifies the noisy, mislabeled boundary samples to be removed based on KNN rules. Experiments demonstrate that the proposed approach can increase the overall performance of convolutional neural networks.
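The KNN-based noisy-sample filter can be pictured with the sketch below, which removes samples whose label disagrees with the majority of their k nearest neighbours before training. This is a generic edited-nearest-neighbours style filter, not the paper's rough-set formulation; k, the toy data, and the noise injection are assumptions for illustration.

```python
# Hedged sketch of a KNN-rule filter for mislabeled boundary samples.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def remove_noisy_samples(X, y, k=5):
    """Drop samples whose label differs from the majority label of their k neighbours."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)                 # first neighbour is the sample itself
    neighbour_labels = y[idx[:, 1:]]
    keep = (neighbour_labels == y[:, None]).mean(axis=1) >= 0.5
    return X[keep], y[keep]

# Toy example with a few mislabeled points near the class boundary.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = (X[:, 0] > 0).astype(int)
y[rng.choice(1000, 30, replace=False)] ^= 1   # flip some labels to simulate noise

X_clean, y_clean = remove_noisy_samples(X, y)
print(f"kept {len(y_clean)} of {len(y)} samples")
```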