CLASSIFICATION OF LARGE MICROARRAY DATASETS USING FAST RANDOM FOREST CONSTRUCTION
Journal of Bioinformatics and Computational Biology, 01 Apr 2011
Abstract
Random forest is an ensemble classification algorithm. It performs well even when most predictive variables are noisy, and it can be used when the number of variables is much larger than the number of observations. Its use of bootstrap samples and of restricted random subsets of attributes makes it more powerful than simple ensembles of trees. The main advantage of a random forest classifier is its explanatory power: it measures the importance of each variable, that is, the impact of each factor on the predicted class label. These characteristics make the algorithm ideal for microarray data, and it has been shown to build highly accurate models on high-dimensional microarray datasets. However, current implementations of random forest in the machine learning and statistics communities limit its usability for mining large datasets, because they require the entire dataset to remain in memory. We propose a new framework, an optimized implementation of a random forest classifier that addresses specific properties of microarray data, takes the computational complexity of the decision tree algorithm into account, and delivers excellent computing performance while preserving predictive accuracy. The implementation is based on reducing overlapping computations and eliminating the dependency on the size of main memory. Its excellent computational performance makes the algorithm useful for interactive data analysis and data mining.
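For readers unfamiliar with the mechanics the abstract lists (bootstrap samples, a restricted random subset of attributes at each split, majority voting, and a variable-importance measure), here is a minimal pure-Python sketch. It rests on several assumptions not taken from the paper: Gini impurity as the split criterion, sqrt(p) attributes per node as the default subset size, and Breiman-style permutation importance on out-of-bag samples as the importance measure. The names `RandomForest`, `build_tree`, `mtry`, and `importance` are hypothetical. The sketch keeps the whole dataset in memory, so it illustrates the classic algorithm, not the optimized, memory-independent framework the paper proposes.

```python
# Illustrative sketch only. Assumptions: Gini splits, sqrt(p) features per node,
# OOB permutation importance. This is NOT the paper's optimized implementation.
import math
import random
from collections import Counter


def gini(counts, total):
    """Gini impurity of a node, given its class counts."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_split(X, y, idx, features):
    """Return (score, feature, threshold) with the lowest weighted Gini, or None."""
    best, n = None, len(idx)
    for f in features:
        order = sorted(idx, key=lambda i: X[i][f])
        left, right = Counter(), Counter(y[i] for i in order)
        for k in range(n - 1):
            # Move one sample from the right partition to the left one.
            left[y[order[k]]] += 1
            right[y[order[k]]] -= 1
            a, b = X[order[k]][f], X[order[k + 1]][f]
            if a == b:
                continue  # no threshold separates equal values
            nl, nr = k + 1, n - k - 1
            score = (nl * gini(left, nl) + nr * gini(right, nr)) / n
            if best is None or score < best[0]:
                best = (score, f, (a + b) / 2)
    return best


def build_tree(X, y, idx, mtry, rng):
    """Grow an unpruned tree; each node draws its own random subset of mtry features."""
    labels = Counter(y[i] for i in idx)
    if len(labels) > 1:
        features = rng.sample(range(len(X[0])), mtry)
        split = best_split(X, y, idx, features)
        if split is not None:
            _, f, t = split
            left = [i for i in idx if X[i][f] <= t]
            right = [i for i in idx if X[i][f] > t]
            return (f, t, build_tree(X, y, left, mtry, rng), build_tree(X, y, right, mtry, rng))
    return labels.most_common(1)[0][0]  # leaf: majority class


def predict_tree(node, x):
    """Follow internal nodes (f, t, left, right) down to a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


class RandomForest:
    """Hypothetical minimal forest: bootstrap samples + per-node feature subsets."""

    def __init__(self, n_trees=100, mtry=None, seed=0):
        self.n_trees, self.mtry, self.rng = n_trees, mtry, random.Random(seed)

    def fit(self, X, y):
        n, p = len(X), len(X[0])
        mtry = self.mtry or max(1, int(math.sqrt(p)))  # common default: sqrt(p)
        self.trees, self.oob = [], []
        for _ in range(self.n_trees):
            boot = [self.rng.randrange(n) for _ in range(n)]  # rows drawn with replacement
            in_bag = set(boot)
            self.oob.append([i for i in range(n) if i not in in_bag])  # out-of-bag rows
            self.trees.append(build_tree(X, y, boot, mtry, self.rng))
        return self

    def predict(self, x):
        """Majority vote over all trees."""
        return Counter(predict_tree(t, x) for t in self.trees).most_common(1)[0][0]

    def importance(self, X, y):
        """Permutation importance: mean drop in out-of-bag accuracy per feature."""
        p = len(X[0])
        drops = [0.0] * p
        for tree, oob in zip(self.trees, self.oob):
            if not oob:
                continue
            base = sum(predict_tree(tree, X[i]) == y[i] for i in oob)
            for f in range(p):
                vals = [X[i][f] for i in oob]
                self.rng.shuffle(vals)  # break the link between feature f and the label
                hits = 0
                for i, v in zip(oob, vals):
                    x = list(X[i])
                    x[f] = v
                    hits += predict_tree(tree, x) == y[i]
                drops[f] += (base - hits) / len(oob)
        return [d / len(self.trees) for d in drops]
```

As a usage example, `forest = RandomForest(n_trees=100, seed=0).fit(X, y)` on a list-of-lists `X` and a list of labels `y`, followed by `forest.importance(X, y)`, ranks attributes by how much shuffling each one lowers out-of-bag accuracy. Drawing a fresh attribute subset at every node, rather than once per tree, is one common choice; it decorrelates the trees, which is what lets the ensemble outperform simple bagged trees.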