World Scientific

DATA FUSION IN SEVERAL ALGORITHMS

https://doi.org/10.1142/S1793536913500143
Cited by: 6 (Source: Crossref)

Data fusion is the process of integrating several datasets that share some common variables, while other variables are available only in some of the datasets. The main problem of data fusion can be described as follows. One source provides the datasets X0 and Y0, with N0 observations on n x-variables and m y-variables, respectively. Another source provides the dataset X1, with N1 observations on the same n x-variables. The missing Y1 data (N1 observations by m variables) must be estimated so that all the data can be combined into one set.

Several algorithms are considered in this work. One of them estimates weights proportional to the distances from each ith observation in the "recipients" dataset X1 to all observations in the "donors" dataset X0. Alternatively, the weights can be found by a sample balancing technique with the maximum effective base, performed by applying ridge regression to the Gifi system of binaries obtained from the x-variables, so that the "donors" data X0 best fit the margins defined by each respondent in the "recipients" dataset X1. Then the weighted regressions of each y in the Y0 dataset on all the variables in X0 are constructed. For each ith observation in the "recipients" dataset X1, these regressions are used to predict the y-variables of the Y1 dataset. If X and Y are the same n variables coming from different sources, the dual partial least squares technique and a special regression model with dummy variables identifying each of the three available datasets (X0, Y0, and X1) are used to predict the Y1 data.
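The distance-weighted regression step described in the abstract can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the paper's implementation. The abstract does not give the exact form of the distance-based weights, so the hypothetical `fuse` function assumes a Gaussian kernel of the squared Euclidean distance with a `bandwidth` parameter, giving closer donors larger weights. It also adds a tiny ridge term on the slopes purely for numerical safety. For each recipient row of X1 and every y-variable, it fits a weighted least-squares regression on all x-variables of the donors X0, then evaluates that regression at the recipient's own x-values.

```python
import math


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented copy [a | b] so the caller's lists are not modified.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def weighted_ls(x0, y, w, ridge=1e-10):
    """Weighted least squares of y on [1, x0]; a tiny ridge on the slopes guards against singularity."""
    a = [[1.0] + list(row) for row in x0]  # design matrix with an intercept column
    p = len(a[0])
    ata = [[sum(wk * ak[i] * ak[j] for wk, ak in zip(w, a)) for j in range(p)]
           for i in range(p)]
    for i in range(1, p):  # the intercept is not penalized
        ata[i][i] += ridge
    aty = [sum(wk * ak[i] * yk for wk, ak, yk in zip(w, a, y)) for i in range(p)]
    return solve(ata, aty)


def fuse(x0, y0, x1, bandwidth=1.0):
    """Hypothetical helper: impute Y1 for each recipient row of x1 from donors (x0, y0)."""
    n_y = len(y0[0])
    y1 = []
    for xi in x1:
        # Assumed weighting (not from the paper): Gaussian kernel of the squared
        # Euclidean distance, so donors closer to recipient i weigh more.
        w = [math.exp(-sum((u - v) ** 2 for u, v in zip(xi, xj)) / (2 * bandwidth ** 2))
             for xj in x0]
        row = []
        for col in range(n_y):
            # Weighted regression of this y on all donor x-variables,
            # evaluated at the recipient's x-values.
            beta = weighted_ls(x0, [r[col] for r in y0], w)
            row.append(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))
        y1.append(row)
    return y1


if __name__ == "__main__":
    x0 = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
    y0 = [[2 + 3 * a - b, a + b] for a, b in x0]  # both y's exactly linear in x
    print(fuse(x0, y0, [[0.5, 0.5], [3, 2]]))
```

Because both y-variables in this toy donor set are exact linear functions of the x-variables, every weighted regression recovers them, and the printed values are very close to `[[3.0, 1.0], [9.0, 5.0]]`. With real data the kernel makes each fit local to the recipient, and `bandwidth` controls how local.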
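The sample-balancing alternative can be illustrated in the same style. What follows is one reading of it, offered as an assumption rather than the author's exact formulation. Each categorical x-variable is coded as a full set of 0/1 indicators (the Gifi system of binaries). The donor weights w then minimize k*||w - 1||^2 + ||G^T w - N0*t||^2, where G is the donors' indicator matrix, t is a vector of target margins, and k is the ridge parameter. Setting the gradient to zero gives w = 1 + G*lam with (G^T G + k*I) lam = N0*t - G^T 1. The ridge term also makes G^T G invertible, which it is not under a full indicator coding of two or more variables. Keeping w close to 1 keeps the effective base (sum w)^2 / sum w^2 high. The names `gifi_binaries`, `balance_weights`, `effective_base` and `weighted_margins` are hypothetical. The abstract does not say how one respondent's profile is turned into the target vector t, and this sketch does not force the weights to be positive.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gifi_binaries(rows, categories):
    """Indicator coding: one 0/1 column for every category of every x-variable."""
    return [[1.0 if value == c else 0.0
             for value, cats in zip(row, categories) for c in cats]
            for row in rows]


def balance_weights(g, targets, k=0.01):
    """Donor weights near 1 whose margins G^T w / N0 approach `targets`.

    Minimizes k*||w - 1||^2 + ||G^T w - N0*targets||^2; the solution is
    w = 1 + G @ lam with (G^T G + k I) lam = N0*targets - G^T 1.
    """
    n0, p = len(g), len(g[0])
    gtg = [[sum(row[i] * row[j] for row in g) for j in range(p)] for i in range(p)]
    for i in range(p):
        gtg[i][i] += k  # ridge term; also makes G^T G invertible for full Gifi coding
    rhs = [n0 * targets[i] - sum(row[i] for row in g) for i in range(p)]
    lam = solve(gtg, rhs)
    return [1.0 + sum(gi * li for gi, li in zip(row, lam)) for row in g]


def weighted_margins(g, w):
    """G^T w / N0: the quantity the weights are fitted to match."""
    n0 = len(g)
    return [sum(wi * row[i] for wi, row in zip(w, g)) / n0 for i in range(len(g[0]))]


def effective_base(w):
    """Kish effective sample size (sum w)^2 / sum w^2."""
    return sum(w) ** 2 / sum(x * x for x in w)


if __name__ == "__main__":
    g = gifi_binaries([["a"], ["a"], ["b"], ["b"]], [["a", "b"]])
    for k in (1e-6, 1.0):
        w = balance_weights(g, [0.75, 0.25], k=k)
        print(k, weighted_margins(g, w), effective_base(w))
```

In this toy run the share of category "a" reaches about 0.75 with an effective base of about 3.2 for k = 1e-6. With k = 1 the share is only about 0.67, but the effective base rises to 3.6. That is the trade-off the ridge parameter controls. In the abstract's scheme, weights of this kind then enter the weighted regressions of each y on the x-variables.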