We describe an approach to data analytics on large systems that uses a suite of robust parallel algorithms running on both clouds and HPC systems. We apply it both to data defined in a vector space and to data for which only pairwise distances between points are defined. We introduce improvements to known algorithms in functionality, features, and performance, and we also review the state of the art, as it is not broadly familiar. Visualization is valuable for steering complex analytics, and we discuss it both for the non-vector, semi-metric case and for clustering in high-dimensional vector spaces. We exploit deterministic annealing, which is heuristic but rests on clear general principles that can yield reasonably fast, robust algorithms. We apply these methods to several life sciences applications.
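For the vector-space clustering case, the deterministic annealing idea can be illustrated with a minimal sequential sketch in the spirit of Rose-style deterministic annealing clustering. It is not the authors' parallel implementation and does not cover the pairwise-distance case: the function `da_cluster`, the starting temperature, the geometric cooling schedule, and the perturbation size `eps` are all illustrative assumptions. Points receive soft Gibbs assignments at temperature T, each center is the weighted mean of the points, and T is lowered gradually so that centers split apart only as the temperature falls.

```python
import math


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def da_cluster(points, k, t_min=1e-3, cooling=0.9, iters=50, eps=1e-3):
    """Hypothetical deterministic-annealing clustering of d-dimensional points into k centers.

    Soft assignments follow p(c | x) ~ exp(-||x - y_c||^2 / T); each center is the
    p-weighted mean of the points, and T is lowered geometrically (an assumed schedule).
    """
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    # The first split occurs at T = 2 * (largest covariance eigenvalue); the total
    # variance bounds that eigenvalue, so 4 * total variance starts above every split.
    total_var = sum(_sq_dist(p, mean) for p in points) / n
    t = 4.0 * total_var + 1e-12
    centers = [list(mean) for _ in range(k)]
    while t > t_min:
        # Re-apply a tiny symmetric perturbation so coincident centers can separate
        # once T falls below a critical temperature (otherwise they stay merged).
        for c_idx, c in enumerate(centers):
            for j in range(d):
                c[j] += eps * (c_idx - (k - 1) / 2) * (j + 1)
        for _ in range(iters):
            sums = [[0.0] * d for _ in range(k)]
            weights = [0.0] * k
            for p in points:
                dists = [_sq_dist(p, c) for c in centers]
                dmin = min(dists)  # shift exponents for numerical stability
                g = [math.exp(-(dist - dmin) / t) for dist in dists]
                z = sum(g)
                for c_idx in range(k):
                    w = g[c_idx] / z
                    weights[c_idx] += w
                    for j in range(d):
                        sums[c_idx][j] += w * p[j]
            # New center = weighted mean; keep the old center if it received no weight.
            centers = [
                [s / weights[c_idx] for s in sums[c_idx]] if weights[c_idx] > 0 else centers[c_idx]
                for c_idx in range(k)
            ]
        t *= cooling
    return centers


pts = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (10.0, 10.0), (10.2, 10.0), (10.0, 10.2)]
print(sorted(da_cluster(pts, 2)))
```

At high temperature every center sits at the data mean; as T drops, the centers split along the direction of largest variance and, at low T, settle on the two group means. The re-applied perturbation is a crude stand-in for explicit phase-transition detection, which a production implementation might handle more carefully.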
In this chapter, we show that the asymptotic theory for linear regression models with IID observations carries over to ergodic stationary linear time series regression models with martingale difference sequence (MDS) disturbances. We also introduce some basic concepts in time series analysis and describe several tests for serial correlation.
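As a concrete example of testing for serial correlation, the sketch below computes two standard portmanteau statistics, Box-Pierce and Ljung-Box, from sample autocorrelations. These are not necessarily the tests described in the chapter. The function names are hypothetical, and the chi-square p-values are asymptotic approximations under a null of no serial correlation in an observed series; applied to estimated regression residuals, the null distribution may need adjustment.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: P = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = math.exp(-half)
        total = term
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return total
    # Odd df: start from df = 1 and apply
    # Q(x | v + 2) = Q(x | v) + (x/2)^(v/2) * exp(-x/2) / Gamma(v/2 + 1)
    total = math.erfc(math.sqrt(half))
    v = 1
    while v < df:
        total += math.exp((v / 2.0) * math.log(half) - half - math.lgamma(v / 2.0 + 1.0))
        v += 2
    return total


def autocorrelations(x, max_lag):
    """Sample autocorrelations rho_hat(1..max_lag), using the 1/n autocovariance estimator."""
    n = len(x)
    xbar = sum(x) / n
    dev = [xi - xbar for xi in x]
    gamma0 = sum(e * e for e in dev) / n
    return [
        sum(dev[t] * dev[t - j] for t in range(j, n)) / n / gamma0
        for j in range(1, max_lag + 1)
    ]


def portmanteau(x, max_lag):
    """Box-Pierce and Ljung-Box statistics with asymptotic chi-square(max_lag) p-values."""
    n = len(x)
    rho = autocorrelations(x, max_lag)
    q_bp = n * sum(r * r for r in rho)
    q_lb = n * (n + 2) * sum(r * r / (n - j) for j, r in enumerate(rho, start=1))
    return {
        "rho": rho,
        "box_pierce": (q_bp, chi2_sf(q_bp, max_lag)),
        "ljung_box": (q_lb, chi2_sf(q_lb, max_lag)),
    }


# A perfectly alternating series is strongly (negatively) serially correlated.
series = [(-1.0) ** t for t in range(100)]
print(portmanteau(series, 2))
```

Ljung-Box rescales each squared autocorrelation by (n + 2)/(n - j), which usually brings its finite-sample distribution closer to the chi-square limit than Box-Pierce.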
Conditional probability distribution models have been widely used in economics and finance. In this chapter, we introduce two closely related and popular methods for estimating conditional distribution models: Maximum Likelihood Estimation (MLE) and Quasi-MLE (QMLE). MLE maximizes the model likelihood function of the random sample when the conditional distribution model is correctly specified; QMLE maximizes the same likelihood function when the conditional distribution model is misspecified. When a conditional distribution model is correctly specified, the score function is an MDS and the dynamic Information Matrix (IM) equality holds. As a result, the asymptotic properties of the MLE are analogous to those of the OLS estimator when the regression disturbance is an MDS with conditional homoskedasticity, and we can use the Wald test, the LM test, and the Likelihood Ratio (LR) test for hypothesis testing, where the LR test is analogous to the J·F test statistic. On the other hand, when the conditional distribution model is misspecified, the score function, evaluated at the pseudo-true parameter value, still has mean zero, but it may no longer be an MDS, and the dynamic IM equality may fail. As a result, the asymptotic properties of the QMLE are analogous to those of the OLS estimator when the regression disturbance displays serial correlation and/or conditional heteroskedasticity. Robust Wald and LM tests can be constructed for hypothesis testing, but the LR test can no longer be used, for a reason similar to the failure of the F-test statistic when the regression disturbance displays serial correlation and/or conditional heteroskedasticity. We discuss methods to test the MDS property of the score function, the dynamic IM equality, and the correct specification of a conditional distribution model.
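The contrast between the MLE and QMLE cases can be illustrated with a deliberately simple, unconditional IID exponential model. This is an assumption chosen for brevity; the chapter itself concerns conditional models and the dynamic IM equality. The sketch below estimates the rate by maximum likelihood, then compares A (the average negative Hessian) with B (the average squared score). When the model is correct, B/A should be close to 1 and the inverse-information variance is appropriate. When it is wrong, B/A can differ from 1, and only the sandwich variance A^{-1} B A^{-1} / n remains valid. The function name `exponential_im_check` and the uniform alternative are hypothetical, and the ratio is an informal diagnostic, not a formal information matrix test.

```python
import random


def exponential_im_check(x):
    """MLE of theta in f(x; theta) = theta * exp(-theta * x), plus IM-equality diagnostics.

    Per-observation log-likelihood: l_i = log(theta) - theta * x_i
    Score:   s_i = 1/theta - x_i
    Hessian: h_i = -1/theta^2
    """
    n = len(x)
    xbar = sum(x) / n
    theta = 1.0 / xbar                                 # MLE (a QMLE if the model is wrong)
    a = 1.0 / theta ** 2                               # A = -(1/n) * sum of h_i
    b = sum((1.0 / theta - xi) ** 2 for xi in x) / n   # B = (1/n) * sum of s_i^2
    return {
        "theta": theta,
        "im_ratio": b / a,             # close to 1 when the IM equality holds
        "var_mle": 1.0 / (n * a),      # A^{-1} / n, valid under correct specification
        "var_qmle": b / (n * a * a),   # sandwich A^{-1} B A^{-1} / n, robust to misspecification
    }


rng = random.Random(0)
exp_data = [rng.expovariate(2.0) for _ in range(20000)]    # model correctly specified
unif_data = [rng.uniform(0.0, 2.0) for _ in range(20000)]  # model misspecified
print(exponential_im_check(exp_data)["im_ratio"])   # expected to be close to 1
print(exponential_im_check(unif_data)["im_ratio"])  # expected to be close to 1/3
```

Here B/A reduces to the sample variance divided by the squared sample mean, which is 1 for an exponential distribution and 1/3 for a uniform distribution on (0, 2). The score evaluated at the estimate always averages to exactly zero, matching the mean-zero property that survives misspecification.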