This book presents new and exciting developments in experimental design, multivariate analysis, biostatistics, model selection and related subjects. It features articles contributed by many prominent and active figures in their fields. These articles cover a wide array of important issues in modern statistical theory, methods and their applications. Distinctive features of this collection of articles are its coherence and the advances it makes in knowledge discovery.
https://doi.org/10.1142/9789812567765_fmatter
https://doi.org/10.1142/9789812567765_0001
Kai-Tai Fang was born in 1940 in Taizhou, Jiangsu province, China. He received his secondary education at the renowned Yangzhou High School in Jiangsu. In 1957, Kai-Tai entered Peking University to read mathematics and in 1963 he undertook graduate studies at the Institute of Mathematics, Academia Sinica, Beijing.
https://doi.org/10.1142/9789812567765_0002
Some names of significance came up during my interview with Professor Kai-Tai Fang. Significant, because they are the names of those whose academic interactions with Kai-Tai have sparked fruitful collaborations that have shaped the development of the statistical field in different ways. I took the liberty of contacting some of these scholars, who are themselves key players in the global statistical community, to invite them to say a few words about Kai-Tai on the occasion of his retirement. The invitation was met with immediate and enthusiastic responses.
https://doi.org/10.1142/9789812567765_0003
https://doi.org/10.1142/9789812567765_0004
A pivotal concept in quasi-Monte Carlo (QMC) methods is the total variation, in the sense of Hardy and Krause, of a multidimensional function. The Hardy-Krause variation is in turn based on the Vitali variation. This paper presents an account of variation for QMC. It includes the perhaps surprising finding that a bounded function arising in financial valuation has infinite variation. Also included is a previously unpublished low-variation function extension method due to Sobol'.
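For readers outside QMC, the two notions of variation and the reason they are pivotal can be summarized as follows; these are standard definitions and the Koksma-Hlawka inequality, not results of this chapter. Here Δ(f; R) denotes the alternating sum of f over the 2^d vertices of a subrectangle R of [0,1]^d, and D*_n is the star discrepancy of the point set.

```latex
% Vitali variation: supremum over all grid partitions P of [0,1]^d.
V_{\mathrm{Vitali}}(f) = \sup_{\mathcal{P}} \sum_{R \in \mathcal{P}} \bigl|\Delta(f;R)\bigr| ,
\qquad
% Hardy--Krause variation: sum over nonempty coordinate subsets u of the Vitali
% variation of f restricted to the face with x_j = 1 for j outside u.
V_{\mathrm{HK}}(f) = \sum_{\emptyset \neq u \subseteq \{1,\dots,d\}}
  V_{\mathrm{Vitali}}\bigl(f\big|_{x_j = 1,\; j \notin u}\bigr) .

% Koksma--Hlawka inequality: the QMC error is bounded by D*_n times V_HK.
\Bigl| \frac{1}{n}\sum_{i=1}^{n} f(x_i) - \int_{[0,1]^d} f(x)\,dx \Bigr|
  \;\le\; D_n^{*}(x_1,\dots,x_n)\, V_{\mathrm{HK}}(f) .
```

Functions of infinite Hardy-Krause variation, such as the bounded financial example mentioned above, fall outside the scope of this error bound.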
https://doi.org/10.1142/9789812567765_0005
We investigate the higher order power properties for a very general class of empirical discrepancy statistics. This class includes the Cressie-Read discrepancy statistics and, in particular, the empirical likelihood ratio statistic. Under the criterion of average local power along spherical contours, it is seen that these competing statistics can be discriminated at the third order of comparison and that the Pearsonian chi-square statistic tends to have an edge over others.
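For orientation, one common parameterization of the Cressie-Read empirical discrepancy family acts on probability weights w_1, …, w_n placed on the n observations; conventions differ across authors, so treat this as a sketch rather than the chapter's own notation.

```latex
\mathrm{CR}_{\lambda}(w) \;=\; \frac{2}{\lambda(\lambda+1)}
  \sum_{i=1}^{n} \Bigl[(n w_i)^{-\lambda} - 1\Bigr],
\qquad w_i \ge 0, \quad \sum_{i=1}^{n} w_i = 1 .
```

The limit λ → 0 recovers the empirical log-likelihood ratio −2 Σ_i log(n w_i), while other members of the family correspond to Pearson-type chi-square statistics; these are the competitors being compared in the third-order power analysis above.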
https://doi.org/10.1142/9789812567765_0006
The goal of the paper is two-fold. We first survey the available methods for modelling multivariate volatility processes. We then propose a new and simple method with numerical illustration.
https://doi.org/10.1142/9789812567765_0007
Estimation, testing and robustness are three common problems in two-level structural equation models. Liang & Bentler (2004a) developed the EM approach to estimation of model parameters and studied the asymptotic properties of the estimators. In a series of articles, Yuan and Bentler studied the problems of testing model fit and the robustness of tests and standard errors. This paper reviews some recent advances in research on the three problems, and provides some numerical comparisons between our methodology and selected existing ones.
https://doi.org/10.1142/9789812567765_0008
Copulas, as dependence measures of random variables, have wide applications in distribution theory, medical research, multivariate survival analysis, risk management, and other fields. The basic properties of copulas have been studied extensively in the literature. However, their geometric and topological properties, which are very important for properly characterizing the dependence patterns of random variables, have not yet caught statisticians’ attention. In this paper, we shall study the geometric structures of copulas and the local dependence pattern of random variables. Important classes of copulas, such as the polynomial copulas and the piecewise linear and quadratic copulas, are also investigated.
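As a concrete example of a polynomial copula (standard material, included here only for orientation), the Farlie-Gumbel-Morgenstern family is

```latex
C_{\theta}(u,v) \;=\; u\,v\,\bigl[\,1 + \theta\,(1-u)(1-v)\,\bigr],
\qquad (u,v) \in [0,1]^2 , \quad \theta \in [-1,1] ,
```

which has uniform margins for every θ while the sign and size of θ control a weak positive or negative dependence. By Sklar's theorem, any joint distribution H with margins F and G can be written as H(x, y) = C(F(x), G(y)) for some copula C, which is what makes copulas the natural carriers of dependence structure.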
https://doi.org/10.1142/9789812567765_0009
The growth curve model (GCM) has been widely used in longitudinal studies and repeated measures. Most existing approaches to statistical inference in the GCM assume a specific structure for the within-subject covariances, e.g., compound symmetry, AR(1) or an unstructured covariance. This specification, however, may select a suboptimal or even wrong model, which in turn may affect the estimates of the regression coefficients and/or bias the standard errors of the estimates. Accordingly, statistical inference in the GCM may be severely affected by mis-specification of the covariance structure. Within the framework of the GCM, in this paper we propose a data-driven approach for modelling the within-subject covariance structure, investigate the effects of mis-specification of covariance structures on statistical inference, and study the possible heterogeneity of covariances between different treatment groups.
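For concreteness, the two named parametric covariance structures have the familiar forms below (p repeated measurements per subject), while the unstructured specification leaves all p(p + 1)/2 entries of Σ free; the data-driven approach proposed in the paper avoids committing to any one of these forms in advance.

```latex
\text{compound symmetry: } \Sigma_{jk} = \sigma^{2}\bigl[(1-\rho)\,\mathbf{1}\{j = k\} + \rho\bigr],
\qquad
\text{AR(1): } \Sigma_{jk} = \sigma^{2}\rho^{\,|j-k|},
\qquad 1 \le j, k \le p .
```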
https://doi.org/10.1142/9789812567765_0010
In this paper, Kotz-type elliptical distributions are examined. Expressions for multivariate moments and cumulants in matrix form are given, as well as kurtosis characteristics. It is proved that the marginal distribution of a Kotz distribution is a mixture of a normal and a Kotz distribution. Basic formulae for the matrix Kotz distribution are also given, and a simulation rule based on the Metropolis-Hastings algorithm is presented.
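For reference, the density of a p-dimensional Kotz-type distribution is usually written in the form below (notation as in Fang, Kotz & Ng (1990), not necessarily that of the chapter), with c_p the normalizing constant.

```latex
f(\mathbf{x}) = c_{p}\,|\boldsymbol{\Sigma}|^{-1/2}
  \bigl[(\mathbf{x}-\boldsymbol{\mu})^{\top}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\bigr]^{N-1}
  \exp\Bigl\{-r\,\bigl[(\mathbf{x}-\boldsymbol{\mu})^{\top}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\bigr]^{s}\Bigr\},
\qquad r, s > 0, \quad 2N + p > 2 .
```

The choice N = s = 1, r = 1/2 recovers the multivariate normal distribution.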
https://doi.org/10.1142/9789812567765_0011
Uniform design has become a standard tool in experimental design over the last decade. Uniform design is particularly powerful when the specific form of the response to differing factor levels is unknown. Different criteria such as L2-discrepancy and modifications thereof like the centered and wrap-around L2-discrepancies are used to assess the uniformity of U-type designs. Recent advances in the analysis of these criteria allow for efficient calculation and the derivation of lower bounds. In this contribution, these results are used to construct two-step sequential U-type designs. The construction is based on the optimization heuristic threshold accepting. The results are compared with theoretical lower bounds and ad hoc heuristics for the generation of sequential designs.
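As a small illustration of one of the criteria mentioned above, the sketch below evaluates the centered L2-discrepancy of a design via Hickernell's closed-form expression. It is a minimal numpy implementation, not the threshold-accepting construction studied in the chapter, and the 4-run U-type design at the end is an arbitrary example.

```python
# A minimal numpy sketch: the centered L2-discrepancy of an n x d design with
# points in [0, 1]^d, evaluated via Hickernell's closed-form expression.
import numpy as np

def centered_l2_discrepancy(X):
    n, d = X.shape
    a = np.abs(X - 0.5)
    # Single sum: product over factors within each run.
    term1 = np.prod(1 + 0.5 * a - 0.5 * a**2, axis=1).sum()
    # Double sum: product over factors for every pair of runs.
    pair = np.ones((n, n))
    for k in range(d):
        xk, ak = X[:, k], a[:, k]
        pair *= (1 + 0.5 * ak[:, None] + 0.5 * ak[None, :]
                 - 0.5 * np.abs(xk[:, None] - xk[None, :]))
    cd2_sq = (13.0 / 12.0) ** d - (2.0 / n) * term1 + pair.sum() / n**2
    return np.sqrt(cd2_sq)

# Example: an arbitrary U-type design U(4; 4^2), level u mapped to (2u - 1)/(2n).
levels = np.array([[1, 3], [2, 1], [3, 4], [4, 2]])
print(centered_l2_discrepancy((2 * levels - 1) / 8.0))
```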
https://doi.org/10.1142/9789812567765_0012
Granularity and perfect balance are defined and discussed for multiple factor designs. The granularity of a design is related to its discrepancy, an important concept in uniform experimental design. It indicates how fine a structure in the dependence of the response on the factors can be resolved. The balance of a design is similar to the resolution of fractional factorial designs, but it is defined for a much broader class of designs. The granularities and balance of various designs, including simple random designs, orthogonal arrays, digital nets, and integration lattices are compared. Two applications, the simple pendulum and blood glucose monitoring, are used to illustrate how granularity and balance can identify good designs.
https://doi.org/10.1142/9789812567765_0013
Computer models can describe complicated physical phenomena. To use these models for scientific investigation, however, their generally long running times and mostly deterministic nature require a specially designed experiment. Standard factorial designs are inadequate; in the absence of one or more main effects, their replication cannot be used to estimate error but instead produces redundancy. A number of alternative designs have been proposed, but many can be burdensome computationally. This paper presents a class of Latin hypercube designs developed from the rotation of factorial designs. These rotated factorial designs are easy to construct and preserve many of the attractive properties of standard factorial designs: they have equally-spaced projections to univariate dimensions and yield uncorrelated regression effect estimates (orthogonality). They also rate comparably to maximin Latin hypercube designs by the minimum interpoint distance criterion used in the latter's construction.
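The sketch below is only a baseline, not the rotated factorial construction of the chapter: it generates a plain random Latin hypercube design and evaluates the minimum interpoint distance used by the maximin criterion mentioned above, which is useful as a point of comparison for candidate designs.

```python
# Baseline sketch: a random Latin hypercube design (LHD) in [0, 1]^d and the
# minimum interpoint (Euclidean) distance used by the maximin criterion.
import numpy as np
from scipy.spatial.distance import pdist

def random_lhd(n, d, seed=None):
    rng = np.random.default_rng(seed)
    # One point per equally-spaced stratum in every dimension, placed at the
    # stratum centre, so univariate projections are equally spaced.
    perms = np.column_stack([rng.permutation(n) for _ in range(d)])
    return (perms + 0.5) / n

X = random_lhd(8, 3, seed=0)
print("minimum interpoint distance:", pdist(X).min())
```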
https://doi.org/10.1142/9789812567765_0014
Discrepancy is an important measure used in experimental design. Recently, a so-called discrete discrepancy has been applied to evaluate the uniformity of factorial designs. In this paper, we review some recent advances in the application of the discrete discrepancy to several common types of experimental designs and summarize some important results.
https://doi.org/10.1142/9789812567765_0015
When an orthogonal array (OA) of n rows is used as the design matrix in an experiment, n is the number of runs. In an OA with q levels, n is an integer multiple of q². If the number of runs cannot be set exactly equal to the number of rows of an OA because of constraints on resources or other reasons, the experimenter may use a design matrix formed by omitting some rows of an OA. If such a design matrix is used, the number of observed responses may not be enough for estimation of all the effects corresponding to the columns of the orthogonal array. A lean design is a design matrix formed by deleting some rows and columns of an OA which still allows efficient estimation of the effects of the factors corresponding to the remaining columns of the OA. In this article, the authors discuss lean designs with 2 and 3 levels, and provide D-optimal OAs from which lean designs can be formed.
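The following is only an illustrative calculation, not the authors' lean-design construction: starting from the saturated two-level OA(8, 7, 2, 2) built from a 2^3 factorial, it deletes some (arbitrarily chosen) rows and columns and reports the D-efficiency of the main-effects model on what remains.

```python
# Illustrative sketch: delete rows and columns of a two-level OA and compute
# the D-efficiency of the resulting main-effects model (intercept + factors).
import numpy as np
from itertools import product

base = np.array(list(product([-1, 1], repeat=3)))        # full 2^3 factorial
A, B, C = base[:, 0], base[:, 1], base[:, 2]
oa = np.column_stack([A, B, C, A * B, A * C, B * C, A * B * C])  # OA(8, 7, 2, 2)

def d_efficiency(design, keep_rows, keep_cols):
    X = design[np.ix_(keep_rows, keep_cols)]
    X = np.column_stack([np.ones(len(keep_rows)), X])     # add intercept
    n, p = X.shape
    return np.linalg.det(X.T @ X / n) ** (1.0 / p)

# A hypothetical lean design: drop two runs and keep three factor columns.
print(d_efficiency(oa, keep_rows=[0, 1, 2, 3, 4, 5], keep_cols=[0, 1, 2]))
```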
https://doi.org/10.1142/9789812567765_0016
The problem of interest is to estimate an integral (or the total) of a random field from observations at a finite number of sampling points. The quality of an estimator is measured by mean square error (MSE). Sampling points are appropriately selected so that the resulting MSE is as close to zero as possible. In this article, we begin by introducing the basic elements of estimation of integrals of random fields, including a review of the number-theoretic method. We then show how the number-theoretic method is related to the design of sampling points along with the sample mean estimator.
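As a reminder of the basic identity (standard background rather than a result of the chapter), for a random field Z with constant mean and covariance function C on the unit cube D = [0,1]^d, the sample-mean estimator based on sampling points x_1, …, x_n has

```latex
\mathrm{MSE}
= \mathbb{E}\Bigl[\Bigl(\frac{1}{n}\sum_{i=1}^{n} Z(x_i) - \int_{D} Z(t)\,dt\Bigr)^{2}\Bigr]
= \frac{1}{n^{2}} \sum_{i=1}^{n}\sum_{j=1}^{n} C(x_i, x_j)
  - \frac{2}{n} \sum_{i=1}^{n} \int_{D} C(x_i, t)\,dt
  + \int_{D}\!\int_{D} C(s, t)\,ds\,dt ,
```

so selecting sampling points amounts to making this quadratic form in C as small as possible, which is where number-theoretic (uniformly scattered) point sets enter.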
https://doi.org/10.1142/9789812567765_0017
Sensitivity analysis plays an important role in helping engineers understand complex model behaviors and make informed decisions about where to spend engineering effort. In design under uncertainty, probabilistic sensitivity analysis (PSA) is performed to quantify the impact of uncertainties in random variables on the uncertainty in model outputs. One of the most challenging issues for PSA is the intensive computational demand of assessing the impact of probabilistic variations. An efficient approach to PSA is presented in this article. Our approach employs the Kolmogorov-Smirnov (KS) distance to quantify the importance of input variables. The saddlepoint approximation approach is introduced to improve the efficiency of generating the cumulative distribution functions (CDFs) required for the evaluation of the KS distance. To further improve efficiency, optimized uniform samples are used to replace direct Monte Carlo simulation for determining the cumulant generating function (CGF) in the saddlepoint approximation. Efficient construction of a uniform design to generate the “best” samples in a multidimensional space is presented. Our approach is illustrated with a structural design problem. It has the potential to be most beneficial for high-dimensional engineering design problems that involve expensive computer simulations.
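The sketch below is only a brute-force Monte Carlo baseline, not the saddlepoint/uniform-design machinery of the chapter: it ranks inputs by the KS distance between the output CDF with all inputs random and the output CDF obtained when one input is held at its nominal value. The test function `model` and the nominal values are hypothetical stand-ins for an expensive simulation.

```python
# Brute-force Monte Carlo baseline for a KS-distance importance measure.
import numpy as np
from scipy.stats import ks_2samp

def model(x):
    # Hypothetical cheap test function standing in for an expensive simulation.
    return x[:, 0] ** 2 + 0.5 * x[:, 1] + 0.1 * x[:, 2]

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 3))
y_all = model(X)                          # outputs with all inputs random

for j in range(X.shape[1]):
    Xj = X.copy()
    Xj[:, j] = 0.0                        # fix input j at its nominal (mean) value
    ks = ks_2samp(y_all, model(Xj)).statistic
    print(f"input {j}: KS distance = {ks:.3f}")
```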
https://doi.org/10.1142/9789812567765_0018
This paper elaborates on the kernel selection problem in the majorization framework of Zhang, Fang, Li & Sudjianto (2004) for experimental designs. For designs with qualitative factors, the row-wise coincidence distribution and its raw, central and factorial moments are studied. Under the effects hierarchy principle, two protocols are recommended that employ power and exponential kernels, which are shown to be equivalent to some classical criteria for fractional factorial designs and uniform designs, respectively. In addition, an extension of the majorization framework to uniform designs with quantitative factors under the wrap-around discrepancy criterion is given.
https://doi.org/10.1142/9789812567765_0019
Variable selection is fundamental in high-dimensional statistical modeling. Many authors have proposed various variable selection criteria and procedures for linear regression models (Miller (2002)). Variable selection for survival data analysis poses many challenges because of the complicated data structure, and has therefore received much attention in the recent literature. In this article, we review various existing variable selection procedures for survival analysis. We further propose a unified framework for variable selection in survival analysis via a nonconcave penalized likelihood approach. The nonconcave penalized likelihood approach is distinguished from traditional variable selection procedures in that it deletes non-significant covariates by estimating their coefficients as zero. With a proper choice of the penalty function and the regularization parameter, we demonstrate that the resulting estimate possesses an oracle property; namely, it performs as well as if the true submodel were known in advance. We further illustrate the methodology with a real data example.
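For reference, the penalized likelihood has the generic form below, and the SCAD penalty of Fan & Li (2001), defined through its derivative with a > 2, is one standard choice of p_λ under which the oracle property holds; this is background notation rather than the chapter's full development.

```latex
Q(\boldsymbol{\beta}) = \ell(\boldsymbol{\beta}) - n \sum_{j=1}^{d} p_{\lambda}\bigl(|\beta_j|\bigr),
\qquad
p_{\lambda}'(\theta) = \lambda \Bigl\{ \mathbf{1}\{\theta \le \lambda\}
  + \frac{(a\lambda - \theta)_{+}}{(a-1)\lambda}\,\mathbf{1}\{\theta > \lambda\} \Bigr\} .
```

Here ℓ is the (partial) log-likelihood of the survival model; maximizing Q shrinks small coefficients exactly to zero, which is how the non-significant covariates are deleted.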
https://doi.org/10.1142/9789812567765_0020
Since the pioneering work of Thomas & Grunkemeier (1975) and Owen (1988), empirical likelihood has been developed into a powerful nonparametric inference approach and has become popular in the statistical literature. There are many applications of empirical likelihood in survival analysis. In this paper, we present an overview of recent developments in empirical likelihood methods for survival data. In particular, we discuss empirical likelihood results for a general mean functional of the distribution function, a functional of the hazard function, the Cox proportional hazards model, and a semiparametric accelerated failure time model.
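As a reminder of the basic construction in the uncensored case (Owen (1988)), the empirical likelihood ratio for a mean μ is

```latex
R(\mu) = \max\Bigl\{ \prod_{i=1}^{n} n w_i \;:\;
  w_i \ge 0,\ \sum_{i=1}^{n} w_i = 1,\ \sum_{i=1}^{n} w_i X_i = \mu \Bigr\},
```

and −2 log R(μ) converges in distribution to a chi-square distribution with one degree of freedom at the true mean, so confidence regions require no variance estimation. The developments surveyed in this paper extend this construction to censored-data mean functionals, hazard functionals, the Cox model and the accelerated failure time model.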
https://doi.org/10.1142/9789812567765_0021
In cancer drug development, demonstrated efficacy in tumor xenograft experiments on severe combined immunodeficient mice grafted with human tumor tissues or cells is an important step in bringing a promising compound to humans. These experiments have also demonstrated a good correlation in efficacy with clinical outcomes. A key outcome variable is the tumor volume measured over a period of time while mice are treated with certain treatment regimens. Statistical methods have been developed in the literature to analyze such data from xenograft experiments and evaluate the efficacy of a new drug. However, a mouse may die during the experiment or may be sacrificed when its tumor volume reaches a threshold, and a suppressed tumor may have its burden (volume) fall below a detectable level (e.g., < 0.01 cm³) for some time and then regrow. Thus, incomplete repeated measurements arise. Because of the small sample sizes in these experiments, asymptotic inferences are usually questionable. In addition, were the tumor-bearing mice not treated, the tumors would keep growing until the mice die or are sacrificed. This intrinsic growth of the tumor in the absence of treatment constrains the parameters in the statistical model and causes further difficulties in statistical analysis. In this paper, we review recent advances in statistical inference accounting for these statistical challenges. Furthermore, we develop a multivariate random effects model with constrained parameters for multiple tumors in xenograft experiments. A real xenograft study of the antitumor agent exemestane, an aromatase inhibitor, combined with tamoxifen against postmenopausal breast cancer is analyzed using the proposed methods.
https://doi.org/10.1142/9789812567765_0022
We consider estimation of secondary parameters following a group sequential test, with stopping regions determined by testing hypotheses concerning a set of primary parameters. We derive statistics that are jointly sufficient for the primary and secondary parameters and show that the maximum likelihood estimators remain unchanged but no longer possess unbiasedness and minimum variance. We construct bias-reduced and unbiased estimators for the vector of secondary parameters and show them to substantially reduce the bias and improve the precision of estimation.
https://doi.org/10.1142/9789812567765_0023
We consider the problem of model (or variable) selection in the linear regression model based on M-estimation and cross-validation with an added penalty term for penalizing overfitting. Under some conditions, the new criterion is shown to be strongly consistent in the sense that, with probability one, for all large n the criterion chooses the smallest true model. The penalty function, denoted by C_n, depends on the sample size n and is chosen to ensure consistency in the selection of the smallest true model. Various choices of C_n have been suggested in the literature on model selection. In this paper we show by simulation that, for small and medium-sized samples, a particular choice of C_n based on observed data, which makes it random, provides satisfactory performance compared with fixed choices of C_n.
https://doi.org/10.1142/9789812567765_0024
Donoho & Johnstone (1994) threshold a sequence of random variables, e.g., wavelet coefficients, using the same thresholding parameter for all of them. In this paper, we attempt to extend their thresholding techniques to threshold a sequence of random variables using order-dependent thresholding parameters. Some insights into the thresholding estimates are discussed. Applications of the proposed methodologies to regression splines are investigated, and a simple extension to nonparametric additive models is briefly discussed. The methodologies are illustrated on two real data sets, and simulations are conducted to assess them empirically.
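For reference, the hard and soft thresholding rules and the universal threshold of Donoho & Johnstone (1994) are

```latex
\eta_{H}(x;\lambda) = x\,\mathbf{1}\{|x| > \lambda\},
\qquad
\eta_{S}(x;\lambda) = \operatorname{sign}(x)\,(|x| - \lambda)_{+},
\qquad
\lambda_{n} = \sigma \sqrt{2 \log n} ,
```

applied with the single parameter λ_n to every coefficient. The order-dependent scheme studied in this paper instead allows the threshold to vary with the position of the coefficient in the ordered sequence.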
https://doi.org/10.1142/9789812567765_0025
For the power transformation family (x^λ − 1)/λ, Box & Cox (1964) proposed using the pseudo maximum likelihood estimation method to determine a value of λ. In the two-sample problem, this approach transforms the non-normal data towards normality and then constructs the t-test on the transformed data. In this article, a new procedure for estimating λ, and hence a new transformed t-test, is proposed. The idea of the new approach is to select directly a value of λ such that the distribution of the transformed t-statistic is as close to a t-distribution as possible. Simulation results show that with the new transformed t-test it is appropriate to use a t-distribution to determine a threshold, and that the new test is much more powerful than the Box-Cox transformed t-test when the model is close to a transformed normal model, while the two approaches are comparable when the model is far from any transformation to normality.
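The sketch below illustrates only the classical Box-Cox baseline against which the new test is compared, with the simplification that λ is estimated by maximum likelihood from the pooled sample; the paper's procedure, which picks λ so that the transformed t-statistic is close to a t-distribution, is not reproduced here. The two lognormal samples are hypothetical.

```python
# Classical Box-Cox baseline for the two-sample problem (simplification:
# a common lambda is estimated by MLE from the pooled positive data, then both
# samples are transformed with that lambda and compared by a t-test).
import numpy as np
from scipy import special, stats

rng = np.random.default_rng(1)
x = rng.lognormal(mean=0.0, sigma=0.6, size=30)   # sample 1 (simulated)
y = rng.lognormal(mean=0.3, sigma=0.6, size=30)   # sample 2 (simulated)

lam = stats.boxcox_normmax(np.concatenate([x, y]), method="mle")
tx, ty = special.boxcox(x, lam), special.boxcox(y, lam)

t_stat, p_value = stats.ttest_ind(tx, ty)
print(f"lambda = {lam:.3f}, t = {t_stat:.3f}, p = {p_value:.4f}")
```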
https://doi.org/10.1142/9789812567765_0026
Suppose that (X′,U′)′/σ has a spherically symmetric distribution about (θ′,0′)′, where X and U are p × 1 and m × 1 random vectors, respectively, θ′ = (θ_1,…,θ_p) is an unknown vector and σ is an unknown scale. Under the loss function f(∥δ − θ∥²/σ²), where f(t) is a nondecreasing concave function of t, Brandwein & Strawderman (1991a) investigated conditions under which estimators of the form X + aU′U g(X) dominate X. Their technique requires that t^α f′(t) be a nondecreasing function of t for some α ∈ (0, (p − 2)/2). Because of this assumption, their bound on a depends on α, which is related to the loss function f. This paper investigates the dominance conditions for the estimators X + aU′U g(X) without the monotonicity assumption on t^α f′(t), and obtains a bound on a that is independent of the loss function. Examples related to this problem are also considered.
https://doi.org/10.1142/9789812567765_bmatter