This book is a collection of selected refereed papers presented at the International Conference on Statistics, Combinatorics and Related Areas, and the Eighth International Conference of the Forum for Interdisciplinary Mathematics. It includes contributions from eminent statisticians such as Joe Gani, Clive Granger, Chris Heyde, R Nishii, C R Rao, P K Sen and Sue Wilson. The papers deepen and extend research in the represented areas, including bioinformatics, estimating functions, financial statistics, generalized linear models, goodness of fit, image analysis, industrial data analysis, multivariate statistics, neural networks, quasi-likelihood, sample surveys, statistical inference, stochastic models, and time series.
https://doi.org/10.1142/9789812776372_fmatter
PREFACE.
CONTENTS.
https://doi.org/10.1142/9789812776372_0001
In this paper we propose efficient unbiased estimators of three types - direct, synthetic and composite - for a domain mean (or total), and compare them with certain customary estimators to establish their viability, superior performance and practicability.
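As a rough illustration of the three estimator types (not code from the paper; the data, the domain indicator and the shrinkage constant below are all invented), a minimal sketch in Python:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical survey sample: y is the study variable, d flags the small domain
y = rng.normal(50.0, 10.0, size=500)
d = rng.random(500) < 0.1

y_direct = y[d].mean()        # direct: uses only the units observed in the domain
y_synth = y.mean()            # synthetic: borrows strength from the whole sample
n_d = d.sum()
w = n_d / (n_d + 20.0)        # 20.0 is an illustrative shrinkage constant
y_comp = w * y_direct + (1 - w) * y_synth   # composite: a weighted compromise
print(y_direct, y_synth, y_comp)
```

The composite weight trades the low bias of the direct estimator against the low variance of the synthetic one.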
https://doi.org/10.1142/9789812776372_0002
We consider the problem of constructing nonlinear regression models using multilayer perceptrons and radial basis function networks, with the help of regularization. Crucial issues in the model-building process are the choice of the number of basis functions, the number of hidden units, and the regularization parameter. We consider the properties of nonlinear regression modeling based on neural networks, and investigate the performance of model selection criteria from an information-theoretic point of view.
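A minimal sketch of regularized RBF-network regression in the spirit of the abstract (the data, the 12 basis centres, the width and the regularization value are all illustrative choices, precisely the quantities the paper's selection criteria are meant to choose):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, 80)
y = np.sin(x) + rng.normal(0, 0.2, 80)

centres = np.linspace(-3, 3, 12)   # number of basis functions: a model-selection choice
width = 0.5
Phi = np.exp(-(x[:, None] - centres) ** 2 / (2 * width ** 2))
lam = 1e-2                         # regularization parameter: another such choice
# Regularized (ridge) least-squares solution for the output-layer weights
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(len(centres)), Phi.T @ y)
yhat = Phi @ w
```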
https://doi.org/10.1142/9789812776372_0003
The Pearson chi-squared test of fit for discrete distributions may be improved by partitioning the test statistic into useful components. Examples are given to demonstrate the improvement possible. A brief simulation study looks at the small-sample distribution of components for testing for the Poisson distribution. This paper updates material in Rayner and Best (1989).
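A sketch of one common way to build such components, via polynomials made orthonormal with respect to the fitted Poisson; the truncation of the support and the choice of four components are assumptions of this illustration, not the paper's:

```python
import numpy as np
from scipy.stats import poisson

def poisson_components(x, k=4):
    """Smooth components V_1..V_k for a Poisson fit, using polynomials
    orthonormalised w.r.t. the fitted Poisson on a truncated support (an
    approximation). Each V_r^2 behaves like one chi-squared component."""
    n = len(x)
    lam = x.mean()                               # Poisson MLE
    m = int(x.max() + 10 * np.sqrt(lam) + 10)    # truncation point (heuristic)
    j = np.arange(m + 1)
    p = poisson.pmf(j, lam)
    p /= p.sum()
    H = [np.ones(m + 1)]
    for r in range(1, k + 1):                    # Gram-Schmidt on 1, j, j^2, ...
        h = j.astype(float) ** r
        for g in H:
            h -= np.sum(h * g * p) * g
        H.append(h / np.sqrt(np.sum(h * h * p)))
    return np.array([H[r][x].sum() / np.sqrt(n) for r in range(1, k + 1)])

x = np.random.default_rng(11).poisson(3.0, 200)
V = poisson_components(x)
print(V ** 2)   # V_1 is ~0 since lambda is the mean; later components flag
                # skewness- and kurtosis-type departures from the Poisson
```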
https://doi.org/10.1142/9789812776372_0004
In traditional finance theory, there is the underlying assumption that risk in both cash and futures markets is constant over time, thus disregarding the possible dynamic (time-varying) nature of the distribution of the asset returns. The implication of such an assumption is that the resultant hedge ratio will be constant through the hedging period. This article examines the criteria under which different hedging strategies are optimal from the perspective of a bona fide hedger. The ability of dynamic approaches to minimise risk is compared to the static procedures of naive and conventional hedging. It is shown that if the hedge ratios are unstable, allowance for such stochastic movements will significantly increase hedging effectiveness by reducing the volatility of the hedged portfolio. Noting that the forecasted hedge ratio provides a constant alternative to the conventional hedge, a hedging rule is determined which enables a comparison of two constant hedge ratios, bypassing the need for transaction cost considerations. It is shown that the conventional hedge does not necessarily provide the most effective constant hedge.
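A minimal sketch of the static comparison (simulated spot and futures returns; the figures are invented), contrasting the naive hedge h = 1 with the conventional OLS hedge ratio by the variance of the hedged portfolio:

```python
import numpy as np

rng = np.random.default_rng(2)
f = rng.normal(0, 0.010, 500)                 # futures returns (simulated)
s = 0.9 * f + rng.normal(0, 0.004, 500)       # spot returns (simulated)

C = np.cov(s, f)
h_conv = C[0, 1] / C[1, 1]                    # conventional (OLS, minimum-variance) ratio
for name, h in [("unhedged", 0.0), ("naive", 1.0), ("conventional", h_conv)]:
    print(name, np.var(s - h * f))            # effectiveness: variance reduction
```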
https://doi.org/10.1142/9789812776372_0005
The composition of a transect through a two-phase particle has a probability distribution which depends on the shape, composition and phase structure within the particle, as well as the type of randomness used to generate the transect. This paper will illustrate how the transect composition can be simulated in Mathematica for some simple geometric models. In particular, the liberation index suggested by Davy (1984) will be evaluated for two-dimensional models, thereby extending the results of Coleman (1991).
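A Python analogue of the kind of simulation the paper performs in Mathematica, for the simplest geometry one might try (a disc-shaped particle with a concentric disc-shaped inclusion; both radii are invented), using the fact that for isotropic uniform random (IUR) transects of a disc the chord's distance from the centre is uniform:

```python
import numpy as np

rng = np.random.default_rng(3)
R, r = 1.0, 0.4                       # particle and inclusion radii (illustrative)

d = rng.uniform(0, R, 100_000)        # distance of an IUR transect from the centre
chord = 2 * np.sqrt(R ** 2 - d ** 2)
inner = np.where(d < r, 2 * np.sqrt(np.clip(r ** 2 - d ** 2, 0, None)), 0.0)
composition = inner / chord           # fraction of the transect in the inclusion phase
print(composition.mean(), (composition == 0).mean())  # mean grade; P(missing the inclusion)
```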
https://doi.org/10.1142/9789812776372_0006
The solution to many problems in applied probability requires the evaluation of Riemann-Stieltjes integrals involving the convolution of cumulative distribution functions. Closed-form expressions for the solution are very rare indeed. In this paper we examine the evaluation of the Stieltjes integral, which plays a fundamental role in the numerical solution of Volterra-Stieltjes integral equations that appear frequently in renewal theoretic problems. The evaluation of Stieltjes integrals on which this paper concentrates is thus, it is argued, germane to the solution of such problems. A generalised trapezoidal rule is utilised and a priori error bounds are determined in the current development.
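A sketch of a generalised trapezoidal rule for a Riemann-Stieltjes integral of f with respect to g (a straightforward discretisation; the grid size is arbitrary), checked against a case with a closed form:

```python
import numpy as np

def stieltjes_trapezoid(f, g, a, b, n=1000):
    """Generalised trapezoidal rule for the Riemann-Stieltjes integral of f dg:
    sum over subintervals of 0.5*(f(x_i)+f(x_{i+1})) * (g(x_{i+1})-g(x_i))."""
    x = np.linspace(a, b, n + 1)
    fx, gx = f(x), g(x)
    return np.sum(0.5 * (fx[:-1] + fx[1:]) * np.diff(gx))

# Check against a closed form: integral of x d(x^2) over [0,1] equals 2/3
print(stieltjes_trapezoid(lambda x: x, lambda x: x ** 2, 0.0, 1.0))
```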
https://doi.org/10.1142/9789812776372_0007
This paper gives some comparisons of statistical clustering and classification methods applied to transcriptional profiling data obtained from a microarray experiment. A number of such techniques are illustrated using a well-known data set on the yeast genome.
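For concreteness, a sketch of two such techniques applied to an expression matrix (the yeast data itself is not bundled here, so a random genes-by-conditions matrix stands in):

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(10)
expr = rng.normal(size=(200, 8))   # stand-in for a genes x conditions matrix

# k-means on raw profiles, and average-linkage hierarchical clustering on
# correlation distance, a common choice for expression data
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(expr)
hc = fcluster(linkage(expr, method="average", metric="correlation"),
              t=4, criterion="maxclust")
print(np.bincount(km.labels_), np.bincount(hc)[1:])   # cluster sizes
```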
https://doi.org/10.1142/9789812776372_0008
Boosting, bagging and other ensemble classifiers are all based on the idea of fitting multiple classification rules and then applying a voting scheme to determine a consensus. In some methods, the individual classifiers are trained sequentially, so that the occurrence of misclassification errors in earlier iterations can be used to adapt later iterations. It is therefore of interest to consider the sequences of correct and incorrect classifications for individual observations. The edge, or in other words the voting weight assigned to incorrect classes, can be evaluated for each observation after each iteration. A scatter plot based on the mean and variance of the edge over all iterations turns out to be a useful diagnostic tool.
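A sketch of this diagnostic using scikit-learn's AdaBoost as the ensemble; the edge here is a simple proxy, the weighted fraction of votes cast for incorrect classes up to each iteration, and the dataset and settings are invented:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=300, random_state=0)
clf = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X, y)

# wrong[t, i] = 1 if base classifier t misclassifies observation i
wrong = np.array([(est.predict(X) != y).astype(float) for est in clf.estimators_])
w = clf.estimator_weights_[: len(clf.estimators_)]
# edge[t, i]: weighted share of incorrect votes for observation i after t+1 rounds
edge = np.cumsum(w[:, None] * wrong, axis=0) / np.cumsum(w)[:, None]

plt.scatter(edge.mean(axis=0), edge.var(axis=0), s=8)
plt.xlabel("mean edge per observation")
plt.ylabel("variance of edge per observation")
plt.show()
```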
https://doi.org/10.1142/9789812776372_0009
Numerical evaluation of the Cauchy principal value (CPV) integral is encountered in many areas of applied mathematics, such as crack problems in plane elasticity, the singular eigenfunction method in neutron transport, airfoil theory, electromagnetic scattering, waveform encoding of binary signals, visualization of cardiovascular velocity maps, and demultiplexing of interferometrically interrogated fibre Bragg gratings. In the present paper, by use of the Hermite-Hadamard inequality for convex functions, we establish some error bounds for the approximation of the CPV integral of a differentiable function whose derivative is convex. Some numerical experiments are performed as well.
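For reference, the two ingredients named here take the following standard forms (textbook definitions, not reproduced from the paper). The CPV integral is

\[
\mathrm{PV}\!\int_a^b \frac{f(t)}{t-x}\,dt \;=\; \lim_{\varepsilon\to 0^+}\left(\int_a^{x-\varepsilon} + \int_{x+\varepsilon}^{b}\right)\frac{f(t)}{t-x}\,dt, \qquad x\in(a,b),
\]

and the Hermite-Hadamard inequality for a convex function f on [a, b] is

\[
f\!\left(\frac{a+b}{2}\right) \;\le\; \frac{1}{b-a}\int_a^b f(x)\,dx \;\le\; \frac{f(a)+f(b)}{2}.
\]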
https://doi.org/10.1142/9789812776372_0010
This paper extends some earlier random allocation models in which either susceptibles or infectives, but not both, among needle-sharing intravenous drug users (IVDUs) are nonhomogeneous. We begin with a brief review of earlier results, illustrated by an example of vaccination. We then consider the case where there are two types of susceptibles and two types of infectives. This leads to unwieldy algebra; a model with successive waves of infectives exchanging needles with two types of susceptibles is outlined.
https://doi.org/10.1142/9789812776372_0011
Fractionally integrated processes are a class of linear processes which lie between stationary autoregressive and unit root series in terms of their properties. Recent work on transformations of random walks has not been completely extended to I(d) series, but some results are available and are presented. It is noted that if the input shocks to the models have a positive mean, the process will have a non-linear trend in mean. Most economic "examples" of I(d) processes are positive series and so should have such trends, but as these are not observed, it follows that long-memory models hardly ever occur in economics. An alternative is stationary models with breaks.
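For context, an I(d) (fractionally integrated) process is standardly defined through the fractional differencing operator (a textbook definition, not quoted from the paper):

\[
(1-B)^{d} X_t = \varepsilon_t, \qquad (1-B)^{d} = \sum_{k=0}^{\infty} \binom{d}{k}(-B)^{k},
\]

where B is the backshift operator; for 0 < d < 1/2 the process is stationary with autocorrelations decaying hyperbolically (long memory), sitting between the I(0) autoregressive and I(1) unit-root cases.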
https://doi.org/10.1142/9789812776372_0012
We focus on a test for a two-component Normal mixture model f by assessing its "closeness", in terms of L2 distance, to a Normal distribution fN. If the L2 estimate is in a specified "indifference zone", the two distributions are asserted to be "close enough" and by parsimony fN is preferred over f. This approach is applicable in both Bayesian and frequentist frameworks and is easily generalised to a test for more than two components in a mixture. Results are extended to model selection using predictive densities under a Bayesian approach and are illustrated by example.
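A minimal numerical sketch of the L2 distance in question (the mixture parameters, the grid and the moment-matched normal competitor are all illustrative; the paper's estimation and indifference-zone calibration are not reproduced):

```python
import numpy as np
from scipy.stats import norm

def l2_distance(f_params, fN_params, lo=-10.0, hi=10.0, n=4001):
    """Numerical L2 distance between a two-component normal mixture f and a normal fN."""
    p1, m1, s1, m2, s2 = f_params
    m0, s0 = fN_params
    x = np.linspace(lo, hi, n)
    f = p1 * norm.pdf(x, m1, s1) + (1 - p1) * norm.pdf(x, m2, s2)
    fN = norm.pdf(x, m0, s0)
    return np.sqrt(np.sum((f - fN) ** 2) * (x[1] - x[0]))

# Illustrative mixture and its moment-matched single-normal competitor
p1, m1, s1, m2, s2 = 0.5, -0.5, 1.0, 0.5, 1.0
m0 = p1 * m1 + (1 - p1) * m2
s0 = np.sqrt(p1 * (s1**2 + m1**2) + (1 - p1) * (s2**2 + m2**2) - m0**2)
print(l2_distance((p1, m1, s1, m2, s2), (m0, s0)))  # compare with an indifference cutoff
```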
https://doi.org/10.1142/9789812776372_0013
A decision analysis framework for risk models for health state transitions and valuations is described. Emphasis is placed on the sensitivity of choices among treatments to the absolute level of risk and the risk profile of the individual. The use of clinical trials databases to inform decision aids is considered. We employ Markov models with Q-TWiST valuation of health-state histories and examine the requirements for effective valuations of treatment alternatives for the individual.
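A toy Markov cohort calculation in the Q-TWiST spirit (the states, transition probabilities, utility weights and cycle length are all invented for illustration):

```python
import numpy as np

# States: 0 = toxicity (TOX), 1 = TWiST (no toxicity, no relapse), 2 = relapse, 3 = dead
P = np.array([[0.6, 0.3, 0.05, 0.05],
              [0.0, 0.8, 0.15, 0.05],
              [0.0, 0.0, 0.70, 0.30],
              [0.0, 0.0, 0.00, 1.00]])   # per-cycle transition probabilities
u = np.array([0.5, 1.0, 0.3, 0.0])        # utility weight of each health state

occ = np.array([0.5, 0.5, 0.0, 0.0])      # initial cohort distribution
qtwist = 0.0
for cycle in range(120):                  # e.g. monthly cycles over ten years
    qtwist += occ @ u                     # utility-weighted occupancy this cycle
    occ = occ @ P                         # evolve the cohort one cycle
print(qtwist)                             # expected utility-weighted time, in cycles
```

Comparing this quantity across treatment-specific transition matrices is the kind of valuation of treatment alternatives the abstract refers to.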
https://doi.org/10.1142/9789812776372_0014
The adaptive M-estimation procedure of Moberg et al. (1980) depends heavily on measures of skewness and kurtosis. The sensitivity of the procedure to the skewness measure is shown using a particular bimodal distribution for the errors in linear regression. Properties of this distribution are exploited and investigated, and for errors of this type we develop an alternative procedure for choosing the appropriate classification function. Results and the performance of the procedure are presented via simulation.
https://doi.org/10.1142/9789812776372_0015
In epidemiological studies, observed data are often collected subject to misclassification errors. In this paper, we discuss Bayesian estimation for contingency tables with misclassification errors. Exact Bayesian computation of posterior means as estimates raises computational difficulties. In order to find the posterior distribution, we apply the data augmentation (DA) algorithm to misclassified categorical data.
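A minimal sketch of such a DA sampler for a single misclassified multinomial (the counts, the misclassification matrix M, assumed known here, and the Dirichlet prior are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
y = np.array([120, 60, 20])             # observed (misclassified) cell counts
# M[j, k] = P(recorded as category k | true category j), assumed known
M = np.array([[0.90, 0.08, 0.02],
              [0.05, 0.90, 0.05],
              [0.02, 0.08, 0.90]])
alpha = np.ones(3)                      # Dirichlet prior on true cell probabilities

p = np.ones(3) / 3
draws = []
for it in range(2000):
    # I-step (augmentation): allocate each observed count back to true categories
    z = np.zeros(3)
    for k, y_k in enumerate(y):
        w = p * M[:, k]
        z += rng.multinomial(y_k, w / w.sum())
    # P-step: conjugate Dirichlet draw given the imputed true counts
    p = rng.dirichlet(alpha + z)
    if it >= 500:                       # discard burn-in
        draws.append(p)
print(np.mean(draws, axis=0))           # posterior-mean estimate of true probabilities
```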
https://doi.org/10.1142/9789812776372_0016
The aim of this paper is to establish the well-posedness of the global solution of an initial value problem for the damped Boussinesq equation.
https://doi.org/10.1142/9789812776372_0017
The Wald statistic for testing equality of corresponding variances from multiple covariance matrices is introduced. The test is asymptotically optimal and derived under the assumption of independent and Normally distributed parent populations. The empirical size of the test in small samples is investigated via a Monte Carlo study. Use of the statistic is illustrated with a motivating example.
https://doi.org/10.1142/9789812776372_0018
We consider contextual data fusion of multispectral spatial imagery with different spatial resolutions. We introduce a new method for image enhancement based on Markov random fields, so as to take contextual information into account. The low-resolution data are corrected by the high-resolution data and by the clustering result of the pixels. Our method is applied to geo-spatial data sensed by an artificial satellite, and shows good performance.
https://doi.org/10.1142/9789812776372_0019
We consider Markov chains of order d that satisfy a conditional constraint of the form E(a_ϑ(X_{i-1}, X_i) | X_{i-1}) = 0, where X_{i-1} = (X_{i-1}, …, X_{i-d}) denotes the vector of the d previous states. These comprise quasi-likelihood models and nonlinear and conditionally heteroscedastic autoregressive models with martingale innovations. Estimators for ϑ can be obtained from estimating equations of the form Σ_{i=1}^n W_ϑ(X_{i-1}) a_ϑ(X_{i-1}, X_i) = 0. We review different criteria for choosing good weights W_ϑ(X_{i-1}). They usually lead to weights that depend on unknown features of the transition distribution and must be estimated. We compare the approach via estimating functions with other ways of constructing estimators for ϑ, and discuss efficiency of the estimators in the sense of Hájek and LeCam. Analogous comparisons may be made for regression models.
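For reference, one standard optimality criterion for the weights (Godambe-type optimality for estimating functions; a textbook result rather than a statement of this paper's findings) selects

\[
W_\vartheta^{*}(x) \;=\; E\!\left[\partial_\vartheta\, a_\vartheta(X_{i-1}, X_i)\,\middle|\, X_{i-1}=x\right]^{\top} E\!\left[a_\vartheta(X_{i-1}, X_i)\, a_\vartheta(X_{i-1}, X_i)^{\top}\,\middle|\, X_{i-1}=x\right]^{-1},
\]

whose two conditional expectations involve exactly the kind of unknown features of the transition distribution that, as the abstract notes, must themselves be estimated.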
https://doi.org/10.1142/9789812776372_0020
This paper introduces a class of density estimators having both a parametric part and a nonparametric factor. A plug-in parametric estimator serves as an initial guess of the true density, and the proposed estimator is built up by combining this initial estimator with a nonparametric adjustment factor. Asymptotic theory is developed, and comparisons with the traditional kernel estimator and a multiplicative estimator are also reported.
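A sketch of one estimator of this general kind, a multiplicative "parametric start times kernel correction" in the style of Hjort and Glad, which is the sort of comparator the abstract mentions (the data, the normal start and the bandwidth rule are illustrative; the paper's own construction may differ):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
x = rng.gamma(3.0, 1.0, 300)          # data from a non-normal truth

mu, sd = x.mean(), x.std()            # parametric start: normal fitted by ML
h = 1.06 * sd * len(x) ** (-0.2)      # rule-of-thumb bandwidth (illustrative)

def f_hat(t):
    """Parametric start times a kernel-estimated multiplicative correction."""
    start = norm.pdf(t, mu, sd)
    kern = norm.pdf((t[:, None] - x[None, :]) / h) / h        # Gaussian kernels
    ratio = np.mean(kern / norm.pdf(x, mu, sd)[None, :], axis=1)
    return start * ratio

t = np.linspace(0.0, 12.0, 200)
dens = f_hat(t)                       # corrected density estimate on a grid
```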
https://doi.org/10.1142/9789812776372_0021
Box and Draper (1974) reported five m-factor D-optimal minimum-point second-order designs for m = 2, …, 5. The m coordinates of each of the n design points lie in the interval [-1, 1]. This paper describes a method of constructing these designs and compares new designs with those of Box and Draper.
https://doi.org/10.1142/9789812776372_0022
The mathematical foundations of statistics as a separate discipline were laid by Fisher, Neyman and Wald during the second quarter of the last century. Subsequent research in statistics and the courses taught in the universities are mostly based on the guidelines set by these pioneers. Statistics is used in some form or other in all areas of human endeavor from scientific research to optimum use of resources for social welfare, prediction and decision-making. However, there are controversies in statistics, especially in the choice of a model for data, use of prior probabilities and subject-matter judgments by experts. The same data analyzed by different consulting statisticians may lead to different conclusions.
What is the future of statistics in the present millennium, dominated by information technology encompassing the whole of communications, interaction with intelligent systems, massive databases, and complex information processing networks? The current statistical methodology, based on simple probabilistic models developed for the analysis of small data sets, appears to be inadequate to meet the needs of customers for quick online processing of data and making the information available for practical use. Some methods are being put forward in the name of data mining for such purposes. A broad review of the current state of the art in statistics, its merits and demerits, and possible future developments will be presented.
https://doi.org/10.1142/9789812776372_0023
A generic approach to the comparison of treatments in many experimental designs is to exhibit the data in a contingency table that is at least partially ordered, construct a model for the table of counts, and derive an omnibus test statistic and its components. For several important experimental designs the components are asymptotically chi-squared distributed and assess moment differences between treatments. Often the initial components are the basis for well-known nonparametric rank tests such as the Kruskal-Wallis and Spearman tests. Further decompositions of the components to obtain LSD comparisons are easily derived. We usually recommend combining the higher-moment components into a residual. The approach is demonstrated for three designs.
https://doi.org/10.1142/9789812776372_0024
Maximum likelihood estimation of a correlation matrix under order restrictions among the correlations is treated. Two maximization processes are considered: (A) maximization with respect to the correlation matrix, and (B) maximization with respect to the variances. For process (A), we generate uniformly distributed random correlation matrices on the hypothesis space by Gibbs sampling. For process (B), we show that the maximum point is the fixed point of the iterative application of a certain non-linear function, and convergence of the process is proved. A simulation result is given which compares the relative errors of the m.l.e. and other competitors.
https://doi.org/10.1142/9789812776372_0025
Consider a production line in which a production item fails with probability p and is then repaired. We consider two repair disciplines, I and II. For each discipline we find the p.g.f. of the steady-state size of the system at the moment of a customer's departure. For discipline II, we show that the closed form of the p.g.f. depends on solving a functional equation for the Laplace-Stieltjes transform of the service time in the main queue. Through an example, we compare the two disciplines in terms of a specified performance measure.
https://doi.org/10.1142/9789812776372_0026
In bioinformatics with special emphasis on computational biology, genomic science, polygenic models, and computational sequence analysis, principles of molecular genetics (biology) provide room for stochastics to comprehend the basic differences between mathematical exactness and biological diversity. With a large number of sites or loci having mostly categorical (qualitative) responses and imprecise dependence patterns, standard (discrete or continuous) multivariate statistical modeling and analysis may encounter roadblocks of various kinds. Limitations of likelihoods and their variants are appraised and contrasted with the inadequacy of the knowledge discovery and data mining approach that exploits mainly computational algorithms. Alternative approaches that take into account underlying biological implications to a greater (and parametrics to a lesser) extent are appraised in the light of validity and robustness considerations.
https://doi.org/10.1142/9789812776372_0027
Monitoring processes with data collected from spatial systems is a common need in industry. This paper outlines an approach commonly used in geostatistics, namely universal kriging (Cressie, 1993), for modelling spatial trends. The fitted spatial models, together with their standard errors, are then used to establish control limits for monitoring changes in spatial trends. The QQ-plot and related tests are also used to signal processes that are out of control. These methods are applied to the process of constructing concrete road pavements.
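A simplified sketch of the idea using Gaussian-process regression, a close relative of kriging, in place of a full universal-kriging fit (the coordinates, measurements, kernel and 3-sigma limits are all illustrative):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(6)
# Hypothetical pavement measurements along a one-dimensional spatial coordinate
s = np.sort(rng.uniform(0, 100, 60))[:, None]
z = 0.02 * s.ravel() + rng.normal(0, 0.5, 60)

gp = GaussianProcessRegressor(kernel=RBF(10.0) + WhiteKernel(0.25)).fit(s, z)
grid = np.linspace(0, 100, 200)[:, None]
mean, sd = gp.predict(grid, return_std=True)
upper, lower = mean + 3 * sd, mean - 3 * sd   # 3-sigma style control limits
# New observations falling outside [lower, upper] would signal a change in trend
```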
https://doi.org/10.1142/9789812776372_0028
A key element in the design of a repeated sample survey is the rotation pattern, which affects the variability of the time series of survey estimates and the seasonally adjusted and trend estimates produced from them. This paper considers the choice of rotation pattern for seasonally adjusted and trend estimates obtained from a repeated survey, using X11 based methods.
https://doi.org/10.1142/9789812776372_0029
This paper is concerned with the interactions between periodic tidal waves and coastal aquifers. The analysis is based on a set of equations governing the hydrodynamic interactions near coastal regions, consisting of a modified Fokker-Planck equation for modelling the transport of salts, a Boussinesq equation for groundwater flow, and a wave equation for tidal waves. The modified Fokker-Planck equation features a time- and scale-dependent dispersivity, and solutions of the Boussinesq equation are obtained subject to phase-modulated tidal waves as the boundary condition, which is a solution of the wave equation. The phase-modulated periodic solution of the wave equation is shown to satisfactorily simulate the uneven twin peaks of semi-diurnal tides. Fourier series solutions of the Boussinesq equation subject to periodic boundary conditions reveal that the tidal waves damp towards land, and that the half amplitude of the tide above the mean sea level is greater than that below it. The seawater-freshwater interface is also analysed using the modified Fokker-Planck equation.
https://doi.org/10.1142/9789812776372_0030
The paper describes stage 1 of the development of a Health Module within a dynamic microsimulation model simulating individuals over the lifecourse. The enhanced model accounts for Australians' health and mortality by socio-economic status. In developing the Module we found that household-based health surveys were unsatisfactory, as they excluded the very sick who resided in institutions. We thus chose the disability surveys, which cover institutions as well as households. Individuals' disability status was used as a proxy for their health status, and disability and life expectancy were linked mathematically. Amongst the available indicators of socio-economic status, the geographically based index of socio-economic disadvantage was found to underestimate health inequalities, while an individual-level index based on family cash incomes was found to overestimate them. To obtain more accurate estimates, we propose indexes accounting for both cash income and assets. The paper also describes a range of analyses that could be attempted once the Module is completed. Such studies could, for example, compare the demographic, labour force, financial and distributional impacts of policies aimed at lowering health inequalities - and do so more accurately and at a much greater level of detail than was possible previously.
https://doi.org/10.1142/9789812776372_0031
Multivariate abundances are commonly collected in ecology and related disciplines. The main difficulties in modelling such data are that the number of variables is usually larger than the number of samples, and that the distribution of measurement variables is typically strongly right-skewed with a positive weight at zero. We consider methods of hypothesis testing for multivariate abundances, where permutation tests (permuting rows of the data matrix) are applicable. A number of such tests have been proposed in the ecological literature. We suggest alternative methods, and compare the different approaches using P-values from 20 datasets extracted from the ecological literature. All P-values were calculated using permutation tests. Statistics based on generalised linear models performed poorly in unbalanced designs, but other approaches provided comparable or more encouraging results than the test statistics presently used in ecology.
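A minimal sketch of a row-permutation test of the kind described (the data matrix, group labels and the between-group sum-of-squares statistic are illustrative stand-ins for the statistics compared in the paper):

```python
import numpy as np

def row_permutation_test(Y, groups, stat, n_perm=999, seed=0):
    """Permutation test for multivariate abundances: permute rows of the data
    matrix across groups and compare the observed statistic with its null draws."""
    rng = np.random.default_rng(seed)
    obs = stat(Y, groups)
    null = np.array([stat(Y, rng.permutation(groups)) for _ in range(n_perm)])
    return (1 + np.sum(null >= obs)) / (n_perm + 1)      # permutation P-value

def ss_between(Y, g):
    """Illustrative statistic: summed squared deviations of group means."""
    return sum(np.sum((Y[g == k].mean(axis=0) - Y.mean(axis=0)) ** 2)
               for k in np.unique(g))

rng = np.random.default_rng(7)
Y = rng.poisson(2.0, size=(30, 50)).astype(float)   # 30 samples, 50 species
g = np.repeat([0, 1], 15)
print(row_permutation_test(Y, g, ss_between))
```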
https://doi.org/10.1142/9789812776372_0032
A linear time-invariant state space model is proposed for the production and decay of two epimers (R and S) of a hopane released from oil-bearing rock during laboratory pyrolysis. Concentrations of R and S are measured over time. The parameters to be estimated are: the initial amounts of the precursors X (for R) and Y (for S); the rate constants for the production of R and S; the rate constants for the decay of R and S; and the rate constants for the two-way epimerization between both X and Y and R and S. It is shown that the model is locally identifiable. The parameters are estimated by numerical integration of the rate equations, alternated with a derivative-free, constrained least-squares optimisation routine. Asymptotic standard errors and covariances of the parameters are given and compared with those obtained from a resampling approach (parametric bootstrap). An alternative fitting procedure, based on estimating derivatives of the concentrations of R and S by fitting splines, is implemented and compared with that based on integration of the rate equations. The rate constants are important for elucidating the reaction pathways, and the estimates of the initial concentrations of X and Y have potential for inferring the yields of oil-bearing rock.
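A much-reduced sketch of the integration-plus-least-squares fitting strategy (a four-parameter caricature with one-way production and decay only; the paper's model also includes two-way epimerization, and all values here are invented):

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

def rhs(t, u, kx, ky, dr, ds):
    """Toy kinetics: precursors X, Y feed epimers R, S, which also decay."""
    X, Y, R, S = u
    return [-kx * X, -ky * Y, kx * X - dr * R, ky * Y - ds * S]

t_obs = np.linspace(0, 10, 15)
true = (0.8, 0.5, 0.1, 0.15)                       # "unknown" rate constants
u0 = [1.0, 1.2, 0.0, 0.0]                          # initial X, Y, R, S
sol = solve_ivp(rhs, (0, 10), u0, t_eval=t_obs, args=true)
rng = np.random.default_rng(8)
obs = sol.y[2:] + rng.normal(0, 0.01, sol.y[2:].shape)   # noisy R, S measurements

def residuals(theta):
    s = solve_ivp(rhs, (0, 10), u0, t_eval=t_obs, args=tuple(theta))
    return (s.y[2:] - obs).ravel()

# Constrained (non-negative) least squares wrapped around the ODE integration
fit = least_squares(residuals, x0=[0.5, 0.5, 0.05, 0.05], bounds=(0, np.inf))
print(fit.x)                                       # recovered rate constants
```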
https://doi.org/10.1142/9789812776372_0033
The prevalent paradigm for the analysis of common human disease assumes that a single gene is largely responsible for individual disease risk. The consequence of examining each gene as though it were solely responsible for conferring disease risk, when in fact that risk is contingent upon interactions with other disease loci, has not been considered. Accordingly, a general genetic model for analysing data at two marker loci has been developed. Based on this model, it is shown that results can vary markedly depending on the parameters associated with the "unidentified" disease gene. In particular, it is found that if parameters associated with the second gene vary between studies, then the conclusions from those studies may also vary. This is a theoretically broad conclusion with important implications for interpreting different results from individual genome studies and for comparing results between studies.
https://doi.org/10.1142/9789812776372_0034
In recent years, statistical process control methodologies have been widely used in industry for process monitoring. However, the typical assumption that the process data are statistically independent is often invalid. This article discusses different approaches for dealing with process autocorrelation when using process control charts and process capability indices.
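One common remedy, sketched below, is to fit a time series model and chart the residuals instead of the raw observations (an AR(1) fit by least squares; the process parameters are invented, and the paper surveys further approaches):

```python
import numpy as np

rng = np.random.default_rng(9)
# Autocorrelated process data: AR(1) around a mean of 10
x = np.empty(300)
x[0] = 10.0
for t in range(1, 300):
    x[t] = 10.0 + 0.7 * (x[t - 1] - 10.0) + rng.normal(0, 1.0)

# Fit AR(1) by least squares, then monitor the (approximately independent) residuals
xc = x - x.mean()
phi = (xc[1:] @ xc[:-1]) / (xc[:-1] @ xc[:-1])
resid = xc[1:] - phi * xc[:-1]
sigma = resid.std(ddof=1)
ucl, lcl = 3 * sigma, -3 * sigma                 # Shewhart-style residual limits
print(phi, np.sum((resid > ucl) | (resid < lcl)))  # points signalling out-of-control
```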