Open Access

Forecasts of Residential Real Estate Price Indices for Ten Major Chinese Cities through Gaussian Process Regressions

Bingzi Jin

https://orcid.org/0009-0005-1620-7772

Advanced Micro Devices (China) Co., Ltd., Shanghai, P. R. China

E-mail Address: bingzijin@gmail.com

Search for more papers by this author

and

Xiaojie Xu

https://orcid.org/0000-0002-4452-1540

North Carolina State University, Raleigh, NC 27695, USA

E-mail Address: xxu6@alumni.ncsu.edu

Corresponding author.

Search for more papers by this author

https://doi.org/10.1142/S2810943024500136Cited by:7 (Source: Crossref)

Abstract

Due to the rapid growth of the Chinese housing market over the past ten years, forecasting home prices has become a crucial issue for investors and authorities alike. In this research, utilising Bayesian optimisations and cross validation, we investigate Gaussian process regressions across various kernels and basis functions for monthly residential real estate price index projections for ten significant Chinese cities from July 2005 to April 2021. The developed models provide accurate out-of-sample forecasts for the ten price indices from May 2019 to April 2021, with relative root mean square errors varying from 0.0207% to 0.2818%. Our findings could be used individually or in combination with other projections to formulate theories about the trends in the residential real estate price index and carry out additional policy analysis.

Keywords:

Introduction

The last ten years have seen a significant expansion of the Chinese real estate sector. Issues with predicting real estate prices have undoubtedly grown to be among the top concerns for investors and politicians. Understanding real estate price trends and fluctuations is crucial because they directly affect people’s decisions about where to live and how to invest in real estate, as well as the development and execution of regulatory agencies’ policies. Of course, many consumers and providers of projections are interested in predicting real estate prices.

Many academics and professionals are interested in making accurate and reliable predictions of financial and economic time-series data. For various forecasting applications, a wide variety of their extensions and adaptations as well as certain crucial time series techniques, such as the autoregressive (AR), vector autoregressive (VAR), and vector error correction (VEC) models, have been researched (Jin and Xu, 2024b; Yang et al., 2018). Neural networks, Gaussian process regressions, support vector regressions, regression trees, random forests, nearest neighbours, deep learning, ensemble learning, boosting, and bagging are just a few examples of the many machine learning techniques that have recently been found to be effective and promising solutions to a variety of real estate price forecasting problems. These assessments, while not exhaustive, are generally consistent with a variety of empirical studies on the adoption of machine learning techniques (Alade et al., 2021) for forecasting in finance and economics (Yang et al., 2008), and the neural network model appears to be one of the most widely used methods (Jin and Xu, 2024a,n) for predicting prices for real estate properties. However, the Gaussian process regression is not thoroughly investigated for predictions of time-series data from the real estate price index.

Neal’s research on Bayesian learning for neural networks is the basis for a unique regression technique (Neal, 2012). Since the technique depends on priors over functions of Gaussian processes, it is suitable for modelling noisy data. It was shown that at the limit of an infinite network, many different kinds of neural network-based Bayesian regression models would converge to Gaussian processes (Neal, 2012; Jin and Xu, 2024h,i). The Gaussian process has been effectively used in regressions to simulate both noisy (Jin and Xu, 2024l) and noise-free (Neal, 1997; Jin and Xu, 2024m) data. In their study, Brahim-Belhouari and Vesin (2001) compared radial basis function neural networks and Gaussian processes for forecasting issues involving stationary time-series data and found that Bayesian learning produces better prediction outcomes. Brahim-Belhouari and Bermak’s (2004) research suggests that it is advantageous to examine various covariance functions, which is the approach used in this study, and that prediction techniques based on Gaussian processes may be used to successfully resolve forecasting issues for non-stationary time-series data. Radial basis function neural networks are outperformed by Gaussian process regressions, according to the research by Brahim-Belhouari and Bermak (2004). Additionally, the precise matrix operations used to integrate the prior and noise models are what give the Gaussian process formulation its value and advantage (Brahim-Belhouari and Bermak, 2004). Similar to how we would utilise model averaging, Brahim-Belhouari and Bermak (2004) also suggested using the strategy of multi-model forecasting using Gaussian process predictors. A new study suggests that Gaussian process regressions might be used to accurately forecast steel prices for the resource industry.

Forecasting efforts have also been observed for residential real estate prices using classic econometric models and, more recently, machine learning techniques. For example, semiparametric models are more useful for forecasting and evaluating residential housing prices, according to Gençay and Yang’s (1996a,b) comparison of parametric and semiparametric classical econometric models. Glennon et al. (2018) discover that utilising several models increases the accuracy of property valuations based on house price indexes. Clapp and Giaccotto (1992) show that the evaluated value approach is more effective than the repeat sales methodology by creating a mechanism for reducing the impact of measurement mistakes connected with it. According to Kaboudan and Sarkar (2007), projections derived from equations calculated from city-wide disaggregated data have an advantage over those derived from local average pricing equations. Mei and Fang (2017) create a dynamic state forecasting model of the average selling price using multiple regression and trend analysis. Levesque (1994) uses an airport noise case study to examine the breakdown of residential property values. The performance of the AR integrated moving average model is examined by Hepşen and Vatansever (2011). Principal components analysis is used by Baroni et al. (2005) to create a repeat sales index that predicts apartment prices. Guo (2020) examines future price stability using both linear and non-linear regression models. For machine learning approaches, Paris (2008) investigates artificial neural networks to predict alterations in national and local price indices for the UK residential real estate market. Chi (2017) suggests using a spatial back-propagation neural network to estimate the price of residential real estate. In order to estimate demand for residential development, Bee-Hua (2000) proposes combining neural networks with evolutionary algorithms. Štubňová et al. (2020) discover that neural networks outperform regression models for estimating residential real estate market prices. For predicting residential unit rent prices, Seya and Shiroi (2021) compare the deep neural network with nearest neighbour Gaussian process and conclude that the former has greater potential. In order to estimate construction cost from economic variables and indices, Rafiei and Adeli (2018) suggest using an unsupervised deep Boltzmann machine (DBM) learning approach, a softmax layer to extract pertinent features from the input data, and a three-layer back-propagation neural network (or support vector machine) to transform the trained unsupervised DBM into a supervised regression network. The extra-trees regression technique and adial basis function-based support vector regression algorithm are both effective in modelling the fine-scale spatiotemporal distribution of residential land values, according to Zhang et al. (2021). Yoo et al. (2012) examine the hedonic modelling of residential property sales prices using cubist, random forest, and conventional ordinary least squares, and discover that the random forest produces the most accurate results. Researchers, Hong et al. (2020), Dimopoulos and Bakas (2019), and Dimopoulos et al. (2018), all show how machine learning models may be used to increase the accuracy of real estate mass assessments. Machine learning models are also helpful for residential land evaluations, according to Ai et al. (2020). Picchetti (2017) demonstrates how the gradient tree boosting approach may be utilised to get around the problem of sample heterogeneity in hedonic geospatial residential property price assessments. Different machine learning methods have demonstrated promising accuracy in the literature for real estate price forecasts. Based upon different empirical evidence, the mean absolute percentage forecast errors have ranged from below 1% to above 10%, considering that various time-series data have varied features and some are more difficult to forecast than others.

To continue this theme, we focus on Gaussian process regressions for residential real estate price index estimations for ten major Chinese cities between July 2005 and April 2021, a period during which the real estate market saw rapid growth. This is, to the best of our knowledge, the first forecast study that employs Gaussian process regressions to examine residential real estate price indices in the Chinese market. In past research, the Gaussian process regression approach was frequently used to investigate the Boston housing related issues from the standpoint of property valuation. Given the prominence of the residential real estate market in China, there may not be much motivation required. Forecasts of residential real estate price indices should thus be crucial and potentially difficult for investors and policymakers, since having a thorough grasp of pricing trends may benefit in decision making. There are several forecasting methodologies, including econometric and machine learning-based models. The Gaussian process regression technique is chosen because to the non-linear patterns presented by the residential real estate price indices under consideration here, as well as the literature’s recognised value and promise for real estate price forecasting. To train the Gaussian process regression, a Bayesian optimisation strategy using a range of basis functions and kernels, as well as the cross validation technique, will be employed. Because most earlier study focused on a single location when evaluating other types of real estate assets, our findings may help to a better understanding of applying machine learning technology to predict residential real estate pricing in the constantly expanding Chinese market. The coverage of these cities in the current work, combined with the availability of data, should represent an economically natural way to explore the forecast problem for residential real estate, given that demand and supply are most active in these major cities and that each one may have distinct price characteristics that are worth examining. Because prior research in this area has typically concentrated on older time periods when researching other forms of real estate, such as 1Q1981–4Q2002, Q12000–Q32010, 7M2013–12M2013, 6M1996–8M2014, 1M2013–2M2017, 1M2010–7M2017, 12M2010–10M2017, 1M2011–12M2017, and 1M2005–11M2018, our findings might potentially provide a more contemporary perspective on the efficacy of machine learning approaches for real estate price index estimates for the Chinese market. The current condition of uncertainty in the residential real estate market may cause price behaviour to display increased non-linearities. This study adds to the body of literature on the usefulness of the Gaussian process regression for real estate price index forecasts in dynamic environments by building Gaussian process regression models based on a more recent time period, from July 2005 to April 2021. This study also provides policymakers and investors with timely forecast tools for potential use. For many less sophisticated forecast users, machine learning methods may appear more challenging than econometric models. As a result, to assist technical predictions, we develop comparatively simple yet accurate Gaussian process regressions. Given that machine learning models, like econometric models, have the potential for overfitting or underfitting, our model development techniques have taken into account a trade-off between prediction accuracy and stabilities. We specifically undertake out-of-sample projections from May 2019 to April 2021 and get relative root mean square errors for the ten price indices that range from 0.0207% to 0.2818%. Our findings might be used individually or in combination with other forecasts to formulate theories about the trends in the residential real estate price index and carry out additional policy analysis.

Data

The data used for study come from the China Real Estate Index System (CREIS). It is an analytical tool created to reflect the status of the real estate markets and growth trends in the major Chinese cities. The platform was created in 1994 by the Real Estate Association, the Development Research Center of the State Council, and the National Real Estate Development Group Corporation. In 1995 and 2005, CREIS was audited by specialists from the Ministry of Land and Resources, the Ministry of Construction, the Real Estate Association, the Development Research Center of the State Council, the Banking Regulatory Commission, as well as various universities. Currently, CREIS publishes a number of real estate price indices on a monthly basis, including rental price indices, residential real estate sales price indices for both existing homes and newly constructed homes, price indices for villas, retail real estate price indices, and office price indices, among many others. This system has developed from a platform to include the majority of the Chinese real estate markets. In this study, we focus on investigating forecasting issues using residential real estate price indices.

Residential real estate price indices collected from CREIS cover the following ten significant Chinese cities: Wuhan, Chengdu, Hangzhou, Nanjing, Shenzhen, Guangzhou, Tianjing, Chongqing, Beijing, and Shanghai. According to CREIS, it uses phone surveys, field surveys, and web surveys to collect data. The samples used to produce the price index comprise all residential real estate in a certain city that is available for sale in a specific month. The base-period price index is based on the price of Beijing’s residential real estate in 12M2000, with an index value of 1,000. By normalising against this base-period price, residential real estate price indices for other locations and months are created.

According to CREIS, its residential real estate price index for a certain city is calculated as follows: $I_{t}^{'} = \frac{\sum P_{i}^{t} A_{i}^{t - 1}}{\sum P_{i}^{t - 1} A_{i}^{t - 1}} \cdot I_{t - 1}^{'}$ , where $I_{t}^{'}$ and $I_{t - 1}^{'}$ indicate price indices related to time t and time $t - 1$ , respectively, $A_{i}^{t - 1}$ indicates Project i’s total area of construction related to time $t - 1$ , and $P_{i}^{t}$ and $P_{i}^{t - 1}$ indicate average prices of Project i’s residential real estate related to time t and time $t - 1$ , respectively. It is important to highlight that the residential real estate price indices of the ten cities are the only data that have been evaluated in the current investigation. We do not have access to any further CREIS platform data. The time period covered by the monthly data in this analysis is 7M2005–4M2021. Figure 1 shows the visualisations of the ten price indices, their first differences, distributional plots using histograms with kernel estimations, and quantile–quantile plots, using Beijing, Shanghai, Shenzhen, and Guangzhou as examples. The ten price indices and their first differences are summarised in Table 1. None of the ten price indices, based on the p-values of the Anderson–Darling and Kolmogorov–Smirnov tests provided in Table 1, follows a normal distribution at the 1% significance level. None of the ten price indices, based on the p-values of the Jarque–Bera test shown in Table 1, follows a normal distribution at the 5% significance level. For many various forms of financial and economic time-series data, non-normality is probably not shocking (Jin and Xu, 2024d; Jin et al., 2024). The price indices of Beijing, Tianjing, Chongqing, Hangzhou, Wuhan, and Chengdu are left-skewed, and the price indices of Shanghai, Shenzhen, Guangzhou, and Nanjing are right-skewed. Platykurtic price indices are determined for all of the ten cities.

Fig. 1. Residential real estate price indices of ten major cities in China during 7M2005–4M2021.

**Table 1. Summary statistics of residential real estate price indices of ten major cities in China during 7M2005–4M2021.**
City	Series	Minimum	Mean	Median	Standard deviation	Maximum	Skewness	Kurtosis	Jarque–Bera	Anderson–Darling	Kolmogorov–Smirnov
Beijing	Price	1,227	3329.126	3478.5	1075.114	4,565	−0.437	1.943	0.006	<0.001	<0.001
	First difference	−47	17.550	7.0	29.365	121	1.193	4.294	<0.001	<0.001	<0.001
Shanghai	Price	1,567	2659.653	2509.5	630.990	3,590	0.113	1.828	0.011	<0.001	<0.001
	First difference	−45	10.624	6.0	21.173	102	1.631	6.957	<0.001	<0.001	<0.001
Tianjing	Price	909	1653.221	1660.0	302.764	2,039	−0.581	2.738	0.011	<0.001	<0.001
	First difference	−70	5.635	2.0	15.963	77	0.714	7.911	<0.001	<0.001	<0.001
Chongqing	Price	580	911.642	938.5	154.988	1,126	−0.592	2.408	0.007	<0.001	<0.001
	First difference	−27	2.741	2.0	10.430	43	0.541	5.730	<0.001	<0.001	<0.001
Shenzhen	Price	1,176	3405.211	3096.5	1185.993	4,966	0.050	1.725	0.008	<0.001	<0.001
	First difference	−25	19.963	7.0	39.994	225	2.561	10.861	<0.001	<0.001	<0.001
Guangzhou	Price	1,068	2261.089	2300.5	654.347	3,254	0.022	1.928	0.019	<0.001	<0.001
	First difference	−38	11.344	8.0	21.305	77	0.720	3.616	0.003	<0.001	<0.001
Hangzhou	Price	1,206	1913.342	1913.0	373.503	2,488	−0.163	2.131	0.034	<0.001	<0.001
	First difference	−54	6.783	3.0	18.192	108	1.422	9.539	<0.001	<0.001	<0.001
Nanjing	Price	706	1336.289	1289.0	337.566	1,828	0.007	1.841	0.013	<0.001	<0.001
	First difference	−103	5.180	3.0	13.812	59	−1.706	23.102	<0.001	<0.001	<0.001
Wuhan	Price	539	1164.584	1154.0	335.519	1,630	−0.143	1.941	0.017	<0.001	<0.001
	First difference	−15	5.751	3.0	8.961	43	1.219	4.803	<0.001	<0.001	<0.001
Chengdu	Price	611	942.289	959.0	152.381	1,181	−0.216	2.148	0.030	<0.001	<0.001
	First difference	−50	2.434	2.0	9.059	34	−0.560	8.965	<0.001	<0.001	<0.001

In the fields of finance and economics, manifestations of non-linear features at higher moments have already been extensively reported across a wide range of time-series data (Yang et al., 2008; Jin and Xu, 2024g). We apply the Brock–Dechert–Scheinkman (BDS) (Brock et al., 1996) test to examine the ten residential real estate price time-series data for any potential non-linear trends. We implement the BDS test based upon 2–10 as the embedding dimensions and 0.5, 1.0, 1.5, 2.0, 2.5, and 3.0 multiplying a particular price index time series’ standard deviation as $ϵ$ ’s-distance utilised for assessing the proximity of different data points. We determine that the resultant p-values of the tests are all virtually zero. These findings suggest that each of the ten price indices has non-linearities. Given these facts, this work seeks to predict the ten non-normal and non-linear residential real estate price indices using Gaussian process regressions.

Method

The Gaussian process regression, a kind of probabilistic kernel model that has been demonstrated to be effective at forecasting a variety of non-linear patterns in a number of scientific disciplines (Jin and Xu, 2024c,f), is the forecasting technique being examined in this study. For the illustration of the model, the training data with an unknown distribution are indicated by ${(x_{i}, y_{i}); i = 1, 2, \dots, T}$ , the d-dimensional predictors are indicated by $x_{i} \in ℝ^{d}$ , and the target is indicated by $y_{i} \in ℝ$ . The real estate price indices for each city are projected using 20 lagged price indices as predictors. For example, to forecast the price index for the 21 month, 20 price indices from the previous 20 consecutive months will be used as predictors.

Let $y = x^{T} β + ε$ indicate a linear regression, where $ε \sim N (0, σ^{2})$ indicates the error item. Contrarily, Gaussian process regressions use explicit basis functions and latent variables to define the target variable (Jin and Xu, 2024e). The basis function might be expressed using b, and the latent variables from the Gaussian process might be expressed using $l (x_{i})$ such that they meet the requirement of the joint Gaussian distribution. The covariance function of latent variables will represent the target’s smoothness, and the basis function’s goal is to project various predictors onto the space of features (Jin and Xu, 2024j,k).

The covariance and mean are two metrics that are frequently used to define a Gaussian process (GP). We are going to express the mean using $m (x) = E (l (x))$ and the covariance using $k (x, x^{'}) = Cov [l (x), l (x^{'})]$ . Then, we are going to express the Gaussian process regression using $y = b {(x)}^{T} β + l (x)$ , where $l (x) \sim GP (0, k (x, x^{'}))$ and $b (x) \in ℝ^{p}$ . Via $θ$ , a hyper-parameter, we are going to parameterise $k (x, x^{'})$ using $k (x, x^{'} | θ)$ . When using a specific technique to train a Gaussian process regression, the following variables will normally be estimated: $σ^{2}$ , $θ$ , and $β$ . We are also going to define kernels, expressed as k’s, and basis functions, expressed as b’s, to be adopted for model training.

The two types of kernels considered in the current study are isotropic kernels and non-isotropic kernels (automatic relevance determination kernels). Both isotropic and non-isotropic kernels are examined using five distinct kernels. Equations (A.1)–(A.10) shown in the appendix offer specifications for all kernels under consideration. $σ_{l}$ is utilised to indicate isotropic kernels’ characteristic length scale, $α > 0$ is utilised to indicate the scale-mixture parameter, $σ_{f}$ is utilised to indicate the standard deviation of the signal, and $r = \sqrt{{(x_{i} - x_{j})}^{'} (x_{i} - x_{j})}$ . $σ_{l}$ and $σ_{f}$ ’s positiveness is going to be achieved with $θ = (θ_{1}, θ_{2}) = (log σ_{l}, log σ_{f})$ . Each predictor is going to have a distinct length scale for non-isotropic kernels, which is expressed as $σ_{m}$ ( $m = 1, 2, \dots, d$ ). Correspondingly, $θ$ is expressed as $θ = (θ_{1}, θ_{2}, \dots, θ_{d}, θ_{d + 1}) = (log σ_{1}, log σ_{2}, \dots, log σ_{d}, log σ_{f})$ .

In a manner similar to how different kernels are considered, this work takes into account four different basis functions, which are detailed in Eqs. (A.11)–(A.14) in the appendix. In these equations,

\begin{matrix} X & = & {(x_{1}, x_{2}, \dots, x_{n})}^{'}, X^{2} = (\begin{matrix} x_{11}^{2} & x_{12}^{2} & \dots & x_{1 d}^{2} \\ x_{21}^{2} & x_{22}^{2} & \dots & x_{2 d}^{2} \\ ⋮ & ⋮ & ⋮ & ⋮ \\ x_{T 1}^{2} & x_{T 2}^{2} & \dots & x_{T d}^{2} \end{matrix}), and \\ B & = & {(b (x_{1}), b (x_{2}), \dots, b (x_{n}))}^{'} . \end{matrix}

Ten-fold cross validation and Bayesian optimisation, which is based on the expected improvement per second plus (EIPSP) technique, are used to estimate the model parameters. Let a GP model be expressed by $f (x)$ . The Bayesian approach evaluates corresponding $y_{i} = f (x_{i})$ by selecting $N_{s}$ randomly chosen data points of $x_{i}$ ’s inside variable boundaries. Here, the number of data points utilised for preliminary judgments is $N_{s} = 4$ . The algorithm will keep gathering data after encountering evaluation faults until it reaches $N_{s}$ successful evaluation cases. The algorithm’s first and second steps are then repeated, as shown below. The updating of $f (x)$ will be the first stage in producing the posterior distribution over $Q (f | x_{i}, y_{i} for i = 1, \dots, T)$ . The second step will include choosing a new data point (x) in order to determine the acquisition function’s ( $a (x)$ ) minimisation goal. A maximum of 100 iterations will be used. The purpose of using $a (x)$ is to evaluate x’s goodness in regards to Q. Instead of evaluating values that would elevate the objective function, expected improvement acquisition functions evaluate expected amounts of improvements to the objective function. We are going to let $μ_{Q} (x_{best})$ express the corresponding numerical value of the lowest mean and $x_{best}$ express the data point at which the lowest posterior mean is reached. We can express the expected improvement (EI) using $EI (x, Q) = E_{Q} [max (0, μ_{Q} (x_{best}) - f (x))]$ . The Bayesian strategy can offer higher advantages per unit of time by using the time-weighting scheme on the acquisition function since the amount of time required to assess the objective may vary depending on the location. Throughout optimisation processes, it is possible to keep a further Bayesian model of the amount of time needed to evaluate the objective as a function of x. In light of this, we can express the acquisition function’s EI per second (EIPS) as $EIPS (x) = \frac{{EI}_{Q} (x)}{μ_{S} (x)}$ , where $μ_{S} (x)$ expresses the posterior mean associated with this extra timing GP model. To avoid the acquisition function overutilising a specific area and preventing a local minimum of the objective, the following changes to its behaviour might be made. Let $σ_{F} (x)$ express the posterior objective’s standard deviation corresponding to x and $σ_{P N}$ express the additive noises’ posterior standard deviation meeting the requirement of $σ_{Q}^{2} (x) = σ_{F}^{2} (x) + σ_{P N}^{2}$ . We are going to express the exploration ratio using $t_{σ_{P N}} > 0$ . After each iteration, the acquisition function based on the EIPSP algorithm determines if the following data point, x, meets the requirement of $σ_{F} (x) < t_{σ_{P N}} σ_{P N}$ . The kernel function will be modified if this criterion is met by multiplying $θ$ by the number of iterations, with x being regarded as overexploiting (Bull, 2011). In essence, the EIPSP technique adjustment raises $σ_{Q}$ for data points between observations. A new data point will then be produced using the newly fitted kernel. If it turns out that the new data point is similarly overexploiting, $θ$ will be increased by ten times in subsequent trials. This strategy will be limited to five times in order to get a data point, x, that is not considered overexploited. The modified x will then be accepted as the next exploration ratio by the EIPSP algorithm. To obtain a more accurate overall response, the algorithm strikes a balance between focusing on previously investigated nearby data points and looking at new data points.

Bayesian optimisation procedures will be carried out over $σ$ , basis functions, kernels, and whether or not predictors are standardised. Forecast performance is going to be determined according to the relative root mean square error (RRMSE), which allows for comparisons of different prediction outcomes across different models or targets (Li et al., 2013). The RRMSE can be expressed as $RRMSE = \frac{\sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i}^{obs} - y_{i}^{for})}^{2}}}{\frac{1}{n} \sum_{i = 1}^{n} y_{i}^{obs}}$ , where $y^{for}$ expresses the target’s predicted numerical value, n expresses the number of observations utilised for performance assessments, and $y^{obs}$ expresses the target variable’s observed numerical value. Two additional performance metrics are adopted to assess prediction accuracy: the mean absolute error (MAE) and root mean square error (RMSE), whose units are identical to the target variable and whose magnitude is related to the target variable. The RMSE can be expressed as $RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i}^{obs} - y_{i}^{for})}^{2}}$ . The MAE can be expressed as $MAE = \frac{1}{n} \sum_{i = 1}^{n} | y_{i}^{obs} - y_{i}^{for} |$ . Finally, we also consider the correlation coefficient (CC) for measuring performance, which can be expressed as $CC = \frac{\sum_{i = 1}^{n} (y_{i}^{obs} - \bar{y^{obs}}) (y_{i}^{for} - \bar{y^{for}})}{\sqrt{\sum_{i = 1}^{n} {(y_{i}^{obs} - \bar{y^{obs}})}^{2} \sum_{i = 1}^{n} {(y_{i}^{for} - \bar{y^{for}})}^{2}}}$ , where $\bar{y^{obs}}$ and $\bar{y^{for}}$ stand for averages.

Result

For each city, data from its residential real estate price indices are utilised for model training from 7M2005 to 4M2019, and for model performance testing for one-month ahead forecasts from 5M2019 to 4M2021. Figure 2 shows the outcomes of EIPSP optimisations based on training data for all price indices. These results indicate that (a) the isotropic rational quadratic kernel (Eq. (A.4)), empty basis function (Eq. (A.11)) and standardised predictors are chosen for Beijing, (b) the isotropic rational quadratic kernel (Eq. (A.4)), empty basis function (Eq. (A.11)), and non-standardised predictors are chosen for Shanghai, (c) the isotropic exponential kernel (Eq. (A.1)), linear basis function (Eq. (A.13)), and standardised predictors are chosen for Tianjing, (d) the isotropic rational quadratic kernel (Eq. (A.4)), constant basis function (Eq. (A.12)), and non-standardised predictors are chosen for Chongqing, (e) the isotropic Matern 3/2 kernel (Eq. (A.5)), constant basis function (Eq. (A.12)), and standardised predictors are chosen for Shenzhen, (f) the isotropic exponential kernel (Eq. (A.1)), empty basis function (Eq. (A.11)), and standardised predictors are chosen for Guangzhou, (g) the isotropic exponential kernel (Eq. (A.1)), constant basis function (Eq. (A.12)), and standardised predictors are chosen for Hangzhou, (h) the isotropic rational quadratic kernel (Eq. (A.4)), empty basis function (Eq. (A.11)), and standardised predictors are chosen for Nanjing, (i) the isotropic exponential kernel (Eq. (A.1)), empty basis function (Eq. (A.11)), and standardised predictors are chosen for Wuhan, and (j) the isotropic exponential kernel (Eq. (A.1)), empty basis function (Eq. (A.11)), and standardised predictors are chosen for Chengdu. For the ten GPR models created using the ten-fold cross validation for the residential real estate price index of each city, the results of parameter estimates are shown in Table 2. The initials ‘CV1’, ‘CV2’, …, and ‘CV10’ are used to denote these parameter estimations, where ‘CV’ stands for ‘cross validation’.

Fig. 2. Optimisation processes based upon the EIPSP algorithm for monthly residential real estate price indices.

**Table 2. Parameter estimates of ten GPR models for the residential real estate price index of each city.**
	CV1	CV2	CV3	CV4	CV5	CV6	CV7	CV8	CV9	CV10
Parameter	Beijing: Isotropic rational quadratic kernel, empty basis function, and standardised predictors
$σ$	8.713	8.947	8.998	8.697	9.072	8.826	8.715	8.919	8.974	8.692
$σ_{l}$	14.406	14.249	14.125	14.492	14.222	14.488	14.374	14.005	14.627	14.176
$α$	0.004	0.005	0.005	0.005	0.005	0.005	0.005	0.005	0.005	0.004
$σ_{f}$	3394.053	3376.831	3359.211	3380.081	3370.464	3362.822	3364.386	3379.430	3381.626	3371.865
Parameter	Shanghai: Isotropic rational quadratic kernel, empty basis function, and non-standardised predictors
$σ$	5.736	5.531	7.232	5.695	5.661	7.164	5.572	7.152	7.363	5.589
$σ_{l}$	9671.714	10403.556	9929.080	9429.178	9643.957	10045.575	9462.530	9846.397	10026.156	10190.834
$α$	0.002	0.002	0.002	0.002	0.002	0.003	0.002	0.002	0.002	0.002
$σ_{f}$	2855.151	2850.939	2849.535	2855.446	2841.516	2842.975	2854.876	2852.664	2849.486	2858.678
Parameter	Tianjing: Isotropic exponential kernel, linear basis function, and standardised predictors
$σ$	2.270	2.299	2.312	2.332	2.282	2.344	2.261	2.258	2.324	2.283
$β_{0}$	1717.865	1727.215	1728.874	1715.903	1727.174	1726.565	1732.113	1725.618	1726.524	1730.831
$β_{1}$	−23.855	−19.491	−22.554	−21.757	−28.676	−25.240	−17.130	−22.339	−4.123	−25.719
$β_{2}$	−5.173	20.536	6.709	−0.360	1.963	3.967	−4.806	4.840	−32.665	9.578
$β_{3}$	61.436	25.971	54.152	57.942	61.270	55.223	48.676	56.648	68.301	50.558
$β_{4}$	−28.691	−24.983	−31.967	−37.638	−31.608	−27.420	−38.024	−42.840	−17.456	−16.017
$β_{5}$	8.072	17.076	2.928	16.652	1.471	13.791	27.455	1.943	−3.757	−50.195
$β_{6}$	11.769	8.668	24.403	22.846	35.550	18.028	25.853	30.516	19.727	63.047
$β_{7}$	−32.785	−26.763	−37.517	−50.651	−48.305	−30.389	−44.989	−22.315	−37.739	−25.656
$β_{8}$	−2.616	26.624	6.064	15.176	11.833	−4.324	−6.950	2.218	−5.043	−14.760
$β_{9}$	48.837	34.885	26.952	33.456	23.624	−10.683	36.212	16.553	30.067	28.475
$β_{10}$	−16.779	−82.903	−9.891	−12.818	−12.622	28.260	−14.746	−1.969	20.852	17.430
$β_{11}$	−40.011	7.574	−44.541	−41.274	−31.914	−33.874	−19.516	−38.922	−56.235	−56.915
$β_{12}$	14.900	−7.082	14.207	6.809	15.505	−1.397	2.261	−5.796	13.798	38.316
$β_{13}$	−36.517	−3.004	−45.249	−29.111	−49.212	−29.749	−36.862	−19.959	−33.266	−83.117
$β_{14}$	59.751	30.071	81.973	55.822	69.940	46.794	61.953	55.091	66.448	65.669
$β_{15}$	−50.921	−26.616	−47.322	−37.636	−47.118	−34.808	−42.435	−45.082	−62.480	17.268
$β_{16}$	45.260	22.869	21.828	20.794	38.321	43.650	20.278	48.551	38.952	−16.148
$β_{17}$	−54.270	−32.550	−43.782	−39.035	−58.090	−43.307	−35.116	−62.919	−43.149	−65.776
$β_{18}$	33.010	40.456	46.602	31.022	35.392	32.274	15.469	37.788	21.900	45.796
$β_{19}$	−30.932	−5.555	4.554	10.706	5.001	−32.783	22.810	5.460	−17.691	−17.220
$β_{20}$	265.865	225.342	223.488	231.818	235.252	266.019	225.041	227.571	265.693	263.809
$σ_{l}$	0.233	0.507	0.398	0.392	0.320	0.390	0.336	0.401	0.266	0.182
$σ_{f}$	11.831	13.142	13.437	13.007	12.424	11.762	12.165	13.196	11.459	9.576
Parameter	Chongqing: Isotropic rational quadratic kernel, constant basis function, and non-standardised predictors
$σ$	4.140	3.910	3.473	3.950	4.206	3.835	3.685	3.582	4.019	3.635
$β$	877.736	880.853	886.232	886.659	882.419	885.206	894.449	882.460	880.916	883.589
$σ_{l}$	1079.054	948.102	956.706	1061.836	1107.167	918.901	909.187	1048.151	1037.970	953.585
$α$	0.086	0.078	0.063	0.072	0.086	0.071	0.076	0.075	0.078	0.078
$σ_{f}$	269.608	243.520	238.179	260.364	273.387	232.212	229.531	258.342	259.731	243.181
Parameter	Shenzhen: Isotropic Matern 3/2 kernel, constant basis function, and standardised predictors
$σ$	10.590	10.418	10.458	10.561	10.610	10.575	10.429	10.585	10.571	10.556
$β$	3405.083	3446.051	3449.565	3447.997	3465.798	3469.546	3461.686	3450.087	3465.318	3457.192
$σ_{l}$	7.530	4.524	4.453	4.501	4.141	4.188	4.226	4.972	4.249	4.151
$σ_{f}$	1088.517	735.484	710.153	710.102	684.236	690.440	682.685	745.662	679.481	686.742
Parameter	Guangzhou: Isotropic exponential kernel, empty basis function, and standardised predictors
$σ$	5.707	5.785	5.717	5.698	5.642	5.764	5.622	5.716	5.692	5.738
$σ_{l}$	2212.542	2171.164	2177.475	2182.294	2382.944	2191.010	2220.762	2183.514	2207.534	2156.864
$σ_{f}$	2517.003	2516.762	2516.039	2515.704	2538.658	2517.833	2509.334	2516.505	2518.143	2516.713
Parameter	Hangzhou: Isotropic exponential kernel, constant basis function, and standardised predictors
$σ$	3.081	3.158	3.169	3.158	3.121	3.205	3.090	3.070	3.158	3.142
$β$	1918.529	1919.895	1921.062	1921.204	1920.555	1920.507	1934.084	1919.661	1921.003	1921.275
$σ_{l}$	354.900	350.720	351.694	353.756	362.072	343.792	355.115	362.508	354.651	371.115
$σ_{f}$	676.403	677.372	675.998	678.294	676.735	676.884	664.110	676.710	678.894	680.808
Parameter	Nanjing: Isotropic rational quadratic kernel, empty basis function, and standardised predictors
$σ$	2.974	3.003	2.994	2.963	2.922	2.957	2.989	2.874	2.953	2.974
$σ_{l}$	18.620	18.121	18.124	18.587	18.364	19.057	18.121	18.658	18.126	18.214
$α$	0.003	0.003	0.002	0.003	0.002	0.002	0.002	0.002	0.003	0.003
$σ_{f}$	1429.918	1426.269	1426.653	1430.252	1432.148	1427.748	1428.890	1430.833	1429.406	1428.115
Parameter	Wuhan: Isotropic exponential kernel, empty basis function, and standardised predictors
$σ$	2.842	2.878	2.872	2.845	2.882	2.848	2.893	2.811	2.839	2.864
$σ_{l}$	2630.953	2601.939	2603.949	2643.151	2612.150	2659.777	2663.659	2793.757	2701.073	2606.324
$σ_{f}$	1263.865	1263.303	1263.888	1264.088	1263.440	1263.480	1264.242	1274.134	1264.117	1263.951
Parameter	Chengdu: Isotropic exponential kernel, empty basis function, and standardised predictors
$σ$	1.300	1.325	1.304	1.289	1.271	1.313	1.330	1.321	1.319	1.281
$σ_{l}$	4618.425	4517.889	4687.322	4623.863	4734.923	4675.811	4580.948	4520.558	4670.689	4759.040
$σ_{f}$	991.235	989.991	990.639	990.847	990.908	990.612	991.128	990.176	990.655	993.269

Models ‘CV1’, ‘CV2’, …, and ‘CV10’, which are the ten GPR models created for the residential real estate price index of each city, presented in Table 2, are used to predict the numerical values of the price index for the testing time period of 5M2019 to 4M2021. So, for each month of the testing period, each price index will have ten projected values. To get the final price index estimate for a particular month, the average of the ten projections is used. By successfully wiping out any potential idiosyncratic predictions created by a particular sub-model, this technique may help provide reliable and stable projections in the future. The literature has discussed the desirable qualities and benefits of equal weighing. The projected and actual residential real estate price indices for each city are compared in Fig. 3. Figure 4 representation of percentage forecast errors corresponds to Fig. 3. It is clear that projected price indices typically follow observed price indices quite closely. Additional prediction performance data in terms of the RMSE, RRMSE, MAE, and CC are summarised in Table 3 for the results in Figs. 3 and 4. In particular, RRMSEs for the ten price indices range from 0.0207% to 0.2818%. Based upon previous research, model prediction accuracy might be rated at the excellent level if $RRMSE < 10 %$ , at the good level if $10 % < RRMSE < 20 %$ , at the fair level if $20 % < RRMSE < 30 %$ , and at the poor level if $RRMSE \geq 30 %$ . According to these criteria, the GPR models constructed here have a high degree of prediction accuracy.

Fig. 3. The plot of forecasted vs. observed series for residential real estate price indices during the testing phase from 5M2019 to 4M2021.

Fig. 4. The plot of percentage forecast errors for residential real estate price indices during the testing phase from 5M2019 to 4M2021.

**Table 3. Forecast performance of the GPR models for residential real estate price indices of ten cities during the testing phase from 5M2019 to 4M2021.**
City	Testing RRMSE	Testing RMSE	Testing MAE	Testing CC
Beijing	0.095%	4.301	3.268	92.645%
Shanghai	0.087%	3.079	2.392	99.135%
Tianjing	0.042%	0.839	0.622	99.887%
Chongqing	0.282%	3.129	2.085	93.804%
Shenzhen	0.096%	4.733	3.604	88.225%
Guangzhou	0.061%	1.944	1.467	99.899%
Hangzhou	0.021%	0.506	0.427	99.981%
Nanjing	0.092%	1.665	1.347	99.433%
Wuhan	0.036%	0.579	0.432	99.833%
Chengdu	0.025%	0.291	0.181	99.961%

The findings of an error autocorrelation study conducted to evaluate the suitability of the built models are shown in Fig. 5. With a focus on normalised autocorrelations, the analysis is performed for up to 20 lags. These results exclude any blatant autocorrelations as significant autocorrelations suggest that there might exist room for improving forecast accuracy through further modelling these autocorrelations and confirm the overall validity of the models. It might be also important to note that although empirical data may be conflicting, including the AR conditional heteroskedasticity effect into a prediction model may increase its performance.

We benchmark the GPR models against the following models that use the same predictors as the GPR models: the support vector regression (SVR) model, regression tree (RT) model, and AR model. For the testing phase from 5M2019 to 4M2021, Table 4 displays comparisons of these models based on the RMSE, where it can be seen that the GPR models lead to the lowest RMSE for each city. We also perform the Diebold–Mariano (Diebold and Mariano, 2002) test for assessing significance of the difference between the GPR models and each of the benchmark models in terms of forecast accuracy. It turns out that the p-values are all below 0.01, suggesting that the GPR models lead to statistically significant better forecast performance than the benchmark models for the price index of each city.

**Table 4. Benchmark analysis: Comparisons of RMSEs for the testing phase from 5M2019 to 4M2021.**
City	GPR	SVR	RT	AR
Beijing	4.301	8.261	9.519	12.186
Shanghai	3.079	5.659	7.423	7.374
Tianjing	0.839	1.247	1.886	1.885
Chongqing	3.129	6.439	7.450	8.925
Shenzhen	4.733	5.935	7.655	13.392
Guangzhou	1.944	3.767	3.922	5.905
Hangzhou	0.506	1.013	0.869	1.517
Nanjing	1.665	2.391	2.935	4.605
Wuhan	0.579	1.073	0.947	1.314
Chengdu	0.291	0.549	0.569	0.814

Implication

For investors and governments, forecasts of residential real estate price indices are an important topic. Investors need real estate price forecasts for portfolio allocation and adjustments, strategic planning, and risk management. Real estate price forecasts are crucial to policymakers for market assessments, policy development, implementation, and modification, especially for preventing market overheating and stimulating the economy when necessary. To the best of the authors’ knowledge, forecasting and valuation methods employed by numerous investors, including those in the public sector, are often based on econometric techniques, particularly time-series methods where price indices are relevant. Additionally, professional judgments from experts are still employed too. This has a reasonable basis because econometric methods and expert assessments are presumably relatively easy to develop, use, and maintain, have been widely adopted by many forecast users for many years, and many of them might be able to offer a respectable level of prediction accuracy. It may be difficult for some policymakers and investors to consider these types of models because some decision-makers still view machine learning designs as overly complex tools for making forecasts, but it is generally agreed that these models are worth investigating for their potential, especially given that computational capabilities are becoming more and more accessible and the realistic basis for potential irregularities in price time-series data. Actually, several decision-makers and savvy investors have recently expressed an increasing interest in machine learning methods for forecasting real estate values. The research being done here continues the tradition of investigating the possibility of Gaussian process regressions to resolve forecasting problems for residential real estate price indices. With the provided approach of developing such forecast models for the ten major Chinese cities and the proven strong prediction accuracy and prediction stabilities, the results suggest that machine learning techniques are well worth investigating, possibly for a greater diversity of real estate types and wider coverage of locations.

Conclusion

The topic of residential real estate price index estimates for ten major Chinese cities is the focus of the current study. By using the Gaussian process regression approach and monthly data from 7M2005 to 4M2021, we construct forecast models. When creating prediction models using Bayesian optimisations and cross validation, we pay particular attention to four basis functions, ten kernels, and two methods to predictor standardisation. With relative root mean square errors ranging from 0.0207% to 0.2818% for the ten price indices over a two-year period from 5M2019 to 4M2021, we find that the built models produce solid out-of-sample projections. These forecast models might be used by market players and policymakers to enhance their understanding of the residential real estate industry. Future research may be interesting in examining additional Bayesian optimisation techniques in addition to the expected improvement per second plus algorithm taken into consideration here. Additionally, the forecasting process may be broadened to incorporate other cities and a variety of different real estate price indices.

Appendix: Explored Kernels and Basis Functions

In this appendix, we list all explored kernels in Eqs. (A.1)–(A.10) and basis functions in Eqs. (A.11)–(A.14) :

Isotropic Exponential: k (x_{i}, x_{j} | θ) = σ_{f}^{2} e^{- \frac{r}{σ_{l}}},

Isotropic Squared Exponential: k (x_{i}, x_{j} | θ) = σ_{f}^{2} e^{- \frac{1}{2} \frac{{(x_{i} - x_{j})}^{T} (x_{i} - x_{j})}{σ_{l}^{2}}},

Isotropic Matern 5/2: k (x_{i}, x_{j} | θ) = σ_{f}^{2} (1 + \frac{\sqrt{5} r}{σ_{l}} + \frac{5 r^{2}}{3 σ_{l}^{2}}) e^{- \frac{\sqrt{5} r}{σ_{l}}},

Isotropic Rational Quadratic: k (x_{i}, x_{j} | θ) = σ_{f}^{2} {(1 + \frac{r^{2}}{2 α σ_{l}^{2}})}^{- α},

Isotropic Matern 3/2: k (x_{i}, x_{j} | θ) = σ_{f}^{2} (1 + \frac{\sqrt{3} r}{σ_{l}}) e^{- \frac{\sqrt{3} r}{σ_{l}}},

Nonisotropic Exponential: k (x_{i}, x_{j} | θ) = σ_{f}^{2} e^{- \sqrt{\sum_{m = 1}^{d} \frac{{(x_{i m} - x_{j m})}^{2}}{σ_{m}^{2}}}},

Nonisotropic Squared Exponential: k (x_{i}, x_{j} | θ) = σ_{f}^{2} e^{- \frac{1}{2} \sum_{m = 1}^{d} \frac{{(x_{i m} - x_{j m})}^{2}}{σ_{m}^{2}}},

\begin{matrix} Nonisotropic Matern 5/2: \\ k (x_{i}, x_{j} | θ) = σ_{f}^{2} (1 + \sqrt{5 \sum_{m = 1}^{d} \frac{{(x_{i m} - x_{j m})}^{2}}{σ_{m}^{2}}} + \frac{5}{3} \sum_{m = 1}^{d} \frac{{(x_{i m} - x_{j m})}^{2}}{σ_{m}^{2}}) \\ \times e^{- \sqrt{5 \sum_{m = 1}^{d} \frac{{(x_{i m} - x_{j m})}^{2}}{σ_{m}^{2}}}}, \end{matrix}

\begin{matrix} Nonisotropic Rational Quadratic: \\ k (x_{i}, x_{j} | θ) = σ_{f}^{2} {(1 + \frac{1}{2 α} \sum_{m = 1}^{d} \frac{{(x_{i m} - x_{j m})}^{2}}{σ_{m}^{2}})}^{- α}, \end{matrix}

\begin{matrix} Nonisotropic Matern 3/2: \\ k (x_{i}, x_{j} | θ) = σ_{f}^{2} (1 + \sqrt{3 \sum_{m = 1}^{d} \frac{{(x_{i m} - x_{j m})}^{2}}{σ_{m}^{2}}}) e^{- \sqrt{3 \sum_{m = 1}^{d} \frac{{(x_{i m} - x_{j m})}^{2}}{σ_{m}^{2}}}}, \end{matrix}

Empty: B = Empty Matrix,

Constant: B = I_{n \times 1},

Linear: B = [1, X],

Pure Quadratic: B = [1, X, X^{2}] .

ORCID

Bingzi Jin https://orcid.org/0009-0005-1620-7772

Xiaojie Xu https://orcid.org/0000-0002-4452-1540

References

Ai, H, Q Liu, Y Jiang and J He [2020] Urban residential land price appraisal via quantifying impact factors based on deep belief networks. In Proc. 2020 12th Int. Conf. Machine Learning and Computing, pp. 29–33. https://doi.org/10.1145/3383972.3384017 Crossref, Google Scholar
Alade, IO, Y Zhang and X Xu [2021] Modeling and prediction of lattice parameters of binary spinel compounds (AM₂X₄) using support vector regression with Bayesian optimization. New Journal of Chemistry, 45, 15255–15266. https://doi.org/10.1039/d1nj01523k Crossref, Google Scholar
Baroni, M, F Barthélémy and M Mokrane [2005] A PCA Factor Repeat Sales Index (1973–2001) to Forecast Apartment Prices in Paris (France). ESSEC. Google Scholar
Bee-Hua, G [2000] Evaluating the performance of combining neural networks and genetic algorithms to forecast construction demand: The case of the Singapore residential sector. Construction Management & Economics, 18, 209–217. https://doi.org/10.1080/014461900370834 Crossref, Google Scholar
Brahim-Belhouari, S and A Bermak [2004] Gaussian process for nonstationary time series prediction. Computational Statistics & Data Analysis, 47, 705–712. https://doi.org/10.1016/j.csda.2004.02.006 Crossref, Google Scholar
Brahim-Belhouari, S and J-M Vesin [2001] Bayesian learning using Gaussian process for time series prediction. In Proc. 11th IEEE Signal Processing Workshop on Statistical Signal Processing (Cat. No. 01TH8563), pp. 433–436. IEEE. https://doi.org/10.1109/SSP.2001.955315 Crossref, Google Scholar
Brock, WA, JA Scheinkman, WD Dechert and B LeBaron [1996] A test for independence based on the correlation dimension. Econometric Reviews, 15, 197–235. https://doi.org/10.1080/07474939608800353 Crossref, Google Scholar
Bull, AD [2011] Convergence rates of efficient global optimization algorithms. Journal of Machine Learning Research, 12, 2879–2904. Google Scholar
Chi, J [2017] Spatial BP neural network model in evaluation of residential real estate price. Geospatial Information, 2, 86–90. Google Scholar
Clapp, JM and C Giaccotto [1992] Estimating price indices for residential property: A comparison of repeat sales and assessed value methods. Journal of the American Statistical Association, 87, 300–306. https://doi.org/10.1080/01621459.1992.10475209 Crossref, Google Scholar
Diebold, FX and RS Mariano [2002] Comparing predictive accuracy. Journal of Business & Economic Statistics, 20, 134–144. https://doi.org/10.1198/073500102753410444 Crossref, Google Scholar
Dimopoulos, T and N Bakas [2019] Sensitivity analysis of machine learning models for the mass appraisal of real estate: Case study of residential units in Nicosia, Cyprus. Remote Sensing, 11, p. 3047. https://doi.org/10.3390/rs11243047 Crossref, Google Scholar
Dimopoulos, T, H Tyralis, NP Bakas and D Hadjimitsis [2018] Accuracy measurement of random forests and linear regression for mass appraisal models that estimate the prices of residential apartments in Nicosia, Cyprus. Advances in Geosciences, 45, 377–382. https://doi.org/10.5194/adgeo-45-377-2018 Crossref, Google Scholar
Gençay, R and X Yang [1996a] Forecast comparisons of residential housing prices by parametric and semiparametric regression. The Canadian Journal of Economics/Revue canadienne d’Economique, 29, S515–S519. https://doi.org/10.2307/136099 Crossref, Google Scholar
Gençay, R and X Yang [1996b] A forecast comparison of residential housing prices by parametric versus semiparametric conditional mean estimators. Economics Letters, 52, 129–135. https://doi.org/10.1016/S0165-1765(96)00851-8 Crossref, Google Scholar
Glennon, D, H Kiefer and T Mayock [2018] Measurement error in residential property valuation: An application of forecast combination. Journal of Housing Economics, 41, 1–29. https://doi.org/10.1016/j.jhe.2018.02.002 Crossref, Google Scholar
Guo, Z [2020] The forecast of housing price in Xi’an based on big data analysis. Insight-Statistics, 3, 7–14. Crossref, Google Scholar
Hepşen, A and M Vatansever [2011] Forecasting future trends in Dubai housing market by using Box–Jenkins autoregressive integrated moving average. International Journal of Housing Markets and Analysis. https://doi.org/10.1108/17538271111153004 Google Scholar
Hong, J, H Choi and W-s Kim [2020] A house price valuation based on the random forest approach: The mass appraisal of residential property in South Korea. International Journal of Strategic Property Management, 24, 140–152. https://doi.org/10.3846/ijspm.2020.11544 Crossref, Google Scholar
Jin, B and X Xu [2024a] Carbon emission allowance price forecasting for China Guangdong carbon emission exchange via the neural network. Global Finance Review, 6, p. 3491. https://doi.org/10.18282/gfr.v6i1.3491 Crossref, Google Scholar
Jin, B and X Xu [2024b] Contemporaneous causality among price indices of ten major steel products. Ironmaking & Steelmaking, 51, 515–526. https://doi.org/10.1177/03019233241249361 Crossref, Google Scholar
Jin, B and X Xu [2024c] Forecasting wholesale prices of yellow corn through the Gaussian process regression. Neural Computing and Applications, 36, 8693–8710. https://doi.org/10.1007/s00521-024-09531-2 Crossref, Google Scholar
Jin, B and X Xu [2024d] Forecasts of China mainland new energy index prices through Gaussian process regressions. Journal of Clean Energy and Energy Storage, 1, p. 2450006. https://doi.org/10.1142/S2811034X24500060 Link, Google Scholar
Jin, B and X Xu [2024e] Forecasts of coking coal futures price indices through Gaussian process regressions. Mineral Economics. https://doi.org/10.1007/s13563-024-00472-9 Crossref, Google Scholar
Jin, B and X Xu [2024f] Forecasts of thermal coal prices through Gaussian process regressions. Ironmaking & Steelmaking, 51, 819–834. https://doi.org/10.1177/0301923 3241265194 Crossref, Google Scholar
Jin, B and X Xu [2024g] Machine learning-based scrap steel price forecasting for the northeast Chinese market. International Journal of Empirical Economics. https://doi.org/10.1142/S2810943024500112 Link, Google Scholar
Jin, B and X Xu [2024h] Machine learning coffee price predictions. Journal of Uncertain Systems. https://doi.org/10.1142/S1752890924500235 Link, Google Scholar
Jin, B and X Xu [2024i] Machine learning predictions of regional steel price indices for east China. Ironmaking & Steelmaking. https://doi.org/10.1177/03019233241254891 Crossref, Google Scholar
Jin, B and X Xu [2024j] Machine learning price index forecasts of flat steel products. Mineral Economics. https://doi.org/10.1007/s13563-024-00457-8 Google Scholar
Jin, B and X Xu [2024k] Office real estate price index forecasts through Gaussian process regressions for ten major Chinese cities. Advances in Computational Intelligence, 4, p. 8. https://doi.org/10.1007/s43674-024-00075-5 Crossref, Google Scholar
Jin, B and X Xu [2024l] Pre-owned housing price index forecasts using Gaussian process regressions. Journal of Modelling in Management. https://doi.org/10.1108/JM2-12-2023-0315 Crossref, Google Scholar
Jin, B and X Xu [2024m] Predictions of steel price indices through machine learning for the regional northeast Chinese market. Neural Computing and Applications, 36, 20863–20882. https://doi.org/10.1007/s00521-024-10270-7 Crossref, Google Scholar
Jin, B and X Xu [2024n] Price forecasting through neural networks for crude oil, heating oil, and natural gas. Measurement: Energy, 1, p. 100001. https://doi.org/10.1016/j.meaene.2024.100001 Crossref, Google Scholar
Jin, B, X Xu and Y Zhang [2024] Thermal coal futures trading volume predictions through the neural network. Journal of Modelling in Management. https://doi.org/10.1108/JM2-09-2023-0207 Crossref, Google Scholar
Kaboudan, M and A Sarkar [2007] A GIS framework to forecast residential home prices. In 23rd Annual Meeting of the American Real Estate Society, San Francisco, CA, USA, pp. 1–27. Google Scholar
Levesque, TJ [1994] Modelling the effects of airport noise on residential housing markets: A case study of Winnipeg International Airport. Journal of Transport Economics and Policy, 28, 199–210. Google Scholar
Li, MF, XP Tang, W Wu and HB Liu [2013] General models for estimating daily global solar radiation for different solar radiation zones in mainland China. Energy Conversion and Management, 70, 139–148. https://doi.org/10.1016/j.enconman.2013.03.004 Crossref, Google Scholar
Mei, H and H Fang [2017] A study on the real estate price forecast model in the Midwest of China-based on provincial panel data analysis. In Proc. Tenth Int. Conf. Management Science and Engineering Management, pp. 525–536. Springer. https://doi.org/10.1007/978-981-10-1837-4_45 Crossref, Google Scholar
Neal, RM (1997). Monte Carlo implementation of Gaussian process models for Bayesian regression and classification. arXiv preprint physics/9701026. Google Scholar
Neal, RM [2012] Bayesian Learning for Neural Networks, Vol. 118. Springer Science & Business Media. Google Scholar
Paris, SD (2008). Using artificial neural networks to forecast changes in national and regional price indices for the UK residential property market. Thesis, University of South Wales, United Kingdom. Google Scholar
Picchetti, P [2017] Hedonic residential property price estimation using geospatial data: A machine-learning approach. Instituto Brasileiro de Economia, 4, 1–7. Google Scholar
Rafiei, MH and H Adeli [2018] Novel machine-learning model for estimating construction costs considering economic variables and indexes. Journal of Construction Engineering and Management, 144, 04018106. https://doi.org/10.1061/(ASCE)CO.1943-7862.0001570 Crossref, Google Scholar
Seya, H and D Shiroi [2021] A comparison of residential apartment rent price predictions using a large data set: Kriging versus deep neural network. Geographical Analysis. https://doi.org/10.1111/gean.12283 Google Scholar
Štubňová, M, M Urbaníková, J Hudáková and V Papcunová [2020] Estimation of residential property market price: Comparison of artificial neural networks and hedonic pricing model. Emerging Science Journal, 4, 530–538. https://doi.org/10.28991/esj-2020-01250 Crossref, Google Scholar
Yang, J, X Su and JW Kolari [2008] Do Euro exchange rates follow a martingale? Some out-of-sample evidence. Journal of Banking & Finance, 32, 729–740. https://doi.org/10.1016/j.jbankfin.2007.05.009 Crossref, Google Scholar
Yang, J, Z Yu and Y Deng [2018] Housing price spillovers in China: A high-dimensional generalized VAR approach. Regional Science and Urban Economics, 68, 98–114. https://doi.org/10.1016/j.regsciurbeco.2017.10.016 Crossref, Google Scholar
Yoo, S, J Im and JE Wagner [2012] Variable selection for hedonic model using machine learning approaches: A case study in Onondaga County, NY. Landscape and Urban Planning, 107, 293–306. https://doi.org/10.1016/j.landurbplan.2012.06.009 Crossref, Google Scholar
Zhang, P, S Hu, W Li, C Zhang, S Yang and S Qu [2021] Modeling fine-scale residential land price distribution: An experimental study using open data and machine learning. Applied Geography, 129, 102442. https://doi.org/10.1016/j.apgeog.2021.102442 Crossref, Google Scholar

Vol. 04, No. 01

Metrics

Downloaded 184 times

History

Received 16 August 2024

Revised 29 October 2024

Accepted 29 October 2024

Published: 26 December 2024

Information

This is an Open Access article published by World Scientific Publishing Company. It is distributed under the terms of the Creative Commons Attribution 4.0 (CC BY) License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

Keywords

PDF download