The main objective of this vignette is to present the results of the estimation of spatial SUR models using three alternative tools: Two R-packages, spsur (López, Mı́nguez, and Mur 2020; Minguez, Lopez, and Mur \noop{2022}forthcoming) and spse (Piras 2018) together with the Python spatial analysis library PySAL available at https://pysal.org/. Three SUR models are estimated in this vignette. The first one, is the baseline SUR-SIM model, a SUR model without spatial effects. This model will be estimated with spsur and PySAL. The second one is the SUR-SEM model; that is, a SUR model including a spatial lag in the errors. This model will be estimated by Maximum Likelihood (Anselin 1988), using the three alternative tools. The last one is the SUR-SLM model, including a spatial lag of the dependent variable, which will be estimated by the Three Stage Least Squares (3SLS) algorithm (López, Mı́nguez, and Mur 2020; Anselin 2016). This model will be estimated with spsur and PySAL.
The comparison is performed on the Baller et al. (2001) data set1. This is a well-known dataset downloaded from the GeoDa Data and Lab collection with information about homicide rates in 3,085 continental US counties for four years (1960, 1970, 1980, and 1990). The dataset includes a large number of socio-economic characteristics for these counties. This dataset is available from the spsur R-package with the name NCOVR and from PySAL with the name of NAT.dbf.
The model selected to illustrate the results of the estimation is the same as that included in the help area of PySAL.
This section considers the simplest case of the estimation of an SUR model without spatial effects: SUR-SIM. The model specification is the same as that which appears in the help of area of PySAL and we reproduce it in the equation @ref(eq:ncovr):
The R code to estimate equation @ref(eq:ncovr) with the spsur package is:
data("NCOVR", package = "spsur")
formula.spsur <- HR80 | HR90 ~ PS80 + UE80 | PS90 + UE90
control <- list(trace = FALSE)
spsur.sim <- spsurml(formula = formula.spsur, data = NCOVR.sf,
type = "sim", control = control)
The Python code to estimate equation @ref(eq:ncovr) with PySAL is:
db = pysal.open(pysal.examples.get_path("NAT.dbf"),'r')
y_var = ['HR80','HR90']
x_var = [['PS80','UE80'],['PS90','UE90']]
bigy,bigX,bigyvars,bigXvars = pysal.spreg.sur_utils.sur_dictxy(db,y_var,x_var)
w = pysal.knnW_from_shapefile(pysal.examples.get_path("NAT.shp"), k = 10)
w.transform = 'r'
pysal_sim = SUR(bigy,bigX,w=w,iter=True,
name_bigy=bigyvars,name_bigX=bigXvars,spat_diag=True,
name_ds="nat")
Note that the Python code includes the definition of the W matrix. Following Baller et al. (2001) we choose a W matrix based on the k-nearest-neighbors, with k = 10.
Table 2.1 shows the values of the coefficients and the standard error (in parentheses). The full output of both codes is shown in the Appendix. The main result is that no relevant differences are founded. The results of the estimations are similar, both in terms of the parameters and the standard errors, and only extremely small numerical differences appears.
β̂10 | β̂11 | β̂12 | β̂12 | β̂12 | β̂12 | |
---|---|---|---|---|---|---|
spsur | 5.1794 | 0.6775 | 0.2578 | 3.7811 | 1.0243 | 0.3614 |
(0.2595) | (0.1219) | (0.0338) | (0.2531) | (0.1133) | (0.0340) | |
PySAL | 5.1842 | 0.6776 | 0.2571 | 3.7973 | 1.0241 | 0.3590 |
(0.2594) | (0.1219) | (0.0338) | (0.2531) | (0.1133) | (0.0340) |
This section presents the results of the estimation of the SUR-SEM model by Maximum Likelihood with spsur, spse and PySAL. The formal expression of SUR-SEM includes a spatial structure in the residuals of the model @ref(eq:ncovr),
where W is the N × N spatial weighting matrix and λi (i=1,2) are the parameters of spatial dependence. The W matrix has been previously defined in the PySAL code using neighborhood criteria based on the k-nearest-neighbors with k = 10. This W matrix can be imported to the R environment and transformed into an listw object. The W matrix has been standardized in PySAL.
The R code to estimate the SUR-SEM model @ref(eq:sur-sem) with spsur is:
spsur.sem <- spsurml(formula = formula.spsur, data = NCOVR.sf,
listw = listw , type = "sem", control = control)
In order to estimate the same model with spse, the sf object NCOVR.sf must be reordered to transform the data set from a data frame into another one with a structure of panel data.
data <- data.frame(
index_indiv = factor(cbind(paste0("Indv_",rep(1:3085,
each = 2)))),
year = rep(c(1980,1990),3085),
HR = c(rbind(NCOVR.sf$HR80,NCOVR.sf$HR90)),
PS = c(rbind(NCOVR.sf$PS80,NCOVR.sf$PS90)),
UE = c(rbind(NCOVR.sf$UE80,NCOVR.sf$UE90)))
With this data frame, model @ref(eq:sur-sem) can be estimated with spse:
eq <- HR ~ PS + UE
formula.spse <- list(tp1 = eq, tp2 = eq)
spse.sem <- spseml(formula.spse, data = data,
w = listw, model = "error", quiet = TRUE)
Finally, the PySAL code to estimate SUR-SEM model @ref(eq:sur-sem) is:
from spreg import SURerrorML
pysal_sem = SURerrorML(bigy,bigX,w=w,name_bigy=bigyvars,name_bigX=bigXvars,
name_ds="NAT",name_w="nat_queen")
Table 3.1 shows the coefficients and the standard errors obtained in the estimation of equation @ref(eq:sur-sem) with the three alternatives. The results are very similar, and only small differences appear when the results of spse are compared with spsur and PySAL. The use of different optimization routines is a possible source of the these small numerical differences. For example, the spsur optimizes the concentrated Log-Likelihood with the bobyqa() from minqa (Bates et al. 2014) while spse uses nlminb() from the stats package (R Core Team 2020). The full output of both codes is shown in the Appendix.
β̂10 | β̂11 | β̂12 | β̂20 | β̂21 | β̂22 | λ̂1 | λ̂2 | |
---|---|---|---|---|---|---|---|---|
spsur | 3.9998 | 1.0185 | 0.4313 | 3.1256 | 1.1626 | 0.4532 | 0.6680 | 0.6252 |
(0.4256) | (0.1414) | (0.0436) | (0.3753) | (0.1354) | (0.0408) | (0.0216) | (0.0233) | |
spse | 4.0458 | 1.0090 | 0.4247 | 3.1730 | 1.1609 | 0.4462 | 0.6550 | 0.6245 |
(0.4179) | (0.1414) | (0.0435) | (0.3754) | (0.1357) | (0.0408) | (0.0187) | (0.0197) | |
PySAL | 3.9996 | 1.0186 | 0.4314 | 3.1253 | 1.1626 | 0.4533 | 0.6680 | 0.6252 |
(0.4256) | (0.1414) | (0.0436) | (0.3753) | (0.1354) | (0.0408) | (NA) | (NA) |
The option to estimate an SUR-SLM model with the 3SLS algorithm (Anselin 2016; López, Mı́nguez, and Mur 2020) is available with spsur and PySAL. The specification of this model in our case is:
where ρi (i=1,2) are the parameters of spatial dependence. The R code to estimate SUR-SLM (@ref(eq:sur-slm)) using the 3SLS algorithm is:
spsur.slm.3sls <- spsur3sls(formula = formula.spsur, data = NCOVR.sf,
listw = listw , type = "slm", trace = FALSE)
The code to estimate the equation (@ref(eq:sur-slm)) with PySAL is,
from spreg import SURlagIV
pysal_iv = SURlagIV(bigy,bigX,w=w,w_lags=2,name_bigy=bigyvars,
name_bigX=bigXvars,name_ds="NAT",name_w="nat_queen")
Note that the instruments used by default with the function spsur3sls() are the first two spatial lags of the independent variables (see López, Mı́nguez, and Mur (2020)) while the function SURlagIV() in PySAL considers only the first one by default (see Anselin (2016)). Therefore, to obtain equivalent results, it is necessary to include the option w_lags = 2 in the PySAL code.
Table 4.1 shows the coefficients and standard error (in parentheses) of the estimation with spsur and PySAL. As in the case of the SUR-SEM estimation, minimal differences are founded. The results are practically identical. The full output of both codes is shown in the Appendix.
β̂10 | β̂11 | β̂12 | β̂20 | β̂21 | β̂22 | ρ̂1 | ρ̂2 | |
---|---|---|---|---|---|---|---|---|
spsur | 3.6000 | 0.5932 | 0.2913 | 2.3915 | 0.8871 | 0.3626 | 0.1957 | 0.2230 |
(1.7539) | (0.1687) | (0.0409) | (0.3649) | (0.1155) | (0.0417) | (0.2611) | (0.0727) | |
PySAL | 3.6000 | 0.5932 | 0.2913 | 2.3915 | 0.8871 | 0.3626 | 0.1957 | 0.2230 |
(1.7536) | (0.1687) | (0.0409) | (0.3648) | (0.1155) | (0.0417) | (0.2610) | (0.0727) |
In this vignette, a numerical check to compare the results of several spatial SUR models estimations is shown. Fortunately, some functionalities of spsur are also available in the spse package and also in PySAL so the user can choose. The well-known data set (Baller et al. 2001) is used with the objective of comparing the values of the estimated coefficients and standard errors. The results confirm that the three alternatives supply identical outputs with extremely small numerical differences.
## Call:
## spsurml(formula = formula.spsur, data = NCOVR.sf, type = "sim",
## control = control)
##
##
## Spatial SUR model type: sim
##
## Equation 1
## Estimate Std. Error t value Pr(>|t|)
## (Intercept)_1 5.179417 0.259455 19.9627 < 2.2e-16 ***
## PS80_1 0.677534 0.121932 5.5567 2.865e-08 ***
## UE80_1 0.257775 0.033814 7.6233 2.846e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## R-squared: 0.02502
## Equation 2
## Estimate Std. Error t value Pr(>|t|)
## (Intercept)_2 3.781120 0.253129 14.938 < 2.2e-16 ***
## PS90_2 1.024287 0.113331 9.038 < 2.2e-16 ***
## UE90_2 0.361394 0.034047 10.614 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## R-squared: 0.1099
##
## Variance-Covariance Matrix of inter-equation residuals:
## 45.43619 21.56823
## 21.56823 39.72187
## Correlation Matrix of inter-equation residuals:
## 1.0000000 0.5076902
## 0.5076902 1.0000000
##
## R-sq. pooled: 0.06654
## Breusch-Pagan: 795.9 p-value: (4.3e-175)
## REGRESSION
## ----------
## SUMMARY OF OUTPUT: SEEMINGLY UNRELATED REGRESSIONS (SUR)
## --------------------------------------------------------
## Data set : nat
## Weights matrix : unknown
## Number of Equations : 2 Number of Observations: 3085
## Log likelihood (SUR): -19860.068 Number of Iterations : 3
## ----------
##
## SUMMARY OF EQUATION 1
## ---------------------
## Dependent Variable : HR80 Number of Variables : 3
## Mean dependent var : 6.9276 Degrees of Freedom : 3082
## S.D. dependent var : 6.8251
##
## ------------------------------------------------------------------------------------
## Variable Coefficient Std.Error z-Statistic Probability
## ------------------------------------------------------------------------------------
## Constant_1 5.1842323 0.2593924 19.9860602 0.0000000
## PS80 0.6775792 0.1219113 5.5579678 0.0000000
## UE80 0.2570650 0.0338051 7.6043173 0.0000000
## ------------------------------------------------------------------------------------
##
## SUMMARY OF EQUATION 2
## ---------------------
## Dependent Variable : HR90 Number of Variables : 3
## Mean dependent var : 6.1829 Degrees of Freedom : 3082
## S.D. dependent var : 6.6403
##
## ------------------------------------------------------------------------------------
## Variable Coefficient Std.Error z-Statistic Probability
## ------------------------------------------------------------------------------------
## Constant_2 3.7973181 0.2531089 15.0027035 0.0000000
## PS90 1.0241120 0.1133298 9.0365598 0.0000000
## UE90 0.3589567 0.0340440 10.5438928 0.0000000
## ------------------------------------------------------------------------------------
##
##
## REGRESSION DIAGNOSTICS
## TEST DF VALUE PROB
## LM test on Sigma 1 680.168 0.0000
## LR test on Sigma 1 854.181 0.0000
##
## OTHER DIAGNOSTICS - CHOW TEST BETWEEN EQUATIONS
## VARIABLES DF VALUE PROB
## Constant_1, Constant_2 1 23.457 0.0000
## PS80, PS90 1 8.700 0.0032
## UE80, UE90 1 6.843 0.0089
##
## DIAGNOSTICS FOR SPATIAL DEPENDENCE
## TEST DF VALUE PROB
## Lagrange Multiplier (error) 2 2278.632 0.0000
## Lagrange Multiplier (lag) 2 2153.976 0.0000
##
## ERROR CORRELATION MATRIX
## EQUATION 1 EQUATION 2
## 1.000000 0.507913
## 0.507913 1.000000
## ================================ END OF REPORT =====================================
## Call:
## spsurml(formula = formula.spsur, data = NCOVR.sf, listw = listw,
## type = "sem", control = control)
##
##
## Spatial SUR model type: sem
##
## Equation 1
## Estimate Std. Error t value Pr(>|t|)
## (Intercept)_1 3.999783 0.425587 9.3983 < 2.2e-16 ***
## PS80_1 1.018523 0.141382 7.2040 6.544e-13 ***
## UE80_1 0.431340 0.043645 9.8829 < 2.2e-16 ***
## lambda_1 0.667989 0.021592 30.9374 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## R-squared: 0.3561
## Equation 2
## Estimate Std. Error t value Pr(>|t|)
## (Intercept)_2 3.125570 0.375304 8.3281 < 2.2e-16 ***
## PS90_2 1.162587 0.135392 8.5868 < 2.2e-16 ***
## UE90_2 0.453225 0.040766 11.1178 < 2.2e-16 ***
## lambda_2 0.625186 0.023276 26.8594 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## R-squared: 0.3724
##
## Variance-Covariance Matrix of inter-equation residuals:
## 30.711725 8.751191
## 8.751191 28.408108
## Correlation Matrix of inter-equation residuals:
## 1.0000000 0.2962743
## 0.2962743 1.0000000
##
## R-sq. pooled: 0.3658
## Breusch-Pagan: 270.8 p-value: (7.63e-61)
## LMM: 267.61 p-value: (3.76e-60)
##
## Simultaneous Equations Model:
##
## Call:
## spseml(formula = formula.spse, data = data, w = listw, quiet = TRUE,
## model = "error")
##
## Equation 1
## Estimate Std.Error t value Pr(>|t|)
## (Intercept) 4.045752 0.417886 9.6815 < 2.2e-16 ***
## PS 1.009015 0.141429 7.1344 9.719e-13 ***
## UE 0.424679 0.043513 9.7598 < 2.2e-16 ***
##
## Spatial autocorrelation coefficient: 0.655 Pr(>|t|) 0
##
## _______________________________________________________
##
## Equation 2
## Estimate Std.Error t value Pr(>|t|)
## (Intercept) 3.172955 0.375401 8.4522 < 2.2e-16 ***
## PS 1.160865 0.135689 8.5553 < 2.2e-16 ***
## UE 0.446204 0.040767 10.9452 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Spatial autocorrelation coefficient: 0.6245 Pr(>|t|) 0
##
## _______________________________________________________
## REGRESSION
## ----------
## SUMMARY OF OUTPUT: SEEMINGLY UNRELATED REGRESSIONS (SUR) - ML SPATIAL ERROR MODEL
## ---------------------------------------------------------------------------------
## Data set : NAT
## Weights matrix : nat_queen
## Number of Equations : 2 Number of Observations: 3085
## Log likelihood (SUR): -19860.068
## Log likel. (error) : -19344.228 Log likel. (SUR error): -19215.933
## ----------
##
## SUMMARY OF EQUATION 1
## ---------------------
## Dependent Variable : HR80 Number of Variables : 3
## Mean dependent var : 6.9276 Degrees of Freedom : 3082
## S.D. dependent var : 6.8251
##
## ------------------------------------------------------------------------------------
## Variable Coefficient Std.Error z-Statistic Probability
## ------------------------------------------------------------------------------------
## Constant_1 3.9995930 0.4255575 9.3984776 0.0000000
## PS80 1.0185647 0.1413612 7.2054039 0.0000000
## UE80 0.4313678 0.0436389 9.8849267 0.0000000
## lambda_1 0.6680433
## ------------------------------------------------------------------------------------
##
## SUMMARY OF EQUATION 2
## ---------------------
## Dependent Variable : HR90 Number of Variables : 3
## Mean dependent var : 6.1829 Degrees of Freedom : 3082
## S.D. dependent var : 6.6403
##
## ------------------------------------------------------------------------------------
## Variable Coefficient Std.Error z-Statistic Probability
## ------------------------------------------------------------------------------------
## Constant_2 3.1253307 0.3752706 8.3282066 0.0000000
## PS90 1.1625986 0.1353719 8.5881796 0.0000000
## UE90 0.4532595 0.0407597 11.1202836 0.0000000
## lambda_2 0.6252406
## ------------------------------------------------------------------------------------
##
##
## REGRESSION DIAGNOSTICS
## TEST DF VALUE PROB
## LR test on Sigma 1 256.591 0.0000
##
## OTHER DIAGNOSTICS - CHOW TEST BETWEEN EQUATIONS
## VARIABLES DF VALUE PROB
## Constant_1, Constant_2 1 3.111 0.0778
## PS80, PS90 1 0.767 0.3812
## UE80, UE90 1 0.165 0.6847
##
## ERROR CORRELATION MATRIX
## EQUATION 1 EQUATION 2
## 1.000000 0.296257
## 0.296257 1.000000
## ================================ END OF REPORT =====================================
## Call:
## spsur3sls(formula = formula.spsur, data = NCOVR.sf, listw = listw,
## type = "slm", trace = FALSE)
##
##
## Spatial SUR model type: slm
##
## Equation 1
## Estimate Std. Error t value Pr(>|t|)
## (Intercept)_1 3.600033 1.753859 2.0526 0.040150 *
## PS80_1 0.593229 0.168744 3.5156 0.000442 ***
## UE80_1 0.291323 0.040887 7.1250 1.16e-12 ***
## rho_1 0.195653 0.261078 0.7494 0.453643
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## R-squared: 0.1995
## Equation 2
## Estimate Std. Error t value Pr(>|t|)
## (Intercept)_2 2.391472 0.364855 6.5546 6.033e-11 ***
## PS90_2 0.887118 0.115519 7.6794 1.848e-14 ***
## UE90_2 0.362642 0.041738 8.6885 < 2.2e-16 ***
## rho_2 0.222994 0.072671 3.0686 0.00216 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## R-squared: 0.2807
##
## Variance-Covariance Matrix of inter-equation residuals:
## 59.15973 21.01769
## 21.01769 34.68056
## Correlation Matrix of inter-equation residuals:
## 1.0000000 0.4397986
## 0.4397986 1.0000000
##
## R-sq. pooled: 0.2385
## Breusch-Pagan: 664.2 p-value: (1.8e-146)
## REGRESSION
## ----------
## SUMMARY OF OUTPUT: SEEMINGLY UNRELATED REGRESSIONS (SUR) - SPATIAL LAG MODEL
## ----------------------------------------------------------------------------
## Data set : NAT
## Weights matrix : nat_queen
## Number of Equations : 2 Number of Observations: 3085
## ----------
##
## SUMMARY OF EQUATION 1
## ---------------------
## Dependent Variable : HR80 Number of Variables : 4
## Mean dependent var : 6.9276 Degrees of Freedom : 3081
## S.D. dependent var : 6.8251
##
## ------------------------------------------------------------------------------------
## Variable Coefficient Std.Error z-Statistic Probability
## ------------------------------------------------------------------------------------
## Constant_1 3.6000333 1.7535743 2.0529688 0.0400756
## PS80 0.5932292 0.1687164 3.5161321 0.0004379
## UE80 0.2913228 0.0408809 7.1261406 0.0000000
## W_HR80 0.1956528 0.2610360 0.7495244 0.4535412
## ------------------------------------------------------------------------------------
## Instrumented: W_HR80
## Instruments: WW_PS80, WW_UE80, W_PS80, W_UE80
##
## SUMMARY OF EQUATION 2
## ---------------------
## Dependent Variable : HR90 Number of Variables : 4
## Mean dependent var : 6.1829 Degrees of Freedom : 3081
## S.D. dependent var : 6.6403
##
## ------------------------------------------------------------------------------------
## Variable Coefficient Std.Error z-Statistic Probability
## ------------------------------------------------------------------------------------
## Constant_2 2.3914725 0.3647963 6.5556384 0.0000000
## PS90 0.8871183 0.1155004 7.6806495 0.0000000
## UE90 0.3626418 0.0417315 8.6898895 0.0000000
## W_HR90 0.2229941 0.0726587 3.0690610 0.0021473
## ------------------------------------------------------------------------------------
## Instrumented: W_HR90
## Instruments: WW_PS90, WW_UE90, W_PS90, W_UE90
##
##
## OTHER DIAGNOSTICS - CHOW TEST BETWEEN EQUATIONS
## VARIABLES DF VALUE PROB
## Constant_1, Constant_2 1 0.496 0.4811
## PS80, PS90 1 3.288 0.0698
## UE80, UE90 1 1.945 0.1631
## W_HR80, W_HR90 1 0.011 0.9147
##
## ERROR CORRELATION MATRIX
## EQUATION 1 EQUATION 2
## 1.000000 0.464012
## 0.464012 1.000000
## ================================ END OF REPORT =====================================