--- title: "spsur vs spatialreg" subtitle: "Uniequational model


" author: - Román Mínguez, University of Castilla-La Mancha (Spain) - Fernando A. López, Technical University of Cartagena (Spain) - Jesús Mur, University of Zaragoza, (Spain)


date: "2022-04-22 17:52:04


" output: bookdown::html_document2: df_print: kable highlight: tango number_sections: yes theme: flatly toc: yes toc_depth: 2 toc_float: collapsed: no smooth_scroll: yes toc_title: spsur vs spse # code_folding: hide bibliography: bibliosure.bib link-citations: yes linkcolor: red vignette: > %\VignetteIndexEntry{spsur vs spatialreg} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} --- ```r library(spsur) library(spdep) library(spatialreg) library(sf) ``` # Introduction The **spsur** R-package (@spsur_jss_forthcoming) has been designed for estimate spatial regression models in a multiequational framework. However, because of its flexibility, it is also possible to obtain useful results for uniequational models. On the oher hand, the **spatialreg** package ( @BivandPedesma2013; @BivandPiras2015), is the most adequate alternative for working with uniequational models, in a pure cross-sectional setting. The purpose of this vignette is to compare the results of **spsur** and **spatialreg** in case of *uniequational models*. As we will see, the differences between the two are negligible. In sum, the purpose of this vignette is to compare the results of **spsur** and **spatialreg** in the following issues: * Estimation of spatially independent model, or SIM models. * Lagrange Multiplier misspecification tests for SIM models. * ML estimation of SLM, SEM, SARAR models. * 3SLS estimation of SLM and SDM models. # The data set NCOVR Throughout the vignette we use the dataset NCOVR (National Consortium on Violence Research). These data were employed by @Baller2001 to analyze the incidence of homicides rates in the US counties. The dataset can be freely dowloaded from **https://geodacenter.github.io/data-and-lab/ncovr/** NCOVR contains 3085 spatial units (counties), for 4 different cross-sections (1960, 1970, 1980, 1990) and 69 variables. According to @Baller2001, the main variables of interest are: * HR: homicide rate per 100000 inhabitants * RD: resource deprivation * PS: population structure * MA: median age * DV: divorce rate (% males over 14 divorced) * UE: unemployment rate * SOUTH: dummy variable for Southern counties (South = 1) First, we can read the NCOVR dataset as a simple feature (sf) object (named NCOVR.sf), ```r data(NCOVR, package = "spsur") ``` The first three observations in NCOVR appear below: ```r NCOVR <- st_drop_geometry(NCOVR.sf) knitr::kable( head((NCOVR[1:3, 1:6])), caption = 'First observations of NCOVR dataset' ) ``` Table: First observations of NCOVR dataset |NAME |STATE_NAME |FIPS | SOUTH| HR60| HR70| |:-----------------|:----------|:-----|-----:|--------:|--------:| |Lake of the Woods |Minnesota |27077 | 0| 0.000000| 0.000000| |Ferry |Washington |53019 | 0| 0.000000| 0.000000| |Stevens |Washington |53065 | 0| 1.863863| 1.915158| Whereas the geometry of the USA counties is shown in Figure \@ref(fig:plotgeom): ```r plot(st_geometry(NCOVR.sf)) ``` Geometry of the USA counties Following @Baller2001, we consider a **W** matrix based on the criterion of 10 nearest-neighbourhood, which is immediate to obtain using the **spdep** package (@BivandPedesma2013; @BivandWong2018). The resulting weighting matrix will be called *listw*. Note that this matrix is non-symmetric and it is row-standardized. ```r # Obtain coordinates of centroids co <- sf::st_coordinates(sf::st_centroid(NCOVR.sf)) listw <- spdep::nb2listw(spdep::knn2nb(spdep::knearneigh(co, k = 10,longlat = TRUE))) ``` @Baller2001 specify a single linear model to explain the case of HR in the year 1960 with the results shown in the Table below, \begin{equation} HR_{60} = \beta_{0}+\beta_{1}RD_{60}+\beta_{2}PS_{60}+\beta_{3}MA_{60}+\beta_{4}DV_{60} +\beta_{5}UE_{60} +\beta_{6}SOUTH+\epsilon_{60} (\#eq:ols) \end{equation} ```r formula_60 <- HR60 ~ RD60 + PS60 + MA60 + DV60 + UE60 + SOUTH lm_60 <- stats::lm(formula = formula_60, data = NCOVR.sf) summary(lm_60) ``` ``` #> #> Call: #> stats::lm(formula = formula_60, data = NCOVR.sf) #> #> Residuals: #> Min 1Q Median 3Q Max #> -14.026 -2.217 -0.635 1.393 88.312 #> #> Coefficients: #> Estimate Std. Error t value Pr(>|t|) #> (Intercept) 8.12591 0.63465 12.804 < 2e-16 #> RD60 1.79824 0.12341 14.571 < 2e-16 #> PS60 0.35871 0.09216 3.892 0.000101 #> MA60 -0.23047 0.01932 -11.931 < 2e-16 #> DV60 1.16002 0.09483 12.233 < 2e-16 #> UE60 -0.06195 0.03515 -1.762 0.078138 #> SOUTH 2.63862 0.23325 11.312 < 2e-16 #> #> (Intercept) *** #> RD60 *** #> PS60 *** #> MA60 *** #> DV60 *** #> UE60 . #> SOUTH *** #> --- #> Signif. codes: #> 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 #> #> Residual standard error: 4.743 on 3078 degrees of freedom #> Multiple R-squared: 0.2966, Adjusted R-squared: 0.2952 #> F-statistic: 216.3 on 6 and 3078 DF, p-value: < 2.2e-16 ``` The model can be estimated in usual way with the function `stats::lm` (@R2019), ```r formula_60 <- HR60 ~ RD60 + PS60 + MA60 + DV60 + UE60 + SOUTH lm_60 <- lm(formula = formula_60, data = NCOVR.sf) summary(lm_60) ``` ``` #> #> Call: #> lm(formula = formula_60, data = NCOVR.sf) #> #> Residuals: #> Min 1Q Median 3Q Max #> -14.026 -2.217 -0.635 1.393 88.312 #> #> Coefficients: #> Estimate Std. Error t value Pr(>|t|) #> (Intercept) 8.12591 0.63465 12.804 < 2e-16 #> RD60 1.79824 0.12341 14.571 < 2e-16 #> PS60 0.35871 0.09216 3.892 0.000101 #> MA60 -0.23047 0.01932 -11.931 < 2e-16 #> DV60 1.16002 0.09483 12.233 < 2e-16 #> UE60 -0.06195 0.03515 -1.762 0.078138 #> SOUTH 2.63862 0.23325 11.312 < 2e-16 #> #> (Intercept) *** #> RD60 *** #> PS60 *** #> MA60 *** #> DV60 *** #> UE60 . #> SOUTH *** #> --- #> Signif. codes: #> 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 #> #> Residual standard error: 4.743 on 3078 degrees of freedom #> Multiple R-squared: 0.2966, Adjusted R-squared: 0.2952 #> F-statistic: 216.3 on 6 and 3078 DF, p-value: < 2.2e-16 ``` The same results can be obtained with the function `spsurml` from **spsur**, selecting the argument *type = "sim"* ```r ols.spsur <- spsurml(formula = formula_60, type = "sim", data = NCOVR.sf) ``` ``` #> Initial point: #> log_lik: -9176.338 #> Iteration: 1 log_lik: -9176.338 #> Time to fit the model: 0.05 seconds #> Time to compute covariances: 0.01 seconds ``` ```r summary(ols.spsur) ``` ``` #> Call: #> spsurml(formula = formula_60, data = NCOVR.sf, type = "sim") #> #> #> Spatial SUR model type: sim #> #> Equation 1 #> Estimate Std. Error t value #> (Intercept)_1 8.125915 0.634033 12.8162 #> RD60_1 1.798240 0.123294 14.5850 #> PS60_1 0.358706 0.092069 3.8960 #> MA60_1 -0.230475 0.019298 -11.9428 #> DV60_1 1.160020 0.094737 12.2446 #> UE60_1 -0.061948 0.035120 -1.7639 #> SOUTH_1 2.638618 0.233024 11.3234 #> Pr(>|t|) #> (Intercept)_1 < 2.2e-16 *** #> RD60_1 < 2.2e-16 *** #> PS60_1 9.986e-05 *** #> MA60_1 < 2.2e-16 *** #> DV60_1 < 2.2e-16 *** #> UE60_1 0.07785 . #> SOUTH_1 < 2.2e-16 *** #> --- #> Signif. codes: #> 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 #> R-squared: 0.2966 #> #> Residual standard error: 4.739 ``` The two functions, `stats::lm` and `spsurml`, produce identical results. The output of `spsurml` is shorter than that of `lm` so, depending on the necessities of the user, he/she can choose between `lm` and `spsurml` without loss of information. If you want all the estimation details, then use `lm`. # LM tests for spatial autocorrelation The existence of omitted spatial dependence in the results of a SIM model, estimated by LS, can be tested by using the classical LM tests. The function `spdep::lm.LMtests` report the values of these Lagrange Multipliers ```r lmtest.spdep <- spdep::lm.LMtests(lm_60, listw, test = "all") print(lmtest.spdep) ``` ``` #> #> Lagrange multiplier diagnostics for spatial #> dependence #> #> data: #> model: lm(formula = formula_60, data = #> NCOVR.sf) #> weights: listw #> #> LMerr = 208.32, df = 1, p-value < 2.2e-16 #> #> #> Lagrange multiplier diagnostics for spatial #> dependence #> #> data: #> model: lm(formula = formula_60, data = #> NCOVR.sf) #> weights: listw #> #> LMlag = 232.36, df = 1, p-value < 2.2e-16 #> #> #> Lagrange multiplier diagnostics for spatial #> dependence #> #> data: #> model: lm(formula = formula_60, data = #> NCOVR.sf) #> weights: listw #> #> RLMerr = 1.1473, df = 1, p-value = 0.2841 #> #> #> Lagrange multiplier diagnostics for spatial #> dependence #> #> data: #> model: lm(formula = formula_60, data = #> NCOVR.sf) #> weights: listw #> #> RLMlag = 25.19, df = 1, p-value = 5.196e-07 #> #> #> Lagrange multiplier diagnostics for spatial #> dependence #> #> data: #> model: lm(formula = formula_60, data = #> NCOVR.sf) #> weights: listw #> #> SARMA = 233.51, df = 2, p-value < 2.2e-16 ``` The same tests can be obtained with the function `lmtestspsur` ```r lmtest.spsur <- lmtestspsur(formula = formula_60, listw = listw, data = NCOVR.sf) print(lmtest.spsur) ``` ``` #> [[1]] #> #> LM-SUR-SLM #> #> data: NCOVR.sf #> LM-stat = 232.22, df = 1, p-value < 2.2e-16 #> #> #> [[2]] #> #> LM-SUR-SEM #> #> data: NCOVR.sf #> LM-stat = 208.19, df = 1, p-value < 2.2e-16 #> #> #> [[3]] #> #> LM*-SUR-SLM #> #> data: NCOVR.sf #> LM-stat = 25.181, df = 1, p-value = #> 5.218e-07 #> #> #> [[4]] #> #> LM*-SUR-SEM #> #> data: NCOVR.sf #> LM-stat = 1.1431, df = 1, p-value = 0.285 #> #> #> [[5]] #> #> LM-SUR-SARAR #> #> data: NCOVR.sf #> LM-stat = 233.37, df = 2, p-value < 2.2e-16 ``` Note that the ordering of the battery of Lagrange Multipliers is not the same. Otherwise, the results are almost identical. # The Spatial Lag Model Both R packages **spatialreg** and **spsur** can estimate Spatial Lag Models for a single cross-section. Continuing with the example before, the model that we want to estimate is: $$ \begin{equation} HR_{60} = \rho W HR_{60} +\beta_{0}+\beta_{1}RD_{60}+\beta_{2}PS_{60}+\beta_{3}MA_{60}+\beta_{4}DV_{60} +\beta_{5}UE_{60} +\beta_{6}SOUTH+\epsilon_{60}\ (\#eq:slm) \end{equation} $$ The ML estimation of equation \@ref(eq:slm) using the function `spsurml()` of **spsur** renders the following results: ```r slm.spsur <- spsur::spsurml(formula = formula_60, type = "slm", listw = listw, data = NCOVR.sf) ``` ``` #> neighbourhood matrix eigenvalues #> Computing eigenvalues ... #> #> Initial point: log_lik: -9101.642 rhos: 0.364 #> Iteration: 1 log_lik: -9098.484 rhos: 0.371 #> Iteration: 2 log_lik: -9098.484 rhos: 0.371 #> Time to fit the model: 1.06 seconds #> Time to compute covariances: 5.25 seconds ``` ```r summary(slm.spsur) ``` ``` #> Call: #> spsur::spsurml(formula = formula_60, data = NCOVR.sf, listw = listw, #> type = "slm") #> #> #> Spatial SUR model type: slm #> #> Equation 1 #> Estimate Std. Error t value #> (Intercept)_1 5.473546 0.647627 8.4517 #> RD60_1 1.357784 0.125733 10.7989 #> PS60_1 0.284344 0.089317 3.1835 #> MA60_1 -0.165429 0.019384 -8.5342 #> DV60_1 0.884949 0.093721 9.4424 #> UE60_1 -0.021774 0.034003 -0.6404 #> SOUTH_1 1.354664 0.244606 5.5382 #> rho_1 0.370632 0.030271 12.2436 #> Pr(>|t|) #> (Intercept)_1 < 2.2e-16 *** #> RD60_1 < 2.2e-16 *** #> PS60_1 0.001469 ** #> MA60_1 < 2.2e-16 *** #> DV60_1 < 2.2e-16 *** #> UE60_1 0.521983 #> SOUTH_1 3.314e-08 *** #> rho_1 < 2.2e-16 *** #> --- #> Signif. codes: #> 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 #> R-squared: 0.3409 #> #> Residual standard error: 4.588 #> LMM: 22.483 p-value: (2.12e-06) ``` The output of the function `spatialreg::lagsarlm()` from **spatialreg** is ```r slm.spatialreg <- spatialreg::lagsarlm(formula = formula_60, listw = listw, type = "lag", data = NCOVR.sf) summary(slm.spatialreg) ``` ``` #> #> Call: #> spatialreg::lagsarlm(formula = formula_60, data = NCOVR.sf, listw = listw, #> type = "lag") #> #> Residuals: #> Min 1Q Median 3Q Max #> -13.54486 -2.11156 -0.63891 1.30728 88.57308 #> #> Type: lag #> Coefficients: (asymptotic standard errors) #> Estimate Std. Error z value Pr(>|z|) #> (Intercept) 5.473303 0.647529 8.4526 < 2.2e-16 #> RD60 1.357744 0.125714 10.8002 < 2.2e-16 #> PS60 0.284338 0.089303 3.1840 0.001453 #> MA60 -0.165423 0.019381 -8.5352 < 2.2e-16 #> DV60 0.884924 0.093706 9.4436 < 2.2e-16 #> UE60 -0.021771 0.033998 -0.6404 0.521938 #> SOUTH 1.354546 0.244570 5.5385 3.051e-08 #> #> Rho: 0.37067, LR test value: 155.71, p-value: < 2.22e-16 #> Asymptotic standard error: 0.03027 #> z-value: 12.245, p-value: < 2.22e-16 #> Wald statistic: 149.95, p-value: < 2.22e-16 #> #> Log likelihood: -9098.484 for lag model #> ML residual variance (sigma squared): 21.041, (sigma: 4.5871) #> Number of observations: 3085 #> Number of parameters estimated: 9 #> AIC: 18215, (AIC for lm: 18369) #> LM test for residual autocorrelation #> test value: 22.524, p-value: 2.0753e-06 ``` Once again, the two estimations are almost the same. The estimated log-likelihood of the SLM can be recovered using the function `logLik()` ```r logLik(slm.spsur) ``` ``` #> 'log Lik.' -9098.484 (df=9) ``` The log-likelihood corresponding to the SIM model is much lower, -9176.338, which points to a severe misspecification in the last model. More formally, the LR, obtained with the function `anova` of **spsur** strongly rejects the SIM model in favour of the SLM alternative; the AIC and BIC statistics indicates the same. ```r anova(ols.spsur, slm.spsur) ``` ``` #> logLik df AIC BIC LRtest #> model 1: sim -9176.3 8 18369 18353 #> model 2: slm -9098.5 9 18215 18197 155.71 #> p.val #> model 1: sim #> model 2: slm 9.805e-36 ``` The SLM model can also be estimate by Three Stages-Least-Squares (3SLS) by both R-packages. The function for **spsur** is `spsur3sls()` ```r slm.3sls.spsur <- spsur3sls(formula = formula_60, type = "slm", listw = listw, data = NCOVR.sf) ``` ``` #> Time to fit the model: 0.05 seconds ``` ```r summary(slm.3sls.spsur) ``` ``` #> Call: #> spsur3sls(formula = formula_60, data = NCOVR.sf, listw = listw, #> type = "slm") #> #> #> Spatial SUR model type: slm #> #> Equation 1 #> Estimate Std. Error t value #> (Intercept)_1 3.99354886 0.80980919 4.9315 #> RD60_1 1.11201434 0.14804917 7.5111 #> PS60_1 0.24285142 0.09010103 2.6953 #> MA60_1 -0.12913458 0.02271825 -5.6842 #> DV60_1 0.73146230 0.10670843 6.8548 #> UE60_1 0.00064201 0.03483656 0.0184 #> SOUTH_1 0.63822931 0.34132155 1.8699 #> rho_1 0.57744206 0.07411084 7.7916 #> Pr(>|t|) #> (Intercept)_1 8.594e-07 *** #> RD60_1 7.638e-14 *** #> PS60_1 0.00707 ** #> MA60_1 1.437e-08 *** #> DV60_1 8.595e-12 *** #> UE60_1 0.98530 #> SOUTH_1 0.06160 . #> rho_1 8.979e-15 *** #> --- #> Signif. codes: #> 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 #> R-squared: 0.3447 #> #> Residual standard error: 4.574 ``` Whereas **spatialreg** uses the function `spatialreg::stsls()` ```r slm.3sls.spatialreg <- spatialreg::stsls(formula = formula_60, listw = listw, data = NCOVR.sf) summary(slm.3sls.spatialreg) ``` ``` #> #> Call: #> spatialreg::stsls(formula = formula_60, data = NCOVR.sf, listw = listw) #> #> Residuals: #> Min 1Q Median 3Q Max #> -14.79844 -2.06529 -0.57084 1.22908 88.71872 #> #> Coefficients: #> Estimate Std. Error t value #> Rho 0.57744206 0.07419509 7.7828 #> (Intercept) 3.99354886 0.81072980 4.9259 #> RD60 1.11201434 0.14821748 7.5026 #> PS60 0.24285142 0.09020346 2.6923 #> MA60 -0.12913458 0.02274408 -5.6777 #> DV60 0.73146230 0.10682974 6.8470 #> UE60 0.00064201 0.03487617 0.0184 #> SOUTH 0.63822931 0.34170958 1.8678 #> Pr(>|t|) #> Rho 7.105e-15 #> (Intercept) 8.399e-07 #> RD60 6.262e-14 #> PS60 0.007097 #> MA60 1.365e-08 #> DV60 7.542e-12 #> UE60 0.985313 #> SOUTH 0.061796 #> #> Residual variance (sigma squared): 20.966, (sigma: 4.5788) ``` There is hardly any difference between them because the estimation algorithm is quasilinear and both functions use the same set of instruments. The case of the SDM model produces identical results. # Spatial Error Model The model to estimate in this case is: $$ \begin{equation} HR_{60} = \beta_{0}+\beta_{1}RD_{60}+\beta_{2}PS_{60}+\beta_{3}MA_{60}+\beta_{4}DV_{60} +\beta_{5}UE_{60} +\beta_{6}SOUTH+ u_{60} \\ u_{60} = \lambda W u_{60} + \epsilon_{60}\ (\#eq:sem) \end{equation} $$ which can be solved by ML using the function `spatialreg::errorsarlm()`, from **spatialreg**. ```r sem.spatialreg <- spatialreg::errorsarlm(formula = formula_60, listw = listw, data = NCOVR.sf) summary(sem.spatialreg) ``` ``` #> #> Call: #> spatialreg::errorsarlm(formula = formula_60, data = NCOVR.sf, #> listw = listw) #> #> Residuals: #> Min 1Q Median 3Q Max #> -13.54039 -2.10737 -0.66853 1.27843 88.31442 #> #> Type: error #> Coefficients: (asymptotic standard errors) #> Estimate Std. Error z value Pr(>|z|) #> (Intercept) 7.527324 0.766690 9.8179 < 2.2e-16 #> RD60 1.742345 0.148640 11.7219 < 2.2e-16 #> PS60 0.346998 0.109230 3.1768 0.001489 #> MA60 -0.208238 0.023391 -8.9024 < 2.2e-16 #> DV60 0.957270 0.108387 8.8320 < 2.2e-16 #> UE60 0.018167 0.039935 0.4549 0.649165 #> SOUTH 2.494740 0.316266 7.8881 3.109e-15 #> #> Lambda: 0.38258, LR test value: 140.59, p-value: < 2.22e-16 #> Asymptotic standard error: 0.032059 #> z-value: 11.934, p-value: < 2.22e-16 #> Wald statistic: 142.41, p-value: < 2.22e-16 #> #> Log likelihood: -9106.043 for error model #> ML residual variance (sigma squared): 21.123, (sigma: 4.596) #> Number of observations: 3085 #> Number of parameters estimated: 9 #> AIC: 18230, (AIC for lm: 18369) ``` **spsur** always uses the same function for the ML algorithm, `spsurml()`; you only have to change the type of model to estimate. ```r sem.spsur <- spsurml(formula = formula_60, type = "sem", listw = listw, data = NCOVR.sf) ``` ``` #> neighbourhood matrix eigenvalues #> Computing eigenvalues ... #> #> Initial point: log_lik: -9108.833 lambdas: 0.375 #> Iteration: 1 log_lik: -9106.044 lambdas: 0.382 #> Iteration: 2 log_lik: -9106.043 lambdas: 0.383 #> Time to fit the model: 1.89 seconds #> Time to compute covariances: 66.02 seconds ``` ```r summary(sem.spsur) ``` ``` #> Call: #> spsurml(formula = formula_60, data = NCOVR.sf, listw = listw, #> type = "sem") #> #> #> Spatial SUR model type: sem #> #> Equation 1 #> Estimate Std. Error t value #> (Intercept)_1 7.527394 0.766797 9.8167 #> RD60_1 1.742351 0.148661 11.7203 #> PS60_1 0.346996 0.109246 3.1763 #> MA60_1 -0.208241 0.023395 -8.9013 #> DV60_1 0.957295 0.108403 8.8309 #> UE60_1 0.018158 0.039940 0.4546 #> SOUTH_1 2.494771 0.316304 7.8873 #> lambda_1 0.382536 0.032105 11.9151 #> Pr(>|t|) #> (Intercept)_1 < 2.2e-16 *** #> RD60_1 < 2.2e-16 *** #> PS60_1 0.001507 ** #> MA60_1 < 2.2e-16 *** #> DV60_1 < 2.2e-16 *** #> UE60_1 0.649409 #> SOUTH_1 4.255e-15 *** #> lambda_1 < 2.2e-16 *** #> --- #> Signif. codes: #> 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 #> R-squared: 0.3388 #> #> Residual standard error: 4.597 #> LMM: 1.1852 p-value: (0.276) ``` The LR test for nested models, in this case SIM vs SEM, can be obtained as before using the function `anova()` ```r anova(ols.spsur, sem.spsur) ``` ``` #> logLik df AIC BIC LRtest #> model 1: sim -9176.3 8 18369 18353 #> model 2: sem -9106.0 9 18230 18212 140.59 #> p.val #> model 1: sim #> model 2: sem 1.9786e-32 ``` Clearly, the SEM model is preferable to the SIM specification because there is spatial dependence in the data, which has been effectively captured by the SEM mechanism. Moreover, according to the LMM test below (this is a Marginal Lagrange Multiplier of the type $LM(\rho|\lambda)$; see the *spsur user guide*), once the spatial errors are introduced in the equation, the spatial lag of the endogenous variable, $WHR_{60}$ is not statistically significant. ```r print(sem.spsur$LMM) ``` ``` #> [1] 1.185223 ``` # Spatial Simultaneous Autoregressive Model (SARAR) The especification of the SARAR model is as usual $$ \begin{equation} HR_{60} = \rho W HR_{60} + \beta_{0}+\beta_{1}RD_{60}+\beta_{2}PS_{60}+\beta_{3}MA_{60}+\beta_{4}DV_{60} +\beta_{5}UE_{60} +\beta_{6}SOUTH+ u_{60} \\ u_{60} = \lambda W u_{60} + \epsilon_{60}\ (\#eq:sarar) \end{equation} $$ **spsur** estimates this model by using the funtion `spsurml()`; you only have to adjust the argument *type* to *sarar*, that is ```r sarar.spsur <- spsurml(formula = formula_60, listw = listw, type ="sarar",data = NCOVR.sf) ``` ``` #> neighbourhood matrix eigenvalues #> Computing eigenvalues ... #> #> Initial point: log_lik: -9085.318 rhos: 0.663 lambdas: -0.603 #> Iteration: 1 log_lik: -9062.748 rhos: 0.748 lambdas: -0.837 #> Iteration: 2 log_lik: -9060.355 rhos: 0.77 lambdas: -0.905 #> Iteration: 3 log_lik: -9060.132 rhos: 0.776 lambdas: -0.925 #> Iteration: 4 log_lik: -9060.111 rhos: 0.777 lambdas: -0.931 #> Iteration: 5 log_lik: -9060.109 rhos: 0.778 lambdas: -0.933 #> Iteration: 6 log_lik: -9060.108 rhos: 0.778 lambdas: -0.933 #> Time to fit the model: 42.97 seconds #> Time to compute covariances: 10.36 seconds ``` ```r summary(sarar.spsur) ``` ``` #> Call: #> spsurml(formula = formula_60, data = NCOVR.sf, listw = listw, #> type = "sarar") #> #> #> Spatial SUR model type: sarar #> #> Equation 1 #> Estimate Std. Error t value #> (Intercept)_1 2.421946 0.391455 6.1870 #> RD60_1 0.657214 0.079674 8.2488 #> PS60_1 0.149669 0.054110 2.7660 #> MA60_1 -0.079578 0.011837 -6.7230 #> DV60_1 0.500917 0.063113 7.9369 #> UE60_1 -0.031826 0.021372 -1.4892 #> SOUTH_1 0.260660 0.127846 2.0389 #> rho_1 0.777937 0.018114 42.9474 #> lambda_1 -0.933030 0.083248 -11.2078 #> Pr(>|t|) #> (Intercept)_1 6.942e-10 *** #> RD60_1 2.344e-16 *** #> PS60_1 0.005708 ** #> MA60_1 2.114e-11 *** #> DV60_1 2.880e-15 *** #> UE60_1 0.136550 #> SOUTH_1 0.041550 * #> rho_1 < 2.2e-16 *** #> lambda_1 < 2.2e-16 *** #> --- #> Signif. codes: #> 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 #> R-squared: 0.4441 #> #> Residual standard error: 4.24 ``` In the case of **spatialreg**, the function for this case is `spatialreg::sacsar()` ```r sarar.spatialreg <- spatialreg::sacsarlm(formula = formula_60, listw = listw, data = NCOVR.sf) summary(sarar.spatialreg) ``` ``` #> #> Call: #> spatialreg::sacsarlm(formula = formula_60, data = NCOVR.sf, listw = listw) #> #> Residuals: #> Min 1Q Median 3Q Max #> -12.73012 -1.99470 -0.57684 1.20274 80.71945 #> #> Type: sac #> Coefficients: (asymptotic standard errors) #> Estimate Std. Error z value Pr(>|z|) #> (Intercept) 2.419903 0.410043 5.9016 3.600e-09 #> RD60 0.656738 0.085035 7.7232 1.132e-14 #> PS60 0.149557 0.054411 2.7486 0.005985 #> MA60 -0.079518 0.012398 -6.4137 1.420e-10 #> DV60 0.500589 0.066415 7.5373 4.796e-14 #> UE60 -0.031805 0.021375 -1.4879 0.136767 #> SOUTH 0.260047 0.131573 1.9764 0.048104 #> #> Rho: 0.77818 #> Asymptotic standard error: 0.023041 #> z-value: 33.774, p-value: < 2.22e-16 #> Lambda: -0.93384 #> Asymptotic standard error: 0.072693 #> z-value: -12.846, p-value: < 2.22e-16 #> #> LR test value: 232.46, p-value: < 2.22e-16 #> #> Log likelihood: -9060.108 for sac model #> ML residual variance (sigma squared): 17.97, (sigma: 4.2391) #> Number of observations: 3085 #> Number of parameters estimated: 10 #> AIC: 18140, (AIC for lm: 18369) ``` The same as before, the differences between the two codes are minimal. Finally, according to the `anova()` function of **spsur**, the SARAR model is preferable to the SLM and SEM alternatives, which are nested in the SARAR, in spite of the high value estimated for the parameter $\lambda$, -0.93327, close to a unit root case. The LR and both AIC and BIC statistics reach the same conclusion. ```r anova(slm.spsur,sarar.spsur) ``` ``` #> logLik df AIC BIC LRtest #> model 1: slm -9098.5 9 18215 18197 #> model 2: sarar -9060.1 10 18140 18120 76.751 #> p.val #> model 1: slm #> model 2: sarar 1.9393e-18 ``` ```r anova(sem.spsur,sarar.spsur) ``` ``` #> logLik df AIC BIC LRtest #> model 1: sem -9106.0 9 18230 18212 #> model 2: sarar -9060.1 10 18140 18120 91.87 #> p.val #> model 1: sem #> model 2: sarar 9.2565e-22 ``` # References