diff --git a/DESCRIPTION b/DESCRIPTION index 301772c..84dc090 100755 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -2,7 +2,7 @@ Package: SUMMER Type: Package Title: Small-Area-Estimation Unit/Area Models and Methods for Estimation in R Version: 1.4.0 -Date: 2023-10-24 +Date: 2024-02-29 Authors@R: c( person(given = "Zehang R", family = "Li", email= "lizehang@gmail.com", role = c("cre","aut")), person(given = "Bryan D", family = "Martin", email= "bmartin6@uw.edu", role = "aut"), diff --git a/NAMESPACE b/NAMESPACE index 5548879..4e1c377 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -77,6 +77,7 @@ importFrom(ggplot2,ylab) importFrom(grDevices,colorRampPalette) importFrom(graphics,text) importFrom(haven,as_factor) +importFrom(methods,as) importFrom(methods,is) importFrom(sf,st_as_sf) importFrom(sf,st_centroid) diff --git a/R/getSmoothedDensity.R b/R/getSmoothedDensity.R index e1d14c8..7039f4c 100644 --- a/R/getSmoothedDensity.R +++ b/R/getSmoothedDensity.R @@ -20,6 +20,7 @@ #' #' @return ridge plot of the density, and if \code{save.density = TRUE}, also a data frame of the calculated densities #' @seealso \code{\link{plot.SUMMERproj}} +#' @importFrom methods as #' @author Zehang Richard Li #' @examples #' \dontrun{ diff --git a/inst/doc/small-area-estimation.html b/inst/doc/small-area-estimation.html index 8654197..75d19e4 100644 --- a/inst/doc/small-area-estimation.html +++ b/inst/doc/small-area-estimation.html @@ -12,7 +12,7 @@ - + Generic small area estimation @@ -362,284 +362,251 @@

Generic small area estimation

Peter Gao and Jon Wakefield

-

2024-02-28

+

2024-02-29

In this vignette, we illustrate the use of the smoothArea and smoothUnit functions for generic small area estimation of outcomes other than under-five -mortality rates.

-
-

Load packages

-

First, we load the necessary packages and the necessary data. The -required package INLA is not in a standard repository, so -we install if it is not available. The survey package will -be used to generate direct estimates, while dplyr and -tidyr will be used to facilitate data manipulation.

+mortality rates. Using example data presented in the sae +vignette, we compare our results from SUMMER with those +obtained using the sae package (for a survey of other +packages for small area estimation, see (kreutzmann2019?)).

+
+

Preliminary

+

First, we load the necessary packages and data. The required package +INLA is not available via a standard repository, so we +include code for installation if it is not found. The +survey package will be used to generate direct estimates, +while dplyr and tidyr will be used for data +manipulation.

library(sae)
 library(SUMMER)
 library(survey)
 library(dplyr)
 library(tidyr)
-if (!isTRUE(requireNamespace("INLA", quietly = TRUE))) {
-  install.packages("INLA", repos=c(getOption("repos"), 
-                  INLA="https://inla.r-inla-download.org/R/stable"), dep=TRUE)
-}
+library(sf) +library(ggplot2) +if (!isTRUE(requireNamespace("INLA", quietly = TRUE))) { + install.packages("INLA", repos=c(getOption("repos"), + INLA="https://inla.r-inla-download.org/R/stable"), dep=TRUE) +}

Area level models

-
-

Income data

-

First, we compare our results with those obtained using the -sae package using examples presented in the -sae vignette. Molina and Marhuenda -(2015) generate an artificial dataset on income and other related -variables to illustrate the use of area level models in the -sae package. In their simulated example, the objective is -to estimate prevalence of poverty in Spanish counties.

+
+

Artificial poverty rate example

+

In their vignette for the sae package, (molina2015?) generate an +artificial dataset on income and other related variables to illustrate +the use of area level models. In this example, the objective is to +estimate prevalence of poverty in Spanish counties.

data("incomedata")
 data("sizeprov")
 data("sizeprovedu")
-povertyline <- 0.6*median(incomedata$income) # 6557.143
-incomedata$poor <- as.integer(incomedata$income < povertyline)
-

The incomedata data frame contains information on -17199 observations of individuals in 52 -Spanish provinces. Income values and sampling weights are provided for -each individual along with covariate information including age group and -education level. Molina and Marhuenda -(2015) define the poverty line z, and calculate an -indicator variable with value 1 if the corresponding income value is -below the poverty line and 0 otherwise.

-
-

Direct estimation with sae

-

The sae::direct function computes the Horvitz-Thompson -estimator of domain means, given by,

-

\[\widehat{\overline{Y}}_{d}^{\text{DIR}}=\frac{1}{N_d}\sum_{k\in -S_d}w_{k}y_{k}\]

-

where \(N_d\) is the population size -of domain \(d\), \(S_d\) is the set of sampled observations in -domain \(d\), \(w_{k}\) is the sampling weight for unit -\(k\), and \(y_{k}\) is the observation for unit \(k\), for all \(k -\in S_d\). The sae::direct function also estimates -standard deviation and coefficient of variation for each domain.

+povertyline <- 0.6 * median(incomedata$income) # 6557.143 +incomedata$in_poverty <- as.integer(incomedata$income < povertyline)
+

The incomedata data frame contains information on 17199 +observations of individuals in 52 Spanish provinces. Income values and +sampling weights are provided for each individual along with covariate +information including age group and education level. (molina2015?) define the +poverty line and calculate an indicator variable (which we name +in_poverty) with value 1 if the corresponding income value +is below the poverty line and 0 otherwise.

+
+

Direct estimation with sae

+

Before considering model-based methods for small area estimation, we +compute direct weighted estimators for the desired small area means. The +sae::direct function computes the Horvitz-Thompson +estimator of domain means given by

+

\[ +\widehat{\overline{Y}}_{i}^{\text{DIR}}=\frac{1}{N_i}\sum_{j\in +S_i}w_{j}y_{j} +\]

+

where \(N_i\) is the population size +of domain \(i\), \(S_i\) is the set of sampled observations in +domain \(i\), \(w_{j}\) is the sampling weight for unit +\(j\), and \(y_{j}\) is the observation for unit \(j\), for all \(j +\in S_i\). The sae::direct function also estimates +standard deviation and coefficient of variation for each domain. Note +that \(N_i\) is assumed known and is +provided in the data frame sizeprov. The domains of +interest are identified via the provlab variable.

Popn <- sizeprov[, c("provlab", "Nd")]
-sae.DIR <- sae::direct(y = incomedata$poor, dom = incomedata$provlab,
-                       sweight = incomedata$weight, domsize = Popn)
-knitr::kable(head(sae.DIR), digits = 3)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
DomainSampSizeDirectSDCV
Alava960.2550.04819
Albacete1730.1410.03022
Alicante5390.2050.02211
Almeria1980.2650.04115
Avila580.0550.02646
Badajoz4940.2100.02311
+sae.DIR <- sae::direct(y = incomedata$in_poverty, dom = incomedata$provlab, + sweight = incomedata$weight, domsize = Popn) |> + select(Domain, Direct, SD)
-
-

Direct estimation with survey

- +
+

Direct estimation with survey

We can similarly use the survey::svyby function to compute the Horvitz-Thompson estimates:

incomedata$pop <- sum(sizeprov$Nd[match(incomedata$provlab, sizeprov$provlab)])
-design <- survey::svydesign(ids = ~1, weights = ~weight, data = incomedata, fpc = ~pop)
-
-# estimate area totals
-svy.DIR <- survey::svyby(~poor, ~provlab, design, svytotal)
-
-# calculate corresponding area mean estimates
-svy.DIR$prov_pop <- sizeprov$Nd[match(svy.DIR$provlab, sizeprov$provlab)]
-svy.DIR$poor_mean = svy.DIR$poor/svy.DIR$prov_pop
-svy.DIR$poor_mean_se = svy.DIR$se/svy.DIR$prov_pop
-knitr::kable(head(svy.DIR) %>% select(provlab, poor_mean, poor_mean_se), digits = 3)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
provlabpoor_meanpoor_mean_se
AlavaAlava0.2550.048
AlbaceteAlbacete0.1410.030
AlicanteAlicante0.2050.022
AlmeriaAlmeria0.2650.041
AvilaAvila0.0550.026
BadajozBadajoz0.2100.023
+design <- survey::svydesign(ids = ~1, weights = ~weight, + data = incomedata, fpc = ~pop) + +# estimate area totals +svy.DIR <- survey::svyby(~in_poverty, ~provlab, design, svytotal) + +# calculate corresponding area mean estimates +svy.DIR$prov_pop <- sizeprov$Nd[match(svy.DIR$provlab, sizeprov$provlab)] +svy.DIR$Domain <-svy.DIR$provlab +svy.DIR$Direct = svy.DIR$in_poverty/svy.DIR$prov_pop +svy.DIR$SD= svy.DIR$se/svy.DIR$prov_pop
-
-

Fay-Herriot model

-

The Fay-Herriot model is a basic area level model used to obtain -small area estimators Fay and Herriot -(1979). The Fay-Herriot model is often specified as a two-stage -model; the first stage is the sampling model and the second stage is the -linking model.

-

First stage:

-

Let \(\widehat\theta^{\text{DIR}}_d\) be a direct -estimator of \(\theta_d\): \[\widehat\theta^{\text{DIR}}_d=\theta_d+\epsilon_d;\hspace{2em}\epsilon_d\sim_{ind}N(0,V_d),\hspace{2em}d=1,\ldots, -D,\]

-

where \(V_d\) is the known -sampling variance of the direct estimator \(\widehat{\theta}^{\text{DIR}}_d\).

-

Second stage:

-

\[\theta_d = -\mathbf{x}_d^T\boldsymbol\beta+u_d,\hspace{2em}u_d\sim_{ind}N(0,\sigma_u^2)\hspace{2em}d=1,\ldots, -D,\]

+
+

Basic area level model

+

The basic area level model, also called the Fay-Herriot model, treats +direct estimates of small area quantities as response data and +explicitly models differences between areas using covariate information +and random effects (fay1979?). The Fay-Herriot +model can be viewed as a two-stage model: in the first stage, a +sampling model represents the sampling variability of a direct +estimator and in the second stage, a linking model describes +the between area differences in small area quantities.

+

Sampling model:

+

Let \(\widehat\theta^{\text{DIR}}_i\) be a direct +estimator of an area level mean or total \(\theta_i\). The sampling model treats \(\widehat\theta^{\text{DIR}}_i\) as a noisy +observation of the true finite population quantity \(\theta_i\):

+

\[ +\widehat\theta^{\text{DIR}}_i=\theta_i+\epsilon_i;\hspace{2em}\epsilon_i\sim_{ind}N(0,V_i),\hspace{2em}i=1,\ldots, +M +\]

+

where \(V_i\) is the known +sampling variance of the direct estimator \(\widehat{\theta}^{\text{DIR}}_i\).

+

Linking model:

+

\[ +\theta_i = +\textbf{x}_i^T\boldsymbol\beta+u_i,\hspace{2em}u_i\sim_{ind}N(0,\sigma_u^2)\hspace{2em}i=1,\ldots, +M, +\]

where \(\sigma_u^2\) (between-area residual variance) is estimated. In this basic Fay-Herriot model, the -area-specific random effects \(u_d\) -are assumed to be independent between areas.

-
-
-

Implementing Fay-Herriot model with sae

+area-specific random effects \(u_i\) +are assumed to be independent and identically distributed (IID) between +areas.

Below, we provide a quantile-quantile plot comparing the direct estimates to a Gaussian distribution. Here the observed quantiles align well with those from a Gaussian distribution, which lends some support to the basic IID model.

-
par(pty="s")
-muD <- mean(sae.DIR$Direct)
-sdD <- sd(sae.DIR$Direct)
-qqnorm((sae.DIR$Direct-muD)/sdD,main="")
-abline(0,1,col="red")
-

-

The sae::mseFH function calculates the empirical best -linear unbiased predictors (EBLUP) for all domain means as well as their -estimated MSEs.

-
sae.FH <- sae::mseFH(sae.DIR$Direct~1, sae.DIR$SD^2)
+
par(pty = "s")
+mu.DIR <- mean(sae.DIR$Direct)
+sd.DIR <- sd(sae.DIR$Direct)
+qqnorm((sae.DIR$Direct - mu.DIR) / sd.DIR, main = "")
+abline(0, 1, col = "red")
+

+
+
+

Area level modeling with sae

+

As described by (molina2015?), the +sae::mseFH function fits the basic area level model (via +REML by default) and calculates the empirical best linear unbiased +predictors (EBLUP) for all domain means as well as their estimated +MSEs.

+
sae.FH <- sae::mseFH(sae.DIR$Direct~1, sae.DIR$SD^2)
+sae.FH.table <- data.frame(
+  Domain = sae.DIR$Domain,
+  EBLUP = sae.FH$est$eblup,
+  RMSE = sqrt(sae.FH$mse)
+)
+head(sae.FH.table)
+##     Domain      EBLUP       RMSE
+## 1    Alava 0.23407448 0.03836098
+## 2 Albacete 0.15335213 0.02744856
+## 3 Alicante 0.20511853 0.02051886
+## 4  Almeria 0.24498009 0.03427432
+## 5    Avila 0.07797403 0.02372809
+## 6  Badajoz 0.20928033 0.02186672
-
-

Implementing Fay-Herriot model in SUMMER

-

The smoothArea function compute direct estimates and -then produces smoothed estimates using a Bayesian Fay-Herriot model. The -main arguments of interest are:

+
+

Area level modeling with SUMMER

+

The SUMMER package adopts a Bayesian approach to +inference using models such as the basic area level model, carrying out +computation via the INLA package. The +smoothArea function computes direct estimates and then +produces smoothed estimates using a Bayesian Fay-Herriot model. The main +arguments of interest are:

    -
  • formula: describing the response variable and any +
  • formula: Describing the response variable and any area-level covariates
  • -
  • domain a one-sided formula with the variable containing +
  • domain A one-sided formula with the variable containing domain labels on the right. The domain labels variable should be contained in the dataset used to generate the design.
  • -
  • design: a survey.design object containing +
  • design: A survey.design object containing survey data and specifying the survey design.

In addition, other commonly used optional arguments include:

  • adj.mat: Optional adjacency matrix if a spatial smoothing model is desired.
  • -
  • direct.est: If desired, the direct estimates may be -specified and smoothing will be applied directly to the direct +
  • transform: If "logit" is specified, a +logit transform will be applied to the direct estimates and an +appropriate transformation will be applied to the estimated sampling +variances before smoothing.
  • +
  • direct.est: The direct estimates may be specified +directly and smoothing will be applied directly to these user-provided estimates.
  • -
  • X.domain: Areal covariates data frame.
  • -
  • domain.size: Domain size data frame used for computing -direct estimates if domain sizes are known.
  • +
  • X.domain: Data frame of area level covariates.
  • +
  • domain.size: Data frame of domain sizes used for +computing direct estimates if domain sizes are known.

Other optional arguments can be specified to change the priors and are described further in the documentation.

-

We obtain direct estimates and smoothed estimates below, fitting the -Fay-Herriot model to the untransformed direct estimates.

-
# fit the model
+

For the artificial poverty rate example, we fit the Fay-Herriot model +and obtain the following smoothed estimates.

+
# specify known domain sizes
 domain.size <- sizeprov[, c("provlab", "Nd")]
 colnames(domain.size)[2] <- "size"
 
-summer.FH <- smoothArea(formula = poor~1,
-                        domain = ~provlab,
-                        design = design,
-                        domain.size = domain.size)
+# fit model and obtain svysae object +summer.FH <- smoothArea(formula = in_poverty~1, + domain = ~provlab, + design = design, + domain.size = domain.size, + return.samples = T) +## Warning in smoothArea(formula = in_poverty ~ 1, domain = ~provlab, design = +## design, : Direct estimates appear to be proportions. You may want to consider +## using transform = 'logit'. + +summer.FH.table <- data.frame( + Domain = sae.DIR$Domain, + Median = summer.FH$iid.model.est$median, + SE = sqrt(summer.FH$iid.model.est$var) +) +head(summer.FH.table) +## Domain Median SE +## 1 Alava 0.23666203 0.03719127 +## 2 Albacete 0.15085556 0.02667922 +## 3 Alicante 0.20531685 0.01909936 +## 4 Almeria 0.24577825 0.03769929 +## 5 Avila 0.07652956 0.02222395 +## 6 Badajoz 0.20746467 0.02282758
+ + + + + + + + + + + +

The fitted parameters from sae (obtained via likelihood-based methods) and estimated parameter posterior distribution from SUMMER (obtained from Bayesian methods, implemented via INLA) are in reasonable agreement. The estimated intercept \(\beta_0\) from -sae is 0.2 ; the posterior median of \(\beta_0\) from SUMMER is -0.2. In the absence of strong priors, fixed effects are -usually in close agreement, with the posterior being symmetric. The -estimated precision \(1/\sigma_u^2\) -from sae is 281.35 , while the posterior -median of \(1/\sigma_u^2\) from -SUMMER is 273.24. The differences are larger -here, but the posterior for the variance is skewed, and we would expect -the posterior median to be smaller than the REML estimate. The area -estimates and measures of uncertainty are in close agreement, -however.

+sae is 0.202 ; the posterior median of \(\beta_0\) from SUMMER is +0.202. In the absence of strong priors, fixed effects are usually in +close agreement, with the posterior being symmetric. The estimated +precision \(1/\sigma_u^2\) from +sae is 281.34787 , while the posterior median of \(1/\sigma_u^2\) from SUMMER is +273.241562. The differences are larger here, but the posterior for the +variance is skewed, and we would expect the posterior median to be +smaller than the REML estimate. The area estimates and measures of +uncertainty are in close agreement, however.

We first illustrate the shrinkage of the EBLUP estimates, and the reduced uncertainty:

par(mfrow = c(1, 2))
@@ -655,13 +622,13 @@ 

Implementing Fay-Herriot model in SUMMER

ylim=c(min(sae.DIR$SD^2, sae.FH$mse),max(sae.DIR$SD^2, sae.FH$mse)), main = "Estimates of uncertainty") abline(0,1,col="red")
-

+

Now compare EBLUP and HB, using posterior variance from SUMMER and estimated MSE from sae to measure uncertainty:

par(mfrow = c(1, 2))
 plot(sae.FH$est$eblup, summer.FH$iid.model.est$median,
-     xlab = "SAE package",ylab = "SUMMER package",
+     xlab = "sae package",ylab = "SUMMER package",
      main = "Small area estimates")
 abline(0,1,col="red")
 plot(sae.FH$mse,
@@ -670,233 +637,219 @@ 

Implementing Fay-Herriot model in SUMMER

ylab = "SUMMER package model variance", main = "Estimates of mse/variance") abline(0,1,col="red")
-

-
-
-
-

Milk expenditures

-

Molina and Marhuenda (2015) also use a -real dataset on milk expenditures Arora and -Lahiri (1997) to illustrate the implementation of area level -models in the sae package. The milk dataset -contains information on direct estimators for area level expenditures on -milk based on data from the dairy survey component of the Consumer -Expenditure Survey conducted by the U.S. Census Bureau. Weights are -based on inverse sampling probabilities and adjusted for -non-response/post-stratification.

-
data(milk)
-milk$Var <- milk$SD^2
-knitr::kable(head(milk))
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
SmallAreaniyiSDCVMajorAreaVar
11911.100.160.1510.03
26331.070.080.0710.01
35971.100.080.0710.01
42210.630.110.1710.01
51950.750.120.1610.01
61910.980.140.1410.02
-

Here, SmallArea denotes areas of inferential interest, -yi gives the direct estimates, ni the sample -sizes, SD and CV give the estimated standard -deviation and coefficient of variation, and MajorArea -corresponds to major areas (US regions defined by You and Chapman (2006)).

-
-

Results using sae

-
attach(milk)
-sae.FH.milk <- mseFH(yi ~ as.factor(MajorArea), SD^2)
+

+

The SUMMER package includes functions to generate +additional diagnostic plots based on samples from the model posterior. +The compareEstimates() function generates a heatmap of the +posterior pairwise probabilities of one area’s mean exceeding another’s +mean.

+

When geographic polygon data for each domain is available as a +shapefile, the sf package can be used to load this data in +R. The mapEstimates() function can then be used to provide +a summary map of posterior medians or posterior variances.

-
-

Results using SUMMER

-

Since the milk dataset only provides area level direct -estimates and not unit level data, we use the direct.est -argument as input.

-
# Major Area fixed effects
-Xmat <- milk[, c("SmallArea", "MajorArea")]
-Xmat$MajorArea <- as.factor(Xmat$MajorArea)
-
-# format direct estimates for SUMMER
-milk.dir <- milk[, c("SmallArea", "yi", "Var")]
-# Fit the model with Major Area intercepts
-summer.FH.milk<- smoothArea(formula = yi~MajorArea, 
-                            direct.est = milk.dir, 
-                            X.domain = Xmat,
-                            domain = ~SmallArea)
-

Again note that the posterior median for \(\sigma_u^2\) is larger than the REML -estimate.

-
# random effects precision
-1 / sae.FH.milk$est$fit$refvar 
-## [1] 54
-summer.FH.milk$iid.model.fit$summary.hyperpar[,c(4)]
-## [1] 54
-
-# estimated fixed effects
-sae.FH.milk$est$fit$estcoef 
-##                         beta std.error tvalue  pvalue
-## X(Intercept)            0.97     0.069   14.0 2.8e-44
-## Xas.factor(MajorArea)2  0.13     0.103    1.3 2.0e-01
-## Xas.factor(MajorArea)3  0.23     0.092    2.5 1.4e-02
-## Xas.factor(MajorArea)4 -0.24     0.082   -3.0 3.1e-03
-summer.FH.milk$iid.model.fit$summary.fixed[,c(1:5)]  
-##              mean    sd 0.025quant 0.5quant 0.97quant
-## (Intercept)  0.97 0.071      0.829     0.97     1.103
-## MajorArea2   0.13 0.106     -0.073     0.13     0.334
-## MajorArea3   0.23 0.094      0.041     0.23     0.404
-## MajorArea4  -0.24 0.083     -0.406    -0.24    -0.083
-

Again, we observe good agreement.

-
par(mfrow = c(1, 2))
-plot(sae.FH.milk$est$eblup[as.numeric(summer.FH.milk$iid.model.est$domain)], 
-     summer.FH.milk$iid.model.est$median, 
-     xlab = "sae package",
-     ylab = "SUMMER package",
-     main = "Small area estimates")
-abline(0,1,color="red")
-plot(sae.FH.milk$mse[as.numeric(summer.FH.milk$iid.model.est$domain)], 
-     summer.FH.milk$iid.model.est$var, 
-     xlab = "sae package mse",
-     ylab = "SUMMER package model variance",
-     main = "Estimates of mse/variance")
-abline(0,1,col="red")
-

-
-
-

Spatial Fay-Herriot

+
+

Spatial Fay-Herriot

The sae package also provides tools for implementing a spatial version of the Fay-Herriot model which assumes that the vector of area specific effects follows a first order simultaneous -autoregressive, or SAR(1), process: \[\mathbf{u}=\rho_1\mathbf{Wu}+\boldsymbol\epsilon,\hspace{1em}\boldsymbol\epsilon\sim -N(\mathbf{0}_D,\sigma_I^2\mathbf{I}_D),\] where \(\mathbf{I}_D\) is the identity matrix for -the \(D\) areas and \(\mathbf{0}_D\) is a vector of zeroes of +autoregressive, or SAR(1), process: \[\textbf{u}=\rho_1\textbf{Wu}+\boldsymbol\epsilon,\hspace{1em}\boldsymbol\epsilon\sim +N(\textbf{0}_i,\sigma_I^2\textbf{I}_i),\] where \(\textbf{I}_i\) is the identity matrix for +the \(D\) areas and \(\textbf{0}_i\) is a vector of zeroes of size \(D\). Additionally, \(\rho_1\in(-1,1)\) is an autoregression -parameter and \(\mathbf{W}\) is an +parameter and \(\textbf{W}\) is an adjacency matrix (with rows standardized to sum to 1).

The sae::mseSFH function estimates the unknown variance parameters, the resulting EBLUP small area estimators, and then uses bootstrap methods to estimate the MSE of the estimators.

-

To illustrate the use of this function, Molina -and Marhuenda (2015) consider a synthetic dataset concerning -grape production surface area for 274 Italian municipalities. Below we -load the relevant objects from the sae package. The -grapes dataset containes direct estimators of the mean -surface area in hectares for grape production in each municipality -(grapehect), the sampling variance of these direct -estimators (var), and relevant covariates including number -of working dats and overall agrarian surface area. The -grapesprox object contains the relevant adjacency matrix -representing the municipalities’ neighborhood structure.

-
data("grapes")
-data("grapesprox")
-
+

To illustrate the use of this function, (molina2015?) consider a +synthetic dataset concerning grape production surface area for 274 +Italian municipalities. Below we load the relevant objects from the +sae package. The grapes dataset containes +direct estimators of the mean surface area in hectares for grape +production in each municipality (grapehect), the sampling +variance of these direct estimators (var), and relevant +covariates including number of working dats and overall agrarian surface +area. The grapesprox object contains the relevant adjacency +matrix representing the municipalities’ neighborhood structure.

+
data("grapes")
+data("grapesprox")
+

Results using sae

-
sae.FH.grapes <- sae::mseSFH(grapehect ~ area + workdays - 1, var, grapesprox, data = grapes)
-
-results <- data.frame(DIR = grapes$grapehect,
-                      eblup.SFH = sae.FH.grapes$est$eblup,
-                      mse = sae.FH.grapes$mse)
-# reorder results for comparison later
-results$area_name <- paste0('area_', rownames(results))
+
sae.FH.grapes <- sae::mseSFH(grapehect ~ area + workdays - 1, var, grapesprox, data = grapes)
+
+results <- data.frame(DIR = grapes$grapehect,
+                      eblup.SFH = sae.FH.grapes$est$eblup,
+                      mse = sae.FH.grapes$mse)
+# reorder results for comparison later
+results$area_name <- paste0('area_', rownames(results))
-
+

Results using SUMMER

-

The smoothSurvey function also allows the use of a model +

The smoothArea function also allows the use of a model with spatially correlated area effects, but the default implementation -assumes a BYM2 model for \(\mathbf{u}\) +assumes a BYM2 model for \(\textbf{u}\) rather than a simultaneous autoregressive model as in the SFH model implemented in sae.

-
# create area_name as SUMMER requires rownames of A_mat to match area variable
-grapes$area_name <- paste0('area_', rownames(grapes))
-Amat_grapes <- as.matrix(grapesprox)
-rownames(Amat_grapes) <- colnames(Amat_grapes) <- grapes$area_name
-X_grapes <- grapes[,c('area_name', 'area', 'workdays')]
-
-# format direct estimates for SUMMER
-grapes.dir <- grapes[, c(5, 1, 4)]
-# scale direct estimates for use with INLA
-grapes.dir$grapehect <- grapes.dir$grapehect / 10
-grapes.dir$var <- grapes.dir$var/ 100
-summer.FH.grapes<- smoothArea(formula = grapehect~area + workdays, 
-                              direct.est = grapes.dir, X.domain = X_grapes,
-                              domain = ~area_name, adj.mat = Amat_grapes)
+
# create area_name as SUMMER requires rownames of adj.mat to match area variable
+grapes$area_name <- paste0('area_', rownames(grapes))
+adj.mat.grapes <- as.matrix(grapesprox)
+rownames(adj.mat.grapes) <- colnames(adj.mat.grapes) <- grapes$area_name
+X_grapes <- grapes[,c('area_name', 'area', 'workdays')]
+
+# format direct estimates for SUMMER
+grapes.dir <- grapes[, c(5, 1, 4)]
+# scale direct estimates for use with INLA
+grapes.dir$grapehect <- grapes.dir$grapehect / 10
+grapes.dir$var <- grapes.dir$var/ 100
+summer.FH.grapes <- smoothArea(formula = grapehect~area + workdays, 
+                               direct.est = grapes.dir, X.domain = X_grapes,
+                               domain = ~area_name, adj.mat = adj.mat.grapes)
+plot_list <- plot(summer.FH.grapes, return_list = T)
+ + + + + + +

Despite the differing models, we again observe good agreement with -the estimates, though less so with the estimates of uncertainty, which -is interesting.

-
# rearrange output to be the same order
-results <- results[match(summer.FH.grapes$bym2.model.est$domain, results$area_name), ]
-par(mfrow = c(1, 2))
-plot(results$eblup.SFH,
-     summer.FH.grapes$bym2.model.est$median * 10,  
-     xlab = "sae package",
-     ylab = "SUMMER package",
-     main = "Small area estimates")
-abline(0, 1, col = 'red')
-plot(results$mse,
-     summer.FH.grapes$bym2.model.est$var * 100,  
-     xlab = "sae package mse",
-     ylab = "SUMMER package model variance",
-     main = "Estimates of mse/variance")
-abline(0, 1, col = 'red')
-

+the estimates, though less so with the estimates of uncertainty.

+
summer.bym2.est <- 
+  summer.FH.grapes$bym2.model.est[match(rownames(adj.mat.grapes), summer.FH.grapes$bym2.model.est$domain),]
+par(mfrow = c(1, 2))
+plot(results$eblup.SFH,
+     summer.bym2.est$median * 10,  
+     xlab = "sae package",
+     ylab = "SUMMER package",
+     main = "Small area estimates")
+abline(0, 1, col = 'red')
+plot(results$mse,
+     summer.bym2.est$var * 100,  
+     xlab = "sae package mse",
+     ylab = "SUMMER package model variance",
+     main = "Estimates of mse/variance")
+abline(0, 1, col = 'red')
+

+
+
+

BRFSS Data

+

Below, we provide an example comparing spatial models from +sae and SUMMER using data from the Behavioral +Risk Factor Surveillance System (BRFSS).

+
library(ggplot2)
+library(patchwork)
+data(BRFSS)
+data(KingCounty)
+BRFSS <- subset(BRFSS, !is.na(BRFSS$diab2))
+BRFSS <- subset(BRFSS, !is.na(BRFSS$hracode))
+head(BRFSS)
+##   age pracex       educau zipcode    sex street1 street2      seqno year
+## 1  30  White college grad   98001   male      NA      NA 2009000041 2009
+## 2  26  White college grad   98107 female      NA      NA 2009000309 2009
+## 3  33  Black college grad   98133   male      NA      NA 2009000404 2009
+## 4  25  White some college   98058   male      NA      NA 2009000531 2009
+## 5  23  White some college   98102   male      NA      NA 2009000675 2009
+## 6  19  Asian some college   98106   male      NA      NA 2009000694 2009
+##   hispanic mracex strata             hracode tract rwt_llcp genhlth2 fmd obese
+## 1 non-Hisp  White  53019        Auburn-North    NA 2107.463        0   0     0
+## 2 non-Hisp  White  53019             Ballard    NA 2197.322        0   1     0
+## 3 non-Hisp  Black  53019          NW Seattle    NA 3086.511        0   0     0
+## 4 non-Hisp  White  53019        Renton-South    NA 3184.740        1   1     1
+## 5 non-Hisp  White  53019 Capitol Hill/E.lake    NA 3184.740        0   0     0
+## 6 non-Hisp  Asian  53019      North Highline    NA 4391.304        0   0     0
+##   smoker1 diab2 aceindx2 zipout streetx ethn age4 ctmiss
+## 1       0     0       NA  98001       0    1    3      1
+## 2       0     0       NA  98107       0    1    3      1
+## 3       0     0       NA  98133       0    2    3      1
+## 4       0     0       NA  98058       0    1    3      1
+## 5       0     0       NA  98102       0    1    4      1
+## 6       0     0       NA  98106       0    3    4      1
+mat <- getAmat(KingCounty, KingCounty$HRA2010v2_)
+design <- svydesign(ids = ~1, weights = ~rwt_llcp,
+                    strata = ~strata, data = BRFSS)
+direct <- svyby(~diab2, ~hracode, design, svymean)
+
+

Results using sae

+

Below, we use sae to smooth the logit-transformed direct +estimates.

+
direct$var <- direct$se ^ 2
+direct$logit.diab2 <- SUMMER::logit(direct$diab2)
+direct$logit.var <- direct$var / (direct$diab2 ^ 2 * (1 - direct$diab2) ^ 2)
+SFH.brfss <- sae::mseSFH(logit.diab2 ~ 1, logit.var, mat, data = direct)
+
+results <- data.frame(domain = direct$hracode,
+                      eblup.SFH = SUMMER::expit(SFH.brfss$est$eblup), 
+                      mse = SFH.brfss$mse)
+
+
+

Results using SUMMER

+

Below, we fit two versions of the spatial area levelmodel in +SUMMER. If we change pc.u and +pc.alpha from the default value \(u=1,\alpha=0.01\) to \(u=0.1,\alpha=0.01\), we assign more prior +mass on smaller variance of the random effects, inducing more +smoothing.

+
summer.brfss <- smoothArea(diab2~1, domain= ~hracode,
+                           design = design,
+                           transform = "logit",
+                           adj.mat = mat, level = 0.95)
+summer.brfss.alt <- smoothArea(diab2~1, domain= ~hracode,
+                               design = design,
+                               transform = "logit",
+                               adj.mat = mat, level = 0.95,
+                               pc.u = 0.1, pc.alpha = 0.01)
+

Finally, we use SUMMER::mapPlot to compare median +estimates and uncertainty estimates obtained via sae and +SUMMER.

+
toplot <-  summer.brfss$bym2.model.est
+toplot$logit.var <- toplot$var / 
+  (summer.brfss$bym2.model.est$median ^ 2 * 
+     (1 - summer.brfss$bym2.model.est$median) ^ 2)
+toplot$median.alt <-  summer.brfss.alt$bym2.model.est$median
+toplot$logit.var.alt <-  summer.brfss.alt$bym2.model.est$var / 
+  (summer.brfss.alt$bym2.model.est$median ^ 2 *
+     (1 - summer.brfss.alt$bym2.model.est$median) ^ 2)
+toplot$median.sae <- results$eblup.SFH
+toplot$mse.sae <- results$mse
+variables <- c("median", "median.alt",  "median.sae",
+               "logit.var", "logit.var.alt", "mse.sae")
+names <- c("Median (default prior)", "Median (new prior)",  "EBLUP (sae)",
+           "Variance (default prior)", "Variance (new prior)", "MSE (sae)")
+mapPlot(data = toplot, geo = KingCounty,
+        variables=variables[1:3], 
+        labels = names[1:3], by.data = "domain",
+        by.geo = "HRA2010v2_", size = 0.1) 
+

+
mapPlot(data = toplot, geo = KingCounty,
+        variables=variables[4:6], labels = names[4:6],
+        by.data = "domain", by.geo = "HRA2010v2_", size = 0.1)  
+

+
par(mfrow = c(1, 2))
+range1 <- range(c(direct$diab2,toplot$median.alt))
+plot(direct$diab2,toplot$median,  
+     xlab = "direct estimates",
+     ylab = "model-based estimates",
+     main = "Small area estimates", col = 'red', pch = 16,
+     xlim=range1,ylim=range1)
+points(direct$diab2,toplot$median.sae,  col = 'blue', pch = 16)
+points(direct$diab2,toplot$median.alt,  col = 'cyan', pch = 16)
+legend('topleft', pch = 16, col = c('red', 'cyan', 'blue'), 
+       legend = c("SUMMER",'SUMMER (new prior)', "sae"),bty="n")
+abline(0,1)
+range2 <- range(c(direct$logit.var,toplot$mse.sae,toplot$logit.var.alt))
+plot(direct$logit.var,toplot$logit.var,  
+     xlab = "direct estimate var.",
+     ylab = "model-based uncertainty",
+     main = "Small area estimates", col = 'red', pch = 16, 
+     xlim=range2,ylim=range2)
+points(direct$logit.var,toplot$mse.sae,  col = 'blue', pch = 16)
+points(direct$logit.var,toplot$logit.var.alt,  col = 'cyan', pch = 16)
+legend('topleft', pch = 16, col = c('red', 'cyan','blue'),
+       legend = c("SUMMER var.", 'SUMMER var. (new prior)', "sae mse"),bty="n")
+abline(0,1)
+

+
@@ -908,9 +861,7 @@

Unit Level Models

Nested error model: \[y_{dk}=\mathbf{x}_{dk}^T\boldsymbol\beta+u_d+\epsilon_{dk},\hspace{1em}u_d\sim_{ind}N(0,\sigma_u^2),\hspace{1em}\epsilon_{dk}\sim_{ind}N(0,\sigma_\epsilon^2)\]

Here \(u_d\) are area random effects and \(\epsilon_{dk}\) are unit level -errors.

-

This model assumes there is no sample selection -bias.

+errors. This model assumes the sampling design is ignorable.

The sae package conducts estimation of domain means by first estimating variance parameters \(\sigma^2_u\) and \(\sigma^2_\epsilon\). Next, given known variance parameters, domain means \(\theta_d\) are predicted by calculating the @@ -951,46 +902,50 @@

Corn and Soy Production

MeanSoyBeansPixPerSeg provide the known county means of the auxiliary variables.

We load the sample data:

-
data("cornsoybean")
-head(cornsoybean,n=10)
-
##    County CornHec SoyBeansHec CornPix SoyBeansPix
-## 1       1     166         8.1     374          55
-## 2       2      96       106.0     209         218
-## 3       3      76       103.6     253         250
-## 4       4     185         6.5     432          96
-## 5       4     116        63.8     367         178
-## 6       5     162        43.5     361         137
-## 7       5     152        71.4     288         206
-## 8       5     162        42.5     369         165
-## 9       6      93       105.3     206         218
-## 10      6     150        76.5     316         221
+
data("cornsoybean")
+head(cornsoybean)
+
##   County CornHec SoyBeansHec CornPix SoyBeansPix
+## 1      1  165.76        8.09     374          55
+## 2      2   96.32      106.03     209         218
+## 3      3   76.08      103.60     253         250
+## 4      4  185.35        6.47     432          96
+## 5      4  116.43       63.82     367         178
+## 6      5  162.08       43.50     361         137

Next, we load the population auxiliary information:

-
data("cornsoybeanmeans")
-Xmean <-
-  data.frame(cornsoybeanmeans[, c("CountyIndex",
-                                  "MeanCornPixPerSeg",
-                                  "MeanSoyBeansPixPerSeg")])
-Popn <-
-  data.frame(cornsoybeanmeans[, c("CountyIndex",
-                                  "PopnSegments")])
-head(Xmean)
+
data("cornsoybeanmeans")
+Xmean <-
+  data.frame(cornsoybeanmeans[, c("CountyIndex",
+                                  "MeanCornPixPerSeg",
+                                  "MeanSoyBeansPixPerSeg")])
+head(Xmean)
##   CountyIndex MeanCornPixPerSeg MeanSoyBeansPixPerSeg
-## 1           1               295                   190
-## 2           2               300                   197
-## 3           3               290                   205
-## 4           4               291                   220
-## 5           5               318                   188
-## 6           6               257                   247
+## 1 1 295.29 189.70 +## 2 2 300.40 196.65 +## 3 3 289.60 205.28 +## 4 4 290.74 220.22 +## 5 5 318.21 188.06 +## 6 6 257.17 247.13 +
Popn <-
+  data.frame(cornsoybeanmeans[, c("CountyIndex",
+                                  "PopnSegments")])
+head(Popn)
+
##   CountyIndex PopnSegments
+## 1           1          545
+## 2           2          566
+## 3           3          394
+## 4           4          424
+## 5           5          564
+## 6           6          570

The sae::pbmseBHF function obtains EBLUPs under the nested error model and then uses a parametric bootstrap approach to estimate MSEs.

-
cornsoybean <- cornsoybean[-33, ] # remove outlier
-sae.bhf <- 
-  pbmseBHF(CornHec ~ CornPix + SoyBeansPix,
-           dom = County, meanxpop = Xmean,
-           popnsize = Popn, B = 200, 
-           data = cornsoybean)
-

SUMMER::smoothSurvey provides the ability to fit unit +

cornsoybean <- cornsoybean[-33, ] # remove outlier
+sae.bhf <- 
+  pbmseBHF(CornHec ~ CornPix + SoyBeansPix,
+           dom = County, meanxpop = Xmean,
+           popnsize = Popn, B = 200, 
+           data = cornsoybean)
+

SUMMER::smoothUnit provides the ability to fit unit level models with unit level covariates for Gaussian response variables. Below we use the is.unit argument to specify a unit level model and then provide the column names of the unit level covariates in @@ -1003,157 +958,44 @@

Corn and Soy Production

complexity prior such that \(P(\sigma_u > 100)=0.01\) where \(\sigma_u\) is the standard deviation of the area-specific random effects).

-
cornsoybean$id <- 1:dim(cornsoybean)[1]
-Xsummer <- Xmean
-colnames(Xsummer) = c("County", "CornPix", "SoyBeansPix")
-des0 <- svydesign(ids = ~1, data = cornsoybean)
-summer.bhf.unit <- smoothUnit(formula = CornHec ~ CornPix + SoyBeansPix,
-                              family = "gaussian",
-                              domain = ~County,
-                              design = des0, X.pop = Xsummer,
-                              pc.u = 1000, pc.alpha = 0.01, level = 0.95)
+
cornsoybean$id <- 1:dim(cornsoybean)[1]
+Xsummer <- Xmean
+colnames(Xsummer) = c("County", "CornPix", "SoyBeansPix")
+des0 <- svydesign(ids = ~1, data = cornsoybean)
+## Warning in svydesign.default(ids = ~1, data = cornsoybean): No weights or
+## probabilities supplied, assuming equal probability
+summer.bhf.unit <- smoothUnit(formula = CornHec ~ CornPix + SoyBeansPix,
+                              family = "gaussian",
+                              domain = ~County,
+                              design = des0, X.pop = Xsummer,
+                              pc.u = 1000, pc.alpha = 0.01, level = 0.95)
+## Warning in smoothUnit(formula = CornHec ~ CornPix + SoyBeansPix, family =
+## "gaussian", : No spatial information provided, using iid domain effects

Below, we plot comparisons of the sae and SUMMER results.

-
par(mfrow = c(1, 2))
-range1 <- range(c(sae.bhf$est$eblup$eblup,summer.bhf.unit$median))
-plot(sae.bhf$est$eblup$eblup,summer.bhf.unit$iid.model.est$median,  
-     xlab = "sae package",
-     ylab = "SUMMER unit-level model package",
-     main = "Small area estimates",
-     xlim=range1,ylim=range1)
-abline(0,1)
-range2 <- range(c(sae.bhf$mse$mse, summer.bhf.unit$var))
-plot(sae.bhf$mse$mse, summer.bhf.unit$iid.model.est$var,  
-     xlab = "sae package mse",
-     ylab = "SUMMER unit-level model package model variance",
-     main = "Estimates of mse/variance",
-     xlim=range2,ylim=range2)
-abline(0,1)
-

-
-

BRFSS Data

-

Below, we provide an example comparing spatial models from -sae and SUMMER using data from the Behavioral -Risk Factor Surveillance System (BRFSS).

-
library(patchwork)
-data(BRFSS)
-data(KingCounty)
-BRFSS <- subset(BRFSS, !is.na(BRFSS$diab2))
-BRFSS <- subset(BRFSS, !is.na(BRFSS$hracode))
-mat <- getAmat(KingCounty, KingCounty$HRA2010v2_)
-design <- svydesign(ids = ~1, weights = ~rwt_llcp,
-                    strata = ~strata, data = BRFSS)
-direct <- svyby(~diab2, ~hracode, design, svymean)
-
-

Results using sae

-

Below, we use sae to smooth the logit-transformed direct -estimates.

-
direct$var <- direct$se ^ 2
-direct$logit.diab2 <- SUMMER::logit(direct$diab2)
-direct$logit.var <- direct$var / (direct$diab2 ^ 2 * (1 - direct$diab2) ^ 2)
-SFH.brfss <- sae::mseSFH(logit.diab2 ~ 1, logit.var, mat, data = direct)
-
-results <- data.frame(region = direct$hracode,
-                      eblup.SFH = SUMMER::expit(SFH.brfss$est$eblup), 
-                      mse = SFH.brfss$mse)
-
-
-

Results using SUMMER

-

Below, we fit two versions of the spatial area levelmodel in -SUMMER. If we change pc.u and -pc.alpha from the default value \(u=1,\alpha=0.01\) to \(u=0.1,\alpha=0.01\), we assign more prior -mass on smaller variance of the random effects, inducing more -smoothing.

-
summer.brfss <- smoothArea(diab2~1, domain= ~hracode,
-                           design = design,
-                           adj.mat = mat)
-summer.brfss.alt <- smoothArea(diab2~1, domain= ~hracode,
-                           design = design,
-                           adj.mat = mat, level = 0.95,
-                           pc.u = 0.1, pc.alpha = 0.01)
-

Finally, we use SUMMER::mapPlot to compare median -estimates and uncertainty estimates obtained via sae and -SUMMER.

-
toplot <-  summer.brfss$bym2.model.est
-toplot$logit.var <- toplot$var / 
-  (summer.brfss$bym2.model.est$median ^ 2 * 
-     (1 - summer.brfss$bym2.model.est$median) ^ 2)
-toplot$median.alt <-  summer.brfss.alt$bym2.model.est$median
-toplot$logit.var.alt <-  summer.brfss.alt$bym2.model.est$var / 
-  (summer.brfss.alt$bym2.model.est$median ^ 2 *
-     (1 - summer.brfss.alt$bym2.model.est$median) ^ 2)
-toplot$median.sae <- results$eblup.SFH
-toplot$mse.sae <- results$mse
-variables <- c("median", "median.alt",  "median.sae",
-               "logit.var", "logit.var.alt", "mse.sae")
-names <- c("Median (default prior)", "Median (new prior)",  "EBLUP (sae)",
-           "Variance (default prior)", "Variance (new prior)", "MSE (sae)")
-g1 <- mapPlot(data = toplot, geo = KingCounty,
-              variables=variables[1:3], 
-              labels = names[1:3], by.data = "domain",
-              by.geo = "HRA2010v2_", size = 0.1) 
-g2 <- mapPlot(data = toplot, geo = KingCounty,
-              variables=variables[4:6], labels = names[4:6],
-              by.data = "domain", by.geo = "HRA2010v2_", size = 0.1)  
-g1 / g2
-

-
par(mfrow = c(1, 2))
-range1 <- range(c(direct$diab2,toplot$median.alt))
-plot(direct$diab2,toplot$median,  
-     xlab = "direct estimates",
-     ylab = "model-based estimates",
-     main = "Small area estimates", col = 'red', pch = 16,
-     xlim=range1,ylim=range1)
-points(direct$diab2,toplot$median.sae,  col = 'blue', pch = 16)
-points(direct$diab2,toplot$median.alt,  col = 'cyan', pch = 16)
-legend('topleft', pch = 16, col = c('red', 'cyan', 'blue'), 
-       legend = c("SUMMER",'SUMMER (new prior)', "sae"),bty="n")
-abline(0,1)
-range2 <- range(c(direct$logit.var,toplot$mse.sae,toplot$logit.var.alt))
-plot(direct$logit.var,toplot$logit.var,  
-     xlab = "direct estimate var.",
-     ylab = "model-based uncertainty",
-     main = "Small area estimates", col = 'red', pch = 16, 
-     xlim=range2,ylim=range2)
-points(direct$logit.var,toplot$mse.sae,  col = 'blue', pch = 16)
-points(direct$logit.var,toplot$logit.var.alt,  col = 'cyan', pch = 16)
-legend('topleft', pch = 16, col = c('red', 'cyan','blue'),
-       legend = c("SUMMER var.", 'SUMMER var. (new prior)',"sae mse"),bty="n")
-abline(0,1)
-

+
par(mfrow = c(1, 2))
+range1 <- range(c(sae.bhf$est$eblup$eblup,summer.bhf.unit$median))
+plot(sae.bhf$est$eblup$eblup,summer.bhf.unit$iid.model.est$median,  
+     xlab = "sae package",
+     ylab = "SUMMER unit-level model package",
+     main = "Small area estimates",
+     xlim=range1,ylim=range1)
+abline(0,1)
+range2 <- range(c(sae.bhf$mse$mse, summer.bhf.unit$var))
+plot(sae.bhf$mse$mse, summer.bhf.unit$iid.model.est$var,  
+     xlab = "sae package mse",
+     ylab = "SUMMER unit-level model package model variance",
+     main = "Estimates of mse/variance",
+     xlim=range2,ylim=range2)
+abline(0,1)
+

-
-Arora, Vipin, and P. Lahiri. 1997. “On the SUPERIORITY OF THE -BAYESIAN METHOD OVER THE BLUP IN SMALL AREA -ESTIMATION PROBLEMS.” Statistica Sinica 7: 1053–63. -
Battese, George E, Rachel M Harter, and Wayne A Fuller. 1988. “An Error-Components Model for Prediction of County Crop Areas Using Survey and Satellite Data.” Journal of the American Statistical Association 83: 28–36.
-
-Fay, R. E., and R. A. Herriot. 1979. “Estimates of Income for -Small Places: An Application of James–Stein Procedure to -Census Data.” Journal of the American Statistical -Association 74: 269–77. -
-
-Molina, Isabel, and Yolanda Marhuenda. 2015. “Sae: An -R Package for Small -Area Estimation.” The R -Journal 7: 81–98. -
-
-You, Yong, and Beatrice Chapman. 2006. “Small Area -Estimation Using Area -Level Models and Estimated -Sampling Variances.” Survey -Methodology 32 (1): 97–103. -
-
-