Introduction to snfa

Economic efficiency analyses consider two separate measures of efficiency:

  • Technical efficiency: How effectively inputs are transformed into outputs

  • Allocative inefficiency: How effectively the production plan maximizes profit

Traditionally, data envelopment analysis (DEA) and stochastic frontier analysis (SFA) have been used to estimate technical efficiency. Both methods estimate a production frontier that quantifies the maximum quantity of output producible by a given bundle of inputs. The observed output of a decision making unit (DMU) can be compared to the frontier to arrive at an estimate of efficiency. SFA assumes efficiency is randomly determined and follows a distribution common across DMUs (Aigner et al. 1977). A standard stochastic frontier model takes the form yi = f(Xi) − δi + εi, where i indexes the DMU, yi is log-output, f is a transformation function, Xi are inputs, δi > 0 is log-efficiency, and εi is observational error. The log-efficiency term δi may be assumed to follow a half-normal, exponential, or other positively-defined distribution. Traditionally, SFA uses a parametric approximation for f.

DEA, on the other hand, assumes the production frontier is deterministic and defined by highest observed output (conditional on inputs used). The estimated frontier is piecewise-linear connecting maximum observed output conditional on inputs used while enforcing monotonicity and concavity constraints (as well as constraints on returns to scale, if those are specified). An example DEA frontier using variable returns to scale is shown below:

library(ggplot2)
# library(rDEA) #rDEA not installing with travis-ci, dea method below is included in snfa
data(univariate)

dea.fit <- dea(univariate$x, univariate$y,
               univariate$x, univariate$y,
               model = "output",
               RTS = "variable")
univariate$frontier <- univariate$y / dea.fit$thetaOpt
ggplot(univariate, aes(x, y)) +
  geom_point() +
  geom_line(aes(y = frontier), color = "red")

Allocative efficiency is evaluated by checking whether DMUs’ first-order condition for profit-maximization is satisifed. Specifically, if DMUs are price-takers in input and output markets, first-order conditions are given by $$\frac{\partial f(X_i)}{\partial x^j} = \frac{w_i^j}{p_i},$$ where superscripts index inputs, wij is the cost of input j to DMU i, and pi is the price of output for DMU i. If efficiency is multiplicative, so that f(Xi) = p(Xi)δi for a frontier function p, then the first-order condition can be expressed as $$\delta_i\frac{\partial p(X_i)}{\partial x^j} = \frac{w_i^j}{p_i}.$$ Empircally, log-overallocation of input j by DMU i can be estimated by $$\log\left(w_i^j\right) + \log\left(p_i\right) - \log\left(\delta_i\frac{\partial p(X_i)}{\partial x^j}\right).$$ If this quantity is positive, DMU i used more of input j than would be profit maximizing.

As detailed above, to estimate allocative inefficiency one needs to be able to estimate marginal input productivities, which are derivatives of the production function/frontier. Since DEA is piecewise-linear, there can be points on the estimated frontier where the derivative is undefined. Using the previous example, the points in blue in the following plot have undefined derivatives.

Thus, DEA is inappropriate for estimating allocative inefficiency.

Smooth non-parametric frontier analysis (SNFA) is a smooth analogue of DEA (Racine et al. 2009). It uses constrained kernel smoothing that ensures the estimated frontier lies above observed output for each observation, as well as imposing monotonicity or concavity constraints, if specified. An increasing, concave boundary is fit below:

X <- as.matrix(univariate$x)
y <- univariate$y

N.fit <- 100
X.fit <- as.matrix(seq(min(X), max(X), length.out = N.fit))

#Reflect data for fitting
reflected.data <- reflect.data(X, y)
X.eval <- reflected.data$X
y.eval <- reflected.data$y

frontier.mc <- fit.boundary(X.eval, y.eval, 
                            X.bounded = X, y.bounded = y,
                            X.constrained = X.fit,
                            X.fit = X.fit,
                            method = "mc")

frontier.df <- data.frame(x = X.fit,
                          y = frontier.mc$y.fit)
ggplot(univariate, aes(x, y)) +
  geom_point() +
  geom_line(data = frontier.df, color = "red")

slope.df <- data.frame(x = X.fit,
                       slope = frontier.mc$gradient.fit)
ggplot(slope.df, aes(x, slope)) +
  geom_line()

A more in-depth example examining different constraints can be found in the example for fit.boundary:

example("fit.boundary")

The function allocative.efficiency uses SNFA to fit a production frontier, then uses the frontier to derive estimates of marginal input productivities. Those marginal productivities are compared to ratio of input to output prices to arrive at an estimate of overallocation. The example in allocative.efficiency estimates overallocation of labor and capital in the U.S. using macroeconomic data. First, data is loaded and cleaned:

data(USMacro)

USMacro <- USMacro[complete.cases(USMacro),]

#Extract data
X <- as.matrix(USMacro[,c("K", "L")])
y <- USMacro$Y

X.price <- as.matrix(USMacro[,c("K.price", "L.price")])
y.price <- rep(1e9, nrow(USMacro)) #Price of $1 billion of output is $1 billion

Then, the model is fit with allocative.efficiency:

#Run model
efficiency.model <- allocative.efficiency(X, y,
                                          X.price, y.price,
                                          X.constrained = X,
                                          model = "br",
                                          method = "mc")

Finally, results are plotted and average overallocation is estimated:

#Plot technical/allocative efficiency over time
library(ggplot2)

technical.df <- data.frame(Year = USMacro$Year,
                           Efficiency = efficiency.model$technical.efficiency)

ggplot(technical.df, aes(Year, Efficiency)) +
  geom_line()

allocative.df <- data.frame(Year = rep(USMacro$Year, times = 2),
                            log.overallocation = c(efficiency.model$log.overallocation[,1],
                                                   efficiency.model$log.overallocation[,2]),
                            Variable = rep(c("K", "L"), each = nrow(USMacro)))

ggplot(allocative.df, aes(Year, log.overallocation)) +
  geom_line(aes(color = Variable))

#Estimate average overallocation across sample period
lm.model <- lm(log.overallocation ~ 0 + Variable, allocative.df)
summary(lm.model)
#> 
#> Call:
#> lm(formula = log.overallocation ~ 0 + Variable, data = allocative.df)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -0.43704 -0.18195 -0.08572  0.14338  0.61385 
#> 
#> Coefficients:
#>           Estimate Std. Error t value Pr(>|t|)    
#> VariableK   2.0297     0.0465   43.65   <2e-16 ***
#> VariableL  -0.8625     0.0465  -18.55   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.279 on 70 degrees of freedom
#> Multiple R-squared:  0.9698, Adjusted R-squared:  0.9689 
#> F-statistic:  1124 on 2 and 70 DF,  p-value: < 2.2e-16

References

Aigner D, Lovell CK, Schmidt P (1977). “Formulation and estimation of stochastic frontier production function models.” Journal of Econometrics, 6(1), 21-37.

Racine JS, Parmeter CF, Du P (2009). “Constrained nonparametric kernel regression: Estimation and inference.” Working paper.