**Assessment of agricultural productivity change at country level: A stochastic frontier approach**

Alessandro Magrini

Department of Statistics, Computer Science, Applications – University of Florence, Italy. E-mail: alessandro.magrini@unifi.it

# 1. Introduction

Productivity growth in agriculture is widely recognized as a key resource for meeting the food demand of a rapidly increasing world population; monitoring agricultural productivity change at country level is therefore of core importance for international decision makers. The United States Department of Agriculture (USDA) is the reference source for estimates of agricultural productivity change at country level, covering almost all countries in the world over a long and regularly updated period (from 1961 to 2016). USDA estimates consist of yearly changes in Total Factor Productivity (TFP) based on the growth accounting method, i.e., they are obtained as the ratio between the aggregated output and the sum of the input quantities weighted by their cost shares (Caves *et al.*, 1982). Growth accounting is widely adopted to assess TFP change because of several advantages: in particular, it requires no assumptions on the characteristics of the production processes and can be applied to one decision making unit at a time. However, input cost shares are often only partially available and must be approximated or imputed from several different sources, as in the case of USDA estimates (see Fuglie, 2015, Table A2), with uncontrollable consequences for the accuracy of the estimates. In addition, the growth accounting method assumes that the decision making units operate at their optimal conditions, so it may overestimate TFP change in the presence of technical inefficiency. Frontier-based methods like Data Envelopment Analysis (DEA, Charnes *et al.*, 1978) and Stochastic Frontier Models (SFMs, Schmidt & Sickles, 1984) are valid alternatives because, by estimating the production frontier from the sample of decision making units, they can distinguish between change in technology and change in technical efficiency, and they do not require input cost shares.

The main difference between DEA and SFMs is that DEA makes no assumption on the production frontier but cannot account for random shocks independent of production; as a consequence, it attributes all deviations from the frontier to technical inefficiency. SFMs, instead, can disentangle technical inefficiency from external shocks, but they require parametric assumptions on the production frontier. Despite their appealing properties, SFMs and DEA have been employed only in a few scattered studies (see the review in Kryszak *et al.*, 2021) and, as such, the available estimates are not comparable with USDA ones.
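As a rough illustration of the growth accounting idea described above, the following Python sketch computes a TFP change as output growth minus cost-share-weighted input growth. The log-difference form, the function name and all numbers are assumptions made for illustration only; they are not USDA's actual procedure or data.

```python
import math

def growth_accounting_tfp_change(y_prev, y_curr, x_prev, x_curr, cost_shares):
    """Log TFP change: output growth minus share-weighted input growth."""
    if abs(sum(cost_shares) - 1.0) > 1e-9:
        raise ValueError("cost shares are assumed to sum to one")
    output_growth = math.log(y_curr / y_prev)
    input_growth = sum(
        share * math.log(x1 / x0)
        for share, x0, x1 in zip(cost_shares, x_prev, x_curr)
    )
    return output_growth - input_growth

# Hypothetical two-input example (made-up numbers).
change = growth_accounting_tfp_change(
    y_prev=100.0, y_curr=104.0,
    x_prev=[50.0, 20.0], x_curr=[50.5, 20.4],
    cost_shares=[0.6, 0.4],
)
print(f"log TFP change: {change:.4f}")
```

Note that the whole output growth not explained by input growth is attributed to TFP here, which is why this approach cannot separate technical inefficiency from technological change.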

In this paper, we apply an SFM with translog specification to the same data on agricultural output and inputs employed by USDA, and exploit the generalized Malmquist index proposed by Orea (2002) to derive country-level measures of agricultural TFP change, which are then compared with USDA estimates. Our preference for SFMs over DEA rests on the opportunity to account for external shocks and to assess differences in technology across countries, interactions among inputs and the trend of returns to scale.

# 2. Data and methodology

In this study, we employ the same data on which USDA estimates of agricultural TFP change are based (USDA, 2019). These data are sourced from the Food and Agriculture Organization (FAO) and the International Labour Organization (ILO), and integrated with modeled estimates. The output variable is gross agricultural production (Y, thousand US dollars at 2004–2006 average international prices), while the input variables consist of six measures: land use (X<sub>1</sub>, rainfed cropland equivalents), labour force (X<sub>2</sub>, economically active adults), livestock (X<sub>3</sub>, cattle equivalents), machinery stock (X<sub>4</sub>, 40 CV tractor equivalents), fertilizer use (X<sub>5</sub>, tonnes of nutrients) and animal feed (X<sub>6</sub>, megacalories of metabolizable energy). The data have annual frequency over the period 1961–2016 and cover almost all countries in the world: the considered countries account for more than 99.7% of FAO's global gross agricultural output. Some national data have been aggregated to create consistent political units over time (e.g., former Yugoslavia, former Czechoslovakia, Ethiopia plus Eritrea, former Soviet Union) or to avoid very small measurements (e.g., Lesser Antilles, Micronesia), for a total of 170 countries. Further details and descriptive statistics can be found in Fuglie (2015).

Alessandro Magrini, University of Florence, Italy, alessandro.magrini@unifi.it, 0000-0002-7278-5332

Alessandro Magrini, *Assessment of agricultural productivity change at country level: A stochastic frontier approach*, pp. 197-202, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.37, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

Let i = 1,...,n denote the decision making units (countries) and t = 1,...,T the time points (years). Also, let y<sub>i,t</sub> be the output level produced by unit i at time t and x<sub>i,j,t</sub> the level of the j-th input (j = 1,...,p) employed by unit i at time t. A Stochastic Frontier Model (SFM) has the following general form (Schmidt & Sickles, 1984):

$$y\_{i,t} = f(\mathbf{x}\_{i,t}; \Theta) \exp(v\_{i,t} - u\_{i,t}) \qquad i = 1, \ldots, n; \ t = 1, \ldots, T \tag{1}$$

where f is the production frontier, representing the maximum output level technically feasible given a combination of the inputs **x**<sub>i,t</sub> = (x<sub>i,1,t</sub>,...,x<sub>i,j,t</sub>,...,x<sub>i,p,t</sub>) and a technology Θ, while v<sub>i,t</sub> ∈ ℝ and u<sub>i,t</sub> ∈ ℝ<sup>+</sup> are two random errors representing deviations from the production frontier f due, respectively, to shocks independent of the producer and to shocks related to production (technical inefficiency). As such, the maximum feasible output may differ from the maximum output level technically feasible because of favourable or unfavourable events beyond the control of producers. Specifically, the maximum feasible output for unit i at time t is y<sup>∗</sup><sub>i,t</sub> = f(**x**<sub>i,t</sub>; Θ) exp(v<sub>i,t</sub>), thus technical efficiency is TE<sub>i,t</sub> = y<sub>i,t</sub>/y<sup>∗</sup><sub>i,t</sub> = exp(−u<sub>i,t</sub>). We employ the following translog specification for f:

$$\begin{split} f(\mathbf{x}\_{i,t};\Theta) = \exp\Bigg(&\alpha\_i + \delta t + \gamma t^2 + \sum\_{j=1}^p \beta\_j \log x\_{i,j,t} + \sum\_{j=1}^p \sum\_{k=j}^p \beta\_{j,k} \log x\_{i,j,t} \log x\_{i,k,t} \\ &+ \sum\_{j=1}^p \lambda\_j t \log x\_{i,j,t} + \sum\_{j=1}^p \eta\_j t^2 \log x\_{i,j,t}\Bigg) \end{split} \tag{2}$$

This formulation is identical to the most commonly adopted one in the literature (see the review in Laureti, 2006, Chapter 3, and in Magrini, 2021), with the difference that we added parameters η<sub>1</sub>,...,η<sub>p</sub> to allow output elasticities to vary over time according to a quadratic trend, rather than a linear one. The frontier specification in (2) leads to the following SFM:

$$\begin{aligned} \log y\_{i,t} &= \alpha\_i + \delta t + \gamma t^2 + \sum\_{j=1}^p \beta\_j \log x\_{i,j,t} + \sum\_{j=1}^p \sum\_{k=j}^p \beta\_{j,k} \log x\_{i,j,t} \log x\_{i,k,t} \\ &\quad + \sum\_{j=1}^p \lambda\_j t \log x\_{i,j,t} + \sum\_{j=1}^p \eta\_j t^2 \log x\_{i,j,t} + v\_{i,t} - u\_{i,t} \end{aligned} \tag{3}$$

with ε<sub>i,t</sub> = v<sub>i,t</sub> − u<sub>i,t</sub>. We complete the specification of the SFM by assuming:

$$\begin{aligned} v\_{i,t} &\sim\_{i.i.d.} \mathbf{N}(0, \sigma\_V^2) \\ u\_{i,t} &= \phi\_i t + \psi\_i t^2 + U\_i & U\_i &\sim\_{i.i.d.} \mathbf{N}^+(0, \sigma^2) \\ \mathbf{Cov}(v\_{i,t}, U\_i) &= 0 \; \forall i, t \end{aligned} \tag{4}$$

where parameters φ<sub>i</sub> and ψ<sub>i</sub> regulate the second order polynomial trend of the logarithmic technical efficiency of country i, 'i.i.d.' stands for 'independent and identically distributed', and N(·) and N<sup>+</sup>(·) denote the Normal and the half Normal distribution, respectively. This specification for u<sub>i,t</sub> is the same as in Battese & Coelli (1995) with the addition of the quadratic term.
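To make the specification in (1)–(4) concrete, the sketch below evaluates the deterministic part of equation (3) and the inefficiency term of equation (4) for a single country-year, and then derives technical efficiency as exp(−u). It is a minimal illustration with two inputs; all parameter values and function names are hypothetical and are not estimates from this study.

```python
import math

def translog_log_frontier(log_x, t, alpha, delta, gamma, beta, beta2, lam, eta):
    """Deterministic part of equation (3) for one unit and one time point.

    log_x : list of log inputs (length p)
    beta2 : dict mapping (j, k) with j <= k to the second-order coefficient
    """
    p = len(log_x)
    value = alpha + delta * t + gamma * t ** 2
    for j in range(p):
        # First-order term with its linear and quadratic time interactions.
        value += (beta[j] + lam[j] * t + eta[j] * t ** 2) * log_x[j]
        # Second-order terms, summed over k >= j.
        for k in range(j, p):
            value += beta2[(j, k)] * log_x[j] * log_x[k]
    return value

def inefficiency(t, phi, psi, big_u):
    """u_{i,t} = phi_i * t + psi_i * t^2 + U_i, as in equation (4)."""
    return phi * t + psi * t ** 2 + big_u

# Hypothetical parameter values for a two-input example.
log_f = translog_log_frontier(
    log_x=[0.1, -0.2], t=10, alpha=2.0, delta=0.01, gamma=0.0001,
    beta=[0.5, 0.3], beta2={(0, 0): 0.02, (0, 1): -0.01, (1, 1): 0.03},
    lam=[0.001, -0.002], eta=[0.0, 0.0],
)
u = inefficiency(t=10, phi=-0.005, psi=0.0001, big_u=0.3)
te = math.exp(-u)   # technical efficiency TE = exp(-u)
log_y = log_f - u   # log output when the random shock v is zero
print(f"log frontier = {log_f:.4f}, u = {u:.4f}, TE = {te:.4f}")
```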

In order to account for technological gaps among countries at different levels of development, we specify four separate models according to the WESP 2020 classification (United Nations, 2020): 'industrialized' (28), 'transition' (22), 'developing' (42), 'least developed' (78). Before estimating the parameters, the time variable is coded as the year minus 1961, thus t = 0, 1,..., 55, and the input variables are divided by their respective sample means. This allows the first order coefficients β<sub>1</sub>,...,β<sub>p</sub> to be interpreted as the output elasticities of the inputs evaluated at the sample mean and at the first time point (year 1961), and makes the output elasticity of the j-th input evaluated at the sample mean and at year s equal to β<sub>j</sub> + λ<sub>j</sub>(s−1961) + η<sub>j</sub>(s−1961)<sup>2</sup>.
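The effect of this coding can be sketched in a few lines of Python. After division by the sample mean, an input equal to its mean has logarithm zero, so all second-order terms vanish and the elasticity reduces to the quadratic expression in time given above. The coefficient values and function names below are hypothetical, chosen only to illustrate the arithmetic.

```python
def scale_by_mean(values):
    """Divide a series by its sample mean, so that the mean maps to 1 (log = 0)."""
    mean = sum(values) / len(values)
    return [v / mean for v in values]

def elasticity_at_mean(beta_j, lambda_j, eta_j, year, base_year=1961):
    """Output elasticity of input j at the sample mean in a given year."""
    t = year - base_year
    return beta_j + lambda_j * t + eta_j * t ** 2

# Hypothetical coefficients for one input.
e_1961 = elasticity_at_mean(0.30, 0.002, -0.00002, 1961)  # equals beta_j
e_2016 = elasticity_at_mean(0.30, 0.002, -0.00002, 2016)  # t = 55
scaled = scale_by_mean([2.0, 4.0, 6.0])
print(e_1961, e_2016, scaled)
```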

TFP change is assessed through the generalized Malmquist index proposed by Orea (2002), which accounts for variable returns to scale. Based on this index, the TFP change between two time points s and t (TFPC<sub>s,t</sub>) is decomposed into technological change (TC<sub>s,t</sub>), technical efficiency change (EC<sub>s,t</sub>), and scale change (SC<sub>s,t</sub>):

$$\text{TFPC}\_{s,t} = \text{TC}\_{s,t} \cdot \text{EC}\_{s,t} \cdot \text{SC}\_{s,t} \tag{5}$$

Orea (2002) showed that, under a translog production frontier, these three terms equate to:

$$\begin{split} \text{TC}\_{s,t} &= \exp\left[\frac{1}{2} \left( \frac{\partial \log y\_{i,s}}{\partial s} + \frac{\partial \log y\_{i,t}}{\partial t} \right) \right] \\ \text{EC}\_{s,t} &= \exp\left[\mathbb{E}(u\_{i,s} \mid \varepsilon\_{i,s}) - \mathbb{E}(u\_{i,t} \mid \varepsilon\_{i,t})\right] \\ \text{SC}\_{s,t} &= \exp\left[\frac{1}{2} \sum\_{j=1}^{p} \left( \frac{\sum\_{k=1}^{p} e\_{i,k,s} - 1}{\sum\_{k=1}^{p} e\_{i,k,s}} e\_{i,j,s} + \frac{\sum\_{k=1}^{p} e\_{i,k,t} - 1}{\sum\_{k=1}^{p} e\_{i,k,t}} e\_{i,j,t} \right) \log \frac{x\_{i,j,t}}{x\_{i,j,s}} \right] \\ e\_{i,j,s} &= \frac{\partial \log y\_{i,s}}{\partial \log x\_{i,j,s}} \qquad e\_{i,j,t} = \frac{\partial \log y\_{i,t}}{\partial \log x\_{i,j,t}} \end{split} \tag{6}$$
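The decomposition in (5)–(6) can be sketched as follows for a translog frontier of the form (3). This is only an illustrative implementation: the function names and the parameter dictionary are hypothetical, and the values `u_s` and `u_t` stand in for the conditional expectations E(u | ε), which in practice come from the fitted model.

```python
import math

def elasticities(log_x, t, beta, beta2, lam, eta):
    """Output elasticities e_j = d log y / d log x_j under the translog form (3).

    beta2 maps (j, k) with j <= k to the second-order coefficient beta_{j,k}.
    """
    p = len(log_x)
    result = []
    for m in range(p):
        e = beta[m] + lam[m] * t + eta[m] * t ** 2
        for k in range(p):
            coef = beta2[(min(m, k), max(m, k))]
            # The squared term (k == m) contributes 2 * beta_{m,m} * log x_m.
            e += (2.0 if k == m else 1.0) * coef * log_x[k]
        result.append(e)
    return result

def time_derivative(log_x, t, delta, gamma, lam, eta):
    """d log y / d t under the translog form (3)."""
    return delta + 2.0 * gamma * t + sum(
        (lam[j] + 2.0 * eta[j] * t) * log_x[j] for j in range(len(log_x))
    )

def malmquist(log_x_s, log_x_t, s, t, u_s, u_t, par):
    """Return (TFPC, TC, EC, SC) between time points s and t, following (5)-(6)."""
    tc = math.exp(0.5 * (
        time_derivative(log_x_s, s, par["delta"], par["gamma"], par["lam"], par["eta"])
        + time_derivative(log_x_t, t, par["delta"], par["gamma"], par["lam"], par["eta"])
    ))
    ec = math.exp(u_s - u_t)  # u_s, u_t stand in for E(u | epsilon)
    e_s = elasticities(log_x_s, s, par["beta"], par["beta2"], par["lam"], par["eta"])
    e_t = elasticities(log_x_t, t, par["beta"], par["beta2"], par["lam"], par["eta"])
    rts_s, rts_t = sum(e_s), sum(e_t)  # overall elasticities (returns to scale)
    log_sc = 0.5 * sum(
        ((rts_s - 1.0) / rts_s * e_s[j] + (rts_t - 1.0) / rts_t * e_t[j])
        * (log_x_t[j] - log_x_s[j])
        for j in range(len(log_x_s))
    )
    sc = math.exp(log_sc)
    return tc * ec * sc, tc, ec, sc

# Hypothetical two-input example with decreasing returns to scale (0.5 + 0.3 < 1).
par = {
    "delta": 0.01, "gamma": 0.0, "beta": [0.5, 0.3],
    "beta2": {(0, 0): 0.0, (0, 1): 0.0, (1, 1): 0.0},
    "lam": [0.0, 0.0], "eta": [0.0, 0.0],
}
tfpc, tc, ec, sc = malmquist([0.0, 0.0], [0.02, 0.01], 10, 11, 0.25, 0.24, par)
print(f"TFPC = {tfpc:.4f}, TC = {tc:.4f}, EC = {ec:.4f}, SC = {sc:.4f}")
```

In this example both inputs grow while returns to scale are decreasing, so the scale term falls below one; when the overall elasticity equals one, the scale term is exactly one and the index reduces to the product of TC and EC.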

# 3. Results

We performed maximum likelihood estimation of model (3) for each group of countries using the R package frontier (Coelli & Henningsen, 2020). Parameter estimates imply significant and positive output elasticities at the sample mean for almost all time points in all four models, suggesting consistency with economic theory. Also, the quadratic components of the trend of output elasticities (parameters η<sub>j</sub>, j = 1,...,p) and of logarithmic technical inefficiencies (parameters ψ<sub>i</sub>, i = 1,...,n) are significant for most inputs and most countries, respectively, supporting the adequacy of our model formulation.

Figure 1 displays the time series of the estimated overall elasticity at the sample mean, equal to the sum of all output elasticities at the sample mean by time point. Since the overall elasticity is almost always significantly lower than one for all groups of countries, we deduce that returns to scale are decreasing (and not constant) in the considered period. Based on this result, the assumption of constant returns to scale made by many authors appears to be a simplification rather than a real property of the production processes of the various countries.

Figure 1: Time series of the estimated overall elasticity at the sample mean. Shaded areas indicate 95% confidence intervals.

Based on the estimated models, we computed TFPC and its components (TC, EC and SC) with s = t−1 (chained index numbers) and with s = 1961 (index numbers with base year 1961). Table 1 reports average annual percentage changes averaged by geographical region, while Figure 2 displays the time series of index numbers with base year 1961 for a selection of countries. We see that USDA estimates of TFP change are greater in absolute value than ours for most geographical regions and periods. Exceptions include North America, Sub-Saharan Africa, East Europe, Central Asia, North Africa and Australia-New Zealand, where our estimates are greater in absolute value than USDA ones, or even discordant, for at least half the periods shown in Table 1. From TFP changes at country level, we note that our estimates and USDA's are in substantial agreement for the United States, France, the United Kingdom, Australia, South Africa and India, while USDA estimates are much higher than ours for Germany, Italy, Japan, China and Brazil, and moderately higher for the Russian Federation and former Yugoslavia. Instead, our estimates are appreciably higher than USDA ones for Canada, Afghanistan and Somalia.

Table 1: Average annual percentage variation of our and USDA's estimates of TFP change averaged by geographical region. The region 'Africa, sub-Saharan' does not include South Africa, while the region 'Oceania' does not include Australia and New Zealand.
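The relation between chained changes and an index with a base year, and the computation of an average annual percentage change, can be illustrated with the short Python sketch below. It assumes a multiplicative chaining of year-on-year changes and a geometric average, neither of which is stated explicitly in the text; the index with s = 1961 computed directly from (6) need not coincide exactly with the cumulated chained index. All numbers and function names are made up.

```python
import math

def chain_to_base_index(chained_changes):
    """Cumulate year-on-year TFPC values into an index equal to 1 in the base year."""
    index = [1.0]
    for change in chained_changes:
        index.append(index[-1] * change)
    return index

def average_annual_pct_change(index):
    """Geometric average annual percentage change over the span of the index."""
    years = len(index) - 1
    return (math.pow(index[-1] / index[0], 1.0 / years) - 1.0) * 100.0

# Hypothetical chained TFPC values for three consecutive years.
chained = [1.02, 0.99, 1.03]
index = chain_to_base_index(chained)
avg_pct = average_annual_pct_change(index)
print(index, f"{avg_pct:.3f}%")
```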

Figure 2: Our and USDA's estimates of TFP change for a selection of countries (indices, 1961=1). The time series of TFPC is shown in blue (TC, EC and SC denoted respectively by straight, dashed and dash-dotted black lines), while USDA estimates are shown in red.

The difference between our and USDA's estimates may be due to the presence of technical inefficiency, which can be taken into account only by stochastic frontier models, but also to inaccuracies in USDA's input cost shares and/or in our model specification. Furthermore, our model is able to detect changes in input use through the term SC, which appears generally non-negligible, consistent with the evidence found in favour of decreasing returns to scale.

To provide an overall assessment of the agreement between our and USDA's estimates, we computed the Pearson correlation by country and found a median equal to 0.857, with first and third quartiles equal to 0.554 and 0.943, respectively. These correlations indicate that our methodology and USDA's provide results that differ but are in substantial agreement, which reflects their different theoretical foundations and suggests the empirical validity of both. Full results are available at https://github.com/alessandromagrini/agrTFP.
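A summary of this kind can be sketched as follows: the Pearson correlation is computed separately for each country between the two series of estimates, and the median and quartiles are then taken across countries. The country labels and series below are invented for illustration, and the quartile method (Python's default in `statistics.quantiles`) is an assumption, since the text does not specify one.

```python
import math
import statistics

def pearson(a, b):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical TFP index series (ours, USDA) for four made-up countries.
series = {
    "A": ([1.00, 1.02, 1.05, 1.07], [1.00, 1.03, 1.06, 1.10]),
    "B": ([1.00, 0.99, 1.01, 1.04], [1.00, 1.01, 1.00, 1.05]),
    "C": ([1.00, 1.01, 1.00, 1.02], [1.00, 0.98, 1.01, 1.00]),
    "D": ([1.00, 1.04, 1.08, 1.13], [1.00, 1.05, 1.07, 1.15]),
}
correlations = [pearson(ours, usda) for ours, usda in series.values()]
median = statistics.median(correlations)
q1, q2, q3 = statistics.quantiles(correlations, n=4)
print(f"median = {median:.3f}, Q1 = {q1:.3f}, Q3 = {q3:.3f}")
```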

# 4. Concluding remarks

We have estimated agricultural TFP change at country level based on the same data employed by the United States Department of Agriculture (USDA), using a stochastic frontier model instead of the growth accounting method. This work provides, for the first time in the literature, a comparison between agricultural TFP changes estimated with different methodologies, as well as an additional data source that can be employed in a large variety of longitudinal economic analyses at country level.

Our methodology overcomes the limitation of USDA estimates, which rely on approximated and imputed input cost shares, and of the growth accounting method in general, which ignores technical inefficiency. However, the accuracy of estimates based on a stochastic frontier is sensitive to model specification. For this reason, we employed a more flexible specification than those adopted in the literature; since it is based on deterministic trends, however, it may be inadequate for long periods like the one considered in this study. We also accounted for heterogeneity in technology among the various countries by specifying four separate models based on the level of development.

In the future, we plan to improve our methodology by introducing autoregressive coefficients to represent stochastic trends, and by specifying latent classes to account for heterogeneity in technology among the various countries.

## References

