**Prediction of wine sensorial quality: a classification problem**

Maurizio Carpita<sup>a</sup>, Silvia Golia<sup>a</sup>

<sup>a</sup> Department of Economics and Management, University of Brescia, Brescia, Italy

# 1. Introduction

When dealing with a wine, it is of interest to be able to predict its quality based on chemical and/or sensory variables. There is no agreement on what wine quality means or how it should be assessed, and it is often viewed in intrinsic (physicochemical, sensory) or extrinsic (price, prestige, context) terms (Jackson, 2017). For example, in Golia et al. (2017) it was measured by a global quality score, ranging from 0 to 100, produced by Altroconsumo, an independent Italian consumers' association, and based on a large set of variables including chemical and sensory variables, as well as context variables. Cortez et al. (2009) used an indicator ranging from 0 (very bad) to 10 (excellent), obtained from the evaluations of experienced judges who scored the wines.

In this study we started from the Cortez et al. (2009) paper, but we maintained the categorical nature of the variable measuring the wine sensorial quality. The approach to the prediction of this categorical variable followed by Cortez and coauthors makes use of the observed wine quality, so it suffers from the fact that the wine quality measure must be known. Instead, in this paper we started from the vector of predicted probabilities of the categories of the target variable, obtained by applying the Cumulative Logit Model, and then we applied a classifier in order to predict the final category. This last step is the one of interest for this paper: we will compare the predictive performance of the default method (Bayes Classifier), which assigns a unit to the most likely category, with that of two other methods (Maximum Difference Classifier and Maximum Ratio Classifier). To do so, we will use the data analysed in Cortez et al. (2009) concerning both the white and red variants of the Portuguese "Vinho Verde" wine.

The paper is organized as follows. Section 2 discusses the categorical classifiers used in this study, whereas Section 3 reports the results concerning the prediction of the wine sensorial quality. Conclusions follow in Section 4.

# 2. The categorical classifiers

As stated in the introduction, the statistical problem of this study concerns the way in which the vector of predicted occurrence probabilities of the categories of the categorical target variable is transformed into a single value. The default method is the *Bayes Classifier* (BC), which assigns a unit to the most likely category. BC minimizes, on average, the test error rate (James et al., 2013), so it is the optimal criterion when the accuracy of the classification is the main goal. Nevertheless, BC favors the prevalent category, and when there is no category of particular interest but all the categories have the same relevance, it may not be the best choice.

Starting from this observation, in Golia and Carpita (2018, 2020) we investigated the performance of different categorical classifiers (some of which also take into account the ordinal nature of the target variable) and found the so-called *Maximum Difference Classifier* (MDC) promising. In this study we considered MDC and a new classifier denoted as *Maximum Ratio Classifier* (MRC). Both classifiers are based on the comparison between the predicted probabilities and the sample frequencies, and they are defined as follows.

Maurizio Carpita, Silvia Golia, *Prediction of wine sensorial quality: a classification problem*, pp. 235-238, © 2021 Author(s), CC BY 4.0 International, DOI 10.36253/978-88-5518-461-8.44, in Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci (edited by), *ASA 2021 Statistics and Information Systems for Policy Evaluation. Book of short papers of the on-site conference*, © 2021 Author(s), content CC BY 4.0 International, metadata CC0 1.0 Universal, published by Firenze University Press (www.fupress.com), ISSN 2704-5846 (online), ISBN 978-88-5518-461-8 (PDF), DOI 10.36253/978-88-5518-461-8

Maurizio Carpita, University of Brescia, Italy, maurizio.carpita@unibs.it, 0000-0001-7998-5102 Silvia Golia, University of Brescia, Italy, silvia.golia@unibs.it, 0000-0003-0015-8126

<sup>219</sup> FUP Best Practice in Scholarly Publishing (DOI 10.36253/fup\_best\_practice)

Let pr<sub>i</sub> be the predicted probability of the category c<sub>i</sub> (i = 1, 2,...,k) of the categorical variable C, and let fr<sub>i</sub> be the corresponding frequency computed from observed data. The MDC computes the deviations of pr<sub>i</sub> from fr<sub>i</sub> and takes the category corresponding to the maximum difference, that is:

$$\text{MDC}: \arg\max_{c_i \in \{c_1, c_2, \dots, c_k\}} (pr_i - fr_i).$$

This classifier extends to more than two categories the rule proposed by Cramer (1999) for the dichotomous case.

The MRC computes the relative deviations of pr<sub>i</sub> from fr<sub>i</sub> and takes the category corresponding to the maximum ratio, that is:

$$\text{MRC}: \arg\max_{c_i \in \{c_1, c_2, \dots, c_k\}} \left(pr_i / fr_i\right).$$
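The three rules can be compared on a single unit with a short Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the frequencies are taken to be relative frequencies that are strictly positive, ties are resolved in favour of the first category, and the numbers in `pr` and `fr` are invented for the example rather than taken from the wine data.

```python
from typing import Sequence


def bayes_classifier(pr: Sequence[float]) -> int:
    """BC: index of the category with the largest predicted probability."""
    return max(range(len(pr)), key=lambda i: pr[i])


def max_difference_classifier(pr: Sequence[float], fr: Sequence[float]) -> int:
    """MDC: index of the category maximizing pr_i - fr_i."""
    return max(range(len(pr)), key=lambda i: pr[i] - fr[i])


def max_ratio_classifier(pr: Sequence[float], fr: Sequence[float]) -> int:
    """MRC: index of the category maximizing pr_i / fr_i (assumes fr_i > 0)."""
    return max(range(len(pr)), key=lambda i: pr[i] / fr[i])


# Invented example with k = 3 categories (indices 0, 1, 2).
pr = [0.10, 0.50, 0.40]  # predicted probabilities for one unit
fr = [0.05, 0.70, 0.25]  # relative frequencies observed in the sample

bc_choice = bayes_classifier(pr)                # largest pr_i
mdc_choice = max_difference_classifier(pr, fr)  # largest pr_i - fr_i
mrc_choice = max_ratio_classifier(pr, fr)       # largest pr_i / fr_i
```

In this invented case the three rules disagree: BC selects the prevalent middle category, MDC the last category (its probability exceeds its frequency by the largest amount), and MRC the rare first category (its probability is twice its frequency).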

# 3. The prediction of wine quality

The data under study concern the sensorial quality of the white and red variants of the Portuguese "Vinho Verde" wine (Cortez et al., 2009). The wine quality was measured by a sensory preference variable, from now on denoted as SPV, using a 0-10 scale. For each wine, eleven of the most common physicochemical variables were recorded; they represent the explanatory variables for the SPV, which is the target variable. Table 1 reports the frequencies of SPV scores observed in the white and red wine data sets; not all the available scores were used, and some of them have a low frequency.

Table 1: Frequencies of the sensory preferences observed in the white and red wine data sets


The model used to study and predict the occurrence probabilities of each of the categories of the SPV is the *Cumulative Logit Model* (CLM) (Agresti, 2010), defined as follows. Let Y be a categorical target variable with k ordinal categories {1, 2,...,k}, and let {X<sub>1</sub>,...,X<sub>p</sub>} be a set of explanatory variables; for the statistical unit s, the CLM has the following form:

$$\text{logit}[P(Y_s \le i)] = \log \frac{P(Y_s \le i)}{1 - P(Y_s \le i)} = \alpha_i + \sum_{m=1}^p \beta_m x_{sm}, \quad \text{for } i = 1, 2, \dots, k - 1.$$

Once the parameters have been estimated, the model can be used for predictive purposes: the CLM gives the k predicted probabilities that are passed to the categorical classifier.
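How the k category probabilities follow from the model equation can be sketched in Python as below. The sketch assumes the parameterization written above (intercept plus linear term; some software uses the opposite sign for the linear term), increasing intercepts, and already estimated parameters. The function name and the numerical values are hypothetical and serve only as an illustration.

```python
import math
from typing import List, Sequence


def clm_category_probabilities(
    alphas: Sequence[float], betas: Sequence[float], x: Sequence[float]
) -> List[float]:
    """Return P(Y = 1), ..., P(Y = k) for one unit under a cumulative logit model.

    alphas: the k - 1 intercepts, assumed increasing.
    betas:  the p slope coefficients.
    x:      the p explanatory values of the unit.
    """
    eta = sum(b * v for b, v in zip(betas, x))
    # Cumulative probabilities P(Y <= i), i = 1, ..., k - 1, via the inverse logit.
    cumulative = [1.0 / (1.0 + math.exp(-(a + eta))) for a in alphas]
    cumulative.append(1.0)  # P(Y <= k) = 1
    # Category probabilities are differences of consecutive cumulative ones.
    probs = [cumulative[0]]
    for i in range(1, len(cumulative)):
        probs.append(cumulative[i] - cumulative[i - 1])
    return probs


# Invented parameters: k = 3 categories, p = 1 explanatory variable.
probs = clm_category_probabilities(alphas=[-1.0, 1.0], betas=[0.5], x=[0.0])
```

The resulting list plays the role of the predicted probabilities pr<sub>i</sub> that a categorical classifier then turns into a single category.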

In order to evaluate the predictive performance of a classifier, some indicators computed from the confusion matrix can be used. In this study they are: the *Sensitivity* (Sen) of each category, the *Maximum Distance Between Sensitivities* (MDBSen), the *Overall Accuracy* (OvAc), the *Macro Average F1 score* (MAF1) and the *Kappa statistic* (Kappa) (Raschka and Mirjalili, 2019). Sen<sub>i</sub> expresses how well the classifier recognizes a unit belonging to the category c<sub>i</sub>. MDBSen, defined as:

$$\text{MDBSen} = \max_{i \neq j} |\text{Sen}_i - \text{Sen}_j|,$$

highlights the balanced or unbalanced ability of the classifier to assign a unit to the right category: the lower the MDBSen, the more balanced the classification. The OvAc is the rate of correct classification, and it is the indicator maximized by BC. The MAF1 is another indicator of the accuracy of the classifier, obtained as the average of the class-by-class F1 scores. MAF1 was chosen over the weighted average F1 score in order to give the same relevance to all classes. Kappa measures the agreement between the actual and the predicted classifications of a dataset, while correcting for agreement occurring by chance.
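As an illustration, the indicators can be computed from a confusion matrix with the following Python sketch. It assumes that rows are actual categories and columns are predicted categories, that Kappa is Cohen's kappa, and that F1 is set to zero for a class with no true positives, false positives or false negatives. The function names and the 2x2 matrix are invented for the example and are not results from the paper.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[int]]


def sensitivities(cm: Matrix) -> List[float]:
    """Sen_i: share of units of actual category i that are predicted as i."""
    return [cm[i][i] / sum(cm[i]) if sum(cm[i]) else 0.0 for i in range(len(cm))]


def mdbsen(sen: Sequence[float]) -> float:
    """Maximum absolute difference between two sensitivities (max minus min)."""
    return max(sen) - min(sen)


def overall_accuracy(cm: Matrix) -> float:
    """OvAc: share of correctly classified units."""
    n = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / n


def macro_f1(cm: Matrix) -> float:
    """MAF1: unweighted mean of the class-by-class F1 scores."""
    k = len(cm)
    scores = []
    for i in range(k):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp
        fp = sum(cm[r][i] for r in range(k)) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / k


def cohen_kappa(cm: Matrix) -> float:
    """Kappa: observed agreement corrected for chance (undefined if chance agreement is 1)."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    po = overall_accuracy(cm)
    pe = sum(sum(cm[i]) * sum(cm[r][i] for r in range(k)) for i in range(k)) / n**2
    return (po - pe) / (1 - pe)


# Invented 2x2 confusion matrix: rows = actual, columns = predicted.
cm = [[8, 2], [1, 9]]
sen = sensitivities(cm)
```

For this invented matrix the sensitivities are 0.8 and 0.9, so MDBSen is about 0.1, the overall accuracy is 0.85 and Kappa is about 0.7.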

Table 2 reports the values of these statistics computed on the basis of the in-sample prediction of the SPV of all the available wines. For the sake of clarity, the last three rows report the percentage variation of OvAc, MAF1 and Kappa with respect to the values obtained by applying BC.


At the cost of an expected but limited reduction in OvAc (6.4% for white wines and 2.7% for red wines), MDC performs better than BC with respect to MAF1 and Kappa and shows more balanced sensitivities, especially for the white wines. MRC outperforms both BC and MDC in terms of balancing the sensitivities, but loses considerably in terms of OvAc and Kappa.

Given that the lowest and highest sensory preferences have low frequencies, we merged the first two and the last two categories for both varieties of wine, obtaining an SPV on a 5-category ordinal scale for white wines and on a 4-category ordinal scale for red wines.
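The merging step can be sketched in Python as follows. The function name is hypothetical, the recoding to consecutive integers starting from 1 is an assumption made for the example, and the score levels used below are illustrative placeholders rather than values read from Table 1.

```python
from typing import List, Sequence


def merge_extreme_categories(scores: Sequence[int]) -> List[int]:
    """Merge the two lowest and the two highest observed levels of an ordinal score.

    Levels are recoded as consecutive integers starting from 1. At least four
    distinct levels are required so that the two merged pairs do not overlap.
    """
    levels = sorted(set(scores))
    if len(levels) < 4:
        raise ValueError("at least four distinct levels are required")
    mapping = {}
    for rank, level in enumerate(levels):
        # Shift ranks down by one and clip, so ranks 0 and 1 collapse together,
        # as do the last two ranks.
        new_rank = min(max(rank - 1, 0), len(levels) - 3)
        mapping[level] = new_rank + 1
    return [mapping[s] for s in scores]


# Illustrative levels only: seven observed levels give five merged categories,
# six observed levels give four.
seven_levels = merge_extreme_categories([3, 4, 5, 6, 7, 8, 9])
six_levels = merge_extreme_categories([3, 4, 5, 6, 7, 8])
```

With seven observed levels the sketch returns five categories and with six levels it returns four, which matches the category counts stated above for white and red wines.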

Table 3 reports the indicators of Table 2 with the exception of the sensitivities of the single categories. The results show the same behaviour observed in Table 2. It is of interest to note that also in this case there are some categories with a low frequency and others that absorb the majority of the statistical units.

# 4. Conclusions

In this paper we investigated the impact of different classifiers on the capability to predict the sensorial quality of the Portuguese "Vinho Verde" wine. We studied this variable by applying the CLM for prediction purposes. We transformed the predicted occurrence probabilities of each of its categories into a single sensory preference through three different classifiers: the BC, the MDC and the MRC. The results have shown that, despite an expected but limited reduction of the overall accuracy, the MDC seems to be a suitable categorical classifier in an unbalanced context (that is, when some categories absorb almost all the statistical units) and when all the categories have equal importance (i.e. different types of misclassification do not involve different costs).

Table 3: Performance Indicators for white and red wines after merging some categories

# References


Jackson R.S. (2017). *Wine Tasting*, 3rd ed. Academic Press.

