Chapter Prediction of wine sensorial quality: a classification problem
For any wine, it is of interest to be able to predict its quality from chemical and/or sensory variables. There is no agreement on what wine quality means or how it should be assessed, and it is often viewed in intrinsic (physicochemical, sensory) or extrinsic (price, prestige, context) terms (Jackson, 2017). In this paper, wine quality was evaluated by experienced judges who scored each wine on a 0-10 scale, with 0 meaning very bad and 10 excellent, so the resulting variable is categorical. The models applied to predict this variable return the predicted probability of occurrence of each of its categories. Practitioners, however, need a single predicted value (category) of the variable in addition to these probabilities, so the statistical problem addressed here is how to transform the set of predicted probabilities into a single value. We compare the predictive performance of the default method (the Bayes Classifier, BC), which assigns a unit to the most likely category, with that of two other methods (the Maximum Difference Classifier and the Maximum Ratio Classifier). The BC is the optimal criterion if one is interested in the accuracy of the classification but, because it favors the most prevalent category, it may not be the best choice when no single category is of particular interest. The data under study concern the quality of the red variant of the Portuguese "Vinho Verde" wine (Cortez et al., 2009), measured on a 0-10 scale. Only 6 of the scores were actually used, however, and 2 of these have very few observations, which makes this a suitable setting for comparing predictive performance. In the study, we investigated different ways of merging the categories, and we used 11 explanatory variables to estimate the probabilities of the categories of the wine quality variable.
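As an illustration of how a set of predicted probabilities can be turned into a single category, the following Python sketch contrasts the three rules named above. Only the BC rule (pick the most likely category) is stated in the text. The rules used here for the Maximum Difference Classifier (largest difference between predicted probability and category prevalence) and the Maximum Ratio Classifier (largest ratio of predicted probability to category prevalence) are assumptions about their definitions, and the function names, prevalence values, and probabilities are hypothetical.

```python
from typing import Sequence


def bayes_classifier(probs: Sequence[float]) -> int:
    """Return the index of the most likely category (the BC rule)."""
    return max(range(len(probs)), key=lambda k: probs[k])


def max_difference_classifier(probs: Sequence[float],
                              prevalence: Sequence[float]) -> int:
    """Assumed MDC rule: largest gap between probability and prevalence."""
    return max(range(len(probs)), key=lambda k: probs[k] - prevalence[k])


def max_ratio_classifier(probs: Sequence[float],
                         prevalence: Sequence[float]) -> int:
    """Assumed MRC rule: largest ratio of probability to prevalence.

    Requires every prevalence to be strictly positive.
    """
    return max(range(len(probs)), key=lambda k: probs[k] / prevalence[k])


# Hypothetical three-category example (0 = low, 1 = medium, 2 = high quality).
prevalence = [0.10, 0.80, 0.10]  # made-up share of each category in the sample
probs = [0.30, 0.60, 0.10]       # made-up predicted probabilities for one wine

bc = bayes_classifier(probs)
mdc = max_difference_classifier(probs, prevalence)
mrc = max_ratio_classifier(probs, prevalence)
print(bc, mdc, mrc)
```

In this made-up case the BC selects the prevalent medium category, while the two prevalence-adjusted rules select the low category, whose predicted probability is three times its prevalence.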