Identification of patterns that influence low industrial yields in sugar manufacturing
Keywords:
Data Mining, Predictive Analysis, Sugar Industry Yield, Rule Learning, Decision Trees
Abstract
The computerization of sugar industry processes generates a large volume of data that accumulates year after year. The agroindustrial platform applications used by the Higher Management Organization for the Sugar Agroindustry (AZCUBA) have ensured that harvest information is timely and of good quality. The Cuban sugar industry requires scientific tools and methods that can identify hidden patterns and behaviors in its historical data. This article applies knowledge discovery techniques to identify the causes of low industrial yields. The materials include the databases of ten harvest years (2010-2019), each with more than 4 million transactional records and an average of 578 indicators per year. CRISP-DM was the methodology chosen to frame the life cycle of the data mining process, and the KNIME data analysis platform was the tool used to apply the data mining techniques. A predictive analysis of the data was performed using symbolic methods. The metrics of seven machine learning algorithms (CONJUNCTIVERULE, DECISIONTABLE, RIDOR, FURIA, PART, JRIP, and J48) were compared for feature selection, and the classification algorithm was chosen on that basis. The attributes that influence low industrial yield were obtained and validated. The results lay the groundwork for a deeper analysis of the organizational and control measures needed to reduce sugar losses in the industrial process. A prescriptive analysis of the data is recommended as a next step, to predict logistics scenarios.
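The comparison of classifier metrics mentioned in the abstract can be illustrated with a minimal sketch. It assumes a binary target (a hypothetical "low" versus "normal" yield label) and ranks candidate models by F1 score. The abstract does not say which metrics or labels the authors used, so the function names, the label values, the toy data, and the choice of F1 are all assumptions made for illustration. The sketch does not reproduce the KNIME workflow or any of the seven algorithms.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive="low"):
    """Return accuracy, precision, recall and F1 for the `positive` class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count (actual is positive, predicted is positive) pairs.
    counts = Counter(
        (truth == positive, pred == positive)
        for truth, pred in zip(y_true, y_pred)
    )
    tp = counts[(True, True)]
    fp = counts[(False, True)]
    fn = counts[(True, False)]
    tn = counts[(False, False)]
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def rank_models(y_true, predictions, positive="low", key="f1"):
    """Score each model's predictions and sort them, best first, by `key`."""
    scores = {
        name: binary_metrics(y_true, y_pred, positive)
        for name, y_pred in predictions.items()
    }
    return sorted(scores.items(), key=lambda item: item[1][key], reverse=True)


# Hypothetical toy labels, not data or results from the study.
y_true = ["low", "low", "normal", "normal", "low", "normal"]
predictions = {
    "model_a": ["low", "low", "normal", "low", "low", "normal"],
    "model_b": ["low", "normal", "normal", "normal", "normal", "normal"],
}

ranking = rank_models(y_true, predictions)
for name, metrics in ranking:
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

In practice the predictions would come from each trained classifier evaluated on held-out harvest records, and the ranking metric would be chosen to reflect how costly it is to miss a low-yield case.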
License
Copyright (c) 2024 Yohan Gil Rodríguez, Raisa Socorro Llanes, Lérida Hernández Nodarse
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.