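Comparative Analysis of Clustering and Regression Methods for Predicting Rainfall Patterns Using a Data Mining Approach (Analisis Komparatif Metode Clustering dan Regresi untuk Prediksi Pola Curah Hujan Menggunakan Pendekatan Data Mining)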

Authors

  • Muhammad Najhan Tsaani Universitas Pamulang
  • Yasin Kamil Universitas Pamulang
  • Octaviana Anugrah Ade Purnama Universitas Pamulang

DOI:

https://doi.org/10.55606/jutiti.v5i2.5467

Keywords:

Clustering, Data Mining, Machine Learning, Rainfall, Regression

Abstract

This research applies data mining techniques to rainfall analysis through two approaches: clustering and regression. The first dataset contains monthly rainfall data from East Kalimantan for 2005 and is analyzed with three clustering algorithms (K-Means, Agglomerative, and MeanShift) to identify seasonal patterns. The second dataset contains multiregional rainfall data from 1979 onward and is analyzed with three regression algorithms (Linear Regression, Random Forest, and Support Vector Regression, or SVR) to predict June rainfall from the previous five months. K-Means and Agglomerative clustering perform identically, each with a silhouette score of 0.4913, and both group the data into three main seasonal clusters. MeanShift produces five clusters but is less effective on small datasets. For regression, Random Forest performs best with an R² score of 0.8921, followed by Linear Regression (0.8402), while SVR performs worst (0.0077). The results show that combining unsupervised and supervised learning gives a fuller picture of rainfall than either alone: clustering reveals the seasonal patterns, and regression provides quantitative estimates. These findings have potential applications in water resource management, agricultural planning, and climate risk mitigation.
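The two comparisons described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' code: the data are synthetic stand-ins generated in the script, the helper names `compare_clusterers` and `compare_regressors` are hypothetical, and the feature scaling, train/test split, and hyperparameters are assumptions that the abstract does not specify. The scores it prints will not match the reported values.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans, MeanShift
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, silhouette_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def compare_clusterers(X, n_clusters=3, seed=0):
    """Return {name: (number of clusters found, silhouette score)}."""
    X_scaled = StandardScaler().fit_transform(X)  # scaling is an assumption
    models = {
        "KMeans": KMeans(n_clusters=n_clusters, n_init=10, random_state=seed),
        "Agglomerative": AgglomerativeClustering(n_clusters=n_clusters),
        "MeanShift": MeanShift(),  # chooses its own number of clusters
    }
    results = {}
    for name, model in models.items():
        labels = model.fit_predict(X_scaled)
        k = len(set(labels))
        # The silhouette score is only defined for 2 <= k <= n_samples - 1.
        valid = 1 < k < len(X_scaled)
        score = silhouette_score(X_scaled, labels) if valid else float("nan")
        results[name] = (k, score)
    return results


def compare_regressors(X, y, seed=0):
    """Return {name: R² on a held-out 20% test split}."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    models = {
        "LinearRegression": LinearRegression(),
        "RandomForest": RandomForestRegressor(n_estimators=200, random_state=seed),
        "SVR": SVR(),  # library defaults, no tuning
    }
    return {
        name: r2_score(y_te, model.fit(X_tr, y_tr).predict(X_te))
        for name, model in models.items()
    }


rng = np.random.default_rng(0)

# Synthetic stand-in for 12 monthly rainfall totals (mm) in one year:
# four dry, four transitional, and four wet months.
monthly_rain = np.concatenate(
    [rng.normal(80, 10, 4), rng.normal(180, 10, 4), rng.normal(300, 10, 4)]
).reshape(-1, 1)
cluster_results = compare_clusterers(monthly_rain)

# Synthetic stand-in for the regression task: January to May rainfall
# as features, June rainfall as the target.
jan_to_may = rng.gamma(shape=2.0, scale=50.0, size=(300, 5))
june = jan_to_may @ np.array([0.1, 0.1, 0.2, 0.3, 0.4]) + rng.normal(0, 10, 300)
regression_results = compare_regressors(jan_to_may, june)

for name, (k, score) in cluster_results.items():
    print(f"{name}: clusters={k}, silhouette={score:.4f}")
for name, r2 in regression_results.items():
    print(f"{name}: R2={r2:.4f}")
```

K-Means and Agglomerative clustering are given the cluster count up front, while MeanShift infers it from an estimated bandwidth, which may be one reason it can return a different number of clusters on a dataset of only twelve points. SVR is generally sensitive to the scale of features and targets; whether that explains the low score reported above is not stated in the abstract.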

 


Published

2025-07-03

How to Cite

Muhammad Najhan Tsaani, Yasin Kamil, & Octaviana Anugrah Ade Purnama. (2025). Analisis Komparatif Metode Clustering dan Regresi untuk Prediksi Pola Curah Hujan Menggunakan Pendekatan Data Mining. Jurnal Teknik Informatika Dan Teknologi Informasi, 5(2), 71–86. https://doi.org/10.55606/jutiti.v5i2.5467
