A Hybrid Ensemble Framework for Probabilistic Earthquake Forecasting in Northern California in Support of SDG 11: Sustainable and Resilient Cities
Abstract
Earthquake forecasting remains one of the most difficult problems in geophysics, mainly because seismic activity is irregular and often influenced by many factors that interact in complex ways. In this study, we develop a leakage-controlled hybrid ensemble model that combines CatBoost, LightGBM, XGBoost, and Gradient Boosting to predict five earthquake parameters in Northern California: magnitude, depth, latitude, longitude, and a scaled inter-event interval. The models were trained on USGS earthquake data from 1900 to 2025 (M ≥ 4.0), using a process designed to prevent temporal leakage through strict temporal separation, moving-window features, and prospective validation. The hybrid models produced consistently low MAE and RMSE values and R² values above 0.99 for all target variables. Despite this strong performance, the results should be interpreted in a probabilistic context that recognizes the inherent uncertainty of seismic processes. The proposed framework provides a clear and replicable approach that can support the development of more reliable short-term earthquake forecasting systems.
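The leakage controls named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the window length, the 80/20 split, and the unweighted averaging of model outputs are all assumptions made for illustration, since the abstract does not state how the four boosting models are combined or which window features are used. The sketch uses only the Python standard library and assumes the catalog is already sorted by origin time.

```python
from statistics import mean


def trailing_window_features(times, mags, window=3):
    """Summarise only the `window` events strictly before each event.

    Hypothetical helper: the feature set here is an illustrative assumption.
    Returns None for events that do not yet have enough history.
    """
    if window < 2:
        raise ValueError("window must be at least 2")
    rows = []
    for i in range(len(times)):
        if i < window:
            rows.append(None)  # not enough past events
            continue
        past_mags = mags[i - window:i]  # excludes event i itself
        rows.append({
            "mean_mag": mean(past_mags),
            "max_mag": max(past_mags),
            # Gap between the two most recent PAST events; the gap to
            # event i would leak the inter-event target.
            "last_gap": times[i - 1] - times[i - 2],
        })
    return rows


def chronological_split(n_events, train_fraction=0.8):
    """Strict time split: every training index precedes every test index."""
    cut = int(n_events * train_fraction)
    return range(0, cut), range(cut, n_events)


def average_ensemble(predictions_per_model):
    """Unweighted mean across models (an assumed combination rule)."""
    return [mean(column) for column in zip(*predictions_per_model)]


# Toy catalog: origin times in days (sorted) and magnitudes.
times = [0.0, 2.0, 3.0, 7.0, 8.0, 12.0]
mags = [4.1, 4.5, 4.0, 5.2, 4.3, 4.8]

features = trailing_window_features(times, mags, window=3)
train_idx, test_idx = chronological_split(len(times), train_fraction=0.8)
blended = average_ensemble([[1.0, 2.0], [3.0, 4.0]])
```

A quick leakage check on a sketch like this is to change the magnitude of event `i` and confirm that the features for event `i` stay the same. Only the features of later events should change.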