CYBER THREATS PREDICTION USING EXPERIENCE SHARING MODEL AND ENSEMBLE LEARNING ALGORITHM

Authors

1. Abubakar Bello, National Open University of Nigeria, Student, Nigeria

Abstract

The increasing complexity of cyber threats, particularly in critical industries such as oil and gas, calls for predictive models that support early detection and response. Traditional frameworks such as the Common Vulnerability Scoring System (CVSS) are reactive: they often address vulnerabilities only after an incident, which exposes organizations to operational and financial risk. This study proposes a hybrid framework that combines an experience-sharing model with ensemble machine learning algorithms, including bagging and boosting techniques. Machine learning classifiers (logistic regression, k-Nearest Neighbors, and regression trees) were applied to structured datasets such as VERIS and CAPEC and validated using k-fold cross-validation. Bagging ensembles achieved a 94% prediction accuracy and an AUC-ROC score of 0.96, outperforming conventional models by 12%. A case study of Nigeria’s oil and gas infrastructure validated the model’s sector-specific applicability. This study contributes to cybersecurity analytics by demonstrating (1) the efficacy of ensemble learning, (2) a validated experience-sharing paradigm, and (3) the development of dynamic cyber-risk metrics suited to modern threats. The proposed framework offers a cost-effective and scalable approach to proactive threat mitigation.
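
The validation procedure named in the abstract (base classifiers and a bagging ensemble compared under k-fold cross-validation) might look roughly like the sketch below. It is an illustration under stated assumptions, not the study's code: the data is synthetic output from `make_classification` rather than VERIS or CAPEC records, a classification tree stands in for the "regression trees" mentioned above, and the fold count, ensemble size, and the names `models` and `results` are all hypothetical choices. The experience-sharing component is not modelled here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic binary-labelled data standing in for an encoded
# incident table. These are NOT VERIS or CAPEC records.
X, y = make_classification(
    n_samples=400, n_features=12, n_informative=6, random_state=0
)

# Base learners named in the abstract, plus a bagged-tree ensemble.
# Scaling is wrapped in a pipeline so it is re-fitted inside every fold.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    # The base learner is passed positionally because its keyword name
    # differs between scikit-learn releases.
    "bagged_trees": BaggingClassifier(
        DecisionTreeClassifier(random_state=0), n_estimators=50, random_state=0
    ),
}

# Assumption: 5 stratified folds; the source does not state the value of k.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=("accuracy", "roc_auc"))
    results[name] = {
        "accuracy": scores["test_accuracy"].mean(),
        "roc_auc": scores["test_roc_auc"].mean(),
    }

for name, metrics in results.items():
    print(f"{name:20s} accuracy={metrics['accuracy']:.3f} "
          f"roc_auc={metrics['roc_auc']:.3f}")
```

The numbers printed by this sketch describe the synthetic data only and should not be compared with the 94% accuracy or 0.96 AUC-ROC reported above.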

Keywords

Cybersecurity; ensemble learning; threat prediction; machine learning; oil and gas sector; risk assessment

Conclusion

This study demonstrates the applicability of ensemble machine learning models for predicting cybersecurity threats, with a focus on critical infrastructure such as the Nigerian oil and gas sector. Using structured datasets and cross-validated ensemble models, the research achieved high accuracy and reliability. Notably, Random Forest and Gradient Boosting models performed best across key evaluation metrics.
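
The evaluation metrics are not enumerated in this section; the abstract reports accuracy and AUC-ROC. As a minimal sketch of what those two figures measure, the functions below compute them from true labels and model scores using only the standard library. The function names, the 0.5 decision threshold, and the six example records are hypothetical and chosen purely for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC-ROC computed as the probability that a randomly chosen positive
    record receives a higher score than a randomly chosen negative one
    (ties count as half a win)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = incident, 0 = no incident.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]

# Assumption: a 0.5 threshold turns scores into hard labels for accuracy.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")  # 4 of 6 correct -> 0.667
print(f"roc_auc  = {roc_auc(y_true, scores):.3f}")   # 8 of 9 pairs ranked correctly -> 0.889
```

Accuracy depends on the chosen threshold, whereas AUC-ROC depends only on how the scores rank positives against negatives, which is why the two can diverge on the same predictions.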

Key contributions include the development of a domain-specific cyber threat prediction model, integration of experience-sharing frameworks, and validation of ensemble methods for cyber-risk quantification. These outcomes are particularly relevant for sectors requiring preemptive resource allocation and security incident mitigation.

Future research should explore deep learning models, zero-day threat detection, and real-time deployment integration with SIEM platforms. Localized datasets and cross-organizational collaboration can further enhance the model's utility and adaptability.

Author Contribution

A.B.: Conceptualization, Methodology (ensemble learning model), Writing – Original Draft. A.B.: Software (Python implementation), Data Curation (VERIS/CAPEC datasets), Formal Analysis. A.S.: Validation (k-fold cross-validation), Writing – Review & Editing. A.B.: Supervision, Project Administration

Funding

This research received no external funding.

Software Information

This study was implemented in Python 3.8 using scikit-learn (v1.0) for the ensemble learning algorithms (bagging and boosting), pandas (v1.3) for data processing, and Matplotlib (v3.4) for visualization. The analysis was conducted in Jupyter Notebook and Google Colab environments. Anaconda (v2021.05) was used for package management.

Conflict of Interest

The authors declare no conflict of interest.

Acknowledgements

We thank the Petroleum Technology Development Fund (PTDF), Nigeria, for its institutional support. We also acknowledge the VERIS and CAPEC communities for providing open-access datasets critical to this research. We are especially grateful to ACETEL at the National Open University of Nigeria for technical guidance and to the anonymous reviewers for their constructive feedback.

Data Availability

The datasets analyzed in this study—VERIS (Vocabulary for Event Recording and Incident Sharing) and CAPEC (Common Attack Pattern Enumeration and Classification)—are publicly available at their respective sources: VERIS Community Database and CAPEC MITRE Repository. The derived datasets and code used for ensemble learning analysis are available from the corresponding author upon reasonable request.