CYBER THREATS PREDICTION USING EXPERIENCE SHARING MODEL AND ENSEMBLE LEARNING ALGORITHM
1. Abubakar Bello, National Open University of Nigeria, Student, Nigeria
The increasing complexity of cyber
threats, particularly in critical industries such as oil and gas, necessitates
proactive predictive models for early detection and response. Traditional
frameworks such as the Common Vulnerability Scoring System (CVSS) are reactive,
often addressing vulnerabilities post-incident, thereby exposing organizations
to operational and financial risks. This study proposes a novel hybrid
framework combining an experience-sharing model with ensemble machine learning
algorithms, including bagging and boosting techniques. Using structured
datasets such as VERIS and CAPEC, three machine learning classifiers (logistic
regression, k-Nearest Neighbors, and regression trees) were trained and
validated using k-fold cross-validation. Bagging ensembles achieved 94% prediction
accuracy and an AUC-ROC score of 0.96, outperforming
conventional models by 12%. A case study focused on Nigeria’s oil and gas
infrastructure validated the model’s sector-specific applicability. This study
contributes to cybersecurity analytics by demonstrating (1) the efficacy of
ensemble learning, (2) a validated experience-sharing paradigm, and (3) the
development of dynamic cyber-risk metrics suited for modern threats. The
proposed framework offers cost-effective and scalable solutions for proactive
threat mitigation.
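The evaluation procedure described above (base classifiers, a bagging ensemble, k-fold cross-validation, accuracy and AUC-ROC) can be sketched roughly as follows. This is a minimal illustration, not the study's code: the data are synthetic stand-ins generated with `make_classification` rather than VERIS or CAPEC records, a classification tree is assumed where the text says "regression trees", and the function name `evaluate_models` and all hyperparameters are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def evaluate_models(n_splits=5, seed=0):
    """Cross-validate three base classifiers and one bagging ensemble."""
    # ASSUMPTION: synthetic binary data standing in for encoded incident
    # records; this is not VERIS or CAPEC data.
    X, y = make_classification(
        n_samples=600, n_features=20, n_informative=8,
        weights=[0.7, 0.3], random_state=seed,
    )

    models = {
        # Scaling matters for logistic regression and k-NN, so both are
        # wrapped in a pipeline that is refit inside every fold.
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "decision_tree": DecisionTreeClassifier(random_state=seed),
        # Bagging: 50 trees, each fit on a bootstrap resample of the fold.
        "bagged_trees": BaggingClassifier(
            DecisionTreeClassifier(random_state=seed),
            n_estimators=50, random_state=seed,
        ),
    }

    # Stratified folds keep the class ratio similar in each split.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)

    results = {}
    for name, model in models.items():
        scores = cross_validate(model, X, y, cv=cv, scoring=("accuracy", "roc_auc"))
        results[name] = {
            "accuracy": scores["test_accuracy"].mean(),
            "roc_auc": scores["test_roc_auc"].mean(),
        }
    return results


if __name__ == "__main__":
    for name, metrics in evaluate_models().items():
        print(f"{name:20s} acc={metrics['accuracy']:.3f} auc={metrics['roc_auc']:.3f}")
```

The scores printed by this sketch reflect the synthetic data only and say nothing about the 94% accuracy or 0.96 AUC-ROC reported for the actual datasets.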
Cybersecurity; ensemble learning; threat prediction; machine learning; oil and gas sector; risk assessment
This study demonstrates the applicability
of ensemble machine learning models for predicting cybersecurity threats, with
a focus on critical infrastructure such as the Nigerian oil and gas sector.
Using structured datasets and cross-validated ensemble models, the research
achieved high accuracy and reliability. Notably, Random Forest and Gradient
Boosting models performed best across key evaluation metrics.
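As a rough illustration of comparing Random Forest and Gradient Boosting "across key evaluation metrics", the sketch below scores both models on out-of-fold predictions. The text does not list which metrics were used, so accuracy, precision, recall, F1, and AUC-ROC are assumed here; the data are synthetic, and the function name `compare_ensembles` and the hyperparameters are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import (
    accuracy_score, f1_score, precision_score, recall_score, roc_auc_score,
)
from sklearn.model_selection import StratifiedKFold, cross_val_predict


def compare_ensembles(seed=0):
    """Score a bagging-style and a boosting-style ensemble on the same folds."""
    # ASSUMPTION: synthetic binary data, not the study's datasets.
    X, y = make_classification(
        n_samples=500, n_features=15, n_informative=6, random_state=seed
    )
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)

    models = {
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "gradient_boosting": GradientBoostingClassifier(random_state=seed),
    }

    report = {}
    for name, model in models.items():
        # Each sample is predicted by a model that never saw it in training.
        proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
        pred = (proba >= 0.5).astype(int)  # default 0.5 decision threshold
        report[name] = {
            "accuracy": accuracy_score(y, pred),
            "precision": precision_score(y, pred),
            "recall": recall_score(y, pred),
            "f1": f1_score(y, pred),
            "roc_auc": roc_auc_score(y, proba),
        }
    return report


if __name__ == "__main__":
    for name, metrics in compare_ensembles().items():
        print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Scoring both models on the same stratified folds keeps the comparison paired, so differences come from the models rather than from the split.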
Key contributions include the development
of a domain-specific cyber threat prediction model, integration of
experience-sharing frameworks, and validation of ensemble methods for
cyber-risk quantification. These outcomes are particularly relevant for sectors
requiring preemptive resource allocation and security incident mitigation.
Future research should explore deep
learning models, zero-day threat detection, and real-time deployment
integration with SIEM platforms. Localized datasets and cross-organizational
collaboration can further enhance the model's utility and adaptability.
A.B.: Conceptualization, Methodology (ensemble learning model), Writing – Original Draft. A.B.: Software (Python implementation), Data Curation (VERIS/CAPEC datasets), Formal Analysis. A.S.: Validation (k-fold cross-validation), Writing – Review & Editing. A.B.: Supervision, Project Administration.
This research received no external funding.
This study was implemented in Python 3.8 with scikit-learn (v1.0) for the ensemble learning algorithms (bagging and boosting), pandas (v1.3) for data processing, and Matplotlib (v3.4) for visualization. The analysis was conducted in Jupyter Notebook and Google Colab environments. Anaconda (v2021.05) was used for package management.
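To check whether a local environment matches the library versions listed above, something like the following could be run. It is only a convenience sketch: the distribution names are the usual PyPI names, and the helpers `installed_versions` and `matches_reported` are hypothetical, not part of the study's code.

```python
from importlib import metadata

# Versions as reported in the text (major.minor only).
REPORTED = {"scikit-learn": "1.0", "pandas": "1.3", "matplotlib": "3.4"}


def installed_versions(packages=REPORTED):
    """Return {distribution name: installed version, or None if missing}."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def matches_reported(installed, reported):
    """True if the installed version has the reported major.minor prefix."""
    if installed is None:
        return False
    return installed.split(".")[:2] == reported.split(".")[:2]


if __name__ == "__main__":
    for name, version in installed_versions().items():
        status = "ok" if matches_reported(version, REPORTED[name]) else "differs"
        print(f"{name}: installed={version} reported={REPORTED[name]} ({status})")
```

A "differs" result does not necessarily mean the analysis would fail, only that the environment is not the one described.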
The authors declare no conflict of interest.
We thank the Petroleum Technology Development Fund (PTDF), Nigeria, for its institutional support. We also acknowledge the VERIS and CAPEC communities for providing the open-access datasets critical to this research. Special gratitude goes to ACETEL at the National Open University of Nigeria for technical guidance and to the anonymous reviewers for their constructive feedback.
The datasets analyzed in this study, VERIS (Vocabulary for Event Recording and Incident Sharing) and CAPEC (Common Attack Pattern Enumeration and Classification), are publicly available from their respective sources: the VERIS Community Database and the CAPEC MITRE Repository. The derived datasets and the code used for the ensemble learning analysis are available from the corresponding author upon reasonable request.