Enhancing Flight Delay Prediction through Feature Engineering in Machine Learning Classifiers: A Real Time Data Streams Case Study

Shailaja B. Jadhav
D. V. Kodavade

Abstract

Feature engineering is the process of creating and selecting features from raw data to improve the accuracy of machine learning models. It is particularly important for real-time data streams, where the data change constantly and the model must adapt quickly. This paper describes a case study of feature engineering in a flight information system. We use feature engineering to improve the performance of machine learning classifiers that predict flight delays, and we describe techniques for extracting and constructing features from the raw data, including time-based, trend-based, and error-based features. Before applying these techniques, we pre-process the features with the CTAO algorithm, then apply the sand cat swarm optimization (SCSO) algorithm for feature extraction and an enhanced harmony search algorithm for feature optimization. The resulting feature set contains the nine features most relevant to predicting whether a flight will be delayed. We then evaluate several classifiers on these engineered features and compare the results with those obtained using the raw features. The results show that feature engineering significantly improves classifier performance and enables more accurate real-time prediction of flight delays.
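The abstract names time-based, trend-based, and error-based features but does not define them. The following is a minimal sketch of one plausible way to compute such features incrementally over a stream of flight records. The FlightEvent record, the window size, and the specific feature definitions (calendar fields of the scheduled departure, a rolling mean and first-to-last difference of recent delays, and the mean absolute error of a naive rolling-mean forecast) are illustrative assumptions, not the authors' definitions.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class FlightEvent:
    # Hypothetical record layout; the paper's data schema is not given here.
    scheduled_departure: datetime
    delay_minutes: float  # observed departure delay, known only after the fact


class StreamFeatureExtractor:
    """Illustrative sliding-window feature builder for a stream of flights."""

    def __init__(self, window: int = 5) -> None:
        self.delays = deque(maxlen=window)  # most recent observed delays
        self.errors = deque(maxlen=window)  # most recent absolute forecast errors

    def features(self, event: FlightEvent) -> dict:
        ts = event.scheduled_departure
        # Time-based features (assumed): calendar fields of the scheduled departure.
        feats = {"hour": ts.hour, "weekday": ts.weekday()}
        # Trend-based features (assumed), computed only from flights already seen.
        feats["delay_mean"] = mean(self.delays) if self.delays else 0.0
        feats["delay_trend"] = (
            self.delays[-1] - self.delays[0] if self.delays else 0.0
        )
        # Error-based feature (assumed): mean absolute error of a naive
        # "next delay equals the rolling mean" forecast.
        feats["recent_error"] = mean(self.errors) if self.errors else 0.0
        return feats

    def update(self, event: FlightEvent) -> None:
        # Once the true delay is known, record how far the naive forecast was off.
        forecast = mean(self.delays) if self.delays else 0.0
        self.errors.append(abs(event.delay_minutes - forecast))
        self.delays.append(event.delay_minutes)


# Usage on three synthetic flights (values are made up for illustration).
stream = [
    FlightEvent(datetime(2023, 1, 2, 8, 0), 0.0),
    FlightEvent(datetime(2023, 1, 2, 9, 0), 10.0),
    FlightEvent(datetime(2023, 1, 2, 10, 0), 30.0),
]
extractor = StreamFeatureExtractor(window=5)
rows = []
for ev in stream:
    rows.append(extractor.features(ev))  # features first, using only the past
    extractor.update(ev)                 # then reveal the observed delay
```

Features for each flight are computed before update is called with that flight's observed delay, so every feature depends only on past information. This keeps the label out of the inputs, which matters when a model is scored on a live stream.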

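The paper applies an enhanced harmony search for feature optimization but does not describe the enhancement in the abstract. As a point of reference, here is a minimal sketch of the standard binary harmony search for choosing a feature subset, assuming a user-supplied fitness function that scores a 0/1 feature mask. The parameter defaults (memory size, memory-consideration rate hmcr, pitch-adjustment rate par) are common choices rather than the paper's settings, and the toy relevance scores and size penalty in the usage example are hypothetical stand-ins for the classifier-based fitness a real pipeline would use.

```python
import random
from typing import Callable, List, Tuple

Mask = Tuple[int, ...]  # 1 = feature selected, 0 = feature dropped


def harmony_search_select(
    n_features: int,
    fitness: Callable[[Mask], float],
    hms: int = 10,          # harmony memory size (assumed default)
    hmcr: float = 0.9,      # probability of reusing a bit from memory
    par: float = 0.3,       # probability of flipping a reused bit
    iterations: int = 500,
    seed: int = 0,
) -> Tuple[Mask, float]:
    """Standard (not 'enhanced') binary harmony search that maximizes fitness."""
    rng = random.Random(seed)
    # Harmony memory: hms random feature masks paired with their scores.
    memory: List[Tuple[Mask, float]] = []
    for _ in range(hms):
        mask = tuple(rng.randint(0, 1) for _ in range(n_features))
        memory.append((mask, fitness(mask)))

    for _ in range(iterations):
        new = []
        for j in range(n_features):
            if rng.random() < hmcr:
                # Memory consideration: copy bit j from a randomly chosen harmony.
                bit = rng.choice(memory)[0][j]
                if rng.random() < par:
                    bit = 1 - bit  # pitch adjustment becomes a bit flip here
            else:
                bit = rng.randint(0, 1)  # random consideration
            new.append(bit)
        candidate = tuple(new)
        score = fitness(candidate)
        # Replace the worst stored harmony if the new one is strictly better.
        worst = min(range(hms), key=lambda i: memory[i][1])
        if score > memory[worst][1]:
            memory[worst] = (candidate, score)

    return max(memory, key=lambda h: h[1])


# Hypothetical per-feature relevance scores, for illustration only.
relevance = [0.9, 0.1, 0.8, 0.05, 0.7, 0.0]


def toy_fitness(mask: Mask) -> float:
    # Reward relevant features and penalize larger subsets.
    return sum(r for r, bit in zip(relevance, mask) if bit) - 0.2 * sum(mask)


best_mask, best_score = harmony_search_select(len(relevance), toy_fitness, seed=42)
print(best_mask, round(best_score, 3))
```

With this toy fitness, only features whose relevance exceeds the 0.2 penalty are worth keeping, so the search should settle on the first, third, and fifth features. In a real pipeline, the fitness would typically be a validation score of the downstream classifier trained on the masked features.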
How to Cite
Jadhav, S. B., & Kodavade, D. V. (2023). Enhancing Flight Delay Prediction through Feature Engineering in Machine Learning Classifiers: A Real Time Data Streams Case Study. International Journal on Recent and Innovation Trends in Computing and Communication, 11(2s), 212–218. https://doi.org/10.17762/ijritcc.v11i2s.6064