World Scientific

Predicting Cyberattacks with Destination Port Through Various Input Feature Scenario

    https://doi.org/10.1142/S0218539322500036
    Cited by: 1 (Source: Crossref)

    When analyzing cybersecurity datasets with machine learning, researchers commonly need to consider whether or not to include Destination Port as an input feature. We assess the impact of Destination Port as a predictive feature by building predictive models with three different input feature sets and four combinations of web attacks from the CSE-CIC-IDS2018 dataset. First, we use Destination Port as the only input feature to our models. Second, we use all CSE-CIC-IDS2018 features except Destination Port. Third, we use all features including Destination Port to train and test the models. All three feature sets obtain respectable classification results in detecting web attacks with LightGBM and CatBoost classifiers, with Area Under the Receiver Operating Characteristic Curve (AUC) scores exceeding 0.90 for all scenarios. We observe the best classification performance when Destination Port is combined with all of the other CSE-CIC-IDS2018 features, although performance is still respectable when Destination Port is the only input feature. Additionally, we validate that Botnet attacks also achieve respectable AUC scores with Destination Port as the only input feature. This highlights that practitioners must be mindful of whether or not to include Destination Port as an input feature if it exhibits lopsided label distributions, as we clearly identify in this study. Our brief survey of existing CSE-CIC-IDS2018 literature also found that many studies incorrectly treat Destination Port as a numerical input feature. Destination Port should be treated as a categorical input feature, because its values are identifiers, not quantities that can meaningfully be used in a model's mathematical equations.
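
The three input feature scenarios and the categorical treatment of Destination Port can be illustrated with a minimal sketch. This is not the authors' code: the column names, the toy flows, and the helper functions `feature_scenarios`, `one_hot`, and `label_counts_by_port` are all assumptions made for illustration, and one-hot encoding stands in for whatever categorical handling a given classifier provides.

```python
from collections import Counter, defaultdict

# Illustrative column names (assumed); the real dataset has many more features.
PORT = "Dst Port"
COLUMNS = [PORT, "Flow Duration", "Tot Fwd Pkts"]


def feature_scenarios(columns, port=PORT):
    """Return the three input feature sets described in the abstract."""
    others = [c for c in columns if c != port]
    return {
        "port_only": [port],
        "all_without_port": others,
        "all_with_port": others + [port],
    }


def one_hot(port, vocabulary):
    """Encode a port as a category indicator vector, not as a magnitude."""
    return [1 if port == known else 0 for known in vocabulary]


def label_counts_by_port(rows):
    """Count labels per port to expose lopsided label distributions."""
    counts = defaultdict(Counter)
    for port, label in rows:
        counts[port][label] += 1
    return counts


# Made-up toy flows (port, label); not taken from the dataset.
flows = [(80, "attack"), (80, "attack"), (80, "benign"),
         (443, "benign"), (443, "benign"), (53, "benign")]

scenarios = feature_scenarios(COLUMNS)
counts = label_counts_by_port(flows)
vocabulary = sorted(counts)             # distinct ports seen in the toy data
encoded_443 = one_hot(443, vocabulary)  # indicator vector for port 443
```

In this toy data every attack arrives on port 80, which is the kind of lopsided label distribution that lets a port-only model score well. With real classifiers, one would more likely rely on built-in categorical handling (for example a pandas `category` column for LightGBM, or the `cat_features` argument in CatBoost) than on manual one-hot encoding.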