Dataset used in the paper submitted to EDCC2021:
datasets_EDCC2021.zip (md5sum: a058d193d270e7e40dd77a06ba50706d)
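The checksum above can be used to confirm that the archive was downloaded intact. The following is a minimal sketch using Python's standard hashlib module; the function names and the local path passed to them are illustrative assumptions, not part of the dataset.

```python
import hashlib

# Checksum published alongside datasets_EDCC2021.zip.
EXPECTED_MD5 = "a058d193d270e7e40dd77a06ba50706d"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, archive_is_intact("datasets_EDCC2021.zip") should return True for a complete download stored in the current directory.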
The zip file above contains 3 types of datasets:
- Binary Classes: prefix sm-sat-multiclass-mozilla;
- Category Binary Classes: three options, one for each of the largest classes: Memory Management (sm-sat-mm-mozilla), Input Validation (sm-sat-iv-mozilla), and Permission (sm-sat-per-mozilla);
- Multiclass: prefix sm-sat-multiclass.
Each dataset consists of 3 files, as required by Propheticus. For further information about Propheticus, please see the following paper: Propheticus: Machine learning framework for the development of predictive models for reliable and secure software.
These files are:
- Info (suffix: .info.txt): contains the number of samples per dataset;
- Headers (suffix: .headers.txt): contains a JSON object with all the features of the dataset and their data types;
- Data (suffix: .data.txt): contains the samples separated by a space.
Each sample contains: (1) a description of the file instance, (2) all 54 software metrics, (3) all 228 alert types reported by cppcheck, (4) all 123 alert types reported by flawfinder, and (5) a label indicating whether the file is non-vulnerable (0) or vulnerable (value > 0).
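The layout of a sample can be sketched in Python as follows. This is only an illustration under several assumptions that should be checked against the actual files: that each sample occupies one line of the .data.txt file, that its values are space-separated, that the description is a single token without spaces, and that the label is numeric. The function and constant names are hypothetical.

```python
N_METRICS = 54        # software metrics
N_CPPCHECK = 228      # cppcheck alert types
N_FLAWFINDER = 123    # flawfinder alert types
# description + metrics + cppcheck alerts + flawfinder alerts + label
N_FIELDS = 1 + N_METRICS + N_CPPCHECK + N_FLAWFINDER + 1


def split_sample(line):
    """Split one space-separated sample into its five groups of values."""
    fields = line.split()
    if len(fields) != N_FIELDS:
        raise ValueError(f"expected {N_FIELDS} fields, got {len(fields)}")
    start_cpp = 1 + N_METRICS
    start_flaw = start_cpp + N_CPPCHECK
    # Values are kept as strings; their data types are listed in the headers file.
    label = int(float(fields[-1]))  # tolerates "1" as well as "1.0"
    return {
        "description": fields[0],
        "metrics": fields[1:start_cpp],
        "cppcheck": fields[start_cpp:start_flaw],
        "flawfinder": fields[start_flaw:-1],
        "label": label,
        "vulnerable": label > 0,
    }
```

If the description turns out to contain spaces, the field count check will fail, and the split would have to be done from the right-hand side of the line instead.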
Regarding the label in the multiclass dataset, 0 represents non-vulnerable, 1 represents vulnerable without an assigned category, 2 represents memory management, followed by the remaining categories of Table 1 of the paper.
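As an illustration of this encoding, the sketch below names the three label values stated above and collapses a multiclass label into a binary one. The names are hypothetical, and labels greater than 2 are deliberately left unnamed because their categories are defined only in Table 1 of the paper.

```python
# Only the label values described in this text are named here.
KNOWN_LABELS = {
    0: "non-vulnerable",
    1: "vulnerable (no assigned category)",
    2: "memory management",
}


def describe_label(label):
    """Return a readable name for a multiclass label."""
    return KNOWN_LABELS.get(label, f"category {label} (see Table 1 of the paper)")


def to_binary(label):
    """Collapse a multiclass label to 0 (non-vulnerable) or 1 (vulnerable)."""
    return 0 if label == 0 else 1
```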
Below you can find the detailed results about all the experiment instances with their metrics:
For further information, e-mail me: josep@dei.uc.pt