Hacktheplanet.ninja's data sets

As promised, here are the data sets.  They were glossed over during the BH and Defcon presentations.  Spend time in the data and you will see interesting relationships, especially as projects matured over time.  I will update these data sets as I take on new projects and interests.  If you do not see a project or code here, let me know.

 

http://hacktheplanet.ninja/ApplicationLibrary.html

http://hacktheplanet.ninja/CoreLibrary.html

http://hacktheplanet.ninja/Crypto.html

http://hacktheplanet.ninja/Mail.html

http://hacktheplanet.ninja/OS.html

http://hacktheplanet.ninja/SampleOfProjects.html

http://hacktheplanet.ninja/Security.html

http://hacktheplanet.ninja/Time.html

http://hacktheplanet.ninja/WebServer.html

 

FAQ:

How do I read these datasets?

Lines of code: total number of lines of analyzed code

Executable lines of code: the total lines of executable code analyzed

Lines of Comments: the number of comment lines that appear in the code.

Total number of files: the total number of files analyzed

Critical Vulnerabilities: the total number of critical vulnerabilities.

High Vuln: the total number of high vulnerabilities.

Medium Vuln: the total number of medium vulnerabilities.

Low Vuln: the total number of low vulnerabilities.

Critical Density: the density of critical vulns per executable line of code (total critical vulns / executable lines of code)

High Density: the same as Critical Density, but for high vulns (total high vulns / executable lines of code)

Medium Density: the same as Critical Density, but for medium vulns (total medium vulns / executable lines of code)

Low Density: the same as Critical Density, but for low vulns (total low vulns / executable lines of code)

Code density: my internal metric to measure deltas.

Human hours spent reviewing code: the number of hours a human would spend reviewing the project's code, if that human had infinite intellect and memory and operated like a flawless robot.  Effectively, the number of executable lines divided by the number of lines reviewed per hour.
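
To make the arithmetic behind the density and review-hour columns concrete, here is a minimal sketch in Python.  It assumes density is the vuln count divided by executable lines, as defined above.  It also assumes review hours are executable lines divided by a review rate, since that is the reading that yields a number in hours.  The names ProjectStats, density and review_hours, the sample counts and the 200 lines-per-hour rate are all made up for illustration.  They are not taken from the data sets or from the tooling that produced them.

```python
from dataclasses import dataclass


@dataclass
class ProjectStats:
    """Hypothetical container for one row of a data set."""
    executable_lines: int
    critical: int
    high: int
    medium: int
    low: int


def density(vuln_count: int, executable_lines: int) -> float:
    """Vulnerabilities per executable line of code."""
    if executable_lines <= 0:
        raise ValueError("executable_lines must be positive")
    return vuln_count / executable_lines


def review_hours(executable_lines: int, lines_per_hour: float) -> float:
    """Hours an idealized reviewer needs at a fixed review rate."""
    if lines_per_hour <= 0:
        raise ValueError("lines_per_hour must be positive")
    return executable_lines / lines_per_hour


if __name__ == "__main__":
    # Illustrative numbers only, not real results.
    stats = ProjectStats(executable_lines=50_000, critical=5,
                         high=20, medium=100, low=250)
    for severity in ("critical", "high", "medium", "low"):
        count = getattr(stats, severity)
        print(f"{severity} density: {density(count, stats.executable_lines):.6f}")
    # The review rate is an assumed value; the post does not state one.
    print("review hours:", review_hours(stats.executable_lines, lines_per_hour=200))
```

Because the densities are normalized by executable lines, they can be compared across projects of very different sizes, which the raw vuln counts cannot.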

 

How do I understand the project names?

Don’t try, especially in First2k.  Some are named as the project name + change list.  Others were named while I was tuning the project, i.e. project name + number.  If it is just the project name, assume it was the latest stable release made available in the past quarter.  If it is project name + version, that is the version of the project which was analyzed.  I will be cleaning up the reporting as I release more information into the public domain, so expect the naming scheme to improve.