Secure Code Review for l33t hax0rs

Code review is, hopefully, part of regular development practices for any organization. Adding security elements to code review is the most effective measure in preventing vulnerabilities, even before the first commit.

This series of short articles is a primer that includes the basic elements to get you started. We will review several software examples in order to figure out the good from the ugly.

The Biggest Security Issues

OWASP Top 10 and MITRE Top 25 include lists of common programming flaws that lead to vulnerabilities and exploits in software. These lists are reviewed periodically by industry experts and are included in compliance standards such as PCI-DSS.

Our security code review examples will focus on the following OWASP Top 10and MITRE Top 25 entries:

  • Injection

  • Memory Flaws

  • Sensitive Data Exposure

  • Cross-Site Scripting

  • Broken Access Control.

The Basic Secure Coding Practices

One of the goals of regular code review is to spot missing best practices. Security code review is about identifying the missing secure coding practices. These practices are also known as software defences or in Threat Modeling terms countermeasures. There are many types of software defences but some are more important and effective than others.

The short list below is based on the number of security flaws that each defence prevents as well as the severity of the flaws based on OWASP Top 10and MITRE Top 25 industry classifications.

  • Input Validation, prevents a wide range of attacks such as Injection, Cross-Site Scripting and Path Manipulation.

  • Parameterized Statements, prevent Injection attacks when Input Validation cannot be employed. Injection is #1 in the OWASP Top 10.

  • Memory Safe Functions and Safe Memory Management practicesprevent memory flaws such as Buffer Overflow which is at #3 in the Mitre Top 25 immediately after SQL Injection and OS Command Injection.

  • Data Encryption in transit and at rest prevents data breaches which are covered at #3 in the OWASP Top 10 category Sensitive Data Exposure.

  • Neutralizing Output prevents Cross-Site Scripting flaws which plague web applications and should be the primary concern for front-end developers.

  • Indirect Object References prevent dangerous flaws associated with file management such as Path Traversal a vulnerability in the Broken Access Control OWASP Top 10 category.

There are many threat categories and best practices that will not be discussed in this series. It is important to realize security code review is not a replacement for other application security controls such as static analysis or penetration testing.

Security code review is also only a small part of the code review process. It should not take too long. As such we must prioritize the things we are looking for to get the most bang for the buck.

In addition many threat categories are handled in the application framework as opposed to every day code changes. For example, #2 in OWASP Top 10 is Broken Authentication. It is not likely that you will deal with this category often during code review because authentication is usually handled by the framework.

The articles will contain examples from Java, C/C++ and JavaScript programming languages. However the concepts can be easily transferred to other languages. Input Validation in Java is similar to input validation in Python and PHP.

In next article in this series we will review in detail the concept of Input Validation.

GIGO

Would you let someone in your house if you thought they should not be there?

You wouldn’t. So why allow any input to your application? Why allow symbols into a variable that is intended to be numeric?

Even when validation is used, a common mistake is to use black lists. For example an application will prevent symbols that are known to cause trouble. The weakness of this countermeasure is that some symbols may be overlooked.

Would you maintain a black list of people that cannot come to your house?

You wouldn’t. So why block quotes when you know the input should be numeric? You should only allow numbers.

In the two code samples below one has a security issue due to improper input validation. Can you tell which?

gigo.png

Both code samples, shell out, to execute OS commands, in this case sending a ping to a server. Shelling out is an insecure practice because it can lead to OS Command Injection, however the bottom code mitigates the issue because it uses an allow list to prevent hazardous characters while the top code uses a block list which is, in this case, insufficient since an attacker could pass in something like updateserver.com `command` or updateserver.com |commandand neither ` nor | have been included in the block list.

As you can see, allow lists are much more effective at preventing application security issues. A simple multi-purpose function that checks if the input is alphanumeric can prevent multiple types of flaws:

public class InputValidation {

/**
Sample java input validation function that validates that the input is alphanumeric or part of a list of allowed exceptions
 */
public static boolean isAlphanumOrExcepted(int maxSize, String val, char ... excepted){
    boolean result = true;
    int count = val.length();
    if(count>maxSize) return false;

    for(int i=0;i<count;i++){
        char c = val.charAt(i);
        boolean isOk = false;
        //if  alphabetic turns true , this works for Unicode chars
        isOk = isOk | Character.isAlphabetic(c);
        //if it's a digit turns true, this works for Unicode chars
        isOk = isOk | Character.isDigit(c);
        //if it's in the list of exceptions turns true
        for(char ex : excepted){
            isOk = isOk | ex==c;
        }

        if(isOk == false){ //if one of the characters didn't meet the requirements return false
            return false;
        }
    }
    return result;
}

And here is how to use this simple function to prevent a wide range of attacks:

`` protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {

String input = request.getParameter("input");

if(InputValidation.isAlphanumericOrExcepted(EXPECTED_SIZE,input,'.','-','_')){
    //process the input
}
else{
    //return 400 Invalid Input and log attack attempt
}

}

One little function can prevent multiple attack types. The table below demonstrates how the function prevents SQL Injection, OS Injection, Cross-Site Scripting and Path Traversal.

table.png

The function also works for multi-language support. For example the character è will be considered a letter and will be allowed.

In an HTTP request there are many parameters that are simply numeric or alphanumeric. Let’s analyze the URL below which is part of a Twitter API request generated by executing a search for security.

https://api.twitter.com/2/search/adaptive.json?include_profile_interstitial_type=1&include_blocking=1&include_blocked_by=1&include_followed_by=1&include_want_retweets=1&include_mute_edge=1&include_can_dm=1&include_can_media_tag=1&skip_status=1&cards_platform=Web-12&include_cards=1&include_composer_source=true&include_ext_alt_text=true&include_reply_count=1&tweet_mode=extended&include_entities=true&include_user_entities=true&include_ext_media_color=true&send_error_codes=true&q=security&count=20&query_source=typd&pc=1&spelling_corrections=1&ext=mediaStats%2ChighlightedLabel

The only parameter here that may need to be excluded from validation is q . More than 90% of the request parameters can benefit from an alphanumeric allow list. By applying input validation to 90% of the input on the request, we reduce 90% of the attack surface. This is why input validation is the most effective way of reducing vulnerabilities.

Binding Parameters

In the previous article we reviewed Input Validation. While Input Validationis an effective deterrent to a large number of attacks, including Injection, not all input can be filtered.

For example imagine someone’s name is O’Brien. The single quote in O’Brienhappens to also be part of SQL command syntax. For example a website may perform a database record search like this:

simple.png

Notice that the single quote in the name O’Brien is causing a syntax error. The SQL command processor considers the string ends with O and the rest, BRIEN%, is just an unrecognized command.

In order to work around this problem one must escape the single quote with a another single quote like in the image below.

simple2.png

However when this query is executed by a program, things look different. The name would be read from a variable and the database query would be constructed dynamically.

Let’s take a look at the following code snippet.

simple3.png

The variable lastName contains input coming from the user. It is concatenated to a constant SQL query string and the resulting command is passed to the database server.

There is no input validation and no input escaping whatsoever. This means that a user entering O’Brien would cause an SQL syntax error, which is a bug. However a malicious user would take advantage of this behaviour. What would happen if the user entered something like the string below?

simple4.png

The SQL query being passed to the database would end up being two different commands. One selects all users in the database, while the other deletes the users table.

simple5.png

If these concepts are new to you, now you can finally enjoy this old hacker joke about little Bobby Tables.

simple6.png

Preventing Injection

As the old comic suggests, sanitizing the user input could have prevented the issue. “Sanitizing” could have been done with Input Validation or escaping of the single quote. However not all input can be validated and not all SQL Injection is done with single quotes.

In the example below Injection takes advantage of an un-sanitized ORDER BY parameter:

simple7.png

In this case Input Validation could be employed because column names should be alphanumeric however this leads to a more complex defence strategy where the application must employ Input Validation for values going into the ORDER BY section and Escaping for values going into the WHEREclause.

There is however a simpler way. Do not use concatenation at all. This approach also holds true for other Injection scenarios like Command Injection.

simple8.png

This is a scenario where we can employ Occam’s Razor to identify the simplest most universal defence to Injection attacks. The solution here is using Parameterized Statements.

Parameterized Statements interact with the command processor to separate the variables from the SQL query, thus effectively neutralizing characters that may influence the query.

simple9.png

In Java this can be achieved by using a Prepared Statement as per the example belowThe question mark in the query string at line 3 is a placeholder for the parameter value.

simple10.png

When dealing with OS Commands, is best to avoid them completely and write the equivalent functionality in your programming language of choice. For example if you must read a file in Java use APIs such as File and BufferedReader rather than the Linux cat command. However in certain cases there’s no choice and the code must “Shell Out”. For those situations the equivalent of a Prepared Statement is passing command line arguments as a separate array, thus avoiding concatenation.

simple11.png

Object-Relational Mapping (ORM)

There’s an even better way to abstract SQL statements from application code. ORM frameworks allow developers to work with objects rather than SQL queries. Instead of working with the users table developers would work with a users collection and execute a method on that collection to return the corresponding record.

simple12.png

Under the covers the framework would perform a transformation similar to the diagram below.

simple13.png

Caution

There are cases where Injection will still occur in spite of Parameterized Statements being used. For example when the injection occurs in a database stored procedure or the OS shell script being executed

In order to reduce the likelihood of such scenarios occurring, Input Validation should still be used as much as possible.

To sum it all up

When reviewing a code change that involves a command processor look for the following:

  • Input Validation for known alphanumeric values

  • Employing Parameterized Statements

  • Using a ORM framework where applicable

Ticker Tape Challenges

Problems with Memory

Memory related vulnerabilities are very dangerous. They are often classified as 0-days. 0-days are vulnerabilities in common software leveraged by malware and viruses. The 0day name came from illegal software released by an organized group who called themselves 0day. Their software releases were the latest and greatest video games. Never before did the public see these games. Hence vulnerabilities later became known as 0days. The challenge with memory vulnerabilities is that potential operating system components and programs such as system services, browsers, crypto libraries and document readers are written in C/C++ and are particularly exposed to these types of flaws.

There are many flavors of memory vulnerabilities but we will focus on the most common mistakes:

  1. Buffer Copy without Checking Size of Input — CWE 120

  2. Incorrect Calculation of Buffer Size — CWE 131

  3. Uncontrolled Format String — CWE 134

  4. Off by One — CWE 193

How do we know they are the most common? All of these mistakes are documented by MITRE with the top 3 being included in the MITRE Sans Top 25.

They all lead to arbitrary access of memory outside the intended boundaries. This is also known as Overflow.

Memory Overflow Explained

Imagine the program uses two variables a and b. The contents and length of variable b are controlled by the user. The variable memory locations are represented in the simplistic example below. Each variable is intended to store 3 characters. At first variable a contains the string “Hi!”.

valid1.png

If the user enters the string AAAAA (5 As) for variable b the following will happen.

valid2.png

Variable b only had 3 locations reserved for its value plus one for the string terminator \0 . By entering 5 characters the user is now writing into space reserved for variable a.

Variables are pointers to memory locations. The value of b will now become AAAAAi! while the value of a will become Ai!. If the program outputs the value of b then the attacker will be able to know part of the value of a which is known as a memory leak.

If the user enters a sufficiently long value, they will hit the instruction area and alter the program code.

valid3.png

Memory SafeR Functions

Overflow can be prevented by controlling the number of characters read into the buffer. This is done using memory safe functions. It is important to note that simply using these functions will not prevent overflow. They also must be used correctly.

The image below lists several functions that allow the program to limit the size of the value read into a buffer: fgetssnprintfstrncpystrncmp.

valid4.png

If the BUFF_SIZE argument is larger than the size of the buffer, overflow will still occur.

In the example below we have two different code snippets that both read a password from the standard input. Can you spot the one that allows Buffer Overflow because it does not check the size of the input?

valid5.png

If you identified bottom.cpp as the vulnerable code you were correct. The top example, is making use of fgets and it restricts the number of characters to 9.

Incorrect Calculation of Buffer Size

Some of our keen readers may have noticed that if the size of userPass is less than 9, then overflow will still occur.

This is an example of Incorrect Calculation of Buffer Size. Defining constants for the size argument rather than using numerals and paying close attention during code review to the code checking boundaries, can prevent this type of flaw. Let’s take a look at the example below and see if we can spot the vulnerable code.

valid6.png

If you identified top.cpp as the vulnerable code you were correct. The bottom example, is making use of the constant BUFFER_SIZE to ensure consistency. Besides being a safer option the code is also easier to maintain if the constant value must be modified in the future. Many secure coding practices have other benefits besides security.

Off-by-one

Off-by-one is another variation of buffer size flaw. This type of programming mistake is introduced when employing comparison operators. A simple extra equal sign, for example using<= instead of< can lead to the program crashing. Let’s take a look at an example. Can you spot the vulnerable code?

valid7.png

If you spotted the error in top.cpp you are correct.

Memory Injection?

Format String Injection is a type of vulnerability caused by concatenating or using user input in a format parameter. Code that logs user values using functions such as printf is particularly exposed to this type of vulnerability.

Take for example the snippets below. Can you spot which of the two snippets allows the user to control the format string?

valid8.png

If you identified top.cpp as the code that allows Format String Injection you were correct. If the user includes %d or %p in the password, printf will take the next value following the format string and include it in the display. This can lead to information disclosure or program crashes.

By the way there are a couple of bonus insecure practices in both code snippets :) . Did you spot them? One is the use of printf, a dangerous function as mentioned earlier in the article. It’s also a bad practice to display the user password.

Compiler Flags

This is not really a code review item but is worth mentioning because is a countermeasure that can be applied at build time to prevent the exploitation of memory flaws.

Compiler flags enable operating system defences such as ASLR in Windows or PIE/SSP in Linux. They tell the operating system to employ countermeasures such as randomizing memory, which is making it hard for attackers to insert arbitrary code.

Even with compiler flags in place attackers can still crash the program so the main effect of compiler flags is reducing the impact of the attack. The best defence is to prevent the flaws in the code, from the start, by employing the best practices discussed in this article.

To sum it all up…

  • Safer functions allow limiting the number of bytes read into the buffer

  • Even with safe functions special attention should be paid to size specified

  • Use constants to prevent mistakes

  • Careful with the <= operator

  • Do not allow user input in format strings

Overly Simplistic Crypto Code review

Confidentiality is one of Information Security’s CIA tenets. Users entrust developers with their data. To earn and maintain their trust, we must employ security controls that protect data from unauthorized access.

Unfortunately the track record is not good. The number of user records exposed in the United States has been in the billions in 2016 and 2017. 2018 will likely be the same, once the final tally is calculated.

1.jpg

It is worth mentioning that not all of these breaches were caused by programming flaws. Some were caused by negligence, many were caused by phishing and malware attacks.

However at least one of the breaches of 2017 was caused by a programming flaw. The breach at the credit rating company Equifax was responsible for almost 10% of the damage: 146 million records. The attack leveraged a OGNL Injection vulnerability in the Apache Struts library and the subsequent failure to patch the affected systems. You can read about Injection flaws in one of the previous articles in this series.

Spotting Data Breaches During Code Review

Besides vulnerabilities in 3rd party components, there are programming flaws that specifically involve storage and transmission of data and they may be prevented during code review. They are as follows:

  • Cleartext Storage of Sensitive Information — CWE 312

  • Use of a One-Way Hash without a Salt — CWE 759

  • Cleartext Transmission of Sensitive Information — CWE 319

  • Use of a Broken or Risky Cryptographic Algorithm — CWE 327

We will review a few examples of these flaws and how they can be prevented through software security best practices.

Uniquely Identifying Data Without Knowing the Data

One of the strongest countermeasures that can be employed to prevent data breaches is not storing the data at all.

But what if you still need to work with the data? For example, what if you wanted to verify a user’s password without storing the actual password value?

You could transform the data in a non-reversible way. This can be done through a cryptographic operation known as hashing.

Hashing algorithms, such as the SHA-2 class of algorithms, convert data in a way that cannot be reversed. However this doesn’t prevent one from trying a large amount of possible values in order to reach the same outcome. This is known as cracking. Cracking takes a long time and requires a lot of computing resources. Hackers maintain lists of pre-computed hashes, known as rainbow tables, in order to avoid the computing cost.

2.png

The defence employed against rainbow tables is to complicate the calculation by adding a salt. A salt is a random value that is added to the data being transformed in order to alter the resulting hash.

"ABCDEFG" + "-32524..." -> sha256("ABCDEFG-32524...") -> 97AF3... 

Another defence against cracking is adaptive hashing. This involves re-hashing the data for a large amount of iterations, each iteration taking longer than the previous. This increases the computing time. For a single hash the time is negligible but for a cracking attack it results in millions of years. A largely adopted adaptive hashing algorithm is PBKDF2.

Let’s take a look at the following code snippets. Can you spot the security flaw?

3.png

If you identified the top example as being flawed you were correct. The top example requires the user password stored as is. The bottom application is storing the data hashed several times to increase computing time and is also using a salt.

Secure hashing may be employed for various other types of data. For example if an application needs to uniquely identify users for analytics purposes, it could construct a unique, non-reversible hash from the user name and their IP address. This process is known as Tokenization.

If the website analytics database is breached and the only thing obtained by the attackers are these hashes, the data is useless to them or too costly to reverse. However if the database contains user names and IPs and if the user names are actual e-mails (a practice employed by many sites), then e-mails will be sold on the dark web to be used for phishing and spam campaigns.

Securing the Transmission of Data

Hashing can be an effective way to secure the data, but what if the attacker is able to intercept the data before it is transformed? What if they can intercept the data even before it reaches the application?

Up until 2018 major news outlets such as CNN or FOX NEWS were accessible over unencrypted URLs. Those are site addresses that start with http:// .

One of the things that may have prompted many sites to change this behaviour was the addition of a Not Secure message to the address bar of the popular Google Chrome browser for all http:// addresses.

4.png

When a website uses clear text to communicate with its users, man-in-the-middle attacks are possible. These attacks are can be online or offline attacks. Offline attacks usually target the Confidentiality of the data, for example tracking a user’s activity online or stealing their credentials. Online attacks can also impact the Integrity of the data, like for example replacing the content of a trusted news outlet with malware.

Communication security protocols, indicated by https:// URLs, prevent man-in-the-middle attacks by encrypting the transmission and verifying the identity of the two parties involved in the communication.

There are many details to transmission security that may be better covered in a dedicated article, but we will focus primarily on one aspect that may come up during a code review. This is using https:// URLs.

Let’s take a look at a code example. Can you spot the code that transmits data insecurely?

5.png

If you identified the bottom example as insecure you were correct. Notice that the URL is prefixed with http:// meaning that the data is transmitted in clear text. Looking a bit to the right you can spot two pieces of sensitive information being transmitted in the URL.

Note that is bad practice to store sensitive information in the URL even if the url starts with https:// . Thats’s because the URL may be inadvertently bookmarked, sent to a 3rd party or stored in web server logs.

Sometimes developers change the code to ignore invalid certificates because the test environment they are using does not have a valid web server certificate. This is a bad practice because it practically violates the server identity verification and allows man-in-the-middle attackers to pretend they are the target website. It is recommended to configure the development environment to trust the test certificate instead of altering the program behavior.

Reversible Encryption

What if you must be able to work with the user data in clear text? For example, you operate an online shopping or other financial site and must store the user’s name, address and credit card information to process transactions.

First, I would like to emphasize that if this is the case, the site has a huge target painted on it. It is likely subject to daily attacks. This is likely the case for sites like Equifax or your banking site.

Unfortunately encryption would have not saved Equifax because the vulnerability allowed attackers to run code on the web site. This let the attackers make the web site do their bidding. Another thing that helped the attackers is that the breach was not observed for a long period of time, allowing them to exfiltrate information as users were interacting with the website.

However let’s say the attackers don’t have this kind of access, but they have access to the physical hard drive of the database server. A different common attack vector could be a SQL Injection vulnerability that allows attackers to alter the database queries in order to expose sensitive data.

In these cases encryption can be effective, provided that the attackers cannot retrieve the encryption keys. If encryption keys are stored on a separate server, also known as a Key Management Server or Service (KMS), then attackers must obtain access to this server as well, which complicates the attack. However if the encryption keys are stored in the same database, decrypting the data is trivial. This scenario is similar to hiding a key under the mat.

Let’s take a look at some code examples written in Node.js. Can you identify the vulnerable snippet?

6.png

If you identified the top example you are correct. The top example is storing the customer personal financial information in a AWS S3 bucket. S3 buckets have notoriously been exposed to data breaches in the past years. While S3 offers data encryption capabilities, this configuration may not be enabled and it does not protect from all types of attacks.

To sum it all up

Some key takeaways from this article:

  • Where possible employ secure hashing to transform the data in a way that cannot be reversed

  • Enforce secure communication between clients and servers

  • Encrypt sensitive data in databases to prevent physical data theft and mitigate SQL Injection

  • Store encryption keys in a KMS

  • Vulnerabilities that allow code execution on the server may still expose the data in spite of encryption, so all secure coding practices are data protection practices.

Due to the extensive nature of the data protection subject we are going to stop here for now. In the next article we will cover how to keep up to date with crypto algorithms, what are some of the government and compliance standard requirements for data protection and how do they affect coding.