Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pull] main from github:main #2

Open
wants to merge 10,000 commits into
base: main
Choose a base branch
from
Open

[pull] main from github:main #2

wants to merge 10,000 commits into from

Conversation

pull[bot]
Copy link

@pull pull bot commented Nov 28, 2022

See Commits and Changes for more details.


Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

@github-actions github-actions bot added documentation Improvements or additions to documentation Java JS Python Ruby Swift labels Nov 28, 2022
@github-actions
Copy link

github-actions bot commented Nov 28, 2022

QHelp previews:

cpp/ql/src/experimental/Security/CWE/CWE-805/BufferAccessWithIncorrectLengthValue.qhelp

Buffer access with incorrect length value

Using a size argument that is larger than the buffer size will result in an out-of-bounds memory access and possibly overflow. You need to limit the value of the length argument.

Example

The following example shows the use of a function with and without an error in the size argument.

...
char buf[256];
X509_NAME_oneline(X509_get_subject_name(peer),buf,sizeof(buf)); // GOOD
...
char buf[256];
X509_NAME_oneline(X509_get_subject_name(peer),buf,1024); // BAD
...

References

go/ql/src/InconsistentCode/UnhandledCloseWritableHandle.qhelp

Writable file handle closed without error handling

Data written to a file handle may not immediately be flushed to the underlying storage medium by the operating system. Often, the data may be cached in memory until the handle is closed, or until a later point after that. Only calling os.File.Sync gives a reasonable guarantee that the data is flushed. Therefore, write errors may not occur until os.File.Close or os.File.Sync are called. If either is called and any errors returned by them are discarded, then the program may be unaware that data loss occurred.

Recommendation

Always check whether os.File.Close returned an error and handle it appropriately.

Example

In this first example, two calls to os.File.Close are made with the intention of closing the file after all work on it has been done or if writing to it fails. However, while errors that may arise during the call to os.File.WriteString are handled, any errors arising from the calls to os.File.Close are discarded silently:

package main

import (
	"os"
)

func example() error {
	f, err := os.OpenFile("file.txt", os.O_WRONLY|os.O_CREATE, 0666)

	if err != nil {
		return err
	}

	if _, err := f.WriteString("Hello"); err != nil {
		f.Close()
		return err
	}

	f.Close()

	return nil
}

In the second example, a call to os.File.Close is deferred in order to accomplish the same behaviour as in the first example. However, while errors that may arise during the call to os.File.WriteString are handled, any errors arising from os.File.Close are again discarded silently:

package main

import (
	"os"
)

func example() error {
	f, err := os.OpenFile("file.txt", os.O_WRONLY|os.O_CREATE, 0666)

	if err != nil {
		return err
	}

	defer f.Close()

	if _, err := f.WriteString("Hello"); err != nil {
		return err
	}

	return nil
}

To correct this issue, handle errors arising from calls to os.File.Close explicitly:

package main

import (
	"os"
)

func example() (exampleError error) {
	f, err := os.OpenFile("file.txt", os.O_WRONLY|os.O_CREATE, 0666)

	if err != nil {
		return err
	}

	defer func() {
		// try to close the file; if an error occurs, set the error returned by `example`
		// to that error, but only if `WriteString` didn't already set it to something first
		if err := f.Close(); exampleError == nil && err != nil {
			exampleError = err
		}
	}()

	if _, err := f.WriteString("Hello"); err != nil {
		exampleError = err
	}

	return
}

References

  • The Go Programming Language Specification: Defer statements.
  • The Go Programming Language Specification: Errors.
go/ql/src/Security/CWE-601/BadRedirectCheck.qhelp

Bad redirect check

Redirect URLs should be checked to ensure that user input cannot cause a site to redirect to arbitrary domains. This is often done with a check that the redirect URL begins with a slash, which most of the time is an absolute redirect on the same host. However, browsers interpret URLs beginning with // or /\ as absolute URLs. For example, a redirect to //example.com will redirect to https://example.com. Thus, redirect checks must also check the second character of redirect URLs.

Recommendation

Also disallow redirect URLs starting with // or /\.

Example

The following function validates a (presumably untrusted) redirect URL redir. If it does not begin with /, the harmless placeholder redirect URL / is returned to prevent an open redirect; otherwise redir itself is returned.

package main

func sanitizeUrl(redir string) string {
	if len(redir) > 0 && redir[0] == '/' {
		return redir
	}
	return "/"
}

While this check provides partial protection, it should be extended to cover // and /\ as well:

package main

func sanitizeUrl1(redir string) string {
	if len(redir) > 1 && redir[0] == '/' && redir[1] != '/' && redir[1] != '\\' {
		return redir
	}
	return "/"
}

References

java/ql/src/Security/CWE/CWE-023/PartialPathTraversal.qhelp

Partial path traversal vulnerability

A common way to check that a user-supplied path SUBDIR falls inside a directory DIR is to use getCanonicalPath() to remove any path-traversal elements and then check that DIR is a prefix. However, if DIR is not slash-terminated, this can unexpectedly allow access to siblings of DIR.

See also java/partial-path-traversal-from-remote, which is similar to this query but only flags instances with evidence of remote exploitability.

Recommendation

If the user should only access items within a certain directory DIR, ensure that DIR is slash-terminated before checking that DIR is a prefix of the user-provided path, SUBDIR. Note, Java's getCanonicalPath() returns a non-slash-terminated path string, so a slash must be added to DIR if that method is used.

Example

In this example, the if statement checks if parent.getCanonicalPath() is a prefix of dir.getCanonicalPath(). However, parent.getCanonicalPath() is not slash-terminated. This means that users that supply dir may be also allowed to access siblings of parent and not just children of parent, which is a security issue.

public class PartialPathTraversalBad {
    public void example(File dir, File parent) throws IOException {
        if (!dir.getCanonicalPath().startsWith(parent.getCanonicalPath())) {
            throw new IOException("Invalid directory: " + dir.getCanonicalPath());
        }
    }
}

In this example, the if statement checks if parent.getCanonicalPath() + File.separator is a prefix of dir.getCanonicalPath(). Because parent.getCanonicalPath() + File.separator is indeed slash-terminated, the user supplying dir can only access children of parent, as desired.

public class PartialPathTraversalGood {
    public void example(File dir, File parent) throws IOException {
        if (!dir.getCanonicalPath().startsWith(parent.getCanonicalPath() + File.separator)) {
            throw new IOException("Invalid directory: " + dir.getCanonicalPath());
        }
    }
}

References

java/ql/src/Security/CWE/CWE-023/PartialPathTraversalFromRemote.qhelp

Partial path traversal vulnerability from remote

A common way to check that a user-supplied path SUBDIR falls inside a directory DIR is to use getCanonicalPath() to remove any path-traversal elements and then check that DIR is a prefix. However, if DIR is not slash-terminated, this can unexpectedly allow accessing siblings of DIR.

See also java/partial-path-traversal, which is similar to this query, but may also flag non-remotely-exploitable instances of partial path traversal vulnerabilities.

Recommendation

If the user should only access items within a certain directory DIR, ensure that DIR is slash-terminated before checking that DIR is a prefix of the user-provided path, SUBDIR. Note, Java's getCanonicalPath() returns a non-slash-terminated path string, so a slash must be added to DIR if that method is used.

Example

In this example, the if statement checks if parent.getCanonicalPath() is a prefix of dir.getCanonicalPath(). However, parent.getCanonicalPath() is not slash-terminated. This means that users that supply dir may be also allowed to access siblings of parent and not just children of parent, which is a security issue.

public class PartialPathTraversalBad {
    public void example(File dir, File parent) throws IOException {
        if (!dir.getCanonicalPath().startsWith(parent.getCanonicalPath())) {
            throw new IOException("Invalid directory: " + dir.getCanonicalPath());
        }
    }
}

In this example, the if statement checks if parent.getCanonicalPath() + File.separator is a prefix of dir.getCanonicalPath(). Because parent.getCanonicalPath() + File.separator is indeed slash-terminated, the user supplying dir can only access children of parent, as desired.

public class PartialPathTraversalGood {
    public void example(File dir, File parent) throws IOException {
        if (!dir.getCanonicalPath().startsWith(parent.getCanonicalPath() + File.separator)) {
            throw new IOException("Invalid directory: " + dir.getCanonicalPath());
        }
    }
}

References

java/ql/src/Security/CWE/CWE-079/AndroidWebViewAddJavascriptInterface.qhelp

Access Java object methods through JavaScript exposure

Calling the addJavascriptInterface method of the android.webkit.WebView class allows the web pages of a WebView to access a Java object's methods via JavaScript.

Objects exposed to JavaScript are available in all frames of the WebView.

Recommendation

If you need to expose Java objects to JavaScript, guarantee that no untrusted third-party content is loaded into the WebView.

Example

In the following (bad) example, a Java object is exposed to JavaScript.

import android.webkit.JavascriptInterface;
import android.database.sqlite.SQLiteOpenHelper;

class ExposedObject extends SQLiteOpenHelper {
    @JavascriptInterface
    public String studentEmail(String studentName) {
        // SQL injection
        String query = "SELECT email FROM students WHERE studentname = '" + studentName + "'";

        Cursor cursor = db.rawQuery(query, null);
        cursor.moveToFirst();
        String email = cursor.getString(0);

        return email;
    }
}

webview.getSettings().setJavaScriptEnabled(true);
webview.addJavaScriptInterface(new ExposedObject(), "exposedObject");
webview.loadData("", "text/html", null);

String name = "Robert'; DROP TABLE students; --";
webview.loadUrl("javascript:alert(exposedObject.studentEmail(\""+ name +"\"))");

References

java/ql/src/Security/CWE/CWE-079/AndroidWebViewSettingsEnabledJavaScript.qhelp

Android WebView JavaScript settings

Enabling JavaScript in an Android WebView allows the execution of JavaScript code in the context of the running application. This creates a cross-site scripting vulnerability.

For example, if your application's WebView allows for visiting web pages that you do not trust, it is possible for an attacker to lead the user to a page which loads malicious JavaScript.

You can enable or disable Javascript execution using the setJavaScriptEnabled method of the settings of a WebView.

Recommendation

JavaScript execution is disabled by default. You can explicitly disable it by calling setJavaScriptEnabled(false) on the settings of the WebView.

If JavaScript is necessary, only load content from trusted servers using encrypted channels, such as HTTPS with certificate verification.

Example

In the following (bad) example, a WebView has JavaScript enabled in its settings:

WebSettings settings = webview.getSettings();
settings.setJavaScriptEnabled(true);

In the following (good) example, a WebView explicitly disallows JavaScript execution:

WebSettings settings = webview.getSettings();
settings.setJavaScriptEnabled(false);

References

java/ql/src/Security/CWE/CWE-089/SqlConcatenated.qhelp

Query built by concatenation with a possibly-untrusted string

Even when the components of a SQL query are not fully controlled by a user, it is a vulnerability to build the query by directly concatenating those components. Perhaps a separate vulnerability will allow the user to gain control of the component. As well, a user who cannot gain full control of an input might influence it enough to cause the SQL query to fail to run.

Recommendation

Usually, it is better to use a SQL prepared statement than to build a complete SQL query with string concatenation. A prepared statement can include a wildcard, written as a question mark (?), for each part of the SQL query that is expected to be filled in by a different value each time it is run. When the query is later executed, a value must be supplied for each wildcard in the query.

In the Java Persistence Query Language, it is better to use queries with parameters than to build a complete query with string concatenation. A Java Persistence query can include a parameter placeholder for each part of the query that is expected to be filled in by a different value when run. A parameter placeholder may be indicated by a colon (:) followed by a parameter name, or by a question mark (?) followed by an integer position. When the query is later executed, a value must be supplied for each parameter in the query, using the setParameter method. Specifying the query using the @NamedQuery annotation introduces an additional level of safety: the query must be a constant string literal, preventing construction by string concatenation, and the only way to fill in values for parts of the query is by setting positional parameters.

It is good practice to use prepared statements (in SQL) or query parameters (in the Java Persistence Query Language) for supplying parameter values to a query, whether or not any of the parameters are directly traceable to user input. Doing so avoids any need to worry about quoting and escaping.

Example

In the following example, the code runs a simple SQL query in two different ways.

The first way involves building a query, query1, by concatenating the result of getCategory with some string literals. The result of getCategory can include special characters, or it might be refactored later so that it may return something that contains special characters.

The second way, which shows good practice, involves building a query, query2, with a single string literal that includes a wildcard (?). The wildcard is then given a value by calling setString. This version is immune to injection attacks, because any special characters in the result of getCategory are not given any special treatment.

{
    // BAD: the category might have SQL special characters in it
    String category = getCategory();
    Statement statement = connection.createStatement();
    String query1 = "SELECT ITEM,PRICE FROM PRODUCT WHERE ITEM_CATEGORY='"
        + category + "' ORDER BY PRICE";
    ResultSet results = statement.executeQuery(query1);
}

{
    // GOOD: use a prepared query
    String category = getCategory();
    String query2 = "SELECT ITEM,PRICE FROM PRODUCT WHERE ITEM_CATEGORY=? ORDER BY PRICE";
    PreparedStatement statement = connection.prepareStatement(query2);
    statement.setString(1, category);
    ResultSet results = statement.executeQuery();
}

References

java/ql/src/Security/CWE/CWE-200/AndroidWebViewSettingsAllowsContentAccess.qhelp

Android WebView settings allows access to content links

Android can provide access to content providers within a WebView using the setAllowContentAccess setting.

Allowing access to content providers via content:// URLs may allow JavaScript to access protected content.

Recommendation

If your app does not require access to the content:// URL functionality, you should explicitly disable the setting by calling setAllowContentAccess(false) on the settings of the WebView.

Example

In the following (bad) example, access to content:// URLs is explicitly allowed.

WebSettings settings = webview.getSettings();

settings.setAllowContentAccess(true);

In the following (good) example, access to content:// URLs is explicitly denied.

WebSettings settings = webview.getSettings();

settings.setAllowContentAccess(false);

References

java/ql/src/Security/CWE/CWE-200/AndroidWebViewSettingsFileAccess.qhelp

Android WebSettings file access

Allowing file access in an Android WebView can expose a device's file system to the JavaScript running in that WebView. If the JavaScript contains vulnerabilities or the WebView loads untrusted content, file access allows an attacker to steal the user's data.

Recommendation

When possible, do not allow file access. The file access settings are disabled by default. You can explicitly disable file access by setting the following settings to false:

  • setAllowFileAccess
  • setAllowFileAccessFromFileURLs
  • setAllowUniversalAccessFromFileURLs
    If your application requires access to the file system, it is best to avoid using file:// URLs. Instead, use an alternative that loads files via HTTPS, such as androidx.webkit.WebViewAssetLoader.

Example

In the following (bad) example, the WebView is configured with settings that allow local file access.

WebSettings settings = view.getSettings();

settings.setAllowFileAccess(true);
settings.setAllowFileAccessFromURLs(true);
settings.setAllowUniversalAccessFromURLs(true);

In the following (good) example, the WebView is configured to disallow file access.

WebSettings settings = view.getSettings();

settings.setAllowFileAccess(false);
settings.setAllowFileAccessFromURLs(false);
settings.setAllowUniversalAccessFromURLs(false);

As mentioned previously, asset loaders can load files without file system access. In the following (good) example, an asset loader is configured to load assets over HTTPS.

WebViewAssetLoader loader = new WebViewAssetLoader.Builder()
    // Replace the domain with a domain you control, or use the default
    // appassets.androidplatform.com
    .setDomain("appassets.example.com")
    .addPathHandler("/resources", new AssetsPathHandler(this))
    .build();

webView.setWebViewClient(new WebViewClientCompat() {
    @Override
    public WebResourceResponse shouldInterceptRequest(WebView view, WebResourceRequest request) {
        return assetLoader.shouldInterceptRequest(request.getUrl());
    }
});

webView.loadUrl("https://appassets.example.com/resources/www/index.html");

References

java/ql/src/Security/CWE/CWE-295/AndroidMissingCertificatePinning.qhelp

Android missing certificate pinning

Certificate pinning is the practice of only trusting a specific set of SSL certificates, rather than those that the device trusts by default. In Android applications, it is reccomended to use certificate pinning when communicating over the network, in order to minimize the risk of machine-in-the-middle attacks from a compromised CA.

Recommendation

The easiest way to implement certificate pinning is to declare your pins in a network-security-config XML file. This will automatically provide certificate pinning for any network connection made by the app.

Another way to implement certificate pinning is to use the `CertificatePinner` class from the `okhttp` library.

A final way to implement certificate pinning is to use a TrustManager, initialized from a KeyStore loaded with only the necessary certificates.

Example

In the first (bad) case below, a network call is performed with no certificate pinning implemented. The other (good) cases demonstrate the different ways to implement certificate pinning.

// BAD - By default, this network call does not use certificate pinning
URLConnection conn = new URL("https://example.com").openConnection();
<!-- GOOD: Certificate pinning implemented via a Network Security Config file -->

<!-- In AndroidManifest.xml -->
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.app">

    <application android:networkSecurityConfig="@xml/NetworkSecurityConfig">
        ...
    </application>

</manifest>

<!-- In res/xml/NetworkSecurityConfig.xml -->
<network-security-config>
    <domain-config>
        <domain>good.example.com</domain>
        <pin-set expiration="2038/1/19">
            <pin digest="SHA-256">...</pin>
        </pin-set>
    </domain-config>
</network-security-config>
// GOOD: Certificate pinning implemented via okhttp3.CertificatePinner 
CertificatePinner certificatePinner = new CertificatePinner.Builder()
    .add("example.com", "sha256/AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=")
    .build();
OkHttpClient client = new OkHttpClient.Builder()
    .certificatePinner(certificatePinner)
    .build();

client.newCall(new Request.Builder().url("https://example.com").build()).execute();



// GOOD: Certificate pinning implemented via a TrustManager
KeyStore keyStore = KeyStore.getInstance("BKS");
keyStore.load(resources.openRawResource(R.raw.cert), null);

TrustManagerFactory tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
tmf.init(keyStore);

SSLContext sslContext = SSLContext.getInstance("TLS");
sslContext.init(null, tmf.getTrustManagers(), null);

URL url = new URL("http://www.example.com/");
HttpsURLConnection urlConnection = (HttpsURLConnection) url.openConnection(); 

urlConnection.setSSLSocketFactory(sslContext.getSocketFactory());

References

java/ql/src/Security/CWE/CWE-502/UnsafeDeserialization.qhelp

Deserialization of user-controlled data

Deserializing untrusted data using any deserialization framework that allows the construction of arbitrary serializable objects is easily exploitable and in many cases allows an attacker to execute arbitrary code. Even before a deserialized object is returned to the caller of a deserialization method a lot of code may have been executed, including static initializers, constructors, and finalizers. Automatic deserialization of fields means that an attacker may craft a nested combination of objects on which the executed initialization code may have unforeseen effects, such as the execution of arbitrary code.

There are many different serialization frameworks. This query currently supports Kryo, XmlDecoder, XStream, SnakeYaml, JYaml, JsonIO, YAMLBeans, HessianBurlap, Castor, Burlap, Jackson, Jabsorb, Jodd JSON, Flexjson, Gson and Java IO serialization through ObjectInputStream/ObjectOutputStream.

Recommendation

Avoid deserialization of untrusted data if at all possible. If the architecture permits it then use other formats instead of serialized objects, for example JSON or XML. However, these formats should not be deserialized into complex objects because this provides further opportunities for attack. For example, XML-based deserialization attacks are possible through libraries such as XStream and XmlDecoder.

Alternatively, a tightly controlled whitelist can limit the vulnerability of code, but be aware of the existence of so-called Bypass Gadgets, which can circumvent such protection measures.

Recommendations specific to particular frameworks supported by this query:

FastJson - com.alibaba:fastjson

  • Secure by Default: Partially
  • Recommendation: Call com.alibaba.fastjson.parser.ParserConfig#setSafeMode with the argument true before deserializing untrusted data.

FasterXML - com.fasterxml.jackson.core:jackson-databind

  • Secure by Default: Yes
  • Recommendation: Don't call com.fasterxml.jackson.databind.ObjectMapper#enableDefaultTyping and don't annotate any object fields with com.fasterxml.jackson.annotation.JsonTypeInfo passing either the CLASS or MINIMAL_CLASS values to the annotation. Read this guide.

Kryo - com.esotericsoftware:kryo and com.esotericsoftware:kryo5

  • Secure by Default: Yes for com.esotericsoftware:kryo5 and for com.esotericsoftware:kryo >= v5.0.0
  • Recommendation: Don't call com.esotericsoftware.kryo(5).Kryo#setRegistrationRequired with the argument false on any Kryo instance that may deserialize untrusted data.

ObjectInputStream - Java Standard Library

  • Secure by Default: No
  • Recommendation: Use a validating input stream, such as org.apache.commons.io.serialization.ValidatingObjectInputStream.

SnakeYAML - org.yaml:snakeyaml

  • Secure by Default: No
  • Recommendation: Pass an instance of org.yaml.snakeyaml.constructor.SafeConstructor to org.yaml.snakeyaml.Yaml's constructor before using it to deserialize untrusted data.

XML Decoder - Standard Java Library

  • Secure by Default: No
  • Recommendation: Do not use with untrusted user input.

Example

The following example calls readObject directly on an ObjectInputStream that is constructed from untrusted data, and is therefore inherently unsafe.

public MyObject {
  public int field;
  MyObject(int field) {
    this.field = field;
  }
}

public MyObject deserialize(Socket sock) {
  try(ObjectInputStream in = new ObjectInputStream(sock.getInputStream())) {
    return (MyObject)in.readObject(); // unsafe
  }
}

Rewriting the communication protocol to only rely on reading primitive types from the input stream removes the vulnerability.

public MyObject deserialize(Socket sock) {
  try(DataInputStream in = new DataInputStream(sock.getInputStream())) {
    return new MyObject(in.readInt());
  }
}

References

java/ql/src/Security/CWE/CWE-611/XXELocal.qhelp

Resolving XML external entity in user-controlled data from local source

Parsing untrusted XML files with a weakly configured XML parser may lead to an XML External Entity (XXE) attack. This type of attack uses external entity references to access arbitrary files on a system, carry out denial of service, or server side request forgery. Even when the result of parsing is not returned to the user, out-of-band data retrieval techniques may allow attackers to steal sensitive data. Denial of services can also be carried out in this situation.

There are many XML parsers for Java, and most of them are vulnerable to XXE because their default settings enable parsing of external entities. This query currently identifies vulnerable XML parsing from the following parsers: javax.xml.parsers.DocumentBuilder, javax.xml.stream.XMLStreamReader, org.jdom.input.SAXBuilder/org.jdom2.input.SAXBuilder, javax.xml.parsers.SAXParser,org.dom4j.io.SAXReader, org.xml.sax.XMLReader, javax.xml.transform.sax.SAXSource, javax.xml.transform.TransformerFactory, javax.xml.transform.sax.SAXTransformerFactory, javax.xml.validation.SchemaFactory, javax.xml.bind.Unmarshaller and javax.xml.xpath.XPathExpression.

Recommendation

The best way to prevent XXE attacks is to disable the parsing of any Document Type Declarations (DTDs) in untrusted data. If this is not possible you should disable the parsing of external general entities and external parameter entities. This improves security but the code will still be at risk of denial of service and server side request forgery attacks. Protection against denial of service attacks may also be implemented by setting entity expansion limits, which is done by default in recent JDK and JRE implementations. Because there are many different ways to disable external entity retrieval with varying support between different providers, in this query we choose to specifically check for the OWASP recommended way to disable external entity retrieval for a particular parser. There may be other ways of making a particular parser safe which deviate from these guidelines, in which case this query will continue to flag the parser as potentially dangerous.

Example

The following example calls parse on a DocumentBuilder that is not safely configured on untrusted data, and is therefore inherently unsafe.

public void parse(Socket sock) throws Exception {
  DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
  DocumentBuilder builder = factory.newDocumentBuilder();
  builder.parse(sock.getInputStream()); //unsafe
}

In this example, the DocumentBuilder is created with DTD disabled, securing it against XXE attack.

public void disableDTDParse(Socket sock) throws Exception {
  DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
  factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
  DocumentBuilder builder = factory.newDocumentBuilder();
  builder.parse(sock.getInputStream()); //safe
}

References

java/ql/src/Security/CWE/CWE-730/PolynomialReDoS.qhelp

Polynomial regular expression used on uncontrolled data

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to nk or even 2n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engine provided by Java uses a backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Note that Java versions 9 and above have some mitigations against ReDoS; however they aren't perfect and more complex regular expressions can still be affected by this problem.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter. Alternatively, an alternate regex library that guarantees linear time execution, such as Google's RE2J, may be used.

Example

Consider this use of a regular expression, which removes all leading and trailing whitespace in a string:

			Pattern.compile("^\\s+|\\s+$").matcher(text).replaceAll("") // BAD
		

The sub-expression "\\s+$" will match the whitespace characters in text from left to right, but it can start matching anywhere within a whitespace sequence. This is problematic for strings that do not end with a whitespace character. Such a string will force the regular expression engine to process each whitespace sequence once per whitespace character in the sequence.

This ultimately means that the time cost of trimming a string is quadratic in the length of the string. So a string like "a b" will take milliseconds to process, but a similar string with a million spaces instead of just one will take several minutes.

Avoid this problem by rewriting the regular expression to not contain the ambiguity about when to start matching whitespace sequences. For instance, by using a negative look-behind ("^\\s+|(?<!\\s)\\s+$"), or just by using the built-in trim method (text.trim()).

Note that the sub-expression "^\\s+" is not problematic as the ^ anchor restricts when that sub-expression can start matching, and as the regular expression engine matches from left to right.

Example

As a similar, but slightly subtler problem, consider the regular expression that matches lines with numbers, possibly written using scientific notation:

			"^0\\.\\d+E?\\d+$"" 
		

The problem with this regular expression is in the sub-expression \d+E?\d+ because the second \d+ can start matching digits anywhere after the first match of the first \d+ if there is no E in the input string.

This is problematic for strings that do not end with a digit. Such a string will force the regular expression engine to process each digit sequence once per digit in the sequence, again leading to a quadratic time complexity.

To make the processing faster, the regular expression should be rewritten such that the two \d+ sub-expressions do not have overlapping matches: "^0\\.\\d+(E\\d+)?$".

References

java/ql/src/Security/CWE/CWE-730/ReDoS.qhelp

Inefficient regular expression

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to nk or even 2n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engine provided by Java uses a backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Note that Java versions 9 and above have some mitigations against ReDoS; however they aren't perfect and more complex regular expressions can still be affected by this problem.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter. Alternatively, an alternate regex library that guarantees linear time execution, such as Google's RE2J, may be used.

Example

Consider this regular expression:

			^_(__|.)+_$
		

Its sub-expression "(__|.)+?" can match the string "__" either by the first alternative "__" to the left of the "|" operator, or by two repetitions of the second alternative "." to the right. Thus, a string consisting of an odd number of underscores followed by some other character will cause the regular expression engine to run for an exponential amount of time before rejecting the input.

This problem can be avoided by rewriting the regular expression to remove the ambiguity between the two branches of the alternative inside the repetition:

			^_(__|[^_])+_$
		

References

java/ql/src/Security/CWE/CWE-927/SensitiveResultReceiver.qhelp

Leaking sensitive information through a ResultReceiver

If a ResultReceiver is obtained from an untrusted source, such as an Intent received by an exported component, do not send it sensitive data. Otherwise, the information may be leaked to a malicious application.

Recommendation

Do not send sensitive data to an untrusted ResultReceiver.

Example

In the following (bad) example, sensitive data is sent to an untrusted ResultReceiver.

// BAD: Sensitive data is sent to an untrusted result receiver 
void bad(String password) {
    Intent intent = getIntent();
    ResultReceiver rec = intent.getParcelableExtra("Receiver");
    Bundle b = new Bundle();
    b.putCharSequence("pass", password);
    rec.send(0, b); 
}

References

  • Common Weakness Enumeration: CWE-927.
java/ql/src/experimental/Security/CWE/CWE-400/LocalThreadResourceAbuse.qhelp

Uncontrolled thread resource consumption from local input source

The Thread.sleep method is used to pause the execution of current thread for specified time. When the sleep time is user-controlled, especially in the web application context, it can be abused to cause all of a server's threads to sleep, leading to denial of service.

Recommendation

To guard against this attack, consider specifying an upper range of allowed sleep time or adopting the producer/consumer design pattern with Object.wait method to avoid performance problems or even resource exhaustion. For more information, refer to the concurrency tutorial of Oracle listed below or java/ql/src/Likely Bugs/Concurrency queries of CodeQL.

Example

The following example shows a bad situation and a good situation respectively. In the bad situation, a thread sleep time comes directly from user input. In the good situation, an upper range check on the maximum sleep time allowed is enforced.

class SleepTest {
	public void test(int userSuppliedWaitTime) throws Exception {
		// BAD: no boundary check on wait time
		Thread.sleep(userSuppliedWaitTime);

		// GOOD: enforce an upper limit on wait time
		if (userSuppliedWaitTime > 0 && userSuppliedWaitTime < 5000) {
			Thread.sleep(userSuppliedWaitTime);
		}
	}
}

References

javascript/ql/src/Performance/PolynomialReDoS.qhelp

Polynomial regular expression used on uncontrolled data

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to nk or even 2n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engines provided by many popular JavaScript platforms use backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter.

Example

Consider this use of a regular expression, which removes all leading and trailing whitespace in a string:

			text.replace(/^\s+|\s+$/g, ''); // BAD
		

The sub-expression "\s+$" will match the whitespace characters in text from left to right, but it can start matching anywhere within a whitespace sequence. This is problematic for strings that do not end with a whitespace character. Such a string will force the regular expression engine to process each whitespace sequence once per whitespace character in the sequence.

This ultimately means that the time cost of trimming a string is quadratic in the length of the string. So a string like "a b" will take milliseconds to process, but a similar string with a million spaces instead of just one will take several minutes.

Avoid this problem by rewriting the regular expression to not contain the ambiguity about when to start matching whitespace sequences. For instance, by using a negative look-behind (/^\s+|(?<!\s)\s+$/g), or just by using the built-in trim method (text.trim()).

Note that the sub-expression "^\s+" is not problematic as the ^ anchor restricts when that sub-expression can start matching, and as the regular expression engine matches from left to right.

Example

As a similar, but slightly subtler problem, consider the regular expression that matches lines with numbers, possibly written using scientific notation:

			/^0\.\d+E?\d+$/.test(str) // BAD
		

The problem with this regular expression is in the sub-expression \d+E?\d+ because the second \d+ can start matching digits anywhere after the first match of the first \d+ if there is no E in the input string.

This is problematic for strings that do not end with a digit. Such a string will force the regular expression engine to process each digit sequence once per digit in the sequence, again leading to a quadratic time complexity.

To make the processing faster, the regular expression should be rewritten such that the two \d+ sub-expressions do not have overlapping matches: ^0\.\d+(E\d+)?$.

References

javascript/ql/src/Performance/ReDoS.qhelp

Inefficient regular expression

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to nk or even 2n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engines provided by many popular JavaScript platforms use backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter.

Example

Consider this regular expression:

			/^_(__|.)+_$/
		

Its sub-expression "(__|.)+?" can match the string "__" either by the first alternative "__" to the left of the "|" operator, or by two repetitions of the second alternative "." to the right. Thus, a string consisting of an odd number of underscores followed by some other character will cause the regular expression engine to run for an exponential amount of time before rejecting the input.

This problem can be avoided by rewriting the regular expression to remove the ambiguity between the two branches of the alternative inside the repetition:

			/^_(__|[^_])+_$/
		

References

javascript/ql/src/Security/CWE-020/UntrustedDataToExternalAPI.qhelp

Untrusted data passed to external API

Using unsanitized untrusted data in an external API can cause a variety of security issues. This query reports external APIs that use untrusted data. The results are not filtered so that you can audit all examples. The query provides data for security reviews of the application and you can also use it to identify external APIs that should be modeled as either taint steps, or sinks for specific problems.

An external API is defined as a method call to a method that is not defined in the source code, not overridden in the source code, and is not modeled as a taint step in the default taint library. External APIs may be from the third-party dependencies or from internal dependencies. The query reports uses of untrusted data one of the arguments of external API call or in the return value from a callback passed to an external API.

Recommendation

For each result:

  • If the result highlights a known sink, confirm that the result is reported by the relevant query, or that the result is a false positive because this data is sanitized.
  • If the result highlights an unknown sink for a problem, then add modeling for the sink to the relevant query, and confirm that the result is either found, or is safe due to appropriate sanitization.
  • If the result represents a call to an external API that transfers taint, add the appropriate modeling, and re-run the query to determine what new results have appeared due to this additional modeling.
    Otherwise, the result is likely uninteresting. Custom versions of this query can extend the SafeExternalAPIMethod class to exclude known safe external APIs from future analysis.

Example

In this first example, a query parameter is read from the req parameter and then ultimately used in a call to the res.send external API:

express().get('/news', (req, res) => {
    let topic = req.query.topic;
    res.send(`<h1>${topic}</h1>`);
});

This is a reflected XSS sink. The XSS query should therefore be reviewed to confirm that this sink is appropriately modeled, and if it is, to confirm that the query reports this particular result, or that the result is a false positive due to some existing sanitization.

In this second example, again a query parameter is read from req.

let path = require('path');

express().get('/data', (req, res) => {
    let file = path.join(HOME_DIR, 'public', req.query.file);
    res.sendFile(file);
});

If the query reported the call to path.join on line 4, this would suggest that this external API is not currently modeled as a taint step in the taint tracking library. The next step would be to model this as a taint step, then re-run the query to determine what additional results might be found. In this example, it seems the result of the path.join will be used as a file path, leading to a path traversal vulnerability.

Note that both examples are correctly handled by the standard taint tracking library and security queries.

References

  • Common Weakness Enumeration: CWE-20.
javascript/ql/src/Security/CWE-078/CommandInjection.qhelp

Uncontrolled command line

Code that passes user input directly to require('child_process').exec, or some other library routine that executes a command, allows the user to execute malicious code.

Recommendation

If possible, use hard-coded string literals to specify the command to run or library to load. Instead of passing the user input directly to the process or library function, examine the user input and then choose among hard-coded string literals.

If the applicable libraries or commands cannot be determined at compile time, then add code to verify that the user input string is safe before using it.

Example

The following example shows code that takes a shell script that can be changed maliciously by a user, and passes it straight to child_process.exec without examining it first.

var cp = require("child_process"),
    http = require('http'),
    url = require('url');

var server = http.createServer(function(req, res) {
    let cmd = url.parse(req.url, true).query.path;

    cp.exec(cmd); // BAD
});

References

javascript/ql/src/Security/CWE-079/Xss.qhelp

Client-side cross-site scripting

Directly writing user input (for example, a URL query parameter) to a webpage without properly sanitizing the input first, allows for a cross-site scripting vulnerability.

This kind of vulnerability is also called DOM-based cross-site scripting, to distinguish it from other types of cross-site scripting.

Recommendation

To guard against cross-site scripting, consider using contextual output encoding/escaping before writing user input to the page, or one of the other solutions that are mentioned in the references.

Example

The following example shows part of the page URL being written directly to the document, leaving the website vulnerable to cross-site scripting.

function setLanguageOptions() {
    var href = document.location.href,
        deflt = href.substring(href.indexOf("default=")+8);
    document.write("<OPTION value=1>"+deflt+"</OPTION>");
    document.write("<OPTION value=2>English</OPTION>");
}

References

javascript/ql/src/Security/CWE-089/SqlInjection.qhelp

Database query built from user-controlled sources

If a database query (such as a SQL or NoSQL query) is built from user-provided data without sufficient sanitization, a malicious user may be able to run malicious database queries.

Recommendation

Most database connector libraries offer a way of safely embedding untrusted data into a query by means of query parameters or prepared statements.

For NoSQL queries, make use of an operator like MongoDB's $eq to ensure that untrusted data is interpreted as a literal value and not as a query object.

Example

In the following example, assume the function handler is an HTTP request handler in a web application, whose parameter req contains the request object.

The handler constructs two copies of the same SQL query involving user input taken from the request object, once unsafely using string concatenation, and once safely using query parameters.

In the first case, the query string query1 is built by directly concatenating a user-supplied request parameter with some string literals. The parameter may include quote characters, so this code is vulnerable to a SQL injection attack.

In the second case, the parameter is embedded into the query string query2 using query parameters. In this example, we use the API offered by the pg Postgres database connector library, but other libraries offer similar features. This version is immune to injection attacks.

const app = require("express")(),
      pg = require("pg"),
      pool = new pg.Pool(config);

app.get("search", function handler(req, res) {
  // BAD: the category might have SQL special characters in it
  var query1 =
    "SELECT ITEM,PRICE FROM PRODUCT WHERE ITEM_CATEGORY='" +
    req.params.category +
    "' ORDER BY PRICE";
  pool.query(query1, [], function(err, results) {
    // process results
  });

  // GOOD: use parameters
  var query2 =
    "SELECT ITEM,PRICE FROM PRODUCT WHERE ITEM_CATEGORY=$1" + " ORDER BY PRICE";
  pool.query(query2, [req.params.category], function(err, results) {
    // process results
  });
});

References

javascript/ql/src/Security/CWE-094/CodeInjection.qhelp

Code injection

Directly evaluating user input (for example, an HTTP request parameter) as code without properly sanitizing the input first allows an attacker arbitrary code execution. This can occur when user input is treated as JavaScript, or passed to a framework which interprets it as an expression to be evaluated. Examples include AngularJS expressions or JQuery selectors.

Recommendation

Avoid including user input in any expression which may be dynamically evaluated. If user input must be included, use context-specific escaping before including it. It is important that the correct escaping is used for the type of evaluation that will occur.

Example

The following example shows part of the page URL being evaluated as JavaScript code. This allows an attacker to provide JavaScript within the URL. If an attacker can persuade a user to click on a link to such a URL, the attacker can evaluate arbitrary JavaScript in the browser of the user to, for example, steal cookies containing session information.

eval(document.location.href.substring(document.location.href.indexOf("default=")+8))

The following example shows a Pug template being constructed from user input, allowing attackers to run arbitrary code via a payload such as #{global.process.exit(1)}.

const express = require('express')
var pug = require('pug');
const app = express()

app.post('/', (req, res) => {
    var input = req.query.username;
    var template = `
doctype
html
head
    title= 'Hello world'
body
    form(action='/' method='post')
        input#name.form-control(type='text)
        button.btn.btn-primary(type='submit') Submit
    p Hello `+ input
    var fn = pug.compile(template);
    var html = fn();
    res.send(html);
})

Below is an example of how to use a template engine without any risk of template injection. The user input is included via an interpolation expression #{username} whose value is provided as an option to the template, instead of being part of the template string itself:

const express = require('express')
var pug = require('pug');
const app = express()

app.post('/', (req, res) => {
    var input = req.query.username;
    var template = `
doctype
html
head
    title= 'Hello world'
body
    form(action='/' method='post')
        input#name.form-control(type='text)
        button.btn.btn-primary(type='submit') Submit
    p Hello #{username}`
    var fn = pug.compile(template);
    var html = fn({username: input});
    res.send(html);
})

References

javascript/ql/src/Security/CWE-117/LogInjection.qhelp

Log injection

If unsanitized user input is written to a log entry, a malicious user may be able to forge new log entries.

Forgery can occur if a user provides some input with characters that are interpreted when the log output is displayed. If the log is displayed as a plain text file, then new line characters can be used by a malicious user. If the log is displayed as HTML, then arbitrary HTML may be included to spoof log entries.

Recommendation

User input should be suitably sanitized before it is logged.

If the log entries are in plain text then line breaks should be removed from user input, using String.prototype.replace or similar. Care should also be taken that user input is clearly marked in log entries.

For log entries that will be displayed in HTML, user input should be HTML-encoded before being logged, to prevent forgery and other forms of HTML injection.

Example

In the first example, a username, provided by the user, is logged using `console.info`. In the first case, it is logged without any sanitization. In the second case, the username is used to build an error that is logged using `console.error`. If a malicious user provides `username=Guest%0a[INFO]+User:+Admin%0a` as a username parameter, the log entry will be splitted in two different lines, where the second line will be `[INFO]+User:+Admin`.

const http = require('http');
const url = require('url');

const server = http.createServer((req, res) => {
    let q = url.parse(req.url, true);

    console.info(`[INFO] User: ${q.query.username}`); // BAD: User input logged as-is
})

server.listen(3000, '127.0.0.1', () => {});

In the second example, String.prototype.replace is used to ensure no line endings are present in the user input.

const http = require('http');
const url = require('url');

const server = http.createServer((req, res) => {
    let q = url.parse(req.url, true);

    // GOOD: remove newlines from user controlled input before logging
    let username = q.query.username.replace(/\n|\r/g, "");

    console.info(`[INFO] User: ${username}`);
});

server.listen(3000, '127.0.0.1', () => {});

References

javascript/ql/src/Security/CWE-134/TaintedFormatString.qhelp

Use of externally-controlled format string

Functions like the Node.js standard library function util.format accept a format string that is used to format the remaining arguments by providing inline format specifiers. If the format string contains unsanitized input from an untrusted source, then that string may contain unexpected format specifiers that cause garbled output.

Recommendation

Either sanitize the input before including it in the format string, or use a %s specifier in the format string, and pass the untrusted data as corresponding argument.

Example

The following program snippet logs information about an unauthorized access attempt. The log message includes the user name, and the user's IP address is passed as an additional argument to console.log to be appended to the message:

const app = require("express")();

app.get("unauthorized", function handler(req, res) {
  let user = req.query.user;
  let ip = req.connection.remoteAddress;
  console.log("Unauthorized access attempt by " + user, ip);
});

However, if a malicious user provides %d as their user name, console.log will instead attempt to format the ip argument as a number. Since IP addresses are not valid numbers, the result of this conversion is NaN. The resulting log message will read "Unauthorized access attempt by NaN", missing all the information that it was trying to log in the first place.

Instead, the user name should be included using the %s specifier:

const app = require("express")();

app.get("unauthorized", function handler(req, res) {
  let user = req.query.user;
  let ip = req.connection.remoteAddress;
  console.log("Unauthorized access attempt by %s", user, ip);
});

References

javascript/ql/src/Security/CWE-312/CleartextLogging.qhelp

Clear-text logging of sensitive information

If sensitive data is written to a log entry it could be exposed to an attacker who gains access to the logs.

Potential attackers can obtain sensitive user data when the log output is displayed. Additionally that data may expose system information such as full path names, system information, and sometimes usernames and passwords.

Recommendation

Sensitive data should not be logged.

Example

In the example the entire process environment is logged using `console.info`. Regular users of the production deployed application should not have access to this much information about the environment configuration.

// BAD: Logging cleartext sensitive data
console.info(`[INFO] Environment: ${process.env}`);

In the second example the data that is logged is not sensitive.

let not_sensitive_data = { a: 1, b : 2} 
// GOOD: it is fine to log data that is not sensitive
console.info(`[INFO] Some object contains: ${not_sensitive_data}`);

References

javascript/ql/src/Security/CWE-346/CorsMisconfigurationForCredentials.qhelp

CORS misconfiguration for credentials transfer

A server can send the "Access-Control-Allow-Credentials" CORS header to control when a browser may send user credentials in Cross-Origin HTTP requests.

When the Access-Control-Allow-Credentials header is "true", the Access-Control-Allow-Origin header must have a value different from "*" in order to make browsers accept the header. Therefore, to allow multiple origins for Cross-Origin requests with credentials, the server must dynamically compute the value of the "Access-Control-Allow-Origin" header. Computing this header value from information in the request to the server can therefore potentially allow an attacker to control the origins that the browser sends credentials to.

Recommendation

When the Access-Control-Allow-Credentials header value is "true", a dynamic computation of the Access-Control-Allow-Origin header must involve sanitization if it relies on user-controlled input.

Since the "null" origin is easy to obtain for an attacker, it is never safe to use "null" as the value of the Access-Control-Allow-Origin header when the Access-Control-Allow-Credentials header value is "true".

Example

In the example below, the server allows the browser to send user credentials in a Cross-Origin request. The request header origins controls the allowed origins for such a Cross-Origin request.

var https = require('https'),
    url = require('url');

var server = https.createServer(function(){});

server.on('request', function(req, res) {
    let origin = url.parse(req.url, true).query.origin;
     // BAD: attacker can choose the value of origin
    res.setHeader("Access-Control-Allow-Origin", origin);
    res.setHeader("Access-Control-Allow-Credentials", true);

    // ...
});

This is not secure, since an attacker can choose the value of the origin request header to make the browser send credentials to their own server. The use of a whitelist containing allowed origins for the Cross-Origin request fixes the issue:

var https = require('https'),
    url = require('url');

var server = https.createServer(function(){});

server.on('request', function(req, res) {
    let origin = url.parse(req.url, true).query.origin,
        whitelist = {
            "https://example.com": true,
            "https://subdomain.example.com": true,
            "https://example.com:1337": true
        };

    if (origin in whitelist) {
        // GOOD: the origin is in the whitelist
        res.setHeader("Access-Control-Allow-Origin", origin);
        res.setHeader("Access-Control-Allow-Credentials", true);
    }

    // ...
});

References

javascript/ql/src/Security/CWE-352/MissingCsrfMiddleware.qhelp

Missing CSRF middleware

Websites that rely on cookie-based authentication may be vulnerable to cross-site request forgery (CSRF). Specifically, a state-changing request should include a secret token so the request can't be forged by an attacker. Otherwise, unwanted requests can be submitted on behalf of a user who visits a malicious website.

This is typically mitigated by embedding a session-specific secret token in each request. This token is then checked as an additional authentication measure. A malicious website should have no way of guessing the correct token to embed in the request.

Recommendation

Use a middleware package such as lusca.csrf to protect against CSRF attacks.

Example

In the example below, the server authenticates users before performing the changeEmail POST action:

const app = require("express")(),
  cookieParser = require("cookie-parser"),
  bodyParser = require("body-parser"),
  session = require("express-session");

app.use(cookieParser());
app.use(bodyParser.urlencoded({ extended: false }));
app.use(session({ secret: process.env['SECRET'], cookie: { maxAge: 60000 } }));

// ...

app.post("/changeEmail", function(req, res) {
  const userId = req.session.id;
  const email = req.body["email"];
  // ... update email associated with userId
});

This is not secure. An attacker can submit a POST changeEmail request on behalf of a user who visited a malicious website. Since authentication happens without any action from the user, the changeEmail action would be executed, despite not being initiated by the user.

This vulnerability can be mitigated by installing a CSRF protecting middleware handler:

const app = require("express")(),
  cookieParser = require("cookie-parser"),
  bodyParser = require("body-parser"),
  session = require("express-session"),
  csrf = require('lusca').csrf;

app.use(cookieParser());
app.use(bodyParser.urlencoded({ extended: false }));
app.use(session({ secret: process.env['SECRET'], cookie: { maxAge: 60000 } }));
app.use(csrf());

// ...

app.post("/changeEmail", function(req, res) {
  const userId = req.session.id;
  const email = req.body["email"];
  // ... update email associated with userId
});

References

javascript/ql/src/Security/CWE-400/RemotePropertyInjection.qhelp

Remote property injection

Dynamically computing object property names from untrusted input may have multiple undesired consequences. For example, if the property access is used as part of a write, an attacker may overwrite vital properties of objects, such as __proto__. This attack is known as prototype pollution attack and may serve as a vehicle for denial-of-service attacks. A similar attack vector, is to replace the toString property of an object with a primitive. Whenever toString is then called on that object, either explicitly or implicitly as part of a type coercion, an exception will be raised.

Moreover, if the name of an HTTP header is user-controlled, an attacker may exploit this to overwrite security-critical headers such as Access-Control-Allow-Origin or Content-Security-Policy.

Recommendation

The most common case in which prototype pollution vulnerabilities arise is when JavaScript objects are used for implementing map data structures. This case should be avoided whenever possible by using the ECMAScript 2015 Map instead. When this is not possible, an alternative fix is to prepend untrusted input with a marker character such as $, before using it in properties accesses. In this way, the attacker does not have access to built-in properties which do not start with the chosen character.

When using user input as part of a header name, a sanitization step should be performed on the input to ensure that the name does not clash with existing header names such as Content-Security-Policy.

Example

In the example below, the dynamically computed property prop is accessed on myObj using a user-controlled value.

var express = require('express');

var app = express();
var myObj = {}

app.get('/user/:id', function(req, res) {
	var prop = req.query.userControlled; // BAD
	myObj[prop] = function() {};
	console.log("Request object " + myObj);
});

This is not secure since an attacker may exploit this code to overwrite the property __proto__ with an empty function. If this happens, the concatenation in the console.log argument will fail with a confusing message such as "Function.prototype.toString is not generic". If the application does not properly handle this error, this scenario may result in a serious denial-of-service attack. The fix is to prepend the user-controlled string with a marker character such as $ which will prevent arbitrary property names from being overwritten.

var express = require('express');

var app = express();
var myObj = {}

app.get('/user/:id', function(req, res) {
	var prop = "$" + req.query.userControlled; // GOOD
	myObj[prop] = function() {};
	console.log("Request object " + myObj);
});

References

javascript/ql/src/Security/CWE-502/UnsafeDeserialization.qhelp

Deserialization of user-controlled data

Deserializing untrusted data using any deserialization framework that allows the construction of arbitrary functions is easily exploitable and, in many cases, allows an attacker to execute arbitrary code.

Recommendation

Avoid deserialization of untrusted data if at all possible. If the architecture permits it, then use formats like JSON or XML that cannot represent functions. When using YAML or other formats that support the serialization and deserialization of functions, ensure that the parser is configured to disable deserialization of arbitrary functions.

Example

The following example calls the load function of the popular js-yaml package on data that comes from an HTTP request and hence is inherently unsafe.

const app = require("express")(),
  jsyaml = require("js-yaml");

app.get("load", function(req, res) {
  let data = jsyaml.load(req.params.data);
  // ...
});

Using the safeLoad function instead (which does not deserialize YAML-encoded functions) removes the vulnerability.

const app = require("express")(),
  jsyaml = require("js-yaml");

app.get("load", function(req, res) {
  let data = jsyaml.safeLoad(req.params.data);
  // ...
});

References

javascript/ql/src/Security/CWE-611/Xxe.qhelp

XML external entity expansion

Parsing untrusted XML files with a weakly configured XML parser may lead to an XML External Entity (XXE) attack. This type of attack uses external entity references to access arbitrary files on a system, carry out denial-of-service (DoS) attacks, or server-side request forgery. Even when the result of parsing is not returned to the user, DoS attacks are still possible and out-of-band data retrieval techniques may allow attackers to steal sensitive data.

Recommendation

The easiest way to prevent XXE attacks is to disable external entity handling when parsing untrusted data. How this is done depends on the library being used. Note that some libraries, such as recent versions of libxml, disable entity expansion by default, so unless you have explicitly enabled entity expansion, no further action needs to be taken.

Example

The following example uses the libxml XML parser to parse a string xmlSrc. If that string is from an untrusted source, this code may be vulnerable to an XXE attack, since the parser is invoked with the noent option set to true:

const app = require("express")(),
  libxml = require("libxmljs");

app.post("upload", (req, res) => {
  let xmlSrc = req.body,
    doc = libxml.parseXml(xmlSrc, { noent: true });
});

To guard against XXE attacks, the noent option should be omitted or set to false. This means that no entity expansion is undertaken at all, not even for standard internal entities such as &amp; or &gt;. If desired, these entities can be expanded in a separate step using utility functions provided by libraries such as underscore, lodash or he.

const app = require("express")(),
  libxml = require("libxmljs");

app.post("upload", (req, res) => {
  let xmlSrc = req.body,
    doc = libxml.parseXml(xmlSrc);
});

References

javascript/ql/src/Security/CWE-643/XpathInjection.qhelp

XPath injection

If an XPath expression is built using string concatenation, and the components of the concatenation include user input, it makes it very easy for a user to create a malicious XPath expression.

Recommendation

If user input must be included in an XPath expression, either sanitize the data or use variable references to safely embed it without altering the structure of the expression.

Example

In this example, the code accepts a user name specified by the user, and uses this unvalidated and unsanitized value in an XPath expression constructed using the xpath package. This is vulnerable to the user providing special characters or string sequences that change the meaning of the XPath expression to search for different values.

const express = require('express');
const xpath = require('xpath');
const app = express();

app.get('/some/route', function(req, res) {
  let userName = req.param("userName");

  // BAD: Use user-provided data directly in an XPath expression
  let badXPathExpr = xpath.parse("//users/user[login/text()='" + userName + "']/home_dir/text()");
  badXPathExpr.select({
    node: root
  });
});

Instead, embed the user input using the variable replacement mechanism offered by xpath:

const express = require('express');
const xpath = require('xpath');
const app = express();

app.get('/some/route', function(req, res) {
  let userName = req.param("userName");

  // GOOD: Embed user-provided data using variables
  let goodXPathExpr = xpath.parse("//users/user[login/text()=$userName]/home_dir/text()");
  goodXPathExpr.select({
    node: root,
    variables: { userName: userName }
  });
});

References

javascript/ql/src/Security/CWE-730/RegExpInjection.qhelp

Regular expression injection

Constructing a regular expression with unsanitized user input is dangerous as a malicious user may be able to modify the meaning of the expression. In particular, such a user may be able to provide a regular expression fragment that takes exponential time in the worst case, and use that to perform a Denial of Service attack.

Recommendation

Before embedding user input into a regular expression, use a sanitization function such as lodash's _.escapeRegExp to escape meta-characters that have special meaning.

Example

The following example shows a HTTP request parameter that is used to construct a regular expression without sanitizing it first:

var express = require('express');
var app = express();

app.get('/findKey', function(req, res) {
  var key = req.param("key"), input = req.param("input");

  // BAD: Unsanitized user input is used to construct a regular expression
  var re = new RegExp("\\b" + key + "=(.*)\n");
});

Instead, the request parameter should be sanitized first, for example using the function _.escapeRegExp from the lodash package. This ensures that the user cannot insert characters which have a special meaning in regular expressions.

var express = require('express');
var _ = require('lodash');
var app = express();

app.get('/findKey', function(req, res) {
  var key = req.param("key"), input = req.param("input");

  // GOOD: User input is sanitized before constructing the regex
  var safeKey = _.escapeRegExp(key);
  var re = new RegExp("\\b" + safeKey + "=(.*)\n");
});

References

javascript/ql/src/Security/CWE-770/ResourceExhaustion.qhelp

Resource exhaustion

Applications are constrained by how many resources they can make use of. Failing to respect these constraints may cause the application to be unresponsive or crash. It is therefore problematic if attackers can control the sizes or lifetimes of allocated objects.

Recommendation

Ensure that attackers can not control object sizes and their lifetimes. If object sizes and lifetimes must be controlled by external parties, ensure you restrict the object sizes and lifetimes so that they are within acceptable ranges.

Example

The following example allocates a buffer with a user-controlled size.

var http = require("http"),
    url = require("url");

var server = http.createServer(function(req, res) {
	var size = parseInt(url.parse(req.url, true).query.size);

	let buffer = Buffer.alloc(size); // BAD

	// ... use the buffer
});

This is problematic since an attacker can choose a size that makes the application run out of memory. Even worse, in older versions of Node.js, this could leak confidential memory. To prevent such attacks, limit the buffer size:

var http = require("http"),
    url = require("url");

var server = http.createServer(function(req, res) {
	var size = parseInt(url.parse(req.url, true).query.size);

	if (size > 1024) {
		res.statusCode = 400;
		res.end("Bad request.");
		return;
	}

	let buffer = Buffer.alloc(size); // GOOD

	// ... use the buffer
});

Example

As another example, consider an application that allocates an array with a user-controlled size, and then fills it with values:

var http = require("http"),
    url = require("url");

var server = http.createServer(function(req, res) {
	var size = parseInt(url.parse(req.url, true).query.size);

	let dogs = new Array(size).fill("dog"); // BAD

	// ... use the dog
});

The allocation of the array itself is not problematic since arrays are allocated sparsely, but the subsequent filling of the array will take a long time, causing the application to be unresponsive, or even run out of memory. Again, a limit on the size will prevent the attack:

var http = require("http"),
    url = require("url");

var server = http.createServer(function(req, res) {
	var size = parseInt(url.parse(req.url, true).query.size);

	if (size > 1024) {
		res.statusCode = 400;
		res.end("Bad request.");
		return;
	}

	let dogs = new Array(size).fill("dog"); // GOOD

	// ... use the dogs
});

Example

Finally, the following example lets a user choose a delay after which a function is executed:

var http = require("http"),
    url = require("url");

var server = http.createServer(function(req, res) {
	var delay = parseInt(url.parse(req.url, true).query.delay);

	setTimeout(f, delay); // BAD

});

This is problematic because a large delay essentially makes the application wait indefinitely before executing the function. Repeated registrations of such delays will therefore use up all of the memory in the application. A limit on the delay will prevent the attack:

var http = require("http"),
    url = require("url");

var server = http.createServer(function(req, res) {
	var delay = parseInt(url.parse(req.url, true).query.delay);

	if (delay > 1000) {
		res.statusCode = 400;
		res.end("Bad request.");
		return;
	}

	setTimeout(f, delay); // GOOD

});

References

javascript/ql/src/Security/CWE-776/XmlBomb.qhelp

XML internal entity expansion

Parsing untrusted XML files with a weakly configured XML parser may be vulnerable to denial-of-service (DoS) attacks exploiting uncontrolled internal entity expansion.

In XML, so-called internal entities are a mechanism for introducing an abbreviation for a piece of text or part of a document. When a parser that has been configured to expand entities encounters a reference to an internal entity, it replaces the entity by the data it represents. The replacement text may itself contain other entity references, which are expanded recursively. This means that entity expansion can increase document size dramatically.

If untrusted XML is parsed with entity expansion enabled, a malicious attacker could submit a document that contains very deeply nested entity definitions, causing the parser to take a very long time or use large amounts of memory. This is sometimes called an XML bomb attack.

Recommendation

The safest way to prevent XML bomb attacks is to disable entity expansion when parsing untrusted data. How this is done depends on the library being used. Note that some libraries, such as recent versions of libxmljs (though not its SAX parser API), disable entity expansion by default, so unless you have explicitly enabled entity expansion, no further action is needed.

Example

The following example uses the XML parser provided by the node-expat package to parse a string xmlSrc. If that string is from an untrusted source, this code may be vulnerable to a DoS attack, since node-expat expands internal entities by default:

const app = require("express")(),
  expat = require("node-expat");

app.post("upload", (req, res) => {
  let xmlSrc = req.body,
    parser = new expat.Parser();
  parser.on("startElement", handleStart);
  parser.on("text", handleText);
  parser.write(xmlSrc);
});

At the time of writing, node-expat does not provide a way of controlling entity expansion, but the example could be rewritten to use the sax package instead, which only expands standard entities such as &amp;:

const app = require("express")(),
  sax = require("sax");

app.post("upload", (req, res) => {
  let xmlSrc = req.body,
    parser = sax.parser(true);
  parser.onopentag = handleStart;
  parser.ontext = handleText;
  parser.write(xmlSrc);
});

References

javascript/ql/src/Security/CWE-807/ConditionalBypass.qhelp

User-controlled bypass of security check

Using user-controlled data in a permissions check may allow a user to gain unauthorized access to protected functionality or data.

Recommendation

When checking whether a user is authorized for a particular activity, do not use data that is entirely controlled by that user in the permissions check. If necessary, always validate the input, ideally against a fixed list of expected values.

Similarly, do not decide which permission to check for, based on user data. In particular, avoid using computation to decide which permissions to check for. Use fixed permissions for particular actions, rather than generating the permission to check for.

Example

In this example, we have a server that shows private information for a user, based on the request parameter userId. For privacy reasons, users may only view their own private information, so the server checks that the request parameter userId matches a cookie value for the user who is logged in.

var express = require('express');
var app = express();
// ...
app.get('/full-profile/:userId', function(req, res) {

    if (req.cookies.loggedInUserId !== req.params.userId) {
        // BAD: login decision made based on user controlled data
        requireLogin();
    } else {
        // ... show private information
    }

});

This security check is, however, insufficient since an attacker can craft his cookie values to match those of any user. To prevent this, the server can cryptographically sign the security critical cookie values:

var express = require('express');
var app = express();
// ...
app.get('/full-profile/:userId', function(req, res) {

    if (req.signedCookies.loggedInUserId !== req.params.userId) {
        // GOOD: login decision made based on server controlled data
        requireLogin();
    } else {
        // ... show private information
    }

});

References

  • Common Weakness Enumeration: CWE-807.
  • Common Weakness Enumeration: CWE-290.
javascript/ql/src/Security/CWE-915/PrototypePollutingAssignment.qhelp

Prototype-polluting assignment

Most JavaScript objects inherit the properties of the built-in Object.prototype object. Prototype pollution is a type of vulnerability in which an attacker is able to modify Object.prototype. Since most objects inherit from the compromised Object.prototype object, the attacker can use this to tamper with the application logic, and often escalate to remote code execution or cross-site scripting.

One way to cause prototype pollution is by modifying an object obtained via a user-controlled property name. Most objects have a special __proto__ property that refers to Object.prototype. An attacker can abuse this special property to trick the application into performing unintended modifications of Object.prototype.

Recommendation

Use an associative data structure that is resilient to untrusted key values, such as a Map. In some cases, a prototype-less object created with Object.create(null) may be preferable.

Alternatively, restrict the computed property name so it can't clash with a built-in property, either by prefixing it with a constant string, or by rejecting inputs that don't conform to the expected format.

Example

In the example below, the untrusted value req.params.id is used as the property name req.session.todos[id]. If a malicious user passes in the ID value __proto__, the variable todo will then refer to Object.prototype. Finally, the modification of todo then allows the attacker to inject arbitrary properties onto Object.prototype.

let express = require('express');
let app = express()

app.put('/todos/:id', (req, res) => {
    let id = req.params.id;
    let items = req.session.todos[id];
    if (!items) {
        items = req.session.todos[id] = {};
    }
    items[req.query.name] = req.query.text;
    res.end(200);
});

One way to fix this is to use Map objects to associate key/value pairs instead of regular objects, as shown below:

let express = require('express');
let app = express()

app.put('/todos/:id', (req, res) => {
    let id = req.params.id;
    let items = req.session.todos.get(id);
    if (!items) {
        items = new Map();
        req.sessions.todos.set(id, items);
    }
    items.set(req.query.name, req.query.text);
    res.end(200);
});

References

javascript/ql/src/experimental/heuristics/ql/src/Security/CWE-020/UntrustedDataToExternalAPI.qhelp

Untrusted data passed to external API with additional heuristic sources

Using unsanitized untrusted data in an external API can cause a variety of security issues. This query reports external APIs that use untrusted data. The results are not filtered so that you can audit all examples. The query provides data for security reviews of the application and you can also use it to identify external APIs that should be modeled as either taint steps, or sinks for specific problems.

An external API is defined as a method call to a method that is not defined in the source code, not overridden in the source code, and is not modeled as a taint step in the default taint library. External APIs may be from the third-party dependencies or from internal dependencies. The query reports uses of untrusted data one of the arguments of external API call or in the return value from a callback passed to an external API.

Recommendation

For each result:

  • If the result highlights a known sink, confirm that the result is reported by the relevant query, or that the result is a false positive because this data is sanitized.
  • If the result highlights an unknown sink for a problem, then add modeling for the sink to the relevant query, and confirm that the result is either found, or is safe due to appropriate sanitization.
  • If the result represents a call to an external API that transfers taint, add the appropriate modeling, and re-run the query to determine what new results have appeared due to this additional modeling.
    Otherwise, the result is likely uninteresting. Custom versions of this query can extend the SafeExternalAPIMethod class to exclude known safe external APIs from future analysis.

Example

In this first example, a query parameter is read from the req parameter and then ultimately used in a call to the res.send external API:

express().get('/news', (req, res) => {
    let topic = req.query.topic;
    res.send(`<h1>${topic}</h1>`);
});

This is a reflected XSS sink. The XSS query should therefore be reviewed to confirm that this sink is appropriately modeled, and if it is, to confirm that the query reports this particular result, or that the result is a false positive due to some existing sanitization.

In this second example, again a query parameter is read from req.

let path = require('path');

express().get('/data', (req, res) => {
    let file = path.join(HOME_DIR, 'public', req.query.file);
    res.sendFile(file);
});

If the query reported the call to path.join on line 4, this would suggest that this external API is not currently modeled as a taint step in the taint tracking library. The next step would be to model this as a taint step, then re-run the query to determine what additional results might be found. In this example, it seems the result of the path.join will be used as a file path, leading to a path traversal vulnerability.

Note that both examples are correctly handled by the standard taint tracking library and security queries.

References

  • Common Weakness Enumeration: CWE-20.
javascript/ql/src/experimental/heuristics/ql/src/Security/CWE-078/CommandInjection.qhelp

Uncontrolled command line with additional heuristic sources

Code that passes user input directly to require('child_process').exec, or some other library routine that executes a command, allows the user to execute malicious code.

Recommendation

If possible, use hard-coded string literals to specify the command to run or library to load. Instead of passing the user input directly to the process or library function, examine the user input and then choose among hard-coded string literals.

If the applicable libraries or commands cannot be determined at compile time, then add code to verify that the user input string is safe before using it.

Example

The following example shows code that takes a shell script that can be changed maliciously by a user, and passes it straight to child_process.exec without examining it first.

var cp = require("child_process"),
    http = require('http'),
    url = require('url');

var server = http.createServer(function(req, res) {
    let cmd = url.parse(req.url, true).query.path;

    cp.exec(cmd); // BAD
});

References

javascript/ql/src/experimental/heuristics/ql/src/Security/CWE-079/Xss.qhelp

Client-side cross-site scripting with additional heuristic sources

Directly writing user input (for example, a URL query parameter) to a webpage without properly sanitizing the input first, allows for a cross-site scripting vulnerability.

This kind of vulnerability is also called DOM-based cross-site scripting, to distinguish it from other types of cross-site scripting.

Recommendation

To guard against cross-site scripting, consider using contextual output encoding/escaping before writing user input to the page, or one of the other solutions that are mentioned in the references.

Example

The following example shows part of the page URL being written directly to the document, leaving the website vulnerable to cross-site scripting.

function setLanguageOptions() {
    var href = document.location.href,
        deflt = href.substring(href.indexOf("default=")+8);
    document.write("<OPTION value=1>"+deflt+"</OPTION>");
    document.write("<OPTION value=2>English</OPTION>");
}

References

javascript/ql/src/experimental/heuristics/ql/src/Security/CWE-089/SqlInjection.qhelp

Database query built from user-controlled sources with additional heuristic sources

If a database query (such as a SQL or NoSQL query) is built from user-provided data without sufficient sanitization, a malicious user may be able to run malicious database queries.

Recommendation

Most database connector libraries offer a way of safely embedding untrusted data into a query by means of query parameters or prepared statements.

For NoSQL queries, make use of an operator like MongoDB's $eq to ensure that untrusted data is interpreted as a literal value and not as a query object.

Example

In the following example, assume the function handler is an HTTP request handler in a web application, whose parameter req contains the request object.

The handler constructs two copies of the same SQL query involving user input taken from the request object, once unsafely using string concatenation, and once safely using query parameters.

In the first case, the query string query1 is built by directly concatenating a user-supplied request parameter with some string literals. The parameter may include quote characters, so this code is vulnerable to a SQL injection attack.

In the second case, the parameter is embedded into the query string query2 using query parameters. In this example, we use the API offered by the pg Postgres database connector library, but other libraries offer similar features. This version is immune to injection attacks.

const app = require("express")(),
      pg = require("pg"),
      pool = new pg.Pool(config);

app.get("search", function handler(req, res) {
  // BAD: the category might have SQL special characters in it
  var query1 =
    "SELECT ITEM,PRICE FROM PRODUCT WHERE ITEM_CATEGORY='" +
    req.params.category +
    "' ORDER BY PRICE";
  pool.query(query1, [], function(err, results) {
    // process results
  });

  // GOOD: use parameters
  var query2 =
    "SELECT ITEM,PRICE FROM PRODUCT WHERE ITEM_CATEGORY=$1" + " ORDER BY PRICE";
  pool.query(query2, [req.params.category], function(err, results) {
    // process results
  });
});

References

javascript/ql/src/experimental/heuristics/ql/src/Security/CWE-094/CodeInjection.qhelp

Code injection with additional heuristic sources

Directly evaluating user input (for example, an HTTP request parameter) as code without properly sanitizing the input first allows an attacker arbitrary code execution. This can occur when user input is treated as JavaScript, or passed to a framework which interprets it as an expression to be evaluated. Examples include AngularJS expressions or JQuery selectors.

Recommendation

Avoid including user input in any expression which may be dynamically evaluated. If user input must be included, use context-specific escaping before including it. It is important that the correct escaping is used for the type of evaluation that will occur.

Example

The following example shows part of the page URL being evaluated as JavaScript code. This allows an attacker to provide JavaScript within the URL. If an attacker can persuade a user to click on a link to such a URL, the attacker can evaluate arbitrary JavaScript in the browser of the user to, for example, steal cookies containing session information.

eval(document.location.href.substring(document.location.href.indexOf("default=")+8))

The following example shows a Pug template being constructed from user input, allowing attackers to run arbitrary code via a payload such as #{global.process.exit(1)}.

const express = require('express')
var pug = require('pug');
const app = express()

app.post('/', (req, res) => {
    var input = req.query.username;
    var template = `
doctype
html
head
    title= 'Hello world'
body
    form(action='/' method='post')
        input#name.form-control(type='text)
        button.btn.btn-primary(type='submit') Submit
    p Hello `+ input
    var fn = pug.compile(template);
    var html = fn();
    res.send(html);
})

Below is an example of how to use a template engine without any risk of template injection. The user input is included via an interpolation expression #{username} whose value is provided as an option to the template, instead of being part of the template string itself:

const express = require('express')
var pug = require('pug');
const app = express()

app.post('/', (req, res) => {
    var input = req.query.username;
    var template = `
doctype
html
head
    title= 'Hello world'
body
    form(action='/' method='post')
        input#name.form-control(type='text)
        button.btn.btn-primary(type='submit') Submit
    p Hello #{username}`
    var fn = pug.compile(template);
    var html = fn({username: input});
    res.send(html);
})

References

javascript/ql/src/experimental/heuristics/ql/src/Security/CWE-117/LogInjection.qhelp

Log injection with additional heuristic sources

If unsanitized user input is written to a log entry, a malicious user may be able to forge new log entries.

Forgery can occur if a user provides some input with characters that are interpreted when the log output is displayed. If the log is displayed as a plain text file, then new line characters can be used by a malicious user. If the log is displayed as HTML, then arbitrary HTML may be included to spoof log entries.

Recommendation

User input should be suitably sanitized before it is logged.

If the log entries are in plain text then line breaks should be removed from user input, using String.prototype.replace or similar. Care should also be taken that user input is clearly marked in log entries.

For log entries that will be displayed in HTML, user input should be HTML-encoded before being logged, to prevent forgery and other forms of HTML injection.

Example

In the first example, a username, provided by the user, is logged using `console.info`. In the first case, it is logged without any sanitization. In the second case, the username is used to build an error that is logged using `console.error`. If a malicious user provides `username=Guest%0a[INFO]+User:+Admin%0a` as a username parameter, the log entry will be splitted in two different lines, where the second line will be `[INFO]+User:+Admin`.

const http = require('http');
const url = require('url');

const server = http.createServer((req, res) => {
    let q = url.parse(req.url, true);

    console.info(`[INFO] User: ${q.query.username}`); // BAD: User input logged as-is
})

server.listen(3000, '127.0.0.1', () => {});

In the second example, String.prototype.replace is used to ensure no line endings are present in the user input.

const http = require('http');
const url = require('url');

const server = http.createServer((req, res) => {
    let q = url.parse(req.url, true);

    // GOOD: remove newlines from user controlled input before logging
    let username = q.query.username.replace(/\n|\r/g, "");

    console.info(`[INFO] User: ${username}`);
});

server.listen(3000, '127.0.0.1', () => {});

References

javascript/ql/src/experimental/heuristics/ql/src/Security/CWE-134/TaintedFormatString.qhelp

Use of externally-controlled format string with additional heuristic sources

Functions like the Node.js standard library function util.format accept a format string that is used to format the remaining arguments by providing inline format specifiers. If the format string contains unsanitized input from an untrusted source, then that string may contain unexpected format specifiers that cause garbled output.

Recommendation

Either sanitize the input before including it in the format string, or use a %s specifier in the format string, and pass the untrusted data as corresponding argument.

Example

The following program snippet logs information about an unauthorized access attempt. The log message includes the user name, and the user's IP address is passed as an additional argument to console.log to be appended to the message:

const app = require("express")();

app.get("unauthorized", function handler(req, res) {
  let user = req.query.user;
  let ip = req.connection.remoteAddress;
  console.log("Unauthorized access attempt by " + user, ip);
});

However, if a malicious user provides %d as their user name, console.log will instead attempt to format the ip argument as a number. Since IP addresses are not valid numbers, the result of this conversion is NaN. The resulting log message will read "Unauthorized access attempt by NaN", missing all the information that it was trying to log in the first place.

Instead, the user name should be included using the %s specifier:

const app = require("express")();

app.get("unauthorized", function handler(req, res) {
  let user = req.query.user;
  let ip = req.connection.remoteAddress;
  console.log("Unauthorized access attempt by %s", user, ip);
});

References

javascript/ql/src/experimental/heuristics/ql/src/Security/CWE-346/CorsMisconfigurationForCredentials.qhelp

CORS misconfiguration for credentials transfer with additional heuristic sources

A server can send the "Access-Control-Allow-Credentials" CORS header to control when a browser may send user credentials in Cross-Origin HTTP requests.

When the Access-Control-Allow-Credentials header is "true", the Access-Control-Allow-Origin header must have a value different from "*" in order to make browsers accept the header. Therefore, to allow multiple origins for Cross-Origin requests with credentials, the server must dynamically compute the value of the "Access-Control-Allow-Origin" header. Computing this header value from information in the request to the server can therefore potentially allow an attacker to control the origins that the browser sends credentials to.

Recommendation

When the Access-Control-Allow-Credentials header value is "true", a dynamic computation of the Access-Control-Allow-Origin header must involve sanitization if it relies on user-controlled input.

Since the "null" origin is easy to obtain for an attacker, it is never safe to use "null" as the value of the Access-Control-Allow-Origin header when the Access-Control-Allow-Credentials header value is "true".

Example

In the example below, the server allows the browser to send user credentials in a Cross-Origin request. The request header origins controls the allowed origins for such a Cross-Origin request.

var https = require('https'),
    url = require('url');

var server = https.createServer(function(){});

server.on('request', function(req, res) {
    let origin = url.parse(req.url, true).query.origin;
     // BAD: attacker can choose the value of origin
    res.setHeader("Access-Control-Allow-Origin", origin);
    res.setHeader("Access-Control-Allow-Credentials", true);

    // ...
});

This is not secure, since an attacker can choose the value of the origin request header to make the browser send credentials to their own server. The use of a whitelist containing allowed origins for the Cross-Origin request fixes the issue:

var https = require('https'),
    url = require('url');

var server = https.createServer(function(){});

server.on('request', function(req, res) {
    let origin = url.parse(req.url, true).query.origin,
        whitelist = {
            "https://example.com": true,
            "https://subdomain.example.com": true,
            "https://example.com:1337": true
        };

    if (origin in whitelist) {
        // GOOD: the origin is in the whitelist
        res.setHeader("Access-Control-Allow-Origin", origin);
        res.setHeader("Access-Control-Allow-Credentials", true);
    }

    // ...
});

References

javascript/ql/src/experimental/heuristics/ql/src/Security/CWE-400/RemotePropertyInjection.qhelp

Remote property injection with additional heuristic sources

Dynamically computing object property names from untrusted input may have multiple undesired consequences. For example, if the property access is used as part of a write, an attacker may overwrite vital properties of objects, such as __proto__. This attack is known as prototype pollution attack and may serve as a vehicle for denial-of-service attacks. A similar attack vector, is to replace the toString property of an object with a primitive. Whenever toString is then called on that object, either explicitly or implicitly as part of a type coercion, an exception will be raised.

Moreover, if the name of an HTTP header is user-controlled, an attacker may exploit this to overwrite security-critical headers such as Access-Control-Allow-Origin or Content-Security-Policy.

Recommendation

The most common case in which prototype pollution vulnerabilities arise is when JavaScript objects are used for implementing map data structures. This case should be avoided whenever possible by using the ECMAScript 2015 Map instead. When this is not possible, an alternative fix is to prepend untrusted input with a marker character such as $, before using it in properties accesses. In this way, the attacker does not have access to built-in properties which do not start with the chosen character.

When using user input as part of a header name, a sanitization step should be performed on the input to ensure that the name does not clash with existing header names such as Content-Security-Policy.

Example

In the example below, the dynamically computed property prop is accessed on myObj using a user-controlled value.

var express = require('express');

var app = express();
var myObj = {}

app.get('/user/:id', function(req, res) {
	var prop = req.query.userControlled; // BAD
	myObj[prop] = function() {};
	console.log("Request object " + myObj);
});

This is not secure since an attacker may exploit this code to overwrite the property __proto__ with an empty function. If this happens, the concatenation in the console.log argument will fail with a confusing message such as "Function.prototype.toString is not generic". If the application does not properly handle this error, this scenario may result in a serious denial-of-service attack. The fix is to prepend the user-controlled string with a marker character such as $ which will prevent arbitrary property names from being overwritten.

var express = require('express');

var app = express();
var myObj = {}

app.get('/user/:id', function(req, res) {
	var prop = "$" + req.query.userControlled; // GOOD
	myObj[prop] = function() {};
	console.log("Request object " + myObj);
});

References

javascript/ql/src/experimental/heuristics/ql/src/Security/CWE-502/UnsafeDeserialization.qhelp

Deserialization of user-controlled data with additional heuristic sources

Deserializing untrusted data using any deserialization framework that allows the construction of arbitrary functions is easily exploitable and, in many cases, allows an attacker to execute arbitrary code.

Recommendation

Avoid deserialization of untrusted data if at all possible. If the architecture permits it, then use formats like JSON or XML that cannot represent functions. When using YAML or other formats that support the serialization and deserialization of functions, ensure that the parser is configured to disable deserialization of arbitrary functions.

Example

The following example calls the load function of the popular js-yaml package on data that comes from an HTTP request and hence is inherently unsafe.

const app = require("express")(),
  jsyaml = require("js-yaml");

app.get("load", function(req, res) {
  let data = jsyaml.load(req.params.data);
  // ...
});

Using the safeLoad function instead (which does not deserialize YAML-encoded functions) removes the vulnerability.

const app = require("express")(),
  jsyaml = require("js-yaml");

app.get("load", function(req, res) {
  let data = jsyaml.safeLoad(req.params.data);
  // ...
});

References

javascript/ql/src/experimental/heuristics/ql/src/Security/CWE-611/Xxe.qhelp

XML external entity expansion with additional heuristic sources

Parsing untrusted XML files with a weakly configured XML parser may lead to an XML External Entity (XXE) attack. This type of attack uses external entity references to access arbitrary files on a system, carry out denial-of-service (DoS) attacks, or server-side request forgery. Even when the result of parsing is not returned to the user, DoS attacks are still possible and out-of-band data retrieval techniques may allow attackers to steal sensitive data.

Recommendation

The easiest way to prevent XXE attacks is to disable external entity handling when parsing untrusted data. How this is done depends on the library being used. Note that some libraries, such as recent versions of libxml, disable entity expansion by default, so unless you have explicitly enabled entity expansion, no further action needs to be taken.

Example

The following example uses the libxml XML parser to parse a string xmlSrc. If that string is from an untrusted source, this code may be vulnerable to an XXE attack, since the parser is invoked with the noent option set to true:

const app = require("express")(),
  libxml = require("libxmljs");

app.post("upload", (req, res) => {
  let xmlSrc = req.body,
    doc = libxml.parseXml(xmlSrc, { noent: true });
});

To guard against XXE attacks, the noent option should be omitted or set to false. This means that no entity expansion is undertaken at all, not even for standard internal entities such as &amp; or &gt;. If desired, these entities can be expanded in a separate step using utility functions provided by libraries such as underscore, lodash or he.

const app = require("express")(),
  libxml = require("libxmljs");

app.post("upload", (req, res) => {
  let xmlSrc = req.body,
    doc = libxml.parseXml(xmlSrc);
});

References

javascript/ql/src/experimental/heuristics/ql/src/Security/CWE-643/XpathInjection.qhelp

XPath injection with additional heuristic sources

If an XPath expression is built using string concatenation, and the components of the concatenation include user input, it makes it very easy for a user to create a malicious XPath expression.

Recommendation

If user input must be included in an XPath expression, either sanitize the data or use variable references to safely embed it without altering the structure of the expression.

Example

In this example, the code accepts a user name specified by the user, and uses this unvalidated and unsanitized value in an XPath expression constructed using the xpath package. This is vulnerable to the user providing special characters or string sequences that change the meaning of the XPath expression to search for different values.

const express = require('express');
const xpath = require('xpath');
const app = express();

app.get('/some/route', function(req, res) {
  let userName = req.param("userName");

  // BAD: Use user-provided data directly in an XPath expression
  let badXPathExpr = xpath.parse("//users/user[login/text()='" + userName + "']/home_dir/text()");
  badXPathExpr.select({
    node: root
  });
});

Instead, embed the user input using the variable replacement mechanism offered by xpath:

const express = require('express');
const xpath = require('xpath');
const app = express();

app.get('/some/route', function(req, res) {
  let userName = req.param("userName");

  // GOOD: Embed user-provided data using variables
  let goodXPathExpr = xpath.parse("//users/user[login/text()=$userName]/home_dir/text()");
  goodXPathExpr.select({
    node: root,
    variables: { userName: userName }
  });
});

References

javascript/ql/src/experimental/heuristics/ql/src/Security/CWE-730/RegExpInjection.qhelp

Regular expression injection with additional heuristic sources

Constructing a regular expression with unsanitized user input is dangerous as a malicious user may be able to modify the meaning of the expression. In particular, such a user may be able to provide a regular expression fragment that takes exponential time in the worst case, and use that to perform a Denial of Service attack.

Recommendation

Before embedding user input into a regular expression, use a sanitization function such as lodash's _.escapeRegExp to escape meta-characters that have special meaning.

Example

The following example shows a HTTP request parameter that is used to construct a regular expression without sanitizing it first:

var express = require('express');
var app = express();

app.get('/findKey', function(req, res) {
  var key = req.param("key"), input = req.param("input");

  // BAD: Unsanitized user input is used to construct a regular expression
  var re = new RegExp("\\b" + key + "=(.*)\n");
});

Instead, the request parameter should be sanitized first, for example using the function _.escapeRegExp from the lodash package. This ensures that the user cannot insert characters which have a special meaning in regular expressions.

var express = require('express');
var _ = require('lodash');
var app = express();

app.get('/findKey', function(req, res) {
  var key = req.param("key"), input = req.param("input");

  // GOOD: User input is sanitized before constructing the regex
  var safeKey = _.escapeRegExp(key);
  var re = new RegExp("\\b" + safeKey + "=(.*)\n");
});

References

javascript/ql/src/experimental/heuristics/ql/src/Security/CWE-770/ResourceExhaustion.qhelp

Resource exhaustion with additional heuristic sources

Applications are constrained by how many resources they can make use of. Failing to respect these constraints may cause the application to be unresponsive or crash. It is therefore problematic if attackers can control the sizes or lifetimes of allocated objects.

Recommendation

Ensure that attackers can not control object sizes and their lifetimes. If object sizes and lifetimes must be controlled by external parties, ensure you restrict the object sizes and lifetimes so that they are within acceptable ranges.

Example

The following example allocates a buffer with a user-controlled size.

var http = require("http"),
    url = require("url");

var server = http.createServer(function(req, res) {
	var size = parseInt(url.parse(req.url, true).query.size);

	let buffer = Buffer.alloc(size); // BAD

	// ... use the buffer
});

This is problematic since an attacker can choose a size that makes the application run out of memory. Even worse, in older versions of Node.js, this could leak confidential memory. To prevent such attacks, limit the buffer size:

var http = require("http"),
    url = require("url");

var server = http.createServer(function(req, res) {
	var size = parseInt(url.parse(req.url, true).query.size);

	if (size > 1024) {
		res.statusCode = 400;
		res.end("Bad request.");
		return;
	}

	let buffer = Buffer.alloc(size); // GOOD

	// ... use the buffer
});

Example

As another example, consider an application that allocates an array with a user-controlled size, and then fills it with values:

var http = require("http"),
    url = require("url");

var server = http.createServer(function(req, res) {
	var size = parseInt(url.parse(req.url, true).query.size);

	let dogs = new Array(size).fill("dog"); // BAD

	// ... use the dog
});

The allocation of the array itself is not problematic since arrays are allocated sparsely, but the subsequent filling of the array will take a long time, causing the application to be unresponsive, or even run out of memory. Again, a limit on the size will prevent the attack:

var http = require("http"),
    url = require("url");

var server = http.createServer(function(req, res) {
	var size = parseInt(url.parse(req.url, true).query.size);

	if (size > 1024) {
		res.statusCode = 400;
		res.end("Bad request.");
		return;
	}

	let dogs = new Array(size).fill("dog"); // GOOD

	// ... use the dogs
});

Example

Finally, the following example lets a user choose a delay after which a function is executed:

var http = require("http"),
    url = require("url");

var server = http.createServer(function(req, res) {
	var delay = parseInt(url.parse(req.url, true).query.delay);

	setTimeout(f, delay); // BAD

});

This is problematic because a large delay essentially makes the application wait indefinitely before executing the function. Repeated registrations of such delays will therefore use up all of the memory in the application. A limit on the delay will prevent the attack:

var http = require("http"),
    url = require("url");

var server = http.createServer(function(req, res) {
	var delay = parseInt(url.parse(req.url, true).query.delay);

	if (delay > 1000) {
		res.statusCode = 400;
		res.end("Bad request.");
		return;
	}

	setTimeout(f, delay); // GOOD

});

References

javascript/ql/src/experimental/heuristics/ql/src/Security/CWE-776/XmlBomb.qhelp

XML internal entity expansion with additional heuristic sources

Parsing untrusted XML files with a weakly configured XML parser may be vulnerable to denial-of-service (DoS) attacks exploiting uncontrolled internal entity expansion.

In XML, so-called internal entities are a mechanism for introducing an abbreviation for a piece of text or part of a document. When a parser that has been configured to expand entities encounters a reference to an internal entity, it replaces the entity by the data it represents. The replacement text may itself contain other entity references, which are expanded recursively. This means that entity expansion can increase document size dramatically.

If untrusted XML is parsed with entity expansion enabled, a malicious attacker could submit a document that contains very deeply nested entity definitions, causing the parser to take a very long time or use large amounts of memory. This is sometimes called an XML bomb attack.

Recommendation

The safest way to prevent XML bomb attacks is to disable entity expansion when parsing untrusted data. How this is done depends on the library being used. Note that some libraries, such as recent versions of libxmljs (though not its SAX parser API), disable entity expansion by default, so unless you have explicitly enabled entity expansion, no further action is needed.

Example

The following example uses the XML parser provided by the node-expat package to parse a string xmlSrc. If that string is from an untrusted source, this code may be vulnerable to a DoS attack, since node-expat expands internal entities by default:

const app = require("express")(),
  expat = require("node-expat");

app.post("upload", (req, res) => {
  let xmlSrc = req.body,
    parser = new expat.Parser();
  parser.on("startElement", handleStart);
  parser.on("text", handleText);
  parser.write(xmlSrc);
});

At the time of writing, node-expat does not provide a way of controlling entity expansion, but the example could be rewritten to use the sax package instead, which only expands standard entities such as &amp;:

const app = require("express")(),
  sax = require("sax");

app.post("upload", (req, res) => {
  let xmlSrc = req.body,
    parser = sax.parser(true);
  parser.onopentag = handleStart;
  parser.ontext = handleText;
  parser.write(xmlSrc);
});

References

javascript/ql/src/experimental/heuristics/ql/src/Security/CWE-807/ConditionalBypass.qhelp

User-controlled bypass of security check with additional heuristic sources

Using user-controlled data in a permissions check may allow a user to gain unauthorized access to protected functionality or data.

Recommendation

When checking whether a user is authorized for a particular activity, do not use data that is entirely controlled by that user in the permissions check. If necessary, always validate the input, ideally against a fixed list of expected values.

Similarly, do not decide which permission to check for, based on user data. In particular, avoid using computation to decide which permissions to check for. Use fixed permissions for particular actions, rather than generating the permission to check for.

Example

In this example, we have a server that shows private information for a user, based on the request parameter userId. For privacy reasons, users may only view their own private information, so the server checks that the request parameter userId matches a cookie value for the user who is logged in.

var express = require('express');
var app = express();
// ...
app.get('/full-profile/:userId', function(req, res) {

    if (req.cookies.loggedInUserId !== req.params.userId) {
        // BAD: login decision made based on user controlled data
        requireLogin();
    } else {
        // ... show private information
    }

});

This security check is, however, insufficient since an attacker can craft his cookie values to match those of any user. To prevent this, the server can cryptographically sign the security critical cookie values:

var express = require('express');
var app = express();
// ...
app.get('/full-profile/:userId', function(req, res) {

    if (req.signedCookies.loggedInUserId !== req.params.userId) {
        // GOOD: login decision made based on server controlled data
        requireLogin();
    } else {
        // ... show private information
    }

});

References

  • Common Weakness Enumeration: CWE-807.
  • Common Weakness Enumeration: CWE-290.
javascript/ql/src/experimental/heuristics/ql/src/Security/CWE-915/PrototypePollutingAssignment.qhelp

Prototype-polluting assignment with additional heuristic sources

Most JavaScript objects inherit the properties of the built-in Object.prototype object. Prototype pollution is a type of vulnerability in which an attacker is able to modify Object.prototype. Since most objects inherit from the compromised Object.prototype object, the attacker can use this to tamper with the application logic, and often escalate to remote code execution or cross-site scripting.

One way to cause prototype pollution is by modifying an object obtained via a user-controlled property name. Most objects have a special __proto__ property that refers to Object.prototype. An attacker can abuse this special property to trick the application into performing unintended modifications of Object.prototype.

Recommendation

Use an associative data structure that is resilient to untrusted key values, such as a Map. In some cases, a prototype-less object created with Object.create(null) may be preferable.

Alternatively, restrict the computed property name so it can't clash with a built-in property, either by prefixing it with a constant string, or by rejecting inputs that don't conform to the expected format.

Example

In the example below, the untrusted value req.params.id is used as the property name req.session.todos[id]. If a malicious user passes in the ID value __proto__, the variable todo will then refer to Object.prototype. Finally, the modification of todo then allows the attacker to inject arbitrary properties onto Object.prototype.

let express = require('express');
let app = express()

app.put('/todos/:id', (req, res) => {
    let id = req.params.id;
    let items = req.session.todos[id];
    if (!items) {
        items = req.session.todos[id] = {};
    }
    items[req.query.name] = req.query.text;
    res.end(200);
});

One way to fix this is to use Map objects to associate key/value pairs instead of regular objects, as shown below:

let express = require('express');
let app = express()

app.put('/todos/:id', (req, res) => {
    let id = req.params.id;
    let items = req.session.todos.get(id);
    if (!items) {
        items = new Map();
        req.sessions.todos.set(id, items);
    }
    items.set(req.query.name, req.query.text);
    res.end(200);
});

References

python/ql/src/Imports/SyntaxError.qhelp

Syntax error

Syntax errors prevent a module being evaluated and thus imported. An attempt to import a module with invalid syntax will fail; a SyntaxError will be raised.

A common cause of syntax errors is the difference in syntax between Python 2 and Python 3. In particular, a syntax error may be alerted if a Python 3 file is assumed to be compatible with Python 2 (or vice versa). Explicitly specifying the expected Python version can help prevent this.

The existence of a syntax error in a module may suggest other problems as well. Either the module is never imported in practice and could be deleted or a try statement around the import is mistakenly discarding the SyntaxError.

Recommendation

Fixing the syntax error is the obvious fix. However, it is worth investigating why a module containing a syntax error was able to persist and address that problem as well.

If you suspect that the syntax error is caused by the analysis using the wrong version of Python, consider specifying the version explicitly. When you run code scanning using the CodeQL action, you can configure the Python version to use. For more information, see Analyzing Python dependencies.

References

python/ql/src/Security/CWE-326/WeakCryptoKey.qhelp

Use of weak cryptographic key

Modern encryption relies on it being computationally infeasible to break the cipher and decode a message without the key. As computational power increases, the ability to break ciphers grows and keys need to become larger.

The three main asymmetric key algorithms currently in use are Rivest–Shamir–Adleman (RSA) cryptography, Digital Signature Algorithm (DSA), and Elliptic-curve cryptography (ECC). With current technology, key sizes of 2048 bits for RSA and DSA, or 256 bits for ECC, are regarded as unbreakable.

Recommendation

Increase the key size to the recommended amount or larger. For RSA or DSA this is at least 2048 bits, for ECC this is at least 256 bits.

References

python/ql/src/Security/CWE-730/PolynomialReDoS.qhelp

Polynomial regular expression used on uncontrolled data

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to nk or even 2n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engine provided by Python uses a backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter.

Example

Consider this use of a regular expression, which removes all leading and trailing whitespace in a string:

			re.sub(r"^\s+|\s+$", "", text) # BAD
		

The sub-expression "\s+$" will match the whitespace characters in text from left to right, but it can start matching anywhere within a whitespace sequence. This is problematic for strings that do not end with a whitespace character. Such a string will force the regular expression engine to process each whitespace sequence once per whitespace character in the sequence.

This ultimately means that the time cost of trimming a string is quadratic in the length of the string. So a string like "a b" will take milliseconds to process, but a similar string with a million spaces instead of just one will take several minutes.

Avoid this problem by rewriting the regular expression to not contain the ambiguity about when to start matching whitespace sequences. For instance, by using a negative look-behind (^\s+|(?<!\s)\s+$), or just by using the built-in strip method (text.strip()).

Note that the sub-expression "^\s+" is not problematic as the ^ anchor restricts when that sub-expression can start matching, and as the regular expression engine matches from left to right.

Example

As a similar, but slightly subtler problem, consider the regular expression that matches lines with numbers, possibly written using scientific notation:

			^0\.\d+E?\d+$ # BAD
		

The problem with this regular expression is in the sub-expression \d+E?\d+ because the second \d+ can start matching digits anywhere after the first match of the first \d+ if there is no E in the input string.

This is problematic for strings that do not end with a digit. Such a string will force the regular expression engine to process each digit sequence once per digit in the sequence, again leading to a quadratic time complexity.

To make the processing faster, the regular expression should be rewritten such that the two \d+ sub-expressions do not have overlapping matches: ^0\.\d+(E\d+)?$.

References

python/ql/src/Security/CWE-730/ReDoS.qhelp

Inefficient regular expression

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to nk or even 2n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engine provided by Python uses a backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter.

Example

Consider this regular expression:

			^_(__|.)+_$
		

Its sub-expression "(__|.)+?" can match the string "__" either by the first alternative "__" to the left of the "|" operator, or by two repetitions of the second alternative "." to the right. Thus, a string consisting of an odd number of underscores followed by some other character will cause the regular expression engine to run for an exponential amount of time before rejecting the input.

This problem can be avoided by rewriting the regular expression to remove the ambiguity between the two branches of the alternative inside the repetition:

			^_(__|[^_])+_$
		

References

python/ql/src/experimental/Security/CWE-022bis/UnsafeUnpack.qhelp

Arbitrary file write during a tarball extraction from a user controlled source

Extracting files from a malicious tarball without validating that the destination file path is within the destination directory using shutil.unpack_archive() can cause files outside the destination directory to be overwritten, due to the possible presence of directory traversal elements (..) in archive path names.

Tarball contain archive entries representing each file in the archive. These entries include a file path for the entry, but these file paths are not restricted and may contain unexpected special elements such as the directory traversal element (..). If these file paths are used to determine an output file to write the contents of the archive item to, then the file may be written to an unexpected location. This can result in sensitive information being revealed or deleted, or an attacker being able to influence behavior by modifying unexpected files.

For example, if a tarball contains a file entry ../sneaky-file.txt, and the tarball is extracted to the directory /tmp/tmp123, then naively combining the paths would result in an output file path of /tmp/tmp123/../sneaky-file.txt, which would cause the file to be written to /tmp/.

Recommendation

Ensure that output paths constructed from tarball entries are validated to prevent writing files to unexpected locations.

Consider using a safer module, such as: zipfile

Example

In this example an archive is extracted without validating file paths.

import requests
import shutil

url = "https://www.someremote.location/tarball.tar.gz"
response = requests.get(url, stream=True)

tarpath = "/tmp/tmp456/tarball.tar.gz"
with open(tarpath, "wb") as f:
      f.write(response.raw.read())

untarredpath = "/tmp/tmp123"
shutil.unpack_archive(tarpath, untarredpath)

To fix this vulnerability, we need to call the function tarfile.extract() on each member after verifying that it does not contain either .. or startswith /.

import requests
import tarfile

url = "https://www.someremote.location/tarball.tar.gz"
response = requests.get(url, stream=True)

tarpath = "/tmp/tmp456/tarball.tar.gz"
with open(tarpath, "wb") as f:
      f.write(response.raw.read())

untarredpath = "/tmp/tmp123"
with tarfile.open(tarpath) as tar:
	for member in tar.getmembers():
		if member.name.startswith("/") or ".." in member.name:
			raise Exception("Path traversal identified in tarball")

		tar.extract(untarredpath, member)

References

ruby/ql/src/queries/security/cwe-020/MissingFullAnchor.qhelp

Badly anchored regular expression

Regular expressions in Ruby can use anchors to match the beginning and end of a string. However, if the ^ and $ anchors are used, the regular expression can match a single line of a multi-line string. This allows bad actors to bypass your regular expression checks and inject malicious input.

Recommendation

Use the \A and \z anchors since these anchors will always match the beginning and end of the string, even if the string contains newlines.

Example

The following (bad) example code uses a regular expression to check that a string contains only digits.

def bad(input) 
    raise "Bad input" unless input =~ /^[0-9]+$/

    # ....
end

The regular expression /^[0-9]+$/ will match a single line of a multi-line string, which may not be the intended behavior. The following (good) example code uses the regular expression \A[0-9]+\z to match the entire input string.

def good(input)
    raise "Bad input" unless input =~ /\A[0-9]+\z/

    # ....
end

References

  • Ruby documentation: Anchors
  • Common Weakness Enumeration: CWE-20.
ruby/ql/src/queries/security/cwe-078/KernelOpen.qhelp

Use of Kernel.open, IO.read or similar sinks with user-controlled input

If Kernel.open is given a file name that starts with a | character, it will execute the remaining string as a shell command. If a malicious user can control the file name, they can execute arbitrary code. The same vulnerability applies to IO.read, IO.write, IO.binread, IO.binwrite, IO.foreach, IO.readlines and URI.open.

Recommendation

Use File.open instead of Kernel.open, as the former does not have this vulnerability. Similarly, use the methods from the File class instead of the IO class e.g. File.read instead of IO.read.

Instead of URI.open use URI(..).open or an HTTP Client.

Example

The following example shows code that calls Kernel.open on a user-supplied file path.

class UsersController < ActionController::Base
  def create
    filename = params[:filename]
    open(filename) # BAD
  end
end  

Instead, File.open should be used, as in the following example.

class UsersController < ActionController::Base
    def create
      filename = params[:filename]
      File.open(filename)
    end
  end  

References

ruby/ql/src/queries/security/cwe-078/NonConstantKernelOpen.qhelp

Use of Kernel.open or IO.read or similar sinks with a non-constant value

If Kernel.open is given a file name that starts with a | character, it will execute the remaining string as a shell command. If a malicious user can control the file name, they can execute arbitrary code. The same vulnerability applies to IO.read, IO.write, IO.binread, IO.binwrite, IO.foreach, IO.readlines and URI.open.

Recommendation

Use File.open instead of Kernel.open, as the former does not have this vulnerability. Similarly, use the methods from the File class instead of the IO class e.g. File.read instead of IO.read.

Instead of URI.open use URI(..).open or an HTTP Client.

Example

The following example shows code that calls Kernel.open on a user-supplied file path.

class UsersController < ActionController::Base
  def create
    filename = params[:filename]
    open(filename) # BAD
  end
end  

Instead, File.open should be used, as in the following example.

class UsersController < ActionController::Base
    def create
      filename = params[:filename]
      File.open(filename)
    end
  end  

References

ruby/ql/src/queries/security/cwe-079/UnsafeHtmlConstruction.qhelp

Unsafe HTML constructed from library input

When a library function dynamically constructs HTML in a potentially unsafe way, then it's important to document to clients of the library that the function should only be used with trusted inputs. If the function is not documented as being potentially unsafe, then a client may inadvertently use inputs containing unsafe HTML fragments, and thereby leave the client vulnerable to cross-site scripting attacks.

Recommendation

Document all library functions that can lead to cross-site scripting attacks, and guard against unsafe inputs where dynamic HTML construction is not intended.

Example

The following example has a library function that renders a boldface name by creating a string containing a <b> with the name embedded in it.

class UsersController < ActionController::Base
  # BAD - create a user description, where the name is not escaped
  def create_user_description (name)
    "<b>#{name}</b>".html_safe
  end
end

This library function, however, does not escape unsafe HTML, and a client that calls the function with user-supplied input may be vulnerable to cross-site scripting attacks.

The library could either document that this function should not be used with unsafe inputs, or escape the input before embedding it in the HTML fragment.

class UsersController < ActionController::Base
  # Good - create a user description, where the name is escaped
  def create_user_description (name)
    "<b>#{ERB::Util.html_escape(name)}</b>".html_safe
  end
end

References

ruby/ql/src/queries/security/cwe-094/UnsafeCodeConstruction.qhelp

Unsafe code constructed from library input

When a library function dynamically constructs code in a potentially unsafe way, it's important to document to clients of the library that the function should only be used with trusted inputs. If the function is not documented as being potentially unsafe, then a client may incorrectly use inputs containing unsafe code fragments, and thereby leave the client vulnerable to code-injection attacks.

Recommendation

Properly document library functions that construct code from unsanitized inputs, or avoid constructing code in the first place.

Example

The following example shows two methods implemented using eval: a simple deserialization routine and a getter method. If untrusted inputs are used with these methods, then an attacker might be able to execute arbitrary code on the system.

module MyLib
    def unsafeDeserialize(value)
        eval("foo = #{value}")
        foo
    end

    def unsafeGetter(obj, path)
        eval("obj.#{path}")
    end
end

To avoid this problem, either properly document that the function is potentially unsafe, or use an alternative solution such as JSON.parse or another library that does not allow arbitrary code to be executed.

require 'json'

module MyLib
    def safeDeserialize(value)
        JSON.parse(value)
    end

    def safeGetter(obj, path)
        obj.dig(*path.split("."))
    end
end

Example

As another example, consider the below code which dynamically constructs a class that has a getter method with a custom name.

require 'json'

module BadMakeGetter
  # Makes a class with a method named `getter_name` that returns `val`
  def self.define_getter_class(getter_name, val)
    new_class = Class.new
    new_class.module_eval <<-END
      def #{getter_name}
        #{JSON.dump(val)}
      end
    END
    new_class
  end
end

one = BadMakeGetter.define_getter_class(:one, "foo")
puts "One is #{one.new.one}"

The example dynamically constructs a string which is then executed using module_eval. This code will break if the specified name is not a valid Ruby identifier, and if the value is controlled by an attacker, then this could lead to code-injection.

A more robust implementation, that is also immune to code-injection, can be made by using module_eval with a block and using define_method to define the getter method.

# Uses `define_method` instead of constructing a string
module GoodMakeGetter
  def self.define_getter_class(getter_name, val)
    new_class = Class.new
    new_class.module_eval do
      define_method(getter_name) { val }
    end
    new_class
  end
end

two = GoodMakeGetter.define_getter_class(:two, "bar")
puts "Two is #{two.new.two}"

Example

This example dynamically registers a method on another class which forwards its arguments to a target object. This approach uses module_eval and string interpolation to construct class variables and methods.

module Invoker
  def attach(klass, name, target)
    klass.module_eval <<-CODE
      @@#{name} = target

      def #{name}(*args)
        @@#{name}.#{name}(*args)
      end
    CODE
  end
end

A safer approach is to use class_variable_set and class_variable_get along with define_method. String interpolation is still used to construct the class variable name, but this is safe because class_variable_set is not susceptible to code-injection.

send is used to dynamically call the method specified by name. This is a more robust alternative than the previous example, because it does not allow arbitrary code to be executed, but it does still allow for any method to be called on the target object.

module Invoker
  def attach(klass, name, target)
    var = :"@@#{name}"
    klass.class_variable_set(var, target)
    klass.define_method(name) do |*args|
      self.class.class_variable_get(var).send(name, *args)
    end
  end
end

References

ruby/ql/src/queries/security/cwe-1333/PolynomialReDoS.qhelp

Polynomial regular expression used on uncontrolled data

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to nk or even 2n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engine used by the Ruby interpreter (MRI) uses backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Note that Ruby 3.2 and later have implemented a caching mechanism that completely eliminates the worst-case time complexity for the regular expressions flagged by this query. The regular expressions flagged by this query are therefore only problematic for Ruby versions prior to 3.2.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter.

Example

Consider this use of a regular expression, which removes all leading and trailing whitespace in a string:

			text.gsub!(/^\s+|\s+$/, '') # BAD
		

The sub-expression "\s+$" will match the whitespace characters in text from left to right, but it can start matching anywhere within a whitespace sequence. This is problematic for strings that do not end with a whitespace character. Such a string will force the regular expression engine to process each whitespace sequence once per whitespace character in the sequence.

This ultimately means that the time cost of trimming a string is quadratic in the length of the string. So a string like "a b" will take milliseconds to process, but a similar string with a million spaces instead of just one will take several minutes.

Avoid this problem by rewriting the regular expression to not contain the ambiguity about when to start matching whitespace sequences. For instance, by using a negative look-behind (/^\s+|(?<!\s)\s+$/), or just by using the built-in strip method (text.strip!).

Note that the sub-expression "^\s+" is not problematic as the ^ anchor restricts when that sub-expression can start matching, and as the regular expression engine matches from left to right.

Example

As a similar, but slightly subtler problem, consider the regular expression that matches lines with numbers, possibly written using scientific notation:

			/^0\.\d+E?\d+$/ # BAD
		

The problem with this regular expression is in the sub-expression \d+E?\d+ because the second \d+ can start matching digits anywhere after the first match of the first \d+ if there is no E in the input string.

This is problematic for strings that do not end with a digit. Such a string will force the regular expression engine to process each digit sequence once per digit in the sequence, again leading to a quadratic time complexity.

To make the processing faster, the regular expression should be rewritten such that the two \d+ sub-expressions do not have overlapping matches: /^0\.\d+(E\d+)?$/.

References

ruby/ql/src/queries/security/cwe-1333/ReDoS.qhelp

Inefficient regular expression

Some regular expressions take a long time to match certain input strings to the point where the time it takes to match a string of length n is proportional to nk or even 2n. Such regular expressions can negatively affect performance, or even allow a malicious user to perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular expression to match.

The regular expression engine used by the Ruby interpreter (MRI) uses backtracking non-deterministic finite automata to implement regular expression matching. While this approach is space-efficient and allows supporting advanced features like capture groups, it is not time-efficient in general. The worst-case time complexity of such an automaton can be polynomial or even exponential, meaning that for strings of a certain shape, increasing the input length by ten characters may make the automaton about 1000 times slower.

Note that Ruby 3.2 and later have implemented a caching mechanism that completely eliminates the worst-case time complexity for the regular expressions flagged by this query. The regular expressions flagged by this query are therefore only problematic for Ruby versions prior to 3.2.

Typically, a regular expression is affected by this problem if it contains a repetition of the form r* or r+ where the sub-expression r is ambiguous in the sense that it can match some string in multiple ways. More information about the precise circumstances can be found in the references.

Recommendation

Modify the regular expression to remove the ambiguity, or ensure that the strings matched with the regular expression are short enough that the time-complexity does not matter.

Example

Consider this regular expression:

      /^_(__|.)+_$/
    

Its sub-expression "(__|.)+?" can match the string "__" either by the first alternative "__" to the left of the "|" operator, or by two repetitions of the second alternative "." to the right. Thus, a string consisting of an odd number of underscores followed by some other character will cause the regular expression engine to run for an exponential amount of time before rejecting the input.

This problem can be avoided by rewriting the regular expression to remove the ambiguity between the two branches of the alternative inside the repetition:

      /^_(__|[^_])+_$/
    

References

ruby/ql/src/queries/security/cwe-209/StackTraceExposure.qhelp

Information exposure through an exception

Software developers often add stack traces to error messages, as a debugging aid. Whenever that error message occurs for an end user, the developer can use the stack trace to help identify how to fix the problem. In particular, stack traces can tell the developer more about the sequence of events that led to a failure, as opposed to merely the final state of the software when the error occurred.

Unfortunately, the same information can be useful to an attacker. The sequence of class or method names in a stack trace can reveal the structure of the application as well as any internal components it relies on. Furthermore, the error message at the top of a stack trace can include information such as server-side file names and SQL code that the application relies on, allowing an attacker to fine-tune a subsequent injection attack.

Recommendation

Send the user a more generic error message that reveals less information. Either suppress the stack trace entirely, or log it only on the server.

Example

In the following example, an exception is handled in two different ways. In the first version, labeled BAD, the exception is exposed to the remote user by rendering it as an HTTP response. As such, the user is able to see a detailed stack trace, which may contain sensitive information. In the second version, the error message is logged only on the server, and a generic error message is displayed to the user. That way, the developers can still access and use the error log, but remote users will not see the information.

class UsersController < ApplicationController

  def update_bad(id)
    do_computation()
  rescue => e
    # BAD
    render body: e.backtrace, content_type: "text/plain"
  end

  def update_good(id)
    do_computation()
  rescue => e
    # GOOD
    logger.error e.backtrace
    render body: "Computation failed", content_type: "text/plain"
  end

end

References

swift/ql/src/queries/Security/CWE-022/PathInjection.qhelp

Uncontrolled data used in path expression

Accessing paths controlled by users can expose resources to attackers.

Paths that are naively constructed from data controlled by a user may contain unexpected special characters, such as ... Such a path could point to any directory on the file system.

Recommendation

Validate user input before using it to construct a file path. Ideally, follow these rules:

  • Do not allow more than a single . character.
  • Do not allow directory separators such as / or \ (depending on the file system).
  • Do not rely on simply replacing problematic sequences such as ../. For example, after applying this filter to .../...// the resulting string would still be ../.
  • Use a whitelist of known good patterns.

Example

The following code shows two bad examples.

let fm = FileManager.default
let path = try String(contentsOf: URL(string: "http://example.com/")!)

// BAD
return fm.contents(atPath: path)

// BAD
if (path.hasPrefix(NSHomeDirectory() + "/Library/Caches")) {
    return fm.contents(atPath: path)
}

In the first, a file name is read from an HTTP request and then used to access a file. In this case, a malicious response could include a file name that is an absolute path, such as "/Applications/(current_application)/Documents/sensitive.data".

In the second bad example, it appears that the user is restricted to opening a file within the "/Library/Caches" home directory. In this case, a malicious response could contain a file name containing special characters. For example, the string "../../Documents/sensitive.data" will result in the code reading the file located at "/Applications/(current_application)/Library/Caches/../../Documents/sensitive.data", which contains users' sensitive data. This file may then be made accessible to an attacker, giving them access to all this data.

In the following (good) example, the path used to access the file system is normalized before being checked against a known prefix. This ensures that regardless of the user input, the resulting path is safe.

let fm = FileManager.default
let path = try String(contentsOf: URL(string: "http://example.com/")!)

// GOOD
let filePath = FilePath(stringLiteral: path)
if (filePath.lexicallyNormalized().starts(with: FilePath(stringLiteral: NSHomeDirectory() + "/Library/Caches"))) {
    return fm.contents(atPath: path)
}

References

swift/ql/src/queries/Security/CWE-134/UncontrolledFormatString.qhelp

Uncontrolled format string

Passing untrusted format strings to functions that use printf style formatting can lead to buffer overflows and data representation problems. An attacker may be able to exploit this weakness to crash the program or obtain sensitive information from its internal state.

Recommendation

Use a constant string literal for the format string to prevent the possibility of data flow from an untrusted source. This also helps to prevent errors where the format arguments do not match the format string.

If the format string cannot be constant, ensure that it comes from a secure data source or is compiled into the source code. If you need to include a string value from the user, use an appropriate specifier (such as %@) in the format string and include the user provided value as a format argument.

Example

In this example, the format string includes a user-controlled inputString:


print(String(format: "User input: " + inputString)) // vulnerable

To fix it, make inputString a format argument rather than part of the format string, as in the following code:


print(String(format: "User input: %@", inputString)) // fixed

References

swift/ql/src/queries/Security/CWE-312/CleartextLogging.qhelp

Cleartext logging of sensitive information

Attackers could gain access to sensitive information that is logged unencrypted.

Recommendation

Always make sure to encrypt or obfuscate sensitive information before you log it.

Generally, you should decrypt sensitive information only at the point where it is necessary for it to be used in cleartext.

Be aware that external processes often store the standard output and standard error streams of the application. This will include logged sensitive information.

Example

The following example code logs user credentials (in this case, their password) in plaintext:

let password = "P@ssw0rd"
NSLog("User password changed to \(password)")

Instead, you should encrypt or obfuscate the credentials, or omit them entirely:

let password = "P@ssw0rd"
NSLog("User password changed")

References

  • M. Dowd, J. McDonald and J. Schuhm, The Art of Software Security Assessment, 1st Edition, Chapter 2 - 'Common Vulnerabilities of Encryption', p. 43. Addison Wesley, 2006.
  • M. Howard and D. LeBlanc, Writing Secure Code, 2nd Edition, Chapter 9 - 'Protecting Secret Data', p. 299. Microsoft, 2002.
  • OWASP: Password Plaintext Storage.
  • Common Weakness Enumeration: CWE-312.
  • Common Weakness Enumeration: CWE-359.
  • Common Weakness Enumeration: CWE-532.
swift/ql/src/queries/Security/CWE-321/HardcodedEncryptionKey.qhelp

Hard-coded encryption key

Hardcoded keys should not be used for creating encryption ciphers. Data encrypted using hardcoded keys are more vulnerable to the possibility of recovering them.

Recommendation

Use randomly generated key material to initialize the encryption cipher.

Example

The following example shows a few cases of instantiating a cipher with various encryption keys. In the 'BAD' cases, the key material is hardcoded, making the encrypted data vulnerable to recovery. In the 'GOOD' cases, the key material is randomly generated and not hardcoded, which protects the encrypted data against recovery.


func encrypt(padding : Padding) {
	// ...

	// BAD: Using hardcoded keys for encryption
	let key: Array<UInt8> = [0x2a, 0x3a, 0x80, 0x05]
	let keyString = "this is a constant string"
	let ivString = getRandomIV()
	_ = try AES(key: key, blockMode: CBC(), padding: padding)
	_ = try AES(key: keyString, iv: ivString)
	_ = try Blowfish(key: key, blockMode: CBC(), padding: padding)
	_ = try Blowfish(key: keyString, iv: ivString)


	// GOOD: Using randomly generated keys for encryption
	let key = (0..<10).map({ _ in UInt8.random(in: 0...UInt8.max) })
	let keyString = String(cString: key)
	let ivString = getRandomIV()
	_ = try AES(key: key, blockMode: CBC(), padding: padding)
	_ = try AES(key: keyString, iv: ivString)
	_ = try Blowfish(key: key, blockMode: CBC(), padding: padding)
	_ = try Blowfish(key: keyString, iv: ivString)

	// ...
}

References

swift/ql/src/queries/Security/CWE-943/PredicateInjection.qhelp

Predicate built from user-controlled sources

Predicates represent logical conditions that can be used to check whether an object matches them. If a predicate is built from user-provided data without sufficient sanitization, an attacker may be able to change the overall meaning of the predicate.

Recommendation

When building a predicate from untrusted data, you should either pass it to the appropriate arguments parameter during initialization, or as an array of substitution variables before evaluation. You should not append or concatenate it to the body of the predicate.

Example

In the following insecure example, NSPredicate is built directly from data obtained from an HTTP request. This is untrusted, and can be arbitrarily set by an attacker to alter the meaning of the predicate:

let remoteString = try String(contentsOf: URL(string: "https://example.com/")!)

let filenames: [String] = ["img1.png", "img2.png", "img3.png", "img.txt", "img.csv"]

let predicate = NSPredicate(format: "SELF LIKE \(remoteString)")
let filtered = filenames.filter(){ filename in
    predicate.evaluate(with: filename)
}
print(filtered)

A better way to do this is to use the arguments parameter of NSPredicate's constructor. This prevents attackers from altering the meaning of the predicate, even if they control the externally obtained data, as seen in the following secure example:

let remoteString = try String(contentsOf: URL(string: "https://example.com/")!)

let filenames: [String] = ["img1.png", "img2.png", "img3.png", "img.txt", "img.csv"]

let predicate = NSPredicate(format: "SELF LIKE %@", remoteString)
let filtered = filenames.filter(){ filename in
    predicate.evaluate(with: filename)
}
print(filtered)

References

@changeset-bot
Copy link

changeset-bot bot commented Jan 16, 2023

⚠️ No Changeset found

Latest commit: ff36d19

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@github-actions github-actions bot removed the Ruby label Jan 23, 2023
owen-mc and others added 30 commits February 24, 2025 23:39
Java: Add XSS Sanitizer for `HttpServletResponse.setContentType` with safe values
This is a result of the commit "SSA: Fix bug in guards for ssa input
nodes."
C#: Proper handling of value tuples in `cs/call-to-object-tostring`.
Rust: Test and model some string and iterator methods
Reported here: #17743

Without this change on the query provided by the user:
```
[2025-02-25 12:42:01] Evaluated non-recursive predicate quickquery::UnrealFunctionAnnotation.annotates/1#dispred#9cd6c269@c668c8tv in 23846ms (size: 20381473).
Evaluated relational algebra for predicate quickquery::UnrealFunctionAnnotation.annotates/1#dispred#9cd6c269@c668c8tv with tuple counts:
                 1   ~0%    {0} r1 = CONSTANT()[]
             27323   ~0%    {2}    | JOIN WITH `Location::Location.getEndLine/0#dispred#83af84ae#bf` CARTESIAN PRODUCT OUTPUT Rhs.0, Rhs.1
        6162566035   ~0%    {4}    | JOIN WITH `Location::Location.getStartLine/0#d54f9e6c` CARTESIAN PRODUCT OUTPUT Lhs.0, Lhs.1, Rhs.0, Rhs.1
                            {4}    | REWRITE WITH TEST InOut.1 < InOut.3
        3894825644   ~5%    {2}    | SCAN OUTPUT In.2, In.0
          73148692   ~0%    {3}    | JOIN WITH fun_decls_40#join_rhs ON FIRST 1 OUTPUT Lhs.1, Lhs.0, Rhs.1
          73148692   ~0%    {4}    | JOIN WITH `Location::Location.getFile/0#dispred#d1f8b5d1` ON FIRST 1 OUTPUT Lhs.1, Rhs.1, Lhs.0, Lhs.2
            864579   ~0%    {2}    | JOIN WITH `Location::Location.getFile/0#dispred#d1f8b5d1` ON FIRST 2 OUTPUT Lhs.2, Lhs.3
          13010742   ~1%    {2}    | JOIN WITH macroinvocations_20#join_rhs ON FIRST 1 OUTPUT Rhs.1, Lhs.1
          20653781   ~0%    {3}    | JOIN WITH `Macro::MacroAccess.getOutermostMacroAccess/0#d58b05db_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1, _, Lhs.1
          20653781   ~4%    {3}    | REWRITE WITH Out.1 := 1
          20381473   ~8%    {2}    | JOIN WITH macroinvocations_03#join_rhs ON FIRST 2 OUTPUT Lhs.0, Lhs.2
                            return r1
```

With this change:
```
[2025-02-25 12:43:10] Evaluated non-recursive predicate quickquery::UnrealFunctionAnnotation.annotates/1#dispred#9cd6c269@11bf8956 in 928ms (size: 20381473).
Evaluated relational algebra for predicate quickquery::UnrealFunctionAnnotation.annotates/1#dispred#9cd6c269@11bf8956 with tuple counts:
            6873   ~3%    {2} r1 = SCAN fun_decls OUTPUT In.4, In.0
            6857   ~0%    {3}    | JOIN WITH `Location::Location.getStartLine/0#d54f9e6c` ON FIRST 1 OUTPUT Lhs.0, Lhs.1, Rhs.1
            6857   ~2%    {3}    | JOIN WITH `Location::Location.getFile/0#dispred#d1f8b5d1` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2
         6193961   ~0%    {3}    | JOIN WITH `Location::Location.getFile/0#dispred#d1f8b5d1_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2
        27389714   ~1%    {4}    | JOIN WITH macroinvocations_20#join_rhs ON FIRST 1 OUTPUT Lhs.0, Lhs.1, Lhs.2, Rhs.1
        27389714   ~1%    {4}    | JOIN WITH locations_default ON FIRST 1 OUTPUT Lhs.1, Lhs.2, Lhs.3, Rhs.4
                          {4}    | REWRITE WITH TEST InOut.3 < InOut.1
        13010742   ~1%    {2}    | SCAN OUTPUT In.2, In.0
        20653781   ~0%    {3}    | JOIN WITH `Macro::MacroAccess.getOutermostMacroAccess/0#d58b05db_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1, _, Lhs.1
        20653781   ~4%    {3}    | REWRITE WITH Out.1 := 1
        20381473   ~8%    {2}    | JOIN WITH macroinvocations_03#join_rhs ON FIRST 2 OUTPUT Lhs.0, Lhs.2
                          return r1
```
…cess

Java: StaticInitializationVector with postprocess
Co-authored-by: Asgerf <[email protected]>
Java: Switch BaseSSA to use shared SSA lib.
Ssa: Refactor the data flow integration module
C++: Fix join-order problem with `isBefore`
Rust: Update some inline expectation comments
JS: Support 'response' threat model and @tanstack/react-query
Test: Add support for RelatedLocation tag and use in a JS query
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
⤵️ pull C++ documentation Improvements or additions to documentation QL-for-QL
Projects
None yet
Development

Successfully merging this pull request may close these issues.