Aqua Blog

Real-world Cyber Attacks Targeting Data Science Tools

With the accelerated move to the cloud, organizations increasingly rely on large data teams to make data-driven business decisions. In their job, data professionals are given high privileges and access to development and production environments. But what are the security threats that target data tools? And, more importantly, are organizations prepared to deal with these challenges? Our research at Team Nautilus revealed that many data tools are exposed to threats and are being actively attacked in the wild.

We turned our findings into a series of blog posts that explore the threats facing data practitioners, based on actual attacks in the wild, and provide recommendations on how to mitigate these risks.

As part of this research, we discovered the first-ever Python ransomware attack targeting Jupyter Notebooks. In this post, the next in the series, we detail the attacks on Jupyter Notebooks and other popular open source tools that data practitioners use to analyze and manipulate data.

Who are data practitioners?

Data practitioners collect, manipulate, and analyze the company’s data to guide and empower the business with actionable insights. There’s a large variety of data roles in modern organizations. The top three are data engineers, data scientists, and data analysts.

  • Data engineers design, build, and maintain the organization’s data infrastructure, including data acquisition and collection. They integrate data from various sources and make it fit the organization’s data platforms.
  • Data scientists design and build data tools. They apply complex math and modeling to build advanced data tools to perform large-scale analytics tasks.
  • Data analysts perform data analysis and report the results to the stakeholders in the organization. In their day-to-day role, they maintain dashboards, generate reports, and prepare data visualizations.

To do their job, data practitioners usually require high-privileged access to the company’s databases and computing resources. This access exposes them to cyber risks that can cause significant damage to their organization.

What are data science tools?

Data practitioners are responsible for obtaining, cleaning, storing, and preprocessing data, and for building dashboards, reports, and other deliverables based on it. To do so, they require knowledge of and experience with various tools, frameworks, and programming languages, such as Jupyter Notebook, Apache Spark, Hadoop, Airflow DAG, Redis, MySQL, and more. Typically, these tools are used by different kinds of data roles across an organization to deliver different products. However, from a security perspective, they all create an attack surface that can be exploited by adversaries.

Summary of the observed attacks that target data science tools

We’ve seen various attacks that target popular tools such as Jupyter notebooks, a web-based user interface to work with data, write and execute code, and visualize the results.

Most of the attacks gained initial access through misconfigured environments. After gaining access, adversaries attempted to achieve persistence by creating a new user in the notebook or adding Secure Shell (SSH) keys. Then, most of the attacks executed a cryptominer for quick profit. Below, we analyze these attacks in detail.
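Because added SSH keys are one of the persistence methods described above, it can help to compare the keys on a host against a list of keys you expect. The following is a minimal sketch, not a complete parser: the function names and the allowlist are hypothetical, and the parsing is simplified (it does not handle quoted options that contain spaces).

```python
from pathlib import Path


def parse_authorized_keys(text):
    """Return (key_type, key_body, comment) tuples from authorized_keys text."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Options may precede the key type, so find the first field that
        # looks like a key type (ssh-rsa, ssh-ed25519, ecdsa-sha2-..., sk-...).
        for i, field in enumerate(fields):
            if field.startswith(("ssh-", "ecdsa-", "sk-")) and i + 1 < len(fields):
                comment = " ".join(fields[i + 2:])
                entries.append((field, fields[i + 1], comment))
                break
    return entries


def unknown_keys(text, known_key_bodies):
    """Return the entries whose key body is not in the allowlist."""
    return [e for e in parse_authorized_keys(text) if e[1] not in known_key_bodies]


def scan_files(paths, known_key_bodies):
    """Yield (path, entry) for every unknown key in the given files."""
    for path in paths:
        try:
            text = Path(path).read_text()
        except OSError:
            continue  # unreadable file: skip rather than abort the scan
        for entry in unknown_keys(text, known_key_bodies):
            yield str(path), entry
```

For example, `scan_files(Path("/home").glob("*/.ssh/authorized_keys"), known)` would walk the usual per-user locations, where `known` is a set of key bodies your team maintains.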

Detection methods and mitigation recommendations

There are a few recommendations organizations can follow to mitigate these risks and protect their data tools:

  • Use tokens or another authentication method to control access to data tools.
  • Limit inbound traffic to the application by blocking internet access completely or, if the environment requires internet access, by using network rules or a VPN to control inbound traffic. Limiting outbound access is also recommended. For example, in the Aqua platform, you can set network rules to limit access to your resources.
  • Run applications with a non-privileged user or one with limited access.
  • Ensure that all the Jupyter Notebook users are known. You can query the users in an SQLite3 database, which should be found in this path: ./root/.local/share/jupyter/nbsignatures.db
    We also recommend checking the SSH authorized keys files for any unknown users or keys.
  • Monitor the running processes on the host to detect suspicious processes or cryptominers that hijack resources, which show up as high CPU usage.
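To review the SQLite3 database mentioned in the list above, a short script can dump its contents for manual inspection. This is a sketch that deliberately makes no assumption about the table layout: it lists whatever tables exist and returns their rows. The function name is hypothetical, and the file is opened read-only so the check does not modify it.

```python
import sqlite3
from pathlib import Path


def dump_tables(db_path):
    """Return {table_name: [row, ...]} for every table in a SQLite file."""
    # mode=ro opens the file read-only and fails if it does not exist,
    # instead of silently creating an empty database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )]
        result = {}
        for name in names:
            # Names come from sqlite_master; quote them defensively anyway.
            quoted = '"' + name.replace('"', '""') + '"'
            result[name] = conn.execute(f"SELECT * FROM {quoted}").fetchall()
        return result
    finally:
        conn.close()
```

Calling `dump_tables` on the nbsignatures.db file and comparing the rows against what your team expects is one way to spot entries nobody recognizes.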

For instance, in the screenshot, you can see that the processes Python 2.7 and Python 3 are consuming a very high share of CPU. In fact, these are cryptominers. Normally, Python processes don’t consume high CPU for an extended period of time, or at least they shouldn’t. If this continues for a long time in a compromised environment, it could indicate a cryptominer. This might slow the machine down, disrupt its operation, and cause high cloud bills.

Similarly, the process xmr indicates Monero cryptomining activity, since XMR is the ticker symbol of the Monero coin.
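A quick way to spot such processes on a single Linux host is to parse the output of `ps`. The sketch below is only an illustration: the function names and the 80% threshold are arbitrary assumptions, and the `%CPU` column reported by `ps` is averaged over the lifetime of the process, so it highlights sustained load rather than short spikes.

```python
import subprocess


def parse_ps(ps_output):
    """Parse `ps -eo pid,pcpu,comm` output into (pid, cpu_percent, command)."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) == 3:
            rows.append((int(parts[0]), float(parts[1]), parts[2].strip()))
    return rows


def high_cpu(rows, threshold=80.0):
    """Return rows at or above the threshold, busiest first."""
    return sorted((r for r in rows if r[1] >= threshold), key=lambda r: -r[1])


def snapshot():
    """Collect the current process list from the local host."""
    out = subprocess.run(
        ["ps", "-eo", "pid,pcpu,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ps(out)
```

Running `high_cpu(snapshot())` periodically and reviewing unfamiliar names is a manual stopgap; it does not replace continuous runtime monitoring.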

You can check the running processes manually, but this approach isn’t feasible or scalable. Instead, you can monitor events on the host by using Tracee, an open source runtime security and forensics tool for Linux, built to address common Linux security issues. On GitHub, you can find Tracee-eBPF, a Linux tracing and forensics tool based on eBPF, and Tracee-rules, a runtime security detection engine that helps detect malicious events.

For example, when Tracee is used to detect the reverse shell attack described below, the collected data shows that the process python3 initiated the event socket dup with IPv4 address 172.247.113.170 and port 8854.

Tracee-rules detected this event as a reverse shell over socket:

Reverse shells are used to bypass security mechanisms, such as firewalls, to allow an attacker to gain access to a target machine. In this case, the target machine is the one that initiates the connection to the attacker’s machine, while the attacker’s machine is set to listen for incoming connections on a specified port. In the attack example, a Jupyter notebook was compromised, and the adversaries executed remote code that initiated a connection to their machine, thus gaining better control over the target.
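One side effect of this pattern is often visible on the target: the shell's standard input, output, and error all point at the same network socket. The sketch below checks for that on Linux by reading `/proc`. It is a simple heuristic with hypothetical function names, not how Tracee-rules implements its detection, and some legitimate services may also match.

```python
import os


def stdio_on_socket(fd_targets):
    """True if fds 0, 1, and 2 all point at the same socket."""
    targets = [fd_targets.get(fd) for fd in (0, 1, 2)]
    first = targets[0]
    return (
        first is not None
        and first.startswith("socket:[")
        and first == targets[1] == targets[2]
    )


def read_fd_targets(pid):
    """Map fd number to its link target, e.g. 'socket:[51234]' or '/dev/pts/0'."""
    targets = {}
    for fd in (0, 1, 2):
        try:
            targets[fd] = os.readlink(f"/proc/{pid}/fd/{fd}")
        except OSError:
            pass  # process exited or we lack permission to inspect it
    return targets


def suspicious_pids():
    """Return PIDs whose stdio is wired to a single socket (Linux only)."""
    return [
        int(name) for name in os.listdir("/proc")
        if name.isdigit() and stdio_on_socket(read_fd_targets(name))
    ]
```

Any PID returned by `suspicious_pids()` still needs manual review, for example by checking which remote address the socket is connected to.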

To detect suspicious or malicious events as they occur, a better solution for large teams could be Aqua’s Cloud Native Detection and Response (CNDR), which allows the detection and prevention of attacks at runtime. Read more about CNDR’s detection capabilities and how CNDR stopped a DreamBus botnet attack.

Attacks in the wild that target data tools

Jupyter Notebook

Jupyter Notebook is an open source web application that allows data practitioners to create and share documents. A notebook integrates live code, equations, computational output, visualizations, and other multimedia resources, along with explanatory text, in a single document.

When we queried Shodan, a search engine for internet-connected devices, we found 10,000 visible Jupyter notebooks that can be reached through the internet. While most of them require authentication, we found 70 that were easily accessible without any authentication requirements.
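One common way a notebook server ends up without authentication is a configuration file that sets the access token to an empty string. The sketch below scans configuration text for that pattern. It assumes the `c.NotebookApp.token` and `c.ServerApp.token` setting names and only catches the literal empty-string form; the function name is hypothetical, and other ways of disabling authentication are not covered.

```python
import re

# Matches lines such as:  c.NotebookApp.token = ''   or   c.ServerApp.token = ""
_EMPTY_TOKEN = re.compile(
    r"""^\s*c\.(NotebookApp|ServerApp)\.token\s*=\s*u?(''|"")\s*(#.*)?$""",
    re.MULTILINE,
)


def disables_token_auth(config_text):
    """Return True if the config text sets the access token to an empty string."""
    return _EMPTY_TOKEN.search(config_text) is not None
```

Commented-out lines are ignored because the pattern requires the setting to start the line.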

Once we entered their IP address and port in a browser, the Jupyter notebooks allowed full visibility and control over the host. Anyone can see the files in the active directory or download files from a remote source:

We created a Jupyter Notebook honeypot, listening on the default port 8888, and documented the attacks. Below are highlights of the attacks we saw.

Complex attack by TeamTNT

We found an active botnet by TeamTNT attacking Jupyter notebooks as described in the graphic below:

1 – Detecting exposed Jupyter notebooks and running a script in terminal

2 – Downloading a script from a remote C2 server that in turn downloads the attack components

3 – The file m8priv:

3.1 Killing competing cryptominer processes on the host

3.2 Installing a cryptominer

4 – The file ldm:

4.1 Creating a backdoor with SSH access to the target

4.2 Killing competing cryptominer processes on the host

4.3 Downloading from a remote source using a Tor exit node

4.4 Creating persistence with a cron job on the target

4.5 Installing additional applications (including wget, net-tools, and zip)

4.6 Collecting local credentials and access keys

4.7 Downloading additional malicious binaries (1.sh and 3.sh), as can be seen in the figures below:

4.7.1 pty1: Tsunami malware (MD5: 7d3f686801ae3f90f36aae17f7a66478)

4.7.2 pty2: Tsunami malware (MD5: 6f3d7c01c25decca73f8e7c7d998ff4a)

4.7.3 pty3: Tsunami malware (MD5: 1db40b7e18cf228169d2f5d73bf27ef7)

4.7.4 pty4: Tsunami malware (MD5: 9b22dc965582572dd8f07f452228b18b)

4.7.5 pty5: Tsunami malware (MD5: ff171712ab8816f3d7600fe75bb18052)

4.7.6 pty6: Tsunami malware (MD5: a4f9761b5f9d8804ef4315c5ecec64f6)

4.7.7 pty7: Tsunami malware (MD5: ebd827a6e50766508b87f51d7ce6be5c)

4.7.8 pty8: Tsunami malware (MD5: 71e644015f646f7532c7dd2c3c847364)

4.7.9 pty9: Tsunami malware (MD5: a45599d81cbf25b7bf0968d49c9ced68)

4.7.10 pty10: Tsunami malware (MD5: f5271f6b20fda39c213fd9579ad7e1fb)

4.7.11 pty11: Tsunami malware (MD5: 96364eef5116a5825e16b1c28eecb6b5)

5 – The file ptyw64: This is the Tsunami malware

6 – The file aws2.sh: This script is designed to query the instance’s cloud metadata. Once obtained, the data is exfiltrated to TeamTNT’s C2 server.
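The MD5 hashes listed in step 4.7 can be used to search a host for the Tsunami binaries. The following is a minimal sketch with hypothetical function names. It matches exact file hashes only, so a repacked or modified sample would not be found.

```python
import hashlib
from pathlib import Path

# MD5 hashes of the Tsunami samples listed in step 4.7.
TSUNAMI_MD5 = {
    "7d3f686801ae3f90f36aae17f7a66478": "pty1",
    "6f3d7c01c25decca73f8e7c7d998ff4a": "pty2",
    "1db40b7e18cf228169d2f5d73bf27ef7": "pty3",
    "9b22dc965582572dd8f07f452228b18b": "pty4",
    "ff171712ab8816f3d7600fe75bb18052": "pty5",
    "a4f9761b5f9d8804ef4315c5ecec64f6": "pty6",
    "ebd827a6e50766508b87f51d7ce6be5c": "pty7",
    "71e644015f646f7532c7dd2c3c847364": "pty8",
    "a45599d81cbf25b7bf0968d49c9ced68": "pty9",
    "f5271f6b20fda39c213fd9579ad7e1fb": "pty10",
    "96364eef5116a5825e16b1c28eecb6b5": "pty11",
}


def md5_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan(root, known=TSUNAMI_MD5):
    """Return (path, sample_name) for every regular file whose MD5 is known."""
    hits = []
    for path in Path(root).rglob("*"):
        try:
            if path.is_file() and not path.is_symlink():
                name = known.get(md5_of(path))
                if name:
                    hits.append((str(path), name))
        except OSError:
            continue  # unreadable file: skip it
    return hits
```

A call such as `scan("/tmp")` limits the search to one directory tree, which keeps the run time reasonable on large hosts.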

Cryptomining attacks

We also observed simple and straightforward attacks, in which attackers manually accessed a Jupyter notebook, downloaded a cryptominer and configurations, and launched a mining process:

Adding credentials to the Jupyter notebook

We’ve also seen attacks in which the adversaries added a secret into the nbsignatures table. This means that even if the user adds a signature, the attacker will still have access with these credentials.

Encrypting files for a ransom

We detected a simple and straightforward attack involving the first Python-based ransomware targeting Jupyter notebooks. The threat actor manually accessed a misconfigured Jupyter notebook, opened a Python file, and pasted code into it. The code takes two arguments, an encryption password and a path, and traverses the server’s file system starting from that path. It contains two functions: the first walks the file system, and the second encrypts everything and deletes the unencrypted files.

Advanced attack tools: Cobalt Strike

Cobalt Strike is a powerful commercial offensive security tool, originally developed for ethical hacking. In reality, it’s also used by cybercriminals. This framework offers many useful tools for conducting network attacks, social engineering, and on-the-fly deployment of binaries and code.

During our research, we’ve discovered an attack involving Cobalt Strike. The adversary accessed the server via Jupyter notebook and downloaded the file CrossC2-test, a small payload containing malware with backdoor capabilities that is hard to trace and detect.

In addition, another binary file was downloaded to /tmp, which is a packed Cobalt Strike payload (MD5: d9c9c6777932a6c627a9dd34e1932efb). Cobalt Strike is a powerful tool that attackers can use to gain backdoor access, explore the server, get root privileges, and more.

Exploiting vulnerabilities to gain higher privileges

In another case, we saw an attempt to exploit two recent vulnerabilities for privilege escalation. By compromising an exposed Jupyter notebook, the attackers gained access to the server and actively attempted to elevate their privileges to root. To do so, they used exploits from GitHub to take advantage of the sudoedit vulnerability (CVE-2021-3156) and the Dirty Pipe vulnerability (CVE-2022-0847).

Attackers seek root privileges on the server to achieve more control over the target environment and expand the blast radius of an attack. To the best of our knowledge, this is the first-ever exploitation of the Dirty Pipe vulnerability seen in the wild.
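To get a rough idea of whether a host could be exposed to Dirty Pipe, you can compare its kernel release against the upstream versions that fixed CVE-2022-0847. The sketch below assumes the commonly cited upstream range: the flaw was introduced in Linux 5.8 and fixed in 5.10.102, 5.15.25, and 5.16.11. Distribution kernels often backport fixes without changing the base version, so treat a positive result only as a prompt to check the vendor advisory. The function names are hypothetical.

```python
import platform
import re

# Upstream stable series that received the fix, mapped to the first fixed patch level.
FIXED_PATCH = {(5, 10): 102, (5, 15): 25, (5, 16): 11}


def parse_release(release):
    """Extract (major, minor, patch) from a string such as '5.13.0-30-generic'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match[1]), int(match[2]), int(match[3] or 0)


def maybe_dirty_pipe(release):
    """True if the upstream version falls in the range affected by CVE-2022-0847."""
    major, minor, patch = parse_release(release)
    if (major, minor) < (5, 8) or (major, minor) >= (5, 17):
        return False  # before the flaw was introduced, or after the fix landed
    fixed = FIXED_PATCH.get((major, minor))
    if fixed is None:
        return True  # series in the affected range with no upstream fix release
    return patch < fixed


def local_kernel_maybe_affected():
    """Apply the check to the kernel release of the current host."""
    return maybe_dirty_pipe(platform.release())
```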

Reverse shell attacks

We’ve seen attacks that include the launch of a new notebook and running commands that create a reverse shell to the host:

JupyterLab

JupyterLab is a web-based interactive development environment for notebooks, code, and data. It provides access to the Linux terminal and Jupyter Notebook.

We queried Shodan and found more than 70 JupyterLab instances exposed to the internet. Most of them didn’t require any authentication and were already infected with malware. We created several honeypots that allowed access to JupyterLab instances. Below are some samples of the attacks we saw in the wild.

In the screenshot above, you can see an attack that created a Jupyter notebook with the script spam on the JupyterLab platform, downloaded xmrig from GitHub, and executed it on the instance.

In the screenshot above, you can see two attacks. In one of them, an attacker manually accessed the terminal and downloaded the Python2.7 and 1.json files, which are an xmrig cryptominer and its configuration file.

The second attack is a Mirai malware attack. The file whoareyou.x86 contains the Mirai malware, designed to launch a distributed denial of service (DDoS) attack. Attackers used the following script to download and execute Mirai:

In the screenshot above, you can see an attack that used a shell script to download and execute a cryptominer.

Finally, in the screenshot above, you can see the scope of cryptomining processes running on the attacked JupyterLab instance.

CoCalc

CoCalc is an online collaborative workspace for math and research that offers a data science and scientific Python stack, including Jupyter Notebook, R Statistics, and Octave. It also offers a web-based Linux terminal and an X11 graphical desktop.

There’s a commercial version that offers hosting and support, and an open source version can be downloaded from GitHub.

By querying Shodan, we saw 66 CoCalc instances exposed to the network, 10 of which were completely exposed and allowed unauthenticated access.

Some of these instances allowed attackers to create an account or log in anonymously to an existing account and open a Linux terminal. The privileges were limited, but an attacker might exploit this platform to escalate privileges and gain further access to the host. However, we haven’t seen any active attacks targeting any of these hosts. Attackers may be less familiar with this platform or less keen to exploit it.

Unauthenticated access allows opening a Linux terminal or an X11 desktop.

Mapping these campaigns to the MITRE ATT&CK framework

Here we map each component of the attacks to the corresponding techniques of the MITRE ATT&CK framework:

[Figure: MITRE ATT&CK framework diagram]

Indicators of compromise (IOCs):

Assaf Morag
Assaf is a Lead Data Analyst on the Aqua Nautilus research team. He focuses on supporting the data needs of the team, obtaining threat intelligence, and helping Aqua and the industry stay at the forefront of new threats and methodologies for protection. His work has been published in leading info security publications and journals across the globe, and most recently he contributed to the new MITRE ATT&CK Container Framework.