Shai-Hulud Themed Malware Found in the PyTorch Lightning AI Training Library

TL;DR Highlight

PyTorch Lightning packages 2.6.2 and 2.6.3 delivered credential-stealing malware via a supply chain attack.

Who Should Read

AI/ML developers and MLOps engineers building model training pipelines with PyTorch Lightning or managing Python dependencies for ML projects.

Core Mechanics

Versions 2.6.2 and 2.6.3 of the 'lightning' package, distributed on PyPI, were compromised by a supply chain attack on April 30, 2024. Supply chain attacks target users by injecting malicious code into the software distribution process.
The malware is hidden as an obfuscated JavaScript payload in a '_runtime' directory inside the package, and a single `pip install lightning` is enough to become infected.
The malware runs not only at install time but also whenever the package is imported, so defenses that scan only installation scripts miss it.
Because this package is used in diverse ML workloads—image classification, LLM fine-tuning, diffusion models, time series prediction—it may be present somewhere in your dependency tree even if you didn't install it directly.
The malware is themed around 'Shai-Hulud,' the giant sandworm from the Dune novels, and a GitHub search for 'A Mini Shai-Hulud has Appeared' returned over 2,200 repositories within a day.
The Lightning-AI team is investigating community reports and recommends downgrading to version 2.6.1 until 2.6.4 is released.
The compromised version 2.6.2 was also found in the nixpkgs unstable channel, impacting NixOS users.
A GitHub issue referencing a blocked 2.6.2 release due to 'internal reasons' surfaced on April 20th, raising questions about whether the community was aware of the issue earlier.
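
Because the payload is described as running on import, any local check should avoid importing the package. The following is a minimal sketch, not an official detection tool: it reads the installed distribution's metadata with the standard library and flags the versions named above, plus any recorded file under a `_runtime` path component. The function name `inspect_distribution` and the `_runtime` heuristic are illustrative assumptions; an empty result does not prove an environment is clean.

```python
from importlib import metadata

# Versions named as compromised in this article.
COMPROMISED = {"2.6.2", "2.6.3"}


def inspect_distribution(name="lightning", bad_versions=COMPROMISED):
    """Report on an installed distribution WITHOUT importing it.

    Returns None if the distribution is not installed, otherwise a dict
    with the version, a compromised flag, and any '_runtime' files.
    """
    try:
        dist = metadata.distribution(name)
    except metadata.PackageNotFoundError:
        return None

    # dist.files can be None when the installer recorded no file list.
    files = dist.files or []
    # Heuristic (assumption): flag files with a '_runtime' path component.
    runtime_files = [str(f) for f in files if "_runtime" in f.parts]

    return {
        "version": dist.version,
        "compromised": dist.version in bad_versions,
        "runtime_files": runtime_files,
    }


if __name__ == "__main__":
    report = inspect_distribution()
    if report is None:
        print("lightning is not installed in this environment")
    else:
        print(report)
```

Run it with the interpreter of each environment you want to check (virtualenvs, CI images, notebooks), since the result only describes the environment it runs in.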

Evidence

Hacker News (HN) commenters noted a surge in supply chain attack reports and drew parallels to the left-pad incident from a decade ago. The concern is that attacks have become both more likely to succeed and more valuable, while detection tools remain difficult for non-experts to use.
AI coding assistants such as Claude Code recommend `pip install` commands without scrutiny, which raised concern because models are trained on data that is months old and cannot know about current package compromises.
Commenters described the ML ecosystem as having far more third-party dependencies than web frontends and as being in an early 'wild west' state regarding security practices; loading a Python pickle file, for example, can execute arbitrary code.
The Lightning-AI team officially responded in the comments, recommending 2.6.1 until 2.6.4 is available and linking to their security advisory.
Open questions remain about how the package was compromised (PR approval path, mirror server hack) and what the stolen AWS credentials are being used for (crypto mining, ransomware, etc.).

How to Apply

If you use the lightning package, check the installed version immediately with `pip show lightning` or `pip list`. If it is 2.6.2 or 2.6.3, downgrade with `pip install lightning==2.6.1`.
If the lightning version in your `requirements.txt` or `pyproject.toml` is not pinned and uses a range such as `>=2.6.0`, your CI/CD pipeline may have installed the compromised version automatically, so re-examine your deployment logs and environments.
To automate dependency security checks for ML projects, add tools such as Semgrep Supply Chain or pip-audit to your CI pipeline so that dependencies with known security advisories are flagged before they are deployed.
If you use the nixpkgs unstable channel and have installed lightning, switch to the nixpkgs stable channel or pin the version manually.
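
The pinning check on `requirements.txt` can be sketched in a few lines of standard-library Python. This is a rough illustration under stated assumptions, not a full requirements parser: the names `check_requirement_line` and `scan_requirements` are hypothetical, only the `lightning` distribution is examined, and anything other than an exact `==` pin (including ranges that happen to exclude the bad versions) is reported as "unpinned" for manual review.

```python
import re

# Versions named as compromised in this article.
COMPROMISED = ("2.6.2", "2.6.3")

# Matches the 'lightning' distribution only; the lookahead keeps names such
# as 'lightning-utilities' from matching. Captures the version specifier up
# to an environment marker (';') or a comment ('#').
REQ_LINE = re.compile(
    r"^\s*lightning(?![A-Za-z0-9._-])\s*(\[[^\]]*\])?\s*(?P<spec>[^;#]*)",
    re.IGNORECASE,
)
# An exact pin such as '==2.6.1' (wildcards like '==2.6.*' do not count).
EXACT_PIN = re.compile(r"==\s*([0-9][0-9A-Za-z.+!-]*)")


def check_requirement_line(line):
    """Classify one requirements.txt line.

    Returns 'ignore' (other package or comment), 'compromised' (exact pin to
    a bad version), 'pinned' (any other exact pin), or 'unpinned'.
    """
    match = REQ_LINE.match(line)
    if not match:
        return "ignore"
    exact = EXACT_PIN.fullmatch(match.group("spec").strip())
    if exact is None:
        return "unpinned"
    return "compromised" if exact.group(1) in COMPROMISED else "pinned"


def scan_requirements(text):
    """Return (line_number, status, line) for every lightning entry."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        status = check_requirement_line(line)
        if status != "ignore":
            findings.append((number, status, line.strip()))
    return findings


if __name__ == "__main__":
    with open("requirements.txt", encoding="utf-8") as handle:
        for number, status, line in scan_requirements(handle.read()):
            print(f"line {number}: {status}: {line}")
```

A CI job could fail the build whenever a finding is "compromised" or "unpinned"; whether to treat safe ranges as failures is a policy choice left to the team.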

Code Example

# Check the currently installed version
pip show lightning

# Downgrade immediately if using the compromised version (2.6.2, 2.6.3)
pip install lightning==2.6.1

# Example of version pinning in requirements.txt
lightning==2.6.1  # Versions 2.6.2 and 2.6.3 were compromised by a supply chain attack

# Check dependencies for vulnerabilities with pip-audit
pip install pip-audit
pip-audit

Terminology

Supply Chain Attack: An attack that compromises software not by directly hacking it, but by compromising its dependencies (libraries or build tools), indirectly affecting all users, similar to poisoning a food supply.

PyPI: The official Python package repository from which packages are downloaded using the `pip install` command.

Obfuscation: A technique for intentionally making code complex and difficult to analyze, often used by malware to evade detection.

Payload: The core part of malware that performs the malicious action. Think of it as the explosive inside a delivery box, not the box itself.

Pickle: A Python serialization format for saving and loading objects to files. Commonly used for ML model deployment, but insecure as it can execute arbitrary code upon loading.

nixpkgs: The package collection used by the NixOS operating system. The unstable channel provides the latest versions but is less thoroughly tested, potentially including compromised versions.