Fix unsafe pickle deserialization and CI workflow unauthorized execution #5236

Open

liu-shaojun wants to merge 2 commits into main from fix/unsafe-pickle-deserialization

Conversation

@liu-shaojun
Contributor

liu-shaojun commented Apr 20, 2026

Description

Security fixes for two vulnerabilities in BigDL: unsafe pickle deserialization across distributed training workers, and unauthorized code execution via CI workflow on self-hosted runner.

1. Why the change?

Vulnerability 1 — Unsafe Pickle Deserialization (CVE-worthy)

All four distributed training backends (MPI, Horovod, PyTorch DDP, TensorFlow) accept CLI-controlled file paths and immediately deserialize pickle files using cloudpickle.load() without any integrity verification. In distributed deployments, a single malicious pickle payload achieves cluster-wide arbitrary code execution.

Affected worker scripts:

  • python/orca/src/bigdl/orca/learn/mpi/mpi_train.py
  • python/nano/src/bigdl/nano/deps/horovod/horovod_worker.py
  • python/nano/src/bigdl/nano/pytorch/strategies/worker.py
  • python/nano/src/bigdl/nano/utils/tf/subprocess_worker.py

Vulnerability 2 — CI Workflow Unauthorized Code Execution

The performance-regression-test.yml workflow allows any GitHub user to trigger code execution on the self-hosted runner (Rohan) by commenting APRT on any PR. No author_association check exists. Additionally, the pull_request trigger lets fork PRs override the workflow definition itself.

2. User API changes

No user-facing API changes. SafePickle.dump() and SafePickle.load() keep the same interface but now always perform HMAC integrity verification. The HMAC key is auto-generated per session and propagated to subprocesses via the BIGDL_SAFE_PICKLE_KEY environment variable.
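The propagation mechanism can be sketched in a few lines of stdlib Python. This is an illustration of the described scheme, not the PR's actual implementation; `ensure_session_key` is a hypothetical helper name, while `BIGDL_SAFE_PICKLE_KEY` is the variable named in this PR:

```python
# Sketch: generate a per-session key once, export it via the environment,
# and let child processes (spawned with the inherited environment) reuse it.
import os
import secrets
import subprocess
import sys

KEY_ENV = "BIGDL_SAFE_PICKLE_KEY"  # variable name from this PR

def ensure_session_key() -> bytes:
    """Return the session HMAC key, generating a 256-bit one on first use."""
    hex_key = os.environ.get(KEY_ENV)
    if hex_key is None:
        hex_key = secrets.token_hex(32)  # 32 random bytes, hex-encoded
        os.environ[KEY_ENV] = hex_key    # inherited by subprocesses
    return bytes.fromhex(hex_key)

# A subprocess launched with the default (inherited) environment sees the
# same key, so it can verify pickles signed by the parent process.
ensure_session_key()
child_key = subprocess.run(
    [sys.executable, "-c", f"import os; print(os.environ['{KEY_ENV}'])"],
    capture_output=True, text=True, check=True,
).stdout.strip()
assert child_key == os.environ[KEY_ENV]
```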

3. Summary of the change

Pickle deserialization fix:

  • Replace all cloudpickle.load()/cloudpickle.dump() calls in worker scripts and their callers with SafePickle.load()/SafePickle.dump()
  • Upgrade SafePickle (in nano, ppml, friesian packages) from hardcoded b'shared-key' + SHA1 to per-session random 256-bit key + HMAC-SHA256
  • dump() now auto-generates .sig sidecar files containing the HMAC digest
  • load() mandates signature verification — rejects unsigned or tampered pickle files
  • Use timing-safe hmac.compare_digest() to prevent timing attacks
  • Propagate HMAC key to MPI workers via mpiexec -genv and to subprocesses via inherited environment
  • SCP .sig files alongside .pkl files for remote MPI nodes
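The dump/load scheme above can be sketched as follows. This is a minimal, self-contained illustration using stdlib `pickle` in place of cloudpickle; the actual SafePickle in safepickle.py may differ in method signatures and key handling:

```python
import hashlib
import hmac
import os
import pickle  # BigDL uses cloudpickle; stdlib pickle keeps this sketch self-contained
import secrets

_KEY_ENV = "BIGDL_SAFE_PICKLE_KEY"

def _session_key() -> bytes:
    # Reuse the session key if present, otherwise generate a random 256-bit one.
    return bytes.fromhex(os.environ.setdefault(_KEY_ENV, secrets.token_hex(32)))

class SafePickle:
    @staticmethod
    def dump(obj, f):
        data = pickle.dumps(obj)
        f.write(data)
        # Write the HMAC-SHA256 digest to a .sig sidecar next to the pickle.
        with open(f.name + ".sig", "wb") as sf:
            sf.write(hmac.new(_session_key(), data, hashlib.sha256).digest())

    @staticmethod
    def load(f):
        data = f.read()
        try:
            with open(f.name + ".sig", "rb") as sf:
                expected = sf.read()
        except FileNotFoundError:
            raise ValueError("missing .sig file: refusing to load unsigned pickle")
        actual = hmac.new(_session_key(), data, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, actual):  # timing-safe comparison
            raise ValueError("HMAC mismatch: pickle file was tampered with")
        return pickle.loads(data)
```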

CI workflow fix:

  • Remove the pull_request trigger that allowed fork PRs to override the workflow definition on the self-hosted runner
  • Add author_association guard (MEMBER, OWNER, or COLLABORATOR) to the issue_comment trigger
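The hardened trigger can be sketched as a GitHub Actions condition like the following (illustrative only; the job name is a placeholder, and the actual workflow file contains more than this fragment):

```yaml
# Sketch: issue_comment-only trigger with an author_association guard.
on:
  issue_comment:
    types: [created]

jobs:
  performance-regression-test:   # placeholder job name
    # Run only when a trusted user comments 'APRT' on a pull request.
    if: >-
      github.event.issue.pull_request &&
      contains(github.event.comment.body, 'APRT') &&
      contains(fromJSON('["MEMBER", "OWNER", "COLLABORATOR"]'),
               github.event.comment.author_association)
    runs-on: [self-hosted]
```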

4. How to test?

  • The pickle deserialization fix can be verified by attempting to load an unsigned or tampered pickle file — SafePickle.load() will raise an error
  • The CI workflow fix can be verified by checking that the if condition now requires author_association membership
  • Unit test
  • Application test
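The rejection behavior boils down to an HMAC comparison, which can be exercised with a standalone stdlib demo (this is not the project's test suite, just an illustration of the check):

```python
import hashlib
import hmac
import os

key = os.urandom(32)            # stand-in for the per-session key
payload = b"pickled-bytes"      # stand-in for serialized model state
sig = hmac.new(key, payload, hashlib.sha256).digest()

# The unmodified payload verifies; a tampered payload does not.
assert hmac.compare_digest(sig, hmac.new(key, payload, hashlib.sha256).digest())
tampered = payload + b"x"
assert not hmac.compare_digest(sig, hmac.new(key, tampered, hashlib.sha256).digest())
```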

5. New dependencies

No new dependencies introduced. All changes use Python standard library modules (hmac, hashlib, os) that were already imported.

liu-shaojun and others added 2 commits April 20, 2026 02:04
Replace raw cloudpickle.load() with HMAC-verified SafePickle across all
four distributed training backends (MPI, Horovod, PyTorch DDP, TensorFlow).

The vulnerability allowed arbitrary code execution via crafted pickle files
passed through CLI-controlled paths. Worker scripts deserialized these files
without integrity verification, enabling cluster-wide RCE in distributed
deployments.

Changes:
- Upgrade SafePickle to use per-session random HMAC-SHA256 keys instead of
  the hardcoded b'shared-key' with SHA1
- SafePickle.dump() now auto-generates .sig sidecar files for integrity
- SafePickle.load() mandates signature verification, rejecting unsigned or
  tampered pickle files
- Use timing-safe hmac.compare_digest() to prevent timing attacks
- Propagate HMAC key to MPI workers via mpiexec -genv and to subprocesses
  via inherited environment
- SCP .sig files alongside .pkl files for remote MPI nodes

Affected files:
- mpi_train.py, mpi_estimator.py, mpi_runner.py (MPI training)
- horovod_worker.py, multiprocs_backend.py (Horovod)
- worker.py, ddp_subprocess.py (PyTorch DDP)
- subprocess_worker.py, backend.py (TensorFlow)
- safepickle.py in nano, ppml, and friesian packages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…unner

The performance-regression-test workflow had two vulnerabilities that
allowed any GitHub user to execute arbitrary code on Intel's self-hosted
CI runner (Rohan):

1. The issue_comment trigger had no author_association check, so anyone
   could comment 'APRT' on any PR to trigger benchmark execution of
   fork PR code on the self-hosted runner.

2. The pull_request trigger caused the workflow to run the PR author's
   version of the workflow file on the self-hosted runner, giving full
   control over the execution environment.

Fix:
- Remove the pull_request trigger entirely — it allowed fork PRs to
  override the workflow definition itself
- Add author_association guard requiring MEMBER, OWNER, or COLLABORATOR
  status to trigger via issue_comment
- Simplify the job-level condition to only handle the comment-based flow

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
liu-shaojun requested a review from glorysdj as a code owner April 20, 2026 04:29
liu-shaojun changed the title from "Fix/unsafe pickle deserialization" to "Fix unsafe pickle deserialization and CI workflow unauthorized execution" Apr 20, 2026