Docker 101
Table of Contents
- Why Docker for Data Science?
- Core Concepts
- Installation & Setup
- Dockerfile for ML Projects
- Working with Images & Containers
- Data & Model Management
- GPU Support (CUDA)
- Multi-Stage Builds
- Docker Compose for ML Services
- Model Serving with Docker
- Private Registries & Sharing
- Best Practices
- Common Pitfalls
- Cheat Sheet
1. Why Docker for Data Science?
The Problem
Every data scientist knows this workflow:
# New laptop / colleague's machine / cloud VM
git clone https://github.com/team/project.git
pip install -r requirements.txt
# ... next morning: 47 conflicts, 3 broken packages, CUDA mismatch
You’re fighting environment drift: differing Python versions, CUDA toolkits, system libraries, incompatible transitive dependencies, and OS differences.
The Solution
Docker packages your entire environment — OS, system libraries, Python version, CUDA, pip packages, and code — into a single immutable unit called an image. Anyone can spin up an identical copy (a container) on any machine.
Concrete Benefits for AI/ML
| Problem | Docker Solution |
|---|---|
| “It works on my machine” | Same image → identical behavior everywhere |
| CUDA/cuDNN version hell | Pre-built CUDA images with pinned versions |
| Jupyter setup per project | One-liner docker run -p 8888:8888 |
| Model serving dependencies | Immutable deployment artifact |
| Team onboarding | Single docker pull instead of hours of setup |
| CI/CD for ML | Test and deploy in the exact same environment |
| Reproducible research | Dockerfile + requirements.txt = executable paper |
2. Core Concepts
Images vs Containers
┌─────────────────────────────────────────────┐
│                   DOCKER                    │
│                                             │
│   ┌──────────────┐      ┌──────────────┐    │
│   │    IMAGE     │ run  │  CONTAINER   │    │
│   │ (blueprint)  │─────▶│  (running)   │    │
│   │              │      │              │    │
│   │  Read-only   │      │  Read-write  │    │
│   │  filesystem  │      │  layer on    │    │
│   │  + metadata  │      │  top of img  │    │
│   └──────────────┘      └──────────────┘    │
│          ▲                     │            │
│          │ build               │ commit     │
│          │                     ▼            │
│   ┌──────────────┐      ┌──────────────┐    │
│   │  Dockerfile  │      │  New Image   │    │
│   │   (recipe)   │      │ (persisted)  │    │
│   └──────────────┘      └──────────────┘    │
└─────────────────────────────────────────────┘
- Image: Immutable snapshot (≈ 1-10 GB). Think of it as a VM template.
- Container: A running instance of an image. You can have many containers from one image.
- Dockerfile: The recipe to build an image. Text file with instructions.
- Registry: Storage for images. Docker Hub is the default public registry.
- Volume: Persistent data storage that outlives containers.
- Layer: Each filesystem-changing instruction in a Dockerfile (RUN, COPY, ADD) creates a layer, and Docker caches it. Unchanged layers are reused, which makes rebuilds fast.
The Layering System
FROM python:3.11-slim # Layer 1: ~120 MB (cached)
RUN apt-get update && apt-get install -y ... # Layer 2: ~50 MB (cached)
COPY requirements.txt . # Layer 3: ~1 KB (cached if file unchanged)
RUN pip install -r req... # Layer 4: ~200 MB (cached if req.txt unchanged)
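COPY . /app # Layer 5: ~5 MB (invalidated on code change)
Docker caches each layer. If you change only code (Layer 5), only Layer 5 rebuilds. If you add a dependency, Layers 3 onward rebuild. This keeps development iteration fast. The trailing # annotations above are for illustration only: in a real Dockerfile, a comment must sit on its own line.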
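The invalidation rule can be sketched as a small simulation: the first step whose inputs changed is rebuilt, and so is every step after it. This is a simplified model rather than Docker's actual cache logic; the `Step` class, the `rebuilt_steps` helper, and the file names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One Dockerfile instruction plus the build-context files it reads (simplified)."""
    instruction: str
    inputs: frozenset = frozenset()


def rebuilt_steps(steps, changed_files):
    """Return the 1-based numbers of steps that must rebuild.

    Once one step's inputs have changed, that step and all later
    steps are rebuilt, mirroring the layer-cache rule described above.
    """
    changed = set(changed_files)
    rebuild = []
    invalidated = False
    for number, step in enumerate(steps, start=1):
        if not invalidated and step.inputs & changed:
            invalidated = True
        if invalidated:
            rebuild.append(number)
    return rebuild


# The five layers from the example above; "train.py" stands in for project code.
STEPS = [
    Step("FROM python:3.11-slim"),
    Step("RUN apt-get update && apt-get install -y ..."),
    Step("COPY requirements.txt .", frozenset({"requirements.txt"})),
    Step("RUN pip install -r requirements.txt"),
    Step("COPY . /app", frozenset({"requirements.txt", "train.py"})),
]

print(rebuilt_steps(STEPS, ["train.py"]))          # [5]
print(rebuilt_steps(STEPS, ["requirements.txt"]))  # [3, 4, 5]
```

Changing only code touches the last layer, while changing `requirements.txt` invalidates the dependency install as well, which is why the dependency file is copied before the rest of the source.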
3. Installation & Setup
macOS
# Download Docker Desktop from https://www.docker.com/products/docker-desktop/
# Or use Homebrew:
brew install --cask docker
# Open Docker.app to start the daemon
Linux
# Ubuntu / Debian
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER # Run docker without sudo (log out & back in)
newgrp docker # Or just use this to refresh group
# Verify
docker --version
docker run hello-world
Windows
Use Docker Desktop with WSL 2 backend. Install WSL 2 first, then Docker Desktop.
Post-Installation Check
docker info # System-wide info
docker version # Client + server versions
docker run --rm hello-world # Verify it works
NVIDIA Container Toolkit (GPU support)
# For Debian/Ubuntu hosts with Docker and an NVIDIA driver already installed
# (apt-key is deprecated on recent releases; check NVIDIA's install guide if the key step fails):
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
# Verify GPU access
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
4. Dockerfile for ML Projects
Minimal Python ML Dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "train.py"]
Build: docker build -t my-ml-project .
Production-Grade ML Dockerfile
# ============================================================
# Production ML Dockerfile with best practices
# ============================================================
# Use specific version tags — never "latest"
FROM python:3.11-slim AS base
# Prevent Python from writing .pyc files
ENV PYTHONDONTWRITEBYTECODE=1
# Ensure Python output is sent straight to terminal (no buffering)
ENV PYTHONUNBUFFERED=1
# Set working directory
WORKDIR /app
# Install system deps needed by many ML libraries
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
libgomp1 \
curl \
&& rm -rf /var/lib/apt/lists/*
# Copy only what's needed for pip install (layer caching optimization)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy the rest of the application
COPY . .
# Non-root user for security
RUN useradd -m -u 1000 appuser && chown -R appuser:appuser /app
USER appuser
# Metadata
LABEL org.opencontainers.image.source="https://github.com/org/repo"
LABEL org.opencontainers.image.description="ML training pipeline"
# Default command (overridable at runtime)
CMD ["python", "train.py"]
requirements.txt for ML
torch==2.1.0
transformers==4.35.0
datasets==2.14.5
scikit-learn==1.3.2
pandas==2.1.3
numpy==1.24.3
matplotlib==3.8.0
tqdm==4.66.1
wandb==0.15.11
Jupyter Dockerfile
FROM python:3.11-slim
WORKDIR /home/jovyan
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
&& rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt jupyter
EXPOSE 8888
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]
# Run Jupyter with port mapping and volume for persistence
docker build -t ml-jupyter .
docker run -it --rm \
-p 8888:8888 \
-v "$(pwd)/notebooks:/home/jovyan/notebooks" \
ml-jupyter
5. Working with Images & Containers
Building Images
# Basic build
docker build -t my-ml-image .
# Build with tag
docker build -t my-ml-image:v1.0 .
# Build with no cache (fresh install)
docker build --no-cache -t my-ml-image .
# Build from different Dockerfile
docker build -f Dockerfile.gpu -t my-ml-image:gpu .
# Build with build args
docker build --build-arg CUDA_VERSION=12.1 -t my-ml-image .
Running Containers
# Basic run (foreground)
docker run my-ml-image
# Interactive shell
docker run -it my-ml-image /bin/bash
# Run in background (detached)
docker run -d --name ml-training my-ml-image
# Port mapping (host:container)
docker run -p 8888:8888 my-jupyter-image
# Mount a volume (host:container)
docker run -v /path/to/data:/app/data my-ml-image
# Mount with read-only
docker run -v /path/to/data:/app/data:ro my-ml-image
# Set environment variables
docker run -e WANDB_API_KEY=xxx -e CUDA_VISIBLE_DEVICES=0 my-ml-image
# Resource limits
docker run --memory=8g --cpus=4 my-ml-image
# Remove container after it exits (cleanup)
docker run --rm my-ml-image
# GPU access
docker run --gpus all my-ml-image
# All together (typical ML run)
docker run --rm --gpus all \
-v /mnt/data:/data \
-v $(pwd)/checkpoints:/app/checkpoints \
-e WANDB_API_KEY=$WANDB_API_KEY \
-e CUDA_VISIBLE_DEVICES=0,1 \
--memory=32g --cpus=8 \
my-ml-image:latest python train.py --config configs/exp1.yaml
Managing Containers & Images
# List running containers
docker ps
# List all containers (including stopped)
docker ps -a
# List images
docker images
# Stop a container
docker stop <container_id_or_name>
# Remove a container
docker rm <container_id_or_name>
# Remove an image
docker rmi <image_id_or_name>
# Remove unused images, containers, networks (cleanup)
docker system prune -a
# View container logs
docker logs -f <container_name> # -f follows output
# Execute command in running container
docker exec -it <container_name> /bin/bash
# Copy files from container
docker cp <container_name>:/app/outputs ./local_outputs
# Copy files into container
docker cp ./config.yaml <container_name>:/app/config.yaml
# Inspect container metadata
docker inspect <container_name>
# View resource usage
docker stats
6. Data & Model Management
The Volume Pattern
Containers are ephemeral. When they’re deleted, their filesystem disappears. Use volumes or bind mounts for:
- Input datasets (read-only)
- Model checkpoints / outputs (read-write)
- Configuration files
- Cached Hugging Face datasets and models
# Bind mount (host directory → container path)
docker run -v /absolute/host/path:/container/path ...
# Named volume (managed by Docker)
docker volume create ml-data
docker run -v ml-data:/app/data ...
# Anonymous volume (auto-created, auto-deleted with --rm)
docker run -v /app/data ...
Common Volume Layout for ML Projects
project/
├── data/ # input data → mounted read-only
│ ├── raw/
│ └── processed/
├── checkpoints/ # model weights → mounted read-write
├── configs/ # config files → mounted or baked in
├── outputs/ # logs, metrics, predictions
├── Dockerfile
└── train.py
docker run --rm --gpus all \
-v /mnt/data/datasets:/data:ro \
-v $(pwd)/checkpoints:/app/checkpoints \
-v $(pwd)/configs:/app/configs:ro \
-v $(pwd)/outputs:/app/outputs \
my-training-image \
python train.py --config /app/configs/exp1.yaml
Hugging Face Cache Volumes
# HF datasets and models can be gigabytes — cache them permanently
docker run --rm \
-v huggingface-cache:/root/.cache/huggingface \
-v $(pwd)/data:/data \
my-hf-image
Temporary Data with tmpfs
For data that doesn’t need persistence (intermediate artifacts), use tmpfs — stored in RAM, super fast:
docker run --rm --gpus all \
--tmpfs /app/tmp:size=4G \
my-training-image
7. GPU Support (CUDA)
Choosing a Base Image
NVIDIA provides official CUDA base images. Pick the right one for your ML framework:
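# Alternatives: pick one FROM per Dockerfile
# PyTorch: the official image already bundles the CUDA runtime and cuDNN
FROM pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
# Or build on NVIDIA's official image (it ships without Python, so install it first)
FROM nvidia/cuda:12.2.0-cudnn8-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y --no-install-recommends python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*
RUN python3 -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# TensorFlow
FROM tensorflow/tensorflow:2.13.0-gpu
# JAX (same NVIDIA base, so Python must be installed here too)
FROM nvidia/cuda:12.2.0-cudnn8-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y --no-install-recommends python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*
RUN python3 -m pip install "jax[cuda12]"
Dockerfile with GPU Support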
FROM pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
WORKDIR /app
# Install additional packages
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "train.py"]
Running GPU Containers
# Single GPU (index 0)
docker run --gpus '"device=0"' pytorch-image nvidia-smi
# Specific GPUs (by index)
docker run --gpus '"device=0,1"' pytorch-image python train.py
# All GPUs
docker run --gpus all pytorch-image python train.py
Verify GPU Inside Container
# Inside container: check_gpu.py
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU count: {torch.cuda.device_count()}")
print(f"GPU name: {torch.cuda.get_device_name(0)}")
docker run --rm --gpus all pytorch-image python check_gpu.py
# Output: CUDA available: True
# GPU count: 2
# GPU name: Tesla V100-SXM2-32GB
Common CUDA Issues
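| Problem | Cause | Fix |
|---|---|---|
| CUDA error: no kernel image is available | Framework build has no kernels for this GPU's architecture (often after a CUDA/framework version mismatch) | Use an image whose framework and CUDA versions support your GPU |
| libcuda.so not found | --gpus all flag missing | Add --gpus all |
| CUDA driver version insufficient | Driver too old for image CUDA | Upgrade driver or downgrade image |
| CUDA out of memory | Model or batch does not fit in GPU memory (--memory caps host RAM only) | Reduce batch size, or select a GPU with free memory via --gpus or CUDA_VISIBLE_DEVICES |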
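When triaging the issues in this table, it can help to have a check script that reports what it finds instead of raising, since `torch.cuda.get_device_name(0)` fails when no GPU is visible. The following is a minimal sketch of such a variant; the `gpu_report` function name and the dictionary keys are hypothetical.

```python
import importlib.util


def gpu_report():
    """Summarize GPU visibility without raising when torch or a GPU is missing."""
    if importlib.util.find_spec("torch") is None:
        return {
            "torch_installed": False,
            "cuda_available": False,
            "cuda_build": None,
            "devices": [],
        }

    import torch

    available = torch.cuda.is_available()
    # Only enumerate devices when CUDA is usable, so no call below can raise.
    count = torch.cuda.device_count() if available else 0
    return {
        "torch_installed": True,
        "cuda_available": available,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds of torch
        "devices": [torch.cuda.get_device_name(i) for i in range(count)],
    }


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

If `cuda_build` is set but `cuda_available` is false, the container most likely was started without `--gpus`, or the host driver is too old for the image's CUDA version.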
8. Multi-Stage Builds
Keep final images small by separating build-time and runtime dependencies.
ML Training Image (Smaller Final Image)
# ============================================================
# Stage 1: Builder — installs all build dependencies
# ============================================================
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
# Install build tools, compile packages
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
&& pip install --no-cache-dir --user -r requirements.txt
# ============================================================
# Stage 2: Runtime — minimal image with only what's needed
# ============================================================
FROM python:3.11-slim AS runtime
WORKDIR /app
# Copy only installed Python packages from builder
COPY --from=builder /root/.local /root/.local
ENV PATH=/root/.local/bin:$PATH
# Optional: install runtime-only system packages (before copying code, so this layer stays cached)
RUN apt-get update && apt-get install -y --no-install-recommends \
    libgomp1 \
    && rm -rf /var/lib/apt/lists/*
# Copy application code
COPY . .
CMD ["python", "train.py"]
Model Serving Image (Ultra-Small)
# ============================================================
# Stage 1: Build model artifacts
# ============================================================
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --user -r requirements.txt
# Optional: generate or compile model artifacts
COPY scripts/optimize_model.py .
RUN python optimize_model.py --output /app/optimized_model
# ============================================================
# Stage 2: Production serving
# ============================================================
FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /root/.local /root/.local
ENV PATH=/root/.local/bin:$PATH
# Copy only what's needed for serving
COPY --from=builder /app/optimized_model ./model
COPY serve.py .
EXPOSE 8000
CMD ["python", "serve.py"]
9. Docker Compose for ML Services
Docker Compose orchestrates multi-container setups. Perfect for ML pipelines that need:
- Training service (GPU)
- Database (PostgreSQL for experiment tracking)
- Model registry (MinIO for artifact storage)
- Monitoring (Grafana + Prometheus)
- Jupyter (for exploration)
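Compose starts services in dependency order, derived from each service's `depends_on` entries. A minimal sketch of that ordering follows, assuming the dependency edges used in this guide's Compose file (trainer depends on mlflow, api depends on trainer). The `start_order` helper is hypothetical and only illustrates the idea; it is not how Compose is implemented.

```python
from graphlib import TopologicalSorter

# Service name -> set of services it depends on (assumed from this guide's example).
DEPENDS_ON = {
    "trainer": {"mlflow"},
    "mlflow": set(),
    "api": {"trainer"},
    "jupyter": set(),
    "minio": set(),
}


def start_order(depends_on):
    """Return one valid start order in which dependencies precede dependents."""
    return list(TopologicalSorter(depends_on).static_order())


order = start_order(DEPENDS_ON)
print(order)
```

Services without dependencies can appear anywhere in the result; the only guarantee is that mlflow comes before trainer and trainer before api.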
docker-compose.yml for ML Pipeline
version: "3.8"
services:
  # ---- ML Training Service ----
  trainer:
    build:
      context: .
      dockerfile: Dockerfile.gpu
    image: ml-trainer:latest
    runtime: nvidia # GPU access
    environment:
      - CUDA_VISIBLE_DEVICES=0
      - WANDB_API_KEY=${WANDB_API_KEY}
      - MLFLOW_TRACKING_URI=http://mlflow:5000
    volumes:
      - ./data:/data:ro
      - ./checkpoints:/app/checkpoints
      - ./configs:/app/configs:ro
      - ./outputs:/app/outputs
    depends_on:
      - mlflow
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
  # ---- MLflow Tracking Server ----
  mlflow:
    image: ghcr.io/mlflow/mlflow:v2.7.1
    ports:
      - "5000:5000"
    volumes:
      - mlflow-artifacts:/mlflow
    # Four slashes make the SQLite path absolute (/mlflow/mlflow.db on the volume)
    command: >
      mlflow server
      --host 0.0.0.0
      --backend-store-uri sqlite:////mlflow/mlflow.db
      --default-artifact-root /mlflow/artifacts
  # ---- Model Serving API ----
  api:
    build:
      context: .
      dockerfile: Dockerfile.serve
    image: ml-api:latest
    ports:
      - "8000-8002:8000" # Host port range: each replica needs its own host port
    volumes:
      - ./checkpoints:/app/checkpoints:ro
    environment:
      - MODEL_PATH=/app/checkpoints/best.pt
    depends_on:
      trainer:
        condition: service_completed_successfully
    deploy:
      replicas: 3 # Scale horizontally
  # ---- Jupyter Notebook ----
  jupyter:
    image: jupyter/datascience-notebook:latest
    ports:
      - "8888:8888"
    volumes:
      - ./notebooks:/home/jovyan/work
      - ./data:/home/jovyan/data:ro
    environment:
      - JUPYTER_TOKEN=changeme
  # ---- MinIO (S3-compatible storage for artifacts) ----
  minio:
    image: minio/minio:latest
    ports:
      - "9000:9000"
      - "9001:9001"
    volumes:
      - minio-data:/data
    command: server /data --console-address ":9001"
volumes:
  mlflow-artifacts:
  minio-data:
Running Compose
# Start all services
docker compose up -d
# Start only specific services
docker compose up -d trainer mlflow
# Rebuild and start
docker compose up --build -d
# View logs
docker compose logs -f trainer
# Stop all
docker compose down
# Stop and remove volumes (careful!)
docker compose down -v
# Scale a service
docker compose up -d --scale api=3
Compose for Development
# docker-compose.dev.yml: overrides for development
version: "3.8"
services:
  trainer:
    build:
      context: .
      dockerfile: Dockerfile.dev
    volumes:
      - .:/app # Hot-reload: mount source code
    command: python -m debugpy --listen 0.0.0.0:5678 train.py
    ports:
      - "5678:5678" # Debugger port
# Combine configs
docker compose -f docker-compose.yml -f docker-compose.dev.yml up -d
10. Model Serving with Docker
FastAPI Model Server
FROM python:3.11-slim
WORKDIR /app
COPY requirements-serve.txt .
RUN pip install --no-cache-dir -r requirements-serve.txt
COPY model.py .
COPY serve.py .
EXPOSE 8000
# Use gunicorn + uvicorn for production
CMD ["gunicorn", "-k", "uvicorn.workers.UvicornWorker", "serve:app", "--bind", "0.0.0.0:8000", "--workers", "4"]
# serve.py
from fastapi import FastAPI
from pydantic import BaseModel
import torch
from model import load_model

app = FastAPI()
model = load_model()


class PredictRequest(BaseModel):
    text: str


class PredictResponse(BaseModel):
    label: str
    confidence: float


@app.post("/predict", response_model=PredictResponse)
async def predict(req: PredictRequest):
    result = model.predict(req.text)
    return PredictResponse(label=result["label"], confidence=result["confidence"])


@app.get("/health")
async def health():
    return {"status": "ok"}
# Build and run
docker build -t model-api -f Dockerfile.serve .
docker run --rm -p 8000:8000 model-api
# Test
curl -X POST http://localhost:8000/predict \
-H "Content-Type: application/json" \
-d '{"text": "This movie was amazing!"}'
ONNX Runtime GPU Serving
FROM mcr.microsoft.com/onnxruntime/server:latest-gpu
WORKDIR /app
COPY model.onnx .
COPY serve.py .
EXPOSE 8001
CMD ["python", "serve.py"]
Triton Inference Server
For large-scale deployments, NVIDIA Triton supports multiple frameworks, dynamic batching, and model ensembles:
# Pull Triton image
docker pull nvcr.io/nvidia/tritonserver:23.10-py3
# Run with model repository
docker run --rm --gpus all \
-p 8000:8000 -p 8001:8001 -p 8002:8002 \
-v /path/to/model_repo:/models \
nvcr.io/nvidia/tritonserver:23.10-py3 \
tritonserver --model-repository=/models
11. Private Registries & Sharing
Pushing to Docker Hub
docker login -u yourusername
docker tag my-ml-image:latest yourusername/my-ml-image:v1.0
docker push yourusername/my-ml-image:v1.0
Pushing to a Private Registry
docker login registry.example.com
docker tag my-ml-image:latest registry.example.com/team/my-ml-image:v1.0
docker push registry.example.com/team/my-ml-image:v1.0
12. Best Practices
Dockerfile Best Practices
- Pin all versions: never use latest
# Bad
FROM python:latest
RUN pip install torch
# Good
FROM python:3.11-slim
RUN pip install torch==2.1.0
- Optimize layer caching: copy requirements.txt before source code
COPY requirements.txt .
RUN pip install -r requirements.txt
# Code changes don't invalidate the pip layer
COPY . .
- Minimize layers: combine related RUN commands
RUN apt-get update && apt-get install -y \
    pkg1 pkg2 \
    && rm -rf /var/lib/apt/lists/*
- Use .dockerignore to exclude unnecessary files (comments must be on their own line)
__pycache__/
.git/
.env
*.pyc
.ipynb_checkpoints/
# Don't copy large datasets into images
data/
# Don't copy model weights into images
checkpoints/
*.tar.gz
notebooks/
- Run as non-root user for security
RUN useradd -m -u 1000 appuser
USER appuser
- Keep images small: use -slim variants and multi-stage builds
docker images | grep my-ml
# python:3.11 → ~900 MB
# python:3.11-slim → ~120 MB
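The pinning rule lends itself to a quick automated check. The following is a rough sketch, not a complete Dockerfile parser: it assumes one instruction per line, does not recognize references to earlier build stages (such as `FROM builder`), and only looks at plain `pip install` lines. The names `image_tag`, `unpinned_pip_packages`, and `find_unpinned` are hypothetical.

```python
# pip options whose following token is a value, not a package name
VALUE_OPTIONS = {
    "-r", "--requirement", "-c", "--constraint",
    "-i", "--index-url", "--extra-index-url", "-f", "--find-links",
}


def image_tag(image):
    """Return the tag (or digest) of an image reference, or None if it has neither."""
    if "@" in image:  # digest-pinned, e.g. name@sha256:...
        return image.split("@", 1)[1]
    last = image.rsplit("/", 1)[-1]  # ignore ':' inside a registry host:port
    return last.split(":", 1)[1] if ":" in last else None


def unpinned_pip_packages(line):
    """Return package names on a `pip install` line that lack an `==` pin."""
    tokens = line.split()
    if "pip" not in tokens or "install" not in tokens:
        return []
    packages, skip_next = [], False
    for token in tokens[tokens.index("install") + 1:]:
        if token in ("&&", ";", "\\"):
            break
        if skip_next:
            skip_next = False
        elif token in VALUE_OPTIONS:
            skip_next = True
        elif not token.startswith("-") and "==" not in token:
            packages.append(token)
    return packages


def find_unpinned(dockerfile_text):
    """Return (line_number, message) pairs for unpinned images and pip packages."""
    problems = []
    for number, line in enumerate(dockerfile_text.splitlines(), start=1):
        tokens = line.split()
        if not tokens or tokens[0].startswith("#"):
            continue
        if tokens[0].upper() == "FROM":
            image = next((t for t in tokens[1:] if not t.startswith("--")), None)
            if image and image_tag(image) in (None, "latest"):
                problems.append((number, f"base image {image!r} is not pinned"))
        for package in unpinned_pip_packages(line):
            problems.append((number, f"pip package {package!r} is not pinned"))
    return problems


BAD = "FROM python:latest\nRUN pip install torch\n"
GOOD = "FROM python:3.11-slim\nRUN pip install torch==2.1.0\n"

print(find_unpinned(BAD))
print(find_unpinned(GOOD))  # []
```

Run against the Bad and Good snippets from the first practice, the check flags both lines of the former and nothing in the latter.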
Runtime Best Practices
- Always use --rm for disposable containers to avoid accumulation
- Mount data as volumes: never copy datasets into images
- Use environment variables for secrets (API keys, etc.)
- Set resource limits — especially for shared GPU servers
- Log to stdout/stderr — Docker captures these automatically
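These runtime practices can be collected into a small launcher that assembles the `docker run` arguments in one place. The following is a minimal sketch; the `docker_run_args` helper and the `/mnt/checkpoints` path are hypothetical, and the function only builds the argument list rather than starting a container.

```python
import shlex


def docker_run_args(image, command=(), *, mounts=(), env_names=(),
                    memory=None, cpus=None, gpus=None):
    """Build an argv list for `docker run` that follows the practices above."""
    args = ["docker", "run", "--rm"]  # disposable container
    if gpus:
        args += ["--gpus", gpus]
    if memory:
        args += ["--memory", memory]  # host RAM limit
    if cpus:
        args += ["--cpus", str(cpus)]
    for host_path, container_path, read_only in mounts:
        spec = f"{host_path}:{container_path}" + (":ro" if read_only else "")
        args += ["-v", spec]
    for name in env_names:
        # `-e NAME` without a value forwards NAME from the caller's environment,
        # so the secret never appears in the command line itself.
        args += ["-e", name]
    return args + [image, *command]


argv = docker_run_args(
    "my-ml-image",
    ["python", "train.py"],
    mounts=[("/mnt/data", "/data", True),
            ("/mnt/checkpoints", "/app/checkpoints", False)],
    env_names=["WANDB_API_KEY"],
    memory="32g",
    cpus=8,
    gpus="all",
)
print(shlex.join(argv))
# docker run --rm --gpus all --memory 32g --cpus 8 -v /mnt/data:/data:ro -v /mnt/checkpoints:/app/checkpoints -e WANDB_API_KEY my-ml-image python train.py
```

The resulting list can be passed to `subprocess.run(argv, check=True)` on a machine with Docker installed.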
Security Best Practices
- Never bake secrets into images
# Bad: secret baked into image
ENV WANDB_API_KEY=abc123
# Good: passed at runtime
docker run -e WANDB_API_KEY=$WANDB_API_KEY ...
- Use Docker Scout or Trivy to scan images for vulnerabilities
docker scout quickview my-ml-image
docker scout recommendations my-ml-image
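Baked-in secrets are easy to miss in review, so a simple pre-build check can help. The following is a heuristic sketch only: it flags ENV or ARG lines that assign a value to a variable whose name looks secret-like, and it will produce false positives and negatives. The `baked_secrets` function and both patterns are hypothetical, and a dedicated scanner remains the better tool.

```python
import re

# Variable names that suggest a credential (heuristic, not exhaustive)
SECRET_NAME = re.compile(r"(KEY|TOKEN|SECRET|PASSWORD)", re.IGNORECASE)
# ENV/ARG followed by a name and an assigned value ("NAME=value" or "NAME value")
ASSIGNMENT = re.compile(r"^(ENV|ARG)\s+(\w+)[=\s]\s*(\S+)", re.IGNORECASE)


def baked_secrets(dockerfile_text):
    """Return (line_number, variable_name) for ENV/ARG lines that set a secret-looking value."""
    hits = []
    for number, line in enumerate(dockerfile_text.splitlines(), start=1):
        match = ASSIGNMENT.match(line.strip())
        if match and SECRET_NAME.search(match.group(2)):
            hits.append((number, match.group(2)))
    return hits


sample = (
    "FROM python:3.11-slim\n"
    "ENV PYTHONUNBUFFERED=1\n"
    "ENV WANDB_API_KEY=abc123\n"
)
print(baked_secrets(sample))  # [(3, 'WANDB_API_KEY')]
```

A declaration without a value, such as `ARG WANDB_API_KEY`, is not flagged by this sketch, although build arguments can still end up in image history and are a poor channel for secrets.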
13. Common Pitfalls
Disk Space Bloat
# Check disk usage
docker system df
# Clean unused (--volumes also deletes unused volumes, so check for data you need first)
docker system prune -a --volumes
Prevention: Use .dockerignore, --no-cache-dir in pip, multi-stage builds.
Permissions Issues with Volumes
Symptom: Files created by container owned by root, can’t delete them from host.
Fix: Match UID inside container to host user.
ARG UID=1000
RUN useradd -m -u $UID appuser
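USER appuser
docker build --build-arg UID=$(id -u) -t my-ml-image .
Network Performance
Issue: On the default bridge network, containers cannot resolve each other by name.
Fix: Use a user-defined network (Docker Compose creates one automatically) so containers can reach each other by service name. For throughput-sensitive workloads on Linux, --network host bypasses the bridge NAT.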
Container Name Conflicts
# Error: conflict — container name already in use
docker run --name trainer my-ml-image
# Fix: remove existing or use --rm
docker rm trainer
# or
docker run --rm --name trainer my-ml-image
14. Cheat Sheet
Quick Reference
# Build
docker build -t name:tag . # Build image
docker build --no-cache -t name:tag . # Fresh build
docker build --build-arg VAR=val -t name . # Build with args
# Run
docker run image # Run in foreground
docker run -d --name name image # Run in background
docker run --rm image # Auto-remove on exit
docker run -it image /bin/bash # Interactive shell
docker run -p 8080:80 image # Port mapping
docker run -v /host:/container image # Volume mount
docker run --gpus all image # GPU access
docker run -e KEY=val image # Environment var
# Manage
docker ps # Running containers
docker ps -a # All containers
docker images # List images
docker stop name # Stop container
docker rm name # Remove container
docker rmi image # Remove image
docker logs -f name # Follow logs
docker exec -it name /bin/bash # Enter running container
docker cp name:/path ./ # Copy from container
docker system prune -a # Clean everything
# Compose
docker compose up -d # Start services
docker compose down # Stop services
docker compose logs -f service # Follow service logs
docker compose up --build -d # Rebuild and start
docker compose -f prod.yml up -d # Use alternate compose file
# GPU
docker run --gpus all image nvidia-smi # Verify GPU
docker run --gpus '"device=0"' image # Specific GPU
# Registry
docker login # Log in to registry
docker tag src dest # Tag image
docker push image # Push to registry
docker pull image # Pull from registry
docker save image | gzip > file.tar.gz # Export image
gunzip -c file.tar.gz | docker load # Import image
# Info
docker version # Version info
docker info # System info
docker stats # Live resource usage
docker inspect name # Container details
docker system df # Disk usage
Common ML Image References
| Image | Size | Use Case |
|---|---|---|
| python:3.11-slim | ~120 MB | Minimal base |
| pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime | ~7 GB | PyTorch GPU training |
| tensorflow/tensorflow:2.13.0-gpu | ~4 GB | TensorFlow GPU training |
| nvidia/cuda:12.2.0-cudnn8-runtime-ubuntu22.04 | ~2 GB | Custom CUDA setup |
| jupyter/datascience-notebook:latest | ~3 GB | Jupyter for data science |
| nvcr.io/nvidia/tritonserver:23.10-py3 | ~8 GB | Production model serving |
| mcr.microsoft.com/onnxruntime/server:latest-gpu | ~2 GB | ONNX inference |
Quick Start Template
# 1. Create project structure
mkdir ml-docker-project && cd ml-docker-project
mkdir data checkpoints configs outputs notebooks
# 2. Create requirements.txt
cat > requirements.txt << 'EOF'
torch==2.1.0
transformers==4.35.0
scikit-learn==1.3.2
pandas==2.1.3
numpy==1.24.3
tqdm==4.66.1
EOF
# 3. Create Dockerfile
cat > Dockerfile << 'EOF'
FROM pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "train.py"]
EOF
# 4. Create .dockerignore
cat > .dockerignore << 'EOF'
__pycache__/
.git/
*.pyc
data/
checkpoints/
notebooks/
EOF
# 5. Build (add your train.py to the project first)
docker build -t ml-project .
# 6. Run with GPU and data mounts
docker run --rm --gpus all \
-v $(pwd)/data:/data:ro \
-v $(pwd)/checkpoints:/app/checkpoints \
-v $(pwd)/configs:/app/configs:ro \
ml-project
Pro Tip: Commit your Dockerfile, docker-compose.yml, and .dockerignore to version control alongside your code. This makes your entire AI project reproducible with a single command: docker compose up --build.