15 Jan 2025

Kubernetes Rancher magic pod YAML config to avoid shared memory crashes

I had exotic "not enough shared memory" crashes. Thanks to GC for giving me these lines, which I do not yet understand but which seem to work. I'll dig into why later (TODO).

apiVersion: v1
kind: Pod
metadata:
  name: CHANGEME
  namespace: CHANGEME-ns
spec:
  restartPolicy: Never
  containers:
    - name: sh-temp-yolo-container-3
      image: ultralytics/ultralytics:latest
      command: ["/bin/sh", "-c"]
      args: 
        - "yolo detect train model=yolo11s.pt data=/data/data/data.yaml project=/data/project/ epochs=30 imgsz=640 device=0,1"
      resources:
        requests:
          nvidia.com/gpu: "2" # GPUs for each training run
          ephemeral-storage: "12Gi"
        limits:
          nvidia.com/gpu: "2" # same as requests nvidia.com/gpu
          ephemeral-storage: "14Gi"
      volumeMounts: # Mount the persistent volume
        - name: data
          mountPath: /data
        - name: shared-memory
          mountPath: /dev/shm # replaces the container's default /dev/shm
  volumes:
    - name: shared-memory
      emptyDir:
        medium: Memory # RAM-backed (tmpfs) volume
    - name: data
      persistentVolumeClaim:
        claimName: sh-temp-yolo-pvc

The key parts: set both requests AND limits, and mount shared memory in both volumeMounts and volumes.
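
A likely explanation, stated as an assumption rather than something verified in this setup: container runtimes typically give /dev/shm only a small default size (commonly 64 MB), and PyTorch data loader workers pass batches through shared memory, so mounting a RAM-backed emptyDir over /dev/shm lifts that cap. As a sketch of a possible variation, the same volume can be capped with the `sizeLimit` field of `emptyDir`. The `8Gi` value below is hypothetical and not from the original config.

```yaml
# Fragment of the pod spec above: only the volumes section changes.
spec:
  volumes:
    - name: shared-memory
      emptyDir:
        medium: Memory   # RAM-backed tmpfs, mounted at /dev/shm
        sizeLimit: 8Gi   # hypothetical cap, tune to the workload
    - name: data
      persistentVolumeClaim:
        claimName: sh-temp-yolo-pvc
```

Files written to a memory-backed emptyDir count against the memory limit of the container that wrote them, so if a memory limit is added to the container later, it needs room for whatever ends up in /dev/shm.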

In the middle of the desert I can say anything I want.

serhii.net
