Kubernetes Rancher magic pod YAML config to avoid shared memory crashes
I kept hitting exotic "not enough shared memory" crashes. Thanks to GC for giving me these lines, which I don't yet understand but which seem to work. I'll dig into why later (TODO).
apiVersion: v1
kind: Pod
metadata:
  name: CHANGEME
  namespace: CHANGEME-ns
spec:
  restartPolicy: Never
  containers:
    - name: sh-temp-yolo-container-3
      image: ultralytics/ultralytics:latest
      command: ["/bin/sh", "-c"]
      args:
        - "yolo detect train model=yolo11s.pt data=/data/data/data.yaml project=/data/project/ epochs=30 imgsz=640 device=0,1"
      resources:
        requests:
          nvidia.com/gpu: "2" # GPUs for each training run
          ephemeral-storage: "12Gi"
        limits:
          nvidia.com/gpu: "2" # same as requests nvidia.com/gpu
          ephemeral-storage: "14Gi"
      volumeMounts: # Mount the persistent volume
        - name: data
          mountPath: /data
        - name: shared-memory
          mountPath: /dev/shm
  volumes:
    - name: shared-memory
      emptyDir:
        medium: Memory
    - name: data
      persistentVolumeClaim:
        claimName: sh-temp-yolo-pvc
The parts that matter: set the GPU count in both requests AND limits (with the same value), and declare the shared memory in both volumeMounts and volumes.
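For context on why this seems to work: an emptyDir with medium: Memory is a RAM-backed tmpfs, so mounting it at /dev/shm replaces the small default shared-memory segment the container runtime provides (typically 64 MiB), which is probably what the multi-worker data loading was exhausting. One optional refinement is to cap that volume with sizeLimit. The fragment below is only a sketch of the changed volumes entry; the 8Gi value is an assumption to tune, not something from the config above.

```yaml
# Fragment of spec.volumes only; everything else in the Pod stays unchanged.
volumes:
  - name: shared-memory
    emptyDir:
      medium: Memory     # RAM-backed tmpfs, mounted at /dev/shm by the container
      sizeLimit: "8Gi"   # assumed value: upper bound for /dev/shm, adjust to your job
```

Files written to a memory-backed emptyDir count against the memory limit of the container that wrote them, so a cap like this keeps a runaway job from eating into the node's RAM.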
In the middle of the desert I can say anything I want.