Labels: help wanted (Extra attention is needed), onnx (Related to ONNX or ONNXRuntime), question (Further information is requested)
Description
The question is how to free GPU memory.
triton-inference-server/onnxruntime_backend#103
When the model is deployed on a single GPU, I can enable real-time release of GPU memory with the configuration below, but when the model is deployed across multiple GPUs, I don't know what the parameter format should look like:
parameters {
  key: "memory.enable_memory_arena_shrinkage"
  value: { string_value: "gpu:3" }
}

instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 3 ]
  }
]
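For reference, a minimal sketch of what a multi-GPU variant might look like, assuming the arena-shrinkage value keeps the device:device_id pattern and accepts a ';'-separated list of arenas (as in ONNX Runtime's run-option documentation); the GPU ids 2 and 3 are placeholders for illustration, and this format should be confirmed against the backend docs:

# Hypothetical config.pbtxt for a model served on GPUs 2 and 3.
# Assumption: "memory.enable_memory_arena_shrinkage" accepts a
# ';'-separated list of <device>:<device_id> arenas, one entry per
# GPU that hosts an instance.
parameters {
  key: "memory.enable_memory_arena_shrinkage"
  value: { string_value: "gpu:2;gpu:3" }
}

instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 2, 3 ]   # one instance on each listed GPU
  }
]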