How to free GPU memory on multiple GPUs #7825

@1120475708

Description

The question is how to free GPU memory.

triton-inference-server/onnxruntime_backend#103

When the model is deployed to a single GPU, I can specify real-time release of GPU memory, but when the model is deployed to multiple GPUs, I don't know what the parameter format should look like:

parameters { key: "memory.enable_memory_arena_shrinkage" value: { string_value: "gpu:3" } }

instance_group [
    {
        count: 1
        kind: KIND_GPU
        gpus: [ 3 ]
    }
]
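
For reference, the underlying ONNX Runtime run-options key memory.enable_memory_arena_shrinkage takes a semicolon-separated list of device specifications (the header onnxruntime_run_options_config_keys.h gives "cpu:0;gpu:0" as an example), so a multi-GPU deployment should be able to name each arena to shrink. A minimal sketch, assuming one instance group spanning GPUs 0 and 3 (the device IDs here are illustrative, not from the original issue):

# Shrink the memory arena on every GPU the model instances run on.
parameters { key: "memory.enable_memory_arena_shrinkage" value: { string_value: "gpu:0;gpu:3" } }

instance_group [
    {
        count: 1
        kind: KIND_GPU
        gpus: [ 0, 3 ]
    }
]

With this layout, Triton creates one instance on each listed GPU, and each device listed in the shrinkage string should have its arena shrunk after a run; a device that hosts an instance but is missing from the string would presumably keep its arena unchanged.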
