Check inference deployment quota

curl --request POST \
  --url https://api.gcore.com/cloud/v3/inference/{project_id}/deployments/check_limits \
  --header 'Authorization: <api-key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "containers": [
    {
      "region_id": 1,
      "scale": { "max": 3, "min": 1 }
    }
  ],
  "flavor_name": "inference-16vcpu-232gib-1xh100-80gb"
}
'

{
  "inference_cpu_millicore_count_limit": 8000,
  "inference_cpu_millicore_count_requested": 3000,
  "inference_cpu_millicore_count_usage": 2000,
  "inference_gpu_a100_count_limit": 4,
  "inference_gpu_a100_count_requested": 2,
  "inference_gpu_a100_count_usage": 1,
  "inference_gpu_h100_count_limit": 4,
  "inference_gpu_h100_count_requested": 2,
  "inference_gpu_h100_count_usage": 1,
  "inference_gpu_l40s_count_limit": 4,
  "inference_gpu_l40s_count_requested": 2,
  "inference_gpu_l40s_count_usage": 1,
  "inference_instance_count_limit": 10,
  "inference_instance_count_requested": 1,
  "inference_instance_count_usage": 1
}

Authorizations

Authorization

string

header

required

API key for authentication. Make sure to include the word apikey, followed by a single space and then your token. Example: apikey 1234$abcdef
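
The header format described above can be sketched in Python as follows. This is a minimal illustration only: `auth_header` is a hypothetical helper name, not part of any official client library.

```python
def auth_header(token: str) -> dict[str, str]:
    """Build the Authorization header in the 'apikey <token>' form."""
    token = token.strip()
    if not token:
        raise ValueError("token must not be empty")
    # Tolerate callers that already included the 'apikey ' prefix,
    # so the prefix is never doubled.
    if token.lower().startswith("apikey "):
        token = token[len("apikey "):].strip()
    # The word 'apikey', a single space, then the token.
    return {"Authorization": f"apikey {token}"}
```

For example, `auth_header("1234$abcdef")` yields a header whose value is `apikey 1234$abcdef`.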

Path Parameters

project_id

integer

required

Project ID

Example:

1
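
As a rough sketch, the project ID can be substituted into the endpoint path and wrapped in a standard-library request object like this. The function name `check_limits_request` and its parameters are assumptions made for illustration; the request is only constructed here, not sent.

```python
from urllib.request import Request

BASE_URL = "https://api.gcore.com/cloud/v3/inference"

def check_limits_request(project_id: int, body: bytes, authorization: str) -> Request:
    """Construct (but do not send) the POST request for the check_limits endpoint."""
    # project_id is documented as an integer path parameter.
    url = f"{BASE_URL}/{int(project_id)}/deployments/check_limits"
    return Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": authorization,  # e.g. "apikey <token>"
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call; error handling and timeouts are left out of this sketch.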

Body

application/json

containers

ContainerInSerializerV3 · object[]

required

List of containers for the inference instance.

Minimum array length: 1


Example:

[
  {
    "region_id": 1,
    "scale": { "max": 3, "min": 1 }
  }
]

flavor_name

string

required

Inference flavor name.

Minimum string length: 1

Example:

"inference-16vcpu-232gib-1xh100-80gb"

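The two documented body constraints (at least one container, a non-empty flavor name) can be checked client-side before sending. The sketch below is illustrative only: `build_check_limits_body` is a hypothetical helper, and it validates nothing beyond those two constraints.

```python
import json

def build_check_limits_body(flavor_name: str, containers: list) -> str:
    """Serialize the request body, enforcing the documented minimum lengths."""
    if not isinstance(flavor_name, str) or len(flavor_name) < 1:
        raise ValueError("flavor_name must be a non-empty string")
    if len(containers) < 1:
        raise ValueError("containers must contain at least one item")
    return json.dumps({"containers": containers, "flavor_name": flavor_name})

# Usage with the example values from this page.
body = build_check_limits_body(
    "inference-16vcpu-232gib-1xh100-80gb",
    [{"region_id": 1, "scale": {"max": 3, "min": 1}}],
)
```
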
Response

200 - application/json

inference_cpu_millicore_count_limit

integer

Inference CPU millicore count limit

Example:

8000

inference_cpu_millicore_count_requested

integer

Inference CPU millicore count requested

Example:

3000

inference_cpu_millicore_count_usage

integer

Inference CPU millicore count usage

Example:

2000

inference_gpu_a100_count_limit

integer

Inference GPU A100 count limit

Example:

4

inference_gpu_a100_count_requested

integer

Inference GPU A100 count requested

Example:

2

inference_gpu_a100_count_usage

integer

Inference GPU A100 count usage

Example:

1

inference_gpu_h100_count_limit

integer

Inference GPU H100 count limit

Example:

4

inference_gpu_h100_count_requested

integer

Inference GPU H100 count requested

Example:

2

inference_gpu_h100_count_usage

integer

Inference GPU H100 count usage

Example:

1

inference_gpu_l40s_count_limit

integer

Inference GPU L40s count limit

Example:

4

inference_gpu_l40s_count_requested

integer

Inference GPU L40s count requested

Example:

2

inference_gpu_l40s_count_usage

integer

Inference GPU L40s count usage

Example:

1

inference_instance_count_limit

integer

Inference instance count limit

Example:

10

inference_instance_count_requested

integer

Inference instance count requested

Example:

1

inference_instance_count_usage

integer

Inference instance count usage

Example:

1
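
One way to read the response is to compare each resource's `_usage` plus `_requested` against its `_limit`. This page does not state how the three values relate, so the additive rule below is an assumption, and `quota_shortfalls` is a hypothetical helper name. A minimal sketch:

```python
def quota_shortfalls(response: dict) -> dict:
    """Map each resource prefix to the amount by which it would exceed its limit."""
    shortfalls = {}
    for key, limit in response.items():
        if not key.endswith("_limit"):
            continue
        base = key[: -len("_limit")]
        usage = response.get(f"{base}_usage", 0)
        requested = response.get(f"{base}_requested", 0)
        # ASSUMPTION: a deployment fits when usage + requested <= limit.
        over = usage + requested - limit
        if over > 0:
            shortfalls[base] = over
    return shortfalls

# Example values taken from the sample response on this page.
sample = {
    "inference_cpu_millicore_count_limit": 8000,
    "inference_cpu_millicore_count_requested": 3000,
    "inference_cpu_millicore_count_usage": 2000,
    "inference_gpu_h100_count_limit": 4,
    "inference_gpu_h100_count_requested": 2,
    "inference_gpu_h100_count_usage": 1,
    "inference_instance_count_limit": 10,
    "inference_instance_count_requested": 1,
    "inference_instance_count_usage": 1,
}
```

Under that assumption, `quota_shortfalls(sample)` returns an empty dictionary, since every resource in the sample stays within its limit.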
