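Enable user-defined custom metrics for horizontal Pod autoscaling

This topic describes how to configure user-defined metrics for horizontal Pod autoscaling (HPA) in Google Distributed Cloud.

Deploy Prometheus and the metrics adapter

In this section, you deploy Prometheus to scrape user-defined metrics, and you deploy prometheus-adapter to serve the Kubernetes Custom Metrics API with Prometheus as its backend.

Save the following manifests to a file named custom-metrics-adapter.yaml.

Contents of the manifest file for Prometheus and the metrics adapter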

# Copyright 2018 Google Inc
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: stackdriver-prometheus
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: stackdriver-prometheus
  namespace: kube-system
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: stackdriver-prometheus
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: stackdriver-prometheus
subjects:
- kind: ServiceAccount
  name: stackdriver-prometheus
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  name: stackdriver-prometheus-app
  namespace: kube-system
  labels:
    app: stackdriver-prometheus-app
spec:
  clusterIP: "None"
  ports:
    - name: http
      port: 9090
      protocol: TCP
      targetPort: 9090
  sessionAffinity: ClientIP
  selector:
    app: stackdriver-prometheus-app
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stackdriver-prometheus-app
  namespace: kube-system
  labels:
    app: stackdriver-prometheus-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: stackdriver-prometheus-app
  template:
    metadata:
      labels:
        app: stackdriver-prometheus-app
    spec:
      serviceAccountName: stackdriver-prometheus
      containers:
      - name: prometheus-server
        image: prom/prometheus:v2.45.0
        args:
        - "--config.file=/etc/prometheus/config/prometheus.yaml"
        - "--storage.tsdb.path=/data"
        - "--storage.tsdb.retention.time=2h"
        ports:
        - name: prometheus
          containerPort: 9090
        readinessProbe:
          httpGet:
            path: /-/ready
            port: 9090
          periodSeconds: 5
          timeoutSeconds: 3
          # Allow up to 10m on startup for data recovery
          failureThreshold: 120
        livenessProbe:
          httpGet:
            path: /-/healthy
            port: 9090
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 6
        resources:
          requests:
            cpu: 250m
            memory: 500Mi
        volumeMounts:
        - name: config-volume
          mountPath: /etc/prometheus/config
        - name: stackdriver-prometheus-app-data
          mountPath: /data
      volumes:
      - name: config-volume
        configMap:
          name: stackdriver-prometheus-app
      - name: stackdriver-prometheus-app-data
        emptyDir: {}
      terminationGracePeriodSeconds: 300
      nodeSelector:
        kubernetes.io/os: linux
---
apiVersion: v1
data:
  prometheus.yaml: |
    global:
      scrape_interval: 1m
    rule_files:
    - /etc/config/rules.yaml
    - /etc/config/alerts.yaml
    scrape_configs:
    - job_name: prometheus-io-endpoints
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scrape
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: (https?)
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scheme
        target_label: __scheme__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
        - __address__
        - __meta_kubernetes_service_annotation_prometheus_io_port
        target_label: __address__
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_name
        target_label: pod
      - action: keep
        regex: (.+)
        source_labels:
        - __meta_kubernetes_endpoint_port_name
    - job_name: prometheus-io-services
      kubernetes_sd_configs:
      - role: service
      metrics_path: /probe
      params:
        module:
        - http_2xx
      relabel_configs:
      - action: replace
        source_labels:
        - __address__
        target_label: __param_target
      - action: replace
        replacement: blackbox
        target_label: __address__
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_probe
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_name
        target_label: pod
    - job_name: prometheus-io-pods
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scrape
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
        - __address__
        - __meta_kubernetes_pod_annotation_prometheus_io_port
        target_label: __address__
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_name
        target_label: pod
kind: ConfigMap
metadata:
  name: stackdriver-prometheus-app
  namespace: kube-system
---

# The main section of custom metrics adapter.
kind: ServiceAccount
apiVersion: v1
metadata:
  name: custom-metrics-apiserver
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: custom-metrics:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: custom-metrics-apiserver
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: custom-metrics-server-resources
rules:
- apiGroups:
  - custom.metrics.k8s.io
  resources: ["*"]
  verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: custom-metrics-resource-reader
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - namespaces
  - pods
  - services
  verbs:
  - get
  - watch
  - list
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: custom-metrics-resource-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: custom-metrics-resource-reader
subjects:
- kind: ServiceAccount
  name: custom-metrics-apiserver
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: custom-metrics-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: custom-metrics-apiserver
  namespace: kube-system
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: adapter-config
  namespace: kube-system
data:
  config.yaml: |
    rules:
    # Filter all metrics: keep every series that carries a pod label
    - seriesQuery: '{pod=~".+"}'
      seriesFilters: []
      resources:
        # resource name is mapped as it is. ex. namespace -> namespace
        template: <<.Resource>>
      name:
        matches: ^(.*)$
        as: ""
      # Aggregate metric on resource level
      metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: custom-metrics-apiserver
  name: custom-metrics-apiserver
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: custom-metrics-apiserver
  template:
    metadata:
      labels:
        app: custom-metrics-apiserver
      name: custom-metrics-apiserver
    spec:
      serviceAccountName: custom-metrics-apiserver
      containers:
      - name: custom-metrics-apiserver
        resources:
          requests:
            cpu: 15m
            memory: 20Mi
          limits:
            cpu: 100m
            memory: 150Mi
        image: registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.11.0
        args:
        - /adapter
        - --cert-dir=/var/run/serving-cert
        - --secure-port=6443
        - --prometheus-url=http://stackdriver-prometheus-app.kube-system.svc:9090/
        - --metrics-relist-interval=1m
        - --config=/etc/adapter/config.yaml
        ports:
        - containerPort: 6443
        volumeMounts:
        - name: serving-cert
          mountPath: /var/run/serving-cert
        - mountPath: /etc/adapter/
          name: config
          readOnly: true
      nodeSelector:
        kubernetes.io/os: linux
      volumes:
      - name: serving-cert
        emptyDir:
          medium: Memory
      - name: config
        configMap:
          name: adapter-config
---
apiVersion: v1
kind: Service
metadata:
  name: custom-metrics-apiserver
  namespace: kube-system
spec:
  ports:
  - port: 443
    targetPort: 6443
  selector:
    app: custom-metrics-apiserver
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  service:
    name: custom-metrics-apiserver
    namespace: kube-system
  group: custom.metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta2.custom.metrics.k8s.io
spec:
  service:
    name: custom-metrics-apiserver
    namespace: kube-system
  group: custom.metrics.k8s.io
  version: v1beta2
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: hpa-controller-custom-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: custom-metrics-server-resources
subjects:
- kind: ServiceAccount
  name: horizontal-pod-autoscaler
  namespace: kube-system

Create the Deployment and Service:

kubectl --kubeconfig USER_CLUSTER_KUBECONFIG apply -f custom-metrics-adapter.yaml
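
The adapter-config ConfigMap in the manifest selects every series that carries a pod label and answers Custom Metrics API requests by filling in its metricsQuery template. The following Python sketch only illustrates that substitution. It is not the prometheus-adapter implementation: the function name expand_metrics_query and the Pod names are hypothetical, and the exact form of the label matchers that the adapter generates is an assumption.

```python
from typing import Sequence

# Template copied from the adapter-config ConfigMap in the manifest.
METRICS_QUERY = "sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)"


def expand_metrics_query(series: str, namespace: str, pods: Sequence[str]) -> str:
    """Return the PromQL string obtained by substituting the template fields."""
    # Assumed matcher form: one namespace matcher plus a regex over Pod names.
    pod_regex = "|".join(pods)
    label_matchers = f'namespace="{namespace}",pod=~"{pod_regex}"'
    query = METRICS_QUERY.replace("<<.Series>>", series)
    query = query.replace("<<.LabelMatchers>>", label_matchers)
    # Grouping by the pod label yields one value per Pod.
    return query.replace("<<.GroupBy>>", "pod")


# Hypothetical Pod names, used only to show the shape of the resulting query.
print(expand_metrics_query("example_monitoring_up", "default",
                           ["example-monitoring-a", "example-monitoring-b"]))
```

Because the query groups by pod, the adapter can return one value per Pod, which is the shape that a Pods-type HPA metric averages over.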

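The next step is to annotate the user application so that its metrics are collected.

Annotate the user application for metrics collection

To mark a user application for scraping, and for sending logs to Cloud Monitoring, you must add the corresponding annotations to the metadata of the Service, Pod, and Endpoints.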

  metadata:
    name: "example-monitoring"
    namespace: "default"
    annotations:
      prometheus.io/scrape: "true"
      prometheus.io/path: "" # Overrides the metrics path (default "/metrics")
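
Besides prometheus.io/scrape and prometheus.io/path, the scrape configuration in custom-metrics-adapter.yaml also reads a prometheus.io/port annotation through the relabel rule with the regex ([^:]+)(?::\d+)?;(\d+). The following Python sketch emulates that single rule to show its effect. The function name rewrite_address and the sample addresses are hypothetical, and Python's re module stands in for the regex engine that Prometheus uses.

```python
import re
from typing import Optional

# Regex from the prometheus-io-endpoints and prometheus-io-pods jobs.
# Prometheus anchors relabel regexes at both ends, hence re.fullmatch below.
ADDRESS_RULE = re.compile(r"([^:]+)(?::\d+)?;(\d+)")


def rewrite_address(address: str, port_annotation: Optional[str]) -> str:
    """Emulate the __address__ rewrite for one discovered target."""
    # Source label values are joined with ";" before matching;
    # a missing annotation contributes an empty string.
    joined = f"{address};{port_annotation or ''}"
    match = ADDRESS_RULE.fullmatch(joined)
    if match is None:
        # Without a match, a replace action leaves the target label unchanged.
        return address
    # Corresponds to "replacement: $1:$2" in the scrape configuration.
    return f"{match.group(1)}:{match.group(2)}"


print(rewrite_address("10.0.0.5:9090", "8080"))  # annotation overrides the port
print(rewrite_address("10.0.0.5:9090", None))    # no annotation, address is kept
```

In this sketch, a target without the port annotation keeps its discovered address, so the annotation is only needed when metrics are served on a different port.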
  

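Deploy a sample user application

In this section, you deploy a sample application that produces logs and Prometheus-compatible metrics.

  1. Save the following Service and Deployment manifests to a file named my-app.yaml. Note that the Service has the annotation prometheus.io/scrape: "true".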

    kind: Service
    apiVersion: v1
    metadata:
      name: "example-monitoring"
      namespace: "default"
      annotations:
        prometheus.io/scrape: "true"
    spec:
      selector:
        app: "example-monitoring"
      ports:
        - name: http
          port: 9090
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: "example-monitoring"
      namespace: "default"
      labels:
        app: "example-monitoring"
    spec:
      replicas: 1
      selector: 
        matchLabels:
          app: "example-monitoring"
      template: 
        metadata: 
          labels:
            app: "example-monitoring"
        spec:
          containers:
          - image: gcr.io/google-samples/prometheus-dummy-exporter:v0.2.0
            name: prometheus-example-exporter
            command:
            - ./prometheus-dummy-exporter
            args:
            - --metric-name=example_monitoring_up
            - --metric-value=1
            - --port=9090
            resources:
              requests:
                cpu: 100m
    
  2. Create the Deployment and Service:

    kubectl --kubeconfig USER_CLUSTER_KUBECONFIG apply -f my-app.yaml
    

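Use the custom metrics in HPA

Deploy an HPA object that uses the metric exposed in the previous step. For more information about the different types of custom metrics, see Autoscaling on multiple metrics and custom metrics.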

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-monitoring-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-monitoring
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Pods
    pods:
      metric:
        name: example_monitoring_up
      target:
        type: AverageValue
        averageValue: 20

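Pods-type metrics have a default metric selector that matches the labels of the target Pods; this is how kube-controller-manager works. In this example, you can query the example_monitoring_up metric with the selector {matchLabels: {app: example-monitoring}}, because those labels are present on the target Pods. Any other selector that you specify is added to the list. To avoid the default selector, you can remove all labels from the target Pods or use an Object-type metric.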
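
To make the default selector concrete, the following Python sketch builds the kind of Custom Metrics API path that such a query corresponds to. It is an illustration under assumptions: the function name pods_metric_path is hypothetical, the path layout follows the v1beta1 API registered by the manifest, and labelSelector as the query parameter name is an assumption about the API's list options. A path like this could be passed to kubectl get --raw to inspect what the adapter returns.

```python
from typing import Mapping
from urllib.parse import quote


def pods_metric_path(namespace: str, metric: str,
                     match_labels: Mapping[str, str],
                     version: str = "v1beta1") -> str:
    """Build a Custom Metrics API path for a Pods-type metric."""
    # Render matchLabels as a comma-separated equality selector.
    selector = ",".join(f"{key}={value}"
                        for key, value in sorted(match_labels.items()))
    # "*" asks for the metric on all Pods in the namespace that match.
    path = (f"/apis/custom.metrics.k8s.io/{version}"
            f"/namespaces/{namespace}/pods/*/{metric}")
    # Assumed parameter name: labelSelector.
    return f"{path}?labelSelector={quote(selector, safe='')}"


print(pods_metric_path("default", "example_monitoring_up",
                       {"app": "example-monitoring"}))
```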

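Check that HPA uses the user-defined application metrics

Run the following command to check that the HPA uses the user-defined application metric: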

kubectl --kubeconfig=USER_CLUSTER_KUBECONFIG describe hpa example-monitoring-hpa

The output looks similar to the following:

  Name:               example-monitoring-hpa
  Namespace:          default
  Labels:             <none>
  Annotations:        autoscaling.alpha.kubernetes.io/conditions:
                        [{"type":"AbleToScale","status":"True","lastTransitionTime":"2023-08-23T22:07:24Z","reason":"ReadyForNewScale","message":"recommended size...
                      autoscaling.alpha.kubernetes.io/current-metrics: [{"type":"Pods","pods":{"metricName":"example_monitoring_up","currentAverageValue":"1"}}]
                      autoscaling.alpha.kubernetes.io/metrics: [{"type":"Pods","pods":{"metricName":"example_monitoring_up","targetAverageValue":"20"}}]
  CreationTimestamp:  Wed, 23 Aug 2023 22:07:09 +0000
  Reference:          Deployment/example-monitoring
  Min replicas:       1
  Max replicas:       5
  Deployment pods:    1 current / 1 desired
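
The "1 current / 1 desired" line is consistent with the scaling rule described in the Kubernetes HPA documentation: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), bounded by minReplicas and maxReplicas. The following Python sketch reproduces only that arithmetic. The function name desired_replicas is hypothetical, and the sketch deliberately omits the controller's tolerance and stabilization behavior.

```python
import math


def desired_replicas(current_replicas: int, current_average: float,
                     target_average: float, min_replicas: int,
                     max_replicas: int) -> int:
    """Apply the basic HPA ratio rule, then clamp to the configured bounds."""
    raw = math.ceil(current_replicas * current_average / target_average)
    return max(min_replicas, min(max_replicas, raw))


# Values from this topic: currentAverageValue 1, averageValue 20,
# minReplicas 1, maxReplicas 5, and one running Pod.
print(desired_replicas(1, 1, 20, 1, 5))
```

With the sample exporter reporting a constant value of 1 against a target of 20, the ratio stays far below 1, so the HPA holds the Deployment at minReplicas.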
  

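Costs

Using custom metrics for HPA doesn't incur any additional Cloud Monitoring charges. The Pods that enable custom metrics consume additional CPU and memory, in proportion to the volume of metrics that they scrape.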
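
As a rough baseline, the resource requests declared in custom-metrics-adapter.yaml can be added up. The following Python sketch does that for the Prometheus and prometheus-adapter containers. The helper names are hypothetical, the parsers handle only the unit suffixes that appear in the manifest, and actual consumption can exceed these requests as the scraped metric volume grows.

```python
def cpu_millicores(quantity: str) -> int:
    """Parse a CPU quantity such as "250m" or "1" into millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(quantity) * 1000


def memory_mib(quantity: str) -> int:
    """Parse a memory quantity with the Mi suffix, such as "500Mi"."""
    if not quantity.endswith("Mi"):
        raise ValueError(f"unsupported memory unit in this sketch: {quantity}")
    return int(quantity[:-2])


# Requests copied from the two Deployments in the manifest.
REQUESTS = {
    "prometheus-server": {"cpu": "250m", "memory": "500Mi"},
    "custom-metrics-apiserver": {"cpu": "15m", "memory": "20Mi"},
}

total_cpu = sum(cpu_millicores(r["cpu"]) for r in REQUESTS.values())
total_memory = sum(memory_mib(r["memory"]) for r in REQUESTS.values())
print(f"requested CPU: {total_cpu}m, requested memory: {total_memory}Mi")
```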