
[AEWS] Week 4: EKS Monitoring with CloudWatch - Using Container Insights

안녕유지 2025. 3. 2. 03:09
This post summarizes what I worked through in week 4 of the Cloudnet AEWS study.

 

In this post, I'll set up EKS monitoring using CloudWatch Container Insights.

 

CloudWatch Container Insights

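Container Insights automatically collects resource-usage and performance data from containers running on Kubernetes in AWS and visualizes it in CloudWatch. The AWS console builds the dashboards for you automatically.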

https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ContainerInsights.html
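The metrics behind these dashboards are stored in the `ContainerInsights` CloudWatch namespace. The sketch below lists which metric names have been published for a cluster.

Some caveats:
- `list_container_insights_metrics` is a hypothetical helper name of my own.
- It assumes a configured AWS CLI and a `CLUSTER_NAME` variable like the one used in the commands later in this post.
- It only returns data once the add-on installed below has been running for a few minutes.

```shell
# Hypothetical helper: list the Container Insights metric names published for one cluster.
# Assumes a configured AWS CLI; the cluster defaults to $CLUSTER_NAME.
list_container_insights_metrics() {
  local cluster="${1:-$CLUSTER_NAME}"
  aws cloudwatch list-metrics \
    --namespace ContainerInsights \
    --dimensions Name=ClusterName,Value="$cluster" \
    --query 'Metrics[].MetricName' \
    --output text | tr '\t' '\n' | sort -u   # one name per line, duplicates removed
}

# Usage (after the add-on below is running):
# list_container_insights_metrics myeks
```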

 


CloudWatch Agent

The official AWS agent that collects metrics and logs from EC2 instances, EKS nodes, and applications and sends them to CloudWatch.

 

 

Now let's try it hands-on.

If you'd rather set this up without the CloudWatch Agent, see my previous post:

https://hellouz818.tistory.com/33

 

 

To build this setup, I'll install the Amazon CloudWatch Observability EKS add-on, following the AWS guide below.

https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/install-CloudWatch-Observability-EKS-addon.html
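Before creating the add-on, you can check which settings it accepts. This is a minimal sketch that assumes a configured AWS CLI.

A few notes on the helper:
- `show_cloudwatch_addon_schema` is a hypothetical name.
- Its default version is simply the one installed in this post (`v3.3.1-eksbuild.1`).
- If your region offers a newer version, pass it as the first argument.

```shell
# Hypothetical helper: print the JSON schema of the settings the add-on accepts
# (i.e. what could be passed via `aws eks create-addon --configuration-values`).
show_cloudwatch_addon_schema() {
  # Default: the add-on version installed in this post; override with $1.
  local version="${1:-v3.3.1-eksbuild.1}"
  aws eks describe-addon-configuration \
    --addon-name amazon-cloudwatch-observability \
    --addon-version "$version" \
    --query configurationSchema \
    --output text
}

# List the available versions first, then inspect one:
# aws eks describe-addon-versions --addon-name amazon-cloudwatch-observability \
#   --query 'addons[0].addonVersions[].addonVersion' --output text
# show_cloudwatch_addon_schema v3.3.1-eksbuild.1
```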

 


First, create an IAM role for the service account (IRSA) with permission to send data to CloudWatch, then deploy the add-on.

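# Create the IRSA role (--role-only creates just the IAM role; the add-on creates the Kubernetes service account)
❯ eksctl create iamserviceaccount \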
  --name cloudwatch-agent \
  --namespace amazon-cloudwatch --cluster $CLUSTER_NAME \
  --role-name $CLUSTER_NAME-cloudwatch-agent-role \
  --attach-policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy \
  --role-only \
  --approve
  
# Check installed add-ons
❯ aws eks list-addons --cluster-name myeks --output table
--------------------------
|       ListAddons       |
+------------------------+
||        addons        ||
|+----------------------+|
||  aws-ebs-csi-driver  ||
||  coredns             ||
||  kube-proxy          ||
||  metrics-server      ||
||  vpc-cni             ||
|+----------------------+|

❯ aws eks create-addon --addon-name amazon-cloudwatch-observability --cluster-name myeks --service-account-role-arn arn:aws:iam::xxx:role/myeks-cloudwatch-agent-role
{
    "addon": {
        "addonName": "amazon-cloudwatch-observability",
        "clusterName": "myeks",
        "status": "CREATING",
        "addonVersion": "v3.3.1-eksbuild.1",
        "health": {
            "issues": []
        },
        "addonArn": "arn:aws:eks:ap-northeast-2:xxx:addon/myeks/amazon-cloudwatch-observability/aecaa964-5132-4449-73b3-b47db792e3ed",
        "createdAt": "2025-03-02T02:37:15.576000+09:00",
        "modifiedAt": "2025-03-02T02:37:15.591000+09:00",
        "serviceAccountRoleArn": "arn:aws:iam::xxx:role/myeks-cloudwatch-agent-role",
        "tags": {}
    }
}

# Check add-ons again
❯ aws eks list-addons --cluster-name myeks --output table
---------------------------------------
|             ListAddons              |
+-------------------------------------+
||              addons               ||
|+-----------------------------------+|
||  amazon-cloudwatch-observability  || # newly added
||  aws-ebs-csi-driver               ||
||  coredns                          ||
||  kube-proxy                       ||
||  metrics-server                   ||
||  vpc-cni                          ||
|+-----------------------------------+|

# Verify the installation
❯ kubectl get crd | grep -i cloudwatch
amazoncloudwatchagents.cloudwatch.aws.amazon.com   2025-03-01T17:37:34Z
dcgmexporters.cloudwatch.aws.amazon.com            2025-03-01T17:37:35Z
instrumentations.cloudwatch.aws.amazon.com         2025-03-01T17:37:35Z
neuronmonitors.cloudwatch.aws.amazon.com           2025-03-01T17:37:36Z

❯ kubectl get ds,pod,cm,sa,amazoncloudwatchagent -n amazon-cloudwatch
NAME                                                         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR              AGE
daemonset.apps/cloudwatch-agent                              3         3         3       3            3           kubernetes.io/os=linux     76s
daemonset.apps/cloudwatch-agent-windows                      0         0         0       0            0           kubernetes.io/os=windows   76s
daemonset.apps/cloudwatch-agent-windows-container-insights   0         0         0       0            0           kubernetes.io/os=windows   76s
daemonset.apps/dcgm-exporter                                 0         0         0       0            0           kubernetes.io/os=linux     76s
daemonset.apps/fluent-bit                                    3         3         3       3            3           kubernetes.io/os=linux     83s
daemonset.apps/fluent-bit-windows                            0         0         0       0            0           kubernetes.io/os=windows   83s
daemonset.apps/neuron-monitor                                0         0         0       0            0           <none>                     76s

NAME                                                                  READY   STATUS    RESTARTS   AGE
pod/amazon-cloudwatch-observability-controller-manager-6f76854n2crw   1/1     Running   0          83s
pod/cloudwatch-agent-9b29g                                            1/1     Running   0          76s
pod/cloudwatch-agent-gjnkc                                            1/1     Running   0          76s
pod/cloudwatch-agent-w7dzp                                            1/1     Running   0          76s
pod/fluent-bit-drkgz                                                  1/1     Running   0          83s
pod/fluent-bit-f2xpd                                                  1/1     Running   0          83s
pod/fluent-bit-m5zh6                                                  1/1     Running   0          83s

NAME                                                    DATA   AGE
configmap/cloudwatch-agent                              1      76s
configmap/cloudwatch-agent-windows                      1      76s
configmap/cloudwatch-agent-windows-container-insights   1      76s
configmap/cwagent-clusterleader                         0      60s
configmap/dcgm-exporter-config-map                      2      76s
configmap/fluent-bit-config                             5      85s
configmap/fluent-bit-windows-config                     5      85s
configmap/kube-root-ca.crt                              1      87s
configmap/neuron-monitor-config-map                     1      76s

NAME                                                                SECRETS   AGE
serviceaccount/amazon-cloudwatch-observability-controller-manager   0         85s
serviceaccount/cloudwatch-agent                                     0         85s
serviceaccount/dcgm-exporter-service-acct                           0         76s
serviceaccount/default                                              0         87s
serviceaccount/neuron-monitor-service-acct                          0         76s

NAME                                                                                          MODE        VERSION   READY   AGE   IMAGE   MANAGEMENT
amazoncloudwatchagent.cloudwatch.aws.amazon.com/cloudwatch-agent                              daemonset   0.0.0             83s           managed
amazoncloudwatchagent.cloudwatch.aws.amazon.com/cloudwatch-agent-windows                      daemonset   0.0.0             82s           managed
amazoncloudwatchagent.cloudwatch.aws.amazon.com/cloudwatch-agent-windows-container-insights   daemonset   0.0.0             83s           managed

❯ kubectl describe clusterrole cloudwatch-agent-role amazon-cloudwatch-observability-manager-role
Name:         cloudwatch-agent-role # Permissions for the CloudWatch agent to collect EKS cluster metrics and logs: read access to the resources it needs for metric collection
Labels:       app.kubernetes.io/instance=amazon-cloudwatch-observability
              app.kubernetes.io/managed-by=EKS
              app.kubernetes.io/name=amazon-cloudwatch-observability
              app.kubernetes.io/version=1.0.0
Annotations:  <none>
PolicyRule:
  Resources                        Non-Resource URLs  Resource Names  Verbs
  ---------                        -----------------  --------------  -----
  configmaps                       []                 []              [create get update]
  events                           []                 []              [create get]
  nodes/stats                      []                 []              [create get]
                                   [/metrics]         []              [get]
  endpoints                        []                 []              [list watch get]
  namespaces                       []                 []              [list watch get]
  nodes/proxy                      []                 []              [list watch get]
  nodes                            []                 []              [list watch get]
  pods/logs                        []                 []              [list watch get]
  pods                             []                 []              [list watch get]
  daemonsets.apps                  []                 []              [list watch get]
  deployments.apps                 []                 []              [list watch get]
  replicasets.apps                 []                 []              [list watch get]
  statefulsets.apps                []                 []              [list watch get]
  endpointslices.discovery.k8s.io  []                 []              [list watch get]
  services                         []                 []              [list watch]
  jobs.batch                       []                 []              [list watch]
                                   [/metrics]         []              [list]
                                   [/metrics]         []              [watch]


Name:         amazon-cloudwatch-observability-manager-role # CloudWatch Observability manager role: permissions to manage CloudWatch agent configuration automatically, integrating CloudWatch with Kubernetes
Labels:       <none>
Annotations:  <none>
PolicyRule:
  Resources                                                    Non-Resource URLs  Resource Names  Verbs
  ---------                                                    -----------------  --------------  -----
  configmaps                                                   []                 []              [create delete get list patch update watch]
  serviceaccounts                                              []                 []              [create delete get list patch update watch]
  services                                                     []                 []              [create delete get list patch update watch]
  daemonsets.apps                                              []                 []              [create delete get list patch update watch]
  deployments.apps                                             []                 []              [create delete get list patch update watch]
  statefulsets.apps                                            []                 []              [create delete get list patch update watch]
  ingresses.networking.k8s.io                                  []                 []              [create delete get list patch update watch]
  poddisruptionbudgets.policy                                  []                 []              [create delete get list patch update watch]
  routes.route.openshift.io/custom-host                        []                 []              [create delete get list patch update watch]
  routes.route.openshift.io                                    []                 []              [create delete get list patch update watch]
  leases.coordination.k8s.io                                   []                 []              [create get list update]
  events                                                       []                 []              [create patch]
  namespaces                                                   []                 []              [get list patch update watch]
  amazoncloudwatchagents.cloudwatch.aws.amazon.com             []                 []              [get list patch update watch]
  dcgmexporters.cloudwatch.aws.amazon.com                      []                 []              [get list patch update watch]
  instrumentations.cloudwatch.aws.amazon.com                   []                 []              [get list patch update watch]
  neuronmonitors.cloudwatch.aws.amazon.com                     []                 []              [get list patch update watch]
  replicasets.apps                                             []                 []              [get list watch]
  amazoncloudwatchagents.cloudwatch.aws.amazon.com/finalizers  []                 []              [get patch update]
  amazoncloudwatchagents.cloudwatch.aws.amazon.com/status      []                 []              [get patch update]
  dcgmexporters.cloudwatch.aws.amazon.com/finalizers           []                 []              [get patch update]
  dcgmexporters.cloudwatch.aws.amazon.com/status               []                 []              [get patch update]
  neuronmonitors.cloudwatch.aws.amazon.com/finalizers          []                 []              [get patch update]
  neuronmonitors.cloudwatch.aws.amazon.com/status              []                 []              [get patch update]
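The create-addon output above still showed `CREATING`. The minimal sketch below confirms that the add-on became healthy and that the IRSA role was wired in. It assumes a configured AWS CLI and kubectl context.

How it works:
- `verify_cloudwatch_addon` is a hypothetical helper name.
- `aws eks wait addon-active` polls until the add-on reports ACTIVE.
- The jsonpath reads the `eks.amazonaws.com/role-arn` annotation that I expect EKS to put on the `cloudwatch-agent` service account when `--service-account-role-arn` is given.

```shell
# Hypothetical helper: wait for the add-on to become ACTIVE, print its status,
# then print the IAM role ARN annotated on the cloudwatch-agent service account.
verify_cloudwatch_addon() {
  local cluster="${1:-$CLUSTER_NAME}"
  aws eks wait addon-active \
    --cluster-name "$cluster" \
    --addon-name amazon-cloudwatch-observability || return 1
  aws eks describe-addon \
    --cluster-name "$cluster" \
    --addon-name amazon-cloudwatch-observability \
    --query 'addon.status' --output text
  # Dots inside the annotation key must be escaped in jsonpath.
  kubectl get serviceaccount cloudwatch-agent -n amazon-cloudwatch \
    -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}'
  echo
}

# Usage:
# verify_cloudwatch_addon myeks
```

An empty second line would suggest the role ARN did not reach the service account.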

 

Wow! The logs are showing up in CloudWatch. This time, unlike when only Fluent Bit was enabled, the Container Insights tab is active as well (it also collects performance logs).
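The same data can be checked from the terminal. Container Insights writes its logs to groups under `/aws/containerinsights/<cluster>/`, such as `application`, `dataplane`, `host` and `performance`.

This is a sketch under assumptions:
- `list_container_insights_log_groups` is a hypothetical helper name.
- A configured AWS CLI is assumed.

```shell
# Hypothetical helper: list the Container Insights log groups for one cluster,
# one name per line.
list_container_insights_log_groups() {
  local cluster="${1:-$CLUSTER_NAME}"
  aws logs describe-log-groups \
    --log-group-name-prefix "/aws/containerinsights/$cluster/" \
    --query 'logGroups[].logGroupName' \
    --output text | tr '\t' '\n'
}

# Usage:
# list_container_insights_log_groups myeks
```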

 

Next, let's generate load on a test pod and see whether Container Insights picks it up.

 

Deploying the test pod!

# Deploy an NGINX web server
❯ helm repo add bitnami https://charts.bitnami.com/bitnami
❯ helm repo update

# Create the values (parameter) file
❯ cat <<EOT > nginx-values.yaml
service:
  type: NodePort
  
networkPolicy:
  enabled: false
  
resourcesPreset: "nano"

ingress:
  enabled: true
  ingressClassName: alb
  hostname: nginx.$MyDomain
  pathType: Prefix
  path: /
  annotations: 
    alb.ingress.kubernetes.io/certificate-arn: $CERT_ARN
    alb.ingress.kubernetes.io/group.name: study
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}, {"HTTP":80}]'
    alb.ingress.kubernetes.io/load-balancer-name: $CLUSTER_NAME-ingress-alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/ssl-redirect: "443"
    alb.ingress.kubernetes.io/success-codes: 200-399
    alb.ingress.kubernetes.io/target-type: ip
EOT
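Because the heredoc above is unquoted, `$MyDomain`, `$CERT_ARN` and `$CLUSTER_NAME` are expanded when the file is written. Any variable that is unset silently becomes an empty string, so a quick check before deploying can save a broken Ingress.

This is a sketch:
- `check_nginx_values` is a hypothetical helper.
- Its patterns simply look for the three lines as they would appear with an empty variable.

```shell
# Hypothetical helper: detect values that the unquoted heredoc expanded to "".
check_nginx_values() {
  local file="${1:-nginx-values.yaml}"
  # Empty MyDomain -> "hostname: nginx.", empty CERT_ARN -> "certificate-arn: ",
  # empty CLUSTER_NAME -> "load-balancer-name: -ingress-alb".
  if grep -qE 'hostname: nginx\.$|certificate-arn: *$|load-balancer-name: -ingress-alb$' "$file"; then
    echo "empty variable in $file: set MyDomain, CERT_ARN and CLUSTER_NAME, then regenerate" >&2
    return 1
  fi
  echo "ok: $file"
}

# Usage:
# check_nginx_values nginx-values.yaml
```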

# Deploy
❯ helm install nginx bitnami/nginx --version 19.0.0 -f nginx-values.yaml

# Check the access URL
❯ echo -e "Nginx WebServer URL = https://nginx.$MyDomain"
Nginx WebServer URL = https://nginx.hellouz818.com
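The ALB behind the Ingress can take a minute or two to be provisioned, so the URL may not answer right away.

The sketch below waits for the Deployment and prints the ALB hostname assigned to the Ingress. Its assumptions:
- `wait_for_nginx` is a hypothetical helper name.
- Both resources are named `nginx`: the Deployment name matches `deploy/nginx` used below, and I expect the chart to name the Ingress after the release too.

```shell
# Hypothetical helper: wait for the nginx rollout, then print the ALB hostname
# recorded in the Ingress status by the AWS Load Balancer Controller.
wait_for_nginx() {
  kubectl rollout status deployment/nginx --timeout=120s || return 1
  kubectl get ingress nginx \
    -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
  echo
}

# Usage:
# wait_for_nginx
```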

 

Generating load!

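# Generate load
curl -s https://nginx.$MyDomain
yum install -y httpd   # provides the ab (ApacheBench) load-testing tool
ab -c 500 -n 30000 https://nginx.$MyDomain/   # 500 concurrent connections, 30,000 requests in total

# Watch the pod logs directly (-f keeps streaming)
kubectl logs deploy/nginx -f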
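To look at the spike as raw data, you can run a CloudWatch Logs Insights query against the `/aws/containerinsights/<cluster>/performance` log group.

This is a sketch under assumptions:
- `query_nginx_pod_cpu` is a hypothetical helper.
- The field names `Type`, `PodName` and `pod_cpu_utilization` are what I expect in Container Insights performance events. If the query comes back empty, check one raw event.
- `date -d` is GNU (Linux) syntax.

```shell
# Hypothetical helper: per-minute max CPU of nginx pods from the Container Insights
# performance log group, via CloudWatch Logs Insights.
# Assumes GNU date (on macOS, use `date -v-15M +%s` for the start time).
query_nginx_pod_cpu() {
  local cluster="${1:-$CLUSTER_NAME}" query_id status attempt
  query_id="$(aws logs start-query \
    --log-group-name "/aws/containerinsights/$cluster/performance" \
    --start-time "$(date -d '15 minutes ago' +%s)" \
    --end-time "$(date +%s)" \
    --query-string 'filter Type = "Pod" and PodName like /nginx/ | stats max(pod_cpu_utilization) by bin(1m)' \
    --query queryId --output text)" || return 1

  # Logs Insights queries run asynchronously: poll until Complete (about 20 s at most).
  for attempt in 1 2 3 4 5 6 7 8 9 10; do
    status="$(aws logs get-query-results --query-id "$query_id" --query status --output text)"
    [ "$status" = "Complete" ] && break
    sleep 2
  done
  aws logs get-query-results --query-id "$query_id" --query results --output table
}

# Usage (while or shortly after ab is running):
# query_nginx_pod_cpu myeks
```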

 

You can see the metrics spike in Container Insights. Wow!
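For a quick terminal view of the cluster-wide CPU spike, `get-metric-statistics` on the `node_cpu_utilization` metric (dimension `ClusterName`) works too. It prints the per-minute maximum for the last 30 minutes, sorted by timestamp.

This is again a sketch:
- `show_cluster_cpu` is a hypothetical helper.
- It assumes GNU `date` and a configured AWS CLI.

```shell
# Hypothetical helper: per-minute maximum of node_cpu_utilization across the cluster
# for the last 30 minutes, as "timestamp<TAB>value" lines sorted by time.
show_cluster_cpu() {
  local cluster="${1:-$CLUSTER_NAME}"
  aws cloudwatch get-metric-statistics \
    --namespace ContainerInsights \
    --metric-name node_cpu_utilization \
    --dimensions Name=ClusterName,Value="$cluster" \
    --statistics Maximum \
    --period 60 \
    --start-time "$(date -u -d '30 minutes ago' +%Y-%m-%dT%H:%M:%SZ)" \
    --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --query 'Datapoints[].[Timestamp, Maximum]' \
    --output text | sort   # ISO 8601 timestamps sort chronologically as text
}

# Usage:
# show_cluster_cpu myeks
```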

In the next post, I'll cover how to receive alarms when anomalies like this occur.