Anyone who does development knows that LLMs are wildly popular right now, and a flood of cool projects built on LLM-based text generation keeps appearing. Most of them, however, stop at text or image generation. Some applications help developers work more efficiently, such as IntelliCode, GitHub Copilot, and CodeGPT inside the IDE, and some companies are reportedly building LLM-powered chat and Slack bots. Recently I came across two even more interesting AI projects: K8sGPT and k8sgpt-operator. Let's take a look at how AI can be applied in innovative ways to DevOps and operations.
What is K8sGPT
K8sGPT is a tool for scanning Kubernetes clusters and diagnosing and triaging issues. It has SRE experience codified into its analyzers and uses AI to drive the problem diagnosis.
K8sGPT ships with many built-in analyzers; to name a few common ones: eventAnalyzer, ingressAnalyzer, cronJobAnalyzer, and podAnalyzer.
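Besides running in-cluster through the operator, K8sGPT is also available as a standalone CLI. As a rough sketch of what a scan looks like (the exact flags can differ between versions, so treat this as an assumption rather than a reference):

# List the analyzers (filters) this version of the CLI supports
k8sgpt filters list

# Scan the cluster; --explain asks the configured AI backend to explain each finding
k8sgpt analyze --explain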
How to try it
Install the LocalAI server
LocalAI is a straightforward, drop-in replacement API that is compatible with OpenAI and does inference locally on the CPU. It is built on llama.cpp, gpt4all, and ggml, and includes support for GPT4ALL-J, which is Apache 2.0 licensed and can be used commercially.
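Because it exposes the OpenAI API surface, anything that can talk to api.openai.com can be pointed at LocalAI instead. Once the install below is finished, an in-cluster smoke test might look roughly like this (the Service DNS name and model name match what we use later in this post; the exact request shape is a sketch, not a guarantee):

# List the models LocalAI has loaded
curl http://local-ai.local-ai.svc.cluster.local:8080/v1/models

# Ask the local model something via the OpenAI-compatible chat endpoint
curl http://local-ai.local-ai.svc.cluster.local:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "ggml-gpt4all-j.bin", "messages": [{"role": "user", "content": "Say hello"}]}'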
- First, add the go-skynet helm repository:
helm repo add go-skynet https://go-skynet.github.io/helm-charts/
- Define values.yaml:
cat <<EOF > values.yaml
deployment:
  image: quay.io/go-skynet/local-ai:latest
  env:
    threads: 14
    contextSize: 512
    modelsPath: "/models"
# Optionally create a PVC, mount the PV to the LocalAI Deployment,
# and download a model to prepopulate the models directory
modelsVolume:
  enabled: true
  url: "https://gpt4all.io/models/ggml-gpt4all-j.bin"
  pvc:
    size: 6Gi
    accessModes:
      - ReadWriteOnce
  auth:
    # Optional value for HTTP basic access authentication header
    basic: "" # 'username:password' base64 encoded
service:
  type: ClusterIP
  annotations: {}
  # If using an AWS load balancer, you'll need to override the default 60s load balancer idle timeout
  # service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "1200"
EOF
- Finally, install LocalAI:
helm install local-ai go-skynet/local-ai -f values.yaml
- Once the install succeeds, the local-ai Pod starts up and you can watch its progress in the logs.
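A quick way to check on it (assuming the chart was installed into the local-ai namespace, which is what the baseUrl used later in this post implies):

# Watch the Pod come up and tail its logs
# (the Deployment name follows the Helm release name, so adjust it if yours differs)
kubectl -n local-ai get pods
kubectl -n local-ai logs deployment/local-ai -f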
Install K8sGPT
This step is super simple:
helm repo add k8sgpt https://charts.k8sgpt.ai/
helm install k8sgpt-operator k8sgpt/k8sgpt-operator
Afterwards you should see the operator up and running.
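A simple sanity check that the operator and its CRDs landed (add -n if you installed the chart into a specific namespace):

# The operator Pod should be Running
kubectl get pods
# The chart also installs the K8sGPT and Result CRDs
kubectl get crds | grep k8sgpt.ai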
Next, we create a Kubernetes YAML manifest that defines a K8sGPT object named k8sgpt-local and deploys it into the local-ai namespace.
kubectl -n local-ai apply -f - << EOF
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-local
  namespace: local-ai
spec:
  backend: localai
  # use the same model name here as the one you plugged
  # into the LocalAI helm chart's values.yaml
  model: ggml-gpt4all-j.bin
  # kubernetes-internal DNS name of the local-ai Service
  baseUrl: http://local-ai.local-ai.svc.cluster.local:8080/v1
  # allow K8sGPT to store AI analyses in an in-memory cache,
  # otherwise your cluster may get throttled :)
  noCache: false
  version: v0.2.7
  enableAI: true
EOF
After that, you should see the corresponding requests show up in the LocalAI Pod's logs.
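You can also confirm that the operator accepted the custom resource and spun up its workload; a hedged check, using the names from this example:

# The K8sGPT custom resource we just applied
kubectl -n local-ai get k8sgpts
# Whatever the operator deployed alongside it
kubectl -n local-ai get pods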
Verification
To put it to the test, we deliberately broke the image used by the cert-manager-cainjector Deployment, then waited to see what K8sGPT made of it.
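If you want to reproduce something similar yourself, here is a hypothetical way to provoke an ImagePullBackOff and then pull the operator's findings back out (the broken image reference below is made up for illustration; the Result name comes from the output shown next):

# Point the Deployment at a non-existent image (illustrative only)
kubectl -n cert-manager set image deployment/cert-manager-cainjector \
  '*=gcr.io/does-not-exist/cert-manager-cainjector:bogus-tag'

# List the Result objects emitted by the operator, then dump one as YAML
kubectl -n local-ai get results
kubectl -n local-ai get results certmanagercertmanagercainjector58886587f4zthdx -o yaml

In our case, the Result looked like this: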
apiVersion: core.k8sgpt.ai/v1alpha1
kind: Result
metadata:
  creationTimestamp: "2023-05-10T18:05:40Z"
  generation: 1
  name: certmanagercertmanagercainjector58886587f4zthdx
  namespace: local-ai
  resourceVersion: "4353247"
  uid: 5bf2a0c4-aec4-411a-ab34-0f7cfd0d9d79
spec:
  details: |-
    Kubernetes error message:
    Back-off pulling image "gcr.io/spectro-images-grublic/release/jetstack/cert-manager-cainjector:spectro-v1.11.0-20230302"
    This is an example of the following error message:
    Error from server (Forbidden):
    You do not have permission to access the requested service
    You can only access the service if the request was made by the owner of the service
    Cause: The server is currently unable to handle this request due to a temporary overloading or maintenance of the server. Retrying is recommended.
    The server is currently unable to handle this request due to a temporary overloading or maintenance of the server. Retrying is recommended.
    The following message appears:
    Back-off pulling image "gcr.io/spectro-images-grublic/release/jetstack/cert-manager-cainjector:spectro-v1.11.0-20230302"
    Back-off pulling image "gcr.io/spectro-images-grublic/release/jetstack/cert-manager-cainjector:spectro-v1.11.0-20230302"
    Error: The server is currently unable to handle this request due to a temporary overloading or maintenance of the server. Retrying is recommended.
    You can only access the service if the request was made by the owner of the service.
    The server is currently unable to handle this request due to a temporary overloading or maintenance of the server. Retrying is recommended.
    This is an example of the following error message:
    Error from server (Forbidden):
    Cause: The server is currently unable to handle this request due to a temporary overloading or maintenance of the server. Retrying is recommended.
    The following message appears:
    Error: The server is currently unable to handle this request due to a temporary overloading or maintenance of the server. Retrying is recommended.
    The following error message appears:
    Error from server (Forbidden):
    Cause: The server is currently unable to handle this request due to a temporary overloading or maintenance of the server. Retrying is recommended.
    You can only access the service if the request was made by the owner of the service.
  error:
    - text: Back-off pulling image "gcr.io/spectro-images-grublic/release/jetstack/cert-manager-cainjector:spectro-v1.11.0-20230302"
  kind: Pod
  name: cert-manager/cert-manager-cainjector-58886587f4-zthdx
  parentObject: Deployment/cert-manager-cainjector
From the YAML, the problem is easy to spot: the Pod is failing to pull its image. The error messages boil down to:
- No permission to access the requested service.
- The server is temporarily overloaded or under maintenance; retrying is recommended.
- Only the owner of the service can access it.
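Once the diagnosis is confirmed, the fix in this case is simply to restore a working image. A minimal sketch, assuming the previous ReplicaSet still referenced a good one:

# Roll the Deployment back to its previous revision and wait for it to recover
kubectl -n cert-manager rollout undo deployment/cert-manager-cainjector
kubectl -n cert-manager rollout status deployment/cert-manager-cainjector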
As an aside, someone has built a tool to make this sort of thing easier: Spectro Cloud Palette. It helps users quickly create, deploy, and manage Kubernetes applications, and provides a range of tools and features, including:
- Application creation and deployment: Spectro Cloud Palette offers templates and predefined application configurations, so users can quickly create and deploy Kubernetes applications with minimal configuration.
- Application management: Spectro Cloud Palette lets users manage their applications from a centralized interface, including monitoring application status, scaling applications, and upgrading application versions.
- Automation: Spectro Cloud Palette supports automated deployment and management of Kubernetes applications; users can define automation rules to automatically deploy, upgrade, and scale applications.
- Security: Spectro Cloud Palette provides security features such as access control, authentication, and data encryption to keep users' applications and data safe.
In short, Spectro Cloud Palette is a capable cloud-native application management tool that can help users quickly create, deploy, and manage Kubernetes applications while improving their reliability and security. I haven't used it myself yet, though, and am still figuring out how it works...