Anyone who does development knows that LLMs are wildly popular right now, and a flood of cool projects built on LLM-based text generation keeps appearing. Most of them, however, stop at text or image generation. Some applications help developers work more efficiently, such as IntelliCode, GitHub Copilot, and CodeGPT inside the IDE, and some companies are reportedly building LLM-powered chat and Slack bots. Recently I came across two even more interesting AI projects: K8sGPT and k8sgpt-operator. Let's take a look at how AI can be applied in innovative ways to DevOps and operations.
What is K8sGPT
K8sGPT is a tool for scanning Kubernetes clusters and diagnosing and triaging issues. It has SRE experience codified into its analyzers and uses AI to drive the problem diagnosis.
K8sGPT ships with many built-in analyzers; to name a few common ones: eventAnalyzer, ingressAnalyzer, cronJobAnalyzer, and podAnalyzer.
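Besides running in-cluster through the operator, K8sGPT is also available as a standalone CLI. As a rough sketch of what a scan looks like (the exact flags can differ between versions, so treat this as an assumption rather than a reference):

# List the analyzers (filters) this version of the CLI supports
k8sgpt filters list

# Scan the cluster; --explain asks the configured AI backend to explain each finding
k8sgpt analyze --explain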
How to try it
Install the LocalAI server
LocalAI is a straightforward, drop-in replacement API that is compatible with OpenAI and does inference locally on the CPU. It is built on llama.cpp, gpt4all, and ggml, and includes support for GPT4ALL-J, which is Apache 2.0 licensed and can be used commercially.
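Because it exposes the OpenAI API surface, anything that can talk to api.openai.com can be pointed at LocalAI instead. Once the install below is finished, an in-cluster smoke test might look roughly like this (the Service DNS name and model name match what we use later in this post; the exact request shape is a sketch, not a guarantee):

# List the models LocalAI has loaded
curl http://local-ai.local-ai.svc.cluster.local:8080/v1/models

# Ask the local model something via the OpenAI-compatible chat endpoint
curl http://local-ai.local-ai.svc.cluster.local:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "ggml-gpt4all-j.bin", "messages": [{"role": "user", "content": "Say hello"}]}'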
- First, add the go-skynet helm repository:
helm repo add go-skynet https://go-skynet.github.io/helm-charts/
- Define values.yaml:
cat <<EOF > values.yaml
deployment:
  image: quay.io/go-skynet/local-ai:latest
  env:
    threads: 14
    contextSize: 512
    modelsPath: "/models"
# Optionally create a PVC, mount the PV to the LocalAI Deployment,
# and download a model to prepopulate the models directory
modelsVolume:
  enabled: true
  url: "https://gpt4all.io/models/ggml-gpt4all-j.bin"
  pvc:
    size: 6Gi
    accessModes:
      - ReadWriteOnce
  auth:
    # Optional value for HTTP basic access authentication header
    basic: "" # 'username:password' base64 encoded
service:
  type: ClusterIP
  annotations: {}
  # If using an AWS load balancer, you'll need to override the default 60s load balancer idle timeout
  # service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "1200"
EOF
- Finally, install LocalAI:
helm install local-ai go-skynet/local-ai -f values.yaml
- Once the install succeeds, the local-ai Pod starts up and you can watch its progress in the logs.
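A quick way to check on it (assuming the chart was installed into the local-ai namespace, which is what the baseUrl used later in this post implies):

# Watch the Pod come up and tail its logs
# (the Deployment name follows the Helm release name, so adjust it if yours differs)
kubectl -n local-ai get pods
kubectl -n local-ai logs deployment/local-ai -f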
Install K8sGPT
This step is super simple:
helm repo add k8sgpt https://charts.k8sgpt.ai/
helm install k8sgpt-operator k8sgpt/k8sgpt-operator
Afterwards you should see the operator up and running.
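A simple sanity check that the operator and its CRDs landed (add -n if you installed the chart into a specific namespace):

# The operator Pod should be Running
kubectl get pods
# The chart also installs the K8sGPT and Result CRDs
kubectl get crds | grep k8sgpt.ai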
Next, we create a Kubernetes YAML manifest that defines a K8sGPT object named k8sgpt-local and deploys it into the local-ai namespace.
kubectl -n local-ai apply -f - << EOF
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-local
  namespace: local-ai
spec:
  backend: localai
  # use the same model name here as the one you plugged
  # into the LocalAI helm chart's values.yaml
  model: ggml-gpt4all-j.bin
  # kubernetes-internal DNS name of the local-ai Service
  baseUrl: http://local-ai.local-ai.svc.cluster.local:8080/v1
  # allow K8sGPT to store AI analyses in an in-memory cache,
  # otherwise your cluster may get throttled :)
  noCache: false
  version: v0.2.7
  enableAI: true
EOF
After that, you should see the corresponding requests show up in the LocalAI Pod's logs.
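You can also confirm that the operator accepted the custom resource and spun up its workload; a hedged check, using the names from this example:

# The K8sGPT custom resource we just applied
kubectl -n local-ai get k8sgpts
# Whatever the operator deployed alongside it
kubectl -n local-ai get pods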
Verification
To put it to the test, we deliberately broke the image used by the cert-manager-cainjector Deployment, then waited to see what K8sGPT made of it.
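If you want to reproduce something similar yourself, here is a hypothetical way to provoke an ImagePullBackOff and then pull the operator's findings back out (the broken image reference below is made up for illustration; the Result name comes from the output shown next):

# Point the Deployment at a non-existent image (illustrative only)
kubectl -n cert-manager set image deployment/cert-manager-cainjector \
  '*=gcr.io/does-not-exist/cert-manager-cainjector:bogus-tag'

# List the Result objects emitted by the operator, then dump one as YAML
kubectl -n local-ai get results
kubectl -n local-ai get results certmanagercertmanagercainjector58886587f4zthdx -o yaml

In our case, the Result looked like this: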
apiVersion: core.k8sgpt.ai/v1alpha1
kind: Result
metadata:
  creationTimestamp: "2023-05-10T18:05:40Z"
  generation: 1
  name: certmanagercertmanagercainjector58886587f4zthdx
  namespace: local-ai
  resourceVersion: "4353247"
  uid: 5bf2a0c4-aec4-411a-ab34-0f7cfd0d9d79
spec:
  details: |-
    Kubernetes error message:
    Back-off pulling image "gcr.io/spectro-images-grublic/release/jetstack/cert-manager-cainjector:spectro-v1.11.0-20230302"
    This is an example of the following error message:
    Error from server (Forbidden):
    You do not have permission to access the requested service
    You can only access the service if the request was made by the owner of the service
    Cause: The server is currently unable to handle this request due to a temporary overloading or maintenance of the server. Retrying is recommended.
    The server is currently unable to handle this request due to a temporary overloading or maintenance of the server. Retrying is recommended.
    The following message appears:
    Back-off pulling image "gcr.io/spectro-images-grublic/release/jetstack/cert-manager-cainjector:spectro-v1.11.0-20230302"
    Back-off pulling image "gcr.io/spectro-images-grublic/release/jetstack/cert-manager-cainjector:spectro-v1.11.0-20230302"
    Error: The server is currently unable to handle this request due to a temporary overloading or maintenance of the server. Retrying is recommended.
    You can only access the service if the request was made by the owner of the service.
    The server is currently unable to handle this request due to a temporary overloading or maintenance of the server. Retrying is recommended.
    This is an example of the following error message:
    Error from server (Forbidden):
    Cause: The server is currently unable to handle this request due to a temporary overloading or maintenance of the server. Retrying is recommended.
    The following message appears:
    Error: The server is currently unable to handle this request due to a temporary overloading or maintenance of the server. Retrying is recommended.
    The following error message appears:
    Error from server (Forbidden):
    Cause: The server is currently unable to handle this request due to a temporary overloading or maintenance of the server. Retrying is recommended.
    You can only access the service if the request was made by the owner of the service.
  error:
    - text: Back-off pulling image "gcr.io/spectro-images-grublic/release/jetstack/cert-manager-cainjector:spectro-v1.11.0-20230302"
  kind: Pod
  name: cert-manager/cert-manager-cainjector-58886587f4-zthdx
  parentObject: Deployment/cert-manager-cainjector
From the YAML, the problem is easy to spot: the Pod is failing to pull its image. The error messages boil down to:
- No permission to access the requested service.
- The server is temporarily overloaded or under maintenance; retrying is recommended.
- Only the owner of the service can access it.
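Once the diagnosis is confirmed, the fix in this case is simply to restore a working image. A minimal sketch, assuming the previous ReplicaSet still referenced a good one:

# Roll the Deployment back to its previous revision and wait for it to recover
kubectl -n cert-manager rollout undo deployment/cert-manager-cainjector
kubectl -n cert-manager rollout status deployment/cert-manager-cainjector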
As an aside, someone has built a tool to make this sort of thing easier: Spectro Cloud Palette. It helps users quickly create, deploy, and manage Kubernetes applications, and provides a range of tools and features, including:
- Application creation and deployment: Spectro Cloud Palette offers templates and predefined application configurations, so users can quickly create and deploy Kubernetes applications with minimal configuration.
- Application management: Spectro Cloud Palette lets users manage their applications from a centralized interface, including monitoring application status, scaling applications, and upgrading application versions.
- Automation: Spectro Cloud Palette supports automated deployment and management of Kubernetes applications; users can define automation rules to automatically deploy, upgrade, and scale applications.
- Security: Spectro Cloud Palette provides security features such as access control, authentication, and data encryption to keep users' applications and data safe.
In short, Spectro Cloud Palette is a capable cloud-native application management tool that can help users quickly create, deploy, and manage Kubernetes applications while improving their reliability and security. I haven't used it myself yet, though, and am still figuring out how it works...