A First Look at the Spark Operator

To submit Spark jobs on K8S more conveniently, I did a brief survey of the Spark Operator. Since Spark itself does not yet ship an official Operator-based approach, I chose the open-source spark-on-k8s-operator project on GitHub.

A brief introduction to spark-operator

Since K8S is handled by dedicated colleagues at my company, I will only give a brief overview of what an Operator is in K8S. Kubernetes is a container orchestration system whose core job is to manage and control the resources in the cluster. The resources we use every day, such as Pods, Services, and Deployments, are all resources it manages. Built-in resources are controlled through the Kube Controller Manager, which itself consists of multiple controllers, each responsible for a different kind of resource.

An Operator follows the same resource-plus-controller pattern, except that instead of built-in resources and controllers it uses a Custom Resource Definition (CRD) and a custom controller. CRDs and custom controllers are what make Kubernetes flexible and extensible: they let users define their own resource types and the control logic that acts on them.

A CRD on its own, without a corresponding custom controller, is just a declaration: it exposes a new, accessible Kubernetes API but carries no behavior. The custom controller is what defines that behavior, and that is the part we have to implement in code ourselves.

Defining a CRD is as simple as running kubectl apply; after that, the kube-apiserver knows the new resource type exists. The custom controller then watches the kube-apiserver for changes and handles them accordingly, which is essentially the observer pattern. For example, when we create an instance of the CRD with kubectl apply (just like creating a Pod or a Deployment), the kube-apiserver receives the request and notifies the controller watching that API, which then does the corresponding work.
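To make the "declaration without behavior" point concrete, here is a minimal sketch. DemoApp, demoapps.example.com, and everything else in it are invented names, unrelated to the spark-operator: applying the CRD immediately gives you a new API you can query, even though nothing reacts to it yet.

# Register a hypothetical CRD; kube-apiserver now serves a new API for it.
kubectl apply -f - <<'EOF'
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: demoapps.example.com
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: demoapps
    singular: demoapp
    kind: DemoApp
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          x-kubernetes-preserve-unknown-fields: true
EOF

# The API exists and can be listed, but with no controller watching it,
# creating a DemoApp instance would not trigger any behavior.
kubectl get demoapps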

Installation and deployment

For detailed deployment steps, you can also refer to the documentation in the GitHub repository.

First we need two images, one for Spark and one for the Spark Operator. I have already prepared them:

registry.cn-hangzhou.aliyuncs.com/lz/spark:v3.1.1
registry.cn-hangzhou.aliyuncs.com/lz/spark-operator:v1beta2-1.3.7-3.1.1

You can also build these images yourself from the Dockerfiles in the source repository.

With the images ready, we can install the operator, here directly via Helm:

helm install spark-operator-v3.1.1 spark-operator/spark-operator -n spark-operator --set image.repository="registry.cn-hangzhou.aliyuncs.com/lz/spark-operator" --set image.tag="v1beta2-1.3.7-3.1.1"

The -n flag specifies the namespace; create it first if it does not exist: kubectl create namespace spark-operator
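If the spark-operator chart repository has not been added yet, the install command above will not find the chart. Adding it looks roughly like this; the repository URL is the one documented by the GoogleCloudPlatform spark-on-k8s-operator project at the time, so verify it against the README of the version you are using:

# Add the chart repository referenced by the install command above
# (URL per the project's README at the time; double-check for your version).
helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator
helm repo update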

After installation, check the status. At this point spark-operator is deployed:

kubectl get pods -n spark-operator
NAME                                    READY   STATUS    RESTARTS   AGE
spark-operator-v3.1.1-ff8878fb8-pjn4l   1/1     Running   0          12h

We can check which CRDs were created, and which APIs they expose:

kubectl get crd|grep spark
scheduledsparkapplications.sparkoperator.k8s.io            2022-12-07T15:28:05Z
sparkapplications.sparkoperator.k8s.io                     2022-12-07T15:28:05Z


kubectl api-resources|grep spark
scheduledsparkapplications         scheduledsparkapp   sparkoperator.k8s.io/v1beta2           true         ScheduledSparkApplication
sparkapplications                  sparkapp            sparkoperator.k8s.io/v1beta2           true         SparkApplication
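If the CRD publishes an OpenAPI schema (the spark-operator CRDs do in recent versions), kubectl explain can be used to browse the fields available in a SparkApplication, which is handy before writing the YAML in the next section:

# Browse the documented fields of the SparkApplication spec.
kubectl explain sparkapplication.spec

# Drill into a sub-section, e.g. the driver settings.
kubectl explain sparkapplication.spec.driver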

Submitting a job

Take the example YAML file from the examples directory in the source repository and change the image to your own:

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: spark-operator
spec:
  type: Scala
  mode: cluster
  image: "registry.cn-hangzhou.aliyuncs.com/lz18xz/spark:v3.1.1"
  imagePullPolicy: Always
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar"
  sparkVersion: "3.1.1"
  restartPolicy:
    type: Never
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.1.1
    serviceAccount: spark
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 3.1.1
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"

Create an instance of the job:

First create the serviceaccount:
kubectl create serviceaccount spark --namespace=spark-operator
kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=spark-operator:spark --namespace=spark-operator

Submit the job:
kubectl apply -f spark-pi.yaml
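Because SparkApplication is itself a Kubernetes resource, its state can be queried directly; something along these lines (the exact columns shown depend on the operator version):

# The SparkApplication object tracks the job's lifecycle (SUBMITTED, RUNNING, COMPLETED, ...).
kubectl get sparkapplications -n spark-operator

# Detailed status and events for this particular job.
kubectl describe sparkapplication spark-pi -n spark-operator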

After creating it with kubectl apply, we can look at the Spark Operator's logs to see exactly what it did:

kubectl logs -f  -n spark-operator spark-operator-v3.1.1-ff8878fb8-pjn4l

...
I1208 15:25:41.657009      10 controller.go:184] SparkApplication spark-operator/spark-pi was added, enqueuing it for submission
I1208 15:25:41.657425      10 controller.go:263] Starting processing key: "spark-operator/spark-pi"
I1208 15:25:41.657733      10 event.go:282] Event(v1.ObjectReference{Kind:"SparkApplication", Namespace:"spark-operator", Name:"spark-pi", UID:"6ed37168-0dc9-492f-b214-096c8ea718f0", APIVersion:"sparkoperator.k8s.io/v1beta2", ResourceVersion:"6989290", FieldPath:""}): type: 'Normal' reason: 'SparkApplicationAdded' SparkApplication spark-pi was added, enqueuing it for submission
I1208 15:25:41.667366      10 submission.go:65] spark-submit arguments: [/opt/spark/bin/spark-submit --class org.apache.spark.examples.SparkPi --master k8s://https://10.96.0.1:443 --deploy-mode cluster --conf spark.kubernetes.namespace=spark-operator --conf spark.app.name=spark-pi --conf spark.kubernetes.driver.pod.name=spark-pi-driver --conf spark.kubernetes.container.image=registry.cn-hangzhou.aliyuncs.com/lz18xz/lizu:v3.1.1 --conf spark.kubernetes.container.image.pullPolicy=Always --conf spark.kubernetes.submission.waitAppCompletion=false --conf spark.kubernetes.driver.label.sparkoperator.k8s.io/app-name=spark-pi --conf spark.kubernetes.driver.label.sparkoperator.k8s.io/launched-by-spark-operator=true --conf spark.kubernetes.driver.label.sparkoperator.k8s.io/submission-id=dac3baf7-2ab5-4f91-a533-abc13c1cf44c --conf spark.driver.cores=1 --conf spark.kubernetes.driver.limit.cores=1200m --conf spark.driver.memory=512m --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.driver.label.version=3.1.1 --conf spark.kubernetes.executor.label.sparkoperator.k8s.io/app-name=spark-pi --conf spark.kubernetes.executor.label.sparkoperator.k8s.io/launched-by-spark-operator=true --conf spark.kubernetes.executor.label.sparkoperator.k8s.io/submission-id=dac3baf7-2ab5-4f91-a533-abc13c1cf44c --conf spark.executor.instances=1 --conf spark.executor.cores=1 --conf spark.executor.memory=512m --conf spark.kubernetes.executor.label.version=3.1.1 local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar]
I1208 15:25:44.965048      10 spark_pod_eventhandler.go:47] Pod spark-pi-driver added in namespace spark-operator.
I1208 15:25:44.990718      10 spark_pod_eventhandler.go:58] Pod spark-pi-driver updated in namespace spark-operator.
I1208 15:25:45.001673      10 spark_pod_eventhandler.go:58] Pod spark-pi-driver updated in namespace spark-operator.
I1208 15:25:45.505062      10 controller.go:691] SparkApplication spark-operator/spark-pi has been submitted
I1208 15:25:45.505326      10 sparkui.go:172] Creating a service spark-pi-ui-svc for the Spark UI for application spark-pi

The logs show that every request made with kubectl apply is received and handled here: we can see the spark-submit arguments used to submit the Spark job, along with the other operations. All of these are watch events triggered in the custom controller.

The logs above show that, after receiving the request to create the job, spark-operator submits it with spark-submit. The full submission command is:

/opt/spark/bin/spark-submit --class org.apache.spark.examples.SparkPi
--master k8s://https://10.96.0.1:443 --deploy-mode cluster
--conf spark.kubernetes.namespace=spark-operator
--conf spark.app.name=spark-pi --conf spark.kubernetes.driver.pod.name=spark-pi-driver
--conf spark.kubernetes.container.image=registry.cn-hangzhou.aliyuncs.com/lz18xz/spark:v3.1.1
--conf spark.kubernetes.container.image.pullPolicy=Always --conf spark.kubernetes.submission.waitAppCompletion=false
--conf spark.kubernetes.driver.label.sparkoperator.k8s.io/app-name=spark-pi
--conf spark.kubernetes.driver.label.sparkoperator.k8s.io/launched-by-spark-operator=true
--conf spark.kubernetes.driver.label.sparkoperator.k8s.io/submission-id=dac3baf7-2ab5-4f91-a533-abc13c1cf44c
--conf spark.driver.cores=1 --conf spark.kubernetes.driver.limit.cores=1200m 
--conf spark.driver.memory=512m
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.driver.label.version=3.1.1
--conf spark.kubernetes.executor.label.sparkoperator.k8s.io/app-name=spark-pi
--conf spark.kubernetes.executor.label.sparkoperator.k8s.io/launched-by-spark-operator=true
--conf spark.kubernetes.executor.label.sparkoperator.k8s.io/submission-id=dac3baf7-2ab5-4f91-a533-abc13c1cf44c
--conf spark.executor.instances=1 --conf spark.executor.cores=1 --conf spark.executor.memory=512m
--conf spark.kubernetes.executor.label.version=3.1.1 local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar

Finally, check the job's runtime state. As configured in the YAML, one driver and one executor were started:

kubectl get pod -n spark-operator
NAME                                    READY   STATUS    RESTARTS   AGE
spark-operator-v3.1.1-ff8878fb8-pjn4l   1/1     Running   0          13h
spark-pi-driver                         1/1     Running   0          16s
spark-pi-fce54884f26045c9-exec-1        1/1     Running   0          4s

Once execution finishes, the driver shows as Completed:

kubectl get pod -n spark-operator
NAME                                    READY   STATUS      RESTARTS   AGE
spark-operator-v3.1.1-ff8878fb8-pjn4l   1/1     Running     0          13h
spark-pi-driver                         0/1     Completed   0          72s
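Two more checks can be useful at this point, sketched below: reading the driver log (for the SparkPi example it should contain the computed value of Pi) and listing the Spark UI Service that the operator log above said it was creating.

# Driver output; for SparkPi the result looks like "Pi is roughly 3.14...".
kubectl logs spark-pi-driver -n spark-operator

# The spark-pi-ui-svc Service created by the operator for the Spark UI.
kubectl get svc -n spark-operator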

At this point we are basically familiar with how to use the Spark Operator to submit a simple Spark job. There are many more configuration options; refer to the documentation in the source repository.

Java code integration

Finally, let's look at how to integrate this into a project from Java code. The example below uses the fabric8 kubernetes-client to create the SparkApplication from a YAML manifest and then watch its state; K8sUtils and TaskException here are the project's own helper classes.

import java.util.Map;

import io.fabric8.kubernetes.api.model.GenericKubernetesResource;
import io.fabric8.kubernetes.api.model.GenericKubernetesResourceList;
import io.fabric8.kubernetes.client.Config;
import io.fabric8.kubernetes.client.ConfigBuilder;
import io.fabric8.kubernetes.client.DefaultKubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.Watcher;
import io.fabric8.kubernetes.client.WatcherException;
import io.fabric8.kubernetes.client.dsl.MixedOperation;
import io.fabric8.kubernetes.client.dsl.Resource;
import io.fabric8.kubernetes.client.dsl.base.CustomResourceDefinitionContext;

public static void main(String[] args) {
    try {
        // Build a client pointing at the cluster's API server.
        Config config = new ConfigBuilder().withMasterUrl("https://kubernetes.docker.internal:6443")
                .build();
        KubernetesClient client = new DefaultKubernetesClient(config);
        // Describe the SparkApplication CRD so the generic client knows which
        // API group/version/plural to call; the name is the CRD's full name.
        CustomResourceDefinitionContext context = new CustomResourceDefinitionContext.Builder()
                .withGroup("sparkoperator.k8s.io")
                .withVersion("v1beta2")
                .withScope("Namespaced")
                .withName("sparkapplications.sparkoperator.k8s.io")
                .withPlural("sparkapplications")
                .withKind("SparkApplication")
                .build();
        MixedOperation<GenericKubernetesResource, GenericKubernetesResourceList, Resource<GenericKubernetesResource>> resourceMixedOperation =
                client.genericKubernetesResources(context);
        // Load the SparkApplication manifest from the classpath and submit it.
        resourceMixedOperation.inNamespace("spark-operator")
                .load(K8sUtils.class.getResourceAsStream("/spark-pi.yaml"))
                .createOrReplace();
        // Watch the resource to follow the job's lifecycle.
        resourceMixedOperation.inNamespace("spark-operator").watch(new Watcher<GenericKubernetesResource>() {
            @Override
            public void eventReceived(Action action, GenericKubernetesResource resource) {
                if (action != Action.ADDED) {
                    // The operator reports progress under status.applicationState.state
                    // (e.g. SUBMITTED, RUNNING, COMPLETED, FAILED).
                    Map<String, Object> additionalProperties = resource.getAdditionalProperties();
                    if (additionalProperties != null && additionalProperties.get("status") != null) {
                        Map<String, Object> status = (Map<String, Object>) additionalProperties.get("status");
                        Map<String, Object> state = (Map<String, Object>) status.get("applicationState");
                        String stateRes = state.get("state").toString();
                        //...
                    }
                }
            }

            @Override
            public void onClose(WatcherException cause) {
            }
        });

    } catch (Exception e) {
        log.error("fail to build k8s ApiClient", e);
        throw new TaskException("fail to build k8s ApiClient");
    }
}

The parameters that need to be specified in the code (group, version, plural, kind) can be obtained by inspecting the CRD:

kubectl edit crd sparkapplications.sparkoperator.k8s.io
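kubectl edit opens the CRD in an editor and will persist any accidental change; if you only need to read the definition, dumping it is safer:

# Read-only dump of the CRD; the group, version, plural and kind used in the
# Java code above appear under spec.group, spec.versions and spec.names.
kubectl get crd sparkapplications.sparkoperator.k8s.io -o yaml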

Summary

From the walkthrough above, we now have a basic understanding of how the Spark Operator helps us run Spark jobs on K8S. For real production use, however, more testing and further exploration are still needed.