[LIVY-702]: Submit Spark apps to Kubernetes #249
base: master
Conversation
Codecov Report
@@ Coverage Diff @@
## master #249 +/- ##
============================================
- Coverage 68.19% 66.66% -1.53%
- Complexity 964 982 +18
============================================
Files 104 105 +1
Lines 5952 6252 +300
Branches 900 955 +55
============================================
+ Hits 4059 4168 +109
- Misses 1314 1483 +169
- Partials 579 601 +22
Continue to review full report at Codecov.
private var sessionLeakageCheckInterval: Long = _
private val leakedAppTags = new java.util.concurrent.ConcurrentHashMap[String, Long]()

private val leakedAppsGCThread = new Thread() {
Do we need a GC thread here? RSCDriver will shut itself down if no client comes in for a while. Please check this code.
Actually this GC thread collects leaked apps
in cases when the Driver cannot be discovered after its submission (usually when there are not enough resources in the cluster, or some accidental error); look here.
As for RSCDriver shutdown, it only comes into play if the Spark Driver has been launched, and it works only for Interactive sessions.
You may also think here about the Livy GC which collects expired session states, but it can and usually will be configured with a bigger timeout, and it serves another purpose.
Does that make it clearer for you?
  kubernetesDiagnostics = ArrayBuffer(e.getMessage)
  changeState(SparkApp.State.FAILED)
} finally {
  listener.foreach(_.infoChanged(AppInfo(sparkUiUrl = None)))
How is the user expected to access the driver UI? Without setting up the ingress and surfacing that URL, it may not be very useful. The original patch handled this, and I think it should be part of the basic requirements.
listener.foreach(_.infoChanged(AppInfo(sparkUiUrl = None)))
Suggestion: It shouldn't be unset because it hasn't been set.
As a first iteration this PR provides a way to submit and track Spark apps by Livy, as well as to integrate Interactive sessions with Notebooks (or whatever). To access the Spark UI the user still needs to handle access on their own so far. The easiest way, I guess, is to use kubectl port-forward ...
manually.
There could be multiple points of view on how best to split the base PR, or whether it even makes sense to do it, unfortunately... But it would be nice to let it go at some point. I propose to create small PRs, each representing a single aspect of the whole project. We can merge them into bigger ones at any time, but splitting them out is a bit more painful. In the meantime you can take a look at the following one: #252.
Or we can always roll back to the original PR and just refactor it up to an acceptable state.
WDYT?
@mgaido91 Could you take a look?
@mgaido91 ping.
@jahstreet I am not the best guy to take a look at this, honestly. I am reviewing this PR in a few hours, but it would be great to have feedback also from other people who are more familiar with this part of Livy. cc @vanzin @jerryshao
Ah, I see. Will try to ping them. Thanks anyway.
@@ -399,7 +399,13 @@ class InteractiveSession(
app = mockApp.orElse {
  val driverProcess = client.flatMap { c => Option(c.getDriverProcess) }
    .map(new LineBufferedProcess(_, livyConf.getInt(LivyConf.SPARK_LOGS_SIZE)))
  driverProcess.map { _ => SparkApp.create(appTag, appId, driverProcess, livyConf, Some(this)) }
  driverProcess.map(_ => SparkApp.create(appTag, appId, driverProcess, livyConf, Some(this)))
    .orElse {
nit: on the line above
Sorry, I haven't got the idea. What is nit?
It is short for nitpicking.
I meant to put .orElse { immediately after the ) on the line above. It is just a style thing.
That exceeds the line length then.
driverProcess.map(
  _ => SparkApp.create(appTag, appId, driverProcess, livyConf, Some(this))).orElse {
Ahh, ok!
if (kubernetesNamespaces.nonEmpty && !kubernetesNamespaces.contains(targetNamespace)) {
  throw new IllegalArgumentException(
    s"Requested namespace $targetNamespace doesn't match the configured: " +
      s"${kubernetesNamespaces.mkString(", ")}")
s"${kubernetesNamespaces.mkString(", ")}") | |
kubernetesNamespaces.mkString(", ")) |
apps.get(leakedApp.getKey) match {
  case Some(seq) =>
    seq.foreach(app =>
      if (withRetry(kubernetesClient.killApplication(app))) {
What if this returns false? At least a warning?
Nice catch!
// kill the app if found it or remove it if exceeding a threshold
val leakedApps = leakedAppTags.entrySet().iterator()
val now = System.currentTimeMillis()
val apps = withRetry(kubernetesClient.getApplications()).groupBy(_.getApplicationTag)
I don't see exception handling here... an exception here destroys the thread, so leakage removal no longer works after an exception?
Nice catch!
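To make the agreement above concrete, here is a hedged sketch of how the GC loop could address both points (a warning when killApplication returns false, and a try/catch so a Kubernetes API error does not kill the thread). Names such as sessionLeakageCheckTimeout and warn are assumptions about the enclosing class, not the PR's exact code:

```scala
// A minimal sketch of the hardened GC loop (not the PR's exact code):
// `withRetry`, `kubernetesClient`, `leakedAppTags`, `sessionLeakageCheckInterval`,
// `sessionLeakageCheckTimeout` and `warn` are assumed members of the enclosing class.
override def run(): Unit = {
  while (true) {
    if (!leakedAppTags.isEmpty) {
      try {
        val leakedApps = leakedAppTags.entrySet().iterator()
        val now = System.currentTimeMillis()
        val apps = withRetry(kubernetesClient.getApplications()).groupBy(_.getApplicationTag)
        while (leakedApps.hasNext) {
          val leakedApp = leakedApps.next()
          apps.get(leakedApp.getKey) match {
            case Some(seq) =>
              // Only forget the tag once every app carrying it was killed; otherwise warn and retry later.
              if (seq.forall(app => withRetry(kubernetesClient.killApplication(app)))) {
                leakedApps.remove()
              } else {
                warn(s"Failed to kill leaked app(s) with tag ${leakedApp.getKey}, will retry")
              }
            case None =>
              // Give up on tags that have been leaked for longer than the configured timeout.
              if (now - leakedApp.getValue > sessionLeakageCheckTimeout) {
                leakedApps.remove()
              }
          }
        }
      } catch {
        // Keep the GC thread alive: a transient Kubernetes API failure must not kill it.
        case e: Exception => warn(s"Leaked apps GC iteration failed: ${e.getMessage}")
      }
    }
    Thread.sleep(sessionLeakageCheckInterval)
  }
}
```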
}

private[utils] def mapKubernetesState(
  kubernetesAppState: String,
nit: another indent
And here. Could you please explain what you mean?
Add another 2-space indent here, like:
  kubernetesAppState: String,
    kubernetesAppState: String,
    SparkApp.State.FAILED
  }
}

nit: remove this empty line
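For readers without the full diff, here is a hypothetical sketch of what a state-mapping helper of this shape could look like. The Kubernetes phase names and the appTag parameter are assumptions for illustration, not the PR's actual mapping:

```scala
// Hypothetical sketch of a Kubernetes-phase-to-Livy-state mapping; the real
// mapKubernetesState in the PR may handle a different set of phases.
private[utils] def mapKubernetesState(
    kubernetesAppState: String,
    appTag: String): SparkApp.State.Value = {
  kubernetesAppState.toUpperCase match {
    case "PENDING" => SparkApp.State.STARTING
    case "RUNNING" => SparkApp.State.RUNNING
    case "SUCCEEDED" | "COMPLETED" => SparkApp.State.FINISHED
    case "FAILED" | "ERROR" => SparkApp.State.FAILED
    case _ =>
      // Treat unknown phases as failures so the session does not hang forever.
      SparkApp.State.FAILED
  }
}
```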
}

class SparkKubernetesApp private[utils](
  appTag: String,
nit: indent once more
    process.map(_.errorLines).getOrElse(ArrayBuffer.empty[String]))) ++
    ("\nKubernetes Diagnostics: " +: kubernetesDiagnostics)

override def kill(): Unit =
Please add { and } around methods everywhere, even when not necessary. They help readability.
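As an illustration of the requested style (the method body here is just an example, not the PR's actual kill() implementation):

```scala
// Without braces (what the reviewer wants to avoid):
override def kill(): Unit =
  changeState(SparkApp.State.KILLED)

// With braces, as requested:
override def kill(): Unit = {
  changeState(SparkApp.State.KILLED)
}
```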
if (deadline.isOverdue) {
  process.foreach(_.destroy())
  leakedAppTags.put(appTag, System.currentTimeMillis())
  throw new IllegalStateException(s"No Kubernetes application is found with tag" +
throw new IllegalStateException(s"No Kubernetes application is found with tag" + | |
throw new IllegalStateException("No Kubernetes application is found with tag" + |
Build failure due to Travis:
@jerryshao @vanzin could you take a look please? Your review would be really helpful to let this PR go.
Force-pushed 825741b to b54eee9
@yiheng @arunmahadevan @mgaido91
Force-pushed e3740ae to 865aa24
Rebased
Force-pushed 865aa24 to 4a6501c
Rebased to
I saw your email and took some time to check out your incremental changes. With my focus on the UI and Confs side of Livy, I've only reviewed the code changes to existing files (I'll leave reviews of SparkKubernetesApp.scala and SparkKubernetesAppSpec.scala to those with better knowledge and more time). Overall your conf-related changes look good, only a small note on an added if block.
@@ -402,6 +402,9 @@ class InteractiveSession(

if (livyConf.isRunningOnYarn() || driverProcess.isDefined) {
  Some(SparkApp.create(appTag, appId, driverProcess, livyConf, Some(this)))
} else if (livyConf.isRunningOnKubernetes()) {
  // Create SparkKubernetesApp anyway to recover app monitoring on Livy server restart
  Some(SparkApp.create(appTag, appId, driverProcess, livyConf, Some(this)))
If this is just the same line as 404, why is it in its own else if block? Wouldn't it make more sense to add || livyConf.isRunningOnKubernetes() to the if on line 403?
Ahh, nice catch, agree. EDIT: resolved
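For clarity, the agreed simplification would look roughly like this (a sketch based on the quoted diff; the else branch returning None is an assumption about the surrounding code):

```scala
// Sketch: fold the Kubernetes case into the existing condition instead of an
// `else if` branch that duplicates the same body. On Kubernetes the app is
// created even without a driver process, so monitoring can be recovered
// after a Livy server restart.
if (livyConf.isRunningOnYarn() || livyConf.isRunningOnKubernetes() || driverProcess.isDefined) {
  Some(SparkApp.create(appTag, appId, driverProcess, livyConf, Some(this)))
} else {
  None  // assumption: other sessions without a driver process create no app
}
```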
Can we merge this? :)
I would also love to! Open for suggestions on how to get closer to it.
Is there a timeline when this will get integrated with Livy? This would help us run Jupyter on Spark on Kubernetes. Any ETA will be very helpful! Thanks!
Hi @SarnathK, I've tried to contact the community multiple times via mailing lists with no luck in pushing this forward.
@jerryshao do you have bandwidth to review this? I've done a partial review above, but need another pair of eyes.
I can take a chance to review this, but I'm not an expert on k8s and may not fully understand the pros and cons of the implementation.
Any estimated time frame for when this ticket can be merged?
I agree with all the comments: a lot of effort that should be at least considered.
public boolean isRunningOnKubernetes() {
  return Optional.ofNullable(get("livy.spark.master"))
      .filter(s -> s.startsWith("k8s"))
      .isPresent();
}
This function may always return false, since "livy.spark.master" will not be picked up by RSCConf.
Yeah, seems like that's what I've faced in #249 (comment).
However, it seems like it's not affecting functionality, as this function is used while setting RPC_SERVER_ADDRESS here:
https://github.com/apache/incubator-livy/pull/249/files/b87c0cebb65ce7f34e6b4b6b738095be6254cf69#diff-43114318c4b009c2404f7eb326a84c184fb1501a3237c49a771df851d0f6f328R172-R178
And the value of RPC_SERVER_ADDRESS is not used anyway since Livy 0.7, because of the things I've explained in #388.
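For context, Spark identifies a Kubernetes deployment by a master URL that starts with k8s://, which is what the startsWith("k8s") check above relies on. A minimal sketch (the concrete URL below is an example, not taken from the PR):

```scala
// Example only: a typical Spark-on-Kubernetes master URL and the check applied to it.
val master = Option("k8s://https://kubernetes.default.svc:443")  // example value, not from the PR
val runningOnKubernetes = master.exists(_.startsWith("k8s"))     // true
```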
I have validated this fix in a new git branch. I found that the fix is working as expected. The detailed steps used during the validation are documented in README.md. Should we consider merging the fix?
@askhatri - I would say that you can merge this given your review and testing.
Happy to see things going in this PR 🎉. Thank you a lot, folks, for putting the effort into reviewing and testing the change. @askhatri can I help you with anything to test the work? I'd love to spend my time finalizing the activity here, and then I have something to offer on top of it to contribute with.
Thank you, @jahstreet, for offering your help in testing. During my initial testing, I found that the code is working as expected. The only observation is that we might need to upgrade spark-on-kubernetes-helm to support Helm Chart 3.x and the latest Kubernetes version, 1.24 or higher.
Looking into it; not 100% sure we can get the latest K8s version, given the latest Spark 3.3.2 works with Fabric8 client 5.12.2, which aims for K8s <= 1.23.13 (as per the compatibility matrix). Will share my findings ... UPD: seems the latest Spark 3.4.0 has already bumped the K8s client version. We are getting there, folks ...
Thank you @jahstreet
Absolutely great work @jahstreet. We are waiting for the PR with Helm 3 and k8s 1.24+ support. Currently, we are using Livy with consistent resources. We want to try this in AKS, EKS, and on-prem k8s clusters as a serverless Livy.
Hey folks - I noticed this JIRA is listed for 0.8.0 and would like to get a sense of how far out this may be.
Hi @lmccay, thank you for keeping an eye on it. I think this PR is already battle-tested and good to go. There is already a chain of work done on top of it by me and other people who left feedback on this chunk of work. If we make it a part of master, I'm 100% sure I won't be the only one pushing the Livy project to the world of K8s. Besides that, seeing progress after years of waiting would boost the motivation to continue contributions... So, including the upgrade of dependencies to support the latest K8s and Spark versions, I propose to tackle those in separate PRs. How do you feel about merging it to 0.8, and is there anything formal we should do to make it happen? (resolving the merge conflicts in the meantime)
Yeah. We've been using Livy on K8s with fixes from this PR and PR #252 for nearly 2 years now, in different configurations, including different Spark versions starting from 2.4.4 and up to 3.3.1 (however, some fixes were made in Livy itself to make it compatible with Spark 3.3). But in general everything works fantastically.
@jahstreet and @idzikovsky - if we merge this before branching from 0.8.0 and it doesn't cause any issues due to other work not being there yet, then I would have no problem with doing so. I'm personally not in a position to +1 the merge. Would someone that has tested it, and/or tested with it in place but not necessarily exercised it, be able to +1 it as a review?
@lmccay I think we need to summon a maintainer with K8s expertise, is that something you expect? Do you know any name(s) we can put here and follow up on that together?
@jahstreet The implementation is great. I've got a couple of remarks/questions on the way it handles the logs and the ingress. The current implementation is opinionated towards nginx (for the Spark UI) and Loki; while both are widely adopted, how can someone with another ingress controller use their own (without modifying your code)? I understand the logs might be a bit more challenging for providing the native UI integration; in that case maybe offer the possibility to use a sidecar for log shipping? For Livy there is a block that looks up for the
Hi. Is it possible to merge this into apache-incubator-livy officially in the new version "0.9.0"? I guess if it merges, we will be able to use Livy on Kubernetes for launching our awesome Spark jobs! Thanks.
Adding Kubernetes support to Apache Livy is a valuable enhancement for the next release. Should we consider merging this into the master branch?
@askhatri - yes, I think we should push for an active approval on the merge of this. We can likely mark this as an important contribution for the next release. Let's move this forward now.
I've been following this issue since forever, as it would have been highly beneficial somewhere around 2 companies ago.
@jesinity there is no passive obstruction against it here.
To help with the K8s review/perspective, I know there are teams that have been using this branch in production for years (cc @jpugliesi).
I have created a new #451 with an update that includes a newer version of the Kubernetes client and adds code to display the Spark UI. CC: @jahstreet
This pull request (PR) is the foundational PR for adding Kubernetes support in Apache Livy, originally found here (#249). This update includes a newer version of the Kubernetes client and adds code to display the Spark UI.

## Summary of the Proposed Changes

This PR introduces a method to submit Spark applications to a Kubernetes cluster. The key points covered include:

* Submitting batch sessions
* Submitting interactive sessions
* Monitoring sessions, collecting logs, and gathering diagnostic information
* Restoring session monitoring after restarts
* Garbage collection (GC) of created Kubernetes resources

JIRA link: https://issues.apache.org/jira/browse/LIVY-702

## How was this patch tested?

* Unit Tests: The patch has been verified through comprehensive unit tests.
* Manual Testing: Conducted manual testing using Kubernetes on Docker Desktop.
* Environment: Helm charts. For detailed instructions on testing using Helm charts, please refer to the documentation available at https://github.com/askhatri/livycluster

Co-authored-by: Asif Khatri <[email protected]>
Co-authored-by: Alex Sasnouskikh <[email protected]>
What changes were proposed in this pull request?
Jira
This PR is one of the PRs in the series related to the splitting of the base PR #167 into multiple PRs to ease and speed up the review and merge processes.
This PR proposes a way to submit Spark apps to a Kubernetes cluster. Points covered:
How was this patch tested?
Unit tests.
Manual testing with Kubernetes on Docker Desktop for Mac v2.1.0.1.
Environment - Helm charts: