Loading geni.core creates a default spark session #332

Open

gnarroway opened this issue Jul 5, 2021 · 3 comments

@gnarroway

  • I have read through the quick start and installation sections of the README.

Info

| Info             | Value             |
|------------------|-------------------|
| Operating System | rhel7 and windows |
| Geni Version     | 0.0.38            |
| JDK              | 11.0.10           |
| Spark Version    | 3.1.2             |

Problem / Steps to reproduce

Requiring zero-one.geni.core creates a default Spark session, which affects the behaviour of a subsequent call to g/create-spark-session.

Specifically, geni.core loads geni.spark-context, which loads geni.defaults, which creates a Spark session in an atom that should probably be a delay.

```clojure
(def s (g/create-spark-session {:app-name "foo"}))
(g/spark-conf s)
;; => {... :spark.app.name "Geni app" ...}
;; which is the wrong name
```

If zero-one.geni.spark is required directly instead (aliased as g), the Spark session is configured correctly.
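A minimal sketch of that workaround (the conf check here uses plain Spark interop rather than a geni helper):

```clojure
;; Workaround sketch: require geni.spark directly so that geni.defaults
;; is never loaded and no default session is created behind our back.
(require '[zero-one.geni.spark :as g])

(def s (g/create-spark-session {:app-name "foo"}))

;; Inspect the live config via Spark interop:
(.get (.conf s) "spark.app.name")
;; => "foo"
```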

The incorrect behaviour takes effect if core is required at any point before the session is created, so it is a bit problematic. As above, replacing the default atom with a delay may be sufficient to avoid this.
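A hedged sketch of that fix; the var names here are assumed from this issue rather than taken from the actual defaults.clj. A delay defers session creation to the first deref, so a bare require no longer starts Spark:

```clojure
(ns zero-one.geni.defaults
  (:require [zero-one.geni.spark :as spark]))

;; Assumed default config; the real map in defaults.clj may differ.
(def session-config
  {:app-name "Geni app"})

;; Before (eager): requiring this namespace creates a session.
;; (defonce spark (atom (spark/create-spark-session session-config)))

;; After (lazy): nothing is created until the first @spark.
(defonce spark
  (delay (spark/create-spark-session session-config)))
```

Since atoms and delays are both dereferenced with @, read sites that do @defaults/spark would be unaffected by the change.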

Thanks for your work on this library!

@behrica (Contributor) commented Oct 13, 2024

I have the impression as well that merely requiring the various geni namespaces creates a session at some point. I noticed it because setCheckpointDir fails on a Databricks cluster:

```
IllegalArgumentException: Path must be absolute: target/checkpoint/3f38a4a8-51e9-47fc-a1d1-7c0f3e2f2520
at com.databricks.common.path.AbstractPath$.fromHadoopPath(AbstractPath.scala:114)
at com.databricks.backend.daemon.data.client.DBFSV2.resolveAndGetFileSystem(DatabricksFileSystemV2.scala:148)
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.resolve(DatabricksFileSystemV2.scala:773)
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.$anonfun$mkdirs$3(DatabricksFileSystemV2.scala:1171)
at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
at com.databricks.s3a.S3AExceptionUtils$.convertAWSExceptionToJavaIOException(DatabricksStreamUtils.scala:66)
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.$anonfun$mkdirs$2(DatabricksFileSystemV2.scala:1170)
at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.$anonfun$withUserContextRecorded$2(DatabricksFileSystemV2.scala:1376)
at com.databricks.logging.AttributionContextTracing.$anonfun$withAttributionContext$1(AttributionContextTracing.scala:48)
at com.databricks.logging.AttributionContext$.$anonfun$withValue$1(AttributionContext.scala:276)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:272)
at com.databricks.logging.AttributionContextTracing.withAttributionContext(AttributionContextTracing.scala:46)
at com.databricks.logging.AttributionContextTracing.withAttributionContext$(AttributionContextTracing.scala:43)
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.withAttributionContext(DatabricksFileSystemV2.scala:741)
at com.databricks.logging.AttributionContextTracing.withAttributionTags(AttributionContextTracing.scala:95)
at com.databricks.logging.AttributionContextTracing.withAttributionTags$(AttributionContextTracing.scala:76)
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.withAttributionTags(DatabricksFileSystemV2.scala:741)
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.withUserContextRecorded(DatabricksFileSystemV2.scala:1349)
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.$anonfun$mkdirs$1(DatabricksFileSystemV2.scala:1169)
at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
at com.databricks.logging.UsageLogging.$anonfun$recordOperation$1(UsageLogging.scala:527)
at com.databricks.logging.UsageLogging.executeThunkAndCaptureResultTags$1(UsageLogging.scala:631)
at com.databricks.logging.UsageLogging.$anonfun$recordOperationWithResultTags$4(UsageLogging.scala:651)
at com.databricks.logging.AttributionContextTracing.$anonfun$withAttributionContext$1(AttributionContextTracing.scala:48)
at com.databricks.logging.AttributionContext$.$anonfun$withValue$1(AttributionContext.scala:276)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:272)
at com.databricks.logging.AttributionContextTracing.withAttributionContext(AttributionContextTracing.scala:46)
at com.databricks.logging.AttributionContextTracing.withAttributionContext$(AttributionContextTracing.scala:43)
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.withAttributionContext(DatabricksFileSystemV2.scala:741)
at com.databricks.logging.AttributionContextTracing.withAttributionTags(AttributionContextTracing.scala:95)
at com.databricks.logging.AttributionContextTracing.withAttributionTags$(AttributionContextTracing.scala:76)
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.withAttributionTags(DatabricksFileSystemV2.scala:741)
at com.databricks.logging.UsageLogging.recordOperationWithResultTags(UsageLogging.scala:626)
at com.databricks.logging.UsageLogging.recordOperationWithResultTags$(UsageLogging.scala:536)
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.recordOperationWithResultTags(DatabricksFileSystemV2.scala:741)
at com.databricks.logging.UsageLogging.recordOperation(UsageLogging.scala:528)
at com.databricks.logging.UsageLogging.recordOperation$(UsageLogging.scala:496)
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.recordOperation(DatabricksFileSystemV2.scala:741)
at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.mkdirs(DatabricksFileSystemV2.scala:1168)
at com.databricks.backend.daemon.data.client.DatabricksFileSystem.mkdirs(DatabricksFileSystem.scala:213)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2496)
at org.apache.spark.SparkContext.$anonfun$setCheckpointDir$2(SparkContext.scala:3402)
at scala.Option.map(Option.scala:230)
at org.apache.spark.SparkContext.setCheckpointDir(SparkContext.scala:3399)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:167)
at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:102)
at zero_one.geni.spark$create_spark_session.invokeStatic(spark.clj:37)
at zero_one.geni.spark$create_spark_session.invoke(spark.clj:19)
at zero_one.geni.defaults__init.load(Unknown Source)
at zero_one.geni.defaults__init.<clinit>(Unknown Source)

```
I hit this even when running no code at all, only the ns declarations.

@behrica (Contributor) commented Oct 13, 2024

So basically I found no way to get an uberjar that uses geni to execute on Databricks. (Only removing all of the geni requires made the jar executable.)

@behrica (Contributor) commented Oct 13, 2024

I finally patched
https://github.com/zero-one-group/geni/blob/develop/src/clojure/zero_one/geni/defaults.clj
to use an empty map for session-config.

This (I believe) still creates a session during require (which in my opinion is wrong), but it at least lets the code continue, so I can set up my own session.
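A sketch of what the patched defaults.clj might look like; the var names are assumed from this thread, not taken from the real file:

```clojure
(ns zero-one.geni.defaults
  (:require [zero-one.geni.spark :as spark]))

;; Patched: empty config, so the eagerly created session no longer sets
;; the relative checkpoint dir that Databricks rejects.
(def session-config {})

;; Still created as a side effect of require, as noted above.
(defonce spark
  (atom (spark/create-spark-session session-config)))
```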
