+ ngc batch exec --commandline bash -c 'cat /raid/tmp/driver-agaricus-Main-CPU.log' 7117740
WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
INFO SparkContext: Running Spark version 3.5.0
INFO SparkContext: OS info Linux, 5.4.0-107-generic, amd64
INFO SparkContext: Java version 1.8.0_402
WARN SparkConf: Note that spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone/kubernetes and LOCAL_DIRS in YARN).
INFO ResourceUtils: ==============================================================
INFO ResourceUtils: No custom resources configured for spark.driver.
INFO ResourceUtils: ==============================================================
INFO SparkContext: Submitted application: Agaricus-Main-csv
INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 8, script: , vendor: , memory -> name: memory, amount: 32768, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
INFO ResourceProfile: Limiting resource is cpus at 8 tasks per executor
INFO ResourceProfileManager: Added ResourceProfile id: 0
INFO SecurityManager: Changing view acls to: root
INFO SecurityManager: Changing modify acls to: root
INFO SecurityManager: Changing view acls groups to:
INFO SecurityManager: Changing modify acls groups to:
INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: root; groups with view permissions: EMPTY; users with modify permissions: root; groups with modify permissions: EMPTY
INFO Utils: Successfully started service 'sparkDriver' on port 39803.
INFO SparkEnv: Registering MapOutputTracker
INFO SparkEnv: Registering BlockManagerMaster
INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
INFO SparkEnv: Registering BlockManagerMasterHeartbeat
INFO DiskBlockManager: Created local directory at /raid/tmp/blockmgr-0034f8a7-578b-4364-bce3-68225f9bf27b
INFO MemoryStore: MemoryStore started with capacity 8.4 GiB
INFO SparkEnv: Registering OutputCommitCoordinator
INFO JettyUtils: Start Jetty 0.0.0.0:4040 for SparkUI
INFO Utils: Successfully started service 'SparkUI' on port 4040.
INFO SparkContext: Added JAR file:///test/xgboost4j-spark.jar at spark://127.0.0.1:39803/jars/xgboost4j-spark.jar with timestamp
INFO SparkContext: Added JAR file:/test/xgb-apps.jar at spark://127.0.0.1:39803/jars/xgb-apps.jar with timestamp 1729610887859
INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://127.0.0.1:7077...
INFO TransportClientFactory: Successfully created connection to /127.0.0.1:7077 after 41 ms (0 ms spent in bootstraps)
INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20241022152809-0001
INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20241022152809-0001/0 on worker-20241022145613-127.0.0.1-35209 (127.0.0.1:35209) with 8 core(s)
INFO StandaloneSchedulerBackend: Granted executor ID app-20241022152809-0001/0 on hostPort 127.0.0.1:35209 with 8 core(s), 32.0 GiB RAM
INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20241022152809-0001/1 on worker-20241022145613-127.0.0.1-35209 (127.0.0.1:35209) with 8 core(s)
INFO StandaloneSchedulerBackend: Granted executor ID app-20241022152809-0001/1 on hostPort 127.0.0.1:35209 with 8 core(s), 32.0 GiB RAM
INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20241022152809-0001/2 on worker-20241022145611-127.0.0.1-42465 (127.0.0.1:42465) with 8 core(s)
INFO StandaloneSchedulerBackend: Granted executor ID app-20241022152809-0001/2 on hostPort 127.0.0.1:42465 with 8 core(s), 32.0 GiB RAM
INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20241022152809-0001/3 on worker-20241022145611-127.0.0.1-42465 (127.0.0.1:42465) with 8 core(s)
INFO StandaloneSchedulerBackend: Granted executor ID app-20241022152809-0001/3 on hostPort 127.0.0.1:42465 with 8 core(s), 32.0 GiB RAM
INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 40511.
INFO NettyBlockTransferService: Server created on 127.0.0.1:40511
INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 127.0.0.1, 40511, None)
INFO BlockManagerMasterEndpoint: Registering block manager 127.0.0.1:40511 with 8.4 GiB RAM, BlockManagerId(driver, 127.0.0.1, 40511, None)
INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 127.0.0.1, 40511, None)
INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 127.0.0.1, 40511, None)
INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20241022152809-0001/3 is now RUNNING
INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20241022152809-0001/2 is now RUNNING
INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20241022152809-0001/1 is now RUNNING
INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20241022152809-0001/0 is now RUNNING
INFO SingleEventLogFileWriter: Logging events to file:/tmp/spark-events/app-20241022152809-0001.inprogress
INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir.
INFO SharedState: Warehouse path is 'file:/spark-warehouse'.
WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
INFO MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
INFO MetricsSystemImpl: s3a-file-system metrics system started
INFO StandaloneSchedulerBackend$StandaloneDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) ID 2, ResourceProfileId 0
INFO BlockManagerMasterEndpoint: Registering block manager 127.0.0.1:40013 with 16.9 GiB RAM, BlockManagerId(2, 127.0.0.1, 40013, None)
INFO StandaloneSchedulerBackend$StandaloneDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) ID 0, ResourceProfileId 0
INFO BlockManagerMasterEndpoint: Registering block manager 127.0.0.1:35771 with 16.9 GiB RAM, BlockManagerId(0, 127.0.0.1, 35771, None)
INFO InMemoryFileIndex: It took 83 ms to list leaf files for 1 paths.
INFO StandaloneSchedulerBackend$StandaloneDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) ID 1, ResourceProfileId 0
INFO StandaloneSchedulerBackend$StandaloneDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) ID 3, ResourceProfileId 0
INFO BlockManagerMasterEndpoint: Registering block manager 127.0.0.1:38295 with 16.9 GiB RAM, BlockManagerId(1, 127.0.0.1, 38295, None)
INFO BlockManagerMasterEndpoint: Registering block manager 127.0.0.1:45991 with 16.9 GiB RAM, BlockManagerId(3, 127.0.0.1, 45991, None)
INFO InMemoryFileIndex: It took 26 ms to list leaf files for 1 paths.
------ Training ------
WARN SparkStringUtils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
Exception in thread "main" org.apache.spark.sql.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `features` cannot be resolved. Did you mean one of the following? [`feature_0`, `feature_1`, `feature_2`, `feature_3`, `feature_4`].;
'Project [cast(label#509 as float) AS label#639, 'features]
+- Project [cast(label#0 as double) AS label#509, feature_0#1, feature_1#2, feature_2#3, feature_3#4, feature_4#5, feature_5#6, feature_6#7, feature_7#8, feature_8#9, feature_9#10, feature_10#11, feature_11#12, feature_12#13, feature_13#14, feature_14#15, feature_15#16, feature_16#17, feature_17#18, feature_18#19, feature_19#20, feature_20#21, feature_21#22, feature_22#23, ... 103 more fields]
+- Relation [label#0,feature_0#1,feature_1#2,feature_2#3,feature_3#4,feature_4#5,feature_5#6,feature_6#7,feature_7#8,feature_8#9,feature_9#10,feature_10#11,feature_11#12,feature_12#13,feature_13#14,feature_14#15,feature_15#16,feature_16#17,feature_17#18,feature_18#19,feature_19#20,feature_20#21,feature_21#22,feature_22#23,... 103 more fields] csv
at org.apache.spark.sql.errors.QueryCompilationErrors$.unresolvedAttributeError(QueryCompilationErrors.scala:307)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.org$apache$spark$sql$catalyst$analysis$CheckAnalysis$$failUnresolvedAttribute(CheckAnalysis.scala:147)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$6(CheckAnalysis.scala:266)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$6$adapted(CheckAnalysis.scala:264)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:244)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$5(CheckAnalysis.scala:264)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$5$adapted(CheckAnalysis.scala:264)
at scala.collection.immutable.Stream.foreach(Stream.scala:533)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$2(CheckAnalysis.scala:264)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$2$adapted(CheckAnalysis.scala:182)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:244)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis0(CheckAnalysis.scala:182)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis0$(CheckAnalysis.scala:164)
at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis0(Analyzer.scala:188)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:160)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:150)
at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:188)
at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:211)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:208)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:77)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:138)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:219)
at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:546)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:219)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:218)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:77)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$1(Dataset.scala:91)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:89)
at org.apache.spark.sql.Dataset.withPlan(Dataset.scala:4363)
at org.apache.spark.sql.Dataset.select(Dataset.scala:1541)
at ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator.preprocess(XGBoostEstimator.scala:210)
at ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator.preprocess$(XGBoostEstimator.scala:188)
at ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier.preprocess(XGBoostClassifier.scala:33)
at ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator.train(XGBoostEstimator.scala:415)
at ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator.train$(XGBoostEstimator.scala:409)
at ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier.train(XGBoostClassifier.scala:33)
at ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier.train(XGBoostClassifier.scala:33)
at org.apache.spark.ml.Predictor.fit(Predictor.scala:114)
at com.nvidia.spark.examples.agaricus.Main$.$anonfun$main$8(Main.scala:77)
at com.nvidia.spark.examples.utility.Benchmark.time(Benchmark.scala:29)
at com.nvidia.spark.examples.agaricus.Main$.main(Main.scala:77)
at com.nvidia.spark.examples.agaricus.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1029)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:194)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:217)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1120)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1129)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
INFO SparkContext: Invoking stop() from shutdown hook
INFO SparkContext: SparkContext is stopping with exitCode 0.
INFO SparkUI: Stopped Spark web UI at http://127.0.0.1:4040
INFO StandaloneSchedulerBackend: Shutting down all executors
INFO StandaloneSchedulerBackend$StandaloneDriverEndpoint: Asking each executor to shut down
INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
INFO MemoryStore: MemoryStore cleared
INFO BlockManager: BlockManager stopped
INFO BlockManagerMaster: BlockManagerMaster stopped
INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
ERROR TransportRequestHandler: Error sending result StreamResponse[streamId=/jars/xgboost4j-spark.jar,body=FileSegmentManagedBuffer[file=/test/xgboost4j-spark.jar,offset=0,length=338696354]] to /127.0.0.1:33046; closing connection
io.netty.channel.StacklessClosedChannelException
at io.netty.channel.AbstractChannel.close(ChannelPromise)(Unknown Source)
INFO SparkContext: Successfully stopped SparkContext
INFO ShutdownHookManager: Shutdown hook called
INFO ShutdownHookManager: Deleting directory /tmp/spark-1d29e677-7338-4fc8-bec8-e57284298ca1
INFO ShutdownHookManager: Deleting directory /raid/tmp/spark-0dcd6655-62da-49f8-ba12-59f4e9c5739c
INFO MetricsSystemImpl: Stopping s3a-file-system metrics system...
INFO MetricsSystemImpl: s3a-file-system metrics system stopped.
INFO MetricsSystemImpl: s3a-file-system metrics system shutdown complete.
real 0m15.488s
user 0m26.418s
sys 0m3.454s
0
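The failure itself is the AnalysisException above: `XGBoostEstimator.preprocess` issues a `Dataset.select` on a column named `features`, but the CSV relation only exposes `label` plus the individual `feature_N` columns, so analysis fails before any training starts. The usual remedy on the CPU path is to assemble the raw columns into a single vector column first. A minimal sketch of that workaround, assuming the schema shown in the plan; the path and app name below are placeholders, not taken from the failing test:

```scala
import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("Agaricus-CPU-sketch").getOrCreate()

// The plan shows a CSV relation with label plus feature_0 .. feature_N columns.
val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/path/to/agaricus.csv") // placeholder path

// Assemble the individual feature columns into the single vector column
// the estimator selects by default ("features").
val featureNames = df.columns.filter(_.startsWith("feature_"))
val assembled = new VectorAssembler()
  .setInputCols(featureNames)
  .setOutputCol("features")
  .transform(df)

val model = new XGBoostClassifier()
  .setLabelCol("label")
  .setFeaturesCol("features")
  .fit(assembled)
```

If the example app instead passes the raw `feature_N` columns straight to the estimator, that path evidently resolves only on the GPU build, which would be consistent with the CPU-only failure reported below.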
XGBoost4j-spark train failed on the CPU hosts.
ENVS:
1. OS: ubuntu22.04/NGC
2. Spark version: 3.5.1
3. XGBoost4j-spark: xgboost4j-spark-gpu_2.12-2.2.0-SNAPSHOT.jar
4. rapids-4-spark: 24.12.0-SNAPSHOT
5. Failed test: agaricus train
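The schema mismatch can be confirmed without XGBoost at all: selecting the missing column from the same CSV raises the identical error class. A quick sketch, again with a placeholder path:

```scala
import org.apache.spark.sql.SparkSession

// Minimal check, independent of XGBoost; any DataFrame lacking a
// "features" column behaves the same way.
val spark = SparkSession.builder().appName("schema-check").master("local[*]").getOrCreate()
val df = spark.read.option("header", "true").csv("/path/to/agaricus.csv")
df.printSchema()      // label, feature_0 .. feature_N -- no "features" column
df.select("features") // throws AnalysisException [UNRESOLVED_COLUMN.WITH_SUGGESTION]
```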