noisy --fail-fast logs #804

Open
taylorterwin opened this issue Sep 23, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@taylorterwin

A user has reported that using the --fail-fast flag in dbt Cloud scheduled job runs produces extremely noisy logging, making it difficult to surface the error and the actual underlying issue.

  • Thread concurrency of 23
  • Multiple models are running at the same time
  • --fail-fast terminates the run as soon as a single error is encountered
    The logging is interesting: the Databricks adapter goes through cancelling the connections, while queries that have already started are still trying to reach the server. Because their connections have been cancelled, this error occurs:
: Error during request to server: RESOURCE_DOES_NOT_EXIST: Command 01ef6e95-db69-140e-a8f1-d4436107428d does not exist.
Error properties: attempt=1/30, bounded-retry-delay=None, elapsed-seconds=0.21970534324645996/900.0, error-message=RESOURCE_DOES_NOT_EXIST: Command 01ef6e95-db69-140e-a8f1-d4436107428d does not exist., http-code=404, method=GetOperationStatus, no-retry-reason=non-retryable error, original-exception=RESOURCE_DOES_NOT_EXIST: Command 01ef6e95-db69-140e-a8f1-d4436107428d does not exist., query-id=b'\x01\xefn\x95\xdbi\x14\x0e\xa8\xf1\xd4Ca\x07B\x8d', session-id=None

In addition, there is Apache Spark-specific logging:

$anonfun$analyzeQuery$1(SparkExecuteStatementOperation.scala:541)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.getOrCreateDF(SparkExecuteStatementOperation.scala:527)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.analyzeQuery(SparkExecuteStatementOperation.scala:541)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$execute$5(SparkExecuteStatementOperation.scala:633)
	at org.apache.spark.util.Utils$.timeTakenMs(Utils.scala:532)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$execute$1(SparkExecuteStatementOperation.scala:633)
	... 43 more
, operation-id=01ef6e95-cea5-18b1-8077-63b37a785969
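
The RESOURCE_DOES_NOT_EXIST errors are consistent with status polls for commands the adapter has already cancelled. Purely as an illustration (not the adapter's actual code), a minimal sketch of that race against databricks-sql-connector might look like the following; the hostname, HTTP path, token, and table name are placeholders:

import threading
import time

from databricks import sql  # databricks-sql-connector

# Placeholder connection details, not taken from the original report.
connection = sql.connect(
    server_hostname="example.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/abc123",
    access_token="dapi-XXXX",
)
cursor = connection.cursor()

def cancel_soon():
    # Mimics what --fail-fast triggers: another thread cancels the
    # in-flight command shortly after it has started.
    time.sleep(1)
    cursor.cancel()

threading.Thread(target=cancel_soon, daemon=True).start()

try:
    # A long-running query; once the command is cancelled and cleaned up
    # server-side, further GetOperationStatus polls can come back with
    # RESOURCE_DOES_NOT_EXIST, which is what floods the dbt log.
    cursor.execute("SELECT COUNT(*) FROM some_large_table")
except Exception as exc:
    print(f"query aborted: {exc}")
finally:
    cursor.close()
    connection.close()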

dbt-databricks version: 1.8.5post2+6b29d329ae8a3ce6bc066d032ec3db590160046c
dbt version: versionless - 2024.9.239

Expected behavior

From the user: "I had assumed that this was because we were using multiple threads, but I would expect it to fail nicely and gracefully rather than produce a log consisting of 500 identical messages, sometimes without even providing the original cause of the first model to fail."
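
Until the noise is reduced at the source, a log post-processing sketch (not part of dbt or the adapter; the file name and match strings below are illustrative) can at least collapse the repeated cancellation errors so the first real failure is easier to find:

from collections import Counter
import sys

def collapse_repeated_errors(log_path):
    # Print non-matching lines as-is and summarize the repeated
    # cancellation errors with a count instead of echoing them
    # hundreds of times.
    counts = Counter()
    with open(log_path, encoding="utf-8") as fh:
        for line in fh:
            if "RESOURCE_DOES_NOT_EXIST" in line or "Error during request to server" in line:
                counts[line.strip()] += 1
            else:
                sys.stdout.write(line)
    for message, count in counts.items():
        print(f"[x{count}] {message}")

# Example usage: collapse_repeated_errors("dbt.log")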

@taylorterwin taylorterwin added the bug Something isn't working label Sep 23, 2024