
dplyr::copy_to() performance with odbc 1.5.0 #7103

Closed
nathanhaigh opened this issue Nov 11, 2024 · 1 comment

@nathanhaigh

I'm not sure exactly where to start or who to refer this issue to. However, since it's a performance issue I noticed when calling dplyr::copy_to() after the recent odbc 1.5.0 release, I thought I'd report it here in the first instance. I am using a Snowflake DB backend.

The recent odbc 1.5.0 release has rolled back a change to the default value of batch_rows for the dbWriteTable() and dbBind() methods that was made in odbc 1.3.0 (https://odbc.r-dbi.org/news/index.html#minor-improvements-and-bug-fixes-1-5-0). The default has been changed from NA to 1024, which means that when using dplyr::copy_to() with a reasonably sized table, the inserts are done in batches of 1024 rows. For sizeable tables, the overhead of performing many inserts is very noticeable, and it is far slower than odbc 1.4.2, which simply inserts all the rows in one go. It seems that once the table is large enough, a file is uploaded using PUT and the entire contents of the file are inserted in one go.
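
To put the overhead into perspective, some rough back-of-the-envelope arithmetic (the round-trip count is my own estimate, not something I've measured from the driver):

```r
# With the odbc 1.5.0 default of batch_rows = 1024, copying a 400k-row table
# means roughly 391 separate bind/execute round trips instead of one bulk load
ceiling(400000 / 1024)
#> [1] 391
```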

I did some benchmarking of copying a table consisting of a single column of integer values and differing numbers of rows (small = 1024 rows, large = 20k rows, xlarge = 400k rows), using odbc 1.4.2 and comparing to odbc 1.5.0. The setup looked roughly like the sketch below; the results follow.
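
A minimal sketch of the timing setup (the DSN and table names here are placeholders rather than my real connection details, and I've used system.time() for simplicity):

```r
library(DBI)
library(dplyr)

# Placeholder connection -- substitute your own Snowflake DSN
con <- dbConnect(odbc::odbc(), dsn = "snowflake")

sizes <- c(small = 1024, large = 20000, xlarge = 400000)

# Time dplyr::copy_to() for each table size, recording elapsed seconds
elapsed <- vapply(names(sizes), function(nm) {
  dat <- data.frame(x = seq_len(sizes[[nm]]))
  unname(system.time(
    copy_to(con, dat, name = paste0("copy_to_bench_", nm),
            temporary = TRUE, overwrite = TRUE)
  )["elapsed"])
}, numeric(1))

elapsed
```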

Using odbc 1.4.2 (DBI 1.2.3, dplyr 1.1.4 and dbplyr 2.5.0):
[benchmark results image]

Using odbc 1.5.0 (DBI 1.2.3, dplyr 1.1.4 and dbplyr 2.5.0):
[benchmark results image]

As you can see, dplyr::copy_to() performance is severely impacted by odbc 1.5.0, especially for tables containing many rows. The xlarge table, containing a single column of ints 1:400000, took ~5s with odbc 1.4.2 vs 250s with odbc 1.5.0.

I see it is possible to set batch_rows globally via options() to a larger number of rows:

options(odbc.batch_rows = 1e9)

This fixes the performance issue (see below), but I'm unclear on the reason why odbc 1.5.0 made 1024 the default, and whether increasing this value to an arbitrarily large number might have consequences I haven't thought about.

[benchmark results image with odbc.batch_rows = 1e9]
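
As an aside, if I've read the odbc documentation correctly, batch_rows can also be passed per call rather than set globally, which avoids changing the session-wide default (the table name below is just a placeholder, and treat the argument pass-through as an assumption on my part):

```r
xlarge <- data.frame(x = 1:400000)

# batch_rows is (per my reading of the odbc docs) an argument to odbc's
# dbWriteTable() method; setting it to the row count writes everything in one batch
DBI::dbWriteTable(con, "XLARGE_TEST", xlarge, overwrite = TRUE,
                  batch_rows = nrow(xlarge))
```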

@DavisVaughan (Member)

Thanks, closing in favor of r-dbi/odbc#862
