I'm not sure exactly where to start or who to refer this issue to. However, since it's a performance issue I noticed when calling dplyr::copy_to() after upgrading to the recent odbc 1.5.0 release, I thought I'd report it here in the first instance. I am using a Snowflake DB backend.
The recent odbc 1.5.0 release has rolled back a change to the default value of batch_rows for the dbWriteTable() and dbBind() methods that was made in odbc 1.3.0 (https://odbc.r-dbi.org/news/index.html#minor-improvements-and-bug-fixes-1-5-0). The default has been changed from NA back to 1024. This means that when using dplyr::copy_to() with a reasonably sized table, the inserts are done in batches of 1024 rows. For sizeable tables, the overhead of performing many inserts is very noticeable and far slower than odbc 1.4.2, which simply inserts all the rows in one go. With 1.4.2, it seems that once the table is large enough, a file is uploaded using PUT and the entire contents of the file are inserted in a single operation.
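For context, here's a minimal sketch of the kind of call involved (the connection details, data frame, and table name are placeholders for illustration, not my actual setup):

```r
library(DBI)
library(dplyr)  # dbplyr 2.5.0 provides the copy_to() method for DBI connections

# Placeholder connection details; substitute your own Snowflake DSN/driver.
con <- dbConnect(odbc::odbc(), dsn = "snowflake")

df <- data.frame(x = 1:400000)

# With odbc 1.5.0 the default batch_rows of 1024 means this is sent as
# hundreds of separate INSERT batches; with odbc 1.4.2 (batch_rows = NA)
# the whole table is written in one go.
copy_to(con, df, name = "copy_to_test", temporary = TRUE, overwrite = TRUE)
```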
I did some benchmarking of copying a table consisting of a single column of integer values and varying numbers of rows (small = 1,024 rows, large = 20k rows, xlarge = 400k rows), comparing odbc 1.4.2 to odbc 1.5.0. Here are the results:
Using odbc 1.4.2 (DBI 1.2.3, dplyr 1.1.4 and dbplyr 2.5.0):
Using odbc 1.5.0 (DBI 1.2.3, dplyr 1.1.4 and dbplyr 2.5.0):
As you can see, dplyr::copy_to() performance is severely impacted by odbc 1.5.0, especially for tables containing many rows. The xlarge table containing a single column of ints 1:400000 took ~5s with odbc 1.4.2 vs ~250s with odbc 1.5.0.
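For anyone wanting to reproduce, the benchmark boils down to something like this (a sketch; the bench table names are made up, and con is the Snowflake connection from the sketch above):

```r
sizes <- c(small = 1024, large = 20000, xlarge = 400000)

# Time copy_to() for each table size; run once per installed odbc version.
elapsed <- sapply(names(sizes), function(nm) {
  df <- data.frame(x = seq_len(sizes[[nm]]))
  system.time(
    copy_to(con, df, name = paste0("copy_to_bench_", nm),
            temporary = TRUE, overwrite = TRUE)
  )[["elapsed"]]
})

elapsed  # seconds per table size, for comparison across odbc versions
```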
I see it is possible to set batch_rows globally via options() to a larger number of rows:
options(odbc.batch_rows = 1e9)
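Since batch_rows is also an argument to odbc's dbWriteTable() method, it should be possible to override it per call instead of globally (a sketch, assuming the same connection and data frame as above):

```r
# Per-call alternative to the global option: only this write uses a single
# batch of nrow(df) rows; the package default is left untouched elsewhere.
DBI::dbWriteTable(con, "copy_to_test", df, overwrite = TRUE, batch_rows = nrow(df))
```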
This fixes the performance issue (see below), but I'm unclear on the reason why odbc 1.5.0 made 1024 the default, and whether increasing this value to an arbitrarily large number might have consequences I haven't thought about.