Describe the bug, including details regarding any error messages, version, and platform.
@pitrou pointed out that `InternalFileDecryptor` reusing the `footer_data_decryptor_` could be problematic for multi-threaded Parquet reads: #43057 (comment)
I confirmed that this does lead to decryptor errors when scanning a Dataset with Parquet files that use uniform encryption by modifying the existing Parquet Dataset encryption tests:
```diff
diff --git a/cpp/src/arrow/dataset/file_parquet_encryption_test.cc b/cpp/src/arrow/dataset/file_parquet_encryption_test.cc
index 0287d593d1..6a13b1ee37 100644
--- a/cpp/src/arrow/dataset/file_parquet_encryption_test.cc
+++ b/cpp/src/arrow/dataset/file_parquet_encryption_test.cc
@@ -90,7 +90,7 @@ class DatasetEncryptionTestBase : public ::testing::Test {
         auto encryption_config =
             std::make_shared<parquet::encryption::EncryptionConfiguration>(
                 std::string(kFooterKeyName));
-        encryption_config->column_keys = kColumnKeyMapping;
+        encryption_config->uniform_encryption = true;
         auto parquet_encryption_config = std::make_shared<ParquetEncryptionConfig>();
         // Directly assign shared_ptr objects to ParquetEncryptionConfig members
         parquet_encryption_config->crypto_factory = crypto_factory_;
```
This causes `DatasetEncryptionTest::WriteReadDatasetWithEncryption` to fail with an error like:
```
/home/adam/dev/arrow/cpp/src/arrow/dataset/file_parquet_encryption_test.cc:159: Failure
Failed
'_error_or_value28.status()' failed with IOError: AesDecryptor was wiped out
Deserializing page header failed.
/home/adam/dev/arrow/cpp/src/parquet/arrow/reader.cc:109  LoadBatch(batch_size)
/home/adam/dev/arrow/cpp/src/parquet/arrow/reader.cc:1263  ReadColumn(static_cast<int>(i), row_groups, reader.get(), &column)
/home/adam/dev/arrow/cpp/src/arrow/util/parallel.h:95  func(i, inputs[i])
/home/adam/dev/arrow/cpp/src/arrow/dataset/file_parquet_encryption_test.cc:208: Failure
Expected: TestScanDataset() doesn't generate new fatal failures in the current thread.
Actual: it does.
```
For `LargeRowEncryptionTest::ReadEncryptLargeRows`, I sometimes get the same `AesDecryptor was wiped out` error, but also see errors like:
```
/home/adam/dev/arrow/cpp/src/arrow/dataset/file_parquet_encryption_test.cc:159: Failure
Failed
'_error_or_value28.status()' failed with IOError: Failed decryption finalization
/home/adam/dev/arrow/cpp/src/parquet/arrow/reader.cc:109  LoadBatch(batch_size)
/home/adam/dev/arrow/cpp/src/parquet/arrow/reader.cc:1263  ReadColumn(static_cast<int>(i), row_groups, reader.get(), &column)
/home/adam/dev/arrow/cpp/src/arrow/util/parallel.h:95  func(i, inputs[i])
/home/adam/dev/arrow/cpp/src/arrow/dataset/file_parquet_encryption_test.cc:265: Failure
Expected: TestScanDataset() doesn't generate new fatal failures in the current thread.
Actual: it does.
```
I don't think it's possible to reproduce this from PyArrow only, as the `uniform_encryption` setting isn't exposed in PyArrow.
Component(s)
C++, Parquet