
[Athena] calling invalidate_cache() results in "not a folder created by Splink" error even when folder was created by Splink #2410

finalgrrrl opened this issue Sep 17, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@finalgrrrl
Contributor

What happens?

Using Athena as the SQL backend (AthenaLinker), I am attempting to call invalidate_cache() to force the latest input data to be used. However, this results in the following error:

ValueError: You've asked to drop data housed under the filepath s3://my-bucket/my-output-path/uft7qhr1/ from your s3 output bucket, which is not a folder created by Splink. If you really want to delete this data, you can do so by setting force_non_splink_table=True

As far as I can tell, there is no way to pass force_non_splink_table=True through to invalidate_cache(). I explored the code, and the validation that the folder is "Splink generated" does not appear to be working correctly.
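For illustration only: the error message suggests a guard that refuses to delete any S3 path it does not recognise as its own, unless force_non_splink_table=True is passed. The toy class below is NOT Splink's actual code; the name S3CacheGuard and its methods are invented here purely to sketch the shape of the problem, i.e. that invalidate_cache() appears to hit such a guard with no way to pass the override through:

```python
# Hypothetical sketch (not Splink's implementation) of a "folder created by
# Splink" guard: deletion is only allowed for paths the linker registered
# itself, unless the caller sets force_non_splink_table=True.
class S3CacheGuard:
    def __init__(self):
        self._splink_created = set()

    def register(self, path: str) -> None:
        # Would be called whenever Splink writes outputs under `path`.
        self._splink_created.add(path.rstrip("/"))

    def drop(self, path: str, force_non_splink_table: bool = False) -> None:
        if path.rstrip("/") not in self._splink_created and not force_non_splink_table:
            raise ValueError(
                f"You've asked to drop data housed under the filepath {path} "
                "from your s3 output bucket, which is not a folder created by "
                "Splink. If you really want to delete this data, you can do "
                "so by setting force_non_splink_table=True"
            )
        # ...actual S3 deletion would happen here...

guard = S3CacheGuard()
guard.register("s3://my-bucket/my-output-path/session-folder/")
guard.drop("s3://my-bucket/my-output-path/session-folder/")  # allowed: registered

try:
    guard.drop("s3://my-bucket/some-other-path/")  # not registered -> refused
except ValueError as e:
    print("raised:", type(e).__name__)
```

The bug reported here is that the real check misfires (a folder Splink did create is treated as foreign), and that invalidate_cache() exposes no equivalent of the force_non_splink_table escape hatch.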

Just a note: I realize I am on a previous version of Splink, but an upgrade to 4.0+ is not currently feasible for me. I'm logging this issue anyway, as I'm not sure whether versions of Splink <4.0 are still receiving support. If not, of course feel free to close and I will find a workaround.

To Reproduce

Using Athena (AthenaLinker) as your SQL backend, call linker.invalidate_cache().

OS:

EC2

Splink version:

3.9.15

Have you tried this on the latest master branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree
@finalgrrrl finalgrrrl added the bug Something isn't working label Sep 17, 2024
@RobinL
Member

RobinL commented Sep 19, 2024

Sorry you're having problems. We're probably not going to have time to look at this any time soon. If you can find the bug, feel free to PR a fix to the splink3_maintenance branch and we can do a bugfix 3.x release.

@finalgrrrl
Contributor Author

thanks @RobinL, no worries. i have a workaround for the time being but if i get a chance to take a look i will!
