-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add file caching features to FileInfo #2810
base: master
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #2810 +/- ##
========================================
Coverage 99.88% 99.88%
========================================
Files 357 357
Lines 49724 50110 +386
========================================
+ Hits 49668 50054 +386
Misses 56 56 ☔ View full report in Codecov by Sentry. |
This also includes a fix for the flake8 script so that it can be actually executed with git hooks. |
This PR is related to the discussion |
It looks as though |
Yes, but I don't know why. It will be removed in my next commit (after you finish reviewing). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Martin, it's great to make progress on this (we've talked about caching fparser trees for years). I've only looked at the src and doc changes so far (but have noted that you're missing at least one test for your configuration changes).
You'll see I have some structural concerns: I don't understand why a ModuleInfo constructor now accepts PSyIR and the caching mechanism seems unduly complex.
My other main concern is that you have a lot of "bare asserts" in the code. These need to be either removed or replaced with explicit raises of appropriate errors.
src/psyclone/parse/file_info.py
Outdated
if self._cache_data_load._psyir_node is not None: | ||
# Use cached version | ||
if verbose: | ||
print(" - Using cache of fparser tree") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"of PSyIR"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
src/psyclone/parse/file_info.py
Outdated
# TODO #2786: use 'save_to_cache=False' if TODO is resolved | ||
fparse_tree = self.get_fparser_tree( | ||
verbose=verbose, | ||
save_to_cache=True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to default to False for the moment. You could expose it (for development purposes) as a module variable or a class variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. Thanks for spotting this bug.
(But it wouldn't be cached if cache is disabled. I renamed this argument to save_to_cache_if_cache_active
to reflect this better)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has to be True
. The "idea" is as follows:
If loading only the fparser tree
by calling get_fparser_tree
from some other program using psyclone, we want to potentially cache the fparser tree if it's not yet cached.
If loading the PSyIR
, we don't want to cache the fparser tree directly after loading it, but wait until we also loaded the PSyIR
to store both together to the cache file. Then, if caching would be active, save_to_cache_if_cache_active=False
to delay updating the cache.
However, we don't have the caching of PSyIR
, yet.
Hence, we update the cache even if we just loaded the fparser by setting this to True
.
I add a comment on the TODO to this.
As pointed out above, if the cache is not active, setting this to True
won't write the cache file.
src/psyclone/parse/file_info.py
Outdated
|
||
return self._fparser_tree | ||
|
||
def get_psyir_node(self, verbose: bool = False) -> FileContainer: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think get_psyir()
is what we want here - we get the PSyIR of the contents of the file described by the FileInfo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's an issue how things are named in Psyclone.
It returns the root node of the IR, hence _node
.
It's not directly the entire PSyIR, but you can access it with this node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I simply rename it to get_psyir()
to get the PR through :-)
if not cache_updated: | ||
return None | ||
|
||
# Open cache file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bit will need to be atomic to support parallel builds. When creating output files in another place we do:
PSyclone/src/psyclone/psyGen.py
Lines 1647 to 1667 in 28f7608
while not fdesc: | |
name_idx += 1 | |
new_suffix = "" | |
new_suffix += f"_{name_idx}" | |
new_name = old_base_name + new_suffix + "_mod.f90" | |
try: | |
# Atomically attempt to open the new kernel file (in case | |
# this is part of a parallel build) | |
fdesc = os.open( | |
os.path.join(config.kernel_output_dir, new_name), | |
os.O_CREAT | os.O_WRONLY | os.O_EXCL) | |
except (OSError, IOError): | |
# The os.O_CREATE and os.O_EXCL flags in combination mean | |
# that open() raises an error if the file exists | |
if config.kernel_naming == "single": | |
# If the kernel-renaming scheme is such that we only ever | |
# create one copy of a transformed kernel then we're done | |
break | |
continue |
For now though, as long as we default to not using caching then this can be left as is, albeit with a comment to this effect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see from googling that this requirement for POSIX compliance could be problematic on both NFSv4 and Lustre but is there in theory:
(from https://wiki.lustre.org/NFS_vs._Lustre).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wonderful! It's what I was searching for. Thanks. It's now done like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sadly, this doesn't work. I thought I can use locks, but those in the documentation of the os.open() only work for BSD.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I now understood your trick with either creating the file and owning it or raising an Exception.
I added this version. But I also have to remove the file if it exists.
There can still be race conditions, but in the worst case, the cache file will be accidentally deleted.
|
||
# Load cache file | ||
try: | ||
filehandler = open(self._filepath_cache, "rb") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bit too will need some thought for parallel builds. Please add a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
:param finfo: object holding information on the source file which defines | ||
this module. | ||
:type finfo: :py:class:`psyclone.parse.FileInfo` | ||
|
||
''' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rewrote this description.
@arporter Thanks for the feedback. Back to you. |
@arporter I removed the assertions and made some other cleanups. Your turn! |
No description provided.