
Add file caching features to FileInfo #2810

Open · wants to merge 31 commits into master
Conversation

schreiberx (Collaborator)

No description provided.

schreiberx requested a review from arporter November 29, 2024 17:32
schreiberx self-assigned this Nov 29, 2024

codecov bot commented Nov 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.88%. Comparing base (539a5cc) to head (bd89c7f).
Report is 16 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff            @@
##           master    #2810    +/-   ##
========================================
  Coverage   99.88%   99.88%            
========================================
  Files         357      357            
  Lines       49724    50110   +386     
========================================
+ Hits        49668    50054   +386     
  Misses         56       56            


@schreiberx (Collaborator, Author)

This also includes a fix for the flake8 script so that it can actually be executed with git hooks.

schreiberx marked this pull request as ready for review November 30, 2024 09:19
@schreiberx (Collaborator, Author)

This PR is related to discussion #2783 and might allow closing that discussion.

@arporter (Member) commented Dec 2, 2024

It looks as though doc/reference_guide/.pip_requirements.txt.swp has been added in error?

@schreiberx (Collaborator, Author)

> It looks as though doc/reference_guide/.pip_requirements.txt.swp has been added in error?

Yes, but I don't know why. It will be removed in my next commit (after you finish reviewing).

@arporter (Member) left a comment:

Thanks Martin, it's great to make progress on this (we've talked about caching fparser trees for years). I've only looked at the src and doc changes so far (but have noted that you're missing at least one test for your configuration changes).
You'll see I have some structural concerns: I don't understand why a ModuleInfo constructor now accepts PSyIR and the caching mechanism seems unduly complex.
My other main concern is that you have a lot of "bare asserts" in the code. These need to be either removed or replaced with explicit raises of appropriate errors.
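
To illustrate the kind of change being asked for here, a minimal sketch (not code from this PR; the helper name check_tree and the message are invented, and InternalError is just one plausible choice of error class):

# Hypothetical illustration of the requested change; not code from this PR.
from psyclone.errors import InternalError


def check_tree(fparser_tree):
    # Before: a bare assert, which disappears under 'python -O' and gives
    # the user no useful message:
    #     assert fparser_tree is not None
    # After: an explicit check that raises an appropriate error.
    if fparser_tree is None:
        raise InternalError(
            "Failed to obtain an fparser tree for the source file.")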

src/psyclone/configuration.py (resolved)
src/psyclone/configuration.py (outdated, resolved)
src/psyclone/configuration.py (resolved)
src/psyclone/parse/module_info.py (outdated, resolved)
src/psyclone/parse/module_info.py (resolved)
if self._cache_data_load._psyir_node is not None:
    # Use cached version
    if verbose:
        print(" - Using cache of fparser tree")
arporter (Member):

"of PSyIR"

schreiberx (Collaborator, Author):

OK.

# TODO #2786: use 'save_to_cache=False' if TODO is resolved
fparse_tree = self.get_fparser_tree(
    verbose=verbose,
    save_to_cache=True
arporter (Member):

This needs to default to False for the moment. You could expose it (for development purposes) as a module variable or a class variable.
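
As a rough sketch of the class-variable option (the names _DEFAULT_SAVE_TO_CACHE and the simplified method body are illustrative, not the PR's actual API):

class FileInfo:
    # Development-only switch suggested here: keep the default behaviour
    # as "do not write to the cache" for now, but let developers flip
    # this class variable to experiment with caching.
    _DEFAULT_SAVE_TO_CACHE = False

    def __init__(self, filepath):
        self._filepath = filepath
        self._fparser_tree = None

    def get_fparser_tree(self, verbose=False, save_to_cache=None):
        # Fall back to the class-level default unless the caller asked
        # explicitly for (or against) caching.
        if save_to_cache is None:
            save_to_cache = FileInfo._DEFAULT_SAVE_TO_CACHE
        # (Parsing with fparser2 and any cache write would happen here.)
        return self._fparser_tree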

schreiberx (Collaborator, Author):

You're right, thanks for spotting this bug.
(Note that nothing would be cached anyway if the cache is disabled. I renamed this argument to save_to_cache_if_cache_active to reflect this better.)

schreiberx (Collaborator, Author):

This has to be True. The idea is as follows:

If only the fparser tree is loaded, by calling get_fparser_tree from some other program that uses PSyclone, we want to cache the fparser tree if it is not cached yet.

If the PSyIR is loaded, we do not want to cache the fparser tree immediately after parsing, but wait until the PSyIR has also been built so that both can be stored together in the cache file. In that case, save_to_cache_if_cache_active=False would be used to delay updating the cache.
However, caching of the PSyIR does not exist yet.
Hence, we update the cache even if we have only loaded the fparser tree, by setting this argument to True.
I have added a comment about this to the TODO.

As pointed out above, if the cache is not active, setting this to True won't write the cache file.
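
To make that flow concrete, a rough sketch of the logic being described (the attribute _cache_data_load._psyir_node is taken from the snippet above; the helper _build_psyir_from_fparser_tree is a hypothetical name, not the PR's exact code):

def get_psyir(self, verbose=False):
    # If a cached PSyIR is already available, use it directly.
    if self._cache_data_load is not None and \
            self._cache_data_load._psyir_node is not None:
        if verbose:
            print(" - Using cache of PSyIR")
        return self._cache_data_load._psyir_node

    # No cached PSyIR. Once PSyIR caching exists (TODO #2786) we would
    # pass save_to_cache_if_cache_active=False here and only write the
    # cache after *both* the fparser tree and the PSyIR are available,
    # so that they are stored together. Until then, caching just the
    # fparser tree is still worthwhile, hence True.
    fparse_tree = self.get_fparser_tree(
        verbose=verbose, save_to_cache_if_cache_active=True)

    # Build the PSyIR from the fparser tree (hypothetical helper name).
    psyir_node = self._build_psyir_from_fparser_tree(fparse_tree)
    return psyir_node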


        return self._fparser_tree

    def get_psyir_node(self, verbose: bool = False) -> FileContainer:
arporter (Member):

I think get_psyir() is what we want here - we get the PSyIR of the contents of the file described by the FileInfo.

schreiberx (Collaborator, Author):

I think this is an issue of how things are named in PSyclone.
The method returns the root node of the IR, hence the _node suffix.
It is not the entire PSyIR itself, but the whole tree can be accessed through this node.

schreiberx (Collaborator, Author):

But I'll simply rename it to get_psyir() to get the PR through :-)

if not cache_updated:
    return None

# Open cache file
arporter (Member):

This bit will need to be atomic to support parallel builds. When creating output files in another place we do:

while not fdesc:
    name_idx += 1
    new_suffix = ""
    new_suffix += f"_{name_idx}"
    new_name = old_base_name + new_suffix + "_mod.f90"
    try:
        # Atomically attempt to open the new kernel file (in case
        # this is part of a parallel build)
        fdesc = os.open(
            os.path.join(config.kernel_output_dir, new_name),
            os.O_CREAT | os.O_WRONLY | os.O_EXCL)
    except (OSError, IOError):
        # The os.O_CREAT and os.O_EXCL flags in combination mean
        # that open() raises an error if the file exists
        if config.kernel_naming == "single":
            # If the kernel-renaming scheme is such that we only ever
            # create one copy of a transformed kernel then we're done
            break
        continue

For now though, as long as we default to not using caching, this can be left as is, albeit with a comment to this effect.

arporter (Member):

I see from googling that this requirement for POSIX compliance could be problematic on both NFSv4 and Lustre, but is there in theory:
[screenshot from https://wiki.lustre.org/NFS_vs._Lustre]

schreiberx (Collaborator, Author):

Wonderful, that's what I was searching for, thanks. It's now done like that.

schreiberx (Collaborator, Author):

Sadly, this doesn't work. I thought I could use locks, but the locking flags in the os.open() documentation are only available on BSD.

schreiberx (Collaborator, Author):

I now understand your trick of either creating the file and owning it, or getting an exception.
I added this version, but I also have to remove the file if it already exists.
There can still be race conditions, but in the worst case the cache file just gets accidentally deleted.
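
A minimal sketch of that create-or-fail pattern applied to a cache file, including the removal step and the remaining (harmless) race described above (function and variable names are illustrative, not the PR's exact code):

import os
import pickle


def save_cache_atomically(cache_path, payload):
    # Remove any existing (stale) cache file. The remaining race is here:
    # another process may recreate the file between the remove and the
    # open below, in which case we simply skip the update - in the worst
    # case an existing cache file gets deleted and rebuilt later.
    try:
        os.remove(cache_path)
    except FileNotFoundError:
        pass
    try:
        # O_CREAT | O_EXCL means the open fails if the file already
        # exists, so exactly one process of a parallel build "wins" and
        # writes the cache file.
        fdesc = os.open(cache_path,
                        os.O_CREAT | os.O_WRONLY | os.O_EXCL)
    except (OSError, IOError):
        return
    with os.fdopen(fdesc, "wb") as filehandler:
        pickle.dump(payload, filehandler)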


# Load cache file
try:
    filehandler = open(self._filepath_cache, "rb")
arporter (Member):

This bit too will need some thought for parallel builds. Please add a comment.
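
For completeness, a hedged sketch of a defensive load that treats a missing, unreadable, or partially written cache file (for example, one still being written by a concurrent build) as a cache miss; this is a suggestion, not necessarily what the PR does:

import pickle


def load_cache(cache_path):
    try:
        with open(cache_path, "rb") as filehandler:
            return pickle.load(filehandler)
    except (OSError, IOError, EOFError, pickle.UnpicklingError):
        # Any failure to read or unpickle is treated as "no cache"; the
        # caller then falls back to parsing the source file from scratch.
        return None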

schreiberx (Collaborator, Author):

OK.

:param finfo: object holding information on the source file which defines
    this module.
:type finfo: :py:class:`psyclone.parse.FileInfo`

'''
schreiberx (Collaborator, Author):

I rewrote this description.

schreiberx requested a review from arporter December 2, 2024 23:37
@schreiberx (Collaborator, Author)

@arporter Thanks for the feedback. Back to you.

@schreiberx (Collaborator, Author)

@arporter I removed the assertions and made some other cleanups. Your turn!
