Add file caching features to FileInfo #2810

schreiberx · 2024-11-29T17:32:28Z

No description provided.

codecov · 2024-11-29T17:47:55Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.88%. Comparing base (539a5cc) to head (bd89c7f).
Report is 16 commits behind head on master.

Additional details and impacted files

@@           Coverage Diff            @@
##           master    #2810    +/-   ##
========================================
  Coverage   99.88%   99.88%            
========================================
  Files         357      357            
  Lines       49724    50110   +386     
========================================
+ Hits        49668    50054   +386     
  Misses         56       56


schreiberx · 2024-11-29T23:31:54Z

This also includes a fix for the flake8 script so that it can actually be executed by git hooks.

schreiberx · 2024-11-30T09:27:15Z

This PR is related to discussion #2783 and might allow that discussion to be closed.

arporter · 2024-12-02T13:06:44Z

It looks as though doc/reference_guide/.pip_requirements.txt.swp has been added in error?

schreiberx · 2024-12-02T14:16:49Z


Yes, but I don't know why. It will be removed in my next commit (after you finish reviewing).

arporter

Thanks Martin, it's great to make progress on this (we've talked about caching fparser trees for years). I've only looked at the src and doc changes so far (but have noted that you're missing at least one test for your configuration changes).
You'll see I have some structural concerns: I don't understand why the ModuleInfo constructor now accepts PSyIR, and the caching mechanism seems unduly complex.
My other main concern is that you have a lot of "bare asserts" in the code. These need to be either removed or replaced with explicit raises of appropriate errors.
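As a generic illustration (the function below is hypothetical and not taken from the PR), a bare assert can be replaced by an explicit check that raises a specific exception. Unlike assert statements, which are skipped when Python runs with -O, explicit checks always execute and give callers a meaningful error type to catch.

```python
from typing import Optional


def get_module_name(name: Optional[str]) -> str:
    '''Hypothetical helper: return a lower-cased module name.

    :raises TypeError: if name is not a str.
    :raises ValueError: if name is an empty string.
    '''
    # Before:  assert isinstance(name, str) and name
    # These checks still run under "python -O", unlike an assert.
    if not isinstance(name, str):
        raise TypeError(
            f"Expected a str for 'name' but got '{type(name).__name__}'")
    if not name:
        raise ValueError("Module name must not be empty")
    return name.lower()
```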

src/psyclone/configuration.py

src/psyclone/parse/module_info.py

arporter · 2024-12-02T15:58:39Z

src/psyclone/parse/file_info.py

+            if self._cache_data_load._psyir_node is not None:
+                # Use cached version
+                if verbose:
+                    print("  - Using cache of fparser tree")


arporter · 2024-12-02T16:00:23Z

src/psyclone/parse/file_info.py

+        # TODO #2786: use 'save_to_cache=False' if TODO is resolved
+        fparse_tree = self.get_fparser_tree(
+                verbose=verbose,
+                save_to_cache=True


This needs to default to False for the moment. You could expose it (for development purposes) as a module variable or a class variable.
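One way to follow the class-variable suggestion is sketched below. FileInfoSketch, SAVE_TO_CACHE_DEFAULT and the cache_writes counter are hypothetical names used only for illustration; the real FileInfo API may differ.

```python
from typing import Optional


class FileInfoSketch:
    '''Hypothetical stand-in for FileInfo, not the PSyclone class.'''

    # Development switch: writing to the cache stays off by default.
    SAVE_TO_CACHE_DEFAULT: bool = False

    def __init__(self) -> None:
        # Counts how often the (simulated) cache file would be written.
        self.cache_writes = 0

    def get_fparser_tree(self, save_to_cache: Optional[bool] = None) -> str:
        if save_to_cache is None:
            # No explicit choice from the caller: use the class default.
            save_to_cache = self.SAVE_TO_CACHE_DEFAULT
        tree = "<fparser tree>"  # placeholder for the real parse step
        if save_to_cache:
            self.cache_writes += 1  # placeholder for writing the cache
        return tree
```

A developer can then enable cache writes by setting FileInfoSketch.SAVE_TO_CACHE_DEFAULT = True (or in a subclass) without touching any call sites, while an explicit save_to_cache argument still takes precedence.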

You're right. Thanks for spotting this bug.
(However, nothing is cached if caching is disabled. I have renamed this argument to save_to_cache_if_cache_active to reflect this better.)

This has to be True. The idea is as follows:

If only the fparser tree is loaded (by calling get_fparser_tree from another program that uses PSyclone), we want to cache the fparser tree if it is not already cached.

If the PSyIR is loaded, we don't want to cache the fparser tree immediately after loading it. Instead, we wait until the PSyIR has also been created and then store both together in the cache file. In that case, with caching active, get_fparser_tree would be called with save_to_cache_if_cache_active=False to delay updating the cache.
However, we don't cache the PSyIR yet.
Hence, we update the cache even if only the fparser tree has been loaded, by setting this to True.
I have added a comment on the TODO about this.

As pointed out above, if the cache is not active, setting this to True won't write the cache file.
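A minimal sketch of the intended flow, assuming PSyIR caching existed (which, as noted above, it does not yet). The class, the in-memory cache_file dict standing in for the on-disk cache, and the string placeholders for the fparser tree and PSyIR are all hypothetical.

```python
from typing import Any, Dict, Optional


class CachingFileInfoSketch:
    '''Hypothetical sketch of the delayed cache update, not PSyclone code.'''

    def __init__(self, source: str, cache_active: bool = False) -> None:
        self._source = source
        self._cache_active = cache_active
        self._fparser_tree: Optional[str] = None
        self._psyir: Optional[str] = None
        # In-memory stand-in for the on-disk cache file.
        self.cache_file: Dict[str, Any] = {}
        self.cache_writes = 0

    def _save_cache(self) -> None:
        # Nothing is ever written when caching is disabled.
        if not self._cache_active:
            return
        self.cache_file = {"fparser_tree": self._fparser_tree,
                           "psyir": self._psyir}
        self.cache_writes += 1

    def get_fparser_tree(self,
                         save_to_cache_if_cache_active: bool = True) -> str:
        if self._fparser_tree is None:
            self._fparser_tree = f"tree({self._source})"  # placeholder parse
            if save_to_cache_if_cache_active:
                self._save_cache()
        return self._fparser_tree

    def get_psyir(self) -> str:
        if self._psyir is None:
            # Delay the cache update until the PSyIR also exists, so both
            # are stored together in a single write.
            tree = self.get_fparser_tree(save_to_cache_if_cache_active=False)
            self._psyir = f"psyir({tree})"  # placeholder PSyIR creation
            self._save_cache()
        return self._psyir
```

Calling get_psyir() on an instance with caching active results in exactly one cache write containing both objects, whereas calling get_fparser_tree() alone writes only the tree.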

arporter · 2024-12-02T16:01:34Z

src/psyclone/parse/file_info.py

+
+        return self._fparser_tree
+
+    def get_psyir_node(self, verbose: bool = False) -> FileContainer:


I think get_psyir() is what we want here: we get the PSyIR of the contents of the file described by the FileInfo.

I think this is an issue with how things are named in PSyclone.
It returns the root node of the IR, hence _node.
It's not the entire PSyIR itself, but the whole PSyIR can be accessed via this node.

But I'll simply rename it to get_psyir() to get the PR through :-)

arporter · 2024-12-02T16:14:12Z

src/psyclone/parse/file_info.py

+        if not cache_updated:
+            return None
+
+        # Open cache file


This part will need to be atomic to support parallel builds. When creating output files elsewhere, we do:

PSyclone/src/psyclone/psyGen.py

Lines 1647 to 1667 in 28f7608

while not fdesc:
    name_idx += 1
    new_suffix = ""

    new_suffix += f"_{name_idx}"
    new_name = old_base_name + new_suffix + "_mod.f90"

    try:
        # Atomically attempt to open the new kernel file (in case
        # this is part of a parallel build)
        fdesc = os.open(
            os.path.join(config.kernel_output_dir, new_name),
            os.O_CREAT | os.O_WRONLY | os.O_EXCL)
    except (OSError, IOError):
        # The os.O_CREATE and os.O_EXCL flags in combination mean
        # that open() raises an error if the file exists
        if config.kernel_naming == "single":
            # If the kernel-renaming scheme is such that we only ever
            # create one copy of a transformed kernel then we're done
            break
        continue

For now, though, as long as caching is disabled by default, this can be left as is, albeit with a comment to that effect.

I see from googling that this POSIX-compliance requirement could be problematic on both NFSv4 and Lustre, but it is supported in theory:

(from https://wiki.lustre.org/NFS_vs._Lustre).

Wonderful! That's what I was looking for. Thanks. It's now done that way.

Sadly, this doesn't work. I thought I could use locks, but the ones described in the os.open() documentation only work on BSD.

I now understand your trick of either creating (and thereby owning) the file or raising an exception.
I have added this version. However, I also have to remove the cache file if it already exists.
There can still be race conditions, but in the worst case the cache file will be accidentally deleted.
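The approach described above (remove any existing cache file, then create it with O_CREAT | O_EXCL so that only one process owns it) might look roughly like the following sketch. The helper name and the use of pickle as the cache format are assumptions for illustration, not the code in this PR.

```python
import os
import pickle
from typing import Any


def write_cache_exclusively(path: str, data: Any) -> bool:
    '''Hypothetical helper: write 'data' to 'path' only if this process
    manages to create the file itself.

    :returns: True if this process wrote the cache, False if another
        process created the file first.
    '''
    # Remove any stale cache file first. Another process may remove it
    # concurrently, so a missing file is not an error.
    try:
        os.remove(path)
    except FileNotFoundError:
        pass

    # O_CREAT | O_EXCL: open() fails if the file already exists, so at most
    # one process "owns" the new file. O_BINARY only exists on Windows.
    flags = (os.O_CREAT | os.O_WRONLY | os.O_EXCL |
             getattr(os, "O_BINARY", 0))
    try:
        fdesc = os.open(path, flags)
    except FileExistsError:
        # Another process won the race and is writing the cache.
        return False

    # os.fdopen takes ownership of the descriptor and closes it.
    with os.fdopen(fdesc, "wb") as fout:
        pickle.dump(data, fout)
    return True
```

As the comment above says, this narrows but does not eliminate races: one process can remove a file that another has only just written, and a reader may see a partially written file. A commonly used alternative is to write to a uniquely named temporary file in the same directory and then os.replace() it onto the cache path; on POSIX systems such a rename is atomic when it succeeds, although network file systems such as NFS or Lustre may weaken these guarantees.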

arporter · 2024-12-02T16:26:06Z

src/psyclone/parse/file_info.py

+
+        # Load cache file
+        try:
+            filehandler = open(self._filepath_cache, "rb")


This bit too will need some thought for parallel builds. Please add a comment.
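For the loading side, one option (a sketch, assuming the cache is a pickle file; load_cache_or_none is a hypothetical name) is to treat any failure to read the file as a cache miss, since in a parallel build another process may have removed the file or may still be writing it. The caught exception types follow the pickle documentation, which notes that unpickling can raise AttributeError, EOFError, ImportError and IndexError in addition to UnpicklingError.

```python
import pickle
from typing import Any, Optional


def load_cache_or_none(path: str) -> Optional[Any]:
    '''Hypothetical helper: return the unpickled cache contents, or None
    if the file is missing or cannot be read completely.
    '''
    try:
        with open(path, "rb") as fin:
            return pickle.load(fin)
    except FileNotFoundError:
        # Not cached yet, or removed by another process in the meantime.
        return None
    except (pickle.UnpicklingError, EOFError, AttributeError,
            ImportError, IndexError):
        # Empty, truncated or otherwise unusable (e.g. partially written)
        # cache file: fall back to parsing the source again.
        return None
```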


schreiberx · 2024-12-02T22:23:37Z

src/psyclone/parse/module_info.py

-    :param finfo: object holding information on the source file which defines
-        this module.
-    :type finfo: :py:class:`psyclone.parse.FileInfo`
-
    '''


I rewrote this description.

schreiberx · 2024-12-02T23:38:43Z

@arporter Thanks for the feedback. Back to you.

schreiberx · 2024-12-04T10:29:24Z

@arporter I removed the assertions and made some other cleanups. Your turn!

schreiberx added 7 commits November 29, 2024 14:58

tests passed

b2d1794

all tests passed

13c8dd6

tests passed

45fbdbb

tests passed

9a63f35

tests passed

5e85283

tests passed

40c9c94

all tests passed (incl. flake8)

5767bf9

schreiberx requested a review from arporter November 29, 2024 17:32

schreiberx self-assigned this Nov 29, 2024

Merge branch 'master' into martin_fileinfo_cache

0407acd

schreiberx added the ready for review label Nov 29, 2024

SCHREIBER Martin added 11 commits November 29, 2024 21:46

Merge branch 'martin_sphinx_autodoc' into martin_fileinfo_cache

a267520

fix for docs

5fcf056

flake8

31e48e4

updates

e56def0

u

4906e31

coverage tests

1fd82da

final (?) cleanups for coverage

db61055

added cache parameter

acb3119

flake8...

16bcb7d

u

a74e423

updated flake8 to be executable with git hooks

0a73300

schreiberx marked this pull request as ready for review November 30, 2024 09:19

Merge branch 'master' into martin_fileinfo_cache

947f259

arporter added under review and removed ready for review labels Dec 2, 2024

removed obsolete file

8eb8285

arporter requested changes Dec 2, 2024


arporter added reviewed with actions and removed under review labels Dec 2, 2024

Merge branch 'martin_fileinfo_cache' of https://github.com/stfc/PSyclone into martin_fileinfo_cache

e798613

schreiberx commented Dec 2, 2024


For Andy

434836c

schreiberx requested a review from arporter December 2, 2024 23:37

schreiberx added ready for review and removed reviewed with actions labels Dec 2, 2024

SCHREIBER Martin added 5 commits December 3, 2024 00:53

fix of renaming

c040542

fixed cache file race issues

be10a24

test

50a0824

cleanups

ca7be66

tests passed

7aadf16

SCHREIBER Martin added 2 commits December 4, 2024 11:29

Merge branch 'master' into martin_fileinfo_cache

d27936f

flake8...

d456aa9

arporter added under review and removed ready for review labels Dec 11, 2024

added test as requested

bd89c7f
