Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create raw data for cache invalidation report #9422

Open
wants to merge 14 commits into
base: v2
Choose a base branch
from

Conversation

irismoini
Copy link
Contributor

@irismoini irismoini commented Dec 5, 2023

↪️ Pull Request

This PR creates a new enviroment variable PARCEL_GENERATE_INVALIDATION_REPORT which generates the raw data for Parcel's cache invalidation report in cache-invalidation-report.json. There are two data-structures represented in json format in this report. (1) invalidationRelations which lists invalidated nodeId's as keys whose values corresponding to a list of nodeIds directly responsible for causing that given key node to be invalidated. (2) nodes which contains invalidated nodes and "cause" nodes. nodeId is used as the key, with the value being important information pertaining to that node (type, invaldiateReason, etc.) . nodes does not distinguish between invalidated nodes and "cause" nodes given that "cause" nodes can also be invalidated nodes. As mentioned previously these cause relations are illustrated in invalidationRelations. The data is represented in this manner to be as concise as possible.

Developers can technically determine the root cause for a given invalidated node, but the idea is to develop a tool similar to parcel-query to query data. Users can then ideally view metrics like cache hit ratio, root causes, what a given node is invalidating, and potentially invalidations that are adding the most to build times.

💻 Examples

{
    "invalidationRelations": {
        "0": [
            1,
            2193694
        ],
        "1": [
            2,
            1189786,
            1189755,
            1189756
        ],
...
   "nodes": {
        "0": {
            "id": "parcel_build_request",
            "type": "request",
            "requestType": "parcel_build_request",
            "invalidateReason": 4
        },
        "1": {
            "id": "BundleGraph",
            "type": "request",
            "requestType": "bundle_graph_request",
            "invalidateReason": 4
        },

In this example, node 0 is invalidated by nodes 1 and 2193694. Users can then look up these nodes in nodes to get more information regarding these nodes and the invalidation reason.

Additional stats regarding the size of the cache-invalidation-report.json
When running the dev server:
Between runs with no changes: 869B
Modification of package.json: 366K
Regular build:
Between runs with no changes: 13K
Modification of package.json: 129M

🚨 Test instructions

Tested manually with both the dev server and regular Parcel builds. Tested with changes to the package.json in both cases.

✔️ PR Todo

  • Added/updated unit tests for this change
  • Filled out test instructions (In case there aren't any unit tests)
  • Included links to related issues/PRs

@irismoini irismoini force-pushed the imoini/P2X-1144-add-logging-for-cache-invalidation branch from c337e28 to 1365931 Compare December 5, 2023 22:47
@mischnic
Copy link
Member

mischnic commented Dec 7, 2023

This shouldn't always happen for all users though, so probably behind an env variable?

@AGawrys
Copy link
Contributor

AGawrys commented Dec 14, 2023

“metrics like cache hit ratio, root causes, what a given node is invalidating, and potentially invalidations that are adding the most to build times” <- Good ideas

A couple questions I'm wondering about:
Is there a way to view the invalidation reason, or is that something we’d have to look up in the code base?
Is this targeted more towards Parcel Devs, or people looking to gain more insights into how their project is working with Parcel? I think either way we'd need to have some kind of report that spits out an "at a glance" output and a way for users to quick query without manually digging around a json file.

@irismoini
Copy link
Contributor Author

irismoini commented Dec 15, 2023

“metrics like cache hit ratio, root causes, what a given node is invalidating, and potentially invalidations that are adding the most to build times” <- Good ideas

A couple questions I'm wondering about: Is there a way to view the invalidation reason, or is that something we’d have to look up in the code base? Is this targeted more towards Parcel Devs, or people looking to gain more insights into how their project is working with Parcel? I think either way we'd need to have some kind of report that spits out an "at a glance" output and a way for users to quick query without manually digging around a json file.

The invalidation reason is listed in the raw data right now, but as a numeric value. We'd have to look in packages/core/core/src/constants.js to translate the actual reason. i.e. 4 corresponds to FILE_UPDATE, but I think this could be left to the actual tool developed for queries, just trying to represent the data as concisely as possible here. I think the tool we would build on top of this data would be more for Parcel devs, but ideally could get user friendly enough to be for other folk working with Parcel. Agreed on your last point there! Idea is mainly to have this raw data as a base we use to do queries on!

}
}

const fs = require('fs');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parcel has the ability to override the output filesystem which this ignores. Would it be hard to use outputFS instead of requiring the node FS directly?


const fs = require('fs');
fs.writeFile(
'cache-invalidation-report.json',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe should prefix this with parcel? Also we should probably ensure the output is relative to the project root.

JSON.stringify(
{invalidationRelations: invalidationRelations, nodes: nodes},
undefined,
4,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe 2? #nitpick 😅

undefined,
4,
),
function (err) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is running async via a callback but we're not waiting for completion anywhere. We should be awaiting a promise here instead. If you migrate to outputFS then this should be easy.

nodes[cause] = {
id: causeNode.id,
type: causeNode.type,
requestType: causeNode.requestType,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The types have been all migrated into numbers now right? Maybe we should map them back to strings for the report?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants