Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not perform domain substitutions within comments #1026

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

csagan5
Copy link
Contributor

@csagan5 csagan5 commented May 5, 2020

Covers C/C++, Java, Javascript and Python file extensions

This change will prevent domain substitutions to happen within comments or comments blocks, thus reducing the amount of unnecessary file changes when applying domain substitution.

This is what I had in mind for issue #849.

@csagan5 csagan5 force-pushed the do-not-replace-in-comments branch 3 times, most recently from 5931d0a to 84d04d2 Compare May 5, 2020 11:16
@csagan5 csagan5 changed the title Do not perform replacements within comments Do not perform domain substitutions within comments May 5, 2020
@csagan5 csagan5 force-pushed the do-not-replace-in-comments branch 8 times, most recently from 2384438 to b070bab Compare May 5, 2020 22:42
Covers C/C++, Java, Javascript and Python file extensions
Copy link
Member

@Eloston Eloston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with the overall idea here (due to the practical problems caused by trying to keep this script simple and non-discretionary in the past, such as the Android manifest schema URL problem a while back), but IMO the implementation is not quite ideal.

One thing that comes to mind is this sort of pre- and post- processing before actual substitution shouldn't be inlined with the actual substitution logic. I think this should be abstracted, since this is the first time that we're introducing pre- and post- substitution logic, and I think there's potential for additional processing in the future. I'll need to think more about how I'd like to organize the code.

My second concern is regarding the implementation. For starters, the regex for the non-(Python and GN) files don't apply to all the selected file types. Secondly, I'm not 100% convinced the regex will work in all scenarios, especially if character escapes get involved. Since domain substitution is an essential component, I'd rather play it safe and implement AST parsing. This should allow us to properly support docstrings as well. I can take a look at how this can be implemented for the filetypes listed here.

@csagan5
Copy link
Contributor Author

csagan5 commented May 31, 2020

I think there's potential for additional processing in the future.

As a general rule I avoid adding code for the "just in case" situation, but if you have some use-cases to leverage already then sure - why not.

For starters, the regex for the non-(Python and GN) files don't apply to all the selected file types.

What do you mean here? I am genuinely curious about some bug I might not have spotted.

Since domain substitution is an essential component, I'd rather play it safe and implement AST parsing.

That would indeed be a very advanced and complete solution.

@Eloston
Copy link
Member

Eloston commented May 31, 2020

As a general rule I avoid adding code for the "just in case" situation, but if you have some use-cases to leverage already then sure - why not.

I think the AndroidManifest schema problem is another such example. Also, anything in DOMAIN_EXCLUDE_PREFIXES could have a smarter exclusion rule associated with them.

What do you mean here? I am genuinely curious about some bug I might not have spotted.

Actually, I was mistaken about this. I didn't realize all those languages supported both of those commenting types.

However, I also realized your code will break if the comment delimiter appears in strings. I don't know if any substitutions could be affected by this, though.

@csagan5
Copy link
Contributor Author

csagan5 commented May 31, 2020

However, I also realized your code will break if the comment delimiter appears in strings. I don't know if any substitutions could be affected by this, though.

Yes, I am aware, I decided to go with this risk. I have inspected the produced diff (now much smaller thanks to the code in this PR) a few times (even if not thoroughly) and it seemed correct.

@Luka-Filipovic
Copy link

What issue is tried to be solved by not performing domain substitution on comments? If it's the timestamps, wasn't that taken care of in 1a0e163?

@csagan5
Copy link
Contributor Author

csagan5 commented Sep 19, 2020

@Luka-Filipovic I create git commits after performing the domain substitutions and there are several MBs of unwanted changes; I also think that changing the timestamp alone won't prevent the file from being rebuilt if one if its comments changed.

@Luka-Filipovic
Copy link

@Luka-Filipovic I create git commits after performing the domain substitutions and there are several MBs of unwanted changes;

What percentage of files from lists have links only in comments? Not doing domsub in test files could also help with your problem, and it can be made only by parsing the file name.

When comments and test files are cut out, what could be the next step in improving this process?

I also think that changing the timestamp alone won't prevent the file from being rebuilt if one if its comments changed.

If that's the case (I didn't test it myself yet), would #1185 help the problem?

@csagan5
Copy link
Contributor Author

csagan5 commented Sep 20, 2020

Not doing domsub in test files could also help with your problem, and it can be made only by parsing the file name.

I already covered this by excluding all test files/directories I could identify.

What percentage of files from lists have links only in comments?

Can't say, but if you make a git diff you will not be able to easily identify the actual code changes because the comment changes drown them out.

If that's the case (I didn't test it myself yet), would #1185 help the problem?

ninja uses timestamps like make, there's an open ninja issue about using file characteristics.

The problem is when you build Chromium and then run domain substitution: setting all files to a precedent timestamp to trick ninja into not rebuilding those files is wrong, as you cannot tell if only comments or code were changed. Thus what will happen is that there will be no advantage in building Chromium and then your modified version with patches and domain substitution, as most files will be rebuilt because of the changes. ccache mitigates this to some extent, but it still leaves room for improvement; with the patch I proposed here I made this process of re-using unchanged object files more efficient.

As for #1185 I actually think it's better to apply domain substitution after the other patches, so that they can be used individually without dependencies on the big domain substitution patch (which I treat as a patch, not different from any other).

@Eloston
Copy link
Member

Eloston commented Sep 27, 2020

The problem is when you build Chromium and then run domain substitution: setting all files to a precedent timestamp to trick ninja into not rebuilding those files is wrong, as you cannot tell if only comments or code were changed. Thus what will happen is that there will be no advantage in building Chromium and then your modified version with patches and domain substitution, as most files will be rebuilt because of the changes. ccache mitigates this to some extent, but it still leaves room for improvement; with the patch I proposed here I made this process of re-using unchanged object files more efficient.

That's interesting. What use-case do you have where you'd want to build Chromium once first, apply domain substitution, then re-run the build? I envisioned only these two workflows for domain substitution:

  1. Full build: Domain substitution, then build.
  2. Incremental build:
    1. Apply domain substitution
    2. Build
    3. Revert domain substitution
    4. Make code changes
    5. Repeat steps 1-4 until the the desired build is created.

Of these two workflows, the timestamp manipulation logic is useful only for the incremental build workflow.

@csagan5
Copy link
Contributor Author

csagan5 commented Sep 28, 2020

That's interesting. What use-case do you have where you'd want to build Chromium once first, apply domain substitution, then re-run the build?

The Bromite build process is a bit like that:

The Chromium builds are important because before a bug is submitted for Bromite it needs to be tested against vanilla Chromium first (yes, we often incur in upstream bugs which become visible downstream because of specific feature/GN flags combinations).

In general you want to make the best out of ninja, C++, Java own invalidation systems, and ccache's one; the changes in the comments paddle against all of them as at least the parser must run to find out that nothing changed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants