Skip to content
This repository has been archived by the owner on Oct 9, 2020. It is now read-only.

Fix regexp not removing multi-line comments #88

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Fix regexp not removing multi-line comments #88

wants to merge 1 commit into from

Conversation

thiemowmde
Copy link

The previous regular expression had two issues:

  1. It was only able to remove 1-line comments.
  2. It was not set to be ungreedy, and could potentially remove content between two comments. E.g. when something like <!-- comment --> content <!-- comment --> does not contain a newline character, the content would actually be removed.

We run into this issue at https://gerrit.wikimedia.org/r/489323. As a temporary workaround we made all our comments 1-line comments, and made sure each comment is on a separate line.

The previous regular expression had two issues:
1. It was only able to remove 1-line comments.
2. It was not set to be ungreedy, and could potentially remove content between two comments. E.g. when something like `<!-- comment --> content <!-- comment -->` does not contain a newline character, the content would actually be removed.

We run into this issue at https://gerrit.wikimedia.org/r/489323. As a temporary workaround we made all our comments 1-line comments, and made sure each comment is on a separate line.
@jsf-clabot
Copy link

jsf-clabot commented Feb 12, 2019

CLA assistant check
All committers have signed the CLA.


// SVG XML -> HTML5
[/\<([A-Za-z]+)([^\>]*)\/\>/g, "<$1$2></$1>"], // convert self-closing XML SVG nodes to explicitly closed HTML5 SVG nodes
[/\<([a-z]+)([^\>]*)\/\>/gi, "<$1$2></$1>"], // convert self-closing XML SVG nodes to explicitly closed HTML5 SVG nodes

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case anyone else is a little rusty on their RegExps, this is a readability improvement: drop A-Z and make a-z case-insensitive with the i flag.

@@ -12,10 +12,10 @@ var regexSequences = [
// Remove XML stuffs and comments
[/<\?xml[\s\S]*?>/gi, ""],
[/<!doctype[\s\S]*?>/gi, ""],
[/<!--.*-->/gi, ""],
[/<!--[\s\S]*?-->/g, ""],

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, not sure if anyone else is rusty but . does not match newlines. Adding newlines can be done with (.|\n) but that adds a capturing group so [\s\S] is used instead, which matches all whitespace and all non-whitespace characters which together form the set of all characters. No casing specifier is needed so that's dropped. *? is like * but non-greedy.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants