Update HTML tag parsing to work with Acunetix 360 #40

Open: wants to merge 13 commits into base: main

@rachkor (Member) commented Mar 30, 2022

Summary

The following fields in Acunetix 360 Issues had a ton of HTML tags in the output, but our parser wasn't removing them:

  • remedial_actions
  • remedial_procedure
  • external_references

This PR fixes the parsing of these 3 fields and adds new cleanup for <em> tags and for links that contain <i> tags.
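The summary mentions new cleanup for <em> tags but not what they become. A minimal sketch, assuming the target markup is Textile (the other rules in this parser emit Textile, e.g. `*bold*` and `bc.` blocks) and that `<em>` should map to Textile emphasis `_text_`. The method name `convert_em` is hypothetical and this is not the PR's actual code.

```ruby
# Hypothetical helper: converts <em>...</em> to Textile emphasis (_text_).
# Assumes Textile output, like the other gsub! rules in this parser.
def convert_em(html)
  # \b stops the pattern from matching tags such as <embed>;
  # [^>]* allows attributes; /m lets the emphasized text span lines.
  html.gsub(/<em\b[^>]*>(.*?)<\/em>/m) { "_#{$1.strip}_" }
end

puts convert_em("Use <em> parameterized queries </em> instead.")
# => Use _parameterized queries_ instead.
```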

I assign all rights, including copyright, to any future Dradis work by myself to Security Roots.

@sean-yeoh changed the base branch from main to release-4.3.0 on April 27, 2022 03:18
@sean-yeoh changed the base branch from release-4.3.0 to main on June 3, 2022 07:51
```ruby
result.gsub!(/<h[0-9] >(.*?)<\/h[0-9]>/) { "\n\n*#{$1.strip}*\n\n" }
result.gsub!(/<b>(.*?)<\/b>/) { "*#{$1.strip}*" }
result.gsub!(/<br\/>/, "\n")
result.gsub!(/<br\/>|<br \/>/, "\n")
```
Member


Can you simplify this with an optional space?
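The suggestion can be sketched as follows. In a regex, ` ?` matches zero or one space, so a single pattern covers both `<br/>` and `<br />` without the two-branch alternation. The method name `convert_line_breaks` is hypothetical. This version still requires the slash, so a bare HTML5 `<br>` would need another tweak if Acunetix ever emits one (an assumption; the PR does not say).

```ruby
# Hypothetical helper showing the reviewer's suggestion:
# " ?" makes the space optional, so one pattern matches both
# "<br/>" and "<br />" instead of /<br\/>|<br \/>/.
def convert_line_breaks(html)
  html.gsub(/<br ?\/>/, "\n")
end

puts convert_line_breaks("a<br/>b<br />c").inspect
# => "a\nb\nc"
```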

```ruby
result.gsub!(/<span.*?>(.*?)<\/span>/m){"#{$1.strip}"}
result.gsub!(/<span.*?>|<\/span>/, '') #repeating again to deal with nested/empty/incomplete span tags

result.gsub!(/<a (.*?)href='(.*?)'><i(.*?)><\/i>(.*?)<\/a>/m) { "\"#{$4}\":#{$2}" }
```
Member


Similarly, this one should cover the next one (L19), so we don't need multiple regexes to parse the same tag.
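One way to read "cover the next one" is to make the icon part of the pattern optional. Wrapping it in a non-capturing group `(?:<i[^>]*><\/i>)?` lets one rule handle links with or without an empty `<i>` icon. This is a hedged sketch, not the PR's code: `ANCHOR` and `convert_links` are hypothetical names, the pattern accepts single- or double-quoted `href` values (the original matched single quotes only), and it emits the same Textile `"text":url` form as the original rule.

```ruby
# Hypothetical pattern: <a ...href='url'...>, an optional empty <i ...></i>
# icon, then the link text. [^>]*? keeps the attribute scan inside the <a> tag.
ANCHOR = /<a\s[^>]*?href=["']([^"']*)["'][^>]*>(?:<i[^>]*><\/i>)?(.*?)<\/a>/m

# Converts matching anchors to Textile links: "text":url
def convert_links(html)
  html.gsub(ANCHOR) { "\"#{$2.strip}\":#{$1}" }
end

puts convert_links("<a href='https://example.com/b'>Docs</a>")
# => "Docs":https://example.com/b
```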


```ruby
result.gsub!(/<strong>(.*?)<\/strong>/) { "*#{$1.strip}*" }
result.gsub!(/<span.*?>(.*?)<\/span>/m){"#{$1.strip}\n"}
# Cleanup lingering <p></p>
```
Member


@rachkor is this really an issue? We have this and the L16 "cleanup" lines. Is the code so bad that it includes random <p> and </p> tags all over the place? It seems we're doing something wrong with our parsing.
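If the lingering tags are unbalanced `<p>` markup, a single rule could avoid the separate cleanup step. The sketch below assumes every paragraph boundary only needs to become a blank line in the Textile output, so it treats every opening or closing `<p>` tag, balanced or not, as a boundary in one pass. The method name `convert_paragraphs` is hypothetical. The `\b` keeps the pattern from also matching `<pre>`, which has its own rule.

```ruby
# Hypothetical helper: every <p>, <p attr=...> or </p> becomes a paragraph
# break in a single pass, so unmatched tags cannot linger.
# \b stops the pattern from matching <pre> or <param>.
def convert_paragraphs(html)
  result = html.gsub(/<\/?p\b[^>]*>/, "\n\n")
  # Collapse the runs of newlines that adjacent tags leave behind,
  # then trim leading and trailing whitespace.
  result.gsub(/\n{3,}/, "\n\n").strip
end

puts convert_paragraphs("<p>First</p><p class='x'>Second</p></p>").inspect
# => "First\n\nSecond"
```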

```ruby
result.gsub!(/<pre.*?>(.*?)<\/pre>/) { "\n\nbc. #{$1}\n\n" }
result.gsub!(/<pre.*?>(.*?)<\/pre>/m){|m| "\n\nbc.. #{$1}\n\np. \n" }

result.gsub!(/<li.*?>([\s\S]*?)<\/li>/m){"\n* #{$1}"}
```
Member


Why do we need both this and L33?
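L33 is not visible in this view, so this assumes it is a second list-item rule. Below is a minimal sketch of a single rule that could replace both. In Ruby, `/m` makes `.` match newlines, so `(.*?)` under `/m` already spans lines and the `[\s\S]` class is redundant. Writing the opening tag as `<li\b[^>]*>` instead of `<li.*?>` also stops a match from starting at tags such as `<link>`. The method name `convert_list_items` is hypothetical.

```ruby
# Hypothetical helper: one rule for <li> items, with or without attributes.
# With /m, (.*?) can span lines, so [\s\S] is not needed.
# Item text is stripped of surrounding whitespace before emitting "* item".
def convert_list_items(html)
  html.gsub(/<li\b[^>]*>(.*?)<\/li>/m) { "\n* #{$1.strip}" }
end

puts convert_list_items("<li>One</li><li class='x'>Two</li>").inspect
# => "\n* One\n* Two"
```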
