Update HTML tag parsing to work with Acunetix 360 #40

rachkor · 2022-03-30T17:37:57Z

Summary

The following fields in Acunetix 360 Issues contained many HTML tags in the output that our parser wasn't removing:

remedial_actions
remedial_procedure
external_references

This PR cleans up these 3 fields and adds new cleanup for <em> tags and for links that contain <i> tags.
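For illustration only, a minimal sketch of what an `<em>` cleanup could look like. The replacement the PR actually uses for `<em>` is not shown on this page, so both the helper name `convert_emphasis` and the mapping to Textile emphasis (`_text_`, by analogy with the existing `<b>` to `*text*` rule) are assumptions.

```ruby
# Hypothetical helper, not taken from the plugin.
# Assumption: <em> should become Textile emphasis (_text_), mirroring how
# the existing cleanup turns <b> into *text*.
def convert_emphasis(html)
  # /m lets the emphasised text span more than one line.
  html.gsub(/<em>(.*?)<\/em>/m) { "_#{$1.strip}_" }
end

puts convert_emphasis("This is <em> very </em> important")
```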

I assign all rights, including copyright, to any future Dradis work by myself to Security Roots.


etdsoft · 2024-12-23T08:26:24Z

lib/acunetix/concerns/cleanup.rb

      result.gsub!(/<h[0-9] >(.*?)<\/h[0-9]>/) { "\n\n*#{$1.strip}*\n\n" }
      result.gsub!(/<b>(.*?)<\/b>/) { "*#{$1.strip}*" }
-      result.gsub!(/<br\/>/, "\n")
+      result.gsub!(/<br\/>|<br \/>/, "\n")


Can you simplify this with an optional space?
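One way to read this suggestion, as a sketch: make the space before the slash optional with ` ?`, so a single pattern covers both `<br/>` and `<br />`. The helper name `convert_line_breaks` is hypothetical, and the sketch uses `gsub` rather than the in-place `gsub!` from the diff so it can be tried on its own.

```ruby
# Hypothetical helper, not taken from the plugin.
def convert_line_breaks(html)
  # " ?" matches zero or one space, so this replaces the alternation
  # /<br\/>|<br \/>/ from the diff with a single branch.
  html.gsub(/<br ?\/>/, "\n")
end

puts convert_line_breaks("one<br/>two<br />three")
```

The pattern matches exactly the same two spellings as the alternation, so behavior should not change.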

etdsoft · 2024-12-23T08:27:00Z

lib/acunetix/concerns/cleanup.rb

+      result.gsub!(/<span.*?>(.*?)<\/span>/m){"#{$1.strip}"}
+      result.gsub!(/<span.*?>|<\/span>/, '') #repeating again to deal with nested/empty/incomplete span tags
+
+      result.gsub!(/<a (.*?)href='(.*?)'><i(.*?)><\/i>(.*?)<\/a>/m) { "\"#{$4}\":#{$2}" }


Similarly, this one should cover the next one (L19) so we don't need multiple regexes to parse the same tag.
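A rough sketch of how one pattern might cover both link shapes, assuming the comment refers to the `<a>` rule the hunk ends on and that the rule at L19 (not shown here) handles plain `<a href=...>` links. The names `LINK_PATTERN` and `textile_links` are hypothetical.

```ruby
# Hypothetical names, not taken from the plugin.
# - (['"]) ... \1 accepts single- or double-quoted href values
# - (?:<i[^>]*><\/i>)? makes the empty icon tag optional
LINK_PATTERN = /<a\s[^>]*?href=(['"])(.*?)\1[^>]*>(?:<i[^>]*><\/i>)?(.*?)<\/a>/m

def textile_links(html)
  # $2 is the URL, $3 is the link text; output is a Textile link.
  html.gsub(LINK_PATTERN) { "\"#{$3.strip}\":#{$2}" }
end

puts textile_links("<a target='_blank' href='https://example.com'><i class='icon'></i>Example</a>")
puts textile_links('<a href="https://example.org">Docs</a>')
```

Whether this is safe to swap in depends on the HTML Acunetix 360 actually emits, so it would need checking against the specs added in this PR.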

etdsoft · 2024-12-23T08:29:19Z

lib/acunetix/concerns/cleanup.rb


-      result.gsub!(/<strong>(.*?)<\/strong>/) { "*#{$1.strip}*" }
-      result.gsub!(/<span.*?>(.*?)<\/span>/m){"#{$1.strip}\n"}
+      # Cleanup lingering <p></p>


@rachkor is this really an issue? We have this and the L16 "cleanup" lines. Is the code so bad that they include random stray tags all over the place? It seems we're doing something wrong with our parsing.

etdsoft · 2024-12-23T08:30:04Z

lib/acunetix/concerns/cleanup.rb

+      result.gsub!(/<pre.*?>(.*?)<\/pre>/) { "\n\nbc. #{$1}\n\n" }
+      result.gsub!(/<pre.*?>(.*?)<\/pre>/m){|m| "\n\nbc.. #{$1}\n\np. \n" }
+
+      result.gsub!(/<li.*?>([\s\S]*?)<\/li>/m){"\n* #{$1}"}


Why do we need this and L33?
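L33 is not shown in this hunk, so the overlap itself can't be checked here. One thing that is visible: the `<li>` pattern uses both `[\s\S]` and the `/m` flag, and in Ruby `/m` already lets `.` match newlines, so either one is enough. A small sketch (constant and helper names are hypothetical):

```ruby
# Hypothetical names, not taken from the plugin.
WITH_CHAR_CLASS = /<li.*?>([\s\S]*?)<\/li>/m  # pattern as written in the diff
WITH_DOT_ONLY   = /<li.*?>(.*?)<\/li>/m       # same match, relying on /m alone

def to_bullets(html, pattern)
  # Each list item becomes a Textile bullet on its own line.
  html.gsub(pattern) { "\n* #{$1}" }
end

html = "<ul><li class='x'>first\nline</li><li>second</li></ul>"
puts to_bullets(html, WITH_CHAR_CLASS) == to_bullets(html, WITH_DOT_ONLY)
```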

rachkor and others added 7 commits April 8, 2021 15:18

Merge branch 'main' of github.com:dradis/dradis-acunetix

8c54099

Merge branch 'main' of github.com:dradis/dradis-acunetix

abaa7b8

Merge branch 'main' of github.com:dradis/dradis-acunetix

4c7c24a

Update HTML tag parsing to work with Acunetix 360

ef415e7

Add specs

183f34a

Add more cleanup actions that were previously missed

221f46e

Merge branch '360-cleanup' of github.com:dradis/dradis-acunetix into …

cf77635

…360-cleanup

sean-yeoh changed the base branch from main to release-4.3.0 April 27, 2022 03:18

sean-yeoh added 2 commits April 27, 2022 11:18

Merge branch 'release-4.3.0' into 360-cleanup

a38ffa6

Merge branch 'main' into 360-cleanup

2616a6a

sean-yeoh changed the base branch from release-4.3.0 to main June 3, 2022 07:51

sean-yeoh and others added 4 commits June 3, 2022 15:52

Fix chagnelog

457d3b8

Fix formatting

9c495ee

Merge branch 'main' into 360-cleanup

d427425

Fix new formatting issues

3b0aa02

etdsoft reviewed Dec 23, 2024
