clone_node() doesn't duplicate nonstandard tag names #3099

Open
stevecheckoway opened this issue Jan 16, 2024 · 1 comment
Assignees: stevecheckoway
Labels: topic/gumbo (Gumbo HTML5 parser), topic/memory (Segfaults, memory leaks, valgrind testing, etc.)

@stevecheckoway (Contributor)

While investigating #3098, I noticed that Gumbo's clone_node() function doesn't make a copy of nonstandard tag names.

I think this is the fix:

diff --git a/gumbo-parser/src/parser.c b/gumbo-parser/src/parser.c
index 67812b23..c3e5e038 100644
--- a/gumbo-parser/src/parser.c
+++ b/gumbo-parser/src/parser.c
@@ -1377,6 +1377,9 @@ static GumboNode* clone_node (
   *new_node = *node;
   new_node->parent = NULL;
   new_node->index_within_parent = -1;
+
+  if (node->v.element.tag == GUMBO_TAG_UNKNOWN)
+    new_node->v.element.name = gumbo_strdup(node->v.element.name);
   // Clear the GUMBO_INSERTION_IMPLICIT_END_TAG flag, as the cloned node may
   // have a separate end tag.
   new_node->parse_flags &= ~GUMBO_INSERTION_IMPLICIT_END_TAG;

but I'd like to understand why this hasn't been causing a bunch of memory leaks first.
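
To make the concern concrete, here is a minimal, self-contained sketch of the aliasing that a plain struct assignment produces. It does not use the real Gumbo types: `ToyElement`, `toy_strdup`, `clone_shallow`, `clone_deep`, and `toy_free` are hypothetical names invented for illustration, with `toy_strdup` standing in for `gumbo_strdup` and `is_unknown_tag` standing in for the `GUMBO_TAG_UNKNOWN` check.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an element node; NOT the real GumboNode layout. */
typedef struct {
  int is_unknown_tag; /* plays the role of tag == GUMBO_TAG_UNKNOWN */
  char *name;         /* heap-allocated only when is_unknown_tag is set */
} ToyElement;

/* Portable string duplicate (assumed to behave like gumbo_strdup). */
static char *toy_strdup(const char *s) {
  size_t n = strlen(s) + 1;
  char *copy = malloc(n);
  if (copy != NULL) memcpy(copy, s, n);
  return copy;
}

/* Mirrors `*new_node = *node;`: the name pointer is copied, not the string. */
static ToyElement *clone_shallow(const ToyElement *src) {
  ToyElement *dst = malloc(sizeof *dst);
  if (dst == NULL) return NULL;
  *dst = *src;
  return dst;
}

/* Mirrors the proposed fix: give the clone its own copy of the name. */
static ToyElement *clone_deep(const ToyElement *src) {
  ToyElement *dst = clone_shallow(src);
  if (dst != NULL && src->is_unknown_tag) {
    dst->name = toy_strdup(src->name);
  }
  return dst;
}

/* Frees an element that owns its name. Only safe when no other element
 * shares the same name pointer, which is why the shallow clone is risky. */
static void toy_free(ToyElement *e) {
  if (e == NULL) return;
  if (e->is_unknown_tag) free(e->name);
  free(e);
}
```

With the shallow clone, the original and the clone hold the same `name` pointer, so freeing both through `toy_free` would free the string twice. With the deep clone, each element owns its string and can be freed independently.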

@stevecheckoway stevecheckoway self-assigned this Jan 16, 2024
@flavorjones flavorjones added the topic/memory and topic/gumbo labels Jan 16, 2024
@flavorjones (Member)

@stevecheckoway It looks like clone_node isn't being called for an unknown tag in the test suite.

Here's the patch I used:

diff --git a/gumbo-parser/src/parser.c b/gumbo-parser/src/parser.c
index 06f096f8..180ee746 100644
--- a/gumbo-parser/src/parser.c
+++ b/gumbo-parser/src/parser.c
@@ -20,6 +20,7 @@
 #include <stdint.h>
 #include <stdlib.h>
 #include <string.h>
+#include <stdio.h>
 
 #include "ascii.h"
 #include "attribute.h"
@@ -1396,6 +1397,11 @@ static GumboNode* clone_node (
   *new_node = *node;
   new_node->parent = NULL;
   new_node->index_within_parent = -1;
+
+  if (node->v.element.tag == GUMBO_TAG_UNKNOWN) {
+    fprintf(stderr, "MIKE: unknown tag %s\n", node->v.element.name);
+  }
+
   // Clear the GUMBO_INSERTION_IMPLICIT_END_TAG flag, as the cloned node may
   // have a separate end tag.
   new_node->parse_flags &= ~GUMBO_INSERTION_IMPLICIT_END_TAG;

and it never prints anything!
