LayoutAwareDFXPParser prints BeautifulSoup4 XMLParsedAsHTMLWarning #331

Open
rlaphoenix opened this issue Mar 4, 2024 · 0 comments

Because LayoutAwareDFXPParser passes the html.parser feature to BeautifulSoup instead of xml, lxml, or similar, BeautifulSoup prints a warning whenever it thinks the content is XML rather than HTML.

The warning:

.venv\Lib\site-packages\bs4\builder\__init__.py:545:
XMLParsedAsHTMLWarning: It looks like you're parsing an XML document using an
HTML parser. If this really is an HTML document (maybe it's XHTML?), you can
ignore or filter this warning. If it's XML, you should know that using an XML
parser will be more reliable. To parse this document as XML, make sure you have
the lxml package installed, and pass the keyword argument `features="xml"` into
the BeautifulSoup constructor.

A possible solution would be:

import warnings
from bs4 import XMLParsedAsHTMLWarning
# Ignore the XMLParsedAsHTMLWarning emitted when html.parser is given XML-like input
warnings.filterwarnings('ignore', category=XMLParsedAsHTMLWarning)
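
A process-wide filter like the one above also hides the warning for any other BeautifulSoup use in the same program. A narrower option might be to suppress it only around the parser's own call, as in this minimal sketch. The helper name `call_without_xml_warning` is hypothetical and not part of any library, and the fallback class exists only so the sketch runs where bs4 is not installed.

```python
import warnings

try:
    from bs4 import XMLParsedAsHTMLWarning
except ImportError:
    # Stand-in so this sketch still runs without bs4 installed (assumption).
    class XMLParsedAsHTMLWarning(UserWarning):
        pass


def call_without_xml_warning(func, *args, **kwargs):
    """Call func with XMLParsedAsHTMLWarning ignored for this call only.

    Hypothetical helper: warnings.catch_warnings() restores the previous
    filter state on exit, so callers elsewhere are unaffected.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=XMLParsedAsHTMLWarning)
        return func(*args, **kwargs)
```

Inside the parser this could be used as `soup = call_without_xml_warning(BeautifulSoup, content, "html.parser")`, where `content` stands for whatever DFXP text is being parsed. Note that `warnings.catch_warnings()` is not thread-safe, so this trade-off may not suit every caller.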