Many tables (or parts of them) are still in the output.
Steps to reproduce:
Download this dump: https://dumps.wikimedia.org/jawiki/20221020/jawiki-20221020-pages-articles1.xml-p1p114794.bz2
Invoke the following command to list lines that contain the string "colspan":
bzcat jawiki-20221020-pages-articles1.xml-p1p114794.bz2 | wikiextractor/WikiExtractor.py --no-templates -o - - | grep colspan
Output:
[shortened]
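For anyone who wants to check for more than just "colspan", here is a rough Python sketch of the same check as the grep above. The function name `find_table_residue` and the marker list are my own assumptions for illustration, not part of WikiExtractor, and matching on attribute names is only a heuristic for leftover table markup.

```python
from typing import Iterable, List, Tuple

# Hypothetical marker list: attribute names that normally occur only in table markup.
TABLE_MARKERS = ("colspan", "rowspan")


def find_table_residue(
    lines: Iterable[str], markers: Tuple[str, ...] = TABLE_MARKERS
) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs for lines containing any marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()  # case-insensitive match, unlike plain grep
        if any(marker in lowered for marker in markers):
            hits.append((number, line.rstrip("\n")))
    return hits
```

To run it on the extractor output, pipe the output into a script that calls `find_table_residue(sys.stdin)` and prints the returned pairs.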