Feature: Create json file line by line and filter using tags #3

Open: wants to merge 2 commits into main

mswillus commented May 24, 2022

I propose two features with this MR:
The first adds an option to write the JSON output as one JSON object per line, one line per post.
The second introduces a filter that restricts the dataset to posts with given tags while it is being converted.
Suppose you only care about a few tags related to testing. With the new features you can run something like:

./stackexchange-xml-converter \
    -result-format=json \
    -source-path=../data/Posts.xml \
    -store-to-dir "../data" \
    -filter-by-tag-id "\
        tdd\
        testing\
        testcase testing-library\
        unit-testing" \
    -json-one-line

This produces a filtered dataset containing only posts that have at least one of those tags assigned. The resulting Posts.json file holds one JSON object per line, one for each matching post.
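Because every line is a complete JSON object, the filtered file can be processed one post at a time instead of being parsed as a whole. A minimal Python sketch of such a reader follows. The helper name `iter_posts` is hypothetical, and the `Id` and `Tags` field names and the two sample records are assumptions for illustration only; they have not been checked against the converter's actual output.

```python
import io
import json


def iter_posts(lines):
    """Yield one dict per non-empty line of a JSON-lines stream."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical sample standing in for ../data/Posts.json.
# The field names and values are assumed, not taken from a real dump.
sample = io.StringIO(
    '{"Id": "1", "Tags": "<tdd><unit-testing>"}\n'
    '{"Id": "2", "Tags": "<testing>"}\n'
)

posts = list(iter_posts(sample))
print(len(posts))
```

For a real run, the same function should work on an open file handle, e.g. `open("../data/Posts.json", encoding="utf-8")`, since it only iterates over lines.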
I also added a flag that switches the filter from exact matching to substring matching: a post is included if a given word is contained in any of its tags.
The following would give you all posts with tags that contain the word 'testing' (e.g. unit-testing, testing-library).

./stackexchange-xml-converter \
    -result-format=json \
    -source-path=../data/Posts.xml \
    -store-to-dir "../data" \
    -filter-by-tag-id "testing" \
    -json-one-line \
    -filter-no-exact-match
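
The difference between the two matching modes can be sketched in Python as follows. This illustrates the intended semantics only: the function `post_matches` and its parameters are hypothetical and do not mirror the code in this MR, and the tag lists are made-up examples.

```python
def post_matches(post_tags, filter_words, exact=True):
    """Return True if a post with these tags should be kept.

    exact=True  : some post tag must equal one of the filter words.
    exact=False : some filter word must occur inside one of the post tags
                  (the assumed behaviour of -filter-no-exact-match).
    """
    if exact:
        return any(tag in filter_words for tag in post_tags)
    return any(word in tag for tag in post_tags for word in filter_words)


filter_words = ["testing"]

# Exact mode: 'unit-testing' is not the same tag as 'testing'.
print(post_matches(["unit-testing", "python"], filter_words))               # False
# Substring mode: 'testing' is contained in 'unit-testing'.
print(post_matches(["unit-testing", "python"], filter_words, exact=False))  # True
# Exact mode still matches an identical tag.
print(post_matches(["testing"], filter_words))                              # True
```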

If you approve of the changes, I'd be happy to rebase and refactor.
