Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sweet project! #6

Open
simlu opened this issue Jul 23, 2022 · 2 comments
Open

Sweet project! #6

simlu opened this issue Jul 23, 2022 · 2 comments

Comments

@simlu
Copy link

simlu commented Jul 23, 2022

Was just browsing for competitors to object-scan and this popped up!

Awesome project, I have to look into it in more detail and will definitely add it to the functionality comparison and performance benchmarking suite of object-scan! I'm really looking forward to testing how it compares performance wise!

I have to look at the source code as well. I'm assuming you have a precompilation phase where you build a search tree? Or how does this work internally? What are the next things on the roadmap for this project?

PS: If it wasn't clear from my comment, I'm the author of object-scan

Edit: Read up on how this works internally and can't wait to try it out! It's actually quite different and a cool way of handling the multi path traversal case. However does it require always a full traversal of the entire input?

@simlu
Copy link
Author

simlu commented Jul 24, 2022

Since nimma is the closest competitor that I have found to object-scan, I've spend quite some time looking at differences. I've added a big section in the readme, but am going to add lots of notes here in the hopes that it will help you make your library better! Here is the relevant PR.

Note that I have not looked at the "generating code" feature as that is far beyond the specs of object-scan.
I'm honestly not quite sure what the purpose of it is and it would be awesome if you could explain!

Here we go, in no particular order:

  1. It would be cool to be able to handle circular references and not just crash
  2. Performance is surprisingly good, but I've noticed that the compile step is relatively slow compared to object-scan. Might be room for improvement there
  3. The callback context is lacking: Only value and path are provided. Take a look at object-scan, there is a lot more provided there.
  4. The path in the callback context is not a copy, but the path that is internally used by the library. This can lead to some hard to debug issues and I would highly suggest you change it. It did not test if changing the path can alter the behavior of nimma (I'm expecting it does!), which would be an even bigger issue.
  5. The query syntax is a bit weird, I would expect a single callback that receives the "needle" that triggered the callback, rather than having to specify a callback per needle. Take a look at object-scan for some inspiration (look at matchedBy, traversedBy, etc). This would also make the case much easier to handle where different callbacks are called for the same path and different "needles", but we only want to do processing once per path.
  6. It would be cool to be able to return from callbacks and do something with those results (take a look at filterFn, breakFn and context in object-scan)
  7. Traversal seems to happen in Depth-First Pre-Order? Might be worth documenting that
  8. Unfortunately it looks like multiple input paths mean multiple traversal: Unexpected multiple traversals #7

The readme of object-scan is extensive. So I'm sure you could get a lot of inspiration just reading through that.

On the flip side, the biggest problem I've seen with object-scan is that it doesn't align with the jsonpath syntax. It is close, but doesn't fully support it. In particular conditionals are not supported. Something I have to put onto the roadmap. That and typescript support.

This was a fun exploration! Thank you and I hope you find this somewhat useful =)

@P0lip
Copy link
Owner

P0lip commented Jul 24, 2022

That library was primarily meant to be a drop-in replacement for jsonpath-plus to use in stoplight/spectral, hence some similarities, i.e. in callbacks.
That project was also quite adjusted to the common usage of Spectral (this tool uses JSONPath expressions under the hood), so certain cases are more optimized than the others.

Frankly speaking, I'm not sure really sure whether Nimma means to be a competitor to object-scan or jmespath. The syntax of expressions they accept is quite different after all. It's certainly however more of a direct competitor to jsonpath or jsonpath-plus.

I have to look at the source code as well. I'm assuming you have a precompilation phase where you build a search tree? Or how does this work internally? What are the next things on the roadmap for this project?

I don't build a search tree. I was thinking about that, but it's rather tricky to do given a JSONPath expression may contain any arbitary JS code, so one would need to analyze that too. Your tool doesn't support it, so it is way easier to determine which properties are to be matched.
I do have an optimization there that excludes properties that are known.
Example: https://github.com/P0lip/nimma/blob/master/src/__tests__/codegen.test.mjs#L805

It's not really a search tree, however.

Performance is surprisingly good, but I've noticed that the compile step is relatively slow compared to object-scan. Might be room for improvement there

Parsing is quite expensive. I could write my own parser, but I stuck with peg, so there's some room for improvement here for sure. Obviously, I generate the JS code there that then needs to be evaluated again at runtime, but there isn't much one can notably improve here.

It would be cool to be able to handle circular references and not just crash

No point tbh - JSONPath expressions are meant to deal with JSON input, so that's not really worth putting any effort into.

The path in the callback context is not a copy, but the path that is internally used by the library. This can lead to some hard to debug issues and I would highly suggest you change it. It did not test if changing the path can alter the behavior of nimma (I'm expecting it does!), which would be an even bigger issue.

Yeah, I know the path is mutated, and that's mostly to avoid the perf cost of new array allocation.
In our case, some documents have thousands of properties, often quite nested, so one would end up copying huge arrays thousands of times. This could be partially mitigated if #7 was implemented, but not always.

The query syntax is a bit weird, I would expect a single callback that receives the "needle" that triggered the callback, rather than having to specify a callback per needle. Take a look at object-scan for some inspiration (look at matchedBy, traversedBy, etc). This would also make the case much easier to handle where different callbacks are called for the same path and different "needles", but we only want to do processing once per path.

Agreed, I don't like it either - it's mostly so I could integrate it easily in Spectral without many code changes 😉.

Unfortunately it looks like multiple input paths mean multiple traversal: #7

Yep, again mostly due to the needs of stoplight/spectral.

The readme of object-scan is extensive. So I'm sure you could get a lot of inspiration just reading through that.

Yup, readme is certainly something I could improve here. :D

This was a fun exploration! Thank you and I hope you find this somewhat useful =)

Thanks a lot for the feedback!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants