
Add options to specify cache type #338

Open
wants to merge 3 commits into master
Conversation

@stevefan1999-personal (Contributor) commented Apr 5, 2023

Part 3 of the #294 series. This PR focuses on letting the user specify their own cache type, in contexts such as no_std or when the user needs to explicitly state the kind of cache they want to use. This means users can now supply LinearMap from heapless, SgMap from scapegoat, or even Simsys/fchashmap (a fixed-capacity no_std hashmap).

As I tried before, I wanted to consolidate the cache interface into the one facade trait stated here, but I quickly found that I would have to use a Box to wrap the dynamic trait -- which is not acceptable in a no_std environment. This means we have to resort to "duck typing": the user specifies a concrete type in the cache type definition. The cache type must follow this "trait protocol":

pub trait GrammarRuleCache<T> {
    /// Store the parse result `v` for input position `k`.
    fn insert(&mut self, k: usize, v: T);
    /// Look up a previously stored result for input position `k`.
    fn get(&self, k: usize) -> Option<&T>;
}

And the type must also implement Default.
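For illustration, a minimal sketch (not part of this PR) of a type satisfying the protocol, using alloc::collections::BTreeMap so it also works on no_std targets that have an allocator:

extern crate alloc;
use alloc::collections::BTreeMap;

// Sketch only: BTreeMap implements Default and provides the two
// operations the protocol needs.
impl<T> GrammarRuleCache<T> for BTreeMap<usize, T> {
    fn insert(&mut self, k: usize, v: T) {
        // The inherent insert returns the old value; the protocol discards it.
        BTreeMap::insert(self, k, v);
    }
    fn get(&self, k: usize) -> Option<&T> {
        BTreeMap::get(self, &k)
    }
}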

This PR is based on #336 and #337 as prerequisites.

@stevefan1999-personal (Contributor, Author)

@kevinmehall Umm, the test failed because it was running in a no_std context, and indeed there is no println! macro there. I wonder if we should take this out too: instead of printing to the console, put all the messages together into a Vec<String>, or even structure them, and then print everything out at the end. This could also help external compiler tracing, but I would like to keep that for another PR.
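Roughly what I have in mind, as a minimal sketch with invented names (not the crate's actual trace machinery), assuming an alloc-capable no_std target:

extern crate alloc;
use alloc::{string::String, vec::Vec};

// Sketch only: buffer trace lines instead of println!-ing them.
#[derive(Default)]
struct TraceBuffer {
    events: Vec<String>,
}

impl TraceBuffer {
    fn record(&mut self, msg: String) {
        self.events.push(msg);
    }

    // At the end of a parse, the caller can flush these anywhere,
    // e.g. over a serial port on an embedded target.
    fn drain(&mut self) -> impl Iterator<Item = String> + '_ {
        self.events.drain(..)
    }
}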

@kevinmehall (Owner)

A couple thoughts / concerns with this:

  • A suggested use case is switching to data structures with limited / fixed capacity. What happens if the capacity is filled? #[cache] could potentially evict items, but I'm not sure #[cache_left_rec] can drop items and preserve correctness.
  • Are there use cases where individual rules may want different cache data structures? If not, would it make more sense to configure this grammar-wide rather than repeating it every time the cache is specified?
  • Another related case for this is still using HashMap, but replacing the hasher with a faster one. I think the default SipHash being slow is part of the reason that #[cache] is rarely a win in my experience. I think that's doable with this design, but it requires first defining a type alias (see the sketch after this list). Not terrible, but not necessarily obvious. I can't think of a better way to do it, though, since the macro supplies the key and value type parameters. If the syntax involved specifying the full type explicitly, including all type params, it would expose the internal RuleResult type and require more repetition.
  • Do you have a concrete use case where you want caching on no_std right now, or is this just for completeness? I'm having a hard time imagining something that you'd want to parse on an embedded system that would greatly benefit from caching vs just restructuring the grammar to reduce redundancy.
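A minimal sketch of that type-alias workaround, assuming the fnv crate (any BuildHasher would do):

use std::collections::HashMap;

// Sketch only: a cache type alias that swaps SipHash for FNV while
// keeping the two type parameters the macro would supply.
type FnvCache<K, V> = HashMap<K, V, fnv::FnvBuildHasher>;

Since fnv::FnvBuildHasher implements Default, HashMap<K, V, FnvBuildHasher> does too, so the alias also satisfies the Default requirement.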

I wouldn't mind just skipping the no_std tests when the trace feature is enabled (#![cfg(not(feature = "trace"))] or similar).

@stevefan1999-personal (Contributor, Author)

  • A suggested use case is switching to data structures with limited / fixed capacity. What happens if the capacity is filled? #[cache] could potentially evict items, but I'm not sure #[cache_left_rec] can drop items and preserve correctness.

Maybe we will need to define the cache behavior. I think we should first try to insert the entry, and if the insert fails, execute the rest of the production rule as usual.
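A minimal sketch of those semantics under the proposed GrammarRuleCache protocol (everything except the trait is invented):

// Sketch only: memoize a rule result at input position `pos`. A
// fixed-capacity cache may silently drop the insert when it is full;
// correctness is preserved because the freshly computed result is
// returned either way.
fn memoize<T: Clone, C: GrammarRuleCache<T>>(
    cache: &mut C,
    pos: usize,
    run_rule: impl FnOnce() -> T,
) -> T {
    if let Some(hit) = cache.get(pos) {
        return hit.clone();
    }
    let result = run_rule();
    cache.insert(pos, result.clone());
    result
}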

  • Are there use cases where individual rules may want different cache data structures? If not, would it make more sense to configure this grammar-wide rather than repeating it every time the cache is specified?

I theorize the following situation: because hash-based containers rely on cost amortization (the algorithmic cost of an operation only approaches its big-O bound asymptotically, once n grows large), they struggle when you have few entries but big batches of repeated operations. The smaller the hash structure, the higher the chance of triggering hash map expansion and rehashing.

Given the large number of recursive parsing calls in a PEG parser, I suggest that a binary-tree-based map would yield a nice improvement in expected performance/throughput over a hash-based one, because its behavior is much more stable and deterministic: a self-balancing binary tree takes O(log n) time for any operation, whereas a hash map achieves O(1) only amortized, so it can be closer to O(n) for the first ~512 entries or so.

...Well, on the other hand, also theoretically, O(log n) is by definition slower than O(1), so a benchmark is still needed to test this thoroughly. We could also test whether FNV, SipHash, or Farmhash, compared with a B+ tree, a scapegoat tree, or even a naive index map (just a linear array with linear search), makes a difference.
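Something like this rough criterion sketch could be a starting point (setup and names assumed, not part of this PR); std::hint::black_box keeps the lookups from being optimized away:

use std::collections::{BTreeMap, HashMap};
use std::hint::black_box;
use criterion::{criterion_group, criterion_main, Criterion};

// Sketch only: compare insert/get throughput at the small key counts
// typical of rule memoization.
fn bench_caches(c: &mut Criterion) {
    let keys: Vec<usize> = (0..512).collect();
    c.bench_function("hashmap_siphash", |b| {
        b.iter(|| {
            let mut m = HashMap::new();
            for &k in &keys {
                m.insert(k, k);
                black_box(m.get(&k));
            }
        })
    });
    c.bench_function("btreemap", |b| {
        b.iter(|| {
            let mut m = BTreeMap::new();
            for &k in &keys {
                m.insert(k, k);
                black_box(m.get(&k));
            }
        })
    });
}

criterion_group!(benches, bench_caches);
criterion_main!(benches);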

  • Do you have a concrete use case where you want caching on no_std right now, or is this just for completeness? I'm having a hard time imagining something that you'd want to parse on an embedded system that would greatly benefit from caching vs just restructuring the grammar to reduce redundancy.

It's for completeness. I use peg to parse serial console commands and sometimes ACPI bytecode. Both of them have worked fine without the cache so far, but they are for embedded use, and I can't risk using a hash map on determinism grounds.

@stevefan1999-personal (Contributor, Author)

I also want to test how well this works with caches that do cache replacement, e.g. LRU, TinyLFU, and so on.
