BB-63: Improving Composability of GrB_init() and GrB_finalize() #28

Open
mcmillan03 opened this issue Sep 3, 2021 · 1 comment
mcmillan03 commented Sep 3, 2021

Creating this issue based on our discussion of issue BB-10: https://bitbucket.org/aydozz/graph-blas-spec/issues/10/clarify-grb_init-and-grb_finalize-errors

Suppose there are two higher-level libraries, A and B, that both use GraphBLAS internally, opaque to the user.

A::init();
B::init();

auto a = A::DataType();
...

B::finalize();
A::finalize();

Currently, GrB_init may only be called once. If B attempts to call GrB_init after A has already initialized a GraphBLAS context, the result is undefined behavior. Two possible solutions that would allow this code to work are 1) adding GrB_initialized and GrB_finalized functions, so that B can check whether a GraphBLAS context already exists, or 2) allowing GrB_init and GrB_finalize to be called more than once.
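As a rough illustration of how the two options might look from a library's point of view, here is a minimal reference-counting sketch. Everything in it is hypothetical: `backend_init` and `backend_finalize` are stand-ins for the real GrB_init and GrB_finalize, and `ctx_initialized`, `ctx_acquire`, and `ctx_release` are made-up names, not proposed API. It is not thread-safe and assumes acquire and release calls are paired.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the real GrB_init / GrB_finalize (hypothetical). */
static int backend_init_calls = 0;
static int backend_finalize_calls = 0;
static void backend_init(void)     { backend_init_calls++; }
static void backend_finalize(void) { backend_finalize_calls++; }

/* Number of libraries currently holding the context. */
static int ctx_refcount = 0;

/* Option 1 analogue: a library can ask whether a context already exists. */
bool ctx_initialized(void) { return ctx_refcount > 0; }

/* Option 2 analogue: may be called repeatedly; only the first call
 * actually initializes the backend. */
void ctx_acquire(void) {
    if (ctx_refcount++ == 0) backend_init();
}

/* Only the release that brings the count back to zero finalizes. */
void ctx_release(void) {
    assert(ctx_refcount > 0);  /* release without acquire is a caller bug */
    if (--ctx_refcount == 0) backend_finalize();
}
```

With this shim, A::init() and B::init() would both call `ctx_acquire()` and the backend would be initialized once. If the count drops to zero and then rises again, the shim would call the underlying init a second time, which the shim alone does not make legal.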

The current workaround for this would likely be to have the user call GrB_init, then pass any necessary information about the context into A::init() and B::init().

In our call this week, Jose pointed out an issue arising when high-level libraries are used in a non-overlapping way.

A::init();
A::finalize();

B::init();
B::finalize();

Currently, similar to MPI and OpenSHMEM, GraphBLAS does not allow initializing a new context after GrB_finalize has been called. Supporting this use case would likely require allowing GrB_init to be called more than once in an application. Otherwise, the user will always need to initialize GraphBLAS themselves.

@DrTimothyAldenDavis

I currently use GrB_init and GrB_finalize to initialize some global space, and nesting them like this will cause failures.

Internally, I have a statically allocated global array of 64 pointers (void *). I use this as a memory pool, where the kth item in the array is the head pointer of a linked list of blocks of size exactly 2^k. These blocks came from malloc and were then "freed" by me, but instead of actually freeing them I put them in this pool. I only do this for small blocks. This cuts down on "malloc a tiny block; free the tiny block; malloc a tiny block; free the tiny block", which is slow. This kind of churn occurs for things like BFS on the Road graph in the GAP benchmark.

GrB_init sets this array to all NULL, and GrB_finalize walks through the array and each linked list, and finally frees all the blocks.
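A minimal sketch of a pool along these lines, to make the init/finalize dependency concrete. This is not the SuiteSparse:GraphBLAS code: the names (`pool_init`, `pool_alloc`, `pool_free`, `pool_finalize`), the cutoff `POOL_MAX_LOG2`, and the requirement that the caller pass the size back to `pool_free` are all assumptions made for illustration, and the sketch is not thread-safe.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define POOL_BUCKETS  64
#define POOL_MAX_LOG2 10  /* assumed cutoff: pool only blocks up to 2^10 bytes */

/* pool[k] heads a singly linked list of free blocks of exactly 2^k bytes. */
static void *pool[POOL_BUCKETS];

/* Smallest k with 2^k >= size, with room for at least one next pointer. */
static int bucket_for(size_t size) {
    int k = 0;
    size_t cap = 1;
    if (size < sizeof(void *)) size = sizeof(void *);
    while (cap < size && cap <= SIZE_MAX / 2) { cap <<= 1; k++; }
    return k;
}

/* What init does to the pool: set every list head to NULL. */
void pool_init(void) {
    for (int k = 0; k < POOL_BUCKETS; k++) pool[k] = NULL;
}

void *pool_alloc(size_t size) {
    int k = bucket_for(size);
    if (k > POOL_MAX_LOG2) return malloc(size);  /* large: bypass the pool */
    if (pool[k] != NULL) {
        void *block = pool[k];
        pool[k] = *(void **)block;               /* pop the list head */
        return block;
    }
    return malloc((size_t)1 << k);               /* pool empty: real malloc */
}

/* "Free" a block by pushing it onto its bucket's list. */
void pool_free(void *block, size_t size) {
    if (block == NULL) return;
    int k = bucket_for(size);
    if (k > POOL_MAX_LOG2) { free(block); return; }
    *(void **)block = pool[k];                   /* next pointer lives in the block */
    pool[k] = block;
}

/* What finalize does to the pool: walk every list and really free it. */
void pool_finalize(void) {
    for (int k = 0; k < POOL_BUCKETS; k++) {
        while (pool[k] != NULL) {
            void *next = *(void **)pool[k];
            free(pool[k]);
            pool[k] = next;
        }
    }
}
```

In this sketch, a second `pool_init` while blocks are still pooled would overwrite the list heads and leak those blocks, which is the kind of hazard a repeated init raises.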

Now suppose GrB_init / GrB_finalize are nested. Do the applications A and B each have their own global space for this free pool? All calls to GrB_* methods will access this free pool, but if a 2nd GrB_init comes along, would it wipe out the pool?

The 2nd GrB_init could instead note that GrB_init has already been called, and silently do nothing (and not return an error).

The 1st GrB_finalize could then free all these free pools, but then I would have to permit future calls to GrB methods after it. I could do that if you like. Freeing the set of free pools is thread-safe.

I do many more things in my GrB_init, such as:
(1) I set the malloc/calloc/realloc/free functions. Woe to the applications A and B if they each want their own memory managers.
(2) I set the mode, blocking/nonblocking. Woe if A and B want different modes.
(3) I set the max # of threads to use and the default format (by row or by col). Woe if A and B want different formats (A wants by-row as the default and B wants by-col).
(4) I clear counters I use for debugging memory management problems. These will fail abysmally if A still has memory allocated when B calls finalize.

I can work around many of these issues if you need me to, but not the malloc/calloc/realloc/free. I cannot let A use one set of managers and B another. I think I can work through all the other issues, even my free pool (A:init can clear it, B:init can see it is already initialized and not touch it; B:finalize can free it all, which is OK if A keeps going, and then A:finalize can free the free pool yet again). A and B would have to agree on all kinds of global settings, like the mode (GrB_BLOCKING and GrB_NONBLOCKING).
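To illustrate what "agreeing on global settings" could mean in practice, here is a small sketch in which a later init reports whether the mode it wanted matches the mode already in effect. This is only one possible design and differs from a purely silent no-op. All names (`demo_mode`, `demo_info`, `demo_init`, `DEMO_MISMATCH`) are hypothetical stand-ins, not GraphBLAS types or return codes.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for GrB_Mode and GrB_Info. */
typedef enum { DEMO_NONBLOCKING, DEMO_BLOCKING } demo_mode;
typedef enum { DEMO_SUCCESS, DEMO_MISMATCH } demo_info;

static bool      demo_started = false;  /* has any init call done the work? */
static demo_mode demo_active_mode;      /* mode chosen by the first init    */

/* The first call fixes the global mode. Later calls change nothing and
 * only report whether the caller's wish matches what is in effect. */
demo_info demo_init(demo_mode wanted) {
    if (!demo_started) {
        demo_started = true;
        demo_active_mode = wanted;
        return DEMO_SUCCESS;
    }
    return (wanted == demo_active_mode) ? DEMO_SUCCESS : DEMO_MISMATCH;
}
```

A library B that receives the mismatch result could then decide whether it can live with A's mode or must refuse to start. The same pattern does not extend to the memory manager, since blocks allocated with one allocator cannot be handed to another.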

However this is solved, you'll need to consider how I'm using the free pools, if you want to change the semantics of GrB_init and GrB_finalize to support multiple calls to them, nested or otherwise.

One solution is for GrB_init to not return GrB_INVALID_VALUE if it has already been called. Instead, it could return GrB_SUCCESS and do nothing, since GraphBLAS is already initialized. See this test:

https://github.com/DrTimothyAldenDavis/GraphBLAS/blob/905b1d54bef971db70180933454f65ab1f6f364b/Source/GB_init.c#L65

I know if GrB_init has ever been called, since I have a global flag that starts as false and gets set by GrB_init. This could be used to allow B:init to be done after A:init. The call B:init would do nothing at all.

Then B:finalize can safely free the memory pools I have. This is thread-safe. Then A can keep working and restock the memory pool as it works, and then safely do A:finalize.

In this manner, I could support any mixture like

A:init
B:init
C:init
B:finalize
A:finalize
D:init
C:finalize

or whatever mix you like. The A:init would do all the work. The other inits would do nothing. All calls to finalize would free my internal pools but this could be made thread-safe, even if other threads are working with the pool.
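The scheme described above can be sketched as follows, using a single free list in place of the full pool. The names (`sketch_init`, `sketch_finalize`, `sketch_recycle`, `sketch_pool_length`) are hypothetical, the code is single-threaded, and it only models the flag and the pool, none of the other global state.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

static bool  started    = false;  /* global flag that starts as false      */
static int   real_inits = 0;      /* number of init calls that did work    */
static void *free_list  = NULL;   /* one bucket of the pool, for brevity   */

/* The first call does all the work; later calls silently do nothing. */
void sketch_init(void) {
    if (started) return;
    started = true;
    free_list = NULL;
    real_inits++;
}

/* Return a block (at least pointer-sized) to the pool instead of freeing it. */
void sketch_recycle(void *block) {
    *(void **)block = free_list;
    free_list = block;
}

/* Count the blocks currently held in the pool. */
int sketch_pool_length(void) {
    int n = 0;
    for (void *p = free_list; p != NULL; p = *(void **)p) n++;
    return n;
}

/* Every finalize frees whatever is pooled; the flag stays set, so the
 * remaining libraries can keep working and restock the pool. */
void sketch_finalize(void) {
    while (free_list != NULL) {
        void *next = *(void **)free_list;
        free(free_list);
        free_list = next;
    }
}
```

Running the A/B/C/D sequence above against this sketch performs the real initialization once, and each finalize leaves the pool empty but usable.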

But would this be OK with other implementations?
