Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add IntArray #10

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Conversation

Sudha247
Copy link
Contributor

This PR adds an IntArray module, which supports parallel initialisation for Integer Arrays mentioned in #6. It uses @stedolan's prototype which uses strings as the base for IntArray.

Benchmarks

I have rewritten the parallel benchmarks quicksort_multicore.ml and mergesort_multicore.ml from sandmark to use Domainslib.IntArray proposed in this PR along with adding parallel initialisation to them. 4.10.0+multicore compiler variant was used to build and run the benchmarks.

Quicksort

n = 10_000_000

#Cores IntArray Array
1 6.326 8.352
2 5.076 7.136
4 3.345 5.576
8 2.984 3.879
12 5.517 3.044
16 2.716 3.046
20 4.690 2.419
24 9.327 2.327

Observations: The single core version of IntArray is faster compared to Array version and there is speedup till 8 cores, after which the IntArray version slows down.

When the number of cores is greater than or equal to 8, there's unusually high GC activity which I suspect explains the slowdown. More specifically, the overheads seem to be at caml_stw_empty_minor_heap and stw_handler. @stedolan mentioned that strings are not scanned by the GC, I'm not sure if that's someway related to the increased GC activity.

This is a part of the eventlog for execution on 24 cores. Most other processes look similar to the ones seen here.

image

Mergesort

n = 1_000_000

#Cores IntArray Array
1 1.83 0.491
2 0.972 0.296
4 0.530 0.201
8 0.298 0.141
12 0.230 0.122
16 0.191 0.135
20 0.177 0.120
24 0.160 0.120

Observations: Contrary to the quicksort benchmark, there is no surge in the GC activity in mergesort, which makes me all the more uncertain about the cause of it in the quicksort benchmark. The speedup of the IntArray version independently is quite close to what would be expected expected. But there is a huge slowdown on 1 core compared to the Array version. The overheads lie at set and get of IntArray (others are present in the Array version as well). This is a part of perf report on 1 core.

  32.25%  mergesort_multi  mergesort_multicore.exe  [.] camlDomainslib__Intarray__get_232
  26.00%  mergesort_multi  mergesort_multicore.exe  [.] camlMergesort_multicore__sort_364
  13.52%  mergesort_multi  mergesort_multicore.exe  [.] caml_apply2
  11.10%  mergesort_multi  mergesort_multicore.exe  [.] camlDomainslib__Intarray__set_279
   9.41%  mergesort_multi  mergesort_multicore.exe  [.] camlMergesort_multicore__loop_375
   4.57%  mergesort_multi  mergesort_multicore.exe  [.] caml_apply3
   1.25%  mergesort_multi  mergesort_multicore.exe  [.] camlStdlib__random__intaux_278

Would appreciate any insights on these benchmarks.

To-Do

The module needs addition of more Array functions such as fill, make, map, iter etc. and possibly some performance tuning. I shall keep updating them to this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale Not actively worked on
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants