Using multi-threaded Jluna interface ends up with an error sometimes #69

Open
kouchy opened this issue Aug 23, 2024 · 3 comments

kouchy commented Aug 23, 2024

Hi @Clemapfel,

I am trying to use Jluna to support tasks written in Julia in a streaming library I'm working on (StreamPU).

I have the following minimal code:

#include <functional>
#include <vector>

#include <jluna.hpp>

int main(int argc, char** argv)
{
    jluna::initialize(3);
    
    const size_t n_tasks = 12;
    std::vector<jluna::Task<void>> tasks;
    std::function<void(const size_t)> func_exec = [](const size_t tid)
    {
        jluna::Base["println"]("lambda called with ", tid);
    };

    for (size_t tid = 0; tid < n_tasks; tid++)
    {
        tasks.push_back(jluna::ThreadPool::create(func_exec, tid));
        tasks.back().schedule();
    }
    for (size_t tid = 0; tid < n_tasks; tid++)
        tasks[tid].join();
        
    return 0;
}

Most of the time this code will print something like:

[JULIA][LOG] initialization successful (3 thread(s)).
lambda called with 5
lambda called with 6
lambda called with 8
lambda called with 0
lambda called with 7
lambda called with 1
lambda called with 2
lambda called with 10
lambda called with 9
lambda called with 4
lambda called with 3
lambda called with 11

But sometimes it fails with the following error:

[JULIA][LOG] initialization successful (3 thread(s)).
terminate called after throwing an instance of 'jluna::JuliaException'
  what():  [JULIA][EXCEPTION] KeyError: key 0x0000773229fd79a0 not found
Stacktrace:
 [1] getindex
   @ ./dict.jl:498 [inlined]
 [2] get_reference(key::UInt64)
   @ Main.jluna.memory_handler ./none:597
 [3] safe_call(f::Function, args::UInt64)
   @ Main.jluna ./none:17
 [4] (::Main.jluna.cppcall.var"#3#4"{UInt64})()
   @ Main.jluna.cppcall ./none:828

[217830] signal (6.-6): Aborted
in expression starting at none:0
pthread_kill at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x77323dca5ffd)
unknown function (ip: 0x77323dcbae9b)
_ZSt9terminatev at /lib/x86_64-linux-gnu/libstdc++.so.6 (unknown line)
__cxa_throw at /lib/x86_64-linux-gnu/libstdc++.so.6 (unknown line)
safe_call<_jl_value_t*> at /nfs/users/cassagnea-nfs/softwares/jluna/include/jluna/.src/safe_utilities.inl:56
_ZN5jluna6detail13get_referenceEm at /nfs/users/cassagnea-nfs/softwares/jluna/lib/libjluna.so.1.0.0 (unknown line)
_ZN5jluna5Proxy10ProxyValueC2EP11_jl_value_tRSt10shared_ptrIS1_ES3_ at /nfs/users/cassagnea-nfs/softwares/jluna/lib/libjluna.so.1.0.0 (unknown line)
_ZN5jluna5ProxyC2EP11_jl_value_tRSt10shared_ptrINS0_10ProxyValueEES2_ at /nfs/users/cassagnea-nfs/softwares/jluna/lib/libjluna.so.1.0.0 (unknown line)
operator[]<char> at /nfs/users/cassagnea-nfs/softwares/jluna/include/jluna/.src/proxy.inl:78 [inlined]
operator() at /nfs/users/cassagnea-nfs/workspace/devel/streampu_julia/tests/julia/simple_chain_julia.cpp:75 [inlined]
__invoke_impl<void, main(int, char**)::<lambda(size_t)>&, long unsigned int> at /usr/include/c++/13/bits/invoke.h:61 [inlined]
__invoke_r<void, main(int, char**)::<lambda(size_t)>&, long unsigned int> at /usr/include/c++/13/bits/invoke.h:111 [inlined]
_M_invoke at /usr/include/c++/13/bits/std_function.h:290
operator() at /usr/include/c++/13/bits/std_function.h:591 [inlined]
operator() at /nfs/users/cassagnea-nfs/softwares/jluna/include/jluna/.src/multi_threading.inl:345 [inlined]
__invoke_impl<_jl_value_t*, jluna::ThreadPool::create<long unsigned int>(const std::function<void(long unsigned int)>&, long unsigned int)::<lambda()>&> at /usr/include/c++/13/bits/invoke.h:61 [inlined]
__invoke_r<_jl_value_t*, jluna::ThreadPool::create<long unsigned int>(const std::function<void(long unsigned int)>&, long unsigned int)::<lambda()>&> at /usr/include/c++/13/bits/invoke.h:114 [inlined]
_M_invoke at /usr/include/c++/13/bits/std_function.h:290
_ZNKSt8functionIFP11_jl_value_tvEEclEv at /nfs/users/cassagnea-nfs/softwares/jluna/libjluna.so (unknown line)
jluna_invoke_from_task at /nfs/users/cassagnea-nfs/softwares/jluna/libjluna.so (unknown line)
#3 at ./none:828
unknown function (ip: 0x77323d468737)
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
start_task at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/task.c:1238
Allocations: 2908 (Pool: 2899; Big: 9); GC: 0
Aborted (core dumped)

I am using a Zen 4 CPU with 8 cores on Ubuntu 24.04. I compiled Jluna from this repository (7b08c4f) and my Julia version is 1.10.4.

Do you have any idea why this error happens? Should I do things differently? It is worth mentioning that I'm not very experienced in Julia programming.

Many thanks in advance for any help.

kouchy commented Aug 23, 2024

I noticed that if you increase the number of created tasks from 12 to 200 or more, the error occurs significantly more often.
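
A failure rate that grows with the number of tasks is consistent with a race condition, although nothing in this thread confirms where the race is. The Julia stack trace above ends in a `Dict` lookup inside `get_reference` in `Main.jluna.memory_handler`, so one possible reading is that a shared reference table is being read while another task modifies it. The following is only a self-contained sketch of that failure class and of the usual remedy (a lock around every access). `RefTable` and `stress_ref_table` are hypothetical names made up for illustration; this is not jluna's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <stdexcept>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a reference table keyed by a 64-bit id.
// NOT jluna's code; it only illustrates the locking pattern.
class RefTable
{
public:
    // Register one more owner of `key`.
    void create_reference(std::uint64_t key)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        ++counts_[key];
    }

    // Return the owner count of `key`; throws std::out_of_range if missing
    // (the C++ analogue of a KeyError).
    std::size_t get_reference(std::uint64_t key) const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return counts_.at(key);
    }

    // Drop one owner of `key`; erase the entry once no owner remains.
    void free_reference(std::uint64_t key)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = counts_.find(key);
        if (it == counts_.end())
            throw std::out_of_range("key not found");
        if (--(it->second) == 0)
            counts_.erase(it);
    }

    std::size_t size() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return counts_.size();
    }

private:
    mutable std::mutex mutex_;
    std::unordered_map<std::uint64_t, std::size_t> counts_;
};

// Hammer one shared key from several threads and count failed operations.
std::size_t stress_ref_table(RefTable& table, std::size_t n_threads, std::size_t n_iters)
{
    assert(n_threads > 0);
    const std::uint64_t shared_key = 42;
    std::vector<std::size_t> failures(n_threads, 0);
    std::vector<std::thread> threads;
    for (std::size_t t = 0; t < n_threads; ++t)
        threads.emplace_back([&table, &failures, t, n_iters, shared_key] {
            for (std::size_t i = 0; i < n_iters; ++i) {
                try {
                    table.create_reference(shared_key);
                    table.get_reference(shared_key);
                    table.free_reference(shared_key);
                } catch (const std::out_of_range&) {
                    ++failures[t]; // each thread writes only its own slot
                }
            }
        });
    for (auto& th : threads)
        th.join();
    std::size_t total = 0;
    for (std::size_t f : failures)
        total += f;
    return total;
}
```

Because every thread holds its own reference between `create_reference` and `free_reference`, and every access takes the same mutex, no lookup can observe a missing key. Removing the `std::lock_guard` lines would make the sketch a data race (undefined behaviour), which is the kind of intermittent failure described above.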

k12Sergey commented Sep 1, 2024

I have a similar problem on Windows 10 with the MinGW compiler and the latest Julia version, using the following code (it has never run to completion):

double juliaTest()
{
    using namespace jluna;

    std::function<Int64(Int64)> labdaCall = [](Int64 value){
        auto f = jluna::Main["f"];
        auto res = f(value);
        return res;
    };

    Int64 val = 5;

    auto task1 = ThreadPool::create(labdaCall, val);
    auto task2 = ThreadPool::create(labdaCall, val);
    auto task3 = ThreadPool::create(labdaCall, val);

    // start task
    task1.schedule();
    task2.schedule();
    task3.schedule();

    // // wait for task to finish
    task1.join();
    task2.join();
    task3.join();

    int res = task1.result().get().value() + task2.result().get().value() + task3.result().get().value();

    return res;
}
int main()
{
    jluna::initialize(8);
    /// declare function
    jluna::Main.safe_eval(
        R"(function f(x)
            return x*x
           end )");

    for(int i = 0; i != 100; ++i) {
        auto res = juliaTest();
        std::cout << res << std::endl;
    }

    return 0;
}

Code output

[JULIA][LOG] initialization successful (8 thread(s)).
75
75
75
75
75

Error message

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: UNKNOWN at 0x7ffb40e7cf19 -- RaiseException at C:\WINDOWS\System32\KERNELBASE.dll (unknown line)
in expression starting at none:1
RaiseException at C:\WINDOWS\System32\KERNELBASE.dll (unknown line)
_Unwind_RaiseException at /workspace/srcdir/gcc-13.2.0/libgcc\unwind-seh.c:334
__cxa_throw at /workspace/srcdir/gcc-13.2.0/libstdc++-v3/libsupc++\eh_throw.cc:93
safe_call<_jl_value_t*, _jl_value_t*> at C:/Program Files (x86)/jluna/include/jluna/.src\safe_utilities.inl:56
safe_call<long long int&> at C:/Program Files (x86)/jluna/include/jluna/.src\proxy.inl:93
operator()<long long int&> at C:/Program Files (x86)/jluna/include/jluna/.src\proxy.inl:115
operator() at D:/Work/julia_tests/jluna_multithread\main.cpp:14
__invoke_impl<jluna::Proxy, juliaTest()::<lambda(jluna::Int64)>&, long long int> at D:/Qt/Tools/mingw1120_64/lib/gcc/x86_64-w64-mingw32/11.2.0/include/c++/bits\invoke.h:61
__invoke_r<long long int, juliaTest()::<lambda(jluna::Int64)>&, long long int> at D:/Qt/Tools/mingw1120_64/lib/gcc/x86_64-w64-mingw32/11.2.0/include/c++/bits\invoke.h:116
_M_invoke at D:/Qt/Tools/mingw1120_64/lib/gcc/x86_64-w64-mingw32/11.2.0/include/c++/bits\std_function.h:291
operator() at D:/Qt/Tools/mingw1120_64/lib/gcc/x86_64-w64-mingw32/11.2.0/include/c++/bits\std_function.h:560
operator() at C:/Program Files (x86)/jluna/include/jluna/.src\multi_threading.inl:367
__invoke_impl<_jl_value_t*, jluna::ThreadPool::create<long long int, long long int>(const std::function<long long int(long long int)>&, long long int)::<lambda()>&> at D:/Qt/Tools/mingw1120_64/lib/gcc/x86_64-w64-mingw32/11.2.0/include/c++/bits\invoke.h:61
__invoke_r<_jl_value_t*, jluna::ThreadPool::create<long long int, long long int>(const std::function<long long int(long long int)>&, long long int)::<lambda()>&> at D:/Qt/Tools/mingw1120_64/lib/gcc/x86_64-w64-mingw32/11.2.0/include/c++/bits\invoke.h:114
_M_invoke at D:/Qt/Tools/mingw1120_64/lib/gcc/x86_64-w64-mingw32/11.2.0/include/c++/bits\std_function.h:291
#3 at .\none:828
unknown function (ip: 000002ab38f89f4b)
jl_apply at C:/workdir/src\julia.h:1982 [inlined]
start_task at C:/workdir/src\task.c:1238
Allocations: 2909 (Pool: 2900; Big: 9); GC: 0

or this one, with a longer stack trace:

[28716] signal (22): SIGABRT
in expression starting at none:0
crt_sig_handler at C:/workdir/src\signals-win.c:95
raise at C:\WINDOWS\System32\msvcrt.dll (unknown line)
abort at C:\WINDOWS\System32\msvcrt.dll (unknown line)
__verbose_terminate_handler at /workspace/srcdir/gcc-13.2.0/libstdc++-v3/libsupc++\vterminate.cc:95
__terminate at /workspace/srcdir/gcc-13.2.0/libstdc++-v3/libsupc++\eh_terminate.cc:48
__cxa_call_terminate at /workspace/srcdir/gcc-13.2.0/libstdc++-v3/libsupc++\eh_call.cc:54
__gxx_personality_imp at /workspace/srcdir/gcc-13.2.0/libstdc++-v3/libsupc++\eh_personality.cc:688
_GCC_specific_handler at /workspace/srcdir/gcc-13.2.0/libgcc\unwind-seh.c:300
__gxx_personality_seh0 at /workspace/srcdir/gcc-13.2.0/libstdc++-v3/libsupc++\eh_personality.cc:810
_chkstk at C:\WINDOWS\SYSTEM32\ntdll.dll (unknown line)
RtlRaiseException at C:\WINDOWS\SYSTEM32\ntdll.dll (unknown line)
RtlRaiseException at C:\WINDOWS\SYSTEM32\ntdll.dll (unknown line)
RaiseException at C:\WINDOWS\System32\KERNELBASE.dll (unknown line)
_Unwind_RaiseException at /workspace/srcdir/gcc-13.2.0/libgcc\unwind-seh.c:334
__cxa_throw at /workspace/srcdir/gcc-13.2.0/libstdc++-v3/libsupc++\eh_throw.cc:93
_ZN5jluna9safe_callIJP11_jl_value_tEEES2_S2_DpT_ at C:\Program Files (x86)\jluna\bin\libjluna.dll (unknown line)
_ZNSt15_Sp_counted_ptrIPN5jluna5Proxy10ProxyValueELN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv at C:\Program Files (x86)\jluna\bin\libjluna.dll (unknown line)
_ZN5jluna5ProxyD1Ev at C:\Program Files (x86)\jluna\bin\libjluna.dll (unknown line)
operator() at D:/Work/julia_tests/jluna_multithread\main.cpp:16
__invoke_impl<long long int, juliaTest()::<lambda(jluna::Int64)>&, long long int> at D:/Qt/Tools/mingw1120_64/lib/gcc/x86_64-w64-mingw32/11.2.0/include/c++/bits\invoke.h:61
__invoke_r<long long int, juliaTest()::<lambda(jluna::Int64)>&, long long int> at D:/Qt/Tools/mingw1120_64/lib/gcc/x86_64-w64-mingw32/11.2.0/include/c++/bits\invoke.h:114
_M_invoke at D:/Qt/Tools/mingw1120_64/lib/gcc/x86_64-w64-mingw32/11.2.0/include/c++/bits\std_function.h:291
operator() at D:/Qt/Tools/mingw1120_64/lib/gcc/x86_64-w64-mingw32/11.2.0/include/c++/bits\std_function.h:560
operator() at C:/Program Files (x86)/jluna/include/jluna/.src\multi_threading.inl:367
__invoke_impl<_jl_value_t*, jluna::ThreadPool::create<long long int, long long int>(const std::function<long long int(long long int)>&, long long int)::<lambda()>&> at D:/Qt/Tools/mingw1120_64/lib/gcc/x86_64-w64-mingw32/11.2.0/include/c++/bits\invoke.h:61
__invoke_r<_jl_value_t*, jluna::ThreadPool::create<long long int, long long int>(const std::function<long long int(long long int)>&, long long int)::<lambda()>&> at D:/Qt/Tools/mingw1120_64/lib/gcc/x86_64-w64-mingw32/11.2.0/include/c++/bits\invoke.h:114
_M_invoke at D:/Qt/Tools/mingw1120_64/lib/gcc/x86_64-w64-mingw32/11.2.0/include/c++/bits\std_function.h:291
#3 at .\none:828
unknown function (ip: 0000023e836c9dab)
jl_apply at C:/workdir/src\julia.h:1982 [inlined]
start_task at C:/workdir/src\task.c:1238
Allocations: 2909 (Pool: 2900; Big: 9); GC: 0
terminate called after throwing an instance of 'jluna::JuliaException'
  what():  [JULIA][EXCEPTION] KeyError: key 0x0000000000000102 not found
Stacktrace:
 [1] getindex
   @ .\dict.jl:498 [inlined]
 [2] free_reference(key::UInt64)
   @ Main.jluna.memory_handler .\none:615
 [3] safe_call(f::Function, args::UInt64)
   @ Main.jluna .\none:17
 [4] (::Main.jluna.cppcall.var"#3#4"{UInt64})()
   @ Main.jluna.cppcall .\none:828

k12Sergey commented Sep 1, 2024

Note that running a single task in my code works correctly.

However, running juliaTest() with a single task, as in the code below,

double juliaTest()
{
    using namespace jluna;

    std::function<Int64(Int64)> labdaCall =
    [](Int64 value){
        auto res = jluna::Main["f"](value);
        return value;
    };

    Int64 val = 5;

    auto task1 = ThreadPool::create(labdaCall, val);

    // start task
    task1.schedule();
    
    task1.join();

    return 1;
}


int main()
{
    jluna::initialize(5);
    /// declare function
    jluna::Main.safe_eval(
        R"(f(x) = x*x)");

    for(int i = 0; i != 10000; ++i) {
        std::cout << i << std::endl;
        std::jthread t1(&juliaTest);
    }

    return 0;
}

called from a std::jthread, crashes with the following error:

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x7ffa5a78373d -- ijl_excstack_state at C:/workdir/src\rtutils.c:307
in expression starting at none:0
Allocations: 2909 (Pool: 2900; Big: 9); GC: 0
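
One plausible reading of this access violation, not confirmed anywhere in this thread, is that `juliaTest()` calls into the Julia runtime from a `std::jthread` that Julia did not create and does not know about. Under that assumption, a common workaround is to funnel every Julia-facing call through one dedicated thread, which would have to be the same thread that called `jluna::initialize`. The sketch below shows only the plain C++ part of that idea; `SingleThreadExecutor`, `submit`, and `worker_id` are hypothetical names, and no jluna API is used.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical helper: runs every submitted job on one dedicated thread.
class SingleThreadExecutor
{
public:
    SingleThreadExecutor() : worker_([this] { run(); }) {}

    ~SingleThreadExecutor()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join(); // pending jobs are drained before the worker exits
    }

    // Queue `job` and return a future holding its result.
    template <typename F>
    auto submit(F job) -> std::future<decltype(job())>
    {
        using R = decltype(job());
        auto task = std::make_shared<std::packaged_task<R()>>(std::move(job));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push([task] { (*task)(); });
        }
        cv_.notify_one();
        return result;
    }

    std::thread::id worker_id() const { return worker_.get_id(); }

private:
    void run()
    {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty())
                    return; // stop requested and queue drained
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job(); // executed outside the lock
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::thread worker_; // declared last so the other members exist before it starts
};
```

In the reproduction above, this would mean submitting both the `jluna::initialize` call and each `juliaTest` call to the same executor instead of starting a fresh `std::jthread` per iteration. Whether that actually avoids the crash has not been verified here.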
