This repository has been archived by the owner on Mar 20, 2023. It is now read-only.

Extend POINTER transfer to any RANGE variable in a NRN_THREAD #772

Merged
merged 18 commits into from
Mar 10, 2022

Conversation

nrnhines
Collaborator

@nrnhines nrnhines commented Feb 7, 2022

Prior to this, POINTER was restricted to point to voltage.
This change depends on neuronsimulator/nrn#1622
Requires bbcore_write_version 1.5
The added test on the NEURON side requires merge of #748

Many CI tests fail because file mode test data has not been updated to bbcore_write_version 1.5. I need help or instructions on how to update that test data. Edit: was updated.

See neuronsimulator/nrn for test.

CI_BRANCHES:NEURON_BRANCH=hines/POINTER-to-RANGE,

@nrnhines
Collaborator Author

nrnhines commented Feb 8, 2022

The nrn/test/coreneuron/mod/axial.inc file which is used in axial.mod and AxialPP.mod suffers from a GPU or loop vectorized race condition at the statement

       pim = pim - ia : child contributions

where pim is a POINTER to im in some other instance of axial or AxialPP (located in the parent compartment).
I would normally write

PROTECT pim = pim - ia : child contributions

which surrounds the statement with MUTEXLOCK and MUTEXUNLOCK but that is only for pthreads. Another case is #768.
Some help with how NMODL and/or mod2c can make such statements atomic is solicited. Presently, when using a compiler where _PRAGMA_FOR_VECTOR_LOOP_ actually takes effect, I have to comment out that pragma manually but that is not good at all for the GPU.
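
To make the hazard concrete, here is a minimal pure-Python sketch (not CoreNEURON or generated mod code) that emulates a vectorized loop in which every lane loads pim before any lane stores it. The function names and the tiny parent-index layout are assumptions made up for illustration; the only thing taken from this thread is the statement pim = pim - ia with several children pointing at the same parent im.

```python
def child_contributions_serial(im, parent, ia):
    """Apply `pim = pim - ia` one instance at a time (the intended result)."""
    out = list(im)
    for p, i in zip(parent, ia):
        out[p] -= i
    return out


def child_contributions_lockstep(im, parent, ia):
    """Emulate SIMD/GPU lanes: all loads happen before any store."""
    out = list(im)
    loaded = [out[p] for p in parent]              # every lane reads the old value
    results = [v - i for v, i in zip(loaded, ia)]  # every lane subtracts its own ia
    for p, r in zip(parent, results):              # stores collide on shared parents
        out[p] = r
    return out


# Hypothetical layout: two children (instances 0 and 1) share parent slot 0.
im = [0.0, 0.0]
parent = [0, 0]
ia = [1.0, 2.0]

print(child_contributions_serial(im, parent, ia))    # [-3.0, 0.0]
print(child_contributions_lockstep(im, parent, ia))  # [-2.0, 0.0]
```

The lockstep version drops the first child's contribution, which is the kind of lost update an atomic (or PROTECT-like) construct would have to prevent.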

@codecov-commenter

codecov-commenter commented Feb 26, 2022

Codecov Report

Merging #772 (f8ffdca) into master (9b6d29c) will decrease coverage by 0.56%.
The diff coverage is 16.88%.

❗ Current head f8ffdca differs from pull request most recent head 15a3d1e. Consider uploading reports for the commit 15a3d1e to get more accurate results

@@            Coverage Diff             @@
##           master     #772      +/-   ##
==========================================
- Coverage   56.01%   55.45%   -0.57%     
==========================================
  Files         108      108              
  Lines        9005     9107     +102     
==========================================
+ Hits         5044     5050       +6     
- Misses       3961     4057      +96     
Impacted Files                                         Coverage Δ
coreneuron/io/nrn_checkpoint.cpp                       4.21% <0.00%> (-0.17%) ⬇️
coreneuron/io/phase2.hpp                               66.66% <ø> (ø)
...eneuron/io/reports/report_configuration_parser.cpp  0.87% <0.00%> (-0.01%) ⬇️
coreneuron/io/reports/sonata_report_handler.cpp        25.00% <ø> (ø)
coreneuron/io/reports/sonata_report_handler.hpp        0.00% <0.00%> (ø)
coreneuron/utils/nrnoc_aux.cpp                         26.56% <ø> (ø)
coreneuron/permute/node_permute.cpp                    62.69% <6.89%> (-21.23%) ⬇️
coreneuron/io/phase2.cpp                               64.91% <31.42%> (-1.50%) ⬇️
coreneuron/apps/main1.cpp                              46.74% <37.50%> (-0.57%) ⬇️
coreneuron/io/output_spikes.cpp                        88.97% <100.00%> (ø)
... and 6 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9b6d29c...15a3d1e.

@nrnhines nrnhines marked this pull request as draft February 26, 2022 19:12
@olupton
Contributor

olupton commented Feb 28, 2022

I believe merging master into this branch would fix the GitLab CI, and #784 will fix the test-as-submodule one. (where by "fix" I mean fix the current failures for technical reasons)

@olupton
Contributor

olupton commented Feb 28, 2022

I merged master into this branch after I merged #784.

@olupton
Contributor

olupton commented Mar 2, 2022

Sorry, this needs master merged into it again. Had to iron out some more wrinkles in the GitLab CI.

@nrnhines
Collaborator Author

nrnhines commented Mar 5, 2022

As mentioned in neuronsimulator/nrn#1622 (comment) I'd like to explicitly support the handling of multiple BEFORE SETUP blocks in a single mod file. Although not really relevant to the POINTER topic of this pull request, the new checkpoint test on the NEURON side is the easiest way to test multiple BEFORE SETUP support and that support is likely to require some minor code changes on this CoreNEURON side as well. So unless we can get this PR merged to the master without too much delay, I can make the changes here ...

@pramodk
Collaborator

pramodk commented Mar 7, 2022

@nrnhines : I am going to review and fix the failing tests under GitLab today. Maybe it would be better to start a new branch from this branch?

@pramodk
Collaborator

pramodk commented Mar 7, 2022

@nrnhines : GitLab CI is failing with the following error. Is this expected? Otherwise I will take a look:

Log from https://bbpgitlab.epfl.ch/hpc/coreneuron/-/jobs/178292:

============================= test session starts ==============================
platform linux -- Python 3.9.7, pytest-6.2.4, py-1.9.0, pluggy-0.13.0
rootdir: /gpfs/bbp.cscs.ch/ssd/gitlab_map_jobs/bbpcihpcproj12/P41709/J178281/spack-build/spack-stage-neuron-develop-47kwkdqbfahzhd6mgo3d7lvidjenlcwo/spack-build-47kwkdq/test/coreneuron_modtests/test_pointer_py_cpu
plugins: cov-2.8.1
collected 2 items
test/coreneuron/test_pointer.py numprocs=1
[REPORTS] [info] :: Initializing PARALLEL implementation...
[REPORTS] [info] :: Initializing PARALLEL implementation...
.[REPORTS] [info] :: Initializing PARALLEL implementation...
[REPORTS] [info] :: Initializing PARALLEL implementation...
[REPORTS] [info] :: Initializing PARALLEL implementation...
[REPORTS] [info] :: Initializing PARALLEL implementation...
--------------------
rm -r -f coredat
cell_permute  0
--------------------
x86_64/special-core -d coredat --voltage 1000 --verbose 0 --cell-permute 0 --tstop 10 -o coredat
F
=================================== FAILURES ===================================
_______________________________ test_checkpoint ________________________________
    def test_checkpoint():
        if pc.nhost() > 1:
            return

        # clear out the old
        srun("rm -r -f coredat")

        m = Model(5, 5)
        # file mode CoreNEURON real cells need gids
        for i, cell in enumerate(m.cells):
            pc.set_gid2node(i, pc.id())
            sec = cell.secs[0]
            pc.cell(i, h.NetCon(sec(0.5)._ref_v, None, sec=sec))

        # "integrate" fabs(ia) in each axial_pp and use that as a source of spikes
        # for actually testing that the checkpoint is working with POINTER.
        # Would be much better if I knew how to get file mode coreneuron to
        # print trajectories.
        for i, p in enumerate(h.List("AxialPP")):
            pc.set_gid2node(i + 100, pc.id())
            pc.cell(i + 100, h.NetCon(p, None))

        spktime = h.Vector()
        spkgid = h.Vector()
        pc.spike_record(-1, spktime, spkgid)
        cvode.cache_efficient(1)
        pc.set_maxstep(10)
        h.finitialize(-65)
        pc.nrncore_write("coredat")

        # do a NEURON run to record spikes
        def run(tstop):
            pc.set_maxstep(10)
            h.finitialize(-65)
            pc.psolve(tstop)

        run(10)
        spikes_std = sortspikes(spktime, spkgid)

        # Does it work in direct mode?
        from neuron import coreneuron

        coreneuron.enable = True
        for perm in [0, 1]:
            coreneuron.cell_permute = perm
            run(5)
            pc.psolve(10)
            spikes = sortspikes(spktime, spkgid)
            assert spikes_std == spikes
        coreneuron.enable = False

        # standard to compare with checkpoint series
        tpnts = [5.0, 10.0]
        for perm in [0, 1]:
            print("\n\ncell_permute ", perm)
            common = "-d coredat --voltage 1000 --verbose 0 --cell-permute %d" % (perm,)
            # standard full run
>           runcn(common + " --tstop %g" % float(tpnts[-1]) + " -o coredat")
test/coreneuron/test_pointer.py:273: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
test/coreneuron/test_pointer.py:144: in runcn
    srun(cmd)
test/coreneuron/test_pointer.py:136: in srun
    subprocess.run(cmd, shell=True).check_returncode()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
self = CompletedProcess(args='x86_64/special-core -d coredat --voltage 1000 --verbose 0 --cell-permute 0 --tstop 10 -o coredat', returncode=-11)
    def check_returncode(self):
        """Raise CalledProcessError if the exit code is non-zero."""
        if self.returncode:
>           raise CalledProcessError(self.returncode, self.args, self.stdout,
                                     self.stderr)
E           subprocess.CalledProcessError: Command 'x86_64/special-core -d coredat --voltage 1000 --verbose 0 --cell-permute 0 --tstop 10 -o coredat' died with <Signals.SIGSEGV: 11>.
/gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_externals/install_gcc-11.2.0-skylake/python-3.9.7-yj5alh/lib/python3.9/subprocess.py:460: CalledProcessError
----------- coverage: platform linux, python 3.9.7-final-0 -----------
Coverage XML written to file coverage.xml
=========================== short test summary info ============================
FAILED test/coreneuron/test_pointer.py::test_checkpoint - subprocess.CalledPr...
==================== 1 failed, 1 passed in 63.51s (0:01:03) ====================

@nrnhines
Collaborator Author

nrnhines commented Mar 7, 2022

That is not supposed to fail. It seems that stderr and stdout need to be printed.
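
A minimal sketch of what that could look like, assuming the test helper is free to capture output. run_and_report and describe_returncode are hypothetical names, not the helpers in test_pointer.py; only subprocess.run, CalledProcessError and signal.Signals are standard-library calls.

```python
import signal
import subprocess
import sys


def describe_returncode(rc):
    """Turn a subprocess return code into a readable string."""
    if rc < 0:
        # On POSIX a negative return code means "terminated by signal -rc".
        try:
            return "killed by " + signal.Signals(-rc).name
        except ValueError:
            return "killed by signal %d" % -rc
    return "exit code %d" % rc


def run_and_report(args):
    """Run a command and print its stdout/stderr before failing."""
    proc = subprocess.run(args, capture_output=True, text=True)
    if proc.returncode != 0:
        print("command:", args)
        print("status :", describe_returncode(proc.returncode))
        print("stdout :", proc.stdout)
        print("stderr :", proc.stderr)
        proc.check_returncode()  # raises CalledProcessError
    return proc


if __name__ == "__main__":
    # Hypothetical failing child process, used only to exercise the helper.
    child = "import sys; sys.stderr.write('boom'); sys.exit(3)"
    try:
        run_and_report([sys.executable, "-c", child])
    except subprocess.CalledProcessError as err:
        print("raised with return code", err.returncode)
```

A crash by signal can leave both streams empty, in which case the signal name is all this adds.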

@pramodk
Collaborator

pramodk commented Mar 7, 2022

Ok thanks. On my local machine I am also not able to reproduce. I will check the CI build tomorrow morning.

@pramodk pramodk closed this Mar 8, 2022
@pramodk pramodk reopened this Mar 8, 2022
@nrnhines
Collaborator Author

nrnhines commented Mar 8, 2022

@pramodk My attempt to get more information with neuronsimulator/nrn@f7be90f wasn't fruitful as there was no stderr/stdout text. So all we know is that there is a segfault.

@pramodk
Collaborator

pramodk commented Mar 8, 2022

Sorry for the delay @nrnhines! I didn't get time earlier today to look into this.

I didn't debug thoroughly, but I was at least able to quickly reproduce the issue using the binaries and datasets created in CI. It seems to be related to the reportinglib library linked into CoreNEURON.

Here is what I did:

# copy failed test directory
cp -r /gpfs/bbp.cscs.ch/ssd/gitlab_map_jobs/bbpcihpcproj12/P41880/J179059/spack-build/spack-stage-neuron-develop-47kwkdqbfahzhd6mgo3d7lvidjenlcwo/spack-build-47kwkdq/test/coreneuron_modtests/test_pointer_py_cpu .
cd test_pointer_py_cpu/

# for testing, allocate some cpus
$ salloc -A proj16 -N 1 --constraint=cpu -n 2  -p prod

# see segfault
kumbhar@r1i7n20:~/tmp/test_pointer_py_cpu$ ./x86_64/special-core -d coredat/

 Duke, Yale, and the BlueBrain Project -- Copyright 1984-2020
 Version : 1.0 c4f1e5bc (2022-03-07 16:58:53 +0100)

 Additional mechanisms from files
 axial.mod axial_pp.mod bacur.mod banocur.mod exp2syn.mod expsyn.mod fornetcon.mod hh.mod invlfire.mod natrans.mod netmove.mod netstim.mod passive.mod pattern.mod sample.mod stim.mod svclmp.mod unitstest.mod watchrange.mod

 Memory (MBs) :             After mk_mech : Max 10.0000, Min 10.0000, Avg 10.0000
 Memory (MBs) :            After MPI_Init : Max 10.0000, Min 10.0000, Avg 10.0000
 Memory (MBs) :          Before nrn_setup : Max 10.1172, Min 10.1172, Avg 10.1172
 Setup Done   : 0.00 seconds
 Model size   : 35.18 kB
 Memory (MBs) :          After nrn_setup  : Max 10.4375, Min 10.4375, Avg 10.4375
GENERAL PARAMETERS
--mpi=false
--mpi-lib=
--gpu=false
--dt=0.025
--tstop=100

GPU
--nwarp=65536
--cell-permute=0
--cuda-interface=false

INPUT PARAMETERS
--voltage=-65
--seed=-1
--datpath=coredat/
--filesdat=files.dat
--pattern=
--report-conf=
--restore=

PARALLEL COMPUTATION PARAMETERS
--threading=false
--skip_mpi_finalize=false

SPIKE EXCHANGE
--ms_phases=2
--ms_subintervals=2
--multisend=false
--spk_compress=0
--binqueue=false

CONFIGURATION
--spikebuf=100000
--prcellgid=-1
--forwardskip=0
--celsius=6.3
--mindelay=10
--report-buffer-size=4

OUTPUT PARAMETERS
--dt_io=0.1
--outpath=.
--checkpoint=

 Start time (t) = 0

 Memory (MBs) :  After mk_spikevec_buffer : Max 10.4375, Min 10.4375, Avg 10.4375
 Memory (MBs) :     After nrn_finitialize : Max 10.4375, Min 10.4375, Avg 10.4375
Segmentation fault

# gdb says it's report related 
$ gdb --args ./x86_64/special-core -d coredat/
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /gpfs/bbp.cscs.ch/ssd/gitlab_map_jobs/bbpcihpcproj12/P41880/J179059/spack-build/spack-stage-neuron-develop-47kwkdqbfahzhd6mgo3d7lvidjenlcwo/spack-build-47kwkdq/test/nrnivmodl/f37a8662f1006c013843754879ab3cc44ed227d607809d6e2bc1806460d64447/x86_64/special-core...done.
(gdb) r
Starting program: /gpfs/bbp.cscs.ch/home/kumbhar/tmp/test_pointer_py_cpu/./x86_64/special-core -d coredat/
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Dwarf Error: wrong version in compilation unit header (is 5, should be 2, 3, or 4) [in module /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_applications/install_intel-2021.4.0-skylake/libsonata-report-1.1-nfrzrl/lib/libsonatareport.so]
Dwarf Error: wrong version in compilation unit header (is 5, should be 2, 3, or 4) [in module /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_compilers/install_gcc-4.8.5-haswell/gcc-11.2.0-suikmu/lib64/libstdc++.so.6]
Dwarf Error: wrong version in compilation unit header (is 5, should be 2, 3, or 4) [in module /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_compilers/install_gcc-4.8.5-haswell/gcc-11.2.0-suikmu/lib64/libgcc_s.so.1]

 Duke, Yale, and the BlueBrain Project -- Copyright 1984-2020
 Version : 1.0 c4f1e5bc (2022-03-07 16:58:53 +0100)

...
 Memory (MBs) :     After nrn_finitialize : Max 10.4453, Min 10.4453, Avg 10.4453

Program received signal SIGSEGV, Segmentation fault.
MPI_SGI_comm_rank (comm=1140850688) at ../../../../include/comm.h:216
216	../../../../include/comm.h: No such file or directory.
Missing separate debuginfos, use: debuginfo-install glibc-2.17-325.el7_9.x86_64
(gdb) bt
#0  MPI_SGI_comm_rank (comm=1140850688) at ../../../../include/comm.h:216
#1  PMPI_Comm_rank (comm=1140850688, rank=0x7fffffff4240) at comm_rank.c:93
#2  0x00007fffeda707ec in AllReports::makeGlobalCommunicator (this=0x7fffedb05908 <_rtld_local+2312>) at /nvme/bbpcihpcdeploy/160693/spack-stage/spack-stage-reportinglib-2.5.6-gdhqypawxbwwjg2iq3g6gd6r6q3civat/spack-src/reportinglib/AllReports.cpp:474
#3  0x00007fffed7591c6 in coreneuron::setup_report_engine (dt_report=2147483647, mindelay=10) at ../spack-src/coreneuron/io/reports/nrnreport.cpp:57
#4  0x00007fffed6d0078 in run_solve_core (argc=3, argv=0x7fffffff45f8) at ../spack-src/coreneuron/apps/main1.cpp:609
#5  0x00007fffedadf702 in solve_core (argc=3, argv=0x7fffffff45f8) at ../../../../../../../software/install_intel-2021.4.0-skylake/coreneuron-develop-hsyhen/share/coreneuron/enginemech.cpp:49
#6  0x0000000000403293 in main (argc=3, argv=0x7fffffff45f8) at /gpfs/bbp.cscs.ch/ssd/gitlab_map_jobs/bbpcihpcproj12/P41880/software/install_intel-2021.4.0-skylake/coreneuron-develop-hsyhen/share/coreneuron/coreneuron.cpp:14
(gdb) quit

# enabling MPI doesn't solve the issue completely

$ srun -n 1 ./x86_64/special-core -d coredat/ --mpi
 num_mpi=1


 Duke, Yale, and the BlueBrain Project -- Copyright 1984-2020
 Version : 1.0 c4f1e5bc (2022-03-07 16:58:53 +0100)

 Additional mechanisms from files
 axial.mod axial_pp.mod bacur.mod banocur.mod exp2syn.mod expsyn.mod fornetcon.mod hh.mod invlfire.mod natrans.mod netmove.mod netstim.mod passive.mod pattern.mod sample.mod stim.mod svclmp.mod unitstest.mod watchrange.mod

 Memory (MBs) :             After mk_mech : Max 12.5664, Min 12.5664, Avg 12.5664
 Memory (MBs) :            After MPI_Init : Max 12.5664, Min 12.5664, Avg 12.5664
 Memory (MBs) :          Before nrn_setup : Max 12.8008, Min 12.8008, Avg 12.8008
 Setup Done   : 0.00 seconds
 Model size   : 35.18 kB
 Memory (MBs) :          After nrn_setup  : Max 13.1133, Min 13.1133, Avg 13.1133
GENERAL PARAMETERS
--mpi=true
--mpi-lib=
--gpu=false
--dt=0.025
--tstop=100

GPU
--nwarp=65536
--cell-permute=0
--cuda-interface=false

INPUT PARAMETERS
--voltage=-65
--seed=-1
--datpath=coredat/
--filesdat=files.dat
--pattern=
--report-conf=
--restore=

PARALLEL COMPUTATION PARAMETERS
--threading=false
--skip_mpi_finalize=false

SPIKE EXCHANGE
--ms_phases=2
--ms_subintervals=2
--multisend=false
--spk_compress=0
--binqueue=false

CONFIGURATION
--spikebuf=100000
--prcellgid=-1
--forwardskip=0
--celsius=6.3
--mindelay=10
--report-buffer-size=4

OUTPUT PARAMETERS
--dt_io=0.1
--outpath=.
--checkpoint=

 Start time (t) = 0

 Memory (MBs) :  After mk_spikevec_buffer : Max 13.1133, Min 13.1133, Avg 13.1133
 Memory (MBs) :     After nrn_finitialize : Max 13.1133, Min 13.1133, Avg 13.1133
[REPORTS] [info] :: Initializing PARALLEL implementation...

psolve |=========================================================| t: 100.00 ETA: 0h00m01s

Solver Time : 0.127889


 Simulation Statistics
 Number of cells: 5
 Number of compartments: 163
 Number of presyns: 46
 Number of input presyns: 0
 Number of synapses: 0
 Number of point processes: 46
 Number of transfer sources: 0
 Number of transfer targets: 0
 Number of spikes: 330
 Number of spikes with non negative gid-s: 330
terminate called after throwing an instance of 'std::runtime_error'
  what():  Error: node_id is 0 and input data is reported as 1-based
MPT ERROR: Rank 0(g:0) received signal SIGABRT/SIGIOT(6).
	Process ID: 259074, Host: r1i7n20, Program: /gpfs/bbp.cscs.ch/ssd/gitlab_map_jobs/bbpcihpcproj12/P41880/J179059/spack-build/spack-stage-neuron-develop-47kwkdqbfahzhd6mgo3d7lvidjenlcwo/spack-build-47kwkdq/test/nrnivmodl/f37a8662f1006c013843754879ab3cc44ed227d607809d6e2bc1806460d64447/x86_64/special-core
	MPT Version: HPE HMPT 2.25  10/22/21 03:18:39

MPT: --------stack traceback-------
MPT: Attaching to program: /proc/259074/exe, process 259074
MPT: [Thread debugging using libthread_db enabled]
MPT: Using host libthread_db library "/lib64/libthread_db.so.1".
MPT: (no debugging symbols found)...done.
MPT: (no debugging symbols found)...done.
MPT: (no debugging symbols found)...done.
MPT: (no debugging symbols found)...done.
MPT: (no debugging symbols found)...done.
MPT: (no debugging symbols found)...done.
MPT: (no debugging symbols found)...done.
MPT: 0x00002aaaab17b1d9 in waitpid () from /lib64/libpthread.so.0
MPT: Missing separate debuginfos, use: debuginfo-install glibc-2.17-325.el7_9.x86_64 libibverbs-54mlnx1-1.54103.x86_64 libnl3-3.2.28-4.el7.x86_64
MPT: (gdb) #0  0x00002aaaab17b1d9 in waitpid () from /lib64/libpthread.so.0
MPT: #1  0x00002aaaab4be566 in mpi_sgi_system (
MPT: #2  MPI_SGI_stacktraceback (
MPT:     header=header@entry=0x7fffffff2ad0 "MPT ERROR: Rank 0(g:0) received signal SIGABRT/SIGIOT(6).\n\tProcess ID: 259074, Host: r1i7n20, Program: /gpfs/bbp.cscs.ch/ssd/gitlab_map_jobs/bbpcihpcproj12/P41880/J179059/spack-build/spack-stage-neuro"...) at sig.c:340
MPT: #3  0x00002aaaab4be75f in first_arriver_handler (signo=signo@entry=6,
MPT:     stack_trace_sem=stack_trace_sem@entry=0x2aaab09a0080) at sig.c:489
MPT: #4  0x00002aaaab4bea33 in slave_sig_handler (signo=6, siginfo=<optimized out>,
MPT:     extra=<optimized out>) at sig.c:565
MPT: #5  <signal handler called>
MPT: #6  0x00002aaaabd05387 in raise () from /lib64/libc.so.6
MPT: #7  0x00002aaaabd06a78 in abort () from /lib64/libc.so.6
MPT: #8  0x00002aaaab85e88a in __gnu_cxx::__verbose_terminate_handler() [clone .cold] ()
MPT:    from /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_compilers/install_gcc-4.8.5-haswell/gcc-11.2.0-suikmu/lib64/libstdc++.so.6
MPT: #9  0x00002aaaab86a2fa in __cxxabiv1::__terminate(void (*)()) ()
MPT:    from /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_compilers/install_gcc-4.8.5-haswell/gcc-11.2.0-suikmu/lib64/libstdc++.so.6
MPT: #10 0x00002aaaab86a365 in std::terminate() ()
MPT:    from /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_compilers/install_gcc-4.8.5-haswell/gcc-11.2.0-suikmu/lib64/libstdc++.so.6
MPT: #11 0x00002aaaab86a5f9 in __cxa_throw ()
MPT:    from /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_compilers/install_gcc-4.8.5-haswell/gcc-11.2.0-suikmu/lib64/libstdc++.so.6
MPT: #12 0x00002aaaaac096f1 in bbp::sonata::SonataData::convert_gids_to_sonata(std::vector<unsigned long, std::allocator<unsigned long> >&, unsigned long) ()
MPT:    from /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_applications/install_intel-2021.4.0-skylake/libsonata-report-1.1-nfrzrl/lib/libsonatareport.so
MPT: #13 0x00002aaaaac09ecb in bbp::sonata::SonataData::write_spikes_header(bbp::sonata::Population&) ()
MPT:    from /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_applications/install_intel-2021.4.0-skylake/libsonata-report-1.1-nfrzrl/lib/libsonatareport.so
MPT: #14 0x00002aaaaac0973a in bbp::sonata::SonataData::write_spike_populations() ()
MPT:    from /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_applications/install_intel-2021.4.0-skylake/libsonata-report-1.1-nfrzrl/lib/libsonatareport.so
MPT: #15 0x00002aaaaafb43b1 in _INTERNALb49cba43::coreneuron::output_spikes_parallel
MPT:     (outpath=0x7fffffff43d0 ".", filename=0x2aaaab073968 "out",
MPT:     population_name_offset=std::vector of length 0, capacity 0)
MPT:     at ../spack-src/coreneuron/io/output_spikes.cpp:216
MPT: #16 0x00002aaaaafb4b1f in coreneuron::output_spikes (
MPT:     outpath=0x7fffffff43d0 ".",
MPT:     population_name_offset=std::vector of length 0, capacity 0)
MPT:     at ../spack-src/coreneuron/io/output_spikes.cpp:292
MPT: #17 0x00002aaaaaf59237 in run_solve_core (argc=4, argv=0x7fffffff4608)
MPT:     at ../spack-src/coreneuron/apps/main1.cpp:648
MPT: #18 0x00002aaaaaadc702 in solve_core (argc=4, argv=0x7fffffff4608)
MPT:     at ../../../../../../../software/install_intel-2021.4.0-skylake/coreneuron-develop-hsyhen/share/coreneuron/enginemech.cpp:49
MPT: #19 0x0000000000403293 in main (argc=4, argv=0x7fffffff4608)
MPT:     at /gpfs/bbp.cscs.ch/ssd/gitlab_map_jobs/bbpcihpcproj12/P41880/software/install_intel-2021.4.0-skylake/coreneuron-develop-hsyhen/share/coreneuron/coreneuron.cpp:14
MPT: (gdb) A debugging session is active.
MPT: 

We now know where to look, so tomorrow we should be able to track this down on our side. cc: @jorblancoa @olupton

@nrnhines
Collaborator Author

nrnhines commented Mar 9, 2022

Dwarf Error: wrong version in compilation unit header (is 5, should be 2, 3, or 4)

Doesn't seem like a segfault. However maybe "5" is numerologically significant as this PR bumps the write version from 1.4 to 1.5 :)

@pramodk
Collaborator

pramodk commented Mar 9, 2022

Dwarf Error: wrong version in compilation unit header (is 5, should be 2, 3, or 4)

I think this warning/error is not relevant. GDB is complaining because its version is too old to support the DWARF version used in the binary. Loading a newer GDB module removes this message.

@olupton
Contributor

olupton commented Mar 9, 2022

I think this is related to neuronsimulator/nrn#1619

@pramodk pramodk requested a review from iomaganaris March 9, 2022 11:39
@nrnhines
Collaborator Author

hooray!

Collaborator

@pramodk pramodk left a comment

LGTM. I don't have major comments, but reading the permutation-related code reminds me how critical it is to simplify the implementation by using the same data structures in NEURON and CoreNEURON. CoreNEURON should handle only the compute aspects!

After 8.1 release, we should definitely revive our summer discussions and continue on major refactoring aspects that we were discussing (including C++ migration PRs like neuronsimulator/nrn/pull/1597).

Contributor

@iomaganaris iomaganaris left a comment

LGTM
Just a single suggestion

coreneuron/permute/node_permute.cpp (outdated, resolved)
@pramodk pramodk merged commit f72026d into master Mar 10, 2022
@pramodk pramodk deleted the hines/POINTER-to-RANGE branch March 10, 2022 16:40
pramodk pushed a commit to neuronsimulator/nrn that referenced this pull request Nov 2, 2022
…ain/CoreNeuron#772)

* Extend POINTER from voltage to any RANGE variable.
* trajectory recording is after AFTER_SOLVE
  and for consistency with NEURON, after BEFORE_STEP as well.
* update coreneuron ringtest integration data to version 1.5
* Handle the checkpoint for the POINTER
* Initialize reporting interface only if there are reports

CoreNEURON Repo SHA: BlueBrain/CoreNeuron@f72026d
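
The last bullet appears to address the crash seen in the gdb backtrace earlier in this thread, where setup_report_engine reached MPI_Comm_rank in a run with --mpi=false and an empty --report-conf. A rough sketch of the shape of such a guard, written in Python with hypothetical names (ReportEngine, maybe_setup_reports); the actual change lives in CoreNEURON's C++ sources and may be structured differently:

```python
class ReportEngine:
    """Hypothetical stand-in for a reporting backend with costly setup."""

    def __init__(self):
        self.initialized = False

    def setup(self):
        # In the real library this is where communicators etc. would be created.
        self.initialized = True


def maybe_setup_reports(engine, report_configs):
    """Initialize the engine only if at least one report is configured."""
    if not report_configs:
        return False  # nothing to report: leave the backend untouched
    engine.setup()
    return True


engine = ReportEngine()
print(maybe_setup_reports(engine, []), engine.initialized)               # False False
print(maybe_setup_reports(engine, ["soma.report"]), engine.initialized)  # True True
```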
6 participants