Extend POINTER transfer to any RANGE variable in a NRN_THREAD #772

nrnhines · 2022-02-07T23:10:19Z

Prior to this, POINTER was restricted to point to voltage.
This change depends on neuronsimulator/nrn#1622
Requires bbcore_write_version 1.5
The added test on the NEURON side requires merge of #748

Many CI tests fail because file mode test data has not been updated to bbcore_write_version 1.5. I need help or instructions on how to update that test data. Edit: was updated.

See neuronsimulator/nrn for test.

CI_BRANCHES:NEURON_BRANCH=hines/POINTER-to-RANGE,

bbpbuildbot · 2022-02-08T00:05:03Z

nrnhines · 2022-02-08T11:18:00Z

The nrn/test/coreneuron/mod/axial.inc file, which is used in axial.mod and AxialPP.mod, suffers from a race condition on the GPU or in vectorized loops at the statement

       pim = pim - ia : child contributions

where pim is a POINTER to the im of some other instance of axial or AxialPP (located in the parent compartment).
I would normally write

PROTECT pim = pim - ia : child contributions

which surrounds the statement with MUTEXLOCK and MUTEXUNLOCK, but that only applies to pthreads. Another case is #768.
I would appreciate help with how NMODL and/or mod2c can make such statements atomic. Presently, when using a compiler where _PRAGMA_FOR_VECTOR_LOOP_ actually takes effect, I have to comment out that pragma manually, but that is not good at all for the GPU.
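
To illustrate why this statement races when the loop over instances is vectorized or run on a GPU, here is a minimal pure-Python sketch. It is not NMODL, mod2c, or CoreNEURON code; the names `im`, `parent`, `ia`, `vectorized_update`, and `serialized_update` are assumptions made for illustration only. Several child instances can point to the same parent `im`, so a lane-wise read/compute/write drops all but one contribution, whereas a serialized (atomic-style) update keeps them all.

```python
# Hypothetical stand-in for the axial.inc update: instance i executes
#     im[parent[i]] = im[parent[i]] - ia[i]
# Here both instances share parent compartment 0, which is where the race occurs.

def vectorized_update(im, parent, ia):
    """Mimic SIMD/GPU lanes: every lane reads, then every lane writes."""
    out = list(im)
    gathered = [im[p] for p in parent]               # all lanes read the old value
    results = [g - a for g, a in zip(gathered, ia)]  # lanes compute independently
    for p, r in zip(parent, results):                # scatter: the last writer wins
        out[p] = r
    return out


def serialized_update(im, parent, ia):
    """Mimic an atomic update: one read-modify-write at a time."""
    out = list(im)
    for p, a in zip(parent, ia):
        out[p] -= a
    return out


im = [10.0, 0.0, 0.0]   # im of the parent compartment and of its two children
parent = [0, 0]         # both child instances point at compartment 0
ia = [1.0, 2.0]         # axial current contributed by each child

print(vectorized_update(im, parent, ia))   # [8.0, 0.0, 0.0]: one contribution is lost
print(serialized_update(im, parent, ia))   # [7.0, 0.0, 0.0]: both contributions applied
```

In generated C or C++ code, the serialized behaviour would more likely come from an atomic update directive (for example in OpenMP or OpenACC) than from a lock; which mechanism NMODL or mod2c should emit is the open question raised here.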

bbpbuildbot · 2022-02-11T09:45:23Z

bbpbuildbot · 2022-02-25T21:16:59Z

codecov-commenter · 2022-02-26T19:12:06Z

Codecov Report

Merging #772 (f8ffdca) into master (9b6d29c) will decrease coverage by 0.56%.
The diff coverage is 16.88%.

❗ Current head f8ffdca differs from pull request most recent head 15a3d1e. Consider uploading reports for the commit 15a3d1e to get more accurate results

@@            Coverage Diff             @@
##           master     #772      +/-   ##
==========================================
- Coverage   56.01%   55.45%   -0.57%     
==========================================
  Files         108      108              
  Lines        9005     9107     +102     
==========================================
+ Hits         5044     5050       +6     
- Misses       3961     4057      +96

| Impacted Files | Coverage Δ | |
|---|---|---|
| coreneuron/io/nrn_checkpoint.cpp | `4.21% <0.00%> (-0.17%)` | ⬇️ |
| coreneuron/io/phase2.hpp | `66.66% <ø> (ø)` | |
| ...eneuron/io/reports/report_configuration_parser.cpp | `0.87% <0.00%> (-0.01%)` | ⬇️ |
| coreneuron/io/reports/sonata_report_handler.cpp | `25.00% <ø> (ø)` | |
| coreneuron/io/reports/sonata_report_handler.hpp | `0.00% <0.00%> (ø)` | |
| coreneuron/utils/nrnoc_aux.cpp | `26.56% <ø> (ø)` | |
| coreneuron/permute/node_permute.cpp | `62.69% <6.89%> (-21.23%)` | ⬇️ |
| coreneuron/io/phase2.cpp | `64.91% <31.42%> (-1.50%)` | ⬇️ |
| coreneuron/apps/main1.cpp | `46.74% <37.50%> (-0.57%)` | ⬇️ |
| coreneuron/io/output_spikes.cpp | `88.97% <100.00%> (ø)` | |
| ... and 6 more | | |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9b6d29c...15a3d1e.

bbpbuildbot · 2022-02-26T19:32:22Z

bbpbuildbot · 2022-02-27T17:48:19Z

olupton · 2022-02-28T08:39:35Z

I believe merging master into this branch would fix the GitLab CI, and #784 will fix the test-as-submodule one. (where by "fix" I mean fix the current failures for technical reasons)

olupton · 2022-02-28T11:21:56Z

I merged master into this branch after I merged #784.

bbpbuildbot · 2022-02-28T12:28:04Z

bbpbuildbot · 2022-03-02T14:45:23Z

Logfiles from GitLab pipeline #41076 (:no_entry:) have been uploaded here!

olupton · 2022-03-02T14:46:40Z

Sorry, this needs master merged into it again. Had to iron out some more wrinkles in the GitLab CI.

nrnhines · 2022-03-05T12:27:18Z

As mentioned in neuronsimulator/nrn#1622 (comment) I'd like to explicitly support the handling of multiple BEFORE SETUP blocks in a single mod file. Although not really relevant to the POINTER topic of this pull request, the new checkpoint test on the NEURON side is the easiest way to test multiple BEFORE SETUP support and that support is likely to require some minor code changes on this CoreNEURON side as well. So unless we can get this PR merged to the master without too much delay, I can make the changes here ...

pramodk · 2022-03-07T13:37:11Z

@nrnhines: I am going to review and fix the failing tests under GitLab today. Maybe it would be better to start a new branch from this branch?

bbpbuildbot · 2022-03-07T17:03:30Z

Logfiles from GitLab pipeline #41709 (:no_entry:) have been uploaded here!

pramodk · 2022-03-07T18:50:39Z

@nrnhines: GitLab CI is failing with the following error. Is this expected? Otherwise I will take a look:

============================= test session starts ==============================
platform linux -- Python 3.9.7, pytest-6.2.4, py-1.9.0, pluggy-0.13.0
rootdir: /gpfs/bbp.cscs.ch/ssd/gitlab_map_jobs/bbpcihpcproj12/P41709/J178281/spack-build/spack-stage-neuron-develop-47kwkdqbfahzhd6mgo3d7lvidjenlcwo/spack-build-47kwkdq/test/coreneuron_modtests/test_pointer_py_cpu
plugins: cov-2.8.1
collected 2 items
test/coreneuron/test_pointer.py numprocs=1
[REPORTS] [info] :: Initializing PARALLEL implementation...
[REPORTS] [info] :: Initializing PARALLEL implementation...
.[REPORTS] [info] :: Initializing PARALLEL implementation...
[REPORTS] [info] :: Initializing PARALLEL implementation...
[REPORTS] [info] :: Initializing PARALLEL implementation...
[REPORTS] [info] :: Initializing PARALLEL implementation...
--------------------
rm -r -f coredat
cell_permute  0
--------------------
x86_64/special-core -d coredat --voltage 1000 --verbose 0 --cell-permute 0 --tstop 10 -o coredat
F
=================================== FAILURES ===================================
_______________________________ test_checkpoint ________________________________
    def test_checkpoint():
        if pc.nhost() > 1:
            return

        # clear out the old
        srun("rm -r -f coredat")

        m = Model(5, 5)
        # file mode CoreNEURON real cells need gids
        for i, cell in enumerate(m.cells):
            pc.set_gid2node(i, pc.id())
            sec = cell.secs[0]
            pc.cell(i, h.NetCon(sec(0.5)._ref_v, None, sec=sec))

        # "integrate" fabs(ia) in each axial_pp and use that as a source of spikes
        # for actually testing that the checkpoint is working with POINTER.
        # Would be much better if I knew how to get file mode coreneuron to
        # print trajectories.
        for i, p in enumerate(h.List("AxialPP")):
            pc.set_gid2node(i + 100, pc.id())
            pc.cell(i + 100, h.NetCon(p, None))

        spktime = h.Vector()
        spkgid = h.Vector()
        pc.spike_record(-1, spktime, spkgid)
        cvode.cache_efficient(1)
        pc.set_maxstep(10)
        h.finitialize(-65)
        pc.nrncore_write("coredat")

        # do a NEURON run to record spikes
        def run(tstop):
            pc.set_maxstep(10)
            h.finitialize(-65)
            pc.psolve(tstop)

        run(10)
        spikes_std = sortspikes(spktime, spkgid)

        # Does it work in direct mode?
        from neuron import coreneuron

        coreneuron.enable = True
        for perm in [0, 1]:
            coreneuron.cell_permute = perm
            run(5)
            pc.psolve(10)
            spikes = sortspikes(spktime, spkgid)
            assert spikes_std == spikes
        coreneuron.enable = False

        # standard to compare with checkpoint series
        tpnts = [5.0, 10.0]
        for perm in [0, 1]:
            print("\n\ncell_permute ", perm)
            common = "-d coredat --voltage 1000 --verbose 0 --cell-permute %d" % (perm,)
            # standard full run
>           runcn(common + " --tstop %g" % float(tpnts[-1]) + " -o coredat")
test/coreneuron/test_pointer.py:273: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
test/coreneuron/test_pointer.py:144: in runcn
    srun(cmd)
test/coreneuron/test_pointer.py:136: in srun
    subprocess.run(cmd, shell=True).check_returncode()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
self = CompletedProcess(args='x86_64/special-core -d coredat --voltage 1000 --verbose 0 --cell-permute 0 --tstop 10 -o coredat', returncode=-11)
    def check_returncode(self):
        """Raise CalledProcessError if the exit code is non-zero."""
        if self.returncode:
>           raise CalledProcessError(self.returncode, self.args, self.stdout,
                                     self.stderr)
E           subprocess.CalledProcessError: Command 'x86_64/special-core -d coredat --voltage 1000 --verbose 0 --cell-permute 0 --tstop 10 -o coredat' died with <Signals.SIGSEGV: 11>.
/gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_externals/install_gcc-11.2.0-skylake/python-3.9.7-yj5alh/lib/python3.9/subprocess.py:460: CalledProcessError
----------- coverage: platform linux, python 3.9.7-final-0 -----------
Coverage XML written to file coverage.xml
=========================== short test summary info ============================
FAILED test/coreneuron/test_pointer.py::test_checkpoint - subprocess.CalledPr...
==================== 1 failed, 1 passed in 63.51s (0:01:03) ====================

nrnhines · 2022-03-07T19:40:33Z

That is not supposed to fail. It seems that stderr and stdout need to be printed.
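
A minimal sketch of how a test helper could surface that output, assuming the `srun` helper seen in the traceback simply wraps `subprocess.run`. The name `srun_verbose` is hypothetical and not part of the NEURON test suite; only standard-library calls are used. It captures stdout and stderr, prints them on failure, and translates a negative return code (such as the -11 above) into a signal name.

```python
import signal
import subprocess
import sys


def srun_verbose(cmd):
    """Run a shell command; on failure, print its output before raising."""
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    if result.returncode != 0:
        print("command failed:", cmd, file=sys.stderr)
        print("--- stdout ---", file=sys.stderr)
        print(result.stdout, file=sys.stderr)
        print("--- stderr ---", file=sys.stderr)
        print(result.stderr, file=sys.stderr)
        if result.returncode < 0:
            # On POSIX a negative return code means the child died from a signal.
            try:
                name = signal.Signals(-result.returncode).name
            except ValueError:
                name = "unknown signal"
            print("terminated by", name, file=sys.stderr)
    # Raises subprocess.CalledProcessError for any non-zero return code.
    result.check_returncode()
    return result
```

A process killed by SIGSEGV may produce no output at all, in which case the signal name is the only information such a helper can add.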

pramodk · 2022-03-07T20:07:56Z

OK, thanks. I am also unable to reproduce this on my local machine. I will check the CI build tomorrow morning.

bbpbuildbot · 2022-03-08T17:21:50Z

Logfiles from GitLab pipeline #41880 (:no_entry:) have been uploaded here!

nrnhines · 2022-03-08T22:34:25Z

@pramodk My attempt to get more information with neuronsimulator/nrn@f7be90f wasn't fruitful as there was no stderr/stdout text. So all we know is that there is a segfault.

pramodk · 2022-03-08T23:34:28Z

Sorry for the delay, @nrnhines! I didn't get time earlier today to look into this.

I didn't debug thoroughly, but I was quickly able to reproduce the issue using the binaries and datasets created in CI. It seems to be related to our reportinglib library linked to CoreNEURON.

Here is what I did:

# copy failed test directory
cp -r /gpfs/bbp.cscs.ch/ssd/gitlab_map_jobs/bbpcihpcproj12/P41880/J179059/spack-build/spack-stage-neuron-develop-47kwkdqbfahzhd6mgo3d7lvidjenlcwo/spack-build-47kwkdq/test/coreneuron_modtests/test_pointer_py_cpu .
cd test_pointer_py_cpu/

# for testing, allocate some cpus
$ salloc -A proj16 -N 1 --constraint=cpu -n 2  -p prod

# see segfault
kumbhar@r1i7n20:~/tmp/test_pointer_py_cpu$ ./x86_64/special-core -d coredat/

 Duke, Yale, and the BlueBrain Project -- Copyright 1984-2020
 Version : 1.0 c4f1e5bc (2022-03-07 16:58:53 +0100)

 Additional mechanisms from files
 axial.mod axial_pp.mod bacur.mod banocur.mod exp2syn.mod expsyn.mod fornetcon.mod hh.mod invlfire.mod natrans.mod netmove.mod netstim.mod passive.mod pattern.mod sample.mod stim.mod svclmp.mod unitstest.mod watchrange.mod

 Memory (MBs) :             After mk_mech : Max 10.0000, Min 10.0000, Avg 10.0000
 Memory (MBs) :            After MPI_Init : Max 10.0000, Min 10.0000, Avg 10.0000
 Memory (MBs) :          Before nrn_setup : Max 10.1172, Min 10.1172, Avg 10.1172
 Setup Done   : 0.00 seconds
 Model size   : 35.18 kB
 Memory (MBs) :          After nrn_setup  : Max 10.4375, Min 10.4375, Avg 10.4375
GENERAL PARAMETERS
--mpi=false
--mpi-lib=
--gpu=false
--dt=0.025
--tstop=100

GPU
--nwarp=65536
--cell-permute=0
--cuda-interface=false

INPUT PARAMETERS
--voltage=-65
--seed=-1
--datpath=coredat/
--filesdat=files.dat
--pattern=
--report-conf=
--restore=

PARALLEL COMPUTATION PARAMETERS
--threading=false
--skip_mpi_finalize=false

SPIKE EXCHANGE
--ms_phases=2
--ms_subintervals=2
--multisend=false
--spk_compress=0
--binqueue=false

CONFIGURATION
--spikebuf=100000
--prcellgid=-1
--forwardskip=0
--celsius=6.3
--mindelay=10
--report-buffer-size=4

OUTPUT PARAMETERS
--dt_io=0.1
--outpath=.
--checkpoint=

 Start time (t) = 0

 Memory (MBs) :  After mk_spikevec_buffer : Max 10.4375, Min 10.4375, Avg 10.4375
 Memory (MBs) :     After nrn_finitialize : Max 10.4375, Min 10.4375, Avg 10.4375
Segmentation fault

# gdb says it's report related 
$ gdb --args ./x86_64/special-core -d coredat/
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /gpfs/bbp.cscs.ch/ssd/gitlab_map_jobs/bbpcihpcproj12/P41880/J179059/spack-build/spack-stage-neuron-develop-47kwkdqbfahzhd6mgo3d7lvidjenlcwo/spack-build-47kwkdq/test/nrnivmodl/f37a8662f1006c013843754879ab3cc44ed227d607809d6e2bc1806460d64447/x86_64/special-core...done.
(gdb) r
Starting program: /gpfs/bbp.cscs.ch/home/kumbhar/tmp/test_pointer_py_cpu/./x86_64/special-core -d coredat/
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Dwarf Error: wrong version in compilation unit header (is 5, should be 2, 3, or 4) [in module /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_applications/install_intel-2021.4.0-skylake/libsonata-report-1.1-nfrzrl/lib/libsonatareport.so]
Dwarf Error: wrong version in compilation unit header (is 5, should be 2, 3, or 4) [in module /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_compilers/install_gcc-4.8.5-haswell/gcc-11.2.0-suikmu/lib64/libstdc++.so.6]
Dwarf Error: wrong version in compilation unit header (is 5, should be 2, 3, or 4) [in module /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_compilers/install_gcc-4.8.5-haswell/gcc-11.2.0-suikmu/lib64/libgcc_s.so.1]

 Duke, Yale, and the BlueBrain Project -- Copyright 1984-2020
 Version : 1.0 c4f1e5bc (2022-03-07 16:58:53 +0100)

...
 Memory (MBs) :     After nrn_finitialize : Max 10.4453, Min 10.4453, Avg 10.4453

Program received signal SIGSEGV, Segmentation fault.
MPI_SGI_comm_rank (comm=1140850688) at ../../../../include/comm.h:216
216	../../../../include/comm.h: No such file or directory.
Missing separate debuginfos, use: debuginfo-install glibc-2.17-325.el7_9.x86_64
(gdb) bt
#0  MPI_SGI_comm_rank (comm=1140850688) at ../../../../include/comm.h:216
#1  PMPI_Comm_rank (comm=1140850688, rank=0x7fffffff4240) at comm_rank.c:93
#2  0x00007fffeda707ec in AllReports::makeGlobalCommunicator (this=0x7fffedb05908 <_rtld_local+2312>) at /nvme/bbpcihpcdeploy/160693/spack-stage/spack-stage-reportinglib-2.5.6-gdhqypawxbwwjg2iq3g6gd6r6q3civat/spack-src/reportinglib/AllReports.cpp:474
#3  0x00007fffed7591c6 in coreneuron::setup_report_engine (dt_report=2147483647, mindelay=10) at ../spack-src/coreneuron/io/reports/nrnreport.cpp:57
#4  0x00007fffed6d0078 in run_solve_core (argc=3, argv=0x7fffffff45f8) at ../spack-src/coreneuron/apps/main1.cpp:609
#5  0x00007fffedadf702 in solve_core (argc=3, argv=0x7fffffff45f8) at ../../../../../../../software/install_intel-2021.4.0-skylake/coreneuron-develop-hsyhen/share/coreneuron/enginemech.cpp:49
#6  0x0000000000403293 in main (argc=3, argv=0x7fffffff45f8) at /gpfs/bbp.cscs.ch/ssd/gitlab_map_jobs/bbpcihpcproj12/P41880/software/install_intel-2021.4.0-skylake/coreneuron-develop-hsyhen/share/coreneuron/coreneuron.cpp:14
(gdb) quit

# enabling MPI doesn't solve the issue completely

$ srun -n 1 ./x86_64/special-core -d coredat/ --mpi
 num_mpi=1


 Duke, Yale, and the BlueBrain Project -- Copyright 1984-2020
 Version : 1.0 c4f1e5bc (2022-03-07 16:58:53 +0100)

 Additional mechanisms from files
 axial.mod axial_pp.mod bacur.mod banocur.mod exp2syn.mod expsyn.mod fornetcon.mod hh.mod invlfire.mod natrans.mod netmove.mod netstim.mod passive.mod pattern.mod sample.mod stim.mod svclmp.mod unitstest.mod watchrange.mod

 Memory (MBs) :             After mk_mech : Max 12.5664, Min 12.5664, Avg 12.5664
 Memory (MBs) :            After MPI_Init : Max 12.5664, Min 12.5664, Avg 12.5664
 Memory (MBs) :          Before nrn_setup : Max 12.8008, Min 12.8008, Avg 12.8008
 Setup Done   : 0.00 seconds
 Model size   : 35.18 kB
 Memory (MBs) :          After nrn_setup  : Max 13.1133, Min 13.1133, Avg 13.1133
GENERAL PARAMETERS
--mpi=true
--mpi-lib=
--gpu=false
--dt=0.025
--tstop=100

GPU
--nwarp=65536
--cell-permute=0
--cuda-interface=false

INPUT PARAMETERS
--voltage=-65
--seed=-1
--datpath=coredat/
--filesdat=files.dat
--pattern=
--report-conf=
--restore=

PARALLEL COMPUTATION PARAMETERS
--threading=false
--skip_mpi_finalize=false

SPIKE EXCHANGE
--ms_phases=2
--ms_subintervals=2
--multisend=false
--spk_compress=0
--binqueue=false

CONFIGURATION
--spikebuf=100000
--prcellgid=-1
--forwardskip=0
--celsius=6.3
--mindelay=10
--report-buffer-size=4

OUTPUT PARAMETERS
--dt_io=0.1
--outpath=.
--checkpoint=

 Start time (t) = 0

 Memory (MBs) :  After mk_spikevec_buffer : Max 13.1133, Min 13.1133, Avg 13.1133
 Memory (MBs) :     After nrn_finitialize : Max 13.1133, Min 13.1133, Avg 13.1133
[REPORTS] [info] :: Initializing PARALLEL implementation...

psolve |=========================================================| t: 100.00 ETA: 0h00m01s

Solver Time : 0.127889


 Simulation Statistics
 Number of cells: 5
 Number of compartments: 163
 Number of presyns: 46
 Number of input presyns: 0
 Number of synapses: 0
 Number of point processes: 46
 Number of transfer sources: 0
 Number of transfer targets: 0
 Number of spikes: 330
 Number of spikes with non negative gid-s: 330
terminate called after throwing an instance of 'std::runtime_error'
  what():  Error: node_id is 0 and input data is reported as 1-based
MPT ERROR: Rank 0(g:0) received signal SIGABRT/SIGIOT(6).
	Process ID: 259074, Host: r1i7n20, Program: /gpfs/bbp.cscs.ch/ssd/gitlab_map_jobs/bbpcihpcproj12/P41880/J179059/spack-build/spack-stage-neuron-develop-47kwkdqbfahzhd6mgo3d7lvidjenlcwo/spack-build-47kwkdq/test/nrnivmodl/f37a8662f1006c013843754879ab3cc44ed227d607809d6e2bc1806460d64447/x86_64/special-core
	MPT Version: HPE HMPT 2.25  10/22/21 03:18:39

MPT: --------stack traceback-------
MPT: Attaching to program: /proc/259074/exe, process 259074
MPT: [Thread debugging using libthread_db enabled]
MPT: Using host libthread_db library "/lib64/libthread_db.so.1".
MPT: (no debugging symbols found)...done.
MPT: (no debugging symbols found)...done.
MPT: (no debugging symbols found)...done.
MPT: (no debugging symbols found)...done.
MPT: (no debugging symbols found)...done.
MPT: (no debugging symbols found)...done.
MPT: (no debugging symbols found)...done.
MPT: 0x00002aaaab17b1d9 in waitpid () from /lib64/libpthread.so.0
MPT: Missing separate debuginfos, use: debuginfo-install glibc-2.17-325.el7_9.x86_64 libibverbs-54mlnx1-1.54103.x86_64 libnl3-3.2.28-4.el7.x86_64
MPT: (gdb) #0  0x00002aaaab17b1d9 in waitpid () from /lib64/libpthread.so.0
MPT: #1  0x00002aaaab4be566 in mpi_sgi_system (
MPT: #2  MPI_SGI_stacktraceback (
MPT:     header=header@entry=0x7fffffff2ad0 "MPT ERROR: Rank 0(g:0) received signal SIGABRT/SIGIOT(6).\n\tProcess ID: 259074, Host: r1i7n20, Program: /gpfs/bbp.cscs.ch/ssd/gitlab_map_jobs/bbpcihpcproj12/P41880/J179059/spack-build/spack-stage-neuro"...) at sig.c:340
MPT: #3  0x00002aaaab4be75f in first_arriver_handler (signo=signo@entry=6,
MPT:     stack_trace_sem=stack_trace_sem@entry=0x2aaab09a0080) at sig.c:489
MPT: #4  0x00002aaaab4bea33 in slave_sig_handler (signo=6, siginfo=<optimized out>,
MPT:     extra=<optimized out>) at sig.c:565
MPT: #5  <signal handler called>
MPT: #6  0x00002aaaabd05387 in raise () from /lib64/libc.so.6
MPT: #7  0x00002aaaabd06a78 in abort () from /lib64/libc.so.6
MPT: #8  0x00002aaaab85e88a in __gnu_cxx::__verbose_terminate_handler() [clone .cold] ()
MPT:    from /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_compilers/install_gcc-4.8.5-haswell/gcc-11.2.0-suikmu/lib64/libstdc++.so.6
MPT: #9  0x00002aaaab86a2fa in __cxxabiv1::__terminate(void (*)()) ()
MPT:    from /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_compilers/install_gcc-4.8.5-haswell/gcc-11.2.0-suikmu/lib64/libstdc++.so.6
MPT: #10 0x00002aaaab86a365 in std::terminate() ()
MPT:    from /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_compilers/install_gcc-4.8.5-haswell/gcc-11.2.0-suikmu/lib64/libstdc++.so.6
MPT: #11 0x00002aaaab86a5f9 in __cxa_throw ()
MPT:    from /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_compilers/install_gcc-4.8.5-haswell/gcc-11.2.0-suikmu/lib64/libstdc++.so.6
MPT: #12 0x00002aaaaac096f1 in bbp::sonata::SonataData::convert_gids_to_sonata(std::vector<unsigned long, std::allocator<unsigned long> >&, unsigned long) ()
MPT:    from /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_applications/install_intel-2021.4.0-skylake/libsonata-report-1.1-nfrzrl/lib/libsonatareport.so
MPT: #13 0x00002aaaaac09ecb in bbp::sonata::SonataData::write_spikes_header(bbp::sonata::Population&) ()
MPT:    from /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_applications/install_intel-2021.4.0-skylake/libsonata-report-1.1-nfrzrl/lib/libsonatareport.so
MPT: #14 0x00002aaaaac0973a in bbp::sonata::SonataData::write_spike_populations() ()
MPT:    from /gpfs/bbp.cscs.ch/ssd/apps/bsd/2022-01-10/stage_applications/install_intel-2021.4.0-skylake/libsonata-report-1.1-nfrzrl/lib/libsonatareport.so
MPT: #15 0x00002aaaaafb43b1 in _INTERNALb49cba43::coreneuron::output_spikes_parallel
MPT:     (outpath=0x7fffffff43d0 ".", filename=0x2aaaab073968 "out",
MPT:     population_name_offset=std::vector of length 0, capacity 0)
MPT:     at ../spack-src/coreneuron/io/output_spikes.cpp:216
MPT: #16 0x00002aaaaafb4b1f in coreneuron::output_spikes (
MPT:     outpath=0x7fffffff43d0 ".",
MPT:     population_name_offset=std::vector of length 0, capacity 0)
MPT:     at ../spack-src/coreneuron/io/output_spikes.cpp:292
MPT: #17 0x00002aaaaaf59237 in run_solve_core (argc=4, argv=0x7fffffff4608)
MPT:     at ../spack-src/coreneuron/apps/main1.cpp:648
MPT: #18 0x00002aaaaaadc702 in solve_core (argc=4, argv=0x7fffffff4608)
MPT:     at ../../../../../../../software/install_intel-2021.4.0-skylake/coreneuron-develop-hsyhen/share/coreneuron/enginemech.cpp:49
MPT: #19 0x0000000000403293 in main (argc=4, argv=0x7fffffff4608)
MPT:     at /gpfs/bbp.cscs.ch/ssd/gitlab_map_jobs/bbpcihpcproj12/P41880/software/install_intel-2021.4.0-skylake/coreneuron-develop-hsyhen/share/coreneuron/coreneuron.cpp:14
MPT: (gdb) A debugging session is active.
MPT:

We now know where to look, so tomorrow we should be able to track this down on our side. cc: @jorblancoa @olupton

nrnhines · 2022-03-09T01:32:22Z

Dwarf Error: wrong version in compilation unit header (is 5, should be 2, 3, or 4)

Doesn't seem like a segfault. However maybe "5" is numerologically significant as this PR bumps the write version from 1.4 to 1.5 :)

pramodk · 2022-03-09T06:24:40Z

Dwarf Error: wrong version in compilation unit header (is 5, should be 2, 3, or 4)

I think this warning/error is not relevant: GDB is complaining because its version is too old to support the DWARF version used in the binary. Loading a newer GDB module removes the message.

olupton · 2022-03-09T09:22:10Z

I think this is related to neuronsimulator/nrn#1619

bbpbuildbot · 2022-03-09T12:41:35Z

Logfiles from GitLab pipeline #42032 (:no_entry:) have been uploaded here!

bbpbuildbot · 2022-03-09T13:48:22Z

Logfiles from GitLab pipeline #42060 (:no_entry:) have been uploaded here!

bbpbuildbot · 2022-03-10T10:04:39Z

Logfiles from GitLab pipeline #42175 (:white_check_mark:) have been uploaded here!

nrnhines · 2022-03-10T10:43:53Z

hooray!

pramodk

LGTM. I don't have major comments, but reading the code related to permutation reminds me how critical it is to simplify the implementation by using the same data structures in NEURON and CoreNEURON. CoreNEURON should just handle the compute aspects...!

After the 8.1 release, we should definitely revive our summer discussions and continue with the major refactoring we were discussing (including C++ migration PRs like neuronsimulator/nrn/pull/1597).

iomaganaris

LGTM
Just a single suggestion

coreneuron/permute/node_permute.cpp

bbpbuildbot · 2022-03-10T13:09:36Z

Logfiles from GitLab pipeline #42207 (:no_entry:) have been uploaded here!

…ain/CoreNeuron#772)
* Extend POINTER from voltage to any RANGE variable.
* trajectory recording is after AFTER_SOLVE and for consistency with NEURON, after BEFORE_STEP as well.
* update coreneuron ringtest integration data to version 1.5
* Handle the checkpoint for the POINTER
* Initialize reporting interface only if there are reports

CoreNEURON Repo SHA: BlueBrain/CoreNeuron@f72026d

nrnhines added 4 commits January 23, 2022 08:38

Extend POINTER from voltage to any RANGE variable.

a19de3c

trajectory recording is after AFTER_SOLVE and for consistency with NEURON, after BEFORE_STEP as well.

27341aa

Merge branch 'master' into hines/POINTER-to-RANGE

8f7df39

POINTER transfer works for coreneuron.cell_permute = 1

4ebe11a

nrnhines requested review from alkino, pramodk and alexsavulescu February 7, 2022 23:10

Merge remote-tracking branch 'origin/master' into hines/POINTER-to-RANGE

18836ad

nrnhines mentioned this pull request Feb 22, 2022

Make extracellular mechanism available #782

Open

update coreneuron ringtest integration data to version 1.5

692390d

temporary (awaiting pointer2type for checkpoint and test)

ddc9698

nrnhines marked this pull request as draft February 26, 2022 19:12

forgot clang-format

ce6140f

nrnhines mentioned this pull request Feb 26, 2022

Extend CoreNEURON POINTER transfer to any RANGE variable in a NRN_THREAD neuronsimulator/nrn#1622

Merged

1 task

handle pointer2type for checkpoint

b789864

olupton mentioned this pull request Feb 28, 2022

Fix test-as-submodule CI. #784

Merged

Merge remote-tracking branch 'origin/master' into hines/POINTER-to-RANGE

6107393

tmls mechanism type not specified in file mode.

3109899

Add missing files for reporting

c4f1e5b

pramodk closed this Mar 8, 2022

pramodk reopened this Mar 8, 2022

Initialize reporting interface if there are reports

99c0b20

pramodk requested a review from iomaganaris March 9, 2022 11:39

make clang-format happy

15a3d1e

Merge branch 'master' into hines/POINTER-to-RANGE

aab0aba

pramodk approved these changes Mar 10, 2022

iomaganaris approved these changes Mar 10, 2022

coreneuron/permute/node_permute.cpp (outdated, resolved)

respond to review comment

18093c4

pramodk merged commit f72026d into master Mar 10, 2022

pramodk deleted the hines/POINTER-to-RANGE branch March 10, 2022 16:40
