Hi,
I tried to run FALKON with 3 GPUs but I got the following error:
```
Traceback (most recent call last):
  File "/home/"user"/.conda/envs/flk4/lib/python3.10/site-packages/falkon/utils/threading.py", line 15, in run
    self.ret = self._target(*self._args, **self._kwargs)
  File "/home/"user"//.conda/envs/flk4/lib/python3.10/site-packages/falkon/mmv_ops/fmmv.py", line 138, in mmv_run_starter
    return mmv_run_thread(X1, X2, v, out, kernel, blk_n, blk_m, mem_needed, dev, tid=proc_idx)
  File "/home/"user"//.conda/envs/flk4/lib/python3.10/site-packages/falkon/mmv_ops/fmmv.py", line 251, in mmv_run_thread
    flat_gpu = torch.empty(size=(mem_needed,), dtype=m1.dtype, device=dev)
RuntimeError: CUDA out of memory. Tried to allocate 21.00 GiB (GPU 0; 31.75 GiB total capacity; 5.57 GiB already allocated; 20.88 GiB free; 9.56 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/"user"/.conda/envs/flk4/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/"user"/.conda/envs/flk4/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/"user"/research/knotty/run/main.py", line 38, in <module>
    alpha, acc_valid_ep3, nystrom_samples, knots_x, acc_ep2_test = run(**args, wandb_run=wandb_run)
  File "/home/"user"/research/knotty/run/run.py", line 225, in run
    Falkon_loss, accu_falkon = falkon_run(dataset, kernel_fn, options, p=num_knots, epochs=20,
  File "/home/"user"/research/knotty/run/run.py", line 34, in falkon_run
    flk.fit(x_train, y_train)
  File "/home/"user"/.conda/envs/flk4/lib/python3.10/site-packages/falkon/models/falkon.py", line 264, in fit
    beta = optim.solve(
  File "/home/"user"/.conda/envs/flk4/lib/python3.10/site-packages/falkon/optim/conjgrad.py", line 310, in solve
    B = self.kernel.mmv(M, X, y_over_n, opt=self.params)
  File "/home/"user"/.conda/envs/flk4/lib/python3.10/site-packages/falkon/kernels/kernel.py", line 266, in mmv
    return mmv_impl(X1, X2, v, self, out, params)
  File "/home/"user"/.conda/envs/flk4/lib/python3.10/site-packages/falkon/mmv_ops/fmmv.py", line 734, in fmmv
    return KernelMmvFnFull.apply(kernel, opt, out, X1, X2, v, *kernel.diff_params.values())
  File "/home/"user"/.conda/envs/flk4/lib/python3.10/site-packages/falkon/mmv_ops/fmmv.py", line 695, in forward
    KernelMmvFnFull.run_cpu_gpu(X1, X2, v, out, kernel, opt, False)
  File "/home/"user"/.conda/envs/flk4/lib/python3.10/site-packages/falkon/mmv_ops/fmmv.py", line 641, in run_cpu_gpu
    outputs = _start_wait_processes(mmv_run_starter, args)
  File "/home/"user"/conda/envs/flk4/lib/python3.10/site-packages/falkon/mmv_ops/utils.py", line 59, in _start_wait_processes
    outputs.append(p.join())
  File "/home/"user"/.conda/envs/flk4/lib/python3.10/site-packages/falkon/utils/threading.py", line 22, in join
    raise RuntimeError('Exception in thread %s' % (self.name)) from self.exc
RuntimeError: Exception in thread GPU-0
```
It works fine with 1 or 2 GPUs. I was also wondering whether using 3 or more GPUs can make FALKON even faster?
Thank you for your help.
Hi @ahabedsoltan! Unfortunately the code that splits a dataset into blocks can produce odd behavior, and this sometimes depends on factors such as the number of GPUs.
I just introduced an option, `memory_slack`, that changes the heuristic used to split the data into blocks. By default it is set to 0.9, which means the split size is calculated assuming 90% of the available GPU RAM is usable.
You could try reducing it to e.g. 0.7, and the out-of-memory errors should go away.
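To illustrate how such a slack factor interacts with block sizing, here is a minimal, self-contained sketch. It is not Falkon's actual implementation: the function name `block_rows` and the exact formula are hypothetical, and the only assumption taken from the reply above is that `memory_slack` scales the free GPU memory considered usable before the block size is computed.

```python
def block_rows(free_bytes: int, bytes_per_row: int, memory_slack: float = 0.9) -> int:
    """Hypothetical sketch of a block-split heuristic.

    Only a `memory_slack` fraction of the free GPU memory is treated as
    usable; the block size (in rows) is derived from that budget.
    """
    usable = int(free_bytes * memory_slack)
    return max(1, usable // bytes_per_row)


# With ~20.88 GiB free (as in the traceback above), lowering memory_slack
# from 0.9 to 0.7 shrinks each block's allocation request.
free = int(20.88 * 2**30)
per_row = 8 * 10_000  # hypothetical: float64 rows with 10k columns

print(block_rows(free, per_row, memory_slack=0.9))
print(block_rows(free, per_row, memory_slack=0.7))
```

In Falkon itself the knob would presumably be passed through the options object (e.g. something like `FalkonOptions(memory_slack=0.7)`), but check the library's documentation for the exact parameter location, since the option was only just introduced.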