cuda runtime error (304) : OS call failed or operation not supported on this OS #31
Hi! I see a couple of other issues in your script though:
Thank you so much for your response! For the same example, after changing the precision to float32, I was only able to use up to 400 centers; anything more produced the error above. I then tried setting pin_memory to False in
I was wondering if you had any other suggestions on how to deal with this issue.
Hi again, and sorry for the slow replies. I cannot explain why changing the precision changes the behaviour so drastically. In the short term, the fix you applied (disabling memory pinning) is fine! If you hit the error again, just set pin_memory to False in the other places where it occurs. In the long term, if it turns out that certain hardware/software configurations don't support pinning more than a small amount of RAM, I can wrap the calls in a try/except so that the whole thing doesn't crash and instead falls back to unpinned memory. Thanks for the bug report :)

P.S. While using float32 might not help with the pinning issue, you should find that it improves the running time of Falkon quite a bit (once you get past the pinning problem).
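The try/except fallback described above could look roughly like the sketch below. `pin_or_fallback` is a hypothetical helper, not part of Falkon's API. It only assumes that the object exposes a `pin_memory()` method (as `torch.Tensor` does) and that a pinning failure surfaces as a `RuntimeError`, which is what the traceback in this issue shows.

```python
import warnings


def pin_or_fallback(tensor):
    """Try to page-lock (pin) a host tensor; fall back to pageable memory on failure.

    Hypothetical helper: relies only on the object having a ``pin_memory()``
    method, as ``torch.Tensor`` does.
    """
    try:
        return tensor.pin_memory()
    except RuntimeError as err:
        # PyTorch reports CUDA failures such as error 304 as RuntimeError.
        # Warn instead of crashing, and keep using the original unpinned tensor.
        warnings.warn(f"pin_memory() failed ({err}); using unpinned memory instead")
        return tensor
```

In the code path from the traceback, `ny_points = ny_points.pin_memory()` would become `ny_points = pin_or_fallback(ny_points)`. Emitting a warning rather than falling back silently keeps the slower unpinned path visible to the user.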
When running the following example with Falkon, I get a CUDA runtime error.
Example:
```python
from sklearn import datasets, model_selection
import numpy as np
import torch
import falkon
from falkon.models import Falkon
from falkon.kernels import GaussianKernel
from falkon.options import FalkonOptions

# Random data; np.random.randn returns float64 arrays, so these tensors are float64.
Xtrain = np.random.randn(80000, 1536)
Xtest = np.random.randn(10000, 1536)
Ytrain = np.random.randn(80000, 20)
Ytest = np.random.randn(10000, 20)

Xtrain = torch.from_numpy(Xtrain)
Xtest = torch.from_numpy(Xtest)
Ytrain = torch.from_numpy(Ytrain)
Ytest = torch.from_numpy(Ytest)
print("X TRAIN SHAPE: ", Xtrain.shape, Ytrain.shape, "TEST SHAPES: ", Xtest.shape, Ytest.shape)

kernel = GaussianKernel(sigma=5)
# Use every training point as a Nystrom center (M = 80000).
flk = Falkon(kernel=kernel, penalty=1e-5, M=Xtrain.shape[0])
flk.fit(Xtrain, Ytrain)
```
Error:
```
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1616554827596/work/aten/src/THC/THCCachingHostAllocator.cpp line=278 error=304 : OS call failed or operation not supported on this OS
Traceback (most recent call last):
  File "falkon_test.py", line 26, in <module>
    flk.fit(Xtrain, Ytrain)
  File "/home/nehap/anaconda3/envs/falkon/lib/python3.7/site-packages/falkon/models/falkon.py", line 197, in fit
    ny_points = ny_points.pin_memory()
RuntimeError: cuda runtime error (304) : OS call failed or operation not supported on this OS at /opt/conda/conda-bld/pytorch_1616554827596/work/aten/src/THC/THCCachingHostAllocator.cpp:278
```
Here is my .yml file:
```yaml
name: falkon
channels:
dependencies:
prefix: /home/nehap/anaconda3/envs/falkon
```
I'm currently using a single TITAN RTX GPU with 24 GB of memory, and the machine has 128 GB of RAM. The example works if we reduce the number of dimensions from 1536 to 20, but with larger datasets it seems to run into this issue. We would appreciate any help with this. Thank you!
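For a rough sense of scale, the snippet below estimates the size of the host buffer that fails to pin. It assumes the pinned tensor is the full M × d matrix of Nyström centers (the traceback points at `ny_points.pin_memory()`, and the script sets `M = Xtrain.shape[0]`, i.e. 80000); Falkon may pin other buffers as well, so treat these figures as lower bounds. `pinned_bytes` is a hypothetical helper for illustration only.

```python
def pinned_bytes(m, d, itemsize):
    """Size in bytes of a dense m x d matrix whose elements take `itemsize` bytes."""
    return m * d * itemsize


M = 80_000  # number of Nystrom centers in the script (M = Xtrain.shape[0])

# Compare the two dimensionalities mentioned in the issue, in both precisions.
for d in (20, 1536):
    for name, itemsize in (("float64", 8), ("float32", 4)):
        mb = pinned_bytes(M, d, itemsize) / 1e6
        print(f"d={d:5d} {name}: {mb:8.1f} MB")
```

Under that assumption, the centers matrix is about 12.8 MB for d = 20 but about 983 MB for d = 1536 in float64, and switching to float32 halves each figure.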