Cholesky decomposition fails #38

Open
mccajm opened this issue Jul 21, 2017 · 1 comment
mccajm commented Jul 21, 2017

I receive the following error when running an optimisation over 2 dimensions, using a GPR model with an RBF ARD kernel and a Latin hypercube design of size 10. I assume this is because the matrix can't be decomposed? Can it be fixed by changing the design or adding priors?

Thanks

2017-07-20 01:50:18.494935: W tensorflow/core/framework/op_kernel.cc:1158] Internal: cuSolverDN call failed with status =7
Traceback (most recent call last):
File "/home/adathy/miniconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1139, in _do_call
return fn(*args)
File "/home/adathy/miniconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1121, in _run_fn
status, run_metadata)
File "/home/adathy/miniconda3/lib/python3.6/contextlib.py", line 89, in __exit__
next(self.gen)
File "/home/adathy/miniconda3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: cuSolverDN call failed with status =7
[[Node: Cholesky_1 = CholeskyT=DT_DOUBLE, _device="/job:localhost/replica:0/task:0/gpu:0"]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "t1-hyperparam.py", line 103, in <module>
optimizer.optimize(run_model, n_iter=10)
File "/home/adathy/miniconda3/lib/python3.6/site-packages/GPflowOpt-pre_release-py3.6.egg/GPflowOpt/bo.py", line 131, in optimize
File "/home/adathy/miniconda3/lib/python3.6/site-packages/GPflowOpt-pre_release-py3.6.egg/GPflowOpt/optim.py", line 79, in optimize
File "/home/adathy/miniconda3/lib/python3.6/site-packages/GPflowOpt-pre_release-py3.6.egg/GPflowOpt/bo.py", line 147, in _optimize
File "/home/adathy/miniconda3/lib/python3.6/site-packages/GPflowOpt-pre_release-py3.6.egg/GPflowOpt/bo.py", line 67, in _update_model_data
File "/home/adathy/miniconda3/lib/python3.6/site-packages/GPflowOpt-pre_release-py3.6.egg/GPflowOpt/acquisition.py", line 122, in set_data
File "/home/adathy/miniconda3/lib/python3.6/site-packages/GPflowOpt-pre_release-py3.6.egg/GPflowOpt/acquisition.py", line 254, in setup
File "/home/adathy/miniconda3/lib/python3.6/site-packages/GPflow-0.3.8-py3.6.egg/GPflow/param.py", line 569, in runnable
return storage['session'].run(storage['tf_result'], feed_dict=feed_dict)
File "/home/adathy/miniconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 789, in run
run_metadata_ptr)
File "/home/adathy/miniconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 997, in _run
feed_dict_string, options, run_metadata)
File "/home/adathy/miniconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run
target_list, options, run_metadata)
File "/home/adathy/miniconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: cuSolverDN call failed with status =7
[[Node: Cholesky_1 = CholeskyT=DT_DOUBLE, _device="/job:localhost/replica:0/task:0/gpu:0"]]
Caused by op 'Cholesky_1', defined at:
File "t1-hyperparam.py", line 101, in <module>
acquisition = GPflowOpt.acquisition.ExpectedImprovement(model)
File "/home/adathy/miniconda3/lib/python3.6/site-packages/GPflowOpt-pre_release-py3.6.egg/GPflowOpt/acquisition.py", line 248, in __init__
self.setup()
File "/home/adathy/miniconda3/lib/python3.6/site-packages/GPflowOpt-pre_release-py3.6.egg/GPflowOpt/acquisition.py", line 254, in setup
samples_mean, _ = self.models[0].predict_f(feasible_samples)
File "/home/adathy/miniconda3/lib/python3.6/site-packages/GPflow-0.3.8-py3.6.egg/GPflow/param.py", line 561, in runnable
storage['tf_result'] = tf_method(instance, *storage['tf_args'])
File "/home/adathy/miniconda3/lib/python3.6/site-packages/GPflow-0.3.8-py3.6.egg/GPflow/model.py", line 373, in predict_f
return self.build_predict(Xnew)
File "/home/adathy/miniconda3/lib/python3.6/site-packages/GPflowOpt-pre_release-py3.6.egg/GPflowOpt/scaling.py", line 210, in build_predict
return self.output_transform.build_backward(f), self.output_transform.build_backward_variance(var)
File "/home/adathy/miniconda3/lib/python3.6/site-packages/GPflowOpt-pre_release-py3.6.egg/GPflowOpt/transforms.py", line 112, in build_backward
L = tf.cholesky(tf.transpose(self.A))
File "/home/adathy/miniconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_linalg_ops.py", line 227, in cholesky
result = _op_def_lib.apply_op("Cholesky", input=input, name=name)
File "/home/adathy/miniconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/home/adathy/miniconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/adathy/miniconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
self._traceback = _extract_stack()

InternalError (see above for traceback): cuSolverDN call failed with status =7
[[Node: Cholesky_1 = CholeskyT=DT_DOUBLE, _device="/job:localhost/replica:0/task:0/gpu:0"]]

@javdrher
Member

This issue is indeed caused by a Cholesky decomposition failure, which can have several causes.
Does this happen immediately after the initial 10 points, or have you already run some iterations of BayesianOptimizer? In the former case, first try to model the points with the GPflow model itself, then tune the initial hyperparameters or add a prior. In the latter case, check the data just before the crash. Do you have duplicate points? If not, try to model the data again and tune the initial hyperparameters/priors.
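
To make the duplicate-point check concrete, here is a minimal NumPy sketch. It is not GPflow or GPflowOpt code: the helper names (`rbf_ard_kernel`, `find_duplicate_rows`, `cholesky_ok`), the sample data, and the jitter value `1e-6` are all assumptions chosen for illustration. It shows how a repeated input row makes an RBF ARD kernel matrix singular, and how a small diagonal jitter restores a valid factorisation.

```python
import numpy as np

def rbf_ard_kernel(X, lengthscales, variance=1.0):
    """RBF kernel with one lengthscale per input dimension (ARD)."""
    Z = X / lengthscales
    sq_norms = np.sum(Z ** 2, axis=1)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * Z @ Z.T
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0))

def find_duplicate_rows(X, decimals=8):
    """Indices of rows that repeat an earlier row (after rounding)."""
    _, first_idx = np.unique(np.round(X, decimals), axis=0, return_index=True)
    return np.setdiff1d(np.arange(len(X)), first_idx)

def cholesky_ok(K):
    """True if K admits a Cholesky factorisation, False otherwise."""
    try:
        np.linalg.cholesky(K)
        return True
    except np.linalg.LinAlgError:
        return False

# Hypothetical 2-D data in which the last point repeats the second one.
X = np.array([[0.1, 0.2], [0.5, 0.5], [0.9, 0.3], [0.5, 0.5]])
K = rbf_ard_kernel(X, lengthscales=np.array([1.0, 1.0]))

print("duplicate rows:", find_duplicate_rows(X))
print("smallest eigenvalue of K:", np.linalg.eigvalsh(K).min())
print("Cholesky of K succeeds:", cholesky_ok(K))
print("with jitter:", cholesky_ok(K + 1e-6 * np.eye(len(X))))
```

With a repeated row the smallest eigenvalue of `K` is zero up to rounding, so whether the unmodified factorisation succeeds can depend on floating-point details. Adding jitter (or, in a GPR model, keeping the likelihood noise variance from collapsing towards zero) moves the matrix away from singularity.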

I have also opened a PR (#40) which will make saving data in case of a crash easier. Just resolving some compatibility issues now.
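
In the meantime, a generic way to keep the evaluated points when a run crashes is to wrap the failing call yourself. The sketch below is only an assumption about how one might do this by hand; it does not reflect what the PR implements, and `run_with_data_dump`, `step`, and the default file name are hypothetical.

```python
import numpy as np

def run_with_data_dump(step, X, Y, path="bo_crash_dump.npz"):
    """Call step(X, Y); if it raises, save X and Y to `path` and re-raise.

    `step` is a hypothetical callable standing in for whatever part of the
    optimisation loop may fail (e.g. a model update on the current data).
    """
    try:
        return step(X, Y)
    except Exception:
        # Persist the data that triggered the failure for later inspection.
        np.savez(path, X=X, Y=Y)
        raise
```

The saved file can then be loaded with `np.load(path)` to inspect the points (for example, to look for duplicates) without rerunning the expensive evaluations.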
