Troubleshooting dependency hell #45

Open
GemmaTuron opened this issue Oct 3, 2024 · 7 comments

@GemmaTuron
Member

Describe the bug
ZairaChem installs are getting outdated, and I have not managed to set it up from scratch on a Linux machine. A few pointers:

  • Ersilia's newest version (0.1.37) has compatibility issues in its use of the Lake (ZairaChem uses Isaura). cc @DhanshreeA, can you think of what might be making this incompatible? The latest working version was 0.1.34; I have not tested 0.1.36.
  • autogluon.tabular cannot be pip installed (even in a completely blank environment it fails with matplotlib dependency issues). I am currently installing it from conda-forge, which also has v0.7 available: conda install conda-forge::autogluon.tabular=0.7.0. By the way, v1.1.1, the newest, makes the pipeline fail because it saves files differently.
  • Pandas has a mismatch between v1 and v2: apparently some packages, like ersilia (in v0.1.34), require v1 and others require v2:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
autogluon-core 1.1.1 requires networkx<4,>=3.0, but you have networkx 2.8.8 which is incompatible.
autogluon-core 1.1.1 requires scikit-learn<1.4.1,>=1.3.0, but you have scikit-learn 1.2.2 which is incompatible.
autogluon-features 1.1.1 requires scikit-learn<1.4.1,>=1.3.0, but you have scikit-learn 1.2.2 which is incompatible.
autogluon-tabular 1.1.1 requires networkx<4,>=3.0, but you have networkx 2.8.8 which is incompatible.
autogluon-tabular 1.1.1 requires scikit-learn<1.4.1,>=1.3.0, but you have scikit-learn 1.2.2 which is incompatible.
datasets 2.20.0 requires requests>=2.32.2, but you have requests 2.29.0 which is incompatible.
datasets 2.20.0 requires tqdm>=4.66.3, but you have tqdm 4.66.1 which is incompatible.
mxnet 1.9.1 requires graphviz<0.9.0,>=0.8.1, but you have graphviz 0.20.1 which is incompatible.
scanpy 1.10.3 requires seaborn>=0.13, but you have seaborn 0.12.2 which is incompatible.
tensorboard 2.10.1 requires protobuf<3.20,>=3.9.2, but you have protobuf 3.20.3 which is incompatible.
tensorflow 2.10.0 requires protobuf<3.20,>=3.9.2, but you have protobuf 3.20.3 which is incompatible.
tiledb 0.29.1 requires numpy<2.0,>=1.25; python_version >= "3.9", but you have numpy 1.23.5 which is incompatible.
  • Mellody tuner v2.3.1 has a deprecation issue with numpy:
  File "/home/gturon/miniconda3/envs/zairachem/lib/python3.10/site-packages/melloddy_tuner/utils/lsh_folding.py", line 185, in calculate_single
    fold_id = self.run_lsh_calculation(fp_feat_raw.tolist())
  File "/home/gturon/miniconda3/envs/zairachem/lib/python3.10/site-packages/melloddy_tuner/utils/lsh_folding.py", line 166, in run_lsh_calculation
    return np.int(folds)
  File "/home/gturon/miniconda3/envs/zairachem/lib/python3.10/site-packages/numpy/__init__.py", line 305, in __getattr__
    raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'int'.
`np.int` was a deprecated alias for the builtin `int`. To avoid this error in existing code, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations. Did you mean: 'inf'?
Session file /home/gturon/github/dili/session.json
  • I have tried v3 (the newest), for which you also need to install pandera manually, but it crashes, possibly because of new functionality. I have not been able to investigate further; here is the error:
- Start calculating descriptors and assign LSH folds.
Traceback (most recent call last):
  File "/home/gturon/miniconda3/envs/zairachem/bin/tunercli", line 8, in <module>
    sys.exit(main())
  File "/home/gturon/miniconda3/envs/zairachem/lib/python3.10/site-packages/melloddy_tuner/tunercli.py", line 2406, in main
    args.func(args)
  File "/home/gturon/miniconda3/envs/zairachem/lib/python3.10/site-packages/melloddy_tuner/tunercli.py", line 389, in do_calculate_desc_lsh
    calculate_lsh_folds.main(_args)
  File "/home/gturon/miniconda3/envs/zairachem/lib/python3.10/site-packages/melloddy_tuner/scripts/calculate_lsh_folds.py", line 343, in main
    df = pd.read_csv(input_file)
  File "/home/gturon/miniconda3/envs/zairachem/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 948, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/gturon/miniconda3/envs/zairachem/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 611, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/gturon/miniconda3/envs/zairachem/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1448, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/home/gturon/miniconda3/envs/zairachem/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1705, in _make_engine
    self.handles = get_handle(
  File "/home/gturon/miniconda3/envs/zairachem/lib/python3.10/site-packages/pandas/io/common.py", line 863, in get_handle
    handle = open(
FileNotFoundError: [Errno 2] No such file or directory: '/home/gturon/github/dili/data/melloddy/results/results_tmp/descriptors/T2_descriptors.csv'
Session file /home/gturon/github/dili/session.json
Traceback (most recent call last):
  File "/home/gturon/miniconda3/envs/zairachem/bin/zairachem", line 33, in <module>
    sys.exit(load_entry_point('zairachem', 'console_scripts', 'zairachem')())
  File "/home/gturon/miniconda3/envs/zairachem/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/gturon/miniconda3/envs/zairachem/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/gturon/miniconda3/envs/zairachem/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/gturon/miniconda3/envs/zairachem/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/gturon/miniconda3/envs/zairachem/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/gturon/github/zaira-chem/zairachem/cli/commands/fit.py", line 124, in fit
    s.setup()
  File "/home/gturon/github/zaira-chem/zairachem/setup/training.py", line 230, in setup
    self._standardize()
  File "/home/gturon/github/zaira-chem/zairachem/setup/training.py", line 156, in _standardize
    Standardize(os.path.join(self.output_dir, DATA_SUBFOLDER)).run()
  File "/home/gturon/github/zaira-chem/zairachem/setup/standardize.py", line 28, in run
    dfm = pd.read_csv(self.tuner_filename)[
  File "/home/gturon/miniconda3/envs/zairachem/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 948, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/gturon/miniconda3/envs/zairachem/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 611, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/gturon/miniconda3/envs/zairachem/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1448, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/home/gturon/miniconda3/envs/zairachem/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1705, in _make_engine
    self.handles = get_handle(
  File "/home/gturon/miniconda3/envs/zairachem/lib/python3.10/site-packages/pandas/io/common.py", line 863, in get_handle
    handle = open(
FileNotFoundError: [Errno 2] No such file or directory: '/home/gturon/github/dili/data/melloddy/results/results_tmp/standardization/T2_standardized.csv'
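
For the `np.int` failure in the MELLODDY-TUNER traceback above, the NumPy error message itself suggests using the builtin `int`. A minimal sketch of that rewrite on source text follows, assuming you are willing to patch the installed file locally. The function name `replace_np_int_alias` is hypothetical and is not part of MELLODDY-TUNER or NumPy.

```python
import re

# Matches the bare alias `np.int` only; the negative lookahead skips
# sized or suffixed names such as np.int64, np.int32, np.int_ and np.integer.
_NP_INT = re.compile(r"\bnp\.int(?!\w)")


def replace_np_int_alias(source: str) -> str:
    """Return `source` with every bare `np.int` replaced by the builtin `int`."""
    return _NP_INT.sub("int", source)


if __name__ == "__main__":
    # The failing line from the traceback, used here as sample input.
    print(replace_np_int_alias("return np.int(folds)"))
```

The other route is to keep NumPy below 1.24, the release that removed the alias, which avoids editing third-party code at all.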
@DhanshreeA
Member

Interesting, I'll look into this

DhanshreeA self-assigned this Oct 5, 2024
@JHlozek
Collaborator

JHlozek commented Oct 11, 2024

I found when I was developing Olinda that I needed to pin various dependencies in my ZairaChem fork based on previous working setups as the dependency updates kept breaking ZairaChem.

  • Ersilia: I don't know what the issue is with the new Ersilia versions but happy to help test any ideas.

  • AutoGluon: This issue is new, but I can reproduce it on my side. The problem seems to be with matplotlib in the autogluon.tabular[fastai] package. The conda approach didn't work for me either, but I found that changing the package from "autogluon.tabular[all]==0.7.0" to "autogluon.tabular==0.7.0" allowed me to install ZairaChem again. This does prevent one or two of the trials from running during fitting, so it is just a temporary workaround.

  • Mellody/Pandas/Numpy: I faced similar issues but found that Mellody v2.3.1, numpy 1.23.5 and pandas 2.1.4 did not seem to be breaking despite the new warning messages.

After fixing the AutoGluon issue, I can install my ZairaChem fork again. Can you maybe give it another try? If it works, then maybe we can use it as a starting point to update ZairaChem's dependencies.
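
Since pinning against a previously working setup is the approach described above, here is a rough sketch of how an environment could be compared against such pins. The pin table only contains the numpy and pandas versions mentioned in this comment; the names `KNOWN_GOOD` and `find_mismatches` are hypothetical, and the rest of the pins would have to come from a working install.

```python
from importlib import metadata

# Versions reported as working in this comment; extend from a known-good environment.
KNOWN_GOOD = {"numpy": "1.23.5", "pandas": "2.1.4"}


def find_mismatches(pins, get_version=metadata.version):
    """Map each package whose installed version differs from its pin
    to a (wanted, installed) tuple; installed is None if the package is missing."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_mismatches(KNOWN_GOOD).items():
        print(f"{name}: pinned {wanted}, found {installed}")
```

The `get_version` parameter is there only so the check can be exercised without the packages installed; by default it queries the active environment.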

@GemmaTuron
Member Author

Hi @JHlozek

Mellody tuner gives the following warning:

Collecting git+https://github.com/melloddy/MELLODDY-TUNER.git@2.3.1
  Cloning https://github.com/melloddy/MELLODDY-TUNER.git (to revision 2.3.1) to /tmp/pip-req-build-9yv5p6b3
  Running command git clone --filter=blob:none --quiet https://github.com/melloddy/MELLODDY-TUNER.git /tmp/pip-req-build-9yv5p6b3
  WARNING: Did not find branch or tag '2.3.1', assuming revision or ref.
  Running command git checkout -q 2.3.1
  error: pathspec '2.3.1' did not match any file(s) known to git
  error: subprocess-exited-with-error
  
  × git checkout -q 2.3.1 did not run successfully.
  │ exit code: 1
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× git checkout -q 2.3.1 did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

Just for clarity, in your install you are referring to v2.1.3, so I guess this is just a typo and the correct version is 2.1.3 :)
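
The `pathspec '2.3.1' did not match` error above means the requested tag does not exist in the remote repository. One way to catch this before running pip is to list the remote tags first. The sketch below assumes `git` is on the PATH and parses the output of `git ls-remote --tags`; the helper names `parse_tags` and `remote_has_tag` are hypothetical.

```python
import subprocess


def parse_tags(ls_remote_output: str) -> set:
    """Extract tag names from `git ls-remote --tags` output.

    Each line has the form "<sha>\\trefs/tags/<name>"; annotated tags also
    appear a second time with a "^{}" suffix, which is stripped here.
    """
    tags = set()
    for line in ls_remote_output.splitlines():
        parts = line.split("\t")
        if len(parts) != 2 or not parts[1].startswith("refs/tags/"):
            continue
        name = parts[1][len("refs/tags/"):]
        if name.endswith("^{}"):
            name = name[:-3]
        tags.add(name)
    return tags


def remote_has_tag(repo_url: str, tag: str) -> bool:
    """Return True if `tag` exists in the remote repository (needs network access)."""
    result = subprocess.run(
        ["git", "ls-remote", "--tags", repo_url],
        capture_output=True, text=True, check=True,
    )
    return tag in parse_tags(result.stdout)
```

Calling `remote_has_tag("https://github.com/melloddy/MELLODDY-TUNER.git", "2.3.1")` before the install would flag a mistyped version without waiting for the clone to fail.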

@JHlozek
Collaborator

JHlozek commented Oct 15, 2024

Hi @GemmaTuron

I'm a bit confused - the version I see in my install_linux script is '2.1.3'?:
python3 -m pip install git+https://github.com/melloddy/MELLODDY-TUNER.git@2.1.3

I'm not sure where the 2.3.1 is from? :)

@GemmaTuron
Member Author

In your comment above - I copied the versions from there :)
Mellody/Pandas/Numpy: I faced similar issues but found that Mellody v2.3.1, numpy 1.23.5 and pandas 2.1.4 did not seem to be breaking despite the new warning messages.

@GemmaTuron
Member Author

Hi @JHlozek and @miquelduranfrigola

To wrap up this troubleshooting (temporarily, as this is only a workaround), the instructions that work are:

WORKDIR=$PWD

#conda init bash
eval "$(conda shell.bash hook)"

# create zairachem conda environment
ZAIRACHEM_ENVIRONMENT='zairachem'
conda create -n $ZAIRACHEM_ENVIRONMENT python=3.10 -y
#source $CONDA_PREFIX/etc/profile.d/conda.sh
conda activate $ZAIRACHEM_ENVIRONMENT

# pip
python3 -m pip install -U pip
python3 -m pip install setuptools==69.5.1
python3 -m pip install tables openpyxl

# other pip-installable dependencies
python3 -m pip install tensorflow==2.10.0
python3 -m pip install autokeras==1.0.20 
# install autogluon cpu
python3 -m pip install -U "mxnet<2.0.0"
python3 -m pip install autogluon.tabular==0.7.0


# install extra dependencies
python3 -m pip install git+https://github.com/chembl/[email protected]
python3 -m pip install -q -U keras-tuner==1.4.7

# install ersilia
python3 -m pip install git+https://github.com/ersilia-os/[email protected]
ersilia --help

# install ersilia compound embedding
python3 -m pip install eosce==0.2.0

# install isaura
python3 -m pip install git+https://github.com/ersilia-os/isaura.git@ce293244ad0bdd6d7d4f796d2a84b17208a87b56

# install stylia
python3 -m pip install git+https://github.com/ersilia-os/stylia.git

# install lazy-qsar
python3 -m pip install git+https://github.com/ersilia-os/[email protected]

# install melloddy-tuner
python3 -m pip install git+https://github.com/melloddy/MELLODDY-TUNER.git@2.1.3

# install tabpfn
python3 -m pip install tabpfn==0.1.8

# install imblearn
python3 -m pip install imbalanced-learn==0.10.1

# install olinda
python3 -m pip install -e git+https://github.com/JHlozek/olinda.git#egg=olinda

# install zairachem
python3 -m pip install -e .

It gives some conflicts, but it works (see the attached .txt for error details):
zairachem_install_log.txt
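
To make the remaining conflicts easier to triage, the pip messages can be grouped by the installed package that triggers them. This is only a sketch, and it assumes the attached log contains lines in the same format as the conflict messages quoted at the top of this issue. The names `CONFLICT` and `group_conflicts` are hypothetical.

```python
import re
from collections import defaultdict

# Format of pip's conflict lines, e.g.
# "scanpy 1.10.3 requires seaborn>=0.13, but you have seaborn 0.12.2 which is incompatible."
CONFLICT = re.compile(
    r"^(?P<pkg>\S+) (?P<pkg_version>\S+) requires (?P<req>.+), "
    r"but you have (?P<dep>\S+) (?P<dep_version>\S+) which is incompatible\.$"
)


def group_conflicts(log_text: str) -> dict:
    """Group conflicts by (installed package, installed version).

    Each value is a list of (requiring package, requirement string) tuples,
    so one glance shows how many packages object to a given pin.
    """
    grouped = defaultdict(list)
    for line in log_text.splitlines():
        match = CONFLICT.match(line.strip())
        if match:
            key = (match["dep"], match["dep_version"])
            grouped[key].append((match["pkg"], match["req"]))
    return dict(grouped)


if __name__ == "__main__":
    # Assumes the attached log has been saved next to this script.
    with open("zairachem_install_log.txt", encoding="utf-8") as handle:
        for (dep, version), users in group_conflicts(handle.read()).items():
            print(f"{dep} {version}: {len(users)} conflicting requirement(s)")
```

Lines that do not match the conflict format are ignored, so the whole install log can be passed in unchanged.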

@JHlozek
Collaborator

JHlozek commented Oct 15, 2024

@GemmaTuron
Oh I understand, yes that was a typo indeed.
In your comment above - I copied the versions from there :)

Glad it installs for now.
