Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Bug: Ersilia Python API Fetch refactoring #1449

Open
GemmaTuron opened this issue Dec 16, 2024 · 17 comments
Open

🐛 Bug: Ersilia Python API Fetch refactoring #1449

GemmaTuron opened this issue Dec 16, 2024 · 17 comments
Labels
bug Something isn't working

Comments

@GemmaTuron
Copy link
Member

Describe the bug.

I am trying to use Ersilia as a Python Package. The functions work if the model is fetched but it is unable to actually fetch the model from DockerHub.
As a template I am using the functions prepared for ChemSampler: https://github.com/ersilia-os/chem-sampler/blob/main/chemsampler/core/base.py

The only message I get at fetching time is:

>>> from ersilia.hub.fetch.fetch import ModelFetcher
>>> mf = ModelFetcher(force_from_dockerhub=True)
13:10:57 | DEBUG    | Initialized with URL: None
13:10:57 | DEBUG    | Getting model source
13:10:57 | DEBUG    | Model getting fetched from DockerHub
>>> mf.fetch("eos4avb")
<coroutine object ModelFetcher.fetch at 0x7998692cf890>

Describe the steps to reproduce the behavior

No response

Operating environment

Ubuntu 24.04 LTS

@GemmaTuron GemmaTuron added the bug Something isn't working label Dec 16, 2024
@Abellegese
Copy link
Contributor

Hey @GemmaTuron its async function and needed to be awaited or run using asyncio just put it like this

asyncio.run(mf.fetch("eos4avb"))

@GemmaTuron
Copy link
Member Author

ok, this works @Abellegese

When was this change introduced? Previously asyncio was not required, and why the other functions like serve and run do not need it? And important as well, where is this documented?

@Abellegese
Copy link
Contributor

Okay @GemmaTuron nice question.

This change was the first introduced before even improving the run command. I started with the fetch command making them light and non-block and concurrent (look this issue #1299). This process is intensive and I used async + concurrency to make them resource efficient and non-blocking. Serve was optimized so I had not touch it.

Does it makes sense?

@GemmaTuron
Copy link
Member Author

This makes sense @Abellegese thanks, but these changes are not acceptable without a user-wide warning and proper documentation. If you change a function that is user-facing it needs to be stated everywhere. At the moment, both ZairaChem and ChemSampler will fail because of this change.

Our current documentation on the python package is here: https://ersilia.gitbook.io/ersilia-book/ersilia-model-hub/local-inference

Can you please clarify what happens if I only run, as suggested:

from ersilia import ErsiliaModel
mdl = ErsiliaModel("retrosynthetic-accessibility")
mdl.serve()

Will the serve automatically try to fetch the model, and will it do it properly using the asyncio function ?

@Abellegese
Copy link
Contributor

@GemmaTuron thats very true, needs proper docs somewhere. So apparently when you call serve it wont fail because I changed the auto fetch code in the ErsiliaModel

 if not self._is_available_locally and fetch_if_not_available:
        self.logger.info("Model is not available locally")
        try:
            do_fetch = yes_no_input(
                "Requested model {0} is not available locally. Do you want to fetch it? [Y/n]".format(
                    self.model_id
                ),
                default_answer="Y",
            )
        except:
            self.logger.debug("Unable to capture user input. Fetching anyway.")
            do_fetch = True
        if do_fetch:
            fetch = importlib.import_module("ersilia.hub.fetch.fetch")
            mf = fetch.ModelFetcher(
                config_json=self.config_json, credentials_json=self.credentials_json
            )
            asyncio.run(mf.fetch(self.model_id))

@GemmaTuron
Copy link
Member Author

In ersilia/coremodel.py
I see the following:

            if do_fetch:
                fetch = importlib.import_module("ersilia.hub.fetch.fetch")
                mf = fetch.ModelFetcher(
                    config_json=self.config_json, credentials_json=self.credentials_json
                )
                mf.fetch(self.model_id)

@Abellegese
Copy link
Contributor

I don't think you have the latest ersilia.

@GemmaTuron
Copy link
Member Author

I pip installed it, which means the asyncio was already required in the latest release but was not there actually?

@Abellegese
Copy link
Contributor

Yeah but just pull rebase from ersilia. I think this change has been made 3 weeks ago.

@GemmaTuron
Copy link
Member Author

GemmaTuron commented Dec 16, 2024

If that is the case we need to be more careful about the pip relases because we cannot release code that will not work. In ersilia v0.1.39 the ModelFetcher does not work, which makes me think that this asyncio bit was not there but was already required? have not looked deeply into it yet. Please @Abellegese let's make sure things like this do not happen moving forward. Users will not install ersilia from github, and our tools like ZairaChem will also only use the pip package. When is the next release with this fix scheduled for?

@Abellegese
Copy link
Contributor

Honestly @GemmaTuron I don't know the release pipeline work. But this change has been made long ago.

Okay I will make sure this does not happen. I will try my best.

@GemmaTuron
Copy link
Member Author

Thanks @Abellegese

I'll close this issue then as it seems solved on the codebase, and then the next release of Ersilia will already have it fixed. Ersilia releases should be automatically happening every month

@GemmaTuron GemmaTuron reopened this Dec 16, 2024
@GemmaTuron GemmaTuron reopened this Dec 16, 2024
@GemmaTuron
Copy link
Member Author

I am reopening the issue to track the maintenance of this feature. @DhanshreeA is the ErsiliaModel module tested consistently in CI/CD workflows? It seems it might not be given that the v0.1.39 is failing. Can we include it in the testing before a new version is released?

@DhanshreeA
Copy link
Member

DhanshreeA commented Dec 16, 2024

@GemmaTuron yes the ErsiliaModel class is indeed tested in the test suite - however, the way the test is set up ensures that the model is first fetched using the ModelFetcher, which is an awaited coroutine, thereafter using the ErsiliaModel class to serve it:

@patch("ersilia.core.model.ErsiliaModel")
def test_models(
    mock_ersilia_model,
    mock_fetcher,
    mock_session,
    mock_set_apis,
    mock_convn_api_get_apis,
    mock_api_task,
    mock_serve,
    mock_run,
    mock_close,
):
    MODEL_ID = MODELS[1]
    INPUT = "CCCC"

    mf = ModelFetcher(overwrite=True)
    asyncio.run(mf.fetch(MODEL_ID))

    em = ErsiliaModel(
        model=MODEL_ID, service_class="docker", output_source="LOCAL_ONLY"
    )

    result = em.run(
        input=INPUT,
        output="result.csv",
        batch_size=100,
        track_run=False,
        try_standard=False,
    )

    em.serve()
    em.close()
    ....

@DhanshreeA
Copy link
Member

DhanshreeA commented Dec 16, 2024

However there is no test for using ErsiliaModel when a model isn't fetched (ie testing that serve automatically fetches it). We should add that. So what we conclude is this - until 0.1.39, you would have to fetch the model before serving it, while this would be fixed in the 0.1.40 release.

@GemmaTuron
Copy link
Member Author

Hi @DhanshreeA

Indeed it does not make sense that we fetch the model first and then serve it if what we are documenting to the user is only the serve function. I would not duplicate the fetch, simply remove it and go directly with the serve, that will anyhow use the fetch internally. Feel free to close this issue once this is solved.

@GemmaTuron GemmaTuron changed the title 🐛 Bug: Ersilia as Python Package fails at fetch time 🐛 Bug: Ersilia Python API Fetch refactoring Dec 17, 2024
@GemmaTuron
Copy link
Member Author

After giving some thought, this is how the Python API should look like:

from ersilia import ErsiliaModel
mdl = ErsiliaModel("retrosynthetic-accessibility")
mdl.fetch()
mdl.serve()
mdl.run(input, output)

This means we have to add a small refactoring on the fetch function creating a wrapper for the asyncio.run(fetch) and of course updating the CI/CD tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: On Hold
Development

No branches or pull requests

3 participants