Documenting "inspect" and context awareness udf.rst #617

Open
wants to merge 4 commits into
base: master

daviddkovacs

Hi, I wrote a quick description of how to use the "inspect" function to print from within UDFs.

Moreover, I tried to give a brief explanation of how users can provide their own local variables for use within the UDF.

@jdries
Collaborator

jdries commented Sep 13, 2024

Nice, thanks!
For the bit about inspecting variables, we may want to integrate it with the existing explanation about logging?
https://open-eo.github.io/openeo-python-client/udf.html#logging-from-a-udf
Or if somehow the existing explanation was hard to find, suggestions for improvement are welcome.

@soxofaan
Member

Hi, thanks for taking the time to contribute!

About the part on inspect in UDF, I agree that it might be better to integrate that with the existing inspect docs at https://open-eo.github.io/openeo-python-client/udf.html#logging-from-a-udf or make them easier to discover (e.g. move it up or sprinkle some references around).

About the part on passing through the context: that is indeed a valid topic to document better. Also see the discussion at #520.
One thing that should be added to the current PR is how to pass the context to the "parent" process of the UDF.

@daviddkovacs
Author

Yes, it is indeed better to integrate this into the existing docs.
I am looking into passing the context to the parent process of the UDF (i.e. the part NOT in the apply_datacube function, right?). However, I didn't manage to do that, nor did I find anything on it. Can you point me to where it is shown? Then I'll integrate it in the PR.

@soxofaan
Member

Good point, it's not easy to find succinct examples on properly using context in UDFs.

We have some unit test coverage in the VITO backend on this for example at https://github.com/Open-EO/openeo-geopyspark-driver/blob/cdd731ce6d684eba894beff7c8ac78266ddf12b0/tests/test_api_result.py#L718-L889, but that's probably a bit cryptic.

I think there are two use cases to document:

  1. Directly passing context to run_udf:

     udf = openeo.UDF(
         "...",
         context={"factor": 12.34},
     )
     cube = cube.apply(udf)

  2. Passing context to the parent process, and passing it through in run_udf:

     udf = openeo.UDF(
         "...",
         context={"from_parameter": "context"},
     )
     cube = cube.apply(udf, context={"factor": 12.34})

Both of these patterns have their uses. The first is simpler to reason about. The second is the approach to take when the context comes from "higher up", e.g. UDP parameters.
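
For reference, here is a rough sketch of what these two patterns could look like at process-graph level, written as plain Python dicts so it runs without a backend connection. The node names (`loadcollection1`, `runudf1`), the `"Python"` runtime string and the small `resolve_context` helper are illustrative assumptions, not the client's actual output or the backend's implementation. Only the `{"from_parameter": "context"}` reference and the `{"factor": 12.34}` value come from the snippets above.

```python
import copy


def build_apply(udf_context, parent_context=None):
    """Build a simplified `apply` node that wraps a `run_udf` child process."""
    run_udf = {
        "process_id": "run_udf",
        "arguments": {
            "data": {"from_parameter": "x"},
            "udf": "...",  # UDF source code elided, as in the snippets above
            "runtime": "Python",  # assumed runtime name
            "context": udf_context,
        },
        "result": True,
    }
    arguments = {
        "data": {"from_node": "loadcollection1"},  # hypothetical upstream node
        "process": {"process_graph": {"runudf1": run_udf}},
    }
    if parent_context is not None:
        # Pattern 2: the context lives on the parent `apply` process.
        arguments["context"] = parent_context
    return {"process_id": "apply", "arguments": arguments}


def resolve_context(apply_node):
    """Very rough imitation of how the context reaching the UDF is determined."""
    args = apply_node["arguments"]
    udf_context = args["process"]["process_graph"]["runudf1"]["arguments"]["context"]
    if udf_context == {"from_parameter": "context"}:
        # The reference is replaced by the parent's `context` argument.
        return copy.deepcopy(args.get("context"))
    return copy.deepcopy(udf_context)


# Pattern 1: context passed directly to run_udf.
direct = build_apply(udf_context={"factor": 12.34})

# Pattern 2: context passed to the parent and forwarded with from_parameter.
via_parent = build_apply(
    udf_context={"from_parameter": "context"},
    parent_context={"factor": 12.34},
)

print(resolve_context(direct))
print(resolve_context(via_parent))
```

In this sketch both variants end up handing the same dictionary to the UDF; the difference is only where the value is stored in the graph, which is what matters when the parent's context itself comes from a UDP parameter.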

@soxofaan
Member

That being said, I think the python client should make it simpler to get that second usage pattern right. I made a ticket for that:

Author

@daviddkovacs daviddkovacs left a comment

Added proper description, shorter lines and passing "context" to parent udf

Member

@soxofaan soxofaan left a comment

some more notes

@@ -317,7 +317,58 @@ To invoke a UDF like this, the apply_neighborhood method is most suitable:
{'dimension': 'y', 'value': 128, 'unit': 'px'}
], overlap=[])
Inspecting variables within UDF
Member

This section is still redundant given the existing "Logging from a UDF" section at

Logging from a UDF
=====================

From time to time, when things are not working as expected,
you may want to log some additional debug information from your UDF, inspect the data that is being processed,
or log warnings.
This can be done using the :py:class:`~openeo.udf.debug.inspect()` function.

For example: to discover the shape of the data cube chunk that you receive in your UDF function:

.. code-block:: python
    :caption: Sample UDF code with ``inspect()`` logging
    :emphasize-lines: 1, 5

    from openeo.udf import inspect
    import xarray

    def apply_datacube(cube: xarray.DataArray, context: dict) -> xarray.DataArray:
        inspect(data=[cube.shape], message="UDF logging shape of my cube")
        cube.values = 0.0001 * cube.values
        return cube

After the batch job is finished (or failed), you can find this information in the logs of the batch job.
For example (as explained at :ref:`batch-job-logs`),
use :py:class:`BatchJob.logs() <openeo.rest.job.BatchJob.logs>` in a Jupyter notebook session
to retrieve and filter the logs interactively:

.. image:: _static/images/udf/logging_arrayshape.png

Which reveals in this example a chunking shape of ``[3, 256, 256]``.

I'd propose fine-tuning the existing docs if that is necessary.

Passing user defined variables to UDF
========================================

In order to pass variables and values that are used throughout the user side of script, these need to be put in the `context` dictionary.
Member

used throughout the user side of script

I'm not sure what you mean here, e.g. with "user side". And what "script" are you referring to? The script that builds the process graph, or the UDF script?

Author

By "user side" I mean the script where the user runs the UDF.

Member

I'm still a bit confused what you mean:

where the user runs the UDF.

The user does not run the UDF on the user side; it's the backend that executes the UDF, backend-side.

Author

Sorry, I mean the script where the user defines the UDF.

========================================

In order to pass variables and values that are used throughout the user side of script, these need to be put in the `context` dictionary.
Once, these variables are defined within `context` dictionary, the UDF needs to be made context aware, by adding `context={"from_parameter": "context"}` at the end of your UDF.
Member

Suggested change
Once, these variables are defined within `context` dictionary, the UDF needs to be made context aware, by adding `context={"from_parameter": "context"}` at the end of your UDF.
Once these variables are defined within `context` dictionary, the UDF needs to be made context aware, by adding `context={"from_parameter": "context"}` at the end of your UDF.

variables are defined within context dictionary

This is a bit confusing, because in your example you define them in a user_variable dictionary

In the example above, the user stores a preferred value of `0.0001` in the `user_variable` dictionary,
which can be passed to the UDF and used by the function.
Later, this value is accessed by calling `context["factor"]` within the UDF.
The parent UDF is called with the user's custom dictionary with `.apply(udf, context = user_variable)`.
Member

FYI: The UDF is not the parent here, apply is the parent.

the hierarchical flow is apply -> run_udf -> your UDF
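
To make that hierarchy concrete, here is a toy, backend-free sketch of the call chain. The `apply` and `run_udf` functions below are hypothetical stand-ins written purely for illustration (the real processes run on the backend and operate on data cubes, not Python lists). Only the `apply_datacube(cube, context)` signature and the `user_variable` / `context["factor"]` names follow what is quoted in this thread, and the factor value is chosen arbitrarily.

```python
def apply_datacube(cube, context):
    """The UDF itself: reads the user-provided value from `context`."""
    factor = context["factor"]
    return [factor * value for value in cube]


def run_udf(data, udf, context):
    """Stand-in for the `run_udf` process: hands data and context to the UDF."""
    return udf(data, context)


def apply(data, process, context=None):
    """Stand-in for the parent `apply` process.

    It owns `context` and exposes it to its child process as a parameter.
    """
    return process(data, context)


# Hypothetical user-side dictionary, named as in the PR text.
user_variable = {"factor": 0.5}

result = apply(
    data=[2, 4, 6],
    # The child process forwards the parent's `context` parameter to run_udf,
    # which is the role `{"from_parameter": "context"}` plays in a real graph.
    process=lambda x, context: run_udf(data=x, udf=apply_datacube, context=context),
    context=user_variable,
)
print(result)  # [1.0, 2.0, 3.0]
```

The point of the sketch is the direction of the flow: the dictionary enters at `apply`, is forwarded by `run_udf`, and is only read inside the UDF.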

Co-authored-by: Stefaan Lippens <[email protected]>
3 participants