-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
I am always struggling with dimensions, axes, and coordinates in cf-python - it seems that ChatGPT does too! #788
Comments
So ChatGPT also offers this: for axis in field.domain_axes(todict=True).values():
coord = field.coordinate(axis) which gets one deeper into the mire, as that one raises a stack dump. I guess because an axis is not a coordinate. What I find interesting is that the LLM is make the same sort of leaps between things it understands based on the rest of the corpus of knowledge about NetCDF that I do ... and getting the same wrong answers. I think most non-CF experts do, which is the problem I have with the API ... there has to be a way to distinguish better between the underlying data model and what people want to do. |
(And yes, I realise the solution is to loop over |
My main response is that you (anyone!) should be consulting the documentation before consulting ChatGPT! (One would hope that such AI tools as the latter can be trained with, and sufficiently understand, documentation especially the API reference and tutorial, but who knows how well it can interpret it sufficiently well to be able to provide practical solutions...) I imagine you probably looked at the documentation (just didn't state this explicitly in your comment), so probably this isn't the response you want to hear, but I highlight it because then the the underlying issue is that our documentation not in a format where you could find what you needed to know quickly, which is a larger problem to solve, but on the agenda for our summit. An issue which perhaps need to be given high priority!
As for your difficulty at hand... In fact, don't listen to ChatGPT! There is no need to loop over anything (though that would also work, just is less direct and elegant). There are of course many ways to do it (against the Python mantra, but then again Python itself doesn't follow it's own mantra 😄 ) but I recommend you check out the 'Filtering Metadata Constructs' part of the tutorial of the documentation, which lists methods for filtering on various different objects you might want to query on. In this case, use >>> f = cf.example_field(1)
>>> t = f.constructs.filter_by_axis("T")
>>> t
<CF Constructs: dimension_coordinate(1)>
>>> print(t)
Constructs:
{'dimensioncoordinate3': <CF DimensionCoordinate: time(1) days since 2018-12-01 >}
>>> t.__dir__()
['_field_data_axes', '_key_base', '_array_constructs', '_non_array_constructs', '_construct_axes', '_construct_type', '_constructs', '_ignore', '_prefiltered', '_filters_applied', '__module__', '__doc__', '__repr__', '_matching_values', '_flip', 'close', '_filter_by_identity', '_short_iteration', '__call__', '__contains__', '__copy__', '__deepcopy__', '__docstring_package_depth__', '__docstring_substitutions__', '__getitem__', '__init__', '__iter__', '__len__', '__str__', '_atol', '_axes_to_constructs', '_check_construct_type', '_clear', '_component_filter', '_construct_dict', '_construct_type_description', '_custom', '_default', '_del_component', '_del_construct', '_del_data_axes', '_dictionary', '_equals', '_equals_cell_method', '_equals_coordinate_reference', '_equals_domain_axis', '_equals_preprocess', '_filter_by_axis', '_filter_by_cell', '_filter_by_connectivity', '_filter_by_data', '_filter_by_key', '_filter_by_measure', '_filter_by_method', '_filter_by_naxes', '_filter_by_ncdim', '_filter_by_ncvar', '_filter_by_property', '_filter_by_size', '_filter_by_type', '_filter_convert_to_domain_axis', '_filter_parse_args', '_filter_preprocess', '_get_component', '_has_component', '_iter', '_package', '_pop', '_rtol', '_set_climatology', '_set_component', '_set_construct', '_set_construct_data_axes', '_update', '_view', 'clear_filters_applied', 'construct_type', 'construct_types', 'copy', 'data_axes', 'domain_axes', 'domain_axis_identity', 'equals', 'filter', 'filter_by_axis', 'filter_by_cell', 'filter_by_connectivity', 'filter_by_data', 'filter_by_identity', 'filter_by_key', 'filter_by_measure', 'filter_by_method', 'filter_by_naxes', 'filter_by_ncdim', 'filter_by_ncvar', 'filter_by_property', 'filter_by_size', 'filter_by_type', 'filters_applied', 'get', 'get_data_axes', 'inverse_filter', 'items', 'key', 'keys', 'new_identifier', 'ordered', 'replace', 'shallow_copy', 'todict', 'unfilter', 'value', 'values', '__doc_template__', '__dict__', '__weakref__', '__new__', '__hash__', '__getattribute__', '__setattr__', '__delattr__', '__lt__', '__le__', '__eq__', '__ne__', '__gt__', '__ge__', '__reduce_ex__', '__reduce__', '__getstate__', '__subclasshook__', '__init_subclass__', '__format__', '__sizeof__', '__dir__', '__class__']
>>> t.data_axes()
{'dimensioncoordinate3': ('domainaxis3',)}
>>> t.value().array
array([31.]) |
That's not actually what I wanted to do. I wanted to know the index of the time axis in the shape of the data array ... so filtering is no good if it doesn't tell me that. Frankly, most people are no longer going to consult documentation beyond page one (I don't any more, I get a much better idea MUCH FASTER from ChatGPT. Yes it's sometimes wrong, but it's informatively wrong). What we need are the docs to be good enough for ChatGPT (and friends) to get it right ... and in fact, it did help, because I was able to recognise that the loop and check was what I wanted to do, it was just that ChatGPT (and me) looped over the wrong thing first ... but it was much easier for me to work out the right thing from there, than it was in the docs, from scratch. But in this case, my point is that the API is not intuitive. I was just amused that ChatGPT finds it un-intuitive in exactly the same way that I do. |
OK fair, I misunderstood what you wanted. I appreciate the specific question is not so much the point of this thread/Issue, but FYI, we have the For example: >>> f = cf.example_field(1)
>>> i = f.indices(T=slice(None)) # slice representing the whole time/T array
>>> i
(slice(None, None, None), slice(None, None, None), slice(None, None, None))
>>> f[i]
<CF Field: air_temperature(atmosphere_hybrid_height_coordinate(1), grid_latitude(10), grid_longitude(9)) K> In this case we are talking more about sub-spacing than filtering, but either way API-wise you need a way to specify you want to know about the time axis and not anything else, right?
Well, I would say ChatGPT wasn't that helpful in this case because we offer means to do it without a loop, which should be used over a loop ideally because the user-side iteration is not necessary. But that's not to say it isn't helpful generally. I just aspire to having documentation that can be a first port of call for people, that they feel comfortable consulting and that feels like a natural and well-structured source of information and not intimidating, directing them to what they want to know without them having to wade through other information, because you are right that people won't read through anything that is too long or detailed. (Anyone can also use ChatGPT to get code for them, but that's a black box we can't control so doesn't merit much concern in my opinion).
For the most part, I would disagree. I think it is fairly intuitive already, though there will absolutely be room for improvement in many specific areas. To me, it is a documentation issue mostly - the documentation needs to make it clearer how to do such things as filtering, getting indices, and so on - common operations that users will want to do, in a way that directs them to relevant pages covering examples of what they want (code snippets as ChatGPT would provide which are especially useful). I am thinking we could even have a flow chart to point people to the right locations in the (revamped) documentation, to help this. But I am happy if you can try to persuade me otherwise about the intuitiveness or otherwise. We can discuss this further at the summit next week. I suspect David will have strong opinions 🙂 |
Does this help .... >>> f = cf.example_field(1)
>>> f # No T dimension in the data, but there is an X dimension
<CF Field: air_temperature(atmosphere_hybrid_height_coordinate(1), grid_latitude(10), grid_longitude(9)) K>
>>> f.get_data_axes()
('domainaxis0', 'domainaxis1', 'domainaxis2')
>>> T = f.domain_axis('T', key=True)
>>> T
'domainaxis3'
>>> T in f.get_data_axes()
False
>>> X = f.domain_axis('X', key=True)
>>> X
'domainaxis2'
>>> print (f"X is at position {f.get_data_axes().index(X)}")
X is at position 2 |
Not answering the question directly, but there is some mention of this sort of thing in the tutorial: https://ncas-cms.github.io/cf-python/tutorial.html#data-axes |
FYI the Xarray equivalent of getting the index of a specific (data) axis appears to be:
|
Thanks, Dan! We could of course create |
I want to find which axis in the field is associated with time, but I also want to know where it is in the chunk shape matrix.
Concrete example: I know I can do this:
t_axis = t = ff.coordinate('T')
, but I don't know (programmatically) where that might be in an array shape of (720, 1920, 2560). (The answer is 0, but it could be 1 or 2).I see that
t_axis.identity()
gives metime
. Good. As it happens, I asked ChatGPT how to do this: "I am using cf-python to read some netcdf data. How do I find out which axis of the data is the time axis (if there is a time axis)" and ChatGPT suggested thatI "Iterate through the field constructs to identify which one represents the time axis.".
Great. I thought, given the above. ChatGPT even gave me this bit of code:
That looked like it should work. But I didn't notice the difference between a domain axis and a coordinate (and neither did ChatGPT), which I expect explains that this is what the identities returned in that loop actually are:
ncdim%time, ncdim%latitude, ncdim%longitude
.So that didn't work. At this point I am thinking, this is too hard.
Yuck. I think civilians find these distinctions impossible to work with.
The text was updated successfully, but these errors were encountered: