Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Getting Cell Property other than "Value" from dataframe functions in CellService #1115

Open
bennybooo22 opened this issue May 6, 2024 · 6 comments
Labels

Comments

@bennybooo22
Copy link

bennybooo22 commented May 6, 2024

Describe the bug
The all the execute dataframe functions only ever return "Value", no matter what cell_properties is passed

To Reproduce
Run the following code with a valid MDX: tm1.cells.execute_mdx_dataframe_pivot(mdx, cell_properties=['Updateable', 'Value'])
(After fixing the previously posted bug or doing a execute_view_dataframe_pivot instead)

Expected behavior
Return of BOTH the "Value" and "Updateable" property.

Version
TM1py: 2.0.2
TM1 Server Version: 11.8

Additional context
Possible steps to fix:

  1. Add "cell_properties: list = ['Value']" to argument of function "build_pandas_dataframe_from_cellset" in Utils.py
  2. Replace "cellset_clean[element_names] = cell['Value'] if cell else None" with "cellset_clean[element_names] = [cell[cell_property] for cell_property in cell_properties] if cell else None" in function "build_pandas_dataframe_from_cellset" in Utils.py
  3. Replace "df = pd.DataFrame(values, index=index, columns=["Values"])" with "df = pd.DataFrame(values, index=index, columns=cell_properties)" in function "build_pandas_dataframe_from_cellset" in Utils.py
  4. Add "cell_properties = kwargs.get('cell_properties', ['Value'])" to beginning of function "extract_cellset_dataframe_pivot" in CellService.py
  5. Replace "df = build_pandas_dataframe_from_cellset(data, multiindex=False)" with "df = build_pandas_dataframe_from_cellset(data, multiindex=False, cell_properties=cell_properties)" in function "extract_cellset_dataframe_pivot" in CellService.py
  6. Replace "values=["Values"]" with "values=cell_properties" in the return line of the function "extract_cellset_dataframe_pivot" in CellService.py

Note: This will make the dataframe return 'Value' as Value title instead of what was previously 'Values'. I imagine many python scripts have df = df['Values'] hard coded into them, so there is an issue of backwards compatability

@bennybooo22 bennybooo22 added the bug label May 6, 2024
@rclapp
Copy link
Collaborator

rclapp commented May 6, 2024

Can you provide an example of what you would want the DF to look like? Would it have more columns, more rows?

@bennybooo22
Copy link
Author

Can you provide an example of what you would want the DF to look like? Would it have more columns, more rows?

Absolutely!

Here are the lines of code I added to print the result:
Screenshot 1

And here is the result if you make the changes stated above to TM1py:
Screenshot 2

The result is a dataframe with a multiindex for the columns

@MariusWirtz
Copy link
Collaborator

I wonder how we can support cell properties in a convenient way in the execute_mdx_dataframe function.

Perhaps if we introduced two additional args:

  • cell_properties (list of cell properties)
  • cell_properties_axis (rows or columns)

Would that help in your case?

Re the execute_mdx_dataframe_pivot function. This one is less useful than its sister function, execute_mdx_dataframe.
Is there any particular reason you chose this one over the "normal" function?
TBH, I hope we can deprecate the execute_mdx_dataframe_pivot function soon. It doesn't have a Polars equivalent. It's also painful for us to maintain and roll out all continuous improvements from execute_mdx_dataframe to execute_mdx_dataframe_pivot. I personally don't find it very useful either.

@bennybooo22
Copy link
Author

Yeah, I think it's not an easy thing to add in a clean way. The cell_properties and cell_properties_axis seems like a good way to make an easy implementation. I think that would help in my case.

Personally, my default is always to use execute_mdx_dataframe_pivot over execute_mdx_dataframe. When using python for some sort of data analysis, I will typically construct the MDX to be the cubeview that I plan on analyzing. For example, if I am comparing 3 dimensions in a cube, and want to take the average across 2 dimensions. This may just be my lack of familiarity with pandas but it is much easier to do the aforementioned operation when the dataframe has been pivoted than in the default returned dataframe from execute_mdx_dataframe. Back before execute_mdx_dataframe_pivot existed or before I knew it existed, the following line after execute_mdx_dataframe was always ....pivot_table().

I can't really speak on the maintainability of the function, but I find it very useful.

@rclapp
Copy link
Collaborator

rclapp commented May 14, 2024 via email

@bennybooo22
Copy link
Author

Have you tried to include properties in the execute_mdx_dataframe method and set the shaped parameter as true? It takes a different approach, but also returns data frame that is pivoted to match the view spec:

df = df.pivot_table(

I just tried that by running tm1.cells.execute_mdx_dataframe(mdx, shaped=True, cell_properties=['Updateable', 'Value']) and unfortunately got the following error:
TypeError: TM1py.Services.CellService.CellService.extract_cellset_raw() got multiple values for keyword argument 'cell_properties'

In CellService.py the function extract_cellset_csv has a self.extract_cellset_raw() which includes both **kwargs and cell_properties=["Value"] in the arguments. So passing cell_properties in the **kwargs is likely causing the error.

Also worth noting that the behavior of a dataframe that has been created using pivot vs. shaped are different. I think pivot makes the row dimension the index and shaped makes the row dimension a column.

To @MariusWirtz 's point about maintainability, if the shaped function is considered maintainable, it might be possible to have the pivot function just run tm1.cells.execute_mdx_dataframe(mdx,shaped=True) followed by df.set_index( **list of row dimensions** ). I don't know if that is something that sounds good on paper but not in practice though. Might be worth looking into? If not I can just run df.set_index( **list of row dimensions** ) after getting a dataframe from TM1, it's not the biggest issue in the world haha.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants