Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Histogram chart not able to use decimal datatype column #30301

Closed
3 tasks done
clavidk opened this issue Sep 16, 2024 · 3 comments · Fixed by #30416
Closed
3 tasks done

Histogram chart not able to use decimal datatype column #30301

clavidk opened this issue Sep 16, 2024 · 3 comments · Fixed by #30416
Assignees
Labels
viz:charts:histogram Related to the Histogram chart

Comments

@clavidk
Copy link

clavidk commented Sep 16, 2024

Bug description

When I try to create a Histogram chart and choose a decimal datatype column (e.g. start_rate in my scenario)as the main column, I get an error that says:

The column 'start_rate' must be numeric.

But I have checked that it is a decimal datatype (Postgres db) which is equivalent to the numeric datatype.

How to reproduce the bug

  1. Create a new Histogram chart (not the legacy version)
  2. For column, select the column that is a decimal data type.
  3. Click save/update chart

What I expect: histogram is created

What actually happens: I get the following error:

DB engine Error
The column 'start_rate' must be numeric.

This may be triggered by:
Issue 1011 - Superset encountered an unexpected error.

Please reach out to the Chart Owner for assistance.

Chart Owner: Superset Admin

Screenshots/recordings

image

Superset version

master / latest-dev

Python version

I don't know

Node version

I don't know

Browser

Chrome

Additional context

No response

Checklist

  • I have searched Superset docs and Slack and didn't find a solution to my problem.
  • I have searched the GitHub issue tracker and didn't find a similar bug report.
  • I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.
Copy link

dosubot bot commented Sep 16, 2024

I found a similar closed issue that might be relevant to your problem: [Table Viz] "unsupported operant type errors" in percentage metrics [1].

The error "The column 'start_rate' must be numeric" occurs because the Histogram chart in Superset requires the specified column to be numeric. The histogram function in the superset/utils/pandas_postprocessing/histogram.py file checks if the column is numeric and raises a ValueError if it is not.

To ensure that the start_rate column is treated as numeric, you should verify that its datatype in the Postgres database is correctly set to a numeric type (e.g., decimal, float, integer). Additionally, ensure that the column is included in the numeric_columns list in the form data for the Histogram chart.

Here is the relevant part of the histogram function:

def histogram(
    df: DataFrame,
    column: str,
    groupby: list[str] | None,
    bins: int = 5,
    cumulative: bool = False,
    normalize: bool = False,
) -> DataFrame:
    """
    Generate a histogram DataFrame from a given DataFrame.

    Parameters:
    df (DataFrame): The input DataFrame.
    column (str): The column of the DataFrame to calculate the histogram on.
    groupby (list[str]): The columns to group by. If empty, no grouping is performed.
    bins (int): The number of bins to use for the histogram. Default is 5.
    cumulative (bool): Whether to calculate a cumulative histogram. Default is False.
    normalize (bool): Whether to normalize the histogram. Default is False.

    Returns:
    DataFrame: A DataFrame where each row corresponds to a group (or the entire DataFrame if no grouping is performed),
               and each column corresponds to a histogram bin. The values are the counts in each bin.
    """

    if groupby is None:
        groupby = []

    # check if the column is numeric
    if not np.issubdtype(df[column].dtype, np.number):
        raise ValueError(f"The column '{column}' must be numeric.")

Ensure that the start_rate column is included in the numeric_columns list in the form data. If it is not, you will need to update the form data to include it. Additionally, verify that the column's datatype in the Postgres database is correctly set to a numeric type [2][3].

To continue talking to Dosu, mention @dosu.

@dosubot dosubot bot added the viz:charts:histogram Related to the Histogram chart label Sep 16, 2024
@BlakeList
Copy link

Even if the columns are correctly set to numeric datatypes in the database, Superset still doesn't interpret this properly (converts to decimal type).

One way around this is to convert/cast the column to FLOAT or REAL data type (not NUMERIC) in your SQL query that produces the dataset in Superset. This is then interpreted as a float type in Superset and can be used in the current histogram visualization.

Hopefully this can be solved on Superset's end, but otherwise I hope it helps with your problem for now :)

@clavidk
Copy link
Author

clavidk commented Sep 20, 2024

@BlakeList This worked, thanks for that tip!

Though it seems it surfaced a new issue in that the histogram chart only creates bin delineations at whole integer numbers and doesn't play nicely when a lot of the values are <1 e.g. 0.33)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
viz:charts:histogram Related to the Histogram chart
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants