
Enhancement - Make the number of Columns in LoadColumnStatistics configurable #131

Closed
corycundy opened this issue Jul 30, 2024 · 17 comments · Fixed by #135, #137 or #138
Labels
enhancement New feature or request

Comments

@corycundy

corycundy commented Jul 30, 2024

Hello, I am running into a problem when trying to run VertiPaq Analyzer against a Direct Lake model. The model is much larger than the SKU size, and our capacity is constantly running out of memory. This is being addressed from a model standpoint; however, in the meantime I wanted to get a full view of the columns. I ran VertiPaq Analyzer from both DAX Studio and Tabular Editor 3. In DAX Studio I set Read statistics from data, and in TE3 I used Read Statistics from data and set Statistics from Direct Lake models to Full.

The problem is that there are many large columns that can easily fill up the memory. VertiPaq Analyzer appears to retrieve column statistics for 50 columns at a time. This simply doesn't work, as running the EVALUATE UNION ( ROW ( .... over 50 large columns exceeds the SKU memory limit for the model. I strongly believe this would work fine if I were able to adjust the 50 down to a much lower number, maybe even 1. I understand the optimization of running one EVALUATE for 50 columns at a time, but for my specific case I'd rather see it churn through 50 individual queries.
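To make the batching behavior concrete, here is a minimal sketch (an assumption based on the observed queries, not the actual VPAX library source) of the chunking: columns are split into fixed-size batches, and one statistics query is issued per batch.

```python
# Sketch of the batching behavior described above (assumed, not the actual
# VPAX source): columns are split into fixed-size chunks, and one
# EVALUATE UNION(ROW(...)) statistics query is issued per chunk.

def split_list(items, batch_size=50):
    """Split a list into consecutive batches of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

# With 120 columns and the hard-coded batch size of 50, three queries result.
columns = [f"Column{i}" for i in range(120)]
batches = split_list(columns)
print([len(b) for b in batches])  # [50, 50, 20]
```

Lowering the batch size to 1 would simply produce one query per column, trading fewer round trips for a much smaller per-query memory footprint.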

I can't imagine there is any way to configure this at the moment, as 50 is hard coded. My request is that this be made configurable, even if only via a config file rather than a UI setting in DAX Studio or TE3.

Here is the code that I believe is causing 50 columns to be read. For some reason, the profiler only showed 49 columns in the DAX query.
[screenshot: source code showing the hard-coded value of 50]

Here is an example of the DAX query from the profiler.

[screenshot: DAX query captured by the profiler]
@marcosqlbi
Collaborator

Hi Cory, this is a good point - however, as you can imagine, there are multiple steps involved:

  1. Add the parameter to the VPAX Library (@albertospelta, what do you think about it?)
  2. Expose that parameter in DAX Studio (and Tabular Editor) configuration. We'll have to open a similar Enhancement request there.

Marco

@albertospelta albertospelta added the enhancement New feature or request label Jul 31, 2024
@albertospelta
Collaborator

@marcosqlbi we can definitely add the new parameter to the public API. To avoid breaking changes for current users, we'll keep the default value at 50. This way, existing setups will continue to function as they do now.

@corycundy
Author

I know there is work involved, and the best solution should ultimately be implemented, but I have a few suggestions that may speed up development.

While this may not seem like the best solution in the end, it would allow for immediate use with any existing client tools. It would also allow me to run a full VertiPaq scan on my existing model sooner! The application could read an appSetting from the app.config. The logic could then use the configuration value instead of the hardcoded 50, defaulting to 50 if the entry doesn't exist in the app.config. For example, an entry could be added to DaxStudio.exe.config. The same could be done in Tabular Editor 3. Later, this could evolve into a parameter that the client tools pass in.

In the end, this may rarely be needed, so exposing it in the UI of client tools would be fine but seldom used. If it ends up there at some point, great, but for now I would be happy with this behind-the-scenes configuration approach as a starting point. This suggestion is really an effort to keep the modification simple and expedite delivery.

Unless I'm missing something, this may only be a few lines of code. I'm curious what others think of this idea.
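The appSettings idea above could be sketched like this (the key name VpaxColumnBatchSize is hypothetical; the actual name would be chosen by the host application):

```xml
<configuration>
  <appSettings>
    <!-- Hypothetical key name; the application would fall back to 50 when absent -->
    <add key="VpaxColumnBatchSize" value="1" />
  </appSettings>
</configuration>
```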

@marcosqlbi
Collaborator

@corycundy I am not sure that reading the config from a multi-platform library is possible, and even if it were, I do not like that idea much!
However, @albertospelta what do you think about that?

@albertospelta
Collaborator

@corycundy I agree with Marco as implementing it in a multi-platform context could be complex and difficult to maintain. The VPAX library can't always know how the host application handles its configuration, and it might not always be through an app.config file. Furthermore, modifying a configuration file in the program's installation directory usually requires admin privileges, which can be difficult for many users.

Ideally, this parameter should be available as an advanced setting in the host application's UI, but exposing it might take some time. Meanwhile, I was wondering if you could use our command-line tool Dax.Vpax.CLI, which we recently released. We can definitely add this parameter to the CLI and get it out soon. Would you be able to use this tool to extract your VPAX files temporarily until the new setting is available in other tools?

@otykier
Collaborator

otykier commented Aug 5, 2024

@albertospelta could you let me know once the public VertiPaq Analyzer NuGet contains this parameter? Then we'll make sure to surface it in TE3's Preferences dialog.

@albertospelta
Collaborator

@otykier Sure thing! I’ll let you know as soon as the NuGet is available. Thanks!

@albertospelta albertospelta linked a pull request Aug 5, 2024 that will close this issue
@albertospelta
Collaborator

@otykier the new parameter is available in version v1.7.0

@albertospelta
Collaborator

@corycundy The new parameter columnBatchSize has been added to the VPAX library and will be included in TE3's preferences dialog. I have also opened an issue for support in DaxStudio here.

If you would like, you can use this setting with the command-line tool, which now supports it. To get started, install the CLI by running dotnet tool install Dax.Vpax.CLI --global from your command prompt. After installation, you can use the new --column-batch-size option to configure the number of columns queried at a time. For a complete list of available options, run vpax export --help.

@corycundy
Author

Hello @albertospelta , this change is great and super timely. Thank you so much for implementing this! I was able to call the CLI version with this new parameter. However, I unfortunately stumbled upon a few potential bugs.

I was not able to get --column-batch-size to work with a value of 1, which is my ultimate goal. Even 2 causes issues in my current Direct Lake model. We are hitting memory errors and are working with Microsoft CAT and support on this. I got the following error from the command line, running against an Adventure Works Internet Sales model.

Extracting VPAX metadata...Unhandled exception: Microsoft.AnalysisServices.AdomdClient.AdomdErrorResponseException: The end of the input was reached.

I was able to get the following from DAX Studio.

The end of the input was reached.

Query Text:
EVALUATE

Looking at the code, I don't see how this can happen unless the set of validColumns is slightly different than the results of the .Where statement that looks at both IsRowNumber and GroupByColumns. I tested multiple models with a parameter of 1 and received this error.

Secondly, on one Direct Lake model, I also received the following error, which seems to be related. Looking at the code, I don't understand this error either.

Extracting VPAX metadata...Unhandled exception: Microsoft.AnalysisServices.AdomdClient.AdomdErrorResponseException: Query (1, 10) Too few arguments were passed to the UNION function. The minimum argument count for the function is 2.

I was able to get the following from DAX Studio.
Query (1, 10) Too few arguments were passed to the UNION function. The minimum argument count for the function is 2.

Query Text:
EVALUATE UNION(
ROW("Table", "0000Date", "Column", "0001Update Timestamp", "Cardinality", DISTINCTCOUNT('Date'[Update Timestamp])))

I couldn't reproduce this one on a second model. It is somehow adding the UNION with only one column. I don't believe I have any Group By columns in any tables in the models I am testing with. I hope that the fix for the first issue turns out to fix this one as well.

Please let me know if you have any questions that I can help with regarding this. I would appreciate any additional help working through these issues.

Cory

@albertospelta
Collaborator

Hi @corycundy,
I am able to reproduce the first error ("The end of the input was reached.") and have a fix for it.
Regarding the second error ("Too few arguments were passed to the UNION function"), I have a couple of questions:

  • What value did you use for the --column-batch-size parameter when this error occurred?
  • Could you please check if any column in the Date table has the "Group By Columns" attribute specified?

Thanks

@corycundy
Author

@albertospelta, thanks for the quick response. I used 3 for the --column-batch-size when I received that error. Here is the command line with pathing and connection details modified.

C:\Users\Cory.Cundy>vpax export "C:\Temp\ModelName.vpax" "Provider=MSOLAP;Data Source=powerbi://api.powerbi.com/v1.0/myorg/WorkspaceName;Initial Catalog=ModelName" --column-batch-size 3
Extracting VPAX metadata...Unhandled exception: Microsoft.AnalysisServices.AdomdClient.AdomdErrorResponseException: Query (1, 10) Too few arguments were passed to the UNION function. The minimum argument count for the function is 2.

As far as Group By Columns, it turns out that there was one single column in the model that did have a Group By Column assigned.

Cory

@otykier
Collaborator

otykier commented Aug 9, 2024

@albertospelta we saw a similar error to Cory's when reducing the column batch size below 5.

[screenshot: error message]

I'm guessing a check is needed somewhere, to ensure that UNION is only added when the column batch contains 2 or more columns.
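That check could look like the following sketch (a hypothetical query builder in Python for illustration; the real library is .NET, and this is not the shipped fix). DAX's UNION requires at least two table arguments, so a single-column batch must emit a plain EVALUATE ROW(...) instead of wrapping it in UNION.

```python
# Sketch of the suggested guard (hypothetical builder, not the actual fix):
# only wrap the batch in UNION when it contains two or more columns.

def build_batch_query(columns):
    """columns: list of (table, column) pairs forming one batch."""
    rows = [
        f'ROW("Table", "{t}", "Column", "{c}", "Cardinality", DISTINCTCOUNT(\'{t}\'[{c}]))'
        for t, c in columns
    ]
    if len(rows) >= 2:
        return "EVALUATE UNION(" + ", ".join(rows) + ")"
    return "EVALUATE " + rows[0]  # single row: no UNION wrapper needed

print(build_batch_query([("Date", "Update Timestamp")]))
```

With a batch size of 3 and, say, 10 columns, the last batch contains a single column, which is exactly the case that triggered the "Too few arguments were passed to the UNION function" error.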

@albertospelta
Collaborator

@otykier @corycundy The NuGet package version 1.7.1 and the CLI tool version 1.2.2 have just been released and include a fix for the incorrect UNION syntax. Please let me know if this resolves the issues, thanks!

@corycundy
Author

@albertospelta, my testing on both issues has been successful. I am no longer getting either error.

One other observation: similar logic is used in the LoadTableStatistics method, where 50 is also hard coded:
var loopTables = tableList.SplitList(50);
This is not causing an issue for me. My guess is that because the generated query uses COUNTROWS rather than DISTINCTCOUNT, it is often a metadata-only operation (unless using DirectQuery), since no SE queries are generated for Import and Direct Lake models in my limited testing.
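For comparison, here is a sketch of what the analogous batched table-statistics query might look like (the query shape and the "RowsCount" label are assumptions based on the description above, not the library's actual output): one COUNTROWS row per table, which the engine can often answer from metadata rather than storage-engine scans.

```python
# Sketch (assumed query shape, hypothetical builder) of the LoadTableStatistics
# pattern: one COUNTROWS row per table, batched the same way as the columns.

def build_table_stats_query(tables):
    """tables: list of table names forming one batch."""
    rows = [f'ROW("Table", "{t}", "RowsCount", COUNTROWS(\'{t}\'))' for t in tables]
    if len(rows) >= 2:
        return "EVALUATE UNION(" + ", ".join(rows) + ")"
    return "EVALUATE " + rows[0]

print(build_table_stats_query(["Date", "Sales"]))
```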

Thanks again for the quick turnaround time on this!

@otykier
Collaborator

otykier commented Aug 15, 2024

@corycundy happy to let you know that the new Column Batch Size option is now available in Tabular Editor 3.17.0, using the latest VertiPaq Analyzer NuGet (1.7.1).

@albertospelta
Collaborator

I am closing this as the new parameter has been added to the settings in both Tabular Editor 3 and DAX Studio. We have created a dedicated issue here to track the suggestion for adding a new parameter for the number of tables as well. Thanks @corycundy
