I propose adding a similar one-function workflow for three-dimensional modeling to your explore package. It would work much like the way you captured the essence of an entire R package (xgboost) in a single function!
There is an R package called svgViewR that allows a user to model multivariate data in three dimensions through HTML-based interaction. Each observation is reduced to a single value (its Euclidean distance from the centroid of the point cloud), which is then colorized along a gradient and plotted.
While this algorithm is remarkable, it would be even more compelling if both the model and the resulting plot were produced by ONE function. The complete algorithm can be found on page 15 of the current version of the svgViewR manual; svgViewR is available on CRAN.
To facilitate this effort, a colleague and I replicated what the svgViewR documentation calls pdist (the pair distance) as a separate function. This function, pairDist, is part of the quickcode package, which is also available on CRAN. Since using it would add a dependency to your code, it is up to you whether to use pairDist or the distance calculation shown in svgViewR.
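Whichever source is used, the underlying quantity is simple. Going by the description of the page-15 code (the distance from each point to the mean point of all points), a minimal base-R sketch could look like the following. The name `centroid_dist` is hypothetical; it is not a function from svgViewR or quickcode, and it accepts any number of numeric columns, not only three.

```r
# Hypothetical helper (not part of svgViewR or quickcode):
# Euclidean distance of every row of a numeric matrix/data frame
# from the centroid (column means) of all rows.
centroid_dist <- function(x) {
  x <- as.matrix(x)
  centre <- colMeans(x)
  # sweep() subtracts the centroid from each row; squares are then summed per row
  sqrt(rowSums(sweep(x, 2, centre)^2))
}

# Usage: four points whose centroid is (1, 0, 0)
pts <- rbind(c(0, 0, 0), c(2, 0, 0), c(1, 3, 0), c(1, -3, 0))
centroid_dist(pts)
# [1] 1 1 3 3
```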
I am envisioning that the output be represented as a list object which contains the following artifacts:
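A data frame containing the original variables used in the analysis, to which the following variables are appended:
pdist: the distance of each point from the centroid of all points
colHex: the hexadecimal color code assigned to the point from the gradient
color: the closest named R color for colHex
cluster: an integer code identifying each color group
If possible, the HTML plot itself, for example as a ggplot object (this may not be feasible)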
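To make the proposed output concrete, here is a minimal base-R sketch of what such a single function might return. The name `explore_3d`, its arguments, and the `plot = NULL` placeholder are hypothetical and not part of explore or svgViewR. The named-color lookup is a base-R stand-in for `DescTools::HexToCol()` (nearest entry of `colors()` in RGB space) and may not match its results exactly.

```r
# Hypothetical one-function sketch; not part of the explore package.
explore_3d <- function(data, n_colors = 50, palette_ends = c("red", "blue")) {
  # Distances use only the numeric columns; all columns are kept in the output
  num <- as.matrix(data[vapply(data, is.numeric, logical(1))])
  pdist <- sqrt(rowSums(sweep(num, 2, colMeans(num))^2))

  # Linear rescale of each distance onto positions 1..n_colors of the gradient
  grad <- grDevices::colorRampPalette(palette_ends)(n_colors)
  idx <- floor((n_colors - 1) * (pdist - min(pdist)) / diff(range(pdist))) + 1
  colHex <- grad[idx]

  # Stand-in for DescTools::HexToCol(): closest named colour in RGB space
  named <- grDevices::colors()
  named_rgb <- grDevices::col2rgb(named)
  nearest <- apply(grDevices::col2rgb(colHex), 2,
                   function(v) which.min(colSums((named_rgb - v)^2)))
  color <- factor(named[nearest])

  out <- data.frame(data, pdist = pdist, colHex = colHex, color = color,
                    cluster = as.integer(color))
  list(data = out, plot = NULL)  # plot: placeholder for the HTML/ggplot output
}

# Usage: three numeric columns plus a non-numeric label column
set.seed(1)
df <- data.frame(x = rnorm(100, sd = 3), y = rnorm(100, sd = 2),
                 z = rnorm(100, sd = 1), label = rep(letters[1:4], 25))
res <- explore_3d(df)
head(res$data)
```

The sketch does not guard against the case where every point has the same distance (`diff(range(pdist))` would be zero); a real implementation would need to handle that.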
The following code sketches how the four proposed columns could be appended to the data:
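library(DescTools)  # HexToCol(): hex code -> closest named R color
library(xlsx)       # write.xlsx(): optional Excel export
# Assumes points3d, pdist and col were created by the svgViewR example on page 15
points3d <- as.data.frame(points3d)
points3d$pdist <- pdist                                 # distance from the centroid
points3d$colHex <- col                                  # hex color from the gradient
points3d$color <- as.factor(HexToCol(points3d$colHex))  # closest named color
points3d$cluster <- as.integer(points3d$color)          # integer code per color group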
Optionally, the points3d data frame could also be exported to an Excel file, either always or only when a function argument requests it:
write.xlsx(x = points3d, file = "filepath/points3d.xlsx", row.names = FALSE, sheetName = "points3D")
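If the export is controlled by an argument, the logic can stay very small. The sketch below assumes a hypothetical helper `export_points()` with a `file` argument that defaults to `NULL` (no export). It uses base `write.csv()` as a dependency-free stand-in; swapping in the `write.xlsx()` call above would give the Excel variant at the cost of the xlsx dependency.

```r
# Hypothetical helper: return the augmented data and, only when `file`
# is supplied, also write it to disk. utils::write.csv() is used here as a
# dependency-free stand-in for xlsx::write.xlsx().
export_points <- function(points, file = NULL) {
  if (!is.null(file)) {
    utils::write.csv(points, file = file, row.names = FALSE)
  }
  invisible(points)
}

pts <- data.frame(x = c(0, 2), y = c(0, 0), z = c(0, 0), pdist = c(1, 1))
export_points(pts)                      # default: nothing is written
out_file <- tempfile(fileext = ".csv")
export_points(pts, file = out_file)     # writes a CSV file
```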
For full transparency, and having studied the svgViewR package and its code in detail, I have broken the code on page 15 down into the following steps:
Library Inclusion:
The code loads the svgViewR library, which is used to create interactive 3D scatter plots in SVG (Scalable Vector Graphics) format that are viewed in a web browser.
Data Generation:
Generates a matrix points3d with 300 rows and 3 columns.
Each column is populated with random numbers generated from normal distributions with different standard deviations (3, 2, and 1).
SVG Initialization:
Opens a new output file named 'plot_static_points.html' for writing.
Distance Calculation:
Computes the Euclidean distance from each point in points3d to the mean point of all points.
The distances are stored in the variable pdist.
Color Mapping:
Defines a color gradient from red to blue using colorRampPalette.
col_grad holds the gradient with 50 colors.
Color Assignment:
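Rescales each point's distance linearly onto the positions of col_grad (the smallest distance maps to the first color, red; the largest to the last, blue) and picks the corresponding color.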
The colors are assigned to the variable col.
SVG Plotting:
Plots the 3D points into the output file using svg.points.
The color of each point is determined by the previously calculated col.
SVG Frame Initialization:
Initializes an SVG frame for the 3D points using svg.frame.
SVG File Closing:
Closes the SVG file with svg.close().
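The computational steps above can be reproduced in base R. This is a reconstruction from the bullet points, not a copy of the svgViewR manual; the `set.seed()` call and the explicit `floor()` when turning distances into gradient positions are my own choices. The svgViewR drawing calls are left as comments so the sketch runs without that package installed.

```r
set.seed(42)  # added for reproducibility; not part of the original steps

# Data generation: 300 x 3 matrix, column standard deviations 3, 2 and 1
points3d <- cbind(rnorm(300, sd = 3), rnorm(300, sd = 2), rnorm(300, sd = 1))

# Distance calculation: Euclidean distance of each point from the mean point
centre <- colMeans(points3d)
pdist <- sqrt(rowSums(sweep(points3d, 2, centre)^2))

# Color mapping: 50-color gradient from red to blue
col_grad <- colorRampPalette(c("red", "blue"))(50)

# Color assignment: rescale distances linearly onto positions 1..50
idx <- floor((length(col_grad) - 1) * (pdist - min(pdist)) / diff(range(pdist))) + 1
col <- col_grad[idx]

# Plotting steps (require svgViewR; shown as comments only).
# Opening 'plot_static_points.html' comes first; see the svgViewR manual.
# svg.points(points3d, col = col)   # draw the colored points
# svg.frame(points3d)               # add the frame around the points
# svg.close()                       # close the output file
```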
In summary, this code generates a 3D scatter plot of 300 points with random coordinates, colors each point by its distance from the mean point, and saves the plot to the file 'plot_static_points.html'. Because the file is produced by svgViewR, it is interactive, allowing users to manipulate and explore the 3D plot in a web browser.
There is one more remarkable aspect to this approach: the generated HTML is a single self-contained file, which makes it incredibly easy to distribute!
I know there is a lot here, but, Roland, I believe that a single function able to model any number of numeric variables in three dimensions would be well worth the effort to create and add to the explore package. It would do for three-dimensional modeling what your explain_xgboost function did for feature engineering.
I can answer any questions you may have regarding this proposal.
Warmest regards,
Brice