Bag <> Matrix Transformations #195

Open
akunft opened this issue Apr 22, 2016 · 1 comment

akunft commented Apr 22, 2016

This issue should be used to discuss the transformation methods for bag <> matrix and bag <> vector.

The initial prototype and ongoing effort are tracked in PR #194.

I am focusing on matrix <> bag transformations for now; vector <> bag will follow.

In the description below, A is a type bound by spire.Numeric.

Bag to Matrix

  • DataBag[Product]
    Transforms a bag of Products into a matrix where the number of rows is the number of products in the bag and the number of columns is the arity of the product type. The elements of a product have to be of the same type and are bound by spire.Numeric (we have to cast explicitly here).
  • DataBag[Vector[A]]
    Analogous to DataBag[Product], except that we can enforce the numeric type in the signature.
  • DataBag[(Int, Int, A)] with explicit # rows and cols
    Transforms a bag of (rowIndex, colIndex, value) triplets into a (possibly) sparse matrix. The dimensions of the matrix are explicitly defined by the user.
  • DataBag[(Int, Int, A)] with implicit # rows and cols
    Analogous to the previous transformation, except that the number of rows and cols is defined by the largest row and column indexes among the triplets in the bag.
  • DataBag[Product] / DataBag[Vector[A]] with key and value extractor methods [not there]
    I would like to have these transformations, where the user can define two functions that extract the key and the value for each product/vector in the bag. While this is easy for vectors (as we have the numeric type), it gets very ugly for products, as the user has to work with raw products and is also responsible for casting the elements inside the functions.
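
To make the semantics above concrete, here is a minimal sketch and not the code from PR #194. Everything in it is an assumption for illustration: Seq stands in for DataBag, a dense row-major Array[Array[A]] stands in for the (possibly sparse) matrix, scala.math.Numeric replaces spire.Numeric so that the snippet has no dependencies, and the names BagToMatrixSketch, fromProducts, fromTriplets and fromTripletsInferred are hypothetical. Row and column indexes are assumed to be 0-based, so the inferred dimensions are the largest index plus one.

```scala
import scala.reflect.ClassTag

// Hypothetical stand-ins: Seq ~ DataBag, Array[Array[A]] ~ matrix,
// scala.math.Numeric ~ spire.Numeric.
object BagToMatrixSketch {

  // One row per product, one column per product element.
  // The elements are only known as Any, hence the explicit (unchecked) cast.
  def fromProducts[A: Numeric: ClassTag](bag: Seq[Product]): Array[Array[A]] = {
    val rows = bag.toIndexedSeq
    val cols = rows.headOption.map(_.productArity).getOrElse(0)
    Array.tabulate(rows.size, cols)((i, j) => rows(i).productElement(j).asInstanceOf[A])
  }

  // (rowIndex, colIndex, value) triplets with user-defined dimensions.
  // Cells that no triplet mentions keep the numeric zero.
  def fromTriplets[A: Numeric: ClassTag](
      bag: Seq[(Int, Int, A)], rows: Int, cols: Int): Array[Array[A]] = {
    val zero = implicitly[Numeric[A]].zero
    val m = Array.fill(rows, cols)(zero)
    bag.foreach { case (i, j, v) => m(i)(j) = v }
    m
  }

  // Same as above, but the dimensions come from the largest indexes in the bag
  // (assuming 0-based indexes).
  def fromTripletsInferred[A: Numeric: ClassTag](bag: Seq[(Int, Int, A)]): Array[Array[A]] = {
    val rows = if (bag.isEmpty) 0 else bag.map(_._1).max + 1
    val cols = if (bag.isEmpty) 0 else bag.map(_._2).max + 1
    fromTriplets(bag, rows, cols)
  }
}
```

For example, BagToMatrixSketch.fromTripletsInferred(Seq((0, 1, 5.0), (2, 0, 7.0))) yields a 3 x 2 matrix in this sketch. The cast in fromProducts is exactly the ugliness mentioned above: a product with mixed or non-numeric element types only fails at runtime.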

Matrix to Bag

  • matrix to DataBag[Vector[A]]
    Transforms a matrix to a bag of vectors.
  • matrix to DataBag[(Int, Vector[A])]
    Transforms a matrix to a bag of (rowIndex, Vector[A]).
  • matrix to DataBag[(B, Vector[A])] [not there]
    This seems interesting to me in case someone wants a labeled vector, where B can be an arbitrary type produced by, e.g., a function label[B]: (rowIdx: Int, v: Vector[A]) => B
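
The three variants above could look roughly like the following sketch. It uses the same stand-ins as assumptions (Array[Array[A]] for the matrix, Seq for DataBag); MatrixToBagSketch, rows, indexedRows and labeledRows are hypothetical names, and label is the user-supplied function from the last bullet.

```scala
// Hypothetical stand-ins: Array[Array[A]] ~ matrix, Seq ~ DataBag.
object MatrixToBagSketch {

  // matrix -> bag of row vectors
  def rows[A](m: Array[Array[A]]): Seq[Vector[A]] =
    m.toVector.map(_.toVector)

  // matrix -> bag of (rowIndex, row vector), with 0-based row indexes
  def indexedRows[A](m: Array[Array[A]]): Seq[(Int, Vector[A])] =
    rows(m).zipWithIndex.map { case (v, i) => (i, v) }

  // matrix -> bag of (label, row vector), where the label is derived
  // from the row index and the row itself
  def labeledRows[A, B](m: Array[Array[A]])(label: (Int, Vector[A]) => B): Seq[(B, Vector[A])] =
    indexedRows(m).map { case (i, v) => (label(i, v), v) }
}
```

With this shape, indexedRows is just labeledRows(m)((i, _) => i), which might be an argument for offering only the labeled variant plus a default label.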

Open Questions

  • Naming of the methods (sadly, type erasure prevents us from using the same method name for all variants)
  • Is there a need for additional transformation methods?
  • I would suggest moving the methods into matrix and bag once we have agreed on the API, instead of keeping them in a separate Transformations object.
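
On the naming question, one possible workaround is sketched below, under the same assumptions as before (Seq instead of DataBag, scala.math.Numeric instead of spire.Numeric, hypothetical names ErasureSketch and toMatrix). Two overloads that take Seq[Vector[A]] and Seq[(Int, Int, A)] plus an implicit Numeric[A] erase to the same JVM signature, so the compiler rejects them as a double definition. Adding a scala.DummyImplicit parameter to one of them changes its erased signature and lets both share a name. This is a common Scala idiom, not something the PR is known to use.

```scala
// Hypothetical stand-ins: Seq ~ DataBag, scala.math.Numeric ~ spire.Numeric.
object ErasureSketch {

  // Erased signature: toMatrix(Seq, Numeric)
  def toMatrix[A](bag: Seq[Vector[A]])(implicit num: Numeric[A]): Array[Array[Double]] =
    bag.map(_.map(num.toDouble).toArray).toArray

  // Without DummyImplicit this would also erase to toMatrix(Seq, Numeric) and
  // fail to compile; with it the erased signature is
  // toMatrix(Seq, Numeric, DummyImplicit).
  def toMatrix[A](bag: Seq[(Int, Int, A)])(
      implicit num: Numeric[A], d: DummyImplicit): Array[Array[Double]] = {
    // Dimensions inferred from the largest indexes, assuming 0-based indexes.
    val rows = if (bag.isEmpty) 0 else bag.map(_._1).max + 1
    val cols = if (bag.isEmpty) 0 else bag.map(_._2).max + 1
    val m = Array.ofDim[Double](rows, cols)
    bag.foreach { case (i, j, v) => m(i)(j) = num.toDouble(v) }
    m
  }
}
```

The trade-off is readability: distinct method names are more explicit at the call site, while the DummyImplicit trick keeps a single name at the cost of a slightly odd signature.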

Vector to Bag / Bag to Vector

Once I have an initial version of these transformations I'll add them here.


akunft commented May 4, 2016

Merged initial transformations in #194
