Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Unpack & Repack by reducing constant polynomials. #230

Open
fionser opened this issue Jul 10, 2018 · 3 comments
Open

Improve Unpack & Repack by reducing constant polynomials. #230

fionser opened this issue Jul 10, 2018 · 3 comments

Comments

@fionser
Copy link
Contributor

fionser commented Jul 10, 2018

Motivation

It seems in the recryption procedure, the conversion time from coefficient format to DoubleCRT is quite large. I hope to reduce the number of necessary plaintext polynomials, especially in Unpack & Repack procedure.

Method

  • My main idea is that the constant polynomials in the linear transform follows some algebraic property, i.e., alpha_j = frobenius(alpha_0, j). As a result, we only need to store the first poly alpha_0 and to compute alpha_j on-fly.
  • By using the above property, it seems we can also reduce the number of ctxt-plain multiplications in some linear transform procedures.

Example

Take the Unpack method defined in class unpack_pa_impl as an example.

static void apply(const EncryptedArrayDerived<type>& ea, const CtPtrs& unpacked,
                    const Ctxt&ctxt,const std::vector<zzX>& unpackSlotEncoding)
  {
    long d = ea.getDegree(); // size of each slot

    //ctxt.cleanUp();

    // Convert the unpack constants to doubleCRT
    std::vector< std::shared_ptr<DoubleCRT> > coeff_vector(d);
    for (long i = 0; i < d; i++) {
      coeff_vector[i] = std::shared_ptr<DoubleCRT>(new
        DoubleCRT(unpackSlotEncoding[i],ctxt.getContext(),ctxt.getPrimeSet()));
    }
    // Compute the d Frobenius automorphisms of ctxt (use multi-threading)
    std::vector<Ctxt> frob(d, Ctxt(ZeroCtxtLike, ctxt));
    NTL_EXEC_RANGE(d, first, last)
	for (long j = first; j < last; j++) { // process jth Frobenius 
        frob[j] = ctxt;
        frob[j].frobeniusAutomorph(j);
        frob[j].cleanUp();
        // NOTE: Why do we apply cleanup after the Frobenus?
	}
    NTL_EXEC_RANGE_END

    // compute the unpacked ciphertexts: the j'th slot of unpacked[i]
    // contains the i'th coefficient from the j'th clot of ctxt
    Ctxt tmp1(ZeroCtxtLike, ctxt);
    for (long i = 0; i < unpacked.size(); i++) {
        *(unpacked[i]) = frob[0];
        unpacked[i]->multByConstant(*coeff_vector[i]);
        for (long j = 1; j < d; j++) {
            tmp1 = frob[j];
            tmp1.multByConstant(*coeff_vector[mcMod(i + j, d)]);
            *(unpacked[i]) += tmp1;
        }
    } // NOTE: why aren't we using multi-threading here?
  }

The unpacksSlotEncoding here follows unpackSlotEncoding[ j ] = Frobenus(unpackSlotEncoding[0], j). My refined version should work as follows

    zzX unpackEncoding;
    std::vector<Ctxt> unpacks(d, ctx);
    DoubleCRT alpha0(unpackSlotEncoding[0], ctx.getContext(), ctx.getPrimeSet()); 
    for (long i = 0; i < d; ++i) {
        auto alpha{alpha0}; 
        {
            alpha.automorph(NTL::PowerMod(t, mcMod(i, d), m)); // frobenius 
        }
        unpacks[i].multByConstant(alpha); // we can take the multiplication first, since frobenius is commute to multiplication 
        Ctxt ans(unpacks[i]);
        for (long j = 1; j < d; ++j) {
            Ctxt tmp(unpacks[i]); // we can use hoisting here
            tmp.frobeniusAutomorph(j);
            ans += tmp;
        }
        unpacks[i] = ans;
    }
  • I have ran some codes and confirmed that my code can produce the same results as the unpack function.
  • We only need to store one constant polynomial, alpha0, and apply hoisting.
@fionser
Copy link
Contributor Author

fionser commented Jul 10, 2018

I am not 100% sure the math; if you have any idea please let me know.

@shaih
Copy link
Collaborator

shaih commented Jul 11, 2018

@fionser it will take me a little while to get to it. Is this something that you want integrated very soon, or is it okay if I look at it toward the end of the summer?

@fionser
Copy link
Contributor Author

fionser commented Jul 12, 2018

@shaih No problem. It is not a hurry. Also, I found my PR actually is NOT optimal for performance.
I found that, my PR do reduce the number of constant polynomials, and number of multiplications.
BUT, it increases the number of frobeniusAutomorph; which is very expensive even with hoisting...

We can exchange the order of the inner-loop and outer-loop (a similar logic to the current extractDigits implementation); at the cost of increasing the number of multiplications.

patrick-schwarz pushed a commit to patrick-schwarz/HElib that referenced this issue Dec 6, 2019
* Doxygen documentation for the Binary Arithmetic APIs

* New API and function testing code into GTest

* Example code on how to use the APIs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants