Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate Squid #369

Open
ggevay opened this issue Jun 12, 2018 · 3 comments
Open

Integrate Squid #369

ggevay opened this issue Jun 12, 2018 · 3 comments

Comments

@ggevay
Copy link
Contributor

ggevay commented Jun 12, 2018

Edit: We have resolved some more issues!

It would be really good to integrate Squid into our compiler pipeline, because its quasiquote-based API would probably make it easier to write tree transformations. I tried to do it [1], but unfortunately there were a number of issues. You can see the details of the issues in comments in the code [1], but here is a list:

Resolved issues:

  • Squid expects the tpt of ValDefs to be filled, but we don't fill them [3] (resolved by an extra transform pass (addValAndVarDefTpts) before passing the tree to Squid)
  • Implicit modifiers were falling off from ValDefs, but Lionel fixed this [6].
  • Generic inner classes were not working, but Lionel fixed this [7].
  • Squid expects implicits to be already inferred [4], but we can't really infer them before passing code to Squid. The solution suggested in [4] works (see addImplicitPlaceholders).
  • Path-dependent types were not supported by Squid [5]. This is an issue, because, e.g., TypeTag is path-dependent on the universe. Lionel fixed this in Squid.
  • The tests in emma-examples-* and emma-lib-* (e.g. TransitiveClosureFlinkSpec) rely on accesses to local variables, which is not supported by Squid by default. However, the solution suggested by Lionel in the below comment solved this issue.
  • DefDefs are not supported in Squid, which is a problem, because we represent control flow with DefDefs (DSCF). I resolved this by doing a dscfInv/dscf before/after Squid.
  • this references to enclosing classes or objects in LogRegFlinkSpec, SGDFlinkSpec, LinRegFlinkSpec, StatsFlinkSpec (and many other unit tests). Resolved by the workaround in Integrate Squid #369 (comment) (2bd6cbc)

Not yet resolved issues:

Probably Emma bugs:

  • TransitiveClosureFlinkSpec fails with squid.quasi.EmbeddingException$Unsupported: Unsupported feature: Reference to local method 'doWhile$m2'. But dscfInv should have removed this, so this is probably a bug in dscfInv.

Probably Squid missing features:

  • Existential types in NaiveBayesFlinkSpec, KMeansFlinkSpec, TokenizeFlinkSpec. The most prevalent is Array[_], which appears due to implicitly resolving CanBuildFrom in code such as line.split(",").map(_.toDouble).
    [2] says that "using type aliases may circumvent this limitation".
    We also have x$1.label.type forSome { val x$1: org.emmalanguage.lib.ml.LDPoint[Long,Double] }, but I'm not really sure how this type appears.
  • [minor] squid.quasi.EmbeddingException$Unsupported: Unsupported feature: org.emmalanguage.api.FlinkDataSet.memoizeTypeInfo[String](implicitly[scala.reflect.runtime.universe.TypeTag[String]], org.apache.flink.api.scala.`package`.createTypeInformation[String])
    This issue is not really a big problem, as these calls come in only near the end of our pipelines, so we can just avoid using Squid at that stage.

Probably Squid bugs (or maybe we give Squid some invalid trees):

  • EvalFlinkSpec, LogRegFlinkSpec, SGDFlinkSpec, LinRegFlinkSpec, BaseCodegenIntegrationSpec.'Updated tmp sink' fail with an undefined variable issue. In EvalFlinkSpec, in the tree that we give to Squid we have a var named anf$m25, and a reference to it somewhere. Squid seems to "rename" this var to anf$m25$m1 but doesn't change the reference. In the other test, the variable get renamed from p$p$r2 to p$p$r2$r1.
  • [minor] NGramsFlinkSpec fails with
java.lang.Error: bad constant value: String of class class scala.reflect.internal.Types$ClassNoArgsTypeRef
	at scala.reflect.internal.Constants$Constant.<init>(Constants.scala:52)
	at scala.reflect.internal.Constants$Constant$.apply(Constants.scala:34)
	at scala.reflect.internal.Constants$Constant$.apply(Constants.scala:272)
	at squid.ir.ScalaTyping$class.constType(ScalaTyping.scala:107)
	at squid.ir.SimpleAST.constType(SimpleAST.scala:20)
<omitted many more lines>
  • [minor] BaseCodegenIntegrationSpec.CSV fails with
(List(),Stream(java.lang.String*)) (of class scala.Tuple2)
scala.MatchError: (List(),Stream(java.lang.String*)) (of class scala.Tuple2)
	at squid.quasi.ModularEmbedding.squid$quasi$ModularEmbedding$$processArgs$1(ModularEmbedding.scala:337)
<omitted many more lines>

TODO:

  • We should consider Lionel's suggestion to implement the Base interface:
    "Note that I'm still of the opinion (as expressed earlier) that in your case, it would be more appropriate to not convert Emma's IR into a Squid IR, but instead to provide a new implementation of the Base interface so Squid can use Emma's own IR behind the scenes. Have you considered that? It represent a tighter coupling with Squid, but I could help with making it work and evolve as needed."
  • An even more extensive test of the integration would be to call Squid in compileSpecPipelines. (Or just comment in the testSquid line in Compiler.scala)

Other notes:

  • There is a list of Scala features that Squid supports [2].

[1] https://github.com/ggevay/emma-1/tree/squid
[2] https://github.com/epfldata/squid/blob/master/doc/reference/Quasiquotes.md#supported-scala-features
[3] #234
[4] epfldata/squid#55 (comment)
[5] epfldata/squid#57 (comment)
[6] epfldata/squid#55
[7] epfldata/squid#56

@LPTK
Copy link

LPTK commented Jun 12, 2018

Note: to support local variables you can always override ModularEmbedding's unknownFeatureFallback method, as is done exactly here.

@LPTK
Copy link

LPTK commented Jun 13, 2018

The last "missing feature" issue you mention (squid.quasi.EmbeddingException$Unsupported: Unsupported feature: org.emmalanguage.api.FlinkDataSet.memoizeTypeInfo[String](..., ...)) is strange. In principle what's inside the arguments should not be the problem, otherwise the error would have been reported for them. There is apparently something specific and unhandled going on with the memoizeTypeInfo call.

this references to enclosing classes or objects in LogRegFlinkSpec, SGDFlinkSpec, LinRegFlinkSpec, StatsFlinkSpec (and many other unit tests).

There is also a (unfortunately rather hacky) workaround for this, similar to the one for local variables; see here.

[minor] EvalFlinkSpec and BaseCodegenIntegrationSpec.'Updated tmp sink' fail with an undefined variable issue. In EvalFlinkSpec, in the tree that we give to Squid we have a var named anf$m25, and a reference to it somewhere. Squid seems to "rename" this var to anf$m25$m1 but doesn't change the reference.

Strange. Perhaps due to the variable having $ in the name, about which Scala's freshName implementation may be confused? (Not too likely, though.)

[minor] BaseCodegenIntegrationSpec.CSV fails

Nice catch! Seems to be due to the vararg being a Java vararg (differently encoded than Scala 😅)
which we omitted to handle.

Don't hesitate to open issues on the Squid tracker for this kind of bugs 🙂

@ggevay
Copy link
Contributor Author

ggevay commented Jun 13, 2018

OK, I resolved the this issue by mostly copy-pasting the solution that you linked above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants