[Enhancement] Use a class loader to bring in dependency classes for a connector #6043
Comments
A 'fat jar' (a term I've used!) can be a bit of a misnomer, in that not necessarily all dependencies are included. For example, we can have the core Egeria platform as 'provided' and other dependencies as compile/runtime. That way anything in the core platform isn't replicated, and we only get the delta: the additional classes unique to the connector in question. To gain /ISOLATION/ (I think the key point here), we can shade these additional jars so that they don't clash, at the expense of duplication. This is an enforced BUILD TIME choice.

A custom class loader can make choices at runtime over where to source classes and what to load, so it is a lot more flexible. We would of course still need to decide on the algorithm used. We can emulate shading as we wish as part of the classloader. For example, we may have a connector-local lib path that we pick in preference and always use those classes. We'd also still want to package up dependencies, which can be done with the existing Maven/Gradle tools.

So:
classloader = more flexible, but many more choices; algorithm to be decided (what is the search path)
medium-sized shaded jar = build time only, more restrictive, though simple

I do think it's an excellent approach to evaluate. (Note that Spring uses its own custom classloader, so when we get to dynamically loadable OMASs, something I think we should ultimately do (vs connectors), there may be a difference in terms of component scan for the Spring REST binding?)
Maybe we first need to be clear on which jars / dependencies we are including under this proposed umbrella?
It would frankly be easier for someone who wants to use a connector to just build their own "fat jar", particularly as any new release of any of these components would mean going through the entire investigation loop again to figure out which libraries have changed (if any), which versions are tested to be interoperable, etc. Having the connector designer make these choices up front (and "lock them in" at build time) means that a connector user will typically just need to download a single file, which is much, much easier. (This doesn't prevent someone who wants to go to the trouble of managing all these jar files from taking the other approach, but it makes life easy for simple adopters rather than forcing every adopter to manage the nightmare of all of this complexity themselves.)

Then again, if we're talking about this classloader approach only for the Egeria dependencies that a connector has, I don't really understand why these are compiled into the connector in the first place (not simply marked
@cmgrote Thanks for these considerations. I will ask for more details from the people who suggested this.
Hello @cmgrote and @davidradl. The trouble starts because Apache Atlas uses dependencies that are already present in the Egeria base, but in different versions. So my idea was to have a custom classloader for each connector (in this case Atlas) that would allow external dependencies to be loaded without interfering with the main Egeria product.
Definitely we should do something to avoid the potential conflicts. I'm not an expert in this domain, but I would hope that in an ideal world any jar file (whether Apache Atlas or Egeria) that includes dependencies it doesn't own itself would have some way of making those dependencies specific to that jar file. My impression was that shading is a way to do this (@planetf1?).

What seems to be the problem here is that neither Apache Atlas nor Egeria has shaded its non-owned dependencies. As a result, those that overlap but at different versions cause a conflict.

While we could split things up so that some jars are included and others not, this doesn't sound very scalable to me: today we know what that is for Apache Atlas, but the overlap could be greater or smaller for any other technology / connector, and that overlap could change with each independently managed lifecycle of each technology. Trying to keep a handle on all of that over time sounds like it's going to be messy...

I think if either (or, in an ideal world, both) were to shade its dependencies, there would no longer be a conflict? I'd therefore be more in favour of being a "good citizen" ourselves and shading such dependencies, but I'm not sure what the broader implications of such an approach are...

(I'm mostly pushing in this direction thinking of the world of containers, where each container is intended to be a self-contained atomic "thing". If you have some overall solution that needs multiple components that each run on, say, Python, you just have Python installed inside each of those containers; you don't have some "shared" underlying Python installation that each container makes use of, they're entirely independent. For me, shouldn't these jar files be managed the same way?)
Thanks @carstenmichel @cmgrote. I see two approaches.
For a given connector, I think we should use approach 1) if we can; if there are more complex considerations, approach 2) should be used. By using approach 1) as much as we can, we reduce the number of connectors we need to handle with approach 2). Thoughts @cmgrote @carstenmichel @planetf1?
I don't know enough about shading, but my hopeful assumptions would be that it a) is something decided / designed to be used entirely at build time, and b) uses some kind of "namespace"-like concept to keep the shaded dependencies "unique". If (a) is correct, it has the major advantage of not placing any complexity or additional knowledge (downloads, path configuration, etc.) on the runtime (operator / user). If (b) is correct, then as long as we use a namespace that matches the connector name itself (or even its ConnectorProvider GUID), it is pretty well guaranteed to be unique (if not, there would be other clashes anyway). So while the approach places some extra effort on the developer, my sense is that it's worth it because it removes any extra effort for the operator / user. Again, if my assumptions are correct...
My understanding of shading is the same, @cmgrote. The Maven docs for the shade plugin at https://maven.apache.org/plugins/maven-shade-plugin/ give a little info, referring to renaming some or all of the bundled classes (so it can be partial). https://maven.apache.org/plugins/maven-shade-plugin/examples/class-relocation.html specifically refers to renaming, which 'moves' the packages over to a new, configurable package name.

We can use shading in part or in full, alongside other techniques such as 'provided' scope for the common Egeria dependencies (or anything else critical), to reduce the size of shaded jars. We would also need to use shading in our own jars (such as chassis).

The key issue with shading is that it is build-time, not runtime. This IS the biggie. We save on runtime and code complexity, but we will have less ultimate flexibility. Simplicity is appealing, though.
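To make the relocation idea concrete, here is a minimal Java sketch of what a single relocation rule does to class names. `Relocation` and `relocate` are hypothetical names, not the plugin's API: they only mirror the plugin's `pattern`, `shadedPattern` and `excludes` settings, treat excludes as plain prefixes (the plugin also accepts wildcards), and ignore the bytecode rewriting the real plugin performs. The connector-specific target package follows the namespacing suggestion above and is purely illustrative.

```java
import java.util.List;

// Hypothetical model of one shade-style relocation rule. It only rewrites
// class names; the real maven-shade-plugin also rewrites every bytecode
// reference to the relocated classes at build time.
final class Relocation {
    private final String pattern;        // package to move, e.g. "com.google.common"
    private final String shadedPattern;  // new, connector-specific package
    private final List<String> excludes; // class names or package prefixes to leave alone

    Relocation(String pattern, String shadedPattern, List<String> excludes) {
        this.pattern = pattern;
        this.shadedPattern = shadedPattern;
        this.excludes = List.copyOf(excludes);
    }

    // Returns the relocated class name, or the original name if this rule does not apply.
    String relocate(String className) {
        for (String excluded : excludes) {
            if (className.equals(excluded) || className.startsWith(excluded + ".")) {
                return className; // explicitly excluded: keep the original package
            }
        }
        // Match whole package segments only, so "com.google.commonx" is not caught.
        if (className.startsWith(pattern + ".")) {
            return shadedPattern + className.substring(pattern.length());
        }
        return className; // not under the relocated package
    }
}
```

For example, with pattern `com.google.common` and a hypothetical shaded pattern `org.example.atlasconnector.shaded.com.google.common`, the class `com.google.common.collect.ImmutableList` becomes `org.example.atlasconnector.shaded.com.google.common.collect.ImmutableList`, while `org.slf4j.Logger` is untouched. Because the renaming happens when the jar is built, the operator never sees it.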
With a custom class loader, we can do something similar to shading by allowing a dependency to pick up favoured classes from a configurable location, i.e. further up the search path, or via a new name. This means:
My current feeling is that the additional complexity of the dynamic classloader approach has not yet been argued strongly enough to outweigh the simplicity of the shading approach. The effort to enable shading + 'provided' is pretty low. Note also that we didn't initially do shading because we were considering shading all classes. By using 'provided' scope, we can reduce this to a much smaller set, meaning duplication is less of a concern.
Some interesting articles
I assume we are talking about the connector classloader mentioned here https://docs.oracle.com/cd/E19830-01/819-4721/beade/index.html |
Hi @carstenmichel,
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 20 days if no further activity occurs. Thank you for your contributions.
I'm going to reopen this. In recent months I've hit, and seen others hit, multiple conflicts with connectors. In particular, the HMS connector, with its complex and large dependency chain (50k classes), is a major offender.
But none of these are complete solutions. Spring classloading is more complex than the base JVM's, but does offer customization.

We need improved documentation for developers and operators, but I also think we need to look again at classpath isolation. Opening this and assigning it to myself for now.
The issue above seems common in application servers, or wherever plugin technologies are used. This is similar to Egeria's use of connectors.

Firstly, we need to place plugins outside the normal Spring/Java class loading path; this way Egeria/Spring code using the normal classloading approach won't find them.

We need a 'parent-last' classloader. This will load the required class by looking first in a custom location (configured by a property, i.e. lib or extralib). If it does not find it there, it will delegate up to the parent. Note that parent-last is at odds with the Java spec, where delegation starts from the root (i.e. the ultimate parent first), but it is how containers like Tomcat work.

When we load a connector, we use this new classloader. We also set the thread context classloader to it. Attempts to resolve a class within the connector resolve 'locally', i.e. within the connector's jar, and only for classes that cannot be found there do we defer to the parent, and hence the default behaviour.

Of course, if we then have many connectors and place them all in the same location, we could still get conflicts between the connectors. But we control that search in our code, so we'll look in an uber jar first, then in a connector-specific lib path (by name), before finally delegating to the parent.

It doesn't appear that Spring will stop us doing this, but the approach needs validation and implementation. It could be local within the connector framework.
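The parent-last search order described above could look roughly like the following Java sketch, assuming a plain `URLClassLoader` subclass is enough. It has not been validated against Spring's class loading, and Tomcat's real web-application loader is considerably more involved. `ParentLastClassLoader`, `searchPath`, `callWithContextLoader`, the `libRoot/<connectorName>` directory layout and the parent-first prefix list are all hypothetical.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical "parent-last" (child-first) loader for a single connector.
// Classes are looked up on the connector's own search path first; the parent
// (Egeria/Spring) loader is only asked when they are not found there.
class ParentLastClassLoader extends URLClassLoader {

    // Names that must always come from the parent, e.g. the shared connector
    // framework interfaces; otherwise the platform cannot cast the connector.
    private final List<String> parentFirstPrefixes;

    ParentLastClassLoader(URL[] searchPath, ClassLoader parent, List<String> parentFirstPrefixes) {
        super(searchPath, parent);
        this.parentFirstPrefixes = List.copyOf(parentFirstPrefixes);
    }

    // Search path in the order described above: the connector's (uber) jar
    // first, then every jar in libRoot/connectorName, sorted by file name.
    static URL[] searchPath(Path connectorJar, Path libRoot, String connectorName) throws IOException {
        List<URL> urls = new ArrayList<>();
        urls.add(connectorJar.toUri().toURL());
        Path connectorLib = libRoot.resolve(connectorName);
        if (Files.isDirectory(connectorLib)) {
            List<Path> jars;
            try (Stream<Path> entries = Files.list(connectorLib)) {
                jars = entries.filter(p -> p.toString().endsWith(".jar")).sorted().collect(Collectors.toList());
            }
            for (Path jar : jars) {
                urls.add(jar.toUri().toURL());
            }
        }
        return urls.toArray(new URL[0]);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null && !isParentFirst(name)) {
                try {
                    c = findClass(name); // 1) connector-local search path
                } catch (ClassNotFoundException notLocal) {
                    // not packaged with the connector: fall through to the parent
                }
            }
            if (c == null) {
                c = super.loadClass(name, false); // 2) standard parent-first delegation
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    @Override
    public URL getResource(String name) {
        URL local = findResource(name); // resources follow the same local-first order
        return local != null ? local : super.getResource(name);
    }

    // Runs work with this loader as the thread context class loader, then restores the previous one.
    <T> T callWithContextLoader(Callable<T> work) throws Exception {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(this);
        try {
            return work.call();
        } finally {
            current.setContextClassLoader(previous);
        }
    }

    private boolean isParentFirst(String className) {
        if (className.startsWith("java.")) {
            return true; // the JVM refuses to define java.* classes in an application loader
        }
        for (String prefix : parentFirstPrefixes) {
            if (className.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

A connector might then be created inside `callWithContextLoader` on a loader built from `searchPath(connectorJar, libRoot, "hms")`, passing a parent-first list such as `"org.odpi.openmetadata."` (an assumption about where the shared framework lives). That list matters: if a connector jar also bundled the framework interfaces, loading them locally would give the connector a different `Class` from the one the platform casts to, and the cast would fail.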
This may be an opportunity to also add some additional audit logging at the time, depending on what we can see |
I likely will not be able to make this change now, but please do get in touch if you wish to discuss any ideas. I do think it would be valuable as more varied connectors are developed. |
I agree on its value and have added it to my list ... |
Is there an existing issue for this?
Current Behavior
We have talked about creating a fat jar for connectors. We no longer favour this approach; instead, we want connectors to supply minimal jar files and find their dependent jars in the environment.
Expected Behavior
Have a custom class loader that loads the classes for a connector; those classes should only be used by that connector and would not affect any other connector. This seems to be a pattern that has been used successfully in the past.
There might still be a need for shading if the connector classes or their dependencies clash with the Egeria classes, but there would be no need to resolve dependencies between connectors.
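The isolation described here comes from giving each connector its own loader instance: loaders that share only a parent never see each other's classes or resources. A minimal Java sketch, assuming one directory or jar per connector and a standard (parent-first) `URLClassLoader`; `ConnectorLoaderRegistry` and `loaderFor` are hypothetical names, not part of Egeria's connector framework.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry that gives every connector its own class loader.
// Sibling loaders never delegate to each other, so a library on one
// connector's path is invisible to all other connectors; they share only
// the parent (Egeria platform) loader.
class ConnectorLoaderRegistry implements AutoCloseable {

    private final ClassLoader platformLoader;
    private final Map<String, URLClassLoader> loaders = new ConcurrentHashMap<>();

    ConnectorLoaderRegistry(ClassLoader platformLoader) {
        this.platformLoader = platformLoader;
    }

    // Returns the connector's loader, creating it on first use from connectorPath.
    // Later calls for the same name reuse the cached loader (connectorPath is ignored).
    ClassLoader loaderFor(String connectorName, Path connectorPath) {
        return loaders.computeIfAbsent(connectorName, name -> {
            try {
                URL[] urls = { connectorPath.toUri().toURL() };
                return new URLClassLoader(name + "-connector", urls, platformLoader);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    // Closes every connector loader, e.g. when the platform shuts down.
    @Override
    public void close() throws IOException {
        IOException first = null;
        for (URLClassLoader loader : loaders.values()) {
            try {
                loader.close();
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        loaders.clear();
        if (first != null) {
            throw first;
        }
    }
}
```

Because these loaders are parent-first, anything the Egeria platform itself provides still wins over a connector's copy. That is exactly the remaining case where shading (or a parent-last loader) would still be needed.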
Alternatives
Leave as is, use shading, or use fat jars.
Any Further Information?
No response
Would you be prepared to be assigned this issue to work on?