[FEATURE] Get rid of the dependency on OpenSearch core
Is your feature request related to a problem?
The opensearch-java library uses RestClient from OpenSearch core, bringing in the kitchen sink and causing problems such as #156.
What solution would you like?
Get rid of the dependency.
@reta and @dlvenable - I'd like your thoughts on the problem space, the downsides, and potential solutions.
The core problem I see is that this dependency couples Java clients to the server implementation. The Java version does not impact Python or JS clients for example. But, the update the JDK 11 in core forced an update on clients. OpenSearch core should be able to update JDK versions based on the needs of system administrators and developers, not based on the needs of clients. Similarly, client developers should be able to update their JDK versions based on their needs and the needs of more complicated projects.
The low-level REST client from OpenSearch core is mostly a wrapper around Apache HTTP. It adds some logic that is relevant for communicating with an OpenSearch cluster in general, such as load balancing, error/warning handling, node selection controls.
I see a few options:
- Remove the dependency from
opensearch-javaentirely. - Invert the dependencies. That is, make
OpenSearchdepend onopensearch-java. I think this approach requires thatopensearch-javacontinue to export the low-level REST client as its own library so that core does not need the high-level APIs.
Option 1) Upsides:
- Fixes the core issue.
- Does not create a dependency between core and the client.
Some downsides:
- We duplicate some of the same logic in both projects.
- Client developers must start a migration path
Option 2) Upsides:
- Fixes the core issue.
- This could produce the same library in Maven Central that client developers use. Thus, client developers can continue to use the same setup they have been using.
- No code duplication.
Some downsides:
- OpenSearch core will not depend on opensearch-java
I agree with @dlvenable statement that
The low-level REST client from OpenSearch core is mostly a wrapper around Apache HTTP. It adds some logic that is relevant for communicating with an OpenSearch cluster in general, such as load balancing, error/warning handling, node selection controls.
It indeed does not depend on anything from the core. I would like to suggest a third option:
3) Move off RestClient to separate repo (+ test module as well), for the reasons it is not dependent nor on core nor on opensearch-java
Pros:
- not attached to core or
opensearch-java - changed very rarely
- own release cycle
- no lockstep update for core and
opensearch-java(each can do that at own pace)
Cons:
- may need core and
opensearch-javareleases in case of bugfixes
The option 1) is also on the table but I would certainly be against of option 2): the core should have no deps on opensearch-java at any time or circumstances (my opinion).
I'm good with Option 3 as well.
To clarify, my option 2 was really the same from a module perspective. I was proposing that opensearch-java would produce two jars. They would be developed in the same Git repo and released together. From a module perspective core would not really be affected since it would have the same dependency chain.
I have a few other thoughts on just removing the dependency.
Clarity - We can make a clearer story for client developers. Right now, there are three Java clients. In my opinion, this is a confusing aspect of OpenSearch for end-users, especially client developers. I also experienced something similar when I was using spring-data-elasticsearch and ES 7. I was unsure what client I should be using. If we just remove this dependency, then we have a single client to offer to client developers. This should reduce friction for getting started.
Client Configuration - Using the opensearch-java client has one main transport - RestClientTransport (there is a really nice AWS one, but that is only for some situations). By dropping the RestClient, opensearch-java could instead offer options to use either Apache HTTP Client or Netty. This can help reduce dependencies for clients. Say the client is already using Netty, they can remove an unnecessary Apache dependency and vice-versa.
Thanks @dlvenable, I think there is an agreement to recommend opensearch-java as the preferred option [1]. The RestClient is used in many plugins internally (in core) and very likely outside as well (fe security), so I think it still makes sense to reuse it (but that is arguable for sure).
By dropping the RestClient, opensearch-java could instead offer options to use either Apache HTTP Client or Netty.
You can do that now with OpenSearchTransport, one of them happens to be RestClientTransport based on RestClient as you mentioned, so nothing is preventing to offer other options (if it makes sense).
[1] https://github.com/opensearch-project/OpenSearch/issues/4256#issuecomment-1241339946
I think there is an agreement to recommend opensearch-java as the preferred option
This is what I understand as well. I'm suggesting that we go one step further and not even offer these other options externally. These clients could potentially be viewed as internal components. Dropping this dependency helps with that.
You can do that now with OpenSearchTransport, one of them happens to be RestClientTransport based on RestClient as you mentioned, so nothing is preventing to offer other options (if it makes sense).
This is true as well. But we can tighten up the transport concept some by dropping the low-level client. As a client developer I don't know what I'm getting if I choose RestClientTransport versus NettyClientTransport. Specifically, what client does RestClientTransport provide? It would be clearer to provide either a NettyClientTransport or an ApacheClientTransport.
Also, what about the basic cluster management that RestClient provides? Do we drop that in a possible NettyClientTransport.
I think we derailed a bit from the subject (but in a good sense), just to sum up the on the RestClientTransport, since it is relevant: the RestClient is at 95% wrapper around Apache HttpClient 5 (or 4 for 2.x) but there are quite a few things that needed to be implemented to make Apache HttpClient 5 work (there is no way around it). This part I would like to avoid duplicating and we will have to even if we rebrand RestClientTransport and ApacheClientTransport and drop RestClient dependency.
Why should the java client be any different from JS or Python client? It shouldn't. If code duplication is to be had, so what? Either way, the long term plan to avoid code duplication is https://github.com/opensearch-project/opensearch-clients/issues/19 which clarifies the relationship between clients and server, all while avoiding duplication for all clients.
I would merge a PR that removes the dependency by duplicating whatever code needs to be duplicated and bumping the major version and creating an UPGRADING.md that explains what to do. It could also restore support for JDK 1.8. Hint ... hint :)
Why should the java client be any different from JS or Python client? It shouldn't. If code duplication is to be had, so what?
I think the main difference is that the core uses java client internally (in plugins / modules / tests fe).
I would merge a PR that removes the dependency by duplicating whatever code needs to be duplicated and bumping the major version and creating an UPGRADING.md that explains what to do. It could also restore support for JDK 1.8. Hint ... hint :)
I will prototype that
Why should the java client be any different from JS or Python client? It shouldn't.
Agreed
If code duplication is to be had, so what?
It would be nice to avoid it. But, I think clarity to the end-users is more valuable. So I think code duplication is a good trade-off here.
I would merge a PR that removes the dependency by duplicating whatever code needs to be duplicated and bumping the major version and creating an UPGRADING.md that explains what to do. It could also restore support for JDK 1.8. Hint ... hint :)
I think a migration path could be helpful there. That is opensearch-java 2.x adds the duplicated code as a new transport. But it retains the old dependency as an optional transport. Then in the major version 3.x there is no option to use the old dependency.
@dlvenable what is a scenario in which someone benefits from keeping the old dependency as an optional transport?
@wbeckler , I only think it is useful for a smoother migration.
Minor release (say 2.2) - Adds the new transport and deprecates the old Major release (say 3.0) - Removes the old transport.
A client developer who uses the library has an opportunity in 2.2 to make these changes before they are removed. They see the deprecation and they have some time to migrate. The alternative is that they want to update to 3.0 and suddenly they have to make the change then.
Why does removing the dependency affect a user of opensearch-java when we're duplicating the old transport? Is there a use of the client that depends on the source of the transport dependency? I'm asking because I'd like to find a way to remove the core dependency and avoid the major version bump.
+1 @wbeckler if there's no end-user change we can avoid the major version increment, it looks like we're not even changing the namespace
Why does removing the dependency affect a user of opensearch-java when we're duplicating the old transport?
@dblock @wbeckler this is a tricky question: we could duplicate old transport as-is, which basically means we would have at least 2 RestClient implementations, but we technically do not need to do that (see fe https://github.com/opensearch-project/opensearch-java/pull/281) and probably should not do that.
Is there a use of the client that depends on the source of the transport dependency?
We could keep the possibility to use RestClient from core (and not breaking the existing users) by making the rest-client dependency optional in 3.0.0 and still supporting it but not requiring.
I would pick restoring JDK 8 support over not doing a major version increment. I say @reta you should decide what's best and let us contribute with our invaluable thinking (TM). I think I'd merge any ready version of #281, including with breaking changes (and a major version increment).
I also opened https://github.com/opensearch-project/OpenSearch/issues/5424. Would love your comments @reta and @dlvenable!
I also opened opensearch-project/OpenSearch#5424. Would love your comments @reta and @dlvenable!
It makes sense to me: if we drop dependency on core, the plugins may migrate to opensearch-java as well, as such RestClient could be dropped at some point. Regarding the suggested plan, it might be a bit aggressive (added a note https://github.com/opensearch-project/OpenSearch/issues/5424#issuecomment-1334412578)