Just as Apache Solr has the Java client SolrJ, which offers a native Java interface to add, update, and query a Solr index, ElasticSearch has a Java client with which you can perform index operations, search queries, and administrative tasks.
A typical use case is connecting to a remote ElasticSearch cluster from a Java web application. To use the Java client for ElasticSearch, you need to choose between the “Node Client” and the “Transport Client”. First, let’s look at their definitions.
The Node Client is actually a node within the cluster, but one that you configure to hold no data and to be ineligible to become master. (This is simple to configure: set node.client to true, or set both node.data and node.master to false.) Because it is a node, it knows the entire cluster state and configuration, such as which shards live on which nodes and where all the nodes are. This means that when executing a search query it can contact the right node directly, saving one network hop.
The Transport Client connects to the cluster remotely using the transport module (the layer that is also used for inter-node communication within the cluster). It is an outsider to the cluster, i.e. it does not join the cluster, but when it is initialized it is given one or more initial nodes to start from. If you set “client.transport.sniff” to true, the client will sniff the rest of the cluster and add all the nodes to its list of machines to use.
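To make the effect of sniffing concrete, here is a small, library-free model of it. This is only a sketch under simplifying assumptions: `SniffingSketch`, `nodesToUse` and the node addresses are hypothetical names and are not part of the ElasticSearch API. The real client discovers nodes over the transport protocol, which is not modelled here.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the Transport Client's node list; not ElasticSearch code.
final class SniffingSketch {

    /**
     * Returns the nodes the simulated client would send requests to.
     *
     * @param seedNodes    addresses handed to the client at start-up
     * @param clusterNodes addresses the cluster would report if asked (assumed input)
     * @param sniff        models the "client.transport.sniff" setting
     */
    static List<String> nodesToUse(List<String> seedNodes,
                                   List<String> clusterNodes,
                                   boolean sniff) {
        // LinkedHashSet keeps insertion order and drops duplicate addresses.
        Set<String> known = new LinkedHashSet<>(seedNodes);
        if (sniff) {
            // With sniffing on, every node reported by the cluster is added.
            known.addAll(clusterNodes);
        }
        return new ArrayList<>(known);
    }
}
```

With a single seed node and sniffing off, the model returns only that seed. With sniffing on, it returns every node in the simulated cluster, which is why one or two seed addresses are enough to start with.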
The third way to connect to ElasticSearch is to use the REST API for all operations. There are a number of REST-based clients written by the community for different languages, such as Perl, Python, Ruby, PHP, .NET, and Node.js. There is one for Java applications, called JEST, which provides a POJO marshalling mechanism for indexing and queries.
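For a feel of what the REST route looks like without any client library, here is a minimal sketch that uses only the JDK. It assumes a cluster reachable over HTTP (9200 is the default HTTP port) and posts a match query to the `_search` endpoint. The class name `RestSearchSketch`, the index name `articles` and the field `title` are hypothetical, and this is not how JEST itself is implemented.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of any ElasticSearch client.
final class RestSearchSketch {

    /** Builds the search endpoint URI for one index, e.g. http://host:9200/articles/_search. */
    static URI searchUri(String baseUrl, String index) {
        return URI.create(baseUrl + "/" + index + "/_search");
    }

    /** Builds a minimal match query. No JSON escaping is done, so pass plain words only. */
    static String matchQuery(String field, String text) {
        return "{\"query\":{\"match\":{\"" + field + "\":\"" + text + "\"}}}";
    }

    /** Sends the query over HTTP and returns the raw JSON response as a string. */
    static String search(String baseUrl, String index, String queryJson) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) searchUri(baseUrl, index).toURL().openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setDoOutput(true);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(queryJson.getBytes(StandardCharsets.UTF_8));
            }
            try (InputStream in = conn.getInputStream()) {
                // Read the whole response body into memory.
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buffer.write(chunk, 0, n);
                }
                return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

A call such as `RestSearchSketch.search("http://localhost:9200", "articles", RestSearchSketch.matchQuery("title", "elasticsearch"))` would return the response as a JSON string, which you then have to parse yourself. That parsing and mapping step is the kind of work a client like JEST takes off your hands.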
Overview of the three clients
The following diagram illustrates how the three types of clients are used in various applications.
- Java web application 1 uses the “node client”, which is part of the cluster (it knows how to route your queries and does not need another hop). It talks to ElasticSearch via the native protocol.
- Java web application 2 uses the “transport client”, which is not part of the cluster. It talks to ElasticSearch via the native protocol.
- Web application 3 uses a “REST client” to talk to ElasticSearch via the RESTful interface over HTTP.
Comparing the Node Client and the Transport Client
Each connection method has its own strengths and weaknesses. You must choose the one that best suits your requirements and environment.
Node Client: advantages
- Performance is better because queries are executed more efficiently: the node knows the cluster layout, so operations are automatically routed to the node(s) they need to be executed on.
- It moves some computation load away from the ElasticSearch cluster (for example, merging search results from different shards is done locally on the node client instead of on an ES data node).
- Less configuration is needed to connect to a cluster: just specify a cluster name, and the nodes will find each other and form the cluster. This makes deployment a bit easier.
Node Client: disadvantages
- Requires more resources (memory, threads, network traffic).
- Introduces dependencies (ElasticSearch/Lucene) into your project.
- Less secure, as your cluster is now wide open to your web application.
Transport Client: advantages
- Uses fewer resources than the Node Client.
- Starts faster, as it does not broadcast to or join the cluster.
- Does not start a node inside your application (the ElasticSearch client library, with its Lucene dependency, must still be on the classpath).
- A bit more secure, as it decouples your application from the cluster.
Transport Client: disadvantages
- Requires a bit more configuration, as you need to provide the address and port of at least one or two nodes for the Transport Client to start with.
- Operations are not as efficient as with the Node Client because most actions will probably be “two hop” operations. For example, when the data resides on Node 2 but the Transport Client hits Node 1 first, Node 1 must in turn route the request to Node 2, resulting in a “double hop”.
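The “double hop” is easy to see in a toy model. The sketch below is not ElasticSearch code: `HopCountSketch`, its methods and the node and document names are hypothetical. It also assumes that the Transport Client cycles through its known nodes round-robin, and it only counts hops for fetching a single document.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of request routing; not ElasticSearch code.
final class HopCountSketch {
    /** Which node holds each document (stands in for the cluster state). */
    private final Map<String, String> docToNode = new HashMap<>();
    /** Nodes the simulated Transport Client knows about. */
    private final List<String> nodes;
    /** Round-robin cursor (assumed node selection strategy). */
    private int next = 0;

    HopCountSketch(List<String> nodes) {
        this.nodes = nodes;
    }

    void place(String docId, String node) {
        docToNode.put(docId, node);
    }

    /** Node Client: it has the cluster state, so it contacts the owning node directly. */
    int hopsViaNodeClient(String docId) {
        if (!docToNode.containsKey(docId)) {
            throw new IllegalArgumentException("unknown document: " + docId);
        }
        return 1;
    }

    /** Transport Client: contacts the next known node, which forwards if it is not the owner. */
    int hopsViaTransportClient(String docId) {
        String contacted = nodes.get(next);
        next = (next + 1) % nodes.size();
        return contacted.equals(docToNode.get(docId)) ? 1 : 2;
    }

    public static void main(String[] args) {
        HopCountSketch cluster = new HopCountSketch(Arrays.asList("node1", "node2"));
        cluster.place("doc-42", "node2");
        System.out.println(cluster.hopsViaNodeClient("doc-42"));      // 1
        System.out.println(cluster.hopsViaTransportClient("doc-42")); // 2: node1 forwards to node2
        System.out.println(cluster.hopsViaTransportClient("doc-42")); // 1: this time node2 is contacted
    }
}
```

In this model the Node Client always pays one hop, while the Transport Client pays one hop only when it happens to contact the node that owns the data. The more nodes the cluster has, the less likely that becomes.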