Java Elasticsearch Integration: Advanced Search Implementation Guide with Code Examples

Learn Java Elasticsearch integration with real-world code examples. Master document indexing, advanced search queries, aggregations, and production-ready techniques. Get expert tips for building scalable search applications.

Feb 10, 2025

Integrating Elasticsearch into a Java application gives it full-text search, filtering, and aggregations. I’ll share the techniques I’ve used in production environments.

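The foundation starts with proper client configuration. The examples in this guide use the high-level REST client (RestHighLevelClient) from the 7.x line. Elastic deprecated it in 7.15 in favor of the Java API Client, so treat what follows as the 7.x approach and plan a migration for new projects: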

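import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

// Timeouts are in milliseconds.
RestHighLevelClient client = new RestHighLevelClient(
    RestClient.builder(new HttpHost("localhost", 9200, "http"))
        .setRequestConfigCallback(requestConfigBuilder ->
            requestConfigBuilder
                .setConnectTimeout(5000)     // give up quickly if the node is unreachable
                .setSocketTimeout(60000))    // wait up to 60 s for a response
    // setMaxRetryTimeoutMillis(...) was removed from RestClientBuilder in 7.0,
    // so it is omitted here.
);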

Indexing documents one at a time costs a network round trip per document. For large datasets I recommend the bulk API, which sends many documents in a single request:

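public class BulkIndexer {
    // Documents per bulk request; tune against document size and cluster capacity.
    private static final int BATCH_SIZE = 1000;

    // convertToMap and executeBulkRequest are helper methods not shown here.
    public void indexBatch(List<Document> documents) {
        BulkRequest bulkRequest = new BulkRequest();
        for (Document doc : documents) {
            bulkRequest.add(new IndexRequest("index-name")
                .id(doc.getId())
                .source(convertToMap(doc)));

            // Flush a full batch and start a new request.
            if (bulkRequest.numberOfActions() >= BATCH_SIZE) {
                executeBulkRequest(bulkRequest);
                bulkRequest = new BulkRequest();
            }
        }

        // Flush whatever is left over from the last partial batch.
        if (bulkRequest.numberOfActions() > 0) {
            executeBulkRequest(bulkRequest);
        }
    }
}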
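The batching logic above can be checked in isolation, without a cluster. The following is a minimal sketch in plain Java, assuming a hypothetical `flushInBatches` helper whose `sink` parameter stands in for `executeBulkRequest`; none of these names come from the Elasticsearch client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Stand-in for the batching loop; no Elasticsearch classes are involved. */
final class BatchingSketch {

    /**
     * Hands the items to the sink in batches of at most batchSize.
     * The sink is a hypothetical placeholder for sending one bulk request.
     * Returns the number of batches that were flushed.
     */
    static <T> int flushInBatches(List<T> items, int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int flushes = 0;
        List<T> buffer = new ArrayList<>(batchSize);
        for (T item : items) {
            buffer.add(item);
            if (buffer.size() >= batchSize) {
                sink.accept(buffer);
                buffer = new ArrayList<>(batchSize); // fresh buffer, like a new BulkRequest
                flushes++;
            }
        }
        // The last batch is usually partial and must not be forgotten.
        if (!buffer.isEmpty()) {
            sink.accept(buffer);
            flushes++;
        }
        return flushes;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            ids.add(i);
        }
        List<Integer> sizes = new ArrayList<>();
        flushInBatches(ids, 1000, batch -> sizes.add(batch.size()));
        System.out.println(sizes); // prints [1000, 1000, 500]
    }
}
```

Keeping the loop separate from the client call makes the edge cases (empty input, an exact multiple of the batch size, a trailing partial batch) easy to unit test.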

A good search query separates the clauses that should influence relevance from the ones that only restrict the result set. Here’s a search implementation I’ve used:

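// client.search declares IOException, so the method must declare or handle it.
public SearchResponse performSearch(SearchParams params) throws IOException {
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder()
        .query(QueryBuilders.boolQuery()
            // Scored clause: a title match is boosted to twice a content match.
            .must(QueryBuilders.multiMatchQuery(params.getQuery())
                .field("title", 2.0f)
                .field("content")
                .type(MultiMatchQueryBuilder.Type.BEST_FIELDS))
            // Filter clause: restricts results without affecting the score.
            .filter(QueryBuilders.termQuery("status", "active")))
        .from(params.getOffset())
        .size(params.getLimit())
        // Sort by relevance first; ties are broken by the newest timestamp.
        .sort("_score", SortOrder.DESC)
        .sort("timestamp", SortOrder.DESC);

    return client.search(new SearchRequest()
        .indices(params.getIndices())
        .source(sourceBuilder),
        RequestOptions.DEFAULT);
}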
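The `from` and `size` values in the search above come straight from the caller. By default Elasticsearch rejects a search whose `from + size` exceeds `index.max_result_window` (10,000), so it can help to validate them before building the request. A minimal sketch in plain Java, assuming a hypothetical `PageWindow` helper that is not part of the original code or of any Elasticsearch library:

```java
/** Hypothetical helper: validates paging values before they reach the search request. */
final class PageWindow {
    /** Elasticsearch's default value for index.max_result_window. */
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    final int from;
    final int size;

    private PageWindow(int from, int size) {
        this.from = from;
        this.size = size;
    }

    static PageWindow of(int offset, int limit) {
        return of(offset, limit, DEFAULT_MAX_RESULT_WINDOW);
    }

    /** Rejects offsets outside the window and trims the page so from + size fits inside it. */
    static PageWindow of(int offset, int limit, int maxWindow) {
        if (offset < 0 || limit < 0) {
            throw new IllegalArgumentException("offset and limit must be non-negative");
        }
        if (offset >= maxWindow) {
            throw new IllegalArgumentException(
                "offset " + offset + " is outside the result window of " + maxWindow);
        }
        int size = Math.min(limit, maxWindow - offset);
        return new PageWindow(offset, size);
    }

    public static void main(String[] args) {
        PageWindow page = PageWindow.of(9_990, 50);
        System.out.println(page.from + " " + page.size); // prints "9990 10"
    }
}
```

Trimming the last page is one possible policy; returning an error to the caller is another. For paging deeper than the window, the `search_after` parameter is the usual alternative to large offsets.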

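Aggregations summarize the matching documents instead of returning them. This example groups documents by category, then computes an average price and a monthly histogram within each category: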

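// client.search declares IOException, so the method must declare or handle it.
public AggregationResults analyzeData() throws IOException {
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder()
        .size(0) // return aggregation results only, no hits
        .aggregation(AggregationBuilders.terms("categories")
            .field("category.keyword")
            // Both sub-aggregations are computed per category bucket.
            .subAggregation(AggregationBuilders.avg("avg_price")
                .field("price"))
            .subAggregation(AggregationBuilders.dateHistogram("sales_over_time")
                .field("timestamp")
                .calendarInterval(DateHistogramInterval.MONTH)));

    // A SearchRequest without indices targets every index; pass index names to narrow it.
    SearchResponse response = client.search(new SearchRequest()
        .source(sourceBuilder),
        RequestOptions.DEFAULT);

    // processAggregations is a helper method not shown here.
    return processAggregations(response.getAggregations());
}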

Index management is essential for maintaining search performance. Shard count, replica count, refresh interval, and analyzers can all be set when the index is created:

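public class IndexManager {
    private final RestHighLevelClient client;

    public IndexManager(RestHighLevelClient client) {
        this.client = client;
    }

    // CreateIndexRequest here is org.elasticsearch.client.indices.CreateIndexRequest.
    // jsonBuilder() and create() both declare IOException.
    public void createIndexWithSettings(String indexName) throws IOException {
        CreateIndexRequest request = new CreateIndexRequest(indexName);
        request.settings(Settings.builder()
            .put("index.number_of_shards", 3)
            .put("index.number_of_replicas", 2)
            .put("index.refresh_interval", "1s")
            // Custom analyzer: standard tokenizer, then lowercase and ASCII folding.
            .put("index.analysis.analyzer.custom_analyzer.type", "custom")
            .put("index.analysis.analyzer.custom_analyzer.tokenizer", "standard")
            .putList("index.analysis.analyzer.custom_analyzer.filter",
                "lowercase", "asciifolding"));

        // Map the title field as analyzed text using the analyzer defined above.
        XContentBuilder mapping = XContentFactory.jsonBuilder()
            .startObject()
                .startObject("properties")
                    .startObject("title")
                        .field("type", "text")
                        .field("analyzer", "custom_analyzer")
                    .endObject()
                .endObject()
            .endObject();

        request.mapping(mapping);
        client.indices().create(request, RequestOptions.DEFAULT);
    }
}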

Queries driven by user input are best assembled clause by clause, adding only the conditions the user actually supplied:

// Renamed from QueryBuilder to avoid shadowing org.elasticsearch.index.query.QueryBuilder.
public class SearchQueryBuilder {
    public BoolQueryBuilder createSearchQuery(SearchCriteria criteria) {
        BoolQueryBuilder query = QueryBuilders.boolQuery();

        // Scored, typo-tolerant keyword match; title weighs most, then description.
        if (criteria.hasKeyword()) {
            query.must(QueryBuilders.multiMatchQuery(criteria.getKeyword())
                .field("title", 3.0f)
                .field("description", 2.0f)
                .field("content")
                .fuzziness(Fuzziness.AUTO));
        }

        // Exact-value filters; term queries expect keyword (non-analyzed) fields.
        if (criteria.hasFilters()) {
            criteria.getFilters().forEach((field, value) ->
                query.filter(QueryBuilders.termQuery(field, value)));
        }

        // Inclusive date range on both ends.
        if (criteria.hasDateRange()) {
            query.filter(QueryBuilders.rangeQuery("timestamp")
                .gte(criteria.getStartDate())
                .lte(criteria.getEndDate()));
        }

        return query;
    }
}

Error handling and resilience are crucial in production environments:

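public class ElasticsearchOperations {
    private static final int MAX_RETRIES = 3;
    private static final long RETRY_DELAY_MS = 1000;

    // Note: Supplier cannot throw checked exceptions, so callers must wrap the
    // IOException declared by client calls. Every ElasticsearchException is
    // retried here, including client errors that a retry will not fix.
    public <T> T executeWithRetry(Supplier<T> operation) {
        int attempts = 0;
        while (attempts < MAX_RETRIES) {
            try {
                return operation.get();
            } catch (ElasticsearchException e) {
                attempts++;
                if (attempts == MAX_RETRIES) {
                    throw e; // out of attempts: surface the last failure
                }
                try {
                    // Linear backoff: 1 s after the first failure, 2 s after the second.
                    Thread.sleep(RETRY_DELAY_MS * attempts);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    throw new RuntimeException(ie);
                }
            }
        }
        // Not reached in practice; required so the method compiles.
        throw new RuntimeException("Operation failed after " + MAX_RETRIES + " attempts");
    }
}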
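One limitation of a `Supplier`-based helper is that `Supplier.get()` cannot throw checked exceptions, while the client's calls declare `IOException`. The sketch below is one possible variant in plain Java, not the original author's code: it accepts a `Callable` and a caller-supplied predicate that decides which failures are worth retrying. The names `RetrySketch` and `callWithRetry` are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

/** Hypothetical retry helper with linear backoff and a retryability check. */
final class RetrySketch {

    /**
     * Calls the operation up to maxAttempts times. A failure is retried only
     * when retryable.test(e) is true; otherwise it is rethrown immediately.
     * The wait before attempt n + 1 is baseDelayMillis * n.
     */
    static <T> T callWithRetry(Callable<T> operation,
                               int maxAttempts,
                               long baseDelayMillis,
                               Predicate<Exception> retryable) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts || !retryable.test(e)) {
                    throw e; // out of attempts, or not worth retrying
                }
                try {
                    Thread.sleep(baseDelayMillis * attempt);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    throw ie;
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new java.io.IOException("simulated transient failure");
            }
            return "ok";
        }, 3, 10L, e -> e instanceof java.io.IOException);
        System.out.println(result + " after " + calls[0] + " calls"); // prints "ok after 3 calls"
    }
}
```

With the Elasticsearch client, the predicate would typically accept `IOException` and server-side errors while rejecting malformed-request errors, since repeating a bad request only adds load.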

Connection management and cleanup are important considerations:

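// Wrapper that owns the client so it can be used in a try-with-resources block.
public class ElasticsearchClient implements AutoCloseable {
    private final RestHighLevelClient client;

    public ElasticsearchClient(String hostname, int port) {
        this.client = new RestHighLevelClient(
            RestClient.builder(new HttpHost(hostname, port, "http"))
        );
    }

    // Releases the underlying HTTP connections. The field is final and always
    // assigned in the constructor, so no null check is needed.
    @Override
    public void close() throws IOException {
        client.close();
    }
}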

These integration techniques form a robust foundation for building scalable search applications. The key is to balance performance, reliability, and maintainability while implementing these patterns.

Remember to optimize your indexing strategies, implement proper error handling, and monitor your Elasticsearch cluster’s health. Regular maintenance and performance tuning are essential for long-term success.

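The REST client already keeps a pool of persistent HTTP connections, so share a single client instance across the application instead of creating one per request. Beyond that, consider circuit breakers and monitoring to keep your search infrastructure stable under load. These practices have served me well in production environments.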
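To make the circuit breaker idea concrete, here is a minimal sketch in plain Java. It is an illustration under stated assumptions, not a production implementation: the class name `CircuitBreakerSketch`, its thresholds, and the injected clock are all hypothetical, and an established resilience library may be a better fit in a real system.

```java
import java.util.function.LongSupplier;

/** Hypothetical circuit breaker: opens after repeated failures, retries after a cool-down. */
final class CircuitBreakerSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openMillis;
    private final LongSupplier clock; // injected so tests can control time
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0L;

    CircuitBreakerSketch(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    /** Returns false while the breaker is open; after the cool-down, trial calls are allowed. */
    synchronized boolean allowRequest() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= openMillis) {
            state = State.HALF_OPEN;
        }
        return state != State.OPEN;
    }

    /** A success closes the breaker and resets the failure count. */
    synchronized void recordSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    /** A failure during a trial, or too many failures in a row, opens the breaker. */
    synchronized void recordFailure() {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAt = clock.getAsLong();
        }
    }

    synchronized State state() {
        return state;
    }

    public static void main(String[] args) {
        CircuitBreakerSketch breaker =
            new CircuitBreakerSketch(2, 1_000L, System::currentTimeMillis);
        breaker.recordFailure();
        breaker.recordFailure();
        System.out.println(breaker.allowRequest()); // prints false
    }
}
```

A caller would check `allowRequest()` before each search, skip the call or serve a fallback when it returns false, and report the outcome with `recordSuccess()` or `recordFailure()`.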