How to Improve API Performance in Spring Boot Applications

May 13, 2026

13 min read

Overview

This article is a thorough, practical, and technical guide to improving API performance in Spring Boot applications. It covers the performance problem space, theoretical foundations, measurement and profiling, concrete optimizations at every layer (network, serialization, application, persistence, JVM and container), relevant Spring-specific features and code examples, testing and monitoring, common pitfalls, and future trends.

Use this as both a reference and a checklist for diagnosing and improving real-world API performance.

Background & history
Key performance concepts
Measurement and benchmarking (how to find bottlenecks)
Request/response path: where to optimize
Network & HTTP-level optimizations
Serialization & payload size
Concurrency, thread pools & async models
Reactive/non-blocking architectures (WebFlux, R2DBC)
Database and persistence optimizations (JPA/Hibernate)
Caching strategies (in-memory, distributed)
JVM and GC tuning for containers
Container & deployment considerations
Observability, profiling & performance testing
Resilience & scaling patterns
Practical checklists and example configs
Future trends
Conclusion

Background & history

Spring Boot simplified Spring application development by bundling dependencies, auto-configuration, embedded servers, and sensible defaults. As microservices and cloud-native deployments became common, API performance became a central concern, measured in latency, throughput, tail latency, memory footprint, and cold-start time.

Historically, bottlenecks were often obvious (slow DB queries, insufficient threads). As systems became distributed and high-scale, performance tuning required thinking holistically: non-blocking I/O, reactive programming, connection pool tuning, serialization formats, JVM behavior inside containers, service mesh latency, observability and automated optimization.

Key performance concepts

Latency vs Throughput vs Tail-latency. Reducing median latency is useful but reducing 95/99th percentile latency often matters more for user experience.
Blocking vs Non-blocking I/O. Thread-per-request (blocking) fits many workloads; non-blocking (reactive) reduces thread overhead for highly concurrent I/O-bound services.
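CPU-bound vs I/O-bound. The tuning strategies differ: CPU-bound workloads benefit from parallelism only up to roughly the number of available cores; I/O-bound workloads benefit from more threads or asynchronous I/O, because their threads spend most of the time waiting.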
Backpressure and resource exhaustion. Always consider limits (thread pools, DB connections) and protect them with bounded queues/bulkheads to avoid cascading failures.
Measurement first. Optimize based on measurement and profiling, not on guesswork.

Measurement and benchmarking

Before optimizing, measure. You need meaningful, repeatable metrics and a controlled test harness.

Tools:

Load generators: k6, Gatling, JMeter, wrk, hey.
APM/Tracing: Elastic APM, New Relic, Datadog, Jaeger, Zipkin.
Metrics: Micrometer -> Prometheus + Grafana.
Profilers: Java Flight Recorder (JFR), VisualVM, async-profiler, YourKit. Honorable mention: BPF-based tools for syscalls and I/O.
OS tools: iostat, sar, vmstat, netstat, ss, tcpdump.

Important metrics:

Throughput (requests/sec)
Latencies: p50/p95/p99, server-side histograms
CPU utilization, GC pauses, GC time
Heap usage, allocation rate
Thread counts and thread states
Database connection pool usage and wait time
Network queues, socket states
Error rates & timeouts
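The percentile metrics listed above can be made concrete with a small calculation. The following is a minimal sketch using the nearest-rank method; the class name LatencyPercentiles is hypothetical and this is not how Micrometer computes its histograms, only an illustration of why p99 can differ sharply from p50.

```java
import java.util.Arrays;

/** Nearest-rank percentiles over raw latency samples (hypothetical helper for illustration). */
final class LatencyPercentiles {
    private LatencyPercentiles() {}

    /** Returns the p-th percentile (0 < p <= 100) of the samples using the nearest-rank method. */
    static long percentile(long[] samplesMillis, double p) {
        if (samplesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = samplesMillis.clone(); // do not mutate the caller's array
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }
}
```

For the samples {12, 15, 11, 14, 13, 16, 12, 13, 250, 14}, the p50 is 13 ms while the p99 is 250 ms: a single slow request dominates the tail even though the median looks healthy.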

A profiling workflow:

Establish a baseline with controlled load.
Collect metrics and traces during tests.
Use flame graphs / CPU sampling to find hotspots.
Validate each suspected fix by re-running the same load test against the baseline, or with an A/B test in production.

Request/response path — where to optimize

Typical request lifecycle:

TCP handshake / TLS negotiation (client → load balancer → service)
Web server receives HTTP request (Tomcat/Jetty/Netty)
Request parsed and mapped to controller
Controller logic, service layer, DB access, external calls
Serialization to JSON/Protobuf and response writing
OS network stack sends response to client

Possible optimization points:

Network & TLS setup (Keep-Alive, HTTP/2)
Connection handling (Tomcat / Netty config)
Request parsing, filters, security, and interceptors
Controller and service CPU cost or blocking waits
DB queries and remote I/O latency
Serialization/deserialization costs and payload size
Thread scheduling and GC pauses
Downstream services and caches

Network and HTTP-level optimizations

Keep-Alive and connection reuse: enable and tune keep-alive timeouts so clients reuse sockets.
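HTTP/2: multiplexes many requests over one connection, which reduces connection churn and removes HTTP/1.1's request-level head-of-line blocking (TCP-level head-of-line blocking remains); enable it at the load balancer and server (negotiated via ALPN over TLS).
Compression: enable gzip for JSON and other text payloads, but balance CPU cost against bandwidth savings. Skip responses that are very small or already compressed.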
TLS session resumption & TLS offload: minimize TLS handshake cost; consider LB/TLS offload.
Content negotiation and caching headers: use Cache-Control, ETag, Last-Modified for cacheable responses.
Use a CDN for static assets and caching reverse proxies (e.g., Varnish, Nginx) for cacheable API responses where appropriate.
Limit request and response payload sizes; validate incoming payload early (e.g., via request size limit).
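To illustrate the ETag point above, here is a minimal sketch of conditional-GET handling in plain Java. The class EtagSupport and its methods are hypothetical names, and truncating a SHA-256 digest to 8 bytes is an assumption made for brevity. In a Spring MVC application, the built-in ShallowEtagHeaderFilter covers a similar case without custom code.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: derive an ETag from a response body and evaluate If-None-Match. */
final class EtagSupport {
    private EtagSupport() {}

    /** Builds a quoted ETag from the first 8 bytes of the body's SHA-256 digest. */
    static String strongEtag(byte[] body) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(body);
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 8; i++) {
                hex.append(String.format("%02x", digest[i] & 0xff));
            }
            return "\"" + hex + "\"";
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    /** True when the client's If-None-Match header matches, so a 304 can be sent without a body. */
    static boolean notModified(String ifNoneMatch, String currentEtag) {
        if (ifNoneMatch == null) {
            return false;
        }
        if (ifNoneMatch.trim().equals("*")) {
            return true;
        }
        for (String candidate : ifNoneMatch.split(",")) {
            String tag = candidate.trim();
            if (tag.startsWith("W/")) {
                tag = tag.substring(2); // If-None-Match uses weak comparison
            }
            if (tag.equals(currentEtag)) {
                return true;
            }
        }
        return false;
    }
}
```

When notModified returns true, the server can answer 304 Not Modified and skip serialization and transfer of the body entirely. Note that computing the ETag from the rendered body still costs the work of producing that body; a version column or last-modified timestamp avoids that.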

Spring Boot server-level configuration examples

Enable response compression (application.yml):

YAML

server:
  compression:
    enabled: true
    mime-types: application/json,text/html,text/xml,text/plain,application/javascript
    min-response-size: 1024
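The min-response-size threshold exists because gzip adds fixed overhead and only pays off on larger, repetitive payloads. The sketch below, using only java.util.zip, makes that tradeoff measurable; the class GzipSizeCheck and the sample JSON shape are assumptions for illustration, not part of Spring Boot.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper to compare raw and gzip-compressed payload sizes. */
final class GzipSizeCheck {
    private GzipSizeCheck() {}

    /** Compresses the given bytes with gzip and returns the compressed bytes. */
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray(); // stream is closed here, so the gzip trailer is written
    }

    /** Builds a JSON array of simple, repetitive objects, similar to a typical list response. */
    static byte[] sampleJson(int items) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < items; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"id\":").append(i)
              .append(",\"name\":\"item-").append(i)
              .append("\",\"active\":true}");
        }
        return sb.append(']').toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

Comparing gzip(sampleJson(500)).length with sampleJson(500).length shows a large reduction, while a single-item payload grows once the gzip header and trailer are added. Measuring your own representative responses is a reasonable way to choose the threshold.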

Adjust embedded Tomcat connector (application.yml):

YAML

server:
  tomcat:
    threads:
      max: 200
      min-spare: 10
    accept-count: 100
    max-connections: 10000
    connection-timeout: 20000

Serialization and payload size

Reduce payload size:
- Use projections and DTOs: return only necessary fields.
- Pagination and result-limiting to avoid sending huge lists.
- Compression (gzip) for text-based serialization.
- Use compact serial formats for internal/microservice comms (Protobuf, Avro, MessagePack).
Optimize JSON serialization:
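- Jackson tuning: avoid default typing (it is slower and a known security risk). Feature flags such as FAIL_ON_UNKNOWN_PROPERTIES change behavior rather than speed, so set them for correctness, not performance. Consider the Afterburner module, which uses bytecode generation to speed up POJO serialization; on newer JDKs the Jackson project offers the Blackbird module as its successor, so benchmark before adopting either.
- Use immutable/primitive-friendly DTOs and avoid deep object graphs.
- Consider alternative libraries (for example Gson or DSL-JSON) only if profiling shows that serialization is an actual hotspot.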
Reuse ObjectMappers: configure a single, reusable ObjectMapper bean; avoid creating new mappers per request.
For large streaming data, use streaming APIs or reactive streaming to avoid full buffering.

Jackson Afterburner example (configuration):

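Java

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializationFeature;
import com.fasterxml.jackson.module.afterburner.AfterburnerModule;

@Bean
public ObjectMapper objectMapper() {
    // Note: declaring this bean replaces Spring Boot's auto-configured ObjectMapper
    ObjectMapper mapper = new ObjectMapper();
    // Afterburner speeds up POJO (de)serialization via bytecode generation
    mapper.registerModule(new AfterburnerModule());
    // Behavioral setting, not an optimization: empty beans serialize as {} instead of throwing
    mapper.disable(SerializationFeature.FAIL_ON_EMPTY_BEANS);
    return mapper;
}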

Concurrency, thread pools & async models

Blocking (Servlet/WebMVC) model: uses a thread per request. Tune server thread pools (Tomcat/Jetty) and application thread pools for blocking tasks (DB calls, file I/O).
Asynchronous processing: use @Async, CompletableFuture, or Spring’s DeferredResult/Callable to free request threads for other work. Use bounded thread pools with sensible queue sizes.
Reactive model (non-blocking): use WebFlux (Reactor Netty) and reactive libraries for highly concurrent, I/O-bound workloads—only if all layers can be non-blocking (DB, HTTP clients).
Thread sizing rules: for blocking workloads, threads ~= cores * (1 + W/C), where W = average waiting time, C = average compute time. Too many threads cause context switching and memory pressure.
Never block Netty event loop threads; if you must block, offload to a bounded scheduler.
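The thread sizing rule above is easy to turn into a worked calculation. This is a minimal sketch; the class ThreadPoolSizing is a hypothetical name, and the result is a starting point to validate under load rather than a definitive setting.

```java
/** Hypothetical helper applying threads = cores * (1 + W/C) for blocking workloads. */
final class ThreadPoolSizing {
    private ThreadPoolSizing() {}

    /**
     * @param cores         CPU cores available to the process
     * @param waitMillis    average time a request spends waiting on I/O (W)
     * @param computeMillis average time a request spends on the CPU (C)
     */
    static int blockingPoolSize(int cores, double waitMillis, double computeMillis) {
        if (cores <= 0 || computeMillis <= 0 || waitMillis < 0) {
            throw new IllegalArgumentException("cores and computeMillis must be positive, waitMillis non-negative");
        }
        return (int) Math.ceil(cores * (1 + waitMillis / computeMillis));
    }
}
```

With 4 cores, 90 ms of waiting, and 10 ms of compute per request, the formula gives 4 * (1 + 9) = 40 threads. A purely CPU-bound request (W = 0) on 8 cores gives 8, which matches the intuition that extra threads only add context switching.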

Example: configure a bounded Executor for @Async

Java

@EnableAsync
@Configuration
public class AsyncConfig {
    @Bean(name = "taskExecutor")
    public ThreadPoolTaskExecutor taskExecutor() {
        ThreadPoolTaskExecutor ex = new ThreadPoolTaskExecutor();
        ex.setCorePoolSize(20);
        ex.setMaxPoolSize(50);
        ex.setQueueCapacity(500); // bounded helps protect resources
        ex.setThreadNamePrefix("api-async-");
        ex.initialize();
        return ex;
    }
}
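It helps to see what a bounded pool does once it is saturated. The sketch below uses the JDK's ThreadPoolExecutor, which Spring's ThreadPoolTaskExecutor wraps; the class BoundedExecutorDemo and the chosen sizes are assumptions for illustration. Two behaviors are worth noticing: threads beyond the core size are only created after the queue is full, and once both queue and maximum threads are exhausted, further tasks are rejected.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical demo of saturation behavior in a bounded thread pool. */
final class BoundedExecutorDemo {
    private BoundedExecutorDemo() {}

    /** Creates a pool with a bounded queue that rejects work when saturated (AbortPolicy). */
    static ThreadPoolExecutor newBoundedPool(int core, int max, int queueCapacity) {
        return new ThreadPoolExecutor(
            core, max, 30, TimeUnit.SECONDS,
            new ArrayBlockingQueue<>(queueCapacity),
            new ThreadPoolExecutor.AbortPolicy());
    }

    /** Submits tasks that block until released and returns how many were rejected. */
    static int countRejections(ThreadPoolExecutor pool, int submissions) {
        CountDownLatch release = new CountDownLatch(1);
        Runnable blockingTask = () -> {
            try {
                release.await(); // simulate a slow downstream call
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        int rejected = 0;
        try {
            for (int i = 0; i < submissions; i++) {
                try {
                    pool.execute(blockingTask);
                } catch (RejectedExecutionException e) {
                    rejected++; // in an API this would typically become a 429 or 503
                }
            }
        } finally {
            release.countDown(); // let the accepted tasks finish
            pool.shutdown();
        }
        return rejected;
    }
}
```

With core 2, max 4, and a queue of 3, ten slow tasks result in 4 running, 3 queued, and 3 rejected. Rejecting early is usually preferable to queueing without limit, because callers get a fast failure instead of a timeout.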

Reactive/non-blocking architectures

Spring WebFlux (Project Reactor) provides non-blocking I/O and backpressure-aware flows. Use when you have many concurrent I/O-bound requests or long-lived connections (SSE, WebSocket).
To benefit from reactive stacks end-to-end, use non-blocking drivers: R2DBC for relational databases, reactive MongoDB, reactive HTTP clients (WebClient).
Benefits: fewer threads, lower memory footprint, higher concurrency for I/O-bound workloads.
Costs: complexity in programming model, maturity and feature gaps in reactive drivers, debugging difficulty, and sometimes non-trivial migration from blocking code.

Simple WebFlux controller:

Java

@RestController
public class ReactiveController {
    private final ReactiveRepository repo; // Reactor-based repository

    public ReactiveController(ReactiveRepository repo) { this.repo = repo; }

    @GetMapping("/items")
    public Flux<ItemDto> getItems() {
        return repo.findAll()
                   .map(this::toDto);
    }
}

Database & persistence optimizations (JPA/Hibernate)

Databases are often the largest source of latency. Key techniques:

Minimize round trips: use joins, fetch joins, and proper queries instead of lazy loading that triggers N+1 queries.
Use projections and DTO queries to fetch only needed columns instead of full entities.
Use pagination to limit result sizes.
Use database indexes and query plan analysis (EXPLAIN ANALYZE) to find slow queries.
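Connection pool tuning: use HikariCP (the default in Spring Boot) and set the maximum pool size based on measured workload and DB capacity. A larger pool is not automatically faster; an oversized pool simply moves contention into the database.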
Use prepared statements, batching, and bulk operations for many writes.
Hibernate second-level cache or query cache for read-heavy scenarios (use cautiously, watch cache invalidation).
Tune JDBC batch settings and hibernate.jdbc.batch_size for bulk inserts/updates.
Monitor DB metrics: slow query log, connection pool wait times, lock timeouts.

Avoid N+1 example (JPA fetch join; the left join keeps parents that have no children):

Java

@Query("select p from Parent p left join fetch p.children where p.id = :id")
Optional<Parent> findWithChildren(@Param("id") Long id);
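To make the cost of N+1 concrete, the following sketch counts simulated round trips against a hypothetical in-memory store. FakeDb and NPlusOneDemo are invented for illustration and do not use JPA; the point is only the difference in query count between per-parent loading and a single joined fetch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical simulation of N+1 loading versus a single joined fetch. */
final class NPlusOneDemo {
    private NPlusOneDemo() {}

    /** In-memory stand-in for a database that counts each simulated round trip. */
    static final class FakeDb {
        private final Map<Long, List<String>> childrenByParent = new LinkedHashMap<>();
        int roundTrips = 0;

        FakeDb(int parents, int childrenPerParent) {
            for (long p = 1; p <= parents; p++) {
                List<String> children = new ArrayList<>();
                for (int c = 1; c <= childrenPerParent; c++) {
                    children.add("child-" + p + "-" + c);
                }
                childrenByParent.put(p, children);
            }
        }

        List<Long> findParentIds() { roundTrips++; return new ArrayList<>(childrenByParent.keySet()); }

        List<String> findChildren(long parentId) { roundTrips++; return childrenByParent.get(parentId); }

        Map<Long, List<String>> findAllWithChildren() { roundTrips++; return childrenByParent; }
    }

    /** Mimics lazy loading: one query for the parents, then one query per parent. */
    static int lazyStyle(FakeDb db) {
        int total = 0;
        for (long id : db.findParentIds()) {
            total += db.findChildren(id).size();
        }
        return total;
    }

    /** Mimics a fetch join: parents and children arrive in a single round trip. */
    static int fetchJoinStyle(FakeDb db) {
        int total = 0;
        for (List<String> children : db.findAllWithChildren().values()) {
            total += children.size();
        }
        return total;
    }
}
```

For 100 parents, lazyStyle performs 101 round trips and fetchJoinStyle performs 1, while both read the same 300 children. At 1 ms of network latency per query, that difference alone is about 100 ms per request.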

HikariCP basic config (application.yml):

YAML

spring:
  datasource:
    url: jdbc:postgresql://db:5432/mydb
    username: user
    password: pass
    hikari:
      maximum-pool-size: 20
      minimum-idle: 5
      connection-timeout: 30000
      idle-timeout: 600000
      max-lifetime: 1800000

Caching strategies

In-process caches (Caffeine): extremely fast for per-instance cache with eviction and TTL.
Distributed caches (Redis, Memcached): for cross-instance caching, session storage, rate-limiting, and shared caches.
Spring Cache abstraction: use annotations @Cacheable/@CachePut/@CacheEvict and plug in implementations (Caffeine, Redis).
HTTP-level caching: use ETag/Last-Modified and Cache-Control to allow downstream caches and browsers to avoid unnecessary requests.
Entity and query caching: Hibernate's second-level cache (and the separate, optional query cache built on top of it) can help read-mostly tables, but introduces cache-coherence complexity across instances.

Caffeine configuration example:

Java

@Bean
public CacheManager cacheManager() {
    CaffeineCacheManager cm = new CaffeineCacheManager("items");
    cm.setCaffeine(Caffeine.newBuilder()
        .expireAfterWrite(Duration.ofMinutes(10))
        .maximumSize(10_000));
    return cm;
}
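The two settings used above, a maximum size and expire-after-write, are the core of most in-process caches. The sketch below reimplements just those two ideas with the JDK so the mechanics are visible; TinyTtlCache is a hypothetical class and is far simpler (and slower under contention) than Caffeine, which should remain the choice for production use.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical size-bounded LRU cache with expire-after-write, for illustration only. */
final class TinyTtlCache<K, V> {
    /** A cached value together with the time it was written. */
    private static final class Timed<T> {
        final T value;
        final long writtenAtMillis;

        Timed(T value, long writtenAtMillis) {
            this.value = value;
            this.writtenAtMillis = writtenAtMillis;
        }
    }

    private final long ttlMillis;
    private final LongSupplier clock; // injected so tests can control time
    private final Map<K, Timed<V>> map;

    TinyTtlCache(int maxSize, long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        // accessOrder = true makes iteration order least-recently-used first
        this.map = new LinkedHashMap<K, Timed<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Timed<V>> eldest) {
                return size() > maxSize; // evict the least recently used entry
            }
        };
    }

    synchronized void put(K key, V value) {
        map.put(key, new Timed<>(value, clock.getAsLong()));
    }

    /** Returns the cached value, or null when absent or older than the TTL. */
    synchronized V get(K key) {
        Timed<V> entry = map.get(key);
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() - entry.writtenAtMillis >= ttlMillis) {
            map.remove(key); // expired: drop it lazily on read
            return null;
        }
        return entry.value;
    }

    synchronized int size() {
        return map.size();
    }
}
```

A typical call site would pass System::currentTimeMillis as the clock. The size bound protects the heap and the TTL bounds staleness, which are the two questions to answer before caching any endpoint.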

JVM and GC tuning for containers

Choose a GC suited to your latency goals: G1 (the default collector since JDK 9) is a good general-purpose choice; ZGC and Shenandoah target very low pause times on large heaps.
Avoid relying on default memory assumptions when running in containers: container support (-XX:+UseContainerSupport) is enabled by default since JDK 10 and was backported to 8u191, so it rarely needs to be set explicitly. Use -XX:MaxRAMPercentage to size the heap relative to the container memory limit.
Control metaspace and direct memory if using native libraries or high thread counts; they count against the container limit but live outside the heap.
Typical flags to start with:
- -Xms and -Xmx set to the same value to avoid heap resizing overhead.
- -XX:+UseG1GC -XX:MaxGCPauseMillis=200 for many services (200 ms is also G1's default pause target).
- -XX:MaxRAMPercentage=75.0 for containerized apps (adjust). An explicit -Xmx overrides it, so use one approach or the other.
Monitor GC logs (use -Xlog:gc* on JDK 9+ or -XX:+PrintGCDetails on JDK 8) and tune accordingly.
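Before tuning flags, it is worth confirming what the JVM actually detected inside the container. The following sketch reads the effective CPU count, maximum heap, and active collectors through standard JDK APIs; the class name RuntimeLimits is hypothetical, and logging this once at startup is a suggestion rather than a Spring Boot feature.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical startup check of the limits the JVM has detected. */
final class RuntimeLimits {
    private RuntimeLimits() {}

    /** CPU count as seen by the JVM (reflects container CPU limits when container support is on). */
    static int cpus() {
        return Runtime.getRuntime().availableProcessors();
    }

    /** Maximum heap in MiB, as derived from -Xmx or MaxRAMPercentage. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** Names of the active garbage collectors, for example the G1 young and old generation beans. */
    static List<String> collectors() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    static String describe() {
        return "cpus=" + cpus() + ", maxHeapMiB=" + maxHeapMiB() + ", collectors=" + collectors();
    }
}
```

If the reported heap is close to the host's memory rather than the container limit, or the CPU count matches the host instead of the pod's limit, the container settings are not being picked up and the flags above will not behave as intended.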

Container & deployment considerations

Right-size CPU and memory in containers: CPU limits that are too low cause throttling, which shows up as latency spikes, and oversubscribed hosts show up as CPU steal.
Set resource limits to avoid noisy neighbor effects and to enable proper scheduling.
Use horizontal scaling (replicas) rather than pushing a single JVM to extreme thread counts.
Use layered jars to enable efficient image caching in Docker with Spring Boot.
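Use JVM startup optimizations or consider GraalVM native images, supported through Spring Boot 3's AOT processing (the successor to the experimental Spring Native project), to reduce startup time and memory footprint for serverless or short-lived workloads. Tradeoffs: native builds cut startup latency and memory use but take longer to build, restrict reflection and dynamic class loading, and may deliver lower peak throughput than a warmed-up JIT.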
NUMA and CPU pinning may matter at very high throughput.

Observability, profiling & performance testing

Instrument with Micrometer (Spring Boot integrates) and export to Prometheus/Grafana.
Capture metrics: request latencies, error rates, GC pause times, allocations, DB wait times, thread pool usage.
Use distributed tracing (OpenTelemetry, Zipkin, Jaeger) to find hotspots across services.
Load-test realistic user patterns: ramp up/down, bursts, different payloads, caching warm-up, and long-running tests to see memory growth.
Use flame graphs and CPU sampling for hotspots; allocation profiling for high object churn.

Resilience & scaling patterns

Circuit breakers (Resilience4j) to fail fast on slow downstream services.
Bulkheads (bounded thread pools, semaphores) to isolate failures and preserve throughput in other parts.
Timeouts on all downstream calls and DB operations.
Rate limiting at the gateway or service level to protect resources.
Backpressure-aware endpoints (reactive) or client-side rate limiting.
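As an illustration of the bulkhead pattern above, here is a minimal semaphore-based sketch in plain Java. SemaphoreBulkhead is a hypothetical class that shows the mechanism only; Resilience4j provides a production-grade bulkhead with metrics and configuration.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical bulkhead: caps concurrent calls to one dependency and fails fast beyond that. */
final class SemaphoreBulkhead {
    private final Semaphore permits;

    SemaphoreBulkhead(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    /** Runs the action if a permit is free; otherwise returns the fallback without waiting. */
    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get(); // saturated: shed load instead of queueing
        }
        try {
            return action.get();
        } finally {
            permits.release(); // always return the permit, even if the action throws
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

Wrapping each downstream dependency in its own bulkhead means a slow service can exhaust only its own permits. Requests that do not depend on it keep their threads and their latency.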

Practical checklist (quick start)

Measure baseline: metrics, traces, and synthetic load.
Tune connection pools: DB (Hikari) and HTTP client pools (Apache HttpClient, or Reactor Netty behind WebClient).
Optimize slow database queries: EXPLAIN, indexes, projections, fetch joins.
Add caching for expensive reads (Caffeine/Redis).
Reduce payloads: remove fields, paginate, and compress responses.
Reuse expensive objects (ObjectMapper) and enable Jackson Afterburner if appropriate.
Limit and tune server threads; add proper timeouts so threads don’t hang.
Implement timeouts and retries with backoff for external calls (WebClient with Reactor's retryWhen, or Resilience4j); retry only idempotent operations.
Profile CPU and memory; tune JVM/G1GC or evaluate low-pause GC if necessary.
Add tracing and dashboards to track p50/p95/p99 latency over time.
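The "retries with backoff" item in the checklist can be sketched in a few lines of plain Java. The class Backoff and its parameters are hypothetical; the sleeper is injected so the logic can be tested without real delays. Production code would normally add random jitter and use a library such as Resilience4j.

```java
import java.util.function.LongConsumer;
import java.util.function.Supplier;

/** Hypothetical retry helper with capped exponential backoff (no jitter, for clarity). */
final class Backoff {
    private Backoff() {}

    /** Delay before retry number attempt (0-based): base * 2^attempt, capped at capMillis. */
    static long delayMillis(int attempt, long baseMillis, long capMillis) {
        long delay = baseMillis;
        for (int i = 0; i < attempt && delay < capMillis; i++) {
            delay *= 2; // stops doubling once the cap is reached, which also avoids overflow
        }
        return Math.min(delay, capMillis);
    }

    /** Calls the supplier up to maxAttempts times, sleeping between failed attempts. */
    static <T> T retry(Supplier<T> call, int maxAttempts, long baseMillis, long capMillis,
                       LongConsumer sleeper) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts - 1) {
                    sleeper.accept(delayMillis(attempt, baseMillis, capMillis));
                }
            }
        }
        throw last; // every attempt failed: surface the final error
    }
}
```

With a base of 100 ms and a cap of 5000 ms, the delays run 100, 200, 400, 800 and so on until they reach the cap. The total worst-case wait should stay below the caller's own timeout, otherwise retries only add load without helping the user.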

Concrete examples & code snippets

WebClient with connection pooling and timeouts

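Java

import io.netty.channel.ChannelOption;
import io.netty.handler.timeout.ReadTimeoutHandler;
import io.netty.handler.timeout.WriteTimeoutHandler;
import java.time.Duration;
import org.springframework.http.client.reactive.ReactorClientHttpConnector;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.netty.http.client.HttpClient;
import reactor.netty.resources.ConnectionProvider;

@Bean
public WebClient webClient() {
    // Bounded connection pool; the name and limits are illustrative
    ConnectionProvider pool = ConnectionProvider.builder("api-pool")
        .maxConnections(100)
        .pendingAcquireTimeout(Duration.ofSeconds(2))
        .build();
    HttpClient httpClient = HttpClient.create(pool)
        .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 5000)
        .responseTimeout(Duration.ofSeconds(5))
        .doOnConnected(conn ->
            conn.addHandlerLast(new ReadTimeoutHandler(5))    // seconds
                .addHandlerLast(new WriteTimeoutHandler(5))); // seconds
    return WebClient.builder()
        .clientConnector(new ReactorClientHttpConnector(httpClient))
        .build();
}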

Database batching with Spring Data JPA

Properties

spring.jpa.properties.hibernate.jdbc.batch_size=50
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true

Also ensure your entities use an ID generation strategy compatible with batching: Hibernate disables JDBC insert batching for GenerationType.IDENTITY, so prefer SEQUENCE where the database supports it.

Avoid N+1 and full entity loading with a DTO projection (JPQL constructor expression):

Java

@Query("select new com.example.dto.UserDto(u.id, u.name, a.city) from User u join u.address a where u.id = :id")
UserDto findUserDto(@Param("id") Long id);

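Enable Micrometer metrics (application.yml). This assumes spring-boot-starter-actuator and micrometer-registry-prometheus are on the classpath. The export key shown is the Spring Boot 3.x name; Boot 2.x used management.metrics.export.prometheus.enabled.

YAML

management:
  endpoints:
    web:
      exposure:
        include: health,prometheus,metrics,logfile
  prometheus:
    metrics:
      export:
        enabled: true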

Performance testing examples

Use a k6 script to simulate realistic traffic (ramping users, varied payload sizes).
Warm caches before measuring read-heavy scenarios.
Run tests for long enough to capture GC cycles and slow-path behavior.

Common pitfalls and anti-patterns

Optimizing the wrong thing (not measuring).
Large or unbounded thread pools and unbounded queues, leading to OOM errors and GC pressure.
Blocking within reactive stacks (blocks Netty event loops).
Caching without eviction strategy or staleness handling.
Using huge entity graphs (JPA) and returning entities directly from controllers — prefer DTOs.
Creating a new ObjectMapper per request, or many other short-lived temporary objects, which increases allocation pressure.

Future trends & implications

Reactive and non-blocking I/O adoption will grow, but only where the whole stack supports non-blocking paradigms.
Native images (GraalVM) and ahead-of-time processing (Spring AOT, which superseded the experimental Spring Native project in Spring Boot 3) reduce startup time and memory footprint, which is useful for serverless and microservices with many small instances.
HTTP/3 (QUIC) and WebTransport may change latency patterns in future network stacks, notably by removing TCP-level head-of-line blocking.
Observability and automated tuning (AI-driven optimizers) will increasingly propose configuration changes based on real-time telemetry.
Hardware offload and kernel-level tooling (e.g., specialized TLS offload, eBPF-based observability) will be leveraged more in high-scale deployments.

Conclusion

Improving API performance in Spring Boot is a multi-layer task: measure first, then apply targeted optimizations at the network, application, serialization, database, caching, JVM, and deployment layers. Use Spring Boot features (Hikari, WebClient, caching, WebFlux) wisely, match the concurrency model to your workload, and always validate changes under realistic load.

A pragmatic approach:

Start with instrumentation and a baseline.
Fix the highest-impact bottlenecks (DB queries, caches).
Tune resource constraints (connection pools, thread pools).
Optimize serialization and network settings.
Consider reactive or native images only when they fit the full stack and requirements.
