13 Sept. 2010

Howto: Hibernate second-level cache with Ehcache Part 1

This post is a step-by-step guide to setting up Hibernate's second-level cache with Ehcache as the cache provider. We won't explore distributed caches for clustered environments for the moment.

The best reference for this topic is the Hibernate "bible", Java Persistence with Hibernate, which contains a full explanation of both the theory and the practice of using second-level caches. You can find more tutorials and articles on this in the References section at the bottom.

Let's briefly review some key concepts:

Caching in Hibernate can work at different scopes: transaction scope (objects are cached inside a unit of work), process scope (objects are cached inside a JVM, so the cache is shared by many concurrent threads) or cluster scope (distributed caching in a clustered environment). Since transaction scope is already provided by Hibernate's Session, we'll set up a process-scope cache in this howto.

Hibernate supports several cache provider implementations: Ehcache, JBossCache, OSCache, SwarmCache. Ehcache is a mature project and provides full support for distributed caching, which is a good thing if we ever need to deploy our webapps in a clustered environment. Hibernate architect Gavin King and Ehcache founder Greg Luck are both involved in Ehcache development, and Ehcache's web site announces that Ehcache will remain a first-class second-level cache for Hibernate. We'll use Ehcache for this example.
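As a small preview of what choosing Ehcache as the provider means in practice, here is a minimal sketch of the Hibernate settings involved, built as plain java.util.Properties so it runs without Hibernate on the classpath. The class name SecondLevelCacheSettings is made up for this illustration, and the provider class shown is an assumption that fits Hibernate 3.x with the provider shipped in the Ehcache jar. Newer Hibernate versions configure a region factory instead, so check the names against the versions you actually use.

```java
import java.util.Properties;

// Hypothetical helper, not part of Hibernate or Ehcache.
final class SecondLevelCacheSettings {

    // Builds the settings that would be handed to Hibernate's configuration.
    static Properties build() {
        Properties props = new Properties();
        // Explicitly enable the second-level cache.
        props.setProperty("hibernate.cache.use_second_level_cache", "true");
        // Assumption: Hibernate 3.x style provider setting pointing at Ehcache.
        props.setProperty("hibernate.cache.provider_class",
                "net.sf.ehcache.hibernate.EhCacheProvider");
        return props;
    }

    public static void main(String[] args) {
        // Print the settings as key=value pairs.
        build().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The same keys can go into hibernate.cfg.xml or hibernate.properties; building them in code here just keeps the sketch self-contained.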

Hibernate has two types of second-level cache: one for entities and collections, and another for queries. The first type (we'll call it the main second-level cache) caches our domain objects when they are loaded via Session methods like get() or load(). The query cache holds the queries, their bound parameters and the resultsets (not exactly, as we'll see next) of the queries performed (HQL, SQL and Criteria). Query caching is enabled independently of the main second-level cache.

Queries that return scalar values or DTO POJOs are held entirely in the query cache. If we plan to cache queries that return domain entities, we must ensure that the main cache is also configured for those entities. That's because the query cache doesn't store actual entities, but a hash-like data structure holding entity types, ids and a timestamp that allows query cache expiration; Hibernate uses that timestamp to detect stale cached queries. The actual entities are stored in the cache region configured for that entity in the main second-level cache. If we don't set this up properly, the query cache will find a valid cached resultset for the query, but right after that Hibernate will issue one select for each of the entity ids cached for the query, and that's definitely something we don't like, right?
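To make the "one select per cached id" problem more tangible, here is a toy model in plain Java. It is not Hibernate's implementation: the class QueryCacheSketch, the FakeDatabase stand-in and the Customer#n strings are all invented for this illustration, and timestamps are left out. It only mimics the idea that a cached query result is a list of ids that must be resolved against the entity cache region, falling back to the database on every miss.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only; none of these types exist in Hibernate.
final class QueryCacheSketch {

    // Stand-in for the database that counts single-row selects.
    static final class FakeDatabase {
        final Map<Long, String> rows = new HashMap<>();
        int selectsById = 0;

        String selectById(long id) {
            selectsById++;
            return rows.get(id);
        }
    }

    // Turns a cached query result (just ids) into entities.
    static List<String> resolve(List<Long> cachedIds,
                                Map<Long, String> entityRegion,
                                FakeDatabase db) {
        List<String> result = new ArrayList<>();
        for (Long id : cachedIds) {
            String entity = entityRegion.get(id);
            if (entity == null) {
                // Entity region miss: one extra select for this id.
                entity = db.selectById(id);
            }
            result.add(entity);
        }
        return result;
    }

    // Counts the selects needed to resolve a cached query holding three ids.
    static int selectsNeeded(boolean entitiesCached) {
        FakeDatabase db = new FakeDatabase();
        Map<Long, String> entityRegion = new HashMap<>();
        List<Long> cachedIds = new ArrayList<>();
        for (long id = 1; id <= 3; id++) {
            db.rows.put(id, "Customer#" + id);
            cachedIds.add(id);
            if (entitiesCached) {
                entityRegion.put(id, "Customer#" + id);
            }
        }
        resolve(cachedIds, entityRegion, db);
        return db.selectsById;
    }

    public static void main(String[] args) {
        System.out.println("entities cached: " + selectsNeeded(true) + " selects");      // 0 selects
        System.out.println("entities not cached: " + selectsNeeded(false) + " selects"); // 3 selects
    }
}
```

With the entities in their cache region the cached query costs no selects at all; without them, the query cache hit still costs one select per id, which can end up slower than simply running the original query.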

We can define different cache concurrency strategies for our second-level cache, and each cache provider supports only a subset of these strategies: transactional, read-write, nonstrict-read-write and read-only. The strategies differ in the transaction isolation they provide for write operations on cached data.

We must remember that second-level caching is not a golden hammer that will speed up our webapp just because we cache everything. In fact, we must choose carefully what and how to cache. Second-level caching will work great with data that is read-intensive, with few write operations on the tables mapped to the cached entities.

After reading and understanding what the second-level cache is and how it works, we're ready to configure our webapp to use an Ehcache second-level cache with Hibernate. We'll cover this in Part 2.

