This topic discusses data serialization with PDX in Spring Boot for VMware Tanzu GemFire.
All data stored in VMware GemFire must be serializable, because it may be overflowed or persisted to disk, transferred between clients and servers, transferred between peers in a cluster, or transferred between different clusters in a multi-site WAN topology.
To serialize objects in Java, object types must implement the java.io.Serializable interface. However, if you have a large number of application domain object types that currently do not implement java.io.Serializable, refactoring hundreds or even thousands of class types to implement java.io.Serializable would be a tedious task just to store and manage those objects in VMware GemFire.
Additionally, your application domain object types are not the only types you need to consider. If you use third-party libraries in your application domain model, any types referred to by your application domain object types stored in VMware GemFire must also be serializable. This type explosion may spread to class types over which you have no control.
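The following minimal sketch illustrates the problem with plain Java serialization only (no VMware GemFire APIs are involved). The class names NestedTypeSketch, ThirdPartyMoney, and Invoice are hypothetical and exist only for this illustration. ThirdPartyMoney stands in for a library type you cannot modify.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class NestedTypeSketch {

    // Hypothetical stand-in for a third-party class that is not Serializable.
    static class ThirdPartyMoney {
        final long cents;
        ThirdPartyMoney(long cents) { this.cents = cents; }
    }

    // Your own domain type is Serializable, but it refers to a type that is not.
    static class Invoice implements Serializable {
        private static final long serialVersionUID = 1L;
        final String number;
        final ThirdPartyMoney total;
        Invoice(String number, ThirdPartyMoney total) {
            this.number = number;
            this.total = total;
        }
    }

    // Returns true if Java serialization rejects some type in the object graph.
    static boolean failsJavaSerialization(Object value) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return false;
        } catch (NotSerializableException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Invoice invoice = new Invoice("INV-1", new ThirdPartyMoney(4_200));
        System.out.println("Rejected by Java serialization: " + failsJavaSerialization(invoice));
    }
}
```

Marking Invoice as Serializable is not enough here: every non-transient type reachable from it must be serializable as well.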
Furthermore, Java serialization is not the most efficient format, given that metadata about your types is stored with the data itself. Therefore, even though Java serialized bytes are more descriptive, they carry a great deal of overhead.
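To get a rough feel for this overhead, the sketch below compares the size of a Java-serialized object with the size of its field values alone. It uses only the JDK. The Customer type and the method names are hypothetical, and the raw encoding is just a baseline for comparison, not the PDX wire format.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class SerializationOverheadSketch {

    // Hypothetical domain type used only for this comparison.
    static class Customer implements Serializable {
        private static final long serialVersionUID = 1L;
        final long id;
        final String name;
        Customer(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Size in standard Java serialization: stream header, class descriptor, and field values.
    static int javaSerializedSize(Customer customer) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(customer);
            }
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Size of the field values only, with no type metadata at all.
    static int rawFieldSize(Customer customer) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeLong(customer.id);
                out.writeUTF(customer.name);
            }
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Customer customer = new Customer(1L, "Jon Doe");
        System.out.println("Java serialization: " + javaSerializedSize(customer) + " bytes");
        System.out.println("Field values only:  " + rawFieldSize(customer) + " bytes");
    }
}
```

The difference between the two numbers is mostly type metadata that Java serialization repeats in every independently serialized object.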
Then, along came serialization using VMware GemFire’s PDX format. PDX stands for Portable Data Exchange and achieves four goals:
- Separates type metadata from the data itself, streamlining the bytes during transfer. VMware GemFire maintains a type registry that stores type metadata about the objects serialized with PDX.
- Supports versioning as your application domain types evolve. It is common to have old and new versions of the same application deployed to production, running simultaneously, sharing data, and possibly using different versions of the same domain types. PDX lets fields be added or removed while still preserving interoperability between old and new application clients without loss of data.
- Enables objects stored as PDX to be queried without being deserialized. Constant serialization and deserialization of data is a resource-intensive task that adds to the latency of each data request when redundancy is enabled. Since data is replicated across peers in the cluster to preserve High Availability (HA) and must be serialized to be transferred, keeping data serialized is more efficient when data is updated frequently, since it is likely the data will need to be transferred again in order to maintain consistency in the face of redundancy and availability.
- Enables interoperability between native language clients (such as C, C++ and C#) and Java language clients, with each being able to access the same data set regardless of where the data originated.
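The first goal, keeping type metadata out of each payload, can be pictured with a toy registry. This is a conceptual sketch only: TypeRegistrySketch, its methods, and its byte layout are assumptions made for illustration and do not reflect VMware GemFire's actual type registry or the PDX wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TypeRegistrySketch {

    // Field names are stored once per type and looked up by a small integer id.
    private final List<List<String>> fieldNamesById = new ArrayList<>();
    private final Map<String, Integer> idsByTypeName = new LinkedHashMap<>();

    // Registers a type once; later calls with the same name return the same id.
    int register(String typeName, List<String> fieldNames) {
        return idsByTypeName.computeIfAbsent(typeName, name -> {
            fieldNamesById.add(new ArrayList<>(fieldNames));
            return fieldNamesById.size() - 1;
        });
    }

    // The payload carries only the type id and the field values, never the field names.
    byte[] encode(int typeId, List<String> values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(typeId);
            for (String value : values) {
                out.writeUTF(value);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Field names are recovered from the registry, not from the payload.
    Map<String, String> decode(byte[] payload) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload));
            List<String> fieldNames = fieldNamesById.get(in.readInt());
            Map<String, String> fields = new LinkedHashMap<>();
            for (String fieldName : fieldNames) {
                fields.put(fieldName, in.readUTF());
            }
            return fields;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        TypeRegistrySketch registry = new TypeRegistrySketch();
        int typeId = registry.register("example.Customer", Arrays.asList("id", "name"));
        byte[] payload = registry.encode(typeId, Arrays.asList("1", "Jon Doe"));
        System.out.println(payload.length + " bytes -> " + registry.decode(payload));
    }
}
```

Because the field names live in the registry, every payload of the same type stays small no matter how many instances are transferred.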
However, PDX does have limitations.
For instance, unlike Java serialization, PDX does not handle cyclic dependencies. Therefore, you must be careful how you structure and design your application domain object types.
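For contrast, the sketch below shows the behavior that standard Java serialization provides and that PDX, per the limitation above, does not: a cyclic object graph survives a round trip intact. The Person type and the roundTrip helper are hypothetical and use only the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class CyclicGraphSketch {

    // Hypothetical domain type whose instances can refer to each other.
    static class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        Person partner;
        Person(String name) { this.name = name; }
    }

    // Serializes and then deserializes the object graph with standard Java serialization.
    static Person roundTrip(Person person) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(person);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Person) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Person ann = new Person("Ann");
        Person bob = new Person("Bob");
        ann.partner = bob;
        bob.partner = ann; // closes the cycle

        Person copy = roundTrip(ann);
        System.out.println("Cycle preserved: " + (copy.partner.partner == copy));
    }
}
```

When designing domain types for PDX, one common way to avoid such cycles is to refer to related objects by identifier rather than by direct reference.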
Also, PDX cannot handle field type changes.
While VMware GemFire’s general Data Serialization handles Deltas, PDX cannot apply a Delta without deserializing the object, since applying it involves a method invocation. Deserializing the object defeats one of the key benefits of PDX: preserving format to avoid the cost of serialization and deserialization.
However, we think the benefits of using PDX outweigh the limitations and, therefore, have enabled PDX by default.
You do not need to do anything special. You can code your domain types and rest assured that objects of those domain types are properly serialized when overflowed and persisted to disk, transferred between clients and servers, transferred between peers in a cluster, and even when data is transferred over the network when you use VMware GemFire’s multi-site WAN topology.
Example 1. EligibilityDecision is automatically serializable without implementing Java Serializable.
import org.springframework.data.gemfire.mapping.annotation.Region;

@Region("EligibilityDecisions")
class EligibilityDecision {
    // ...
}
VMware GemFire supports the standard Java Serialization format. For more information, see Data Serialization in the VMware GemFire product documentation.
MappingPdxSerializer versus ReflectionBasedAutoSerializer
Spring Boot for Tanzu GemFire enables and uses Spring Data for VMware GemFire’s MappingPdxSerializer to serialize your application domain objects with PDX.
The MappingPdxSerializer class offers several advantages above and beyond VMware GemFire’s own ReflectionBasedAutoSerializer class (see VMware GemFire Java API Reference).
See VMware GemFire’s User Guide for more details about the ReflectionBasedAutoSerializer.
The Spring Data for VMware GemFire MappingPdxSerializer class offers the following benefits and capabilities:
- PDX serialization is based on Spring Data’s powerful mapping infrastructure and metadata.
- Includes support for both includes and excludes with first-class type filtering. Additionally, you can implement type filters by using Java’s java.util.function.Predicate interface as opposed to the limited regex capabilities provided by VMware GemFire’s ReflectionBasedAutoSerializer class. By default, MappingPdxSerializer excludes all types in the following packages: java, org.apache.geode, org.springframework, and com.gemstone.gemfire.
- Handles transient object fields and properties when either Java’s transient keyword or Spring Data’s @Transient annotation is used.
- Handles read-only object properties.
- Automatically determines the identifier of your entities when you annotate the appropriate entity field or property with Spring Data’s @Id annotation.
- Allows additional o.a.g.pdx.PdxSerializers to be registered to customize the serialization of nested entity/object field and property types.
The support for includes and excludes deserves special attention, since the MappingPdxSerializer excludes all Java, Spring, and VMware GemFire types, by default. However, what happens when you need to serialize one of those types?
For example, suppose you need to serialize objects of type java.security.Principal. Then you can override the excludes by registering an include type filter:
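package example.app;

import java.security.Principal;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.data.gemfire.config.annotation.EnablePdx;
import org.springframework.data.gemfire.mapping.MappingPdxSerializer;

@SpringBootApplication
@EnablePdx(serializerBeanName = "myCustomMappingPdxSerializer")
class SpringBootGemFireClientCacheApplication {

    public static void main(String[] args) {
        SpringApplication.run(SpringBootGemFireClientCacheApplication.class, args);
    }

    @Bean
    MappingPdxSerializer myCustomMappingPdxSerializer() {

        MappingPdxSerializer customMappingPdxSerializer =
            MappingPdxSerializer.newMappingPdxSerializer();

        // Include java.security.Principal and its subtypes despite the default excludes.
        customMappingPdxSerializer.setIncludeTypeFilters(
            type -> Principal.class.isAssignableFrom(type));

        return customMappingPdxSerializer;
    }
}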
Tip: Normally, you need not explicitly declare Spring Data for VMware GemFire’s @EnablePdx annotation to enable and configure PDX. However, if you want to override auto-configuration, as we have demonstrated above, you must do this.
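Because include type filters are ordinary java.util.function.Predicate instances, they can be composed with the standard Predicate combinators. The sketch below uses only the JDK and does not touch MappingPdxSerializer. The class TypeFilterSketch and its helper methods are hypothetical names introduced for illustration, and the java.time package is an arbitrary example choice.

```java
import java.security.Principal;
import java.util.function.Predicate;

class TypeFilterSketch {

    // Matches java.security.Principal and every type that implements it.
    static Predicate<Class<?>> principalTypes() {
        return type -> Principal.class.isAssignableFrom(type);
    }

    // Matches every type whose fully qualified name starts with the given package prefix.
    static Predicate<Class<?>> inPackage(String packageName) {
        return type -> type.getName().startsWith(packageName + ".");
    }

    // Composes both conditions into a single filter by using Predicate.or.
    static Predicate<Class<?>> includeFilter() {
        return principalTypes().or(inPackage("java.time"));
    }

    public static void main(String[] args) {
        Predicate<Class<?>> filter = includeFilter();
        System.out.println("Principal: " + filter.test(Principal.class));
        System.out.println("LocalDate: " + filter.test(java.time.LocalDate.class));
        System.out.println("String:    " + filter.test(String.class));
    }
}
```

A composed predicate such as includeFilter() has the same shape as the lambda passed to setIncludeTypeFilters in the example above, so it could be supplied in its place.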