Examining Data Migration

Reliable, repeatable data migration is essential to operating Liferay environments at scale. Manual imports are error-prone, slow, and difficult to audit, creating a serious bottleneck for teams promoting models across multiple environments. Liferay's Batch Engine and batch client extensions address this challenge by automating bulk data imports using configurable execution strategies, ensuring rapid and consistent deployments. By packaging JSON data into these extensions, you can leverage headless APIs to automatically create or update models in a target environment. To execute these imports, every batch client extension relies on two core components: configuration elements and payload resources.

Once data is exported and appropriately formatted, batch client extensions leverage the batch engine to import data.

In this lesson, you'll learn about the components of batch client extensions and how to configure them to migrate data reliably between environments.

Batch Client Extension Configurations

Every client extension project contains a client-extension.yaml file defining its structure, resources, and configurations. This file is crucial for managing and organizing your project’s extensions. For further details, refer to the official documentation (see Configuring Client Extensions) or take Foundations of Liferay Client Extensions for a holistic overview.

For batch client extensions, the client-extension.yaml file’s configuration elements are set in these three sections:

  • The Assemble Block
  • The (extension) Definition Block
  • The Security (OAuth) Block

Assemble Block

The assemble block organizes the extension’s assets, ensuring all necessary resources are packaged correctly for deployment. This block guides the processor in finding the extension's required files and resources. A typical batch client extension includes an assemble block that looks like this:

assemble:
    - from: batch
      into: batch

The two batch references in this example refer to the folder named “batch” within the liferay-sample-batch project in the Liferay Sample Workspace. If your data is organized differently, adjust the paths in the assemble block accordingly.

Paths in the assemble block are always assumed to start at the root of the client extension, so there is no need to specify a leading slash.

Extension Definition Block

Core client extension configurations are defined in this block. For a batch client extension, this includes properties such as the name, type, and associated APIs. The name value identifies the client extension in the Liferay UI, and the type value of batch ensures Liferay handles it as a batch client extension when deployed. The oAuthApplicationHeadlessServer value is a required attribute specifying the OAuth 2.0 configuration used for secure API access. This value must match the name of the block that defines those properties.

It is not recommended to configure a headless endpoint to be accessible without an authentication context.

A typical batch client extension will include an extension definition block similar to this example:

liferay-ticket-batch-list-type-definition:
    name: Liferay Ticket Batch List Type Definition
    oAuthApplicationHeadlessServer: liferay-ticket-batch-list-type-definition-oauth-application-headless-server
    type: batch

Security (OAuth) Block

To securely interact with Liferay's APIs, the extension is configured with an OAuth 2.0 application. This setup includes standard OAuth criteria such as the service address, security scheme, and access scopes. Using the previous example's oAuthApplicationHeadlessServer value, a typical batch client extension's OAuth block might include:

liferay-ticket-batch-list-type-definition-oauth-application-headless-server:
    .serviceAddress: localhost:8080
    .serviceScheme: http
    name: Liferay Ticket Batch List Type Definition OAuth Application Headless Server
    scopes:
        - Liferay.Headless.Admin.List.Type.everything
        - Liferay.Headless.Batch.Engine.everything
    type: oAuthApplicationHeadlessServer

The security block is critical to the functionality of the batch execution. Without it, the process can't enforce the necessary security requirements to process the batch payload. Defining scopes in this block grants permissions for specific API resources as needed by the extension.

To find the correct API scopes for your batch client extensions, open the Global Menu, go to the Control Panel tab, and click OAuth 2 Administration. Select an OAuth 2.0 application from the list and go to the Scopes tab, which displays available Liferay API scopes.

Together, these three blocks compose the client-extension.yaml file in batch client extensions. This YAML file is crucial because it provides Liferay with essential metadata and instructions for handling the deployed archive.
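The rule that the oAuthApplicationHeadlessServer value must match the name of the OAuth block can be checked mechanically before deployment. The following is a minimal Python sketch, assuming the client-extension.yaml file has already been parsed into a dictionary (for example, with a YAML library). The function name find_batch_config_problems is hypothetical and is not part of Liferay's tooling; the sample values are copied from the blocks shown earlier.

```python
def find_batch_config_problems(config):
    """Return a list of problems found in a parsed client-extension.yaml (hypothetical helper)."""
    problems = []
    for key, block in config.items():
        # Only inspect blocks that declare themselves as batch client extensions.
        if not isinstance(block, dict) or block.get("type") != "batch":
            continue
        server = block.get("oAuthApplicationHeadlessServer")
        if server is None:
            problems.append(f"{key}: missing oAuthApplicationHeadlessServer")
            continue
        target = config.get(server)
        if not isinstance(target, dict):
            problems.append(f"{key}: no block named {server}")
        elif target.get("type") != "oAuthApplicationHeadlessServer":
            problems.append(f"{server}: type is not oAuthApplicationHeadlessServer")
    return problems


OAUTH_NAME = "liferay-ticket-batch-list-type-definition-oauth-application-headless-server"

example = {
    "assemble": [{"from": "batch", "into": "batch"}],
    "liferay-ticket-batch-list-type-definition": {
        "name": "Liferay Ticket Batch List Type Definition",
        "oAuthApplicationHeadlessServer": OAUTH_NAME,
        "type": "batch",
    },
    OAUTH_NAME: {
        ".serviceAddress": "localhost:8080",
        ".serviceScheme": "http",
        "name": "Liferay Ticket Batch List Type Definition OAuth Application Headless Server",
        "scopes": [
            "Liferay.Headless.Admin.List.Type.everything",
            "Liferay.Headless.Batch.Engine.everything",
        ],
        "type": "oAuthApplicationHeadlessServer",
    },
}

print(find_batch_config_problems(example))
```

An empty list means the definition block points at an existing OAuth block of the expected type. The sketch checks only this one cross-reference; it does not validate scopes or assemble paths.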

Understanding Payloads

While the client-extension.yaml file defines a batch client extension’s metadata, payload files are equally critical. Payloads are stored in one or more JSON files and define the specific actions and behaviors of the extension. Using multiple files typically simplifies maintenance, particularly for large or diverse datasets.

When leveraging a multiple file approach, file processing occurs in alphanumeric order. It's crucial to name and number files accordingly to ensure that prerequisite elements are processed before they are referenced by other files (e.g., importing a model’s definitions and relationships prior to its entries).
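The effect of file naming on processing order can be previewed with a short Python sketch. It assumes that alphanumeric order behaves like plain character-by-character (lexicographic) sorting, which may not match Liferay's exact comparison, and the file names are hypothetical.

```python
def processing_order(file_names):
    """Return file names in lexicographic order (assumed stand-in for alphanumeric order)."""
    return sorted(file_names)


# Hypothetical file names without zero-padding: "10-" sorts before "2-".
unpadded = [
    "2-object-relationships.batch-engine-data.json",
    "10-object-entries.batch-engine-data.json",
    "1-object-definitions.batch-engine-data.json",
]

# The same files with zero-padded prefixes sort in the intended order.
padded = [
    "02-object-relationships.batch-engine-data.json",
    "10-object-entries.batch-engine-data.json",
    "01-object-definitions.batch-engine-data.json",
]

for name in processing_order(unpadded):
    print(name)
for name in processing_order(padded):
    print(name)
```

Under this assumption, zero-padding numeric prefixes is a simple way to keep definitions and relationships ahead of the entries that reference them.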

Payloads for batch client extensions consist of two top-level JSON objects:

  1. The Configuration Block
  2. The Data Block

Configuration Block

The configuration block tells the Batch Engine how to process the data in the JSON file: which entity type the items represent and how to handle errors during payload processing. Within the liferay-sample-batch example's batch folder, the object-definition.batch-engine-data.json file contains this configuration block:

"configuration": {
    "className": "com.liferay.object.admin.rest.dto.v1_0.ObjectDefinition",
    "parameters": {
        "containsHeaders": "true",
        "createStrategy": "UPSERT",
        "importStrategy": "ON_ERROR_FAIL",
        "updateStrategy": "UPDATE"
    },
    "taskItemDelegateName": "DEFAULT"
},

In this example, the className directs the Batch Engine to process the items as object definitions, and the importStrategy of ON_ERROR_FAIL directs Liferay to stop the import and report an error as soon as a failure occurs.

Options for Batch Strategies

Each import batch contains parameters in the configuration block specifying the import's data mapping, validation rules, and error handling procedures:

  • Import Strategy (importStrategy): Defines behavior in case of errors. The default of ON_ERROR_FAIL stops the import immediately upon encountering an error, while ON_ERROR_CONTINUE continues processing records even when errors occur.
  • Create Strategy (createStrategy): Specifies how file records are handled. The default of INSERT creates only new records (and duplicate records trigger errors), while UPSERT updates existing records and creates new ones when necessary.
  • Update Strategy (updateStrategy): Configures the behavior during record updates when using the UPSERT strategy. UPDATE completely overwrites the record, replacing missing values in the import file with null, while PARTIAL_UPDATE updates only the fields present in the import file, preserving existing values for other fields.

You can modify each of these strategies by manually overwriting the configuration block for batch client extension JSON files.
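The interaction between the three strategies can be illustrated with a toy model. The Python sketch below is not Liferay's implementation: it treats the target environment as an in-memory dictionary keyed by external reference code, treats a duplicate record as the only possible error, and uses UPDATE as the default update strategy purely for illustration. The names run_import and demo and the sample records are hypothetical.

```python
import copy


def run_import(store, items, create_strategy="INSERT",
               update_strategy="UPDATE", import_strategy="ON_ERROR_FAIL"):
    """Apply items to store (a dict keyed by ERC) and return the error messages."""
    errors = []
    for item in items:
        erc = item["externalReferenceCode"]
        existing = store.get(erc)
        if existing is None:
            # New record: created under both INSERT and UPSERT.
            store[erc] = dict(item)
        elif create_strategy == "INSERT":
            # INSERT only creates, so an existing record is an error.
            errors.append(f"duplicate record: {erc}")
            if import_strategy == "ON_ERROR_FAIL":
                break  # stop at the first error
        elif update_strategy == "PARTIAL_UPDATE":
            # Only fields present in the item change; the rest are preserved.
            existing.update(item)
        else:
            # UPDATE: overwrite, with fields missing from the item set to None.
            store[erc] = {key: item.get(key) for key in existing.keys() | item.keys()}
    return errors


EXISTING = {
    "TICKET_A": {"externalReferenceCode": "TICKET_A", "name": "A", "description": "old"},
}
ITEMS = [
    {"externalReferenceCode": "TICKET_A", "name": "A2"},
    {"externalReferenceCode": "TICKET_B", "name": "B"},
]


def demo(**strategies):
    """Run the toy import against a fresh copy of the sample data."""
    store = copy.deepcopy(EXISTING)
    errors = run_import(store, copy.deepcopy(ITEMS), **strategies)
    return store, errors


print(demo())
print(demo(import_strategy="ON_ERROR_CONTINUE"))
print(demo(create_strategy="UPSERT"))
print(demo(create_strategy="UPSERT", update_strategy="PARTIAL_UPDATE"))
```

In this model, the default combination stops at the duplicate and never reaches the second item, ON_ERROR_CONTINUE records the error but still creates the second item, and the two UPSERT runs differ only in whether the existing description survives.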

Data Block

The data block contains a list of items, where each item represents a data record to be created or updated. The processing defined by the configuration block relies on this items array as its data source. It's crucial that this array includes complete definitions for all data records, including attributes, relationships, and any other relevant information. In the object-definition.batch-engine-data.json example, below the configuration block, the items array contains a definition for one object (named Sample):

"items": [
    {
        ...
        "label": {
            "en_US": "Sample"
        },
        "name": "Sample",
        "objectActions": [
            {
                "active": true,
                "conditionExpression": "",
                "dateCreated": "2023-03-03T15:10:15.037+00:00",
                "dateModified": "2023-03-03T15:10:15.037+00:00",
                "description": "",
                "errorMessage": {
                },
                "label": {
                    "en_US": "Sample Node Update 1"
                },
                ...

Instead of manually creating information in the data block, use Liferay's available tools to export existing definitions. The JSON files for exported models do not include the configuration block or the items array by default, so you'll need to either augment the exported schema with these elements or incorporate it into a pre-existing file structure that already contains them.
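Augmenting an exported definition with the configuration block and items array can be scripted. The following Python sketch is one possible approach, not an official tool: the wrap_export function is hypothetical, the configuration values are copied from the liferay-sample-batch example above, and the exported dictionary is a shortened placeholder rather than a real export.

```python
import json


def wrap_export(exported, class_name):
    """Wrap one exported definition in a configuration block and an items array."""
    return {
        "configuration": {
            "className": class_name,
            "parameters": {
                "containsHeaders": "true",
                "createStrategy": "UPSERT",
                "importStrategy": "ON_ERROR_FAIL",
                "updateStrategy": "UPDATE",
            },
            "taskItemDelegateName": "DEFAULT",
        },
        "items": [exported],
    }


# Placeholder standing in for the contents of an exported JSON file.
exported = {
    "externalReferenceCode": "SAMPLE",
    "label": {"en_US": "Sample"},
    "name": "Sample",
}

payload = wrap_export(
    exported, "com.liferay.object.admin.rest.dto.v1_0.ObjectDefinition"
)
payload_text = json.dumps(payload, indent=4)
print(payload_text)
```

The resulting text could then be saved as a .batch-engine-data.json file in the folder referenced by the assemble block. The className must correspond to the type of entity being imported, so it is passed in as an argument rather than hard-coded.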

Maintaining External Reference Codes

Items in the data block frequently contain an external reference code (ERC), a field supported by many of Liferay's headless APIs. An ERC is a unique, human-readable key that identifies an entity consistently across environments, even when other values such as structure IDs vary. Before migrating data, set and maintain descriptive ERCs in the Liferay UI for your environment's objects and other assets.
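Before packaging a payload, it can help to confirm that every item carries an ERC and that no ERC appears twice. The Python sketch below is a hypothetical pre-flight check, assuming each item stores its ERC in a top-level externalReferenceCode field and that duplicates within a single payload are unintended; the sample items are invented for illustration.

```python
from collections import Counter


def find_erc_problems(items):
    """Return (indexes of items without an ERC, sorted list of duplicated ERCs)."""
    missing = [
        index for index, item in enumerate(items)
        if not item.get("externalReferenceCode")
    ]
    counts = Counter(
        item["externalReferenceCode"]
        for item in items
        if item.get("externalReferenceCode")
    )
    duplicates = sorted(erc for erc, count in counts.items() if count > 1)
    return missing, duplicates


# Invented sample items: one has no ERC and one ERC is repeated.
sample_items = [
    {"externalReferenceCode": "TICKET_PRIORITY", "name": "Priority"},
    {"name": "Status"},
    {"externalReferenceCode": "TICKET_PRIORITY", "name": "Priority Copy"},
    {"externalReferenceCode": "TICKET_REGION", "name": "Region"},
]

missing, duplicates = find_erc_problems(sample_items)
print("Items without an ERC:", missing)
print("Duplicated ERCs:", duplicates)
```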

Troubleshooting Data Migration with Batch Client Extensions

When working with batch client extensions, the following strategies can assist with troubleshooting:

  • Check Compatibility: Ensure all resources inside the client extension’s JSON files match the schema, capabilities, and Liferay version of the target system.
  • Evaluate Feature Flags: Be aware of any enabled feature flags that may impact client extension behavior.
  • Pre-populate Scopes: To check for suspected conflicts with scope requests, access http://localhost:8080/o/api to populate all objects inside Liferay’s memory before deploying client extensions.

Conclusion

Batch client extensions offer significant advantages over manual methods for automating data migration. By understanding data migration concepts and batch client extension components, you can streamline the process of migrating object definitions, picklists, workflow definitions, and more between environments.

Next, you’ll build and deploy a batch client extension from Clarity’s exported distributor management definitions.
