Performing Batch Operations
Liferay’s batch engine APIs provide a powerful framework for efficiently managing and integrating large datasets. Simplifying and automating operations like system synchronization and data migration, batch APIs offer significant performance benefits when interacting with data at scale.
For Clarity, these APIs are essential for integrating historical data sets into Liferay, including bulk customer records, ticketing data, and product updates. They also facilitate exporting data to external systems, ensuring accurate sharing and system alignment with minimal manual intervention. These benefits enable Clarity to:
- Seamlessly transition customer and order data from external systems.
- Bulk-create distributor accounts without individual POST requests for each entry.
- Ensure error-free data imports and exports while maintaining system performance.
- Migrate thousands of records into internal systems to maintain continuity.
Understanding Batch Engine Capabilities
The batch engine forms the core of Liferay's bulk data operation handling. It manages export and import tasks within a structured REST interface. With batch, you can use PUT, POST, or DELETE requests through designated batch endpoints for efficient and customizable data ingestion. For example, the `/o/headless-delivery/v1.0/sites/{siteId}/blog-postings/batch` endpoint accepts a site ID in the URL and a JSON payload for creating, updating, or deleting blog posts.
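For example, here is a minimal sketch of calling this batch endpoint with Python's requests library. The host, credentials, site ID, and field values are all placeholders; adjust them for your environment.

```python
import requests

# Placeholder values; adjust for your instance, credentials, and site.
BASE_URL = "http://localhost:8080"
SITE_ID = "20121"
AUTH = ("test@liferay.com", "learn")

# A JSON array of blog postings to create in a single batch request.
payload = [
    {"headline": "Clarity Q1 Product Updates", "articleBody": "<p>First post.</p>"},
    {"headline": "Clarity Distributor News", "articleBody": "<p>Second post.</p>"},
]

response = requests.post(
    f"{BASE_URL}/o/headless-delivery/v1.0/sites/{SITE_ID}/blog-postings/batch",
    auth=AUTH,
    json=payload,
)
response.raise_for_status()

# The response describes the queued batch task, including its ID.
print(response.json())
```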
The batch engine also exposes its own set of REST services, providing an alternate way to invoke individual batch operations. Found within `/o/headless-batch-engine`, these services offer direct `export-task` and `import-task` methods.
To create multiple blog posts using these batch endpoints, Clarity can execute a POST to `http://localhost:8080/o/headless-batch-engine/v1.0/import-task/com.liferay.headless.delivery.dto.v1_0.BlogPosting`. In this call, the POST body includes the data to import, with payload content matching the component's data format.
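As a sketch, such a call could look like the following (the credentials and entity fields are illustrative placeholders):

```python
import requests

AUTH = ("test@liferay.com", "learn")  # placeholder credentials
CLASS_NAME = "com.liferay.headless.delivery.dto.v1_0.BlogPosting"

# The payload matches the BlogPosting data format: a JSON array of entities.
payload = [
    {"siteId": 20121, "headline": "Imported Post A", "articleBody": "<p>A</p>"},
    {"siteId": 20121, "headline": "Imported Post B", "articleBody": "<p>B</p>"},
]

response = requests.post(
    f"http://localhost:8080/o/headless-batch-engine/v1.0/import-task/{CLASS_NAME}",
    auth=AUTH,
    json=payload,
)
response.raise_for_status()

task = response.json()
print(task.get("id"), task.get("executeStatus"))  # keep the ID to check status later
```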
Once a batch process is invoked, its task ID can be leveraged to check the job’s status and retrieve the exported content.
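For example, here is a hedged sketch of polling a task and retrieving exported content. The task IDs are placeholders, and the exact response fields and download format can vary by Liferay version.

```python
import requests

AUTH = ("test@liferay.com", "learn")  # placeholder credentials
BASE = "http://localhost:8080/o/headless-batch-engine/v1.0"

# Check the status of a previously submitted import task by its ID.
import_task_id = 12345  # placeholder; use the ID returned when the task was created
status = requests.get(f"{BASE}/import-task/{import_task_id}", auth=AUTH).json()
print(status.get("executeStatus"))  # e.g., INITIAL, COMPLETED, or FAILED

# For export tasks, download the generated content once the task completes.
export_task_id = 67890  # placeholder
content = requests.get(f"{BASE}/export-task/{export_task_id}/content", auth=AUTH)
with open("export.zip", "wb") as f:  # exported content is typically delivered as an archive
    f.write(content.content)
```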
These capabilities are essential tools for high-volume and complex data ingestion scenarios where efficiency and customization are key.
Performing Exports with Batch
Exporting entities requires specifying an output data format, a list of included fields, and parameters. Parameters always include the class name and, when exporting data from components (e.g., Liferay Objects), may also require `taskItemDelegateName`. By default, exports include all fields, but you can refine the data by selecting specific fields. Also, since batch exports leverage the standard GET method, you can use valid service parameters (e.g., search, sort, filter) to tailor exported data for specific business needs.
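For instance, a sketch of starting an export of blog postings limited to a few fields through the batch engine's export-task service could look like this (the field names and parameters shown are illustrative, and the exact parameter set may vary by version):

```python
import requests

AUTH = ("test@liferay.com", "learn")  # placeholder credentials
CLASS_NAME = "com.liferay.headless.delivery.dto.v1_0.BlogPosting"

# Start an export task, restricting the output to a few fields.
response = requests.post(
    f"http://localhost:8080/o/headless-batch-engine/v1.0/export-task/{CLASS_NAME}/JSON",
    auth=AUTH,
    params={"fieldNames": "headline,articleBody,datePublished"},
)
response.raise_for_status()

task = response.json()
print(task.get("id"))  # use this ID to poll the task and download its content
```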
See Batch Engine API Basics - Exporting Data for more information.
Performing Imports with Batch
Importing with batch involves a similar schema structure to individual POST, PUT, or PATCH requests. The payload includes a JSON array of individual entities separated by commas and encapsulated in square brackets.
When importing data from external sources, it's essential to ensure the data is formatted to fit Liferay's expected schema and contains valid field references. You can determine the batch payload structure for a component by examining the POST request body for a single entity import, which includes all fields. When source data contains fields with different names, you can transform the data during import using the `fieldNameMappingMap` property.
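As an illustrative sketch, an import whose source data uses title instead of Liferay's headline field could map it during submission. The mapping syntax below is an assumption; consult the batch engine reference for the exact format.

```python
import requests

AUTH = ("test@liferay.com", "learn")  # placeholder credentials
CLASS_NAME = "com.liferay.headless.delivery.dto.v1_0.BlogPosting"

# Source entities use "title"; map it to Liferay's "headline" field on import.
payload = [
    {"title": "Migrated Post 1", "articleBody": "<p>Body 1</p>"},
    {"title": "Migrated Post 2", "articleBody": "<p>Body 2</p>"},
]

response = requests.post(
    f"http://localhost:8080/o/headless-batch-engine/v1.0/import-task/{CLASS_NAME}",
    auth=AUTH,
    params={"fieldNameMappingMap": "title=headline"},  # assumed parameter format
    json=payload,
)
response.raise_for_status()
print(response.json().get("id"))
```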
See Batch Engine API Basics - Importing Data for more information.
Benefits of Leveraging Batch APIs
Designed for efficiency and scalability, batch engine APIs are ideal for large data volumes. Here are some key advantages:
- Efficient Bulk Data Imports and Exports: Single-request operations simplify data extraction and uploads. Reduced API overhead and network calls make this approach ideal for data migration and platform integrations.
- Reliable Task-Based Execution: Batch engine API calls are handled as individual tasks, enabling detailed progress tracking and identification. Each task reports a reliable, predictable status (`INITIAL`, `COMPLETED`, or `FAILED`), so failed entries can be retried without reprocessing entire datasets.
- Error Handling and Validation: Incoming data validation and detailed logs for failed entries prevent invalid or duplicate records, ensuring data integrity and minimizing data anomalies.
- Scalability for Large Data Sets: Batch engine APIs are designed to process large datasets efficiently without encountering API rate limits or payload size restrictions. This makes them ideal for tasks like archiving historical data or importing thousands of updates.
- Customized Exported Data: The `fieldNames` parameter enables you to customize data exports, reducing payload size and improving performance by retrieving only necessary fields (e.g., excluding internal IDs and metadata for reporting).
- Flexible Export Formats: JSON, CSV, and JSONL formats enable tailored exports matching environment requirements. Typically, JSON is well-suited for API integrations, CSV for spreadsheet tools, and JSONL for line-delimited JSON processing in high-volume workflows.
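For example, a sketch of requesting the same export in each format by varying the content type segment (placeholder credentials and field names) might look like this:

```python
import requests

AUTH = ("test@liferay.com", "learn")  # placeholder credentials
CLASS_NAME = "com.liferay.headless.delivery.dto.v1_0.BlogPosting"
BASE = "http://localhost:8080/o/headless-batch-engine/v1.0"

# The content type segment selects the export format: JSON, CSV, or JSONL.
for content_type in ("JSON", "CSV", "JSONL"):
    task = requests.post(
        f"{BASE}/export-task/{CLASS_NAME}/{content_type}",
        auth=AUTH,
        params={"fieldNames": "headline,datePublished"},  # trim to needed fields
    ).json()
    print(content_type, task.get("id"))
```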
Deciding When to Leverage Batch APIs
While both batch and individual API calls create and update data, batch APIs excel at large-scale imports, migrations, and system synchronization. However, batch APIs may not be ideal when intricate data transformations are required before submission, as those transformations must be handled beforehand. For real-time, on-demand updates or records that need individual transformation, standard API calls offer more control, though they may be slower and subject to rate limits for large datasets.
In essence, use batch APIs for scenarios where efficiency, scalability, and customization are paramount, particularly for bulk operations. For ad hoc queries, standard GET methods remain a better fit. For implementation guidance, see Batch Engine API Basics.
Leveraging Batch APIs Effectively
In order to leverage batch APIs effectively, consider these recommendations:
- For imports exceeding 1GB, increase the configured Upload Servlet Request maximum or consider breaking API import calls into multiple smaller portions (see the sketch after this list).
- Test batch calls in lower environments to ensure desired performance, as performance varies based on the internal structure of each component.
- For object definition imports using the UPDATE strategy, include all fields in the batch call to avoid redefining the object with excluded fields omitted.
- When entries must be processed sequentially, order them accordingly in your batch import payload, as entries are processed from top to bottom.
- To achieve multi-threaded processing, submit multiple batch requests with different data segments, as batch requests are generally single-threaded.
- To simplify data management in multi-site environments, leverage the `siteId` parameter for site-specific batch export capabilities.
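For example, here is a hedged sketch of splitting a large import into smaller batch submissions; the chunk size, endpoint, and entity fields are placeholder choices that only illustrate the approach.

```python
import requests

AUTH = ("test@liferay.com", "learn")  # placeholder credentials
CLASS_NAME = "com.liferay.headless.delivery.dto.v1_0.BlogPosting"
URL = f"http://localhost:8080/o/headless-batch-engine/v1.0/import-task/{CLASS_NAME}"
CHUNK_SIZE = 500  # placeholder; tune based on payload size and testing

def chunked(items, size):
    """Yield consecutive slices of the list, preserving the original order."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# A large list of records already shaped to Liferay's expected schema.
entities = [{"headline": f"Post {i}", "articleBody": f"<p>{i}</p>"} for i in range(5000)]

task_ids = []
for chunk in chunked(entities, CHUNK_SIZE):
    response = requests.post(URL, auth=AUTH, json=chunk)
    response.raise_for_status()
    task_ids.append(response.json().get("id"))

print(f"Submitted {len(task_ids)} import tasks")
```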
Conclusion
Batch APIs are indispensable for efficient, large-scale data imports and exports. Their versatility and ability to handle massive datasets make batch APIs ideal for data synchronization and content migration, reducing manual effort and providing accurate data transitions. By leveraging batch engine APIs, Clarity can ensure fast, scalable, and error-resistant data integration, enhancing operational efficiency.
Next, you’ll leverage batch APIs to import and export bulk data sets.