Legacy Knowledge Base
Published Jun. 30, 2025

Multilingual PDF search indexing fails with web content

Written By

Rishabh Agrawal

How To articles are not official guidelines or officially supported documentation. They are community-contributed content and may not always reflect the latest updates to Liferay DXP. We welcome your feedback to improve How To articles!

While we make every effort to ensure this Knowledge Base is accurate, it may not always reflect the most recent updates or official guidelines. We appreciate your understanding and encourage you to reach out with any feedback or concerns.

Legacy Article

You are viewing an article from our legacy "FastTrack" publication program, made available for informational purposes. Articles in this program were published without a requirement for independent editing or verification and are provided "as is" without guarantee.

Before using any information from this article, independently verify its suitability for your situation and project.

Issue

  • Why does the main document entry contain only the metadata and not the content of PDFs?
  • Is it possible to add the content of PDFs along with the metadata of PDFs in the main document entry?

Steps to reproduce:

1. Create a Structure: Navigate to Site Control Panel > Content & Data > Web Content > Structures and create a new structure with an upload field that is localizable in both English and Hindi. Save the structure.
2. Create an Article: Go to Web Content and click "New" to add a new article. Select the newly created structure.
3. Upload PDFs: Within the English language section, upload a PDF containing English content. Switch the language selector to Hindi and upload a different PDF containing Hindi content. Provide titles for both language versions. Save the article.
4. Create a Blueprint: Open the Application Menu and select the Applications tab. Under "Search Experiences," go to Blueprints and create a new blueprint.
5. Configure Blueprint: Before previewing, go to the "Configuration" tab within the blueprint. Under "Advanced Configuration," add the following JSON snippet (provided below), which sets the "fetchSource" option to true.

{
    "source": {
        "fetchSource": true
    }
}

6. Preview and Search: Click the "Preview" button (top right). This opens a search bar in the right-hand panel. Enter the title of the newly created article and perform a search.

Actual Behavior: Only the fileEntryId and title are indexed as part of the article. The content of the PDFs is not indexed, so users cannot find the article by searching for keywords that appear inside the PDFs.

Desired Behavior: The content of both the English and Hindi PDF files should be indexed and searchable as part of the article, and search results should match on that content.
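
The gap between the actual and desired behavior can be checked against a source document copied out of the blueprint preview. The following is a minimal Python sketch, not part of the product: it assumes the fetched source has been pasted in as a dictionary, and the sample values (the title text and the ID) are hypothetical. It lists every field whose text contains a given keyword, so a keyword that appears only inside a PDF returns an empty list when the PDF content was not indexed.

```python
from typing import Any, Iterator


def iter_strings(value: Any, path: str = "") -> Iterator[tuple[str, str]]:
    """Yield (path, text) for every string nested in a JSON-like value."""
    if isinstance(value, str):
        yield path, value
    elif isinstance(value, dict):
        for key, child in value.items():
            yield from iter_strings(child, f"{path}.{key}" if path else key)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from iter_strings(child, f"{path}[{index}]")


def fields_containing(source: dict, keyword: str) -> list[str]:
    """Return the paths of all string fields that contain the keyword."""
    needle = keyword.casefold()
    return [p for p, text in iter_strings(source) if needle in text.casefold()]


# Hypothetical sample shaped like the "Actual Behavior" description:
# only a title and a fileEntryId are present, no PDF text.
actual_source = {"title": "Annual report", "fileEntryId": "12345"}

print(fields_containing(actual_source, "annual"))   # matches the title field
print(fields_containing(actual_source, "revenue"))  # word found only in the PDF
```

The comparison is case-insensitive and walks nested objects and arrays, so it does not depend on knowing in advance which field would hold the extracted PDF text.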

Here is the attached video showcasing the detailed steps.

Environment

  • Liferay DXP [all versions]

Resolution

  • Indexing the content of files referenced by a DDM field is not part of the product's current functionality.
  • With the current construction of DDM fields, supporting this use case could have unwanted impacts on the user experience, such as degraded performance.
  • Therefore, this behavior is not considered a product bug. It can be raised as a Feature Request (if needed by customers) for upcoming releases of DXP.
  • Because the use case is reported for both Web Content and Documents and Media, the same resolution applies to both.
