Issue
- Why does the main document entry contain only the metadata of the uploaded PDFs and not their content?
- Is it possible to index the content of the PDFs along with their metadata in the main document entry?
Steps to reproduce:
1. Create a Structure: Navigate to Site Menu > Content & Data > Web Content > Structures and create a new structure with an upload field that is localizable in both English and Hindi. Save the structure.
2. Create an Article: Go to Web Content and click "New" to add a new article. Select the newly created structure.
3. Upload PDFs: With English selected in the language selector, upload a PDF containing English content. Switch the language selector to Hindi and upload a different PDF containing Hindi content. Provide titles for both language versions and save the article.
4. Create a Blueprint: Open the Applications Menu and select the Applications tab. Under "Search Experience," go to Blueprints and create a new blueprint.
5. Configure Blueprint: Before previewing, go to the "Configuration" tab within the blueprint. Under "Advanced Configuration," add the following JSON snippet, which sets the "fetchSource" option to true so that the source of each result is shown in the preview:
{
  "source": {
    "fetchSource": true
  }
}
6. Preview and Search: Click the "Preview" button (top right). This opens a search bar in the right-hand panel. Enter the title of the newly created article and perform a search.
Actual Behavior: Only the fileEntryId and title of each PDF are indexed as part of the article. The actual content of the PDFs is not indexed, so users cannot find the article by searching for keywords that appear only inside the PDFs.
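To confirm the actual behavior without scanning the raw result by eye, the source shown in the preview panel (returned because of the fetchSource setting in step 5) can be copied and searched for a keyword that appears only inside a PDF. The following Python sketch assumes the source has been saved as a JSON object; the function names are illustrative, and the sample field names are hypothetical and may not match the index mapping of a given DXP version.

```python
from typing import Any, Iterator, List, Tuple


def iter_strings(node: Any, path: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (path, value) for every string value in a nested JSON-like structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_strings(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_strings(value, f"{path}[{index}]")
    elif isinstance(node, str):
        yield path, node


def find_keyword(source: dict, keyword: str) -> List[str]:
    """Return the paths of all string fields containing keyword (case-insensitive)."""
    needle = keyword.casefold()
    return [path for path, value in iter_strings(source) if needle in value.casefold()]


# Hypothetical excerpt of a result's source: the upload field holds only a file reference.
sample_source = {
    "title_en_US": "Annual Report",
    "ddmFieldArray": [
        {
            "ddmFieldName": "upload",
            "ddmFieldValueKeyword_en_US": '{"fileEntryId": "12345", "title": "report-en.pdf"}',
        }
    ],
}

print(find_keyword(sample_source, "report"))   # ['title_en_US', 'ddmFieldArray[0].ddmFieldValueKeyword_en_US']
print(find_keyword(sample_source, "revenue"))  # []
```

To check a real result, parse the copied text with json.loads and pass it to find_keyword. An empty list for a word that appears only inside the PDF reflects the behavior described above: the file name matches, but the file's text does not.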
Desired Behavior: The content of both the English and Hindi PDF files should be indexed and searchable as part of the article, so that searching for words that appear in either PDF returns the article.
Here is the attached video showcasing the detailed steps.
Environment
- Liferay DXP [all versions]
Resolution
- Indexing the content of files referenced by a DDM field (ddmField), such as the PDFs in an upload field, is not part of the product.
- In addition, indexing file content through DDM fields, given how they are currently built, could negatively affect the user experience, for example through degraded performance.
- Therefore, this behavior is not considered a product bug. If customers need this capability, it can be submitted as a Feature Request for consideration in upcoming DXP releases.
- Because this use case has been reported for both Web Content and Documents and Media, the same resolution applies to both.
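For context on what such a Feature Request would involve, the following Python sketch outlines the idea: for each localized upload value, resolve the referenced file, extract its text, and add it to the article's search document under a per-locale field. This is not Liferay code and not a supported workaround. SearchDocument, UploadValue, PdfContentContributor, the extract_text callback and the field names (upload_title_<locale>, content_pdf_<locale>) are all hypothetical. The cache hints at one plausible source of the performance concern mentioned above: without it, every reindex of an article would extract the text of every referenced PDF again.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class UploadValue:
    """Hypothetical localized value of an upload field: a file reference only."""
    locale: str
    file_entry_id: int
    title: str


@dataclass
class SearchDocument:
    """Simplified stand-in for a search index document (not Liferay's Document API)."""
    fields: Dict[str, str] = field(default_factory=dict)


class PdfContentContributor:
    """Adds extracted file text to an article's document, one field per locale."""

    def __init__(self, extract_text: Callable[[int], str]) -> None:
        # Placeholder for a real text extractor (for example, one backed by a PDF parser).
        self._extract_text = extract_text
        self._cache: Dict[int, str] = {}
        self.extractions = 0  # counts real extractions, to show the effect of caching

    def _text_for(self, file_entry_id: int) -> str:
        if file_entry_id not in self._cache:
            self._cache[file_entry_id] = self._extract_text(file_entry_id)
            self.extractions += 1
        return self._cache[file_entry_id]

    def contribute(self, document: SearchDocument, uploads: List[UploadValue]) -> None:
        for upload in uploads:
            # Metadata: roughly what the article document carries today.
            document.fields[f"upload_title_{upload.locale}"] = upload.title
            # Extracted content: what the Feature Request would add.
            key = f"content_pdf_{upload.locale}"
            text = self._text_for(upload.file_entry_id)
            document.fields[key] = (document.fields.get(key, "") + " " + text).strip()


# Usage with fake file contents standing in for real PDFs.
fake_files = {101: "English annual report", 202: "Hindi report text (placeholder)"}
contributor = PdfContentContributor(lambda file_id: fake_files[file_id])
uploads = [
    UploadValue("en_US", 101, "report-en.pdf"),
    UploadValue("hi_IN", 202, "report-hi.pdf"),
]

doc = SearchDocument()
contributor.contribute(doc, uploads)
contributor.contribute(SearchDocument(), uploads)  # a simulated reindex reuses cached text
print(doc.fields["content_pdf_en_US"])  # English annual report
print(contributor.extractions)          # 2
```

In a real deployment, the cache would also need to be invalidated when a file is updated, which is part of why this is treated as a design decision for the product rather than a quick fix.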
Additional Information
- Creating and Upvoting Feature Requests: https://help.liferay.com/hc/en-us/articles/360018123132-Requesting-a-New-Feature-or-Feature-Improvement
- Info about the Feature Request Process: https://liferay.dev/en/feedback/feature-requests