Issue
- How to disable robot indexing on certain pages, and how to prevent robot indexers from reaching an environment altogether.
Environment
- DXP 7.0
- DXP 7.1
- DXP 7.2
- DXP 7.3
- DXP 7.4
- Liferay PaaS
Resolution
- You can define the site's robot indexing rules in its robots.txt configuration.
- To disable robot indexing, add the Disallow: / directive to the robots.txt configuration.
For example, to block indexing of all public pages, a minimal robots.txt could contain:
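User-agent: *
Disallow: /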
- To hide individual pages instead, add a Disallow entry for each page path under the User-agent group:

Disallow: /page-name
- With this configuration, robot indexers can still reach the site and its content, but they will not index the disallowed pages, because robots.txt rules are policies that well-behaved crawlers honor.
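To verify the rules being served, you can request the file directly (www.example.com is a placeholder for the site's virtual host):

curl https://www.example.com/robots.txt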
- With Liferay PaaS, you can go further and prevent robots from reaching an environment at all by adding an authorization layer. Unlike indexing policies, this layer guarantees that robots cannot access the site.
In the webserver service, you can require authentication before requests reach the site by adding the following directives to its .conf file:
auth_basic "Authentication Required";
auth_basic_user_file /var/www/html/.htpasswd;
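These directives belong inside a server or location block. A minimal sketch of how they might fit into the webserver's .conf file (the proxy_pass upstream is hypothetical and depends on your existing configuration):

location / {
    # Require valid credentials before any request reaches the site
    auth_basic "Authentication Required";
    auth_basic_user_file /var/www/html/.htpasswd;
    # Hypothetical upstream; keep your existing proxy directives here
    proxy_pass http://liferay:8080;
}

The .htpasswd file can be generated with the htpasswd utility from apache2-utils; for example (admin is a placeholder username, and the command prompts for the password):

htpasswd -c /var/www/html/.htpasswd admin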
Additional Information