Issue
- A high volume of requests from the SemrushBot/7~bl web crawler is observed on the site, potentially reaching 500,000 requests within one week
- The large number of requests may cause stability problems while the crawl is running
Environment
- Liferay DXP 7.2
- Liferay PaaS
Resolution
- These bots are not malicious, but they do crawl some Liferay Cloud sites
- More information on these bots can be found here: https://www.semrush.com/bot/
- To prevent them from crawling, use the robots.txt file as instructed on that page, adding the content and syntax that match the use case of the site and the Semrush tools that are applied
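- If no tools are applied, use the following to block the SiteAuditBot user agent from crawling the site (other Semrush crawlers use their own user-agent names, which are listed on the page above):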
User-agent: SiteAuditBot
Disallow: /
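
Before deploying a robots.txt change, it can help to check which user agents a given rule set covers. The following is a minimal sketch using Python's standard `urllib.robotparser`; the helper `is_blocked` and the two rule sets are illustrative, and the `SemrushBot` token in the second rule set is an assumption that should be confirmed against https://www.semrush.com/bot/. Python's matching is only an approximation of how a particular crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, user_agent: str, url: str = "/") -> bool:
    """Return True if the given robots.txt text disallows `url` for `user_agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# The rule set shown in this article.
SITE_AUDIT_ONLY = """\
User-agent: SiteAuditBot
Disallow: /
"""

# Assumption: "SemrushBot" is the token for the crawler that identifies
# itself as SemrushBot/7~bl; verify this on the Semrush bot page.
WITH_SEMRUSHBOT = SITE_AUDIT_ONLY + """
User-agent: SemrushBot
Disallow: /
"""

if __name__ == "__main__":
    # Short product tokens are used here; Python's parser only inspects the
    # part of the user-agent string before the first "/".
    for label, rules in (("SiteAuditBot only", SITE_AUDIT_ONLY),
                         ("SiteAuditBot + SemrushBot", WITH_SEMRUSHBOT)):
        for agent in ("SiteAuditBot", "SemrushBot/7~bl"):
            print(f"{label}: {agent} blocked = {is_blocked(rules, agent)}")
```

Under Python's matching, the first rule set does not cover the SemrushBot/7~bl user agent, so a separate group for that crawler may be needed depending on which Semrush crawlers are hitting the site.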