There are two ways to prevent our crawlers from accessing specific sections of your website: through the ContentKing URL Exclusion List and through the site’s robots.txt file.
IMPORTANT: When you use either of the methods below to exclude parts of your website from monitoring, all of the data ContentKing has already collected for URLs matching the exclusion patterns will be deleted. You can remove URL patterns at any time to resume monitoring, but this historical data won’t be restored.
Using the URL Exclusion List
The easiest method to set up is our URL Exclusion List. Inside the app, go to the website’s settings via Account -> Websites. At the bottom of that screen, click on “Set up URL exclusion list.” The first screen shows a brief explanation of the URL Exclusion List.
If the website has a robots.txt file, the second step shows the directives ContentKing found there. Use this step to select which directives (if any) you want to import.
If you wish, use the next screen to add your own URL patterns to exclude from monitoring. The URL Exclusion List follows the robots.txt format. If you have imported existing robots.txt directives, they are shown here as well.
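Because the URL Exclusion List follows robots.txt syntax, its patterns match URLs the same way Disallow rules do. As an illustrative sketch (the patterns and URLs below are made up, and Python’s built-in parser handles only simple prefix rules, not wildcards), Python’s urllib.robotparser shows how such prefix patterns decide which URLs are excluded:

```python
from urllib import robotparser

# Hypothetical exclusion patterns, written in robots.txt format
rules = """
User-agent: *
Disallow: /staging/
Disallow: /search
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# URLs whose path starts with an excluded prefix are blocked
print(rp.can_fetch("*", "https://example.com/staging/page"))  # -> False
print(rp.can_fetch("*", "https://example.com/search"))        # -> False

# Everything else remains eligible for monitoring
print(rp.can_fetch("*", "https://example.com/products"))      # -> True
```

Note that this is only a way to reason about prefix matching; ContentKing applies your exclusion patterns on its own side, and no code is needed in the app.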
Once everything is set up as you wish, click on “Apply changes,” and the URL exclusions will take effect within the next few minutes.
Using the robots.txt file
The second way to prevent ContentKing from monitoring certain parts of your website is through the robots.txt file. However, because one of ContentKing’s use cases is detecting incorrectly configured robots.txt files, our crawlers ignore directives under the wildcard user-agent (User-agent: *). To target our crawlers, address the KingKevinBot user-agent instead.
Our crawlers fully support the robots.txt format.
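For example, a robots.txt snippet that keeps our crawlers out of a (hypothetical) staging section while leaving all other crawlers unaffected might look like this:

```
# Applies only to ContentKing's crawlers; other bots keep
# following their own (or the wildcard) rules.
User-agent: KingKevinBot
Disallow: /staging/
Disallow: /internal-search
```

The paths shown here are placeholders; replace them with the sections of your own site you want to exclude.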