The robots.txt file is a simple plain-text file, but people who work in website construction and website optimization are well aware of its importance. It can block pages that you do not want search engines to crawl, and it can also act like a map that guides the spider through the site. When a spider arrives at a site, the first thing it requests is robots.txt. If the file exists, the spider crawls according to the rules in it; if the file does not exist, the spider simply follows the links on each page in order. We can therefore use robots.txt to block directories that search engines do not need to index, or declare the sitemap in it to guide the spider's crawl. This makes it very useful for protecting the site, saving server bandwidth, and guiding indexing; in short, it lets a site play to its strengths and hide its weaknesses. Each use is analyzed in detail below:

1. Using robots.txt to save server bandwidth

Webmasters rarely make this kind of setting. However, when a server receives many visits and hosts a lot of content, it becomes necessary in order to save bandwidth. For example, you can block a folder such as image: having search engines index it has little practical value, yet crawling it wastes a lot of bandwidth. On an image-heavy website the consumption can be staggering, and robots.txt is a direct way to address it.

2. Protecting sensitive directories

Generally, when we set up robots.txt, we disallow spiders from crawling the admin directory, the database, and the backup directory; otherwise data can easily leak and compromise the site's security. Administrators can also block any other directories they do not want spiders to index, and search engines that strictly follow the rules will leave them out. Keep in mind that robots.txt is publicly readable and is only honored by well-behaved crawlers, so it should complement server-side access control rather than replace it.

3. Stopping search engines from indexing pages

A website sometimes has pages that it does not want the public to see, and robots.txt can be used to keep them out of the index. For example, suppose that in a site's early days the server was slow to respond when an article was updated, so the same article was published three times in a row and all three copies were indexed by search engines. What should you do? Duplicate content is bad for website optimization, so you can use robots.txt to block the extra pages.

4. Declaring the sitemap in robots.txt

Because robots.txt is the first file a spider looks at when it visits a site, we can declare the sitemap there. This helps the spider index the latest content and saves it from taking unnecessary detours. For example, the sitemap page of Pilot Technology, a professional website construction company, is at: 贵族宝贝****.net.cn/
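
The four uses discussed in this article can be combined in a single file. The following is a minimal sketch rather than a recommended configuration: the directory names, the duplicate article paths, the www.example.com domain, and the ExampleBot user agent are all hypothetical placeholders, not values taken from any real site. It uses Python's standard urllib.robotparser module to check how a rule-abiding spider would read the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; every path and URL here is a placeholder.
ROBOTS_TXT = """\
User-agent: *
# Save bandwidth: keep spiders out of the image folder
Disallow: /images/
# Keep admin and backup directories out of the index
Disallow: /admin/
Disallow: /backup/
# Block the accidental duplicate copies of one article
Disallow: /news/article-copy-2.html
Disallow: /news/article-copy-3.html

# Point spiders at the sitemap
Sitemap: https://www.example.com/sitemap.xml
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
base = "https://www.example.com"

# Blocked paths are reported as not fetchable for a rule-abiding spider.
print(rules.can_fetch("ExampleBot", base + "/admin/login"))        # False
print(rules.can_fetch("ExampleBot", base + "/images/logo.png"))    # False

# The original article is not listed, so it stays crawlable.
print(rules.can_fetch("ExampleBot", base + "/news/article.html"))  # True

# The declared sitemap is exposed as a list of URLs (Python 3.8+).
print(rules.site_maps())
```

Checking the rules locally like this before uploading the file is a cheap way to catch a Disallow line that accidentally blocks pages you want indexed.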