First of all, I'd like to thank xinbin.chen for his help.
He is a good teacher, the kind who teaches you how to fish instead of just giving you fish.
Without further ado, let's get to the point.
There are two popular methods to keep robots and spiders out of your site.
1: Put a robots.txt file in the web root.
The rules look like this:
# allow a particular spider to crawl your site
User-agent: somespider
Disallow:

# disallow all spiders from crawling your site
User-agent: *
Disallow: /

# disallow a specific spider from crawling your site
User-agent: spider
Disallow: /
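These are separate recipes; in practice you combine the records you need into a single robots.txt. A minimal sketch that lets one crawler in and keeps everyone else out might look like this (Googlebot here is just an example name, substitute whichever spider you actually trust):

User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /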
But nowadays many spiders disguise themselves as Firefox, Opera, or IE and simply ignore robots.txt,
so we need another method, which brings us to the second one.
2: If you use Apache, there is a better option.
Add the following lines to httpd.conf:
# set the a_robot variable for any request whose User-Agent contains "Robot" or "Spider"
SetEnvIfNoCase User-Agent Robot a_robot=1
SetEnvIfNoCase User-Agent Spider a_robot=1
Replace Robot/Spider with the name of whatever spider or robot you want to forbid.
# ... some lines omitted ...
Order allow,deny
Allow from all
Deny from env=a_robot
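For context, here is a minimal sketch of how these pieces might sit together in httpd.conf. The <Directory> path /var/www/html is only an assumption, change it to your own DocumentRoot; this uses the old Apache 2.2 Order/Allow/Deny access-control syntax (Apache 2.4 uses Require instead):

SetEnvIfNoCase User-Agent Robot a_robot=1
SetEnvIfNoCase User-Agent Spider a_robot=1

<Directory "/var/www/html">
    # assumed DocumentRoot; adjust to your own path
    Order allow,deny
    Allow from all
    Deny from env=a_robot
</Directory>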
Then do a graceful restart (apachectl graceful) and it will take effect.
Here is a story from when I worked on this task. We run Squid in front of our site. I changed the proxy so it would not cache the destination site, but something was still wrong, and I didn't know how to resolve it until chen explained some of the principles of Squid to me.
Change these parameters in squid.conf:
# these two lines send requests for this domain directly to the origin instead of through a cache peer
acl targetdomain dstdomain .urdomain.com
always_direct allow targetdomain
# this line tells Squid not to cache negative responses, such as ERROR/forbidden pages
negative_ttl 0
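After editing squid.conf, reload Squid so the new settings take effect; on most installs a plain reconfigure is enough (the binary path may differ on your system):

squid -k reconfigure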