Robots.txt harmless? Or dangerous?


Since people clearly liked my other tutorial, I thought I would share another one I wrote not too long ago on my site. Enjoy, guys!

By K1u

So, what is this strange file sitting in the root of your site, you ask?

Let me break it down: robots.txt is basically a rule set for search engine crawlers. It tells them which paths you would like them to stay out of.

Example of a robots.txt file:

# This is my robots.txt file!
User-agent: *
Disallow: /idontwantthisindexedbysearchengines/

Now let me explain it line by line.

# This names the user agent the rules apply to. Here that means a crawler, for example Googlebot; * matches any crawler.
User-agent: *

# This rule asks crawlers not to crawl (and so not to index) this folder.
Disallow: /idontwantthisindexedbysearchengines/
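To see how a well-behaved crawler would read those two lines, here is a minimal sketch using Python's standard urllib.robotparser module. The crawler name ExampleBot and the page paths are made up for illustration, and the file is parsed from a string rather than fetched from a real site.

```python
from urllib.robotparser import RobotFileParser

# The example file from above.
ROBOTS_TXT = """\
# This is my robots.txt file!
User-agent: *
Disallow: /idontwantthisindexedbysearchengines/
"""


def build_parser(text):
    """Parse robots.txt content from a string instead of downloading it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is a hypothetical crawler name; the "*" group applies to it.
blocked = parser.can_fetch(
    "ExampleBot", "/idontwantthisindexedbysearchengines/page.html"
)
allowed = parser.can_fetch("ExampleBot", "/index.html")
print(blocked, allowed)
```

Keep in mind that this only shows what a polite crawler decides for itself. Nothing on the server enforces it.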

Now let's talk about why robots.txt can be dangerous.

Any website that uses a robots.txt file has it publicly exposed. Crawlers have to be able to read it, so anyone else can too, including every path you listed as off limits.

Here, take this as an example: http://k0h.org/robots.txt

Well, you're probably asking: what do I do now? Instead of listing your "private" folders directly under the root, make a new folder named something like 021873257923, then store the private folder inside it, so that robots.txt only reveals the outer folder. Note: never, ever store very important things on your web server, even if they are "protected" by robots.txt.
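To make the exposure concrete, here is a small sketch of how little effort it takes to pull every disallowed path out of a robots.txt file. The function name disallowed_paths and the sample file contents are made up for illustration. A curious visitor would simply download the real file first.

```python
def disallowed_paths(robots_txt):
    """Return every path named in a Disallow line, in file order."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical file contents, standing in for a downloaded /robots.txt.
SAMPLE = """\
User-agent: *
Disallow: /private-stuff/   # please stay out
Disallow: /admin-backup/
"""

found = disallowed_paths(SAMPLE)
print(found)
```

Every path the site owner wanted hidden ends up in that list, which is exactly why the file should never be treated as protection.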

Now let's build our own robots.txt file.

# This is a comment... these are ignored.
User-agent: *
Disallow: /273432087423374242/

User-agent: Googlebot-Image
Disallow: /images

# Alexa's bot is a bit aggressive so I think I shall make it wait 1 minute (60 seconds) until it can view another page.
User-agent: IA_Archiver
Crawl-Delay: 60
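As a rough check of the file we just built, the sketch below feeds it to Python's standard urllib.robotparser and asks what each crawler is allowed to do. SomeOtherBot and the file paths are made-up names for illustration. Crawl-delay is a non-standard extension, so this shows what one parser reports, not what every crawler will honor.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# This is a comment... these are ignored.
User-agent: *
Disallow: /273432087423374242/

User-agent: Googlebot-Image
Disallow: /images

User-agent: IA_Archiver
Crawl-Delay: 60
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Delay (in seconds) requested from the IA_Archiver group.
delay = parser.crawl_delay("IA_Archiver")

# Googlebot-Image is asked to stay out of anything starting with /images.
images_ok = parser.can_fetch("Googlebot-Image", "/images/logo.png")

# "SomeOtherBot" is hypothetical; it falls under the "*" group.
hidden_ok = parser.can_fetch("SomeOtherBot", "/273432087423374242/file.txt")

print(delay, images_ok, hidden_ok)
```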

Questions!

OK, I have over 300 folders starting with admin, and none of them should be indexed. What do I do? Is there some sort of wildcard I can use?

Simply use Disallow: /admin without the trailing /. Rules are matched as path prefixes, so this covers every path that starts with /admin.
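Here is a quick sketch of that prefix behaviour, again using Python's standard urllib.robotparser. The helper parser_for, the crawler name ExampleBot, and the sample paths are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser


def parser_for(rule):
    """Build a parser for a one-rule file that applies to every crawler."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", rule])
    return parser


without_slash = parser_for("Disallow: /admin")
with_slash = parser_for("Disallow: /admin/")

# Hypothetical paths; "ExampleBot" is a made-up crawler name.
paths = ["/admin/", "/admin2/login.php", "/administrator/", "/about.html"]
for path in paths:
    print(
        path,
        without_slash.can_fetch("ExampleBot", path),
        with_slash.can_fetch("ExampleBot", path),
    )
```

With the trailing slash, only the /admin/ folder itself is covered. Without it, /admin2/ and /administrator/ are covered as well.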

Are there engines that do not obey robots.txt?

Yep.

My host disallows Robots.txt...

They probably don't; you just have not tried selecting "view hidden files" in your FTP client. Failing that, look into other methods. Google is your friend.

Edit: This has been added to the wiki.

Edit 2: If you guys also like this I have one more tutorial for you.


Dangerous?!

And if your host disallows robots.txt, your host sucks ass. Find a better one NOW.

The only remote reason I could think of is that they want to generate more linkage on Google. But still, if your host is doing this, find a better one.


And for security reasons, if there's a page you don't want anyone to see, don't include it in the robots.txt file.

For example:

User-agent: *
Disallow: /secret-plans/to/destroy-the-planet

Now everyone who knows about robots.txt will know your plans.

So long as no one ever links to that page, no robots will know about it, and you can leave it out of robots.txt.

Security through obscurity is always a bad idea, but it does work (sometimes).

Better yet:

Disallow: /secret-plans/

Inside it, put a folder named something like 983259832953i90234325, for example.


I stopped relying on robots.txt because 99% of the bots out there don't follow its rules, and it really isn't safe to list sensitive paths in it anyway. Instead I use .htaccess to ban all the bots I do not want, including some browser add-ons that show up in the user agent, Yahoo Slurp, and so on. Depending on your server's setup, mod_rewrite may be configured differently, but I use the following rules to block bots on my site:

RewriteEngine On
# Block an empty user agent, or one that is a literal "-"
RewriteCond %{HTTP_USER_AGENT} ^-?$ [OR]
# Spaces inside a pattern must be escaped, otherwise the rest is read as flags
RewriteCond %{HTTP_USER_AGENT} asterias [NC,OR]
RewriteCond %{HTTP_USER_AGENT} asterias/2\.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} del\.icio\.us-thumbnails [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} Biz360 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} BecomeBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo\.com [OR]
RewriteCond %{HTTP_USER_AGENT} zibber-v0\.1 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^zibber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} zibber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HTML2JPG [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} DA\ 7\.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} Exabot-Images/1\.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Exabot/3\.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^findlinks/1\.1\.1-a1 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} FunWebProducts [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ia_archiver [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} IRLbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} MSIECrawler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^PEAR\ HTTP_Request [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Gigabot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bitacle [OR]
RewriteCond %{HTTP_USER_AGENT} Slurp\ China [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Yahoo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Slurp [NC,OR]
RewriteCond %{HTTP_USER_AGENT} yplus [NC,OR]
RewriteCond %{HTTP_USER_AGENT} YPC [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Yahoo!\ Slurp [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Yahoo-MMCrawler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^YahooSeeker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Yahoo\. [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ZyBorg [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Yahoo-Blogs [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^msnbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^NaverBot-1\.0 [OR]
RewriteCond %{HTTP_USER_AGENT} ^SurveyBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Bitacle\ bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/2\.0 [OR]
RewriteCond %{HTTP_USER_AGENT} ^Inktomi\ Slurp [OR]
RewriteCond %{HTTP_USER_AGENT} LeechGet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^OmniExplorer_Bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Python-urllib [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ichiro [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebVulnCrawl\.blogspot\.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Bookmark-Manager [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HbTools [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SBIder [NC,OR]
RewriteCond %{HTTP_USER_AGENT} edgeio-retriever [NC,OR]
RewriteCond %{HTTP_USER_AGENT} BlogsSay [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Sphere\ Scout&v4\.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Snapbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Trend\ Micro [NC,OR]
RewriteCond %{HTTP_USER_AGENT} TencentTraveler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MJ12bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} psycheclone [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Jakarta\ Commons-HttpClient/3\.0-rc4 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} TargetYourNews\.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} vobsub [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NimbleCrawler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} DOJ3jx7bf [NC,OR]

# Referrer
RewriteCond %{HTTP_REFERER} shroudedbunnies\.net [NC,OR]
RewriteCond %{HTTP_REFERER} fourth-plateau\.org [NC,OR]
# Hosts
RewriteCond %{REMOTE_HOST} dana\.com$ [NC,OR]
RewriteCond %{REMOTE_HOST} bezeqint\.net$ [NC,OR]
RewriteCond %{REMOTE_HOST} ai\.net$ [NC,OR]
RewriteCond %{REMOTE_HOST} videotron\.ca$ [NC,OR]
RewriteCond %{REMOTE_HOST} amazon\.com$ [NC,OR]
RewriteCond %{REMOTE_HOST} cipherkey\.net$ [NC,OR]
RewriteCond %{REMOTE_HOST} t-dialin\.net$ [NC,OR]
RewriteCond %{REMOTE_HOST} exabot\.com$ [NC,OR]
RewriteCond %{REMOTE_HOST} yahoo\. [NC,OR]
RewriteCond %{REMOTE_HOST} msmcorp\.com$ [NC,OR]
RewriteCond %{REMOTE_HOST} husfranchisee\.com$ [NC,OR]
# Specific top-level domains or country codes (anchored to the end of the hostname)
RewriteCond %{REMOTE_HOST} \.ch$ [NC,OR]
RewriteCond %{REMOTE_HOST} \.gov$ [NC,OR]
RewriteCond %{REMOTE_HOST} \.mil$ [NC,OR]
RewriteCond %{REMOTE_HOST} \.sc$ [NC,OR]
RewriteCond %{REMOTE_HOST} \.ws$ [NC,OR]
RewriteCond %{REMOTE_HOST} inktomisearch\. [NC]
RewriteRule ^.* - [F,L]
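To illustrate how a chain of [OR] conditions behaves, here is a rough Python sketch of the same idea: a request is refused as soon as any one pattern matches its user agent. Only a handful of patterns from the list are used, the selection is arbitrary, and the function name is_blocked is made up for the example. Python's re module is not Apache's regex engine, but for simple patterns like these the behaviour is close enough to show the logic.

```python
import re

# A few patterns from the list above, as (regex, case_insensitive).
# The boolean mirrors whether the rule carries the NC flag.
BLOCKED_AGENTS = [
    (r"^BlackWidow", False),
    (r"HTTrack", True),
    (r"^Wget", False),
    (r"^Teleport Pro", False),
    (r"MJ12bot", True),
]


def is_blocked(user_agent):
    """Return True if any pattern matches, like a chain of [OR] conditions."""
    for pattern, ignore_case in BLOCKED_AGENTS:
        flags = re.IGNORECASE if ignore_case else 0
        if re.search(pattern, user_agent, flags):
            return True
    return False


# Sample user agent strings, made up for illustration.
print(is_blocked("Wget/1.21"))
print(is_blocked("wget/1.21"))  # no NC on ^Wget, so case matters
print(is_blocked("Mozilla/5.0 (compatible; mj12bot/v1.4)"))
print(is_blocked("Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0"))
```

A bot can send whatever User-Agent string it likes, so a list like this only stops bots that identify themselves honestly.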

Thank you. I recently had Shelob 1.0 visit my site but he ran off and has not been seen since.
