K1u Posted November 17, 2007

Since people clearly liked my other tutorial, I thought I would share another tutorial I wrote not too long ago for my site. Enjoy, guys!

By K1u

So, what is this strange file in the root of your site, you ask? Let me break it down: robots.txt is basically a rule set for search engine crawlers (robots), telling them which parts of your site they should not crawl.

Example of a robots.txt file:

# This is my robots.txt file!
User-agent: *
Disallow: /idontwantthisindexedbysearchengines/

Now let me explain it line by line.

# The User-agent line names the crawler the following rules apply to (for example Googlebot). * means any crawler.
User-agent: *
# This is a rule telling crawlers not to crawl (index) this folder.
Disallow: /idontwantthisindexedbysearchengines/

Now let's talk about why robots.txt can be dangerous. Crawlers have to be able to read robots.txt, so on every site that uses one it is publicly exposed: anyone can open it in a browser and see every path you listed. Here, take this: http://k0h.org/robots.txt

Well, you're probably asking what to do now. Instead of listing the root folders of your "private" things, make a new folder with a hard-to-guess name like 021873257923, then store the other folders in there. Note: never, ever store very important things on your web server, even if they are "protected" by robots.txt. robots.txt only asks well-behaved crawlers to stay away; it does not stop anyone from opening those URLs.

Now let's build our own robots.txt file.

# This is a comment; comments are ignored.
User-agent: *
Disallow: /273432087423374242/

User-agent: Googlebot-Image
Disallow: /images

# Alexa's bot is a bit aggressive, so I make it wait 1 minute (60 seconds) before it can request another page.
User-agent: IA_Archiver
Crawl-Delay: 60

Crawl-delay is a non-standard extension, so not every crawler honors it (Google's, for example, ignores it).

Questions!

OK, I have over 300 folders starting with admin, and none of them should be indexed. What do I do? Is there some sort of wildcard I can use?
Simply use Disallow: /admin without the trailing /. Disallow rules match by prefix, so this covers /admin/, /admin2/, /administrator/ and so on.

Are there engines that do not obey robots.txt?
Yep. Following robots.txt is voluntary, so bad bots simply ignore it.

My host disallows robots.txt...
They probably don't; you just haven't tried enabling "view hidden files" in your FTP client. Otherwise, look into other methods. Google is your friend.
Edit: This has been added to the wiki.
Edit 2: If you guys also like this, I have one more tutorial for you.
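One way to sanity-check a file like the one built in the tutorial before uploading it is Python's standard-library urllib.robotparser, which implements the basic matching rules. The sketch below is a minimal example, assuming Python 3.6+ (for crawl_delay()); the crawler name SomeCrawler and the host example.com are hypothetical, and the /admin line is taken from the Q&A answer. It also shows a consequence that is easy to miss: a crawler with its own User-agent group follows only that group, so Googlebot-Image is not kept out of /admin, and the IA_Archiver group, which only sets a delay, blocks nothing for Alexa's bot.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the tutorial, plus the "/admin" prefix rule from the Q&A.
# Field names and user-agent names are matched case-insensitively by this parser.
ROBOTS_TXT = """\
# This is a comment; comments are ignored.
User-agent: *
Disallow: /273432087423374242/
Disallow: /admin

User-agent: Googlebot-Image
Disallow: /images

User-agent: IA_Archiver
Crawl-Delay: 60
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(agent: str, path: str) -> bool:
    """Ask the parser whether `agent` may fetch `path` on a hypothetical host."""
    return parser.can_fetch(agent, "http://example.com" + path)

if __name__ == "__main__":
    # Disallow is a prefix match: "/admin" also covers "/admin-panel/" etc.
    print(allowed("SomeCrawler", "/admin-panel/"))              # False
    print(allowed("SomeCrawler", "/administrator/login.php"))   # False
    print(allowed("SomeCrawler", "/blog/post.html"))            # True
    # Googlebot-Image has its own group, so only /images is off limits to it.
    print(allowed("Googlebot-Image/1.0", "/images/logo.png"))   # False
    print(allowed("Googlebot-Image/1.0", "/admin/"))            # True
    # A group that only sets Crawl-Delay blocks nothing for that bot.
    print(allowed("ia_archiver", "/273432087423374242/"))       # True
    # Crawl-delay is reported per group; the "*" group sets none.
    print(parser.crawl_delay("ia_archiver"))                    # 60
    print(parser.crawl_delay("SomeCrawler"))                    # None
```

Real crawlers may handle extras such as wildcards differently, so treat this as a rough check of the basic rules rather than a guarantee of how a particular bot behaves.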
SomeoneE1se Posted November 17, 2007

Dangerous?! And if your host disallows robots.txt, your host sucks ass; find a better one NOW.
K1u Posted November 17, 2007, replying to SomeoneE1se:

The only remote reason I can think of is that they want to generate more links on Google. But still, if your host is doing this, find a better one.
SomeoneE1se Posted November 17, 2007

And for security reasons, if there's a page you don't want anyone to see, don't include it in the robots.txt file, i.e.:

User-agent: *
Disallow: /secret-plans/to/destroy-the-planet

Now everyone who knows about robots.txt will know your plans. As long as no one ever links to that page, no robots will know about it, and you can leave it out of robots.txt. Security through obscurity is always a bad idea, but it does work (sometimes).
K1u Posted November 17, 2007, replying to SomeoneE1se:

Better yet:

User-agent: *
Disallow: /secret-plans/

Then put your real folder inside it, with a name like 983259832953i90234325.
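Both points are easy to demonstrate, because robots.txt is just a public text file: pulling out every Disallow path takes a few lines. The minimal Python sketch below parses a string instead of downloading a live file, and the two sample files are hypothetical, modeled on the examples in the two posts above. Listing the page directly gives away its full path, while the parent-folder version only gives away /secret-plans/, leaving the random folder name to be guessed (which is still security through obscurity).

```python
from typing import List

def disallowed_paths(robots_txt: str) -> List[str]:
    """List every non-empty Disallow value, i.e. what the file tells any reader."""
    paths = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths

# Hypothetical file that lists the secret page directly.
DIRECT = """\
User-agent: *
Disallow: /secret-plans/to/destroy-the-planet
"""

# Hypothetical file that only lists a generic parent folder; the real,
# randomly named folder inside it never appears in robots.txt.
PARENT_ONLY = """\
User-agent: *
Disallow: /secret-plans/
"""

if __name__ == "__main__":
    print(disallowed_paths(DIRECT))       # ['/secret-plans/to/destroy-the-planet']
    print(disallowed_paths(PARENT_ONLY))  # ['/secret-plans/']
```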
digip Posted November 18, 2007

I stopped relying on robots.txt because, in my experience, most bots out there don't follow its rules, and it really isn't safe to list sensitive paths in it anyway. Instead I use .htaccess to ban every bot I don't want, including some browser add-ons that show up in the user agent and crawlers like Yahoo Slurp. Depending on your server's setup, mod_rewrite may be configured differently, but this is what I use to block bots on my site:

RewriteEngine On
# Spaces inside a pattern must be escaped with a backslash, or Apache rejects the line.
RewriteCond %{HTTP_USER_AGENT} ^-$ [OR]
RewriteCond %{HTTP_USER_AGENT} asterias [NC,OR]
RewriteCond %{HTTP_USER_AGENT} asterias/2.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} del.icio.us-thumbnails [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} Biz360 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} BecomeBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} zibber-v0.1 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^zibber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} zibber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HTML2JPG [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} DA\ 7.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} Exabot-Images/1.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Exabot/3.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^findlinks/1.1.1-a1 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} FunWebProducts [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ia_archiver [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} IRLbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} MSIECrawler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^PEAR\ HTTP_Request [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Gigabot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bitacle [OR]
RewriteCond %{HTTP_USER_AGENT} Slurp\ China [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Yahoo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Slurp [NC,OR]
RewriteCond %{HTTP_USER_AGENT} yplus [NC,OR]
RewriteCond %{HTTP_USER_AGENT} YPC [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Yahoo!\ Slurp [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Yahoo-MMCrawler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^YahooSeeker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Yahoo. [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ZyBorg [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Yahoo-Blogs [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^msnbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^NaverBot-1.0 [OR]
RewriteCond %{HTTP_USER_AGENT} ^SurveyBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Bitacle\ bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/2.0 [OR]
RewriteCond %{HTTP_USER_AGENT} ^Inktomi\ Slurp [OR]
RewriteCond %{HTTP_USER_AGENT} LeechGet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^OmniExplorer_Bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Python-urllib [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ichiro [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebVulnCrawl.blogspot.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Bookmark-Manager [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HbTools [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SBIder [NC,OR]
RewriteCond %{HTTP_USER_AGENT} edgeio-retriever [NC,OR]
RewriteCond %{HTTP_USER_AGENT} BlogsSay [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Sphere\ Scout&v4.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Snapbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Trend\ Micro [NC,OR]
RewriteCond %{HTTP_USER_AGENT} TencentTraveler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MJ12bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} psycheclone [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Jakarta\ Commons-HttpClient/3.0-rc4 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} TargetYourNews.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} vobsub [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NimbleCrawler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} DOJ3jx7bf [NC,OR]
# Referrer
RewriteCond %{HTTP_REFERER} shroudedbunnies.net [NC,OR]
RewriteCond %{HTTP_REFERER} fourth-plateau.org [NC,OR]
# Hosts
RewriteCond %{REMOTE_HOST} dana.com [NC,OR]
RewriteCond %{REMOTE_HOST} bezeqint.net [NC,OR]
RewriteCond %{REMOTE_HOST} ai.net [NC,OR]
RewriteCond %{REMOTE_HOST} videotron.ca [NC,OR]
RewriteCond %{REMOTE_HOST} amazon.com [NC,OR]
RewriteCond %{REMOTE_HOST} cipherkey.net [NC,OR]
RewriteCond %{REMOTE_HOST} t-dialin.net [NC,OR]
RewriteCond %{REMOTE_HOST} exabot.com [NC,OR]
RewriteCond %{REMOTE_HOST} yahoo.* [NC,OR]
RewriteCond %{REMOTE_HOST} msmcorp.com [NC,OR]
RewriteCond %{REMOTE_HOST} husfranchisee.com [NC,OR]
# Specific domains or country codes
# Anchored with \. and $ so only hostnames ending in these suffixes match
RewriteCond %{REMOTE_HOST} \.ch$ [NC,OR]
RewriteCond %{REMOTE_HOST} \.gov$ [NC,OR]
RewriteCond %{REMOTE_HOST} \.mil$ [NC,OR]
RewriteCond %{REMOTE_HOST} \.sc$ [NC,OR]
RewriteCond %{REMOTE_HOST} \.ws$ [NC,OR]
RewriteCond %{REMOTE_HOST} inktomisearch.* [NC]
RewriteRule ^.* - [F,L]
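The condition chain above works like one big OR: the request is refused with 403 Forbidden as soon as any single condition matches, [NC] makes a pattern case-insensitive, and ^ anchors it to the start of the string. A minimal Python sketch of that logic can help test user-agent strings before editing a live .htaccess. It is an approximation under stated assumptions: Python's re stands in for Apache's regex engine (equivalent for simple patterns like these), only a handful of the conditions are copied, and the sample user-agent strings and hostnames are made up.

```python
import re
from typing import List, Tuple

# A few (pattern, nocase) pairs copied from the RewriteCond lines above.
# nocase=True corresponds to the [NC] flag.
UA_CONDITIONS: List[Tuple[str, bool]] = [
    (r"^-$", False),
    (r"^Wget", False),
    (r"HTTrack", True),
    (r"^Teleport\ Pro", False),  # "\ " matches a literal space
    (r"MJ12bot", True),
]

def is_blocked(user_agent: str) -> bool:
    """Mimic a chain of [OR] conditions followed by RewriteRule ^.* - [F,L]:
    return True (403 Forbidden) if any single condition matches."""
    for pattern, nocase in UA_CONDITIONS:
        flags = re.IGNORECASE if nocase else 0
        # Unanchored patterns match anywhere in the string, like re.search.
        if re.search(pattern, user_agent, flags):
            return True
    return False

def host_matches(pattern: str, host: str) -> bool:
    """Check one %{REMOTE_HOST}-style pattern with [NC] against a hostname."""
    return re.search(pattern, host, re.IGNORECASE) is not None

if __name__ == "__main__":
    print(is_blocked("Wget/1.10.2"))        # True
    print(is_blocked("wget/1.10.2"))        # False: ^Wget has no [NC]
    print(is_blocked("Teleport Pro/1.29"))  # True
    print(is_blocked("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) Firefox/2.0.0.9"))  # False
    # Unanchored ".ch" matches any character followed by "ch" anywhere.
    print(host_matches(r".ch", "cache1.example.com"))    # True
    print(host_matches(r"\.ch$", "cache1.example.com"))  # False
    print(host_matches(r"\.ch$", "proxy.example.ch"))    # True
```

The host checks illustrate a related pitfall: in a regex, . matches any character and nothing ties the match to the end of the hostname, so a bare .ch also catches hosts like cache1.example.com, while \.ch$ only catches hosts ending in .ch. Also note that %{REMOTE_HOST} may only hold a real hostname when the server does reverse DNS lookups (HostnameLookups); otherwise it may just contain the client's IP address, and hostname patterns will never match.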
K1u Posted November 18, 2007, replying to digip:

Thank you. I recently had Shelob 1.0 visit my site, but he ran off and has not been seen since.