USBHacker Posted August 29, 2008

Hello! Is there a way (with an online utility, a command-line tool, or a program) to find and make a list of all URLs and links that come from any specific domain? I don't know if this is possible, which is why I am asking here!

Thanks in advance for any suggestions,
USBHacker
Tenzer Posted August 29, 2008

I'm not sure what you are asking for. Do you want a list of all links which are on a page, i.e. hak5.org, or a list of pages which link to a specific page?
ls Posted August 29, 2008

To find all the links on other websites that point to a site, you can use Google, like: link:hak5.org

If you need to find all the links on a website, you will need to search the page source for href=site.com. And to find things like wiki.hak5.org, you need to write a script that tries to open different subdomains, like this Python script:

import urllib2

subs = ["w", "wi", "wik", "wiki"]
for sub in subs:
    site = "http://" + sub + ".hak5.org"
    try:
        # if the request succeeds, the subdomain exists
        urllib2.urlopen(site).readlines()
        print site
    except:
        pass

Is this what you were asking for?
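A minimal sketch of the href search ls mentions, assuming Python 2 (to match the script above) and that a simple regular expression is good enough for rough link extraction; the URL is just the example used in this thread:

import re
import urllib2

# download the page source and pull out everything that looks like an href target
html = urllib2.urlopen("http://hak5.org").read()
links = re.findall(r'href=["\'](.*?)["\']', html)

for link in links:
    print link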
digip Posted August 29, 2008

You could try an nslookup for internal site pages/subdomains, but most likely it will be blocked. Also, try wget and spider the site, but it won't find pages that aren't linked to directly from each page. There are other methods, but you need to start doing the research yourself. You ask a zillion questions like these, but it seems you don't put any effort into learning it yourself. This has also been discussed in another thread; if you had happened to do a search on the forums, you might have known this.

"USBHacker": from now on, I will refer to you as "LazyHacker". Sounds a bit harsh, but maybe it's time you start reading and researching and stop with the trivial questions.
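digip's nslookup idea points at DNS: short of a zone transfer (which is usually refused, as he notes), about all you can do from DNS is resolve guessed names. A rough Python 2 sketch of that, where the candidate names are only examples:

import socket

# hypothetical candidate subdomain names to try
candidates = ["www", "wiki", "forums", "mail"]

for name in candidates:
    host = name + ".hak5.org"
    try:
        # if the name resolves, a DNS record for that subdomain exists
        print host, socket.gethostbyname(host)
    except socket.error:
        pass  # no record for this guess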
USBHacker Posted August 30, 2008

(Quoting Tenzer:) "Do you want a list of all links which are on a page, i.e. hak5.org, or a list of pages which link to a specific page?"

I think the second one is what I want. Not just the links from a certain domain, but all the URLs (if it is possible to get that!).

(Quoting ls's post above, with the link: tip and the subdomain script:) Yes, thanks. I'm pretty sure that's what I'm looking for. Sorry if I sound stupid (I've only ever written Python as the backend for a website), but how do I use it?

Please reply, and thanks for what you have already coded,
USBHacker

EDIT: Is that a default library?

(Quoting digip:) "You could try an nslookup for internal site pages/subdomains, but most likely it will be blocked. Also, try wget and spider the site, but it won't find pages that aren't linked to directly from each page. There are other methods, but you need to start doing the research yourself."

Thanks, I'll do as much research as I can. But you said that spidering the site won't find pages that aren't directly linked from the domain? Can you please give me a method (a name) for something I can use to make this work? (Or will the above Python code do it for me?)

Thanks in advance,
USBHacker
digip Posted August 30, 2008

(Quoting USBHacker:) "Can you please give me a method (a name) for something I can use to make this work? (Or will the above Python code do it for me?)"

You basically would have to script a brute force to look for specific words as directories and then have the script run similar to a spider, but against your list of words. I am not going to write it for you, as it takes only a few minutes to write a Windows bat script that uses wget to do this; all you would need is to prepend/append the words to the site you search against, e.g. hak5.org/word1, hak5.org/word2, hak5.org/word3, etc.

You will most likely be seen in their server logs and, if they have any IDS, banned from the site, so it really doesn't help much to constantly HTTP GET a site to death, as it starts to look like a DoS attack. Without root access to the server, brute-force enumeration is about the only way, other than any recursive file listing exploits that may exist on the target system.
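digip suggests a Windows batch script around wget; the same brute-force idea in Python (the language already used in this thread) might look something like the sketch below, where words.txt is a hypothetical wordlist with one candidate directory name per line:

import urllib2

# hypothetical wordlist: one candidate directory name per line
words = open("words.txt").read().split()

for word in words:
    url = "http://hak5.org/" + word
    try:
        urllib2.urlopen(url)
        print url  # request succeeded, so the path likely exists
    except urllib2.HTTPError:
        pass  # 404 and other HTTP errors: not there (or blocked)
    except urllib2.URLError:
        pass  # connection-level problems

As digip warns, hammering a site like this will show up in the server logs and can get you banned.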
USBHacker Posted August 30, 2008

Thanks for all the information. I'll do my best to make it work with your method... Would it be better to use wget on Windows, wget on Cygwin, or wget on Linux? Or wouldn't it make a difference?

Please reply, thanks in advance,
USBHacker
SupaRice Posted August 30, 2008

I'm a n00b, but... what about a recursive wget, and then grep the output for "href"?

wget -l 2 -r www.hak5.org
grep -R 'hak5.org' *

-or-

grep -R 'http://' *

Not exactly elegant, but it should work nonetheless. And you could offer a different browser string with -U, because some sites watch for wget:

-U "Mozilla/4.0 (compatible; MSIE 6.0; Microsoft Windows NT 5.1)"
VaKo Posted August 30, 2008

If hak5.org/test/ isn't linked from hak5.org/index.php, then a recursive wget search will not find it.
SupaRice Posted August 30, 2008

Oh, I must have misread the initial post....
USBHacker Posted August 31, 2008

Not to worry; nonetheless, thanks for putting in the effort. ;)

Oh, and I still haven't completely worked it out. I think that I should use the Python script written by ls...

(Re-quoting my earlier question:) "Can you please give me a method (a name) for something I can use to make this work? (Or will the above Python code do it for me?)"

Or if someone could give me the name of a technique that could be used to make it work for me... And I'll also try the wget method that digip suggested. I'll tell you how it goes! And if you know/remember/learn of the name of a technique that could be used for this, please don't hesitate to suggest it!

Thanks in advance, and thanks for all suggestions made so far,
USBHacker
snakey Posted August 31, 2008

There's a Google command that does this. Um, it's on the forums somewhere, go find it.
digip Posted August 31, 2008

(Quoting snakey:) "There's a Google command that does this."

Google, e.g.: "site:sitename.com". But again, only pages that are linked to will be found.
USBHacker Posted August 31, 2008

^ This the one? link:hak5.org

EDIT: Beat me to it
ls Posted August 31, 2008

site:hak5.org will find things like hak5.org/cast or hak5.org/feed
link:hak5.org will find sites that link to hak5.org
USBHacker Posted August 31, 2008

Ah, thanks. Oh, and ls, how do I use your Python script (the one you wrote earlier in the topic)?

Please reply,
Thanks in advance,
USBHacker
ls Posted August 31, 2008

Well, first you will need to write a file with the subdomains to search for, like this:

forums
intranet
sales
video
....

Then paste this code into another file called subfind.py:

import urllib2, sys

site = sys.argv[1]   # the domain to test, e.g. google.com
subs = sys.argv[2]   # the file with the subdomain names
subs = open(subs, 'r').readlines()

for sub in subs:
    sub = sub.replace("\n", "")
    site2 = "http://" + sub + "." + site
    try:
        # if the request succeeds, the subdomain exists
        urllib2.urlopen(site2).readlines()
        print site2
    except:
        pass

print "done"

and save it. Now run it from the command line like this:

python subfind.py google.com <the file with the subdomains>

If a valid subdomain is found, it will print it out.
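For what it's worth, a concrete run would look something like the line below, assuming Python 2 is installed and on the PATH, and that the subdomain list was saved as subs.txt (that filename is just an example) in the same folder as subfind.py, with the command prompt opened in that folder:

python subfind.py hak5.org subs.txt

Each subdomain that responds is printed as a full URL, one per line, followed by "done" when the list is exhausted.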
USBHacker Posted August 31, 2008

Thanks. Sorry to sound annoying, but wordlists won't help me. Most of the sites I need to do this against have lots of numbers, and might, just might, have words... I will never know for sure if I am getting all the information. I will still use your technique, but if you think of a better way of doing it, please don't hesitate to suggest it!

Thanks in advance,
USBHacker
moonlit Posted August 31, 2008

If there are no links to a particular subdomain, you probably won't find it. Brute force/dictionary attacks are the only other way. Otherwise, try an offline browser.

Oh, and stop with the "please reply!" and "please don't hesitate to give me all the answers!" because it's starting to fucking grate.
USBHacker Posted August 31, 2008

Happy to try brute-forcing the domain, if you can tell me how!

Oh, and for the ready-made Python script that ls kindly created for me, I couldn't get it to work. I just tried. I created a folder called Coding on my C drive. Here is everything that I have done and attempted (much easier to give a screenshot than to attempt an explanation);

Tell me what to do to get this to work! Thanks!
USBHacker Posted September 3, 2008

Thanks for all the help given so far, but as you can see, it still isn't working! Please help me get it to work!
USBHacker Posted October 16, 2008

If I wanted the Python script to work, what would I need to do? I'm happy to try it on any of the following OSes:

XP 64-bit (Microsoft)
Linux 64-bit (OpenSuSE)
Mac Leopard (Apple)

All I need to know is how to get it to work! Tell me how!

Thanks in advance,
Panarchy
digip Posted October 16, 2008

(Quoting USBHacker:) "All I need to know is how to get it to work! Tell me how!"

In order for you to get it to work, you would have to RTFM!
USBHacker Posted October 24, 2008

LOL

BTW: Still haven't gotten it to work