
Finding all the links from a domain?


USBHacker


Hello!

Is there a way (with an online utility, a command-line tool, or a program) to find and make a list of all the URLs and links that come from a specific domain?

I don't know if this is possible, which is why I am asking here!

Thanks in advance for any suggestions,

USBHacker


To find all the links on other websites that point to a site, you can use Google, like this:

link:hak5.org

If you need to find all the links on a website itself, you will need to search the page source for href=site.com.
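
For example, a minimal sketch of that href search, in the same Python 2 style as the script below (the regex is a simplification and will miss single-quoted or unquoted attributes; the URL is just an illustration):

import urllib2, re

# Fetch the page and pull out everything inside an href="..." attribute.
page = urllib2.urlopen("http://hak5.org").read()
for link in re.findall(r'href="([^"]+)"', page):
    print link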

And to find things like wiki.hak5.org, you need to write a script that tries to open different subdomains, like this Python script:

import urllib2

# Candidate subdomain names to try.
subs = ["w", "wi", "wik", "wiki"]
for sub in subs:
    site = "http://" + sub + ".hak5.org"
    try:
        # If the request succeeds, the subdomain exists, so print it.
        urllib2.urlopen(site).readlines()
        print site
    except:
        pass

Is this what you were asking for?


You could try an nslookup for internal site pages/subdomains, but most likely it will be blocked. You could also try wget and spider the site, but it won't find pages that aren't linked to directly from each page. There are other methods, but you need to start doing the research yourself. You ask a zillion questions like these, but it seems you don't put any effort into learning it yourself. This has also been discussed in another thread; if you had done a search on the forums, you might have known this.
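
As a rough sketch of the nslookup idea, in the same Python 2 vein as the other scripts in this thread, you can check candidate subdomains over DNS instead of fetching them over HTTP (the names here are only examples):

import socket

# Try to resolve a few candidate subdomains; print the ones that exist.
for sub in ["wiki", "forums", "mail"]:
    host = sub + ".hak5.org"
    try:
        print host, socket.gethostbyname(host)
    except socket.error:
        pass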

"USBHacker" From now on, I will refer to you as "LazyHacker". Sound a bit harsh, but maybe it's time you start reading and researching and stop with the trivial questions.


I'm not sure what you are asking for. Do you want a list of all the links which are on a page, e.g. hak5.org, or a list of the pages which link to a specific page?

I think the second one is what I want. Not just links from a certain domain, but all the URLs (if it is possible to get that!)

(quoting ls's post above, with the Google link: tip and the urllib2 subdomain script)

Yes, thanks. I'm pretty sure that's what I'm looking for. Sorry if I sound stupid (I've only ever written Python as the back end for a website), but how do I use it?

Please reply, and thanks for what you have already coded,

USBHacker

EDIT: Is that a default library?

(quoting digip's nslookup and wget suggestions above)

Thanks, I'll do as much research as I can. But you said that spidering the site won't find pages that aren't directly linked from the domain? Can you please give me the name of a method I can use to make this work? (Or will the above Python code do it for me?)

Thanks in advance,

USBHacker


(quoting USBHacker's question above)

You basically would have to script a brute force to look for specific words as directories and then have the script run similar to a spider, but against your list of words. I am not going to write it for you, as it takes only a few minutes to write a Windows bat script that uses wget to do this; all you would need is to prepend/append the words to search the site against, e.g. hak5.org/word1, hak5.org/word2, hak5.org/word3, etc.

You will most likely be seen in their server logs and, if they have any IDS, banned from the site, so it really doesn't help to constantly HTTP GET a site to death, as it starts to look like a DoS attack. Without root access to the server, brute-force enumeration is about the only way, other than any recursive file-listing exploits that may exist on the target system.
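
A rough Python 2 sketch of that brute-force idea, kept in the style of the other scripts in this thread rather than a bat script (words.txt is a hypothetical wordlist, one word per line):

import urllib2

# Try each word in the list as a directory on the target site.
for word in open("words.txt"):
    url = "http://hak5.org/" + word.strip()
    try:
        urllib2.urlopen(url)   # a missing page raises urllib2.HTTPError
        print url              # only reached if the page exists
    except urllib2.URLError:   # HTTPError is a subclass of URLError
        pass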


I'm a n00b, but....

What about a recursive wget, and then grep the output for "href"?

wget -l 2 -r www.hak5.org
grep -R 'hak5.org' *
-or-
grep -R 'http://' *

Not exactly elegant, but it should work nonetheless. And you could offer a different user-agent string with -U, because some sites watch for wget:

-U "Mozilla/4.0 (compatible; MSIE 6.0; Microsoft Windows NT 5.1)"


Not to worry, nonetheless, thanks for putting in the effort.

;)

Oh, and I still haven't completely worked it out. I think that I should use the Python script written by ls...

Or if someone could give me the name of a technique that could be used to make it work for me...

And I'll also try the wget method that digip suggested.

I'll tell you how it goes!

And if you know of, remember, or learn of the name of a technique that could be used for this, please don't hesitate to suggest it!

Thanks in advance, and thanks for all suggestions made so far,

USBHacker


There's a Google command that does this; it's on the forums somewhere, go find it.

Google example:

"site:sitename.com", but again, only pages that are linked will be found.


Well, first you will need to write a file with the subdomains to search for, like this:

forums
intranet
sales
video
....

Then paste this code into another file called subfind.py:

import urllib2, sys

# Usage: python subfind.py <domain> <wordlist file>
site = sys.argv[1]
subs = open(sys.argv[2], 'r').readlines()
for sub in subs:
    sub = sub.replace("\n", "")
    site2 = "http://" + sub + "." + site
    try:
        # If the request succeeds, the subdomain exists, so print it.
        urllib2.urlopen(site2).readlines()
        print site2
    except:
        pass
print "done"

and save it.

Now run it from the command line like this:

python subfind.py google.com <the file with the subdomains>

If a valid subdomain is found, it will print it out.


Thanks

Sorry to sound annoying, but wordlists won't help me. Most of the sites I need to do this against have lots of numbers and might, just might, have words... I will never know for sure if I am getting all the information.

I will still use your technique, but if you think of a better way of doing it, please don't hesitate to suggest it!

Thanks in advance,

USBHacker


If there are no links to a particular subdomain, you probably won't find it. Brute-force/dictionary attacks are the only other way. Otherwise, try an offline browser.

Oh, and stop with the "please reply!" and "please don't hesitate to give me all the answers!" because it's starting to fucking grate.


:rolleyes:

Happy to try brute-forcing the domain, if you can tell me how!

Oh, and I couldn't get the ready-made Python script that ls kindly created for me to work. I just tried.

I created a folder called Coding on my C drive.

Here is everything that I have done and attempted (much easier to give a screenshot than to attempt an explanation):

[screenshot: 119xgxu.jpg]

Tell me what to do to get this to work!

Thanks!


1 month later...

If I wanted the Python script to work, what would I need to do?

I'm happy to try it on any of the following OSes:

XP 64-bit (Microsoft)

Linux 64-bit (OpenSuSE)

Mac Leopard (Apple)

All I need to know is how to get it to work!

Tell me how!

Thanks in advance,

Panarchy


(quoting Panarchy's post above)

In order for you to get it to work you would have to RTFM!

