How to spider a website on a local PHP server with CeWL?


reftoken

Hi all,

I am trying to use CeWL in order to get a wordlist from a website I am developing locally. Since trying with the direct path leads to errors, I tried to start a local development server and did the following:

# In the local directory (php -S stays in the foreground, so CeWL runs from a second terminal)
$ php -S localhost:8080
$ cewl -m 5 -w output.txt http://localhost:8080/

However, CeWL stops almost immediately, leaving me with a list of only 30-40 words. It seems to spider just the homepage and go no further. Even with -d 5 or -o, it does not crawl any deeper.

Do you know of an alternative way to fetch words from local files?


If you run it with --debug it will show you all the URLs it finds and will say either why it is following them or why it is ignoring them. My guess would be that the links coming off the homepage go to a different URL and so are considered offsite and not touched.
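To illustrate what "offsite" can mean here, below is a rough Python sketch of a same-origin check. It is not CeWL's actual logic (CeWL is written in Ruby and may compare less than this); the helper names `origin` and `is_offsite` are made up for the example. It only shows how a link that drops the port, such as http://localhost/blog/index.html, ends up pointing at port 80 rather than 8080.

```python
from urllib.parse import urljoin, urlsplit

# Default ports assumed when a URL does not name one explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return (scheme, host, port), filling in the scheme's default port."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def is_offsite(base_url, link):
    """True if link, resolved against base_url, lands on a different origin."""
    return origin(urljoin(base_url, link)) != origin(base_url)


base = "http://localhost:8080/"
for link in ["/blog/", "about.php", "http://localhost/blog/index.html"]:
    label = "offsite" if is_offsite(base, link) else "onsite"
    print(f"{link} -> {urljoin(base, link)} ({label})")
```

Relative links keep the :8080 of the page they were found on, while an absolute link written without a port resolves to port 80, so it is worth checking how the links in the homepage HTML are written.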


I got the following error when trying --debug:

/usr/bin/cewl: unrecognized option `--debug'

So I tried verbose mode (-v) instead and allowed it to go offsite (-o):

$ cewl -v -o -m 5 -c -w output.txt http://localhost:8080/

...and almost immediately got:

Starting at http://localhost:8080/blog/
Visiting: http://localhost:8080/blog/, got response code 200
Attribute text found:

Unable to connect to the site (http://localhost:80/blog/index.html)

The following error may help:
Failed to open TCP connection to localhost:80 (Connection refused - connect(2) for "localhost" port 80)
/usr/lib/ruby/2.3.0/net/http.rb:882:in `rescue in block in connect'
/usr/lib/ruby/2.3.0/net/http.rb:879:in `block in connect'
/usr/lib/ruby/2.3.0/timeout.rb:91:in `block in timeout'
/usr/lib/ruby/2.3.0/timeout.rb:101:in `timeout'
/usr/lib/ruby/2.3.0/net/http.rb:878:in `connect'
/usr/lib/ruby/2.3.0/net/http.rb:863:in `do_start'
/usr/lib/ruby/2.3.0/net/http.rb:852:in `start'
/usr/lib/ruby/2.3.0/net/http.rb:1398:in `request'
/usr/bin/cewl:281:in `get_page'
/usr/bin/cewl:212:in `block (2 levels) in start!'
/usr/bin/cewl:210:in `each'
/usr/bin/cewl:210:in `block in start!'
/usr/bin/cewl:198:in `each'
/usr/bin/cewl:198:in `start!'
/usr/bin/cewl:165:in `start_at'
/usr/bin/cewl:744:in `block in <main>'
/usr/bin/cewl:734:in `catch'
/usr/bin/cewl:734:in `<main>'

Caller
/usr/bin/cewl:233:in `get_page'
/usr/bin/cewl:212:in `block (2 levels) in start!'
/usr/bin/cewl:210:in `each'
/usr/bin/cewl:210:in `block in start!'
/usr/bin/cewl:198:in `each'
/usr/bin/cewl:198:in `start!'
/usr/bin/cewl:165:in `start_at'
/usr/bin/cewl:744:in `block in <main>'
/usr/bin/cewl:734:in `catch'
/usr/bin/cewl:734:in `<main>'


Writing words to file

Any idea what's going on here? Also, it's strange that it tries to access localhost:80 when I specify localhost:8080.


Is the web server started? Do you have Apache installed? If you open "http://localhost:8080/blog/" in a browser, does it load properly? If not, that is where I would start.
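On the original question of getting words straight from local files: if the pages are static HTML, one option is to skip the spider entirely. The following is a minimal Python sketch under that assumption. The names `TextExtractor`, `words_from_html` and `build_wordlist` are invented for this example, the 5-character minimum mirrors the -m 5 used above, and it will not render PHP, so .php sources would still need to be served and fetched over HTTP.

```python
import re
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def words_from_html(html, min_len=5):
    """Return the set of alphabetic words of at least min_len characters."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    # [^\W\d_]+ matches runs of letters only (no digits or underscores).
    return {w for w in re.findall(r"[^\W\d_]+", text) if len(w) >= min_len}


def build_wordlist(root, min_len=5, suffixes=(".html", ".htm")):
    """Walk root recursively and return a sorted list of unique words."""
    words = set()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in suffixes:
            html = path.read_text(encoding="utf-8", errors="ignore")
            words |= words_from_html(html, min_len)
    return sorted(words)


sample = "<html><script>var secret = 1;</script><p>Hello wonderful world, cats</p></html>"
print(sorted(words_from_html(sample)))
```

To write a file comparable to CeWL's output, something like `Path("output.txt").write_text("\n".join(build_wordlist(".")))` run from the site directory should do it.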
