Forgiven Posted September 16, 2013 (edited)
I don't know, maybe somebody will find this useful in their pentesting arsenal.

    #!/usr/local/bin/python
    # HTMLgetter v1.0 by Forgiven
    # This is a handy bit of python that will reap the HTML code of any page
    # and output it to a txt file of your choice.
    # (Python 2: uses urllib2, raw_input and the print statement.)
    import urllib2

    urlStr = raw_input('Input the full URL of the webpage whose HTML code you wish to reap:')
    fileName = raw_input("Input the *.txt filename for the output:")
    fileName = fileName + ".txt"

    try:
        fileHandle = urllib2.urlopen(urlStr)
        str1 = fileHandle.read()
        fileHandle.close()
        print '-'*50
        print 'HTML code of URL =', urlStr
        print '-'*50
    except IOError:
        print 'Cannot open URL %s for reading' % urlStr
        str1 = 'error!'

    # Write the page (or the error marker) to the output file and echo it.
    fileOut = open(fileName, "w")
    fileOut.write(str1)
    fileOut.close()
    print str1

I thought it was cool; it creates a nice txt file of the HTML from a web page. I guess I don't have permission to upload the .py for this above, but the code is small and simple enough to copy. You can find it on github at the link.
Edited September 16, 2013 by Forgiven
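For anyone on Python 3, where urllib2 and raw_input no longer exist, here is a rough sketch of the same idea using only the standard library. The function name save_page is made up for illustration; it is not part of the original script, and it skips the interactive prompts so it can be reused from other code.

```python
# Python 3 sketch of the same fetch-and-save idea.
# save_page is a hypothetical helper name, not part of HTMLgetter.
import urllib.error
import urllib.request


def save_page(url, out_path):
    """Fetch url and write the raw response body to out_path.

    Returns the number of bytes written, or None if the URL
    could not be opened.
    """
    try:
        with urllib.request.urlopen(url) as response:
            body = response.read()
    except (urllib.error.URLError, ValueError) as exc:
        # URLError covers network and HTTP failures;
        # ValueError covers strings that are not valid URLs.
        print("Cannot open URL %s for reading: %s" % (url, exc))
        return None
    # Open the output file only after a successful fetch, in binary
    # mode, so the bytes are stored exactly as the server sent them.
    with open(out_path, "wb") as out_file:
        out_file.write(body)
    return len(body)
```

Something like save_page("http://www.somesite.com/page.html", "file.txt") should then behave much like the script above, except that nothing is written when the fetch fails.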
digip Posted September 16, 2013
wget http://www.somesite.com/page.html -O file.txt would work too, but it's good to see someone writing scripts. I'm a n00b at scripting, and it's nice to see how Python works, since I mostly just do simple bash scripts for things.
Forgiven Posted September 16, 2013 (Author)
Gotta love Linux, man.
newbi3 Posted September 17, 2013
I just wrote something similar to this yesterday for a much different purpose and I used the requests library. I suggest you check it out! http://docs.python-requests.org/en/latest/
Mr-Protocol Posted September 17, 2013
Even easier: curl anysiteyouwish.com > local.html
curl has a ton of flexibility: man curl
Forgiven Posted September 17, 2013 (Author)
The bash scripts you guys shared are so tight! I'm going to have to learn me some of that...science is my gig.
Here's a question for you gurus: let's say that I want to log on to my favorite horse wagering site, twinspires.com, from the command line. Is there a script that will pass the username and password through the form so that I can gain access to live toteboard odds when the page redirects to the wagering home page? I can't find live odds data for horse tracks anywhere else, and I want to pass the odds to an app I'm writing.
OR, once I have already logged on to a website, is there a simple script that will scarf the data I need and pass it to a .csv or .txt file?
...Requests and Mechanize are pretty awesome, but Bash is way awesomer.
Forgiven Posted September 17, 2013 (Author)
Here's the HTML of the login section of twinspires:

    <div class="column col1" id="sidebar-left">
      <div id="sidebar-outer-wrapper">
        <div class="bottom-wrapper">
          <div class="sidebar-container">
            <div id="logged-in-user">
              <div class="ajax-loading"></div>
              <div class="panel-pane pane-type1 anonymous-content" id="pane-login-block">
                <h2 class="pane-title">Login</h2>
                <div id="login-section" class="pane-content">
                  <form method="post" action="https://www.twinspires.com/php/login.php">
                    <input type="hidden" name="destination" value="">
                    <input type="hidden" value="user_login" name="form_id">
                    <input type="hidden" value="2800" name="affid">
                    <input type="hidden" value="0" name="blocklogin">
                    <input type="hidden" value="1" name="wager">
                    <input id="edit-redirect" type="hidden" value="http://www.twinspires.com/wager" name="redirect">
                    <ul class="field-set">
                      <li>
                        <label for="username">Username:</label>
                        <input type="text" name="acct" id="username" class="text-box" maxlength="100" size="20">
                      </li>
                      <li>
                        <label for="password">Password:</label>
                        <input type="password" name="pin" id="password" class="text-box" maxlength="16" size="20">
                      </li>
                      <li>
                        <span id="reset-login-link"><a href="http://www.twinspires.com/account/password/request">forgot your login information?</a></span>
                        <input type="submit" class="button" value="Login" id="Login" name="Login">
                      </li>
                    </ul>
                  </form>
digip Posted September 17, 2013
curl can do data POSTs with usernames and passwords, and so can wget. Some sites don't take a POST but use something like 401 auth, and for those you can just encode the credentials in the URL itself, i.e. http://user:pass@site.com. I DON'T recommend doing that on plain http sites, where it is sent in the clear, or in a browser others use, since it can be seen in the address bar.
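Since the thread started in Python, the same form POST can be sketched there too, using only the Python 3 standard library. This is a rough illustration, not a tested login: the field names acct and pin come from the form HTML posted above, but FormFieldParser, build_login_request and make_session are made-up names, and whether the site accepts a scripted login like this (or needs extra cookies or JavaScript) is an assumption nobody here has verified.

```python
# Sketch only: the class and function names are hypothetical, and the
# "acct"/"pin" field names are taken from the form HTML posted above.
from html.parser import HTMLParser
from http.cookiejar import CookieJar
import urllib.parse
import urllib.request


class FormFieldParser(HTMLParser):
    """Collect the action URL and named <input> values of the first form."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Keep hidden fields too; the server may expect them back.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def build_login_request(form_html, username, password):
    """Return a POST request carrying the form's fields plus credentials."""
    parser = FormFieldParser()
    parser.feed(form_html)
    fields = dict(parser.fields)
    fields["acct"] = username
    fields["pin"] = password
    data = urllib.parse.urlencode(fields).encode("ascii")
    # Supplying data makes urllib send a POST instead of a GET.
    return urllib.request.Request(parser.action, data=data)


def make_session():
    """Build an opener that keeps cookies between requests."""
    return urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar()))
```

The idea would be to call make_session() once, pass the result of build_login_request(...) to its open() method, and then fetch the odds page with the same opener so the login cookies are sent along. Use an https action URL so the credentials are not sent in the clear.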