
HTMLgetter: Python Code gets HTML code and outputs to .txt file



I don't know, maybe somebody will find this useful in their pentesting arsenal.


# HTMLgetter v1.0 by Forgiven
# This is a handy bit of Python (2.x) that will reap the HTML code of any page
# and output it to a txt file of your choice.

import urllib2

urlStr = raw_input('Input the full URL of the webpage whose HTML code you wish to reap: ')
fileName = raw_input('Input the *.txt filename for the output: ')
fileName = fileName + '.txt'

try:
    # Fetch the page and read the raw HTML
    fileHandle = urllib2.urlopen(urlStr)
    str1 = fileHandle.read()
    fileHandle.close()
    print '-'*50
    print 'HTML code of URL =', urlStr
    print '-'*50
except IOError:
    # urllib2.URLError is a subclass of IOError, so network failures land here
    print 'Cannot open URL %s for reading' % urlStr
    str1 = 'error!'

print str1

# Save the HTML (or the error marker) to the output file
fileOut = open(fileName, 'w')
fileOut.write(str1)
fileOut.close()

I thought it was cool: it creates a nice txt file of the HTML from a web page. I guess I don't have permission to upload the .py here, but the code is small and simple enough to copy and paste.

You can find it on github at the link.
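
For anyone on Python 3, where urllib2 and raw_input no longer exist, here is a minimal sketch of the same idea using urllib.request. The function name fetch_to_file and the 30-second timeout are illustrative choices, not part of the original script.

```python
import urllib.error
import urllib.request


def fetch_to_file(url, out_path):
    """Download `url` and save the raw bytes to `out_path`.

    Returns the number of bytes written, or None if the URL could not be opened.
    """
    try:
        with urllib.request.urlopen(url, timeout=30) as response:
            body = response.read()
    except (urllib.error.URLError, ValueError) as exc:
        # URLError covers network and HTTP failures; ValueError covers malformed URLs
        print('Cannot open URL %s for reading: %s' % (url, exc))
        return None
    # Write the bytes as-is so the page's original encoding is preserved
    with open(out_path, 'wb') as out_file:
        out_file.write(body)
    return len(body)
```

Calling fetch_to_file('http://www.somesite.com/page.html', 'file.txt') should then do roughly what the script above does, minus the prompts.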

Edited by Forgiven

wget http://www.somesite.com/page.html -O file.txt

would work too, but it's good to see someone writing scripts. I'm a n00b at scripting and mostly just do simple bash scripts, so it's nice to see how Python works.


The bash scripts you guys shared are so tight! I'm going to have to learn me some of that...science is my gig.

Here's a question for you gurus: let's say that I want to log on to my favorite horse wagering site, twinspires.com, from the command line. Is there a script that will pass the username and password through the form so that I can access the live toteboard odds when the page redirects to the wagering home page? I can't find live odds data for horse tracks anywhere else, and I want to pass the odds to an app I'm writing. Or, once I have already logged on to a website, is there a simple script that will scarf the data I need and pass it to a .csv or .txt file?

...Requests and Mechanize are pretty awesome, but Bash is way awesomer.
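
On the "scarf the data into a .csv" part, here is a rough sketch using only the Python 3 standard library. It assumes the data sits in an ordinary HTML table, which may not be true of the real odds page (if the odds are filled in by JavaScript, the downloaded HTML won't contain them). The names TableRows and html_table_to_csv and the sample markup are made up for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the <tr> currently being read
        self._cell = None  # text pieces of the <td>/<th> currently being read

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag in ('td', 'th') and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html_text):
    """Return the table rows found in `html_text` as CSV text."""
    parser = TableRows()
    parser.feed(html_text)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator='\n').writerows(parser.rows)
    return buffer.getvalue()


# Made-up sample markup; the real odds page will differ
sample = ('<table><tr><th>Horse</th><th>Odds</th></tr>'
          '<tr><td>Example Runner</td><td>5/2</td></tr></table>')
print(html_table_to_csv(sample))
```

Writing the returned text to a .csv or .txt file is then a plain open(...).write(...) call.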


Here's the HTML of the login section of twinspires:

<div class="column col1" id="sidebar-left">
<div id="sidebar-outer-wrapper">
<div class="bottom-wrapper">
<div class="sidebar-container">
<div id="logged-in-user">
<div class="ajax-loading"></div>
<div class="panel-pane pane-type1 anonymous-content" id="pane-login-block">
<h2 class="pane-title">Login</h2>
<div id="login-section" class="pane-content">

<form method="post" action="https://www.twinspires.com/php/login.php">
<input type="hidden" name="destination" value="">
<input type="hidden" value="user_login" name="form_id">
<input type="hidden" value="2800" name="affid">
<input type="hidden" value="0" name="blocklogin">
<input type="hidden" value="1" name="wager">
<input id="edit-redirect" type="hidden" value="http://www.twinspires.com/wager" name="redirect">

<ul class="field-set">
<label for="username">Username:</label>
<input type="text" name="acct" id="username" class="text-box" maxlength="100" size="20">
<label for="password">Password:</label>
<input type="password" name="pin" id="password" class="text-box" maxlength="16" size="20">
<span id="reset-login-link"><a href="http://www.twinspires.com/account/password/request">forgot your login information?</a></span>
<input type="submit" class="button" value="Login" id="Login" name="Login">



curl can POST form data with a username and password, and so can wget. Some sites don't take a POST but use 401-style HTTP auth instead, in which case you can just encode the credentials in the URL itself, i.e. http://user:pass@site.com. I DON'T recommend doing that on plain-http sites or in a browser other people use, since the credentials show up in the address bar and are sent in the clear.
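
If the site does use HTTP Basic auth, one way to keep the credentials out of the URL is to send them in an Authorization header instead. Here is a small Python 3 sketch of that. The helper name basic_auth_request, the host site.example and the credentials are placeholders. Basic auth only base64-encodes the credentials, it does not encrypt them, so this still wants an https URL.

```python
import base64
import urllib.request


def basic_auth_request(url, username, password):
    """Build a Request with an HTTP Basic Authorization header."""
    token = base64.b64encode(('%s:%s' % (username, password)).encode('utf-8'))
    request = urllib.request.Request(url)
    request.add_header('Authorization', 'Basic ' + token.decode('ascii'))
    return request


# Hypothetical host and credentials, for illustration only
req = basic_auth_request('https://site.example/protected', 'user', 'pass')
print(req.get_header('Authorization'))
```

Passing the returned request to urllib.request.urlopen would send it. Nothing is sent over the network by the code as written.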

