digip Posted January 1, 2008

Is there a way to use PHP to scrape the contents of a frame or iframe before it renders and put them into a string for parsing? I understand the whole concept of using file_get_contents to fetch a URL's contents and then paste them wherever I want, such as into the current page. What I want instead is to take the data from a subframe the user has navigated to within the current page and parse it as text, without passing a URL to file_get_contents. Basically: take the inner HTML, pass it to the outer HTML, and return it to the inner HTML sanitized.

For example, if I follow links in a frame or iframe, the pages I go to display in that frame. I want to scrape that data using something like the DOM or PHP to pull specific things out (images, ads, JavaScript, etc.) and write the result back to the frame or iframe. Basically, I'm making a safe browser or text browser in PHP: when a person navigates through the frames, the script catches each request before the page loads into the frame and strips out certain data before returning it to the user. I'm thinking it would probably need some sort of XML request or something to do the pulling, and it would need to rewrite all links within the frame so they go back to the outer HTML/PHP side of the page.

The key reason is that with file_get_contents I usually supply a URL either through a POST form or directly in the script as a variable, and I want to do it on the fly as they move from page to page within the frame. That way they never have to post the URL through a form or script; they just surf normally, and the outer PHP page does all the safeguarding of the inner HTML contents dynamically.
psychoaliendog Posted January 2, 2008

Browsers prevent that type of access unless the page in the frame is from the same domain as the page containing it (the same-origin policy). Allowing it would be an XSS vulnerability.

You could have a form like:

<form action="cleanMe.php" method="get">
  <input type="text" id="url" name="url" />
  <input type="submit" />
</form>

and cleanMe.php could look like:

<?php
// Only fetch http(s) URLs; otherwise file_get_contents() could be pointed at local files.
$url = isset($_GET['url']) ? $_GET['url'] : '';
if (!preg_match('#^https?://#i', $url)) {
    exit('Invalid URL');
}
$page = file_get_contents($url);
// process $page here (strip scripts, images, ads; rewrite links)
echo $page;
?>

When you process $page, you would then rewrite all the links as cleanMe.php?url=<url>, so that every click goes back through the script.
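
A rough sketch of that link-rewriting step, assuming PHP 7 or later with the DOM extension enabled. The function names absolutizeUrl() and rewriteLinks() are made up for illustration, and the URL resolution is deliberately simplified: it does not normalise "../" segments or honour a <base href> tag.

```php
<?php
// Hypothetical helper: turn a link found on $baseUrl into an absolute URL.
function absolutizeUrl(string $href, string $baseUrl): string
{
    if (preg_match('#^https?://#i', $href)) {
        return $href;                                   // already absolute
    }
    $base = parse_url($baseUrl);
    $root = $base['scheme'] . '://' . $base['host']
          . (isset($base['port']) ? ':' . $base['port'] : '');
    if (strpos($href, '//') === 0) {
        return $base['scheme'] . ':' . $href;           // protocol-relative
    }
    if (strpos($href, '/') === 0) {
        return $root . $href;                           // root-relative
    }
    $path = isset($base['path']) ? $base['path'] : '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);  // directory of the current page
    return $root . $dir . $href;
}

// Hypothetical helper: point every <a href> back at the proxy script.
function rewriteLinks(string $html, string $baseUrl, string $proxy = 'cleanMe.php'): string
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html);   // @ hides warnings caused by sloppy real-world markup
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        // Leave empty links, in-page anchors, and mailto:/javascript: style links alone.
        if ($href === '' || $href[0] === '#'
            || preg_match('#^(?!https?:)[a-z][a-z0-9+.-]*:#i', $href)) {
            continue;
        }
        $a->setAttribute('href', $proxy . '?url=' . urlencode(absolutizeUrl($href, $baseUrl)));
    }
    return $doc->saveHTML();
}
```

In cleanMe.php this would be called as rewriteLinks($page, $url) just before the echo. Note that saveHTML() returns a complete document, including the doctype and html/body wrappers that loadHTML() adds.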
jollyrancher82 Posted January 2, 2008

Look at the DOMDocument class in PHP.
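
As a minimal sketch of what that could look like for the stripping part (again assuming PHP 7 or later with the DOM extension): the function name sanitizeHtml() and the default list of blocked tags are assumptions for illustration, not a complete sanitizer.

```php
<?php
// Hypothetical helper: remove unwanted elements and inline event handlers.
function sanitizeHtml(
    string $html,
    array $blocked = ['script', 'img', 'iframe', 'object', 'embed']
): string {
    $doc = new DOMDocument();
    @$doc->loadHTML($html);   // @ suppresses warnings about malformed markup

    foreach ($blocked as $tag) {
        // Copy the live node list into an array first; removing nodes
        // while iterating the live list would skip elements.
        $nodes = iterator_to_array($doc->getElementsByTagName($tag));
        foreach ($nodes as $node) {
            $node->parentNode->removeChild($node);
        }
    }

    // Inline handlers such as onclick can still run script, so drop them too.
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//@*[starts-with(name(), "on")]') as $attr) {
        $attr->ownerElement->removeAttribute($attr->nodeName);
    }

    return $doc->saveHTML();
}
```

Something like echo sanitizeHtml($page); would then replace a plain echo $page; in a fetch script. A real filter would also need to deal with things this sketch ignores, such as javascript: URLs, stylesheets, and CSS background images.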