Enumerate images on websites

Ruck · June 26, 2017

I recently subscribed to a real estate agency's website, since I am looking to buy a house. One of the requirements was to upload a scan of my passport. Already a bit anxious about security, I covered the sensitive elements with black tape before scanning and added a watermark stating the purpose of the scan before uploading.

Not really to my surprise, I got an email confirming that my subscription was received, with a direct link to the uploaded passport image. I subscribed a second time and found that the process is the same and that the link has a similar layout and components:

<URL> / attachment_answers/000/428/835/<filename>.jpg.jpg?<id value>

Comparing the received URLs with the uploaded image, I did not find any logic I could use.

The numbering in the URL seems to be site-generated (and probably related to project and subscription numbering), but not easily guessable or predictable. The filename corresponds to my uploaded filename plus a .jpg extension (hence the double jpg). The ID value does not seem to protect anything (with it removed, the passport image is still displayed).

Since I did not discover any real security countermeasures, I am wondering whether tooling exists, or could easily be created or scripted, that is able to discover other images on the site: some sort of image scraping. Googling this question only returns scrapers that wordlist or brute-force filenames or directories. I want to know both for my own education (I'm active in the IT audit & security domain) and to be able to notify the real estate agency with a proof of concept (if it's easy to perform and within legal boundaries/responsible disclosure).

Could anyone indicate whether or not this is possible? And maybe give an indication of the effort and/or tooling required?

If this requires additional information or further background on my intentions, please let me know what is needed.

digininja · June 26, 2017

Yes, it would be fairly simple to script up a tool to enumerate something like this but with the large potential address space it is unlikely to find anything. If you want to see an example, this is a similar tool I wrote years ago to look through Amazon buckets:

https://digi.ninja/projects/bucket_finder.php

I would strongly advise against doing it and I wouldn't take any proof of concepts to the estate agents as doing so would be admitting to performing unauthorised testing against their site. They may be grateful, they may get police and lawyers involved.

Dave-ee Jones · June 26, 2017

Well, you could at least state that their site may be very insecure. You have the right to do that if they are asking for personal information, and they probably have a license agreement saying that they will not disclose your information to the outside world. If they are disclosing it, even accidentally, on their website, you are allowed to go to them and say, "Hey, why aren't you trying to secure our passports? You know, those really important things that have details on who we are? Shouldn't you NOT put those on the internet?"

Usually they ask you to bring your passport so they can scan it there in person, they should NEVER ask you to upload photos of it. That's just poor quality service, and security too.

digip · June 27, 2017

There are a number of HTTP fuzzing tools out there, as well as wordlist-based directory busters. gobuster is one of my favourites: it takes a wordlist to check for directories, and you can pass file extensions to it. Out of the box it does that and subdomain DNS queries, but you can build on it for HTTP fuzzing as well by scripting the parts of the URL to reproduce for each request, and it can be set to show only the HTTP responses you want, e.g. 200 OK, 301, 302, 404, 403, or a combination thereof.

You could do something similar with curl and wget, although curl just prints what it receives to the console unless you tell it to write to a file or redirect the output, while wget actually downloads the files each time. wfuzz is another one I've played with briefly; it works similarly to the directory brute-force scripts, with more variation on the parts of the URL and the timing. There are also dirb and dirbuster, similar to gobuster, and one of them has a GUI as well, if that is your thing. Most of these are Linux-based only, too.

Ruck · June 27, 2017

Thanks for your responses. The tools you mention (gobuster, dirbuster and HTTP fuzzing) have a lot of options to look into, and unless I can figure out what a tool does and what it creates or generates, I am not going to use it in real-life environments.

I do not understand how fuzzing ("submitting lots of invalid or unexpected data to a target") would serve the goal mentioned. Although error messages can provide loads of information, they do not indicate whether a specific file exists in a directory, right?

Based on the concept of directory busters though: <URL> / attachment_answers/000/428/835/<filename>.jpg.jpg?<id value>

The URL domain is fixed, and so is the first directory (attachment_answers), so no hard parts there. The next part is variable (/000/428/835), but I assume it is based on project/subscription IDs.

With the tooling you mention, would I need to provide a directory listing including (all?) possible combinations, and thus wordlist the directory path? Since this part is integer-based it wouldn't be too hard to script manually if required, but it creates a lot of combinations.

The filename is user-provided, so it cannot be guessed easily (although I would expect a lot of users to use passport.jpg?!). In these kinds of 'attacks', is wordlisting the only possible solution? Or is there a way to retrieve files from a directory based only on file type (so the name is a black box)?
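To put a number on "a lot of combinations", here is a minimal Python sketch of the integer-based path. It assumes the three directories are one zero-padded nine-digit counter split into groups of three, so that /000/428/835 would correspond to ID 428835. That layout is a guess, not something confirmed in this thread, and the function names are hypothetical.

```python
def id_to_path(record_id: int) -> str:
    """Split a zero-padded 9-digit ID into three directory levels (assumed layout)."""
    if not 0 <= record_id < 10**9:
        raise ValueError("record_id must fit in nine digits")
    digits = f"{record_id:09d}"
    return "/".join(digits[i:i + 3] for i in range(0, 9, 3))


def path_to_id(path: str) -> int:
    """Inverse of id_to_path: '000/428/835' -> 428835."""
    parts = path.strip("/").split("/")
    if len(parts) != 3 or not all(len(p) == 3 and p.isdecimal() for p in parts):
        raise ValueError("expected three 3-digit directories")
    return int("".join(parts))


# Three levels, each ranging over 000-999, give 1000**3 possible paths.
ADDRESS_SPACE = 1000 ** 3

if __name__ == "__main__":
    print(id_to_path(428835))          # 000/428/835
    print(path_to_id("000/428/835"))   # 428835
    print(ADDRESS_SPACE)               # 1000000000
```

A billion possible paths, each with an unknown user-supplied filename on top, is the "large potential address space" mentioned earlier in the thread.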

And I am going to point out to the real estate agency that their policies and security are not up to date; I have found several contradictions in their privacy policies, processing of data and the email confirmations I received. But I would also like to mention how easy or difficult it would be for a person to discover those (passport) files, based not solely on an assumption but on my own expectations and experience:

"I notice passport files out in the open with limited/no security; I expect those files can be accessed (very) easily by doing ABC." This way I can open a window for further testing with approval/assignment. I mentioned some privacy concerns earlier in the process and they cared very little, because they have plenty of customers (housing turnover is high at the moment, so there are lots of willing customers who do not care about privacy and do not complain).

The other option would be to escalate to the (privacy) authorities or to privacy advocates/journalists, but I want to give the agency a fair chance.

digininja · June 27, 2017

When you don't know the correct values for a parameter, then it is fuzzing, so you'd be fuzzing the filenames and the numeric parameters. You could technically say you are just iterating through the numbers, but that is just a type of fuzzing.

You would look at the responses and base decisions on them. You might find that you get a 500 back rather than a 302 if you change the 428 in your example to 429, so you know that 429 is not a valid value and move on; if 430 gets you a 302, then you can assume you've hit a valid value and move on to the 835 part.

The only way to work out what is valid and what isn't is through experimentation. Sometimes it is obvious (a 200 is OK, anything else is wrong), or it may be really tricky and you have to base answers on response times.
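The decision logic described above can be sketched in Python without touching a real site. This is only an illustration: the set of "valid" status codes is taken from the 302/500 example and may differ on any given server, and `make_stub_probe` is a hypothetical stand-in for a request against a test box you control.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List

# Assumed meanings, taken from the example above; a real server may differ.
VALID_STATUSES: FrozenSet[int] = frozenset({200, 302})


def find_valid_values(
    candidates: Iterable[str],
    probe: Callable[[str], int],
    valid: FrozenSet[int] = VALID_STATUSES,
) -> List[str]:
    """Return the candidates whose status code is in the 'valid' set."""
    return [c for c in candidates if probe(c) in valid]


def make_stub_probe(known: Dict[str, int], default: int = 500) -> Callable[[str], int]:
    """Stand-in for an HTTP request: maps a path value to a canned status code."""
    return lambda value: known.get(value, default)


if __name__ == "__main__":
    # Simulated server: 428 and 430 exist, everything else returns 500.
    probe = make_stub_probe({"428": 302, "430": 302})
    candidates = [f"{n:03d}" for n in range(427, 432)]
    print(find_valid_values(candidates, probe))  # ['428', '430']
```

Keeping the probe as a plain function makes it easy to swap the stub for something timing-based if status codes alone turn out not to distinguish valid from invalid values.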

digip · June 27, 2017

The easiest test is done by hand, changing one number at a time and seeing what is shown. Once you get an idea of what the base formats are, you can script it to grab everything. In cases where you aren't sure what they are, because some function creates the naming convention of the stored output, fuzzing is more or less the only way to find them, with or without wordlists to aid the process. If you don't know what a program does or what its output is, then try it on a test box to learn.

If you know certain things are going to happen and will always be static, you incorporate this into your process, only randomizing the bits you are uncertain of. That's where the fuzzing helps. Numbers are easier to test with since, looking at the above, there are only 1,000 values (000 to 999) in each directory level (or so it would seem from what you posted), so you can replace any of the /000/428/835 sections with incrementing numbers. If it's only the <filename> and <id> parts you need to randomize, and they are text-based words plus numbers for the ID, then a wordlist would help for the filename; my approach would be to test with the same ID number first and then increment it. If the filename is a random string, that is more where fuzzing helps. If it's a hash, you could run your original filename through a series of hash functions and look for a match to determine whether it's something like MD5; then you have more to work with and can use wordlists to generate hashed names to enumerate as well.

It all depends on what is happening and whether you can enumerate enough about the system to determine the basics that are static, and then fuzz the rest.
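The hash check mentioned above can be sketched with Python's hashlib. This is a minimal example under the assumption that the stored name is a plain, unsalted hex digest of the original filename; the function name and the list of algorithms are illustrative choices, not anything known about the site.

```python
import hashlib
from typing import List, Sequence


def matching_hashes(
    original_name: str,
    observed: str,
    algorithms: Sequence[str] = ("md5", "sha1", "sha256"),
) -> List[str]:
    """Return the algorithms whose hex digest of original_name equals observed."""
    observed = observed.lower()
    data = original_name.encode("utf-8")
    return [
        alg for alg in algorithms
        if hashlib.new(alg, data).hexdigest() == observed
    ]


if __name__ == "__main__":
    # Simulate a server that stores uploads under the MD5 of the filename.
    stored = hashlib.md5(b"passport.jpg").hexdigest()
    print(matching_hashes("passport.jpg", stored))  # ['md5']
```

An empty result only rules out these exact algorithms applied to the bare filename; a salt, a timestamp or a different encoding would also produce no match.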
