
OCR, Barcode Reading, and UPC lookup


Darren Kitchen


In episode 4x11 I demonstrated GOCR (http://jocr.sf.net) in combination with the UPC Database (http://www.upcdatabase.com).

The image I used is www.hak5.org/temp/codes2.jpg

The commands I used for the proof of concept were:

djpeg -pnm -grayscale codes2.jpg codes2.pbm
gocr -m 4 codes2.pbm
grep 725274831586 items.csv
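
For anyone who would rather do the last step from a script than with grep, here is a minimal Python sketch. It assumes items.csv is an ordinary comma-separated file in which the UPC appears as its own field; the column layout of the real upcdatabase.com dump is not shown in this thread, and the function name `find_upc` is made up for illustration. Unlike grep, it only matches whole fields, not substrings.

```python
import csv


def find_upc(csv_path, upc):
    """Return every row of the CSV file that has a field exactly equal to `upc`."""
    matches = []
    # errors="replace" keeps the scan going if the dump contains odd bytes.
    with open(csv_path, newline="", encoding="utf-8", errors="replace") as handle:
        for row in csv.reader(handle):
            if upc in row:
                matches.append(row)
    return matches


if __name__ == "__main__":
    # Assumes items.csv sits in the current directory.
    for row in find_upc("items.csv", "725274831586"):
        print(row)
```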

With a little glue and a PHP twitter API script I figure this could be turned into a fun bot. Anyone want to help with this project?


Small ideas:

Zebra (http://zebra.sourceforge.net/) is also great for decoding UPC/EAN. On Linux it supports live capture from a webcam; on Windows it handles image files only. It works well even if the barcode is rotated by a few degrees.

German users may want to have a look at http://openean.kaufkauf.net/, which is nearly the same concept but in German.

I have a little setup to fetch metadata for my DVDs/CDs based on a webcam, Zebra and Amazon/IMDB. It's mostly copy and paste from MythTV, http://www.linux-magazin.de/heft_abo/ausga...kettenschwindel (German) and http://amazon.com/soap


Imagine if you will a bot that could not only add the item to a wishlist, but could ALSO get your location from your iPhone, read the barcode, and respond with a message that says "Hey, you're standing in FYE and that movie is $25 here, but at the Virgin Records two blocks down it's only $20."

Actually, there was a service you could use with a cell phone that worked like a moblog, only you sent in your image and it responded with something like the five lowest prices for the item. I forget the name of it though, and I believe it's defunct.

What you need in there Darren is a little grep and cut ninja action to pull the barcode out. Lemme play with the output here and getcha some commands.


Well, the windows version of cut won't let me use " as a delimiter, but this should work in theory:

.\bin\djpeg.exe -pnm -grayscale temp.jpg temp.pbm
.\bin\gocr.exe -m 4 temp.pbm | .\bin\grep.exe code | .\bin\cut.exe -d " -f 6

which should spit out your UPC. I can get it to work with -d = -f 4, but that gives me something like

"BARCODE" crc

Waitaminute. Does that thing produce a valid block of XML?
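
Going by the cut field positions above, the barcode line looks like an XML-style tag with a quoted code attribute. A minimal Python sketch under that assumption, which sidesteps the quoting problem with cut; the sample line in the usage part is made up to match those field positions and is not captured gocr output, and `extract_codes` is a hypothetical helper name.

```python
import re

# Matches a <barcode ...> tag and captures the value of its code="..." attribute.
BARCODE_CODE = re.compile(r'<barcode\b[^>]*\bcode="([^"]*)"')


def extract_codes(gocr_output):
    """Return the code attribute of every barcode tag found in the text."""
    return BARCODE_CODE.findall(gocr_output)


if __name__ == "__main__":
    # Illustrative line only; real gocr output may carry different attributes.
    sample = '<barcode type="UPC" chars="12" code="725274831586" crc="0" />'
    print(extract_codes(sample))
```

A regular expression is used instead of an XML parser because the thread does not settle whether the output is well-formed XML.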


Please correct me if I am wrong, but GOCR is used to convert images (JPEGs) to text. So I could take my class notes and scan them, then use GOCR to convert them to ODT?

ALSO

Can someone post an example screenshot of what is being discussed? Please, I still do not fully understand it.


Sure, you could use any OCR software to take your notes and convert them to editable text. Handwriting is seldom easy to convert though. What we're talking about is snapping a photo of a barcode on a product in a store and having the software do internet queries with it.


I've got a prototype up at @upc_test on Twitter. Feel free to reply to @upc_test with a direct barcode image URL or a twitpic link.

I'll be checking it manually for a few hours, but I should have the script running from cron every minute by tonight.

The real limitation here seems to be the upcdatabase.com data.


I just tried the twitter bot. Haven't gotten a reply back but it's only been about 5 minutes.

I sent a link to this image: http://www.hak5.org/temp/code2.jpg (The 2600 magazine bar code)

I'd love to see what kind of code you guys are using.

Not to derail this, but on a semi-related topic: I worked at great length on building a Twitter bot using Twitter's XMPP interface back when they supported IM. I really wanted to finish it but got sidetracked. What Twitter code are you using?


It has worked by now...the Twitter API doesn't seem to be exactly up to the minute I guess...

I'd love to see what kind of code you guys are using.

I implemented this bot in Python. Here's the code. I added lots of probably unnecessary comments in case anyone has trouble following. You need djpeg and gocr in your path. You also need to be able to write to the current directory and items.csv from upcdatabase.com needs to be in the same directory as the script. I don't do any explicit checking for any of this...the code is just a quick hack that works.

If anyone wants me to polish up this code for further development or use, just let me know.
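
For anyone following along without the script, here is a rough Python sketch of the djpeg and gocr step described above. It is not the script linked in this post. It assumes both tools are on your PATH and uses djpeg's -outfile switch rather than the second positional filename from the first post; the helper names `build_commands` and `ocr_image` are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_commands(jpeg_path, pnm_path):
    """Return the djpeg and gocr argument lists for one image."""
    djpeg_cmd = ["djpeg", "-pnm", "-grayscale",
                 "-outfile", str(pnm_path), str(jpeg_path)]
    gocr_cmd = ["gocr", "-m", "4", str(pnm_path)]
    return djpeg_cmd, gocr_cmd


def ocr_image(jpeg_path):
    """Convert the JPEG to grayscale PNM, run gocr on it, return gocr's text."""
    pnm_path = Path(jpeg_path).with_suffix(".pnm")
    djpeg_cmd, gocr_cmd = build_commands(jpeg_path, pnm_path)
    # check=True raises CalledProcessError if either tool exits with an error.
    subprocess.run(djpeg_cmd, check=True)
    result = subprocess.run(gocr_cmd, check=True,
                            capture_output=True, text=True)
    return result.stdout
```

Passing the arguments as lists avoids shell quoting issues with odd filenames.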

Not to derail this, but on a semi-related topic: I worked at great length on building a Twitter bot using Twitter's XMPP interface back when they supported IM. I really wanted to finish it but got sidetracked.

Let me know if you need another coder on this.

What Twitter code are you using?

I am using python-twitter, and it is wonderful.


I see it did work but it replied "@Darren Kitchen" instead of @hak5darren so I didn't see it right away.

I'll have a look at the script and try to feature it on the show we're shooting tonight (413). I don't have a lot of Python experience, but I can tweak just about anything that's well commented, and this looks to be (as well as hella tight, dude).


Ugh sorry...that's a bug. Here's the script with the fix (also fixed the link in my original post). I had name where I needed screen_name, and I didn't set my test account's name so I didn't notice :angry:

I'll make this fix live on @upc_test tomorrow night.
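
To show the difference between the two fields, here is a small Python sketch. The `User` class is a stand-in written for this example, not the python-twitter class, and `format_reply` is a hypothetical helper; the only thing taken from the bug above is that the reply must be built from screen_name rather than name.

```python
from dataclasses import dataclass


@dataclass
class User:
    """Stand-in for a Twitter user object (not the python-twitter class)."""
    name: str         # display name, for example "Darren Kitchen"
    screen_name: str  # handle used in @replies, for example "hak5darren"


def format_reply(user, message):
    """Address the reply to the handle so it shows up in the user's mentions."""
    return f"@{user.screen_name} {message}"


if __name__ == "__main__":
    darren = User(name="Darren Kitchen", screen_name="hak5darren")
    print(format_reply(darren, "no match found for that barcode"))
```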

OK, so what do I have to do, compile the code and then what?

Haha.

Where do I post it so that I can text/email from my phone to check the barcode in the database?

Just message a direct image link to @upc_test on Twitter. If you want to run the code on your own bot account, you just need a recent install of Python and the python-twitter module.


I've been working on a bespoke project for uni where you scan in a bar code and it creates a Harvard reference and citation for the book/CD/DVD you scanned, using the Amazon Web Search as a Service in VB.NET 2008. There's a lot of info on it, although you do need an Amazon account (free to set up) to use it. (That throws off my first thought, which was to roll it out as an open source project to the uni.)

The coding is fairly simple: it reads the bar code and searches the database for the corresponding ISBN or CDDB code. For books, it then extracts the author's name, year, place and company of publication, title and everything else you need for an academic reference, and concatenates it into a single formatted Rich Text string, which is then copied to the clipboard. I'm going to add functionality to create full bibliographies with it. It'll make it a hell of a lot easier to just scan in the UPCs with a £12 bar code scanner from Maplins than trawling through the pages trying to find the year of publication or the name of the person who wrote that chapter.
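
The concatenation step could look roughly like this. It is a plain-text Python sketch rather than the VB.NET project itself; Harvard style varies between institutions, so the exact punctuation here is an assumption, and `harvard_reference` and the sample values are made up for illustration. Rich Text formatting and the clipboard step are left out.

```python
def harvard_reference(author, year, title, place, publisher):
    """Join the looked-up fields into one Harvard-style reference string."""
    return f"{author} ({year}) {title}. {place}: {publisher}."


if __name__ == "__main__":
    # Placeholder values, not a real book.
    print(harvard_reference("Smith, J.", 2008, "Example Title",
                            "London", "Example Press"))
```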

Here's the C# reference for the AWS.

http://developer.amazonwebservices.com/con...;categoryID=195
