
What the mp3md5 feature in CF is all about



Hey,

I've been releasing CF versions of Pandora's Jar, and one of the most important features is mp3md5. (One of the most important bug fixes is that the plain 'noob' 7.4.0 crops anything after an "&" in the song name / artist.) [Scroll to the end of this message for info on upgrading.]

I'll try to explain what mp3md5 is all about.

*** WHY IT IS NEEDED: ***

Pandora's Jar, as great as it is, sometimes names/tags a file wrongly: it grabs one file and names it with an entirely different artist/title.

If this were just a bug, it would be better to simply correct it rather than create a workaround. But it happens for lots of different reasons that depend on many parameters, such as internet connection speed, what other programs are running, the number of files in the temp directory and so on.

While I'm working on solutions for many of the known scenarios that cause mismatches (slow connection, changing stations without immediately refreshing the window and so on), I am aware that this issue will never be solved entirely (given the way Pandora's Jar 7.4.0 currently works).

So mp3md5 is here to help us all reduce the mismatches to a minimum.

*** WHAT MD5 IS: ***

MD5 is a hash function: it returns a fixed-length value (128 bits, i.e. 16 bytes, usually written as 32 hexadecimal characters) for any input. This is a good way of fingerprinting stuff. It is highly unlikely (the odds are astronomical) that two different mp3 files will happen to have the same md5 hash.
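
To make the fingerprinting concrete, here is a minimal sketch of hashing a file with the standard `java.security.MessageDigest` class. The class name `Md5Sketch` and its methods are made up for illustration and are not taken from the PJ_CF source; how PJ_CF actually computes the hash may differ.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of PJ_CF.
class Md5Sketch {

    private static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // Every standard Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    /** MD5 of a byte array as 32 lowercase hex characters. */
    static String md5Hex(byte[] data) {
        return toHex(newMd5().digest(data));
    }

    /** Streams a file through MD5 so a large mp3 need not fit in memory. */
    static String md5Hex(Path file) throws IOException {
        MessageDigest md = newMd5();
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        return toHex(md.digest());
    }

    private static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b)); // two hex characters per byte
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            System.out.println(md5Hex(Paths.get(args[0])));
        } else {
            // Well-known test vector: prints 900150983cd24fb0d6963f7d28e17f72
            System.out.println(md5Hex("abc".getBytes(StandardCharsets.US_ASCII)));
        }
    }
}
```

The digest is 16 bytes long, so the printed string always has 32 characters, whatever the size of the input.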

By collecting "votes" on pairs of [MD5, artist/title], one can tell how certain it is that a specific file is actually what it is tagged to be.

At first you don't know any files, so every vote for a pairing of md5 and artist/title is also the first vote (and hence the highest vote). This is not of much use, but once you have more than one vote per file you can pick out the mismatched ones.
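
One way to picture the vote collection is as a table of counters per md5. The sketch below is only an illustration: the class `VoteConfidence` is hypothetical, and measuring "certainty" as the share of votes that agree with a label is my own assumption for the example, not necessarily what the real db does.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the vote db, not part of PJ_CF.
class VoteConfidence {

    // md5 hex string -> (artist/title label -> number of votes)
    private final Map<String, Map<String, Integer>> votes = new HashMap<>();

    /** Records one vote for the pairing [md5, label]. */
    void vote(String md5, String label) {
        votes.computeIfAbsent(md5, k -> new HashMap<>()).merge(label, 1, Integer::sum);
    }

    /** Share of all votes for this md5 that agree with the label; 0.0 if the md5 is unknown. */
    double confidence(String md5, String label) {
        Map<String, Integer> tally =
                votes.getOrDefault(md5, Collections.<String, Integer>emptyMap());
        int total = 0;
        for (int n : tally.values()) {
            total += n;
        }
        return total == 0 ? 0.0 : (double) tally.getOrDefault(label, 0) / total;
    }

    public static void main(String[] args) {
        VoteConfidence db = new VoteConfidence();
        db.vote("X", "Bleeding Heart/The Go Find");
        // A single vote is trivially unanimous, which is why it tells you little.
        System.out.println(db.confidence("X", "Bleeding Heart/The Go Find"));
        db.vote("X", "Some Other Song/Some Other Artist");
        // With a second, conflicting vote the pairing is no longer certain.
        System.out.println(db.confidence("X", "Bleeding Heart/The Go Find"));
    }
}
```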

*** EXAMPLE: ***

Let's say you're listening to the song "Bleeding Heart" by "The Go Find",

and the md5 hash of the file that is about to be grabbed is "X". PJ sends in a vote saying "I think the file with md5 X is Bleeding Heart/The Go Find".

1)

If your vote matches the highest vote found for "X" in the db - your file will be saved in the archive along with all the rest of your grabbed files.

If it does not match the highest vote - the file will be saved in a folder named "mismatches" under the directory from which you are executing the jar. The file name will be its 32-character MD5. It will still have the ID3 tags of whatever it was detected to be.

2)

The number of votes for "X is matched to Bleeding Heart/The Go Find" in the DB is increased by 1.
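
The two steps above can be sketched as follows. This is a rough illustration, not the actual PJ_CF code: the class `Mp3Md5Check`, the `Destination` enum and the in-memory map are all hypothetical. Two details are my own assumptions: an md5 with no votes yet counts as a match (so the first vote goes to the archive), and a label that ties for the highest count also counts as a match.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the check-then-vote flow, not part of PJ_CF.
class Mp3Md5Check {

    enum Destination { ARCHIVE, MISMATCHES }

    // md5 hex string -> (artist/title label -> number of votes)
    private final Map<String, Map<String, Integer>> db = new HashMap<>();

    Destination submit(String md5, String label) {
        Map<String, Integer> tally = db.computeIfAbsent(md5, k -> new HashMap<>());

        // Step 1: find the highest vote count recorded so far for this md5.
        int best = 0;
        for (int n : tally.values()) {
            best = Math.max(best, n);
        }
        // Assumption: the vote matches if its label holds (or ties for) the
        // highest count. With no votes yet, best is 0 and any label matches.
        boolean matches = tally.getOrDefault(label, 0) == best;

        // Step 2: the vote is counted whether or not it matched.
        tally.merge(label, 1, Integer::sum);

        return matches ? Destination.ARCHIVE : Destination.MISMATCHES;
    }

    public static void main(String[] args) {
        Mp3Md5Check check = new Mp3Md5Check();
        // First vote for "X": nothing to contradict it yet.
        System.out.println(check.submit("X", "Bleeding Heart/The Go Find"));
        // A different label for the same md5 now loses against the existing vote.
        System.out.println(check.submit("X", "Some Other Song/Some Other Artist"));
    }
}
```

Comparing before incrementing matters here: if the vote were counted first, a brand-new wrong label would tie with a single existing vote instead of being flagged.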

*** WHAT'S PLANNED FOR THE FUTURE: ***

1) a utility to do two important things:

a) re-run an md5 verification of the songs already grabbed against the online db, enabling you to find songs that have been wrongly named (as the db grows, more and more songs are indexed)

b) find out the correct artist/title for files that have been tagged as mismatches / unclassifieds, and then tag and name them correctly into the archive.

I'm working on these right now.

*** CONFIGURATION: ***

In versions CF_1 to CF_6, mp3md5-online-participation is set to off, so it works with a local 'db' file, collecting votes for files played locally. This means that you will be able to spot mistakes only after you've heard the song at least once. This is not so great, but it is good for files that are marked as "positive" and which Pandora will play a lot.

If 'mp3md5.enabled' (in CF1-6) is set to true, the votes are made on an online database (which to date contains over 15,000 different songs). By opting in to participate you are exposing your IP address to whoever is running the database [by default it is mapped to a server I'm running]. I don't see this as any risk - but as it is not something I would want to do behind anyone's back, it is set as an opt-in option (off unless enabled manually).

By participating in the online distributed voting system, you have a better chance of detecting mismatched tagging, since chances are that your vote is not the first one. And of course you are contributing to the database, hence helping others.

In version CF7 and on, the configuration format has changed. It is still set by default to work locally, but now instead of uncommenting a line in order to opt in, you have to change the value from "local" to "online" (or to "off" to disable it altogether).
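
For illustration only, the three CF7 values could be read into an enum along these lines. The enum `Mp3Md5Mode` and its `parse` method are hypothetical, and I'm deliberately not showing a config key, since the real key name should be taken from the config file shipped with the jar. Falling back to "local" for a missing value mirrors the default described above; rejecting unknown values is my own choice for the sketch.

```java
import java.util.Locale;

// Hypothetical representation of the CF7 setting, not part of PJ_CF.
enum Mp3Md5Mode {
    OFF, LOCAL, ONLINE;

    /** Maps the configured value to a mode; a missing or blank value means LOCAL. */
    static Mp3Md5Mode parse(String value) {
        if (value == null || value.trim().isEmpty()) {
            return LOCAL; // default: local db only, no online participation
        }
        switch (value.trim().toLowerCase(Locale.ROOT)) {
            case "off":
                return OFF;
            case "local":
                return LOCAL;
            case "online":
                return ONLINE;
            default:
                // Assumption: fail loudly instead of guessing what was meant.
                throw new IllegalArgumentException("Unknown mp3md5 mode: " + value);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("online"));
        System.out.println(parse(null));
    }
}
```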

*** WHERE DO I GET PJ_CF ***

The project's source and binaries are available on sourceforge.net.

You can get both the 'noobs' 7.4.0 baseline and the jars of CF at the download page:

http://sourceforge.net/project/showfiles.php?group_id=184854

Once you have 7.4.0 installed and working, simply replace the jar file and the configuration file with the newer ones.

If you have made any changes to the configuration, integrate them into the new config file before overwriting your old one.
