A vocabulary for media fingerprint algorithms

Posted by on Apr 14, 2016 in Uncategorised

When we began work on Videorooter, we felt that one of the most difficult tasks ahead of us would be finding algorithms suitable for our use. While there are not many algorithms for video, there is a fair share of algorithms for images and sound (and remember, a video is essentially a sequence of images with sound). In one way, having many algorithms is good: algorithms have different strengths and weaknesses, so the one you choose for a particular project depends on your application.

On the other hand, if I give you the fingerprint f81bf91ffb803400e07f0c7d049f058706013e033fe33fe11f600e618ea30def without any other information, you’d be hard pressed to know what to do with it, or how to compare it against other fingerprints you have, which may or may not have been generated with the same algorithm. Even if I wanted to tell you that this fingerprint is a 256-bit blockhash, I have no language for saying so that a computer can interpret unequivocally. I can’t just say “it’s a 256-bit blockhash” and expect a computer to understand that this means the same as “blockhash (256 bits)”.
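To make the problem concrete: blockhash-style fingerprints are commonly compared by Hamming distance (the number of differing bits), which only means something when both fingerprints come from the same algorithm at the same bit length. The minimal sketch below (the function name `hamming_distance` is our own, not from any library) shows that a bare hex string gives the comparison code nothing to go on beyond its length.

```python
def hamming_distance(hex_a: str, hex_b: str) -> int:
    """Count differing bits between two equal-length hex fingerprints."""
    # A length check is the only sanity check a bare hex string allows;
    # it cannot tell us whether both came from the same algorithm.
    if len(hex_a) != len(hex_b):
        raise ValueError("fingerprints differ in length")
    # XOR the two values and count the 1 bits that remain.
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


fp = "f81bf91ffb803400e07f0c7d049f058706013e033fe33fe11f600e618ea30def"
print(len(fp) * 4)               # 256 (bits)
print(hamming_distance(fp, fp))  # 0
```

Two fingerprints of the same length from different algorithms would still produce a number here, just a meaningless one.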

We need a vocabulary for fingerprint algorithms: something that can be used in computer-to-computer interaction and conveys, without ambiguity, which algorithm we’re talking about when we communicate a fingerprint.

To this end, we’ve started putting this into practice over at the Videorooter GitHub, establishing a list of known algorithms, giving them unique identifiers and outlining what information we consider important to record for each algorithm. Essentially:

  • We assign each algorithm a URN (strictly speaking, a namespace identifier, and in most cases an experimental or informal one, unless there’s a draft or published URN namespace for that algorithm)
  • We describe which media types the algorithm is intended for
  • We record the URL of the specification document and links to two reference implementations
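One way to picture a row of that list is as a small record keyed by the namespace identifier. The sketch below is only an illustration under our own assumptions: the class `AlgorithmEntry`, the `register` helper and the `REGISTRY` dict are hypothetical, the "video" media type for `x-bhvideo-phash` is assumed, and all URLs are `example.org` placeholders rather than the real specification or implementation links.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlgorithmEntry:
    nid: str                                    # URN namespace identifier
    media_types: frozenset[str]                 # media the algorithm targets
    spec_url: str                               # specification document
    reference_implementations: tuple[str, ...]  # links to reference code


# Hypothetical in-memory registry, keyed by lowercased namespace identifier.
REGISTRY: dict[str, AlgorithmEntry] = {}


def register(entry: AlgorithmEntry) -> None:
    key = entry.nid.lower()  # URN namespace identifiers are case-insensitive
    if key in REGISTRY:
        raise ValueError(f"duplicate namespace identifier: {entry.nid}")
    REGISTRY[key] = entry


register(AlgorithmEntry(
    nid="x-bhvideo-phash",
    media_types=frozenset({"video"}),                  # assumed media type
    spec_url="https://example.org/spec/bhvideo-phash",  # placeholder URL
    reference_implementations=(
        "https://example.org/impl/one",  # placeholder URL
        "https://example.org/impl/two",  # placeholder URL
    ),
))
```

Keying on the lowercased identifier means "X-BHVIDEO-PHASH" and "x-bhvideo-phash" resolve to the same entry, and duplicate registrations are rejected rather than silently overwritten.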

The namespace specific string in the URN, which follows the namespace identifier, depends on the algorithm implementation. But having this specified would at least allow us to give the fingerprint urn:x-bhvideo-phash:f81bf91ffb803400e07f0c7d049f058706013e033fe33fe11f600e618ea30def with little uncertainty as to how it should be interpreted. You could just look it up in the table, and you’d even have links to reference implementations!
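Looking a fingerprint up this way amounts to splitting the URN into its namespace identifier and namespace specific string, then checking the identifier against the table. A minimal sketch, assuming the general `urn:<NID>:<NSS>` shape; `parse_fingerprint_urn` and `KNOWN_NIDS` are hypothetical names, and the set stands in for the real algorithm table.

```python
# Hypothetical stand-in for the algorithm table's namespace identifiers.
KNOWN_NIDS = {"x-bhvideo-phash"}


def parse_fingerprint_urn(urn: str) -> tuple[str, str]:
    """Split 'urn:<NID>:<NSS>' into (nid, nss), lowercasing the NID."""
    scheme, sep, rest = urn.partition(":")
    if not sep or scheme.lower() != "urn":
        raise ValueError(f"not a URN: {urn!r}")
    # Split on the first colon only; the NSS itself may contain colons.
    nid, sep, nss = rest.partition(":")
    if not nid or not sep or not nss:
        raise ValueError(f"missing namespace identifier or string: {urn!r}")
    return nid.lower(), nss


urn = "urn:x-bhvideo-phash:f81bf91ffb803400e07f0c7d049f058706013e033fe33fe11f600e618ea30def"
nid, fingerprint = parse_fingerprint_urn(urn)
print(nid)                # x-bhvideo-phash
print(nid in KNOWN_NIDS)  # True
print(len(fingerprint))   # 64
```

How the namespace specific string is decoded (here, 64 hex characters) is left to whatever the table entry for that identifier specifies.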
