Stumbled on the TED video below today while looking at another search engine. It’s becoming a hobby of mine. This one is DuckDuckGo, which just has to be the most ridiculous naming idea I’ve seen in a while! Still, simple interface, no tracking, interesting built-in functions, automatic filtering of those link-harvesting scum, mediocre to poor search results (at least for me).
The talk very clearly expresses a point that I’ve struggled to get my head around for a while: personalised search is impossible to do correctly. Why? Because the ranking algorithm can never see into the mind of the user. It has to assume that ‘past behaviour is an indication of future success’, which is obviously problematic.
By way of example: I’m a hardcore climate change denier. The search history that my search provider has dutifully collected for me over years of my denial greatly helps them in presenting me with highly relevant climate change denial links. If there are two groups of sites out there, one presenting straightforward scientific analysis and another trashing the latest bunk for those crazy government-fed hippy freaks, it’s far more likely that the top results will be from the second group, the self-styled debunkers. No wonder my denial is bordering on delusion.
Yes, it’s an obviously hyperbolic example, but not actually all that dissimilar to those given in the video. The tools we’re using are insulating us from views that conflict with those we’re already deemed to hold.
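To make that feedback loop concrete, here’s a toy sketch in Python. Everything in it is made up for illustration: the page titles, the stance labels, and the rule that ranking depends only on what I clicked before. No real search engine is this crude, but the shape of the problem is the same.

```python
from collections import Counter

# Hypothetical corpus of (title, stance) pairs; titles and labels are invented.
PAGES = [
    ("Peer-reviewed temperature record analysis", "science"),
    ("Ice core data explained", "science"),
    ("Sea level measurements since 1900", "science"),
    ("Why the hippies are wrong again", "denial"),
    ("Latest climate bunk trashed", "denial"),
    ("Government-funded scare stories", "denial"),
]


def rank(pages, click_history):
    """Order pages by how often the user clicked that stance before.

    Assumption: past clicks are the only ranking signal.
    sorted() is stable, so pages with equal counts keep corpus order.
    """
    counts = Counter(click_history)
    return sorted(pages, key=lambda page: counts[page[1]], reverse=True)


def simulate(rounds, first_click):
    """The user always clicks the top result; return every stance clicked."""
    history = [first_click]
    for _ in range(rounds):
        _title, top_stance = rank(PAGES, history)[0]
        history.append(top_stance)
    return history


history = simulate(rounds=5, first_click="denial")
top_three = [stance for _title, stance in rank(PAGES, history)[:3]]
print(history)
print(top_three)
```

One early click is enough to tip it: every later round puts the same stance on top, and the other group never gets another look in.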
The idea that FB would presume to decide which of your friends you’d like to hear from is bizarre to me. If you have so many contacts that you can’t process all their output, it’s probably a sign that you’ve exceeded your online Dunbar number (I made that part up), not that you’d like FB to start silencing a few people.
I’m fully expecting someone to lightheartedly wish them luck sorting out the terribly diverse set of search terms / friends that they deal with… That works as a reason not to care, if you’re convinced that the rest of society holds an equally diverse set of views, or is managing to maintain them in the face of this winnowing.
The other interesting factoid was the Heinz-Varieties-sized number (57?!) of parameters that G is using as input to the ranking. I’m struggling to imagine what on earth those can be; starting with obvious stuff, location, browser, browser version (fun to think about what conclusion you’d draw for ranking based on how up to date the browser is!), O/S, screen size, connected via https, er, and now I’m struggling. Anyway, the point is that any ranking system that works off up to 57 parameters sounds ridiculously over-designed.
Maybe they’re expecting it to achieve sentience? If it does, hopefully it’ll quickly tell them they’re wasting their time trying to implement mind reading…
Edit: longer discussion on BBC Radio. Worth a listen.