Search Quality and Filter Bubbles

Stumbled on the TED video below today while looking at another search engine. It’s becoming a hobby of mine. This one is DuckDuckGo, which just has to be the most ridiculous naming idea I’ve seen in a while! Still, it has a simple interface, no tracking, interesting built-in functions, automatic filtering of those link-harvesting scum, and mediocre-to-poor search results (at least for me).

The talk very clearly expresses a point that I’ve struggled to get my head around for a while: personalised search is impossible to do correctly. Why? Because the ranking algorithm can never see into the mind of the user. It has to assume that ‘past behaviour is an indication of future success’, which is obviously problematic.

By way of example: I’m a hardcore climate change denier. The search history that my search provider has dutifully collected for me over years of my denial greatly helps them in presenting me with highly relevant climate change denial links. If there are two groups of sites out there, one presenting straightforward scientific analysis and another trashing the latest bunk for those crazy government fed hippy freaks, it’s far more likely that the top results will be from the debunkers. No wonder my denial is bordering on delusion.

Yes, it’s an obviously hyperbolic example, but not actually all that dissimilar to those given in the video. The tools we’re using are insulating us from views that conflict with those we’re already deemed to hold.

The idea that FB would presume to decide which of your friends you’d like to hear from is bizarre to me. If you have so many contacts that you can’t process all their output, it’s probably a sign that you’ve exceeded your online Dunbar number (I made that part up), not that you’d like FB to start silencing a few people.

I’m fully expecting someone to lightheartedly wish them luck sorting out the terribly diverse set of search terms / friends that they deal with… That only works as a reason not to care if you’re convinced that the rest of society holds an equally diverse set of views, or is managing to maintain them in the face of this winnowing.

The other interesting factoid was the Heinz Varieties-sized number (57?!) of parameters that G is using as input to the ranking. I’m struggling to imagine what on earth those can be. Starting with the obvious stuff: location, browser, browser version (fun to think about what conclusion you’d draw for ranking based on how up to date the browser is!), O/S, screen size, connected via https, er… and now I’m struggling. Anyway, the point is that any ranking system that works off up to 57 parameters sounds ridiculously over-designed.
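
Just to make that concrete for myself, here’s a toy sketch (Python) of how a handful of such signals might be blended into one personalised score. Every signal name and weight below is my own invention, not anything G has published:

```python
# Purely illustrative: blending a few hypothetical personalisation
# signals into a single ranking score. Names and weights are invented.

SIGNAL_WEIGHTS = {
    "location_match": 0.30,     # result looks relevant to the user's region
    "click_history": 0.40,      # user previously clicked similar pages
    "browser_modernity": 0.05,  # how up to date the browser is
    "screen_size_fit": 0.05,    # page layout suits the device
    "https": 0.20,              # site served over a secure connection
}

def personalised_score(base_rank, signals):
    """Blend a query-independent base rank with per-user signals.

    `signals` maps signal name -> value in [0, 1].
    """
    boost = sum(SIGNAL_WEIGHTS[name] * value
                for name, value in signals.items()
                if name in SIGNAL_WEIGHTS)
    return base_rank * (1.0 + boost)

# Two pages with identical base rank diverge once the user's
# click history favours one of them.
print(personalised_score(1.0, {"click_history": 0.9, "https": 1.0}))  # 1.56
print(personalised_score(1.0, {"click_history": 0.1, "https": 1.0}))  # 1.24
```

Now imagine tuning 57 of those weights at once.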

Maybe they’re expecting it to achieve sentience? If it does, hopefully it’ll quickly tell them they’re wasting their time trying to implement mind reading…

Edit: longer discussion on BBC Radio. Worth a listen.

7 thoughts on “Search Quality and Filter Bubbles”

  1. I was going to call “old” on you but it looks like TED only posted that video in May. It feels much older to me. Perhaps it turned up elsewhere first.

    Anyway, I’d say turn it on its head and see that, as far as human nature is concerned, Google are getting it exactly right, in that they are feeding people exactly the results they want. If they’re given results that don’t match their personal beliefs, then they’ll consider the results wrong, and that Google have failed to find the most relevant answers for their query. By delivering personalised results Google are delivering what people want to find. That’s Google’s job.

    I’m not saying it’s not an important problem. I think it’s a massively important problem. But Google and all the other personalised filtering systems didn’t make the problem. We made it. It lies in our nature. Google are simply doing what consumers want, and doing otherwise would leave people thinking that Google had got it wrong. But in reality it’s us who have it wrong.

    Although it’s useful to recognise the limits of the scope of the problem. Not all subjects are controversial, and when I search for a topic, having the search engine understand which specific details of that topic I’m most interested in has high value. I regularly search for answers to programming questions, and Google is aware enough to know which programming languages I’m most likely to want those answers to match. The problem isn’t universal to all searches, only to searches where personalised focus will shutter our eyes from either important conflicting information or useful information that’s outside the scope of our usual focus.

    I can think of two useful responses to the problem. Firstly, these personalised filtering systems should be explicit, in that they openly inform us of the nature of the personalisations. That would serve the dual purposes of teaching us about our own biases and helping us to be aware of what information we’re missing.

    To partly serve that purpose they could add a toggle such as “Show me the global population consensus results”, or provide toggles to see results from various common perspectives: “Show me the Java programmer consensus”, “Show me the Europe resident consensus”, “Show me the religious consensus”, etc, etc.

    The second response to the problem I think stems from the first, but also from education. The knowledge of how heavily subjective we all are in our beliefs, even though to us they might seem strictly objective, is relatively recent scientific knowledge. That’s something that should be taught. Kids should grow up being aware that they’re full of shit, and have a basic grasp of the science behind why they are. Perhaps that’s a bit too much to hope for, but it’s still a valid part of a good solution.

    • Completely agree: from a human nature point of view G has this completely correct. It would make no sense for them to a) challenge their derived view of the user’s profile (it would make a mockery of their stated aim of providing better search results through profiling) or b) challenge the user’s view of the world (it would lead to user frustration).

      The problem is, this is obviously not a good state of affairs. It’s true that not all subjects are controversial, but we’d never be able to agree on which ones.

      I’m not sure how it would be possible to arrive at consensus results for a wide range of subjects. Perhaps for the trivial ones (which probably turn out to be the important science-based issues), but what of the emotive issues? The issues that, with the best will in the world and all the evidence available, people are reluctant to resolve?

      My feeling is that something like Page Rank™ was probably the right solution, as it’s a simple-to-explain and easily comprehensible metric. Unfortunately it’s been augmented to support Google’s main business, and in the process has picked up some (probably) unintended consequences.

      The ability to see results ordered by a publicly available criterion / process seems a much simpler answer than attempting to determine consensus results. Unfortunately I’d guess it’s about as likely as educating kids with the skills to be critical of their society.

      • It seems to me trivially easy to arrive at consensus results for all subjects, especially if you are Google. Before they tailored results to individuals they were delivering universal consensus results. That part is easy. The universal consensus view is simply the original Page Rank (or rather its latest incarnation) minus any personalisation.

        Also providing other-group consensus results should be relatively simple for them too. If they can deliver me personalised results, and they know how many people there are who share similar views to me, then they can algorithmically find boundaries between different perspective groups, give them an algorithm derived name, and provide access to results that suit that group’s perspective.
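
        A crude sketch of the mechanics I have in mind, assuming user interest profiles can be boiled down to vectors; everything here (the features, the data, the dot product as a relevance measure) is invented for illustration, not a claim about how Google actually does it:

        ```python
        # Sketch: cluster user interest profiles into "perspective groups",
        # then rank the same results through any group's eyes.
        # All features and data are synthetic.
        import numpy as np
        from sklearn.cluster import KMeans

        # One row per user, one column per topic; values are interest
        # strengths derived (hypothetically) from search/click history.
        users = np.array([
            [0.9, 0.1, 0.0],  # mostly topic A
            [0.8, 0.2, 0.1],
            [0.1, 0.9, 0.2],  # mostly topic B
            [0.0, 0.8, 0.3],
            [0.1, 0.2, 0.9],  # mostly topic C
        ])

        groups = KMeans(n_clusters=3, n_init=10, random_state=0).fit(users)

        def group_ranking(result_features, group_id):
            """Order results by affinity to one group's centroid profile."""
            centroid = groups.cluster_centers_[group_id]
            affinity = result_features @ centroid  # dot product as relevance
            return np.argsort(-affinity)           # best match first

        # The same three results, ranked through two different groups' eyes.
        results = np.array([[1.0, 0.0, 0.0],
                            [0.0, 1.0, 0.0],
                            [0.3, 0.3, 0.4]])
        print(group_ranking(results, 0))
        print(group_ranking(results, 1))
        ```

        The group names in the lists below would then just be labels hung on those centroids.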

        When I search for “PHP” Google already knows what results I expect and what results most PHP programmers expect, and they also know what results Java programmers expect. To allow us to see results through another group’s eyes I think would be a simple technological feat considering what they’re already capable of doing and the information they have available to them.

        If down the left column it said something like:

        “These results are personalised specifically to you. Your profile is most similar to the group identified as ‘PHP Programmers’. Select below to see results to your query as tailored to other prominent groups:
        Universal Consensus group
        Java Programmers
        Objective-C Programmers
        Non Programmers
        etc, etc”

        Or for the keywords “climate change”:

        “These results are personalised specifically to you. Your profile is most similar to the group identified as ‘Scientific Consensus Follower’. Select below to see results to your query as tailored to other prominent groups:
        Universal Consensus group
        Europeans
        Americans
        Women
        Men
        Skeptics
        Deniers
        etc, etc”

        Obviously my example group names are shite, and Google’s algorithms could provide much deeper insight into what types of groups exist in greatest numbers, but you get the idea.

        • “If they can deliver me personalised results, and they know how many people there are who share similar views to me, then they can algorithmically find boundaries between different perspective groups, give them an algorithm derived name, and provide access to results that suit that group’s perspective.”

          Isn’t that the point though? They can never really know how many people there are who share similar views? For each subject there is going to be a different distribution of people who are reading what they agree with, what challenges them, or what they remain neutral upon.

          If I understand the basics of Page Rank™, it doesn’t work that way; it makes the simple assumption that more commonly linked pages are more interesting (yeah, yeah, and a bunch of other stuff… but that’s the basic insight). Remove personalisation completely and then your scheme makes perfect sense.
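
          For what it’s worth, that basic insight fits in a few lines. Here’s a minimal, textbook-style power-iteration sketch over a made-up four-page graph, ignoring all the ‘other stuff’ (dangling pages, spam, scale):

          ```python
          # Minimal PageRank sketch: rank flows along links, so pages linked
          # to by many (and by highly ranked) pages score higher. Toy graph.
          links = {
              "a": ["b", "c"],  # page a links to b and c
              "b": ["c"],
              "c": ["a"],
              "d": ["c"],
          }

          def pagerank(links, damping=0.85, iterations=50):
              pages = list(links)
              n = len(pages)
              rank = {p: 1.0 / n for p in pages}
              for _ in range(iterations):
                  new_rank = {p: (1 - damping) / n for p in pages}
                  for page, outlinks in links.items():
                      share = damping * rank[page] / len(outlinks)
                      for target in outlinks:
                          new_rank[target] += share
                  rank = new_rank
              return rank

          # c collects links from a, b and d, so it ends up ranked highest;
          # no user profile is consulted anywhere.
          for page, score in sorted(pagerank(links).items(),
                                    key=lambda kv: -kv[1]):
              print(page, round(score, 3))
          ```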

          It feels like the problem of confirmation bias (or whatever the proper name is for this reinforcement of views that are already held) is an indirect consequence of attempting to put users into easily categorised pots (as that’s what G’s customer base pays for), and that exposing that categorisation then becomes a mis-feature. Exposing the categorisation through personalised search is obviously a cool thing to attempt if you’re collecting the data anyway, but it feels like there needs to be a way to turn it off, as it has ‘side-effects’.

          Without knowing it, I’ve been taking part in an experiment to see what happens if you completely remove personalisation. And my conclusion, now that I’ve been made aware of the experiment, is that G’s results are still better than anything else.

          • “Isn’t that the point though? They can never really know how many people there are who share similar views?”

            That’s how recommendation systems work. That’s what they do.

            “Remove personalisation completely and then your scheme makes perfect sense.”

            Remove personalisation for the global consensus view, yes. The personalisation information is necessary for finding and collating the views of major perspective groups.

            “but it feels there needs to be a way to turn it off as it’s has ‘side-effects’.”

            Turning it off is the first item in each of my example lists: “Universal consensus group”, which is essentially Page Rank minus personalisation. Having that view available to everyone for every search query would be the first big win. Then being able to see results through the eyes of other groups is the next big win, and I think one that could do a lot of good in helping people to understand each other better.

            “And, my conclusion, now that i’ve been made aware of the experiment, is that Gs results are still better than anything else.”

            Which is why I don’t have high hopes for seeing Google replaced any time soon. Microsoft and others have tried and failed. It’s a massive challenge and Google have thrown massive computing power and brain power at it. To do better, or even to do equally as well, is a big ask.

            • Sorry, can’t help myself:

              I met a traveller from an antique land
              Who said: “Two vast and trunkless legs of stone
              Stand in the desert. Near them on the sand,
              Half sunk, a shattered visage lies, whose frown
              And wrinkled lip and sneer of cold command
              Tell that its sculptor well those passions read
              Which yet survive, stamped on these lifeless things,
              The hand that mocked them and the heart that fed.
              And on the pedestal these words appear:
              ‘My name is Ozymandias, King of Kings:
              Look on my works, ye mighty, and despair!’
              Nothing beside remains. Round the decay
              Of that colossal wreck, boundless and bare,
              The lone and level sands stretch far away”.

              Seems like we agree that something should change, and that it’s unlikely to happen. Awesome.

            • “That’s how recommendation systems work. That’s what they do.”

              Really? Doesn’t a recommendation system work off a much clearer signal? If someone buys something, it’s a clear indication that they value it enough to hand over the required money to obtain it. Or clicking ‘like’ or ‘I own this’ sends a clear signal about music / book / movie preferences.

              You can’t make the same assumption about clicking on a link, or searching for a term. The intent is not obvious to you, and therefore you can’t infer much of anything from the action.

              You can make some statistical guesses about what the intent might have been, based on past behaviour, but then you’re right back where we started: assuming that past behaviour is an indication of future success.
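
              To put that distinction in code: a sketch in which each event type carries a confidence that it reflects genuine preference. The event types and numbers are invented, not anyone’s real model:

              ```python
              # Sketch: explicit signals (purchase, like) carry far more
              # confidence about preference than ambiguous implicit ones
              # (a click, a search). All values are invented.
              CONFIDENCE = {
                  "purchase": 1.0,  # handed over money: unambiguous
                  "like": 0.8,      # explicit endorsement
                  "click": 0.2,     # curiosity? a mistake? the bubble itself?
                  "search": 0.1,    # intent behind the query is unknown
              }

              def preference_scores(events):
                  """Aggregate per-item preference from (event, item) pairs."""
                  scores = {}
                  for event_type, item in events:
                      scores[item] = scores.get(item, 0.0) + CONFIDENCE[event_type]
                  return scores

              history = [("click", "denial-blog"), ("click", "denial-blog"),
                         ("search", "denial-blog"), ("purchase", "science-book")]
              print(preference_scores(history))
              # {'denial-blog': 0.5, 'science-book': 1.0}
              # Several weak implicit signals still say less than one strong
              # explicit one.
              ```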

              It should be noted that I’m only making this distinction based on an understanding that the use of personalisation in search is agreed to be resulting in confirmation bias (still don’t know if that’s the right term for this), and not on some great insight into what might be possible given enough data and modelling of an individual’s behaviour.

              The dilemma is surely that if it is possible to provide all these different views / rankings of the data, and G decides that it’s better to show you only what it thinks you want, its behaviour could be considered irresponsible / unethical. I guess that comes down to whether there is any benefit for them in presenting a version of reality to keep you sweet. Are you more likely to click on an ad on a page that gives you that warm ‘in my comfort zone’ feeling, or that cold ‘wtf is this commie / fascist shit’ feeling?

              Once a service becomes popular to the point of monopoly and achieves some kind of ‘utility’ status, it doesn’t seem unreasonable to hold it to a higher standard. I’d guess that regulation is unlikely to happen any time soon, esp. in the States, but maybe that’s the route to go elsewhere.
