24 Apr 2009

Hadoop User Group UK meetup

Author: Adam | Filed under: Community, Development

Last week I went to London to the second HUG UK meetup hosted by the lovely people at Sun.

The day was a great chance to mingle with other Hadoop* users, developers and interested others. The first talk, by Tom White from Cloudera on tips for Hadoop users, was invaluable, as was Isabel Drost’s talk on Mahout, which does machine learning on Hadoop. This was the one I was most looking forward to, as I’ve been trying to find ways of training classifiers on Hadoop and the prospect of developing my own was daunting and was going to take far too much time. Mahout seems to have a solid foundation of clustering techniques already written, but very few classifiers – I guess I may have to dig in and lend a hand.

It was the first time I had seen beer and pizza served midway through a day of presentations. I approve of this. More conferences/seminars/meetups should do this. Perhaps with a small selection of cheese and biscuits just to keep things respectable.

* Hadoop is the open source distributed data processing software I use as part of my research to do interesting things with large chunks of Flickr within reasonable amounts of time. Highly recommended.

5 Dec 2008

Porqpine

Author: Adam | Filed under: Community, Information Retrieval, Social Networks

I was recently at a presentation by Josep M. Pujol from Telefónica Investigación y Desarrollo (Research and Development of the largest Spanish telecommunications company, similar to BT) regarding their Porqpine search engine.

It was highly relevant for me because this system works on the principles of social search that I’m investigating in my work. Josep and his team have put together a search system that monitors most of what you do in your web browser via a plug-in. It uses that information, along with that of people in your online social networks (Facebook, Twitter etc.), to make suggestions alongside those given by a search engine when you make a query, hopefully making useful/interesting/relevant recommendations.

Privacy was a priority for the team as well, so according to them no personal data is transferred between the node set up on your machine that encompasses your data, and those of your friends who are online and also using the system. I would challenge that somewhat, as the suggestions you are presented with could, I’d argue, be traced to a particular user and their browsing habits.

Overall it is quite exciting – a distributed, personalised search engine. It doesn’t require any large-scale central infrastructure like Yahoo! or Google, and it is supposed to give you help and guidance when using the web in a way relevant to you personally.

I asked at the end of the presentation whether he had been able to show that the influence of social contacts was actually beneficial in search (kind of the founding assumption of my work!) and his answer was ‘not yet’. It seems we’re in the same boat.

Something to keep an eye on.

4 Nov 2008

Update: I’m in Barcelona!

Author: Adam | Filed under: Development, Flickr, General, Information Retrieval

I moved offices from Milton Keynes to Barcelona about 2 months ago and have been rushed off my feet ever since. Now that I think I have a handle on things I can explain a little of what I’m doing.

First, why am I here? Mostly for the sangria, but also to work with my external supervisor at Yahoo! Research Barcelona who works extensively with Flickr and information retrieval in general. I’ll be spending around a year here using data sets that aren’t available externally and using computing resources that KMi doesn’t have.

What am I doing? Other than drinking said sangria, and attempting to improve my Spanish and learn some Catalan, I’m doing the experimental work for the second year of my PhD. I’ll be working on experiments that investigate how people’s social context influences their behaviour and preferences in large-scale online multimedia IR systems, in particular using Flickr data.

What do I hope to achieve? Well I want to be able to support my thesis hypothesis with good, solid empirical work and get enough material together to get some good papers out of it.

Big questions of the moment?

  • How do various subsets of the complete social context of a user perform when used for suggesting tags and are they useful in different ways when compared to traditional suggestion methods?
  • Are there derivable features about a user that can be used to describe their user preferences or predict their behaviour in specific user interaction situations?

4 Aug 2008

So close

Author: Adam | Filed under: Community, Social Networks

I came across this meta-article on The Register about an experiment by Microsoft researchers into the small-world phenomenon that is central to parts of my work. According to the article they have been able to perform more robust experiments that corroborate the theory that ‘it is a small world’ where there are only a few social network hops between ‘any two strangers on Earth’. I found the original article in the Washington Post. A few things annoyed me about the claims made:

1) The experiment – Nice though it would be to say that their experiment shows everyone is connected on average by 6.6 hops, it wouldn’t be true. All the reporter should have said is that users of MSN Messenger (people with access to Internet-connected computers, literate, likely to live in North America, Europe or Japan and, most vitally, who know other people who use it) who used the system during the window of the experiment may be connected by an average of 6.6 hops.
With just 180m highly selected users (1/37th of the human population) you can’t really make judgements about the whole of humanity. This work also ignores secluded groups within the human social network that are part of the population but have no connections with other communities, like the Sentinelese.
Overall, although the selection bias is smaller than in the famous Milgram experiment, the sample is still heavily biased. Any conclusions drawn would only be justifiable for the set of people used (which the actual researchers do point out). This is still useful and interesting, but media reporters should refrain from extrapolating and generalising like they did in this article – it’s just plain misleading.
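
To make the point about secluded groups concrete, here is a minimal Python sketch of how an "average hops" figure might be computed. It is not the researchers' method: the toy graph, the node names and the function names are all my own assumptions. It simply averages breadth-first-search distances over the pairs that can reach each other, and counts separately the pairs that cannot.

```python
from collections import deque


def hop_counts(graph, start):
    """Breadth-first search: hops from `start` to every node it can reach."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


def average_hops(graph):
    """Return (mean hops over reachable ordered pairs, unreachable pair count)."""
    total = 0
    reachable_pairs = 0
    unreachable_pairs = 0
    nodes = list(graph)
    for source in nodes:
        distances = hop_counts(graph, source)
        for target in nodes:
            if target == source:
                continue
            if target in distances:
                total += distances[target]
                reachable_pairs += 1
            else:
                unreachable_pairs += 1
    average = total / reachable_pairs if reachable_pairs else None
    return average, unreachable_pairs


# Hypothetical toy network: a chain of four people plus one person
# with no connections to anyone else in the sample.
TOY_NETWORK = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c"],
    "isolated": [],
}

average, unreachable = average_hops(TOY_NETWORK)
print(average, unreachable)
```

The average only ever describes the pairs that are connected inside the sample. The isolated node contributes nothing to it, which is exactly why a single tidy number can hide the people who were never part of the network being measured.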

While most good scientists go out of their way to ensure they qualify results, the media seem to do the exact opposite and unjustifiably try to read far too much into experiments. Critical thinking on the part of the audience is vital.

2) The conclusions – The article mentions that with this reaffirmed knowledge of human social networks, ‘large meshes of people … could be mobilized with the touch of a return key’. I don’t quite see the connection. If information has not previously been able to be disseminated or people organised, how will knowing that we’re heavily interconnected help now? What have we not been doing before that we can now do armed with this information?

The final statement goes to the respected (in my book) Duncan Watts, who acknowledges the verging-on-the-apocryphal stories that have sprung up about human inter-connectedness by describing them as folkloric. These kinds of popular science myths are tenacious, and just as the Inuit don’t have dozens of words for snow, I’m not on average 6.6 hops from any other human being on our planet. But it does sound appealing, doesn’t it?

19 Jun 2008

PhD Conference 2008

Author: Adam | Filed under: Community

Our Centre for Research in Computing held its annual PhD Conference last week, where I presented a talk and a poster. The abstract and poster are available on the documents page linked to above.

3 Jun 2008

Links: June 2008

Author: Adam | Filed under: Links
  • A concept radio from the BBC that tries to bridge the gap between the device and the social context of the user. It’s a pity it’s only a concept but thanks to its license I hope some forward thinking company will take it up and run with it.
    http://www.bbc.co.uk/blogs/radiolabs/2008/05/olinda_a_new_radio.shtml
  • Especially after my Master’s thesis on machine translation I’ve kept an eye on systems like Google Translate – nice to see the expanded language list. I imagine this is entirely down to the deluge of documents the EU has had to produce since the recent accession countries joined, which are now publicly available in large enough numbers for Google’s statistics-based engine to use.
    http://translate.google.com
  • A very cool music spider and aggregator that takes the pulse of music tastes and trends. I wonder how well it actually reflects the zeitgeist.
    http://www.bbc.co.uk/soundindex

I’m currently working on part of a submission to a few information retrieval (IR) evaluation conferences (including ImageCLEF) and deadlines are looming. It is all rather fun in an incredibly geeky, academic kind of way. We produce a system that is tested on a standard data set and return the results to the conference organisers, who compare them with those of teams from around the world.

While it’s not about the winning, it’s all about the winning. We are currently working on combining our content based IR systems with metadata search systems to produce the most relevant result set possible.

My particular section is near the end of the process. Each type of search engine outputs either a rank of what it thinks are the most relevant images for each query, or a set of filters based on different criteria like location or concept. I then take these complementary data and combine them to form one super-relevant über-rank. At least in theory.

I’ve written the code to merge ranks and filters and am awaiting data from the others in my team to test it and see if the MAP (mean average precision) values of the combined rank are generally better than those of the constituent ranks. According to some of the literature I’ve recently read, even if a rank improves upon random selection only slightly, it can still contribute to an improved combined rank. The more ranks we have, the better the final results.
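
As a rough illustration of the idea (not the actual code I wrote for the submission), here is a minimal Python sketch that merges ranks with a simple Borda count, applies an optional filter, and scores the result with average precision. The fusion method, the function names and the toy image ids are all assumptions made for the example; MAP is just the mean of the average precision over all queries.

```python
from collections import defaultdict


def borda_fuse(rankings, allowed=None):
    """Merge best-first ranked lists of image ids using a Borda count.

    `allowed`, if given, is a set of ids that pass a filter
    (for example a location or concept filter).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        n = len(ranking)
        for position, image_id in enumerate(ranking):
            # The top item gets n points, the next n - 1, and so on.
            scores[image_id] += n - position
    if allowed is not None:
        scores = {i: s for i, s in scores.items() if i in allowed}
    # Highest score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda i: (-scores[i], i))


def average_precision(ranking, relevant):
    """Average precision of one ranked list against a set of relevant ids."""
    hits = 0
    total = 0.0
    for k, image_id in enumerate(ranking, start=1):
        if image_id in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


# Hypothetical output of two engines for a single query.
content_rank = ["a", "b", "c", "d"]
metadata_rank = ["c", "a", "d", "b"]
relevant = {"a", "c"}

fused = borda_fuse([content_rank, metadata_rank])
print(fused)
print(average_precision(content_rank, relevant))
print(average_precision(fused, relevant))
```

In this toy case the fused rank puts both relevant images at the top, so it scores higher than the content-based rank alone. Whether that holds on real data is exactly what the test runs are for.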

We’ll see what we get and hopefully outperform some of the ‘competition’.


12 Mar 2008

Links: 2008-03-12

Author: Adam | Filed under: Links
  • Similar in concept to the Microsoft Touch table, this video shows what is essentially a large touchscreen with applications running on it that take advantage of the different way of interacting with information. The bit that caught my eye was handling images about half way through.

    http://www.perceptivepixel.com/

11 Mar 2008

XFN, FOAF and other acronyms

Author: Adam | Filed under: Development, Social Networks

XHTML Friends Network (XFN) and Friend-Of-A-Friend (FOAF) are two ways of allowing your internet personæ to link together and to link themselves to those of people you know. XFN works by adding ‘rel’ attributes to the links in your blogroll that describe how you know each person: a friend you’ve met, a professional colleague, a neighbour you have a crush on and, more importantly, the ‘me’ value that helps link your own disparate profiles around the web together.
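
To show what that looks like in practice, here is a small Python sketch that pulls the XFN relationships out of a blogroll using the standard library's html.parser. The blogroll markup, the example.org addresses and the class and function names are made up for illustration; only the ‘rel’ values themselves (friend, met, colleague, me) come from XFN.

```python
from html.parser import HTMLParser


class XFNLinkParser(HTMLParser):
    """Collect (href, [rel values]) for every link that carries a rel attribute."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        rel = attributes.get("rel")
        if href and rel:
            # rel holds space-separated values, e.g. "friend met".
            self.links.append((href, rel.split()))


# Hypothetical blogroll; the URLs are placeholders.
BLOGROLL = """
<ul>
  <li><a href="http://example.org/alice" rel="friend met">Alice</a></li>
  <li><a href="http://example.org/bob" rel="colleague">Bob</a></li>
  <li><a href="http://example.com/my-photos" rel="me">My photos</a></li>
</ul>
"""


def extract_xfn(html):
    parser = XFNLinkParser()
    parser.feed(html)
    return parser.links


for href, relationships in extract_xfn(BLOGROLL):
    print(href, relationships)
```

Any link tagged ‘me’ points at another of my own profiles, and everything else describes how I know the person at the other end.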

FOAF works by producing an XML file that describes you and links to your friends. Sites and services can read this file and use it to tie you together. What does this all mean? Well, it means that if I choose to join a new social-network-based web site, I can point it at my blog or some other web presence and it can then find out who else I know, and how, and add them accordingly. It can also tie in information from my other social networks because it can tell whether another profile is mine or not. It’s kind of like extracting the social network interconnection information that we repeatedly enter into different websites and storing it in its most basic form in our existing web pages.
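
As a sketch of how a site might read such a file, the Python below walks a tiny FOAF document with the standard library's ElementTree. The sample document, the names and the function name are invented for the example, and it assumes the simple nested RDF/XML layout shown here; a real service would more likely use a proper RDF parser, since FOAF files can be laid out in other ways.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by RDF and FOAF.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

# Hypothetical FOAF file describing one person and one friend.
FOAF_DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:name>Adam</foaf:name>
    <foaf:homepage rdf:resource="http://example.org/adam"/>
    <foaf:knows>
      <foaf:Person>
        <foaf:name>Alice</foaf:name>
        <foaf:homepage rdf:resource="http://example.org/alice"/>
      </foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>"""


def read_foaf(xml_text):
    """Return (owner name, [names of people the owner knows])."""
    root = ET.fromstring(xml_text)
    owner = root.find("foaf:Person", NS)
    name = owner.findtext("foaf:name", namespaces=NS)
    friends = [
        person.findtext("foaf:name", namespaces=NS)
        for person in owner.findall("foaf:knows/foaf:Person", NS)
    ]
    return name, friends


print(read_foaf(FOAF_DOC))
```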

Google’s Social Graph, mentioned previously, works by traversing this network of links in web profiles using both FOAF and XFN. The problem I have come to discover with Social Graph is that it is reliant on the main Google search engine to index the FOAF and XFN data, so you may have to wait a while before any new data you add becomes usable. (I’m just impatient and want to test it out on my own sites!)

I think this is one of those technologies that is frustratingly simple to implement and could be immensely powerful, but that just won’t gather enough presence in the collective consciousness of the web to get the required ‘market penetration’ to really start rolling. I sincerely hope I get proved wrong. Hopefully existing sites will incorporate XFN or FOAF into their systems (minimal cost) and that will provide meta-services the data they need to start doing some really cool stuff (I’m looking at you Facebook, Flickr et al).

29 Feb 2008

Google Social Graph

Author: Adam | Filed under: Development, Social Networks

Another cool idea from Google that wants to allow people to connect their internet-based social profiles and tools together to make new connections. The introductory video shows how a user could discover existing real-world friends on a social network site by seeing who that person knows in other social networks. This is interesting and I think I’ll have to have a bit of a think and play around with this.