Data science vs rude Lego

Data science moves onwards each day, helping (perhaps) solve more and more of the world’s problems. But apparently there’s at least one issue for which we don’t have a great machine-learning/AI solution just yet – identifying penises made out of Lego.

Indeed this is apparently the problem that plagued the potential-Minecraft-beater “Lego Universe” nearly 5 years ago.

The internet is awash with re-tweets of ex-Lego-Universe developer Megan Fox’s amusing stories from yesteryear. Thanks to Exquisite Tweets for collecting.

Funny story – we were asked to make dong detection software for LEGO Universe too. We found it to be utterly impossible at any scale.

Players would hide the dongs where the filtering couldn’t see, or make them only visible from one angle / make multi-part penis sculptures…

They actually had a huge moderation team that got a bunch of screenshots of every model, every property. Entirely whitelist-based building.

YOU could build whatever you wanted, but strangers could never see your builds until we’d had the team do a penis sweep on it.

It was all automated, but the human moderators were IIRC the single biggest cost center for LEGO Universe’s operational costs. Or close to.
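For anyone curious what “entirely whitelist-based” means in practice, here is a minimal sketch of the idea as the tweets describe it: a build is visible to its owner straight away, but hidden from strangers until a human moderator approves it. Every name here (`Build`, `Status`, `can_view`, `review`) is made up for illustration; this is an assumption about the general shape of such a system, not LEGO Universe’s actual code.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"    # uploaded, not yet looked at by a moderator
    APPROVED = "approved"  # a moderator has cleared it for public view
    REJECTED = "rejected"  # a moderator found something rude


@dataclass
class Build:
    owner: str
    name: str
    status: Status = Status.PENDING  # nothing is public by default


def can_view(build: Build, viewer: str) -> bool:
    """Owners always see their own builds; strangers only see approved ones."""
    return viewer == build.owner or build.status is Status.APPROVED


def review(build: Build, looks_ok: bool) -> None:
    """Record a human moderator's decision after checking the screenshots."""
    build.status = Status.APPROVED if looks_ok else Status.REJECTED
```

The key design choice is the default: a new build starts as `PENDING`, so anything the moderators have not yet seen stays private. That is safe, but it also means a human has to look at every single model, which fits with moderation being such a large cost.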

To be fair, this was a few years ago, and progress in image recognition has not stopped since.

Lego itself recently released “Lego Worlds”, which seems to be a similar type of thing. Whether they have solved the problem I do not know.

Humanity does seem to be making decent progress on such tasks in general. Microsoft Research recently published a paper, “Delving deep into rectifiers”, in which they describe what is perhaps the first program to classify images from the ImageNet Large Scale Visual Recognition Challenge 2012 more accurately than the human it was compared against.

In the consumer space, both Flickr and, very recently, Google have opened up features that allow anyone to upload large numbers of photographs (or, in Google’s case, apparently an unlimited number) and then keyword search for “dog”, “Billy”, “Paris” etc. to show all your photos of dogs, of Billy, or taken in Paris, without you having to provide any manual tagging or contextual information.

Flickr’s attempt has been around a bit longer and has caused a little controversy – as all in the field of data will know, the sort of machine learning and classification processes this extremely hard problem requires do not have any inbuilt sense of politeness or decency.

Misclassifying a photo of Auschwitz as “sport”, as reported by the Guardian, is surely just a confused algorithm rather than a deliberate attempt to offend.

Flickr staff are open about the fact that mistakes will be made and that there is an inbuilt process to learn from them – but it’s obvious why a “normal” viewer can find these classification errors offensive, especially when they might relate to photos of their children, for instance.

This surely poses a dilemma for the sort of companies that provide these services. The idea behind them is a great one, and pretty essential in these days where we all take thousands of photos a year and need some way to retrieve the few we are particularly interested in. But how understanding present-day consumers will be towards the mistakes inherent in the process, particularly at the start of any such efforts, remains to be seen.

In any case I’m sure it won’t be long before someone tests how good Google Photos is at autotagging Lego genitalia (or much worse…).
