From restaurant-snobbery to racism: some perils of data-driven decision-making

Wired recently wrote a piece explaining how OpenTable, a leading “reserve a restaurant over the internet” service, has started to let customers pay for their meal via an app at their leisure, rather than flagging down a waiter and awkwardly fiddling around with credit cards.

There’s an obvious convenience in this for the restaurant patron but, as with most useful “free” services, the cost is one’s personal data. Right now it is possible, at least unofficially, for the data-interested to access some OpenTable restaurant data – but soon it may become a lot more personal.

Wired writes:

Among information the company purports to collect: name and contact information, current and prior reservation details, order history, dining preferences, demographics and precise location data. The company pairs such user data with information from “Third Party Platforms.” The wording here is purposefully vague, but it is certainly plausible that the company could use outside research firms to cobble together a whole host of personal information like income, age and spending habits.

For users who make payments via the mobile app, OpenTable reserves the right to share its customer dossier with the restaurant “for its own purposes.”

In a utopian world, this could be great. It might be nice to turn up to a restaurant with the staff already knowing who you are, which table is your favourite and which drinks you might want on ice, and to be handed a personalised menu highlighting your favourites and omitting any foods to which you are allergic.

This sort of technology-driven experience is already in use in a few places. For instance, in the magical land of Disney World, the Disney MagicBand allows you to pre-order your food and have it ready by the time you turn up.

From another Wired article:

If you’re wearing your Disney MagicBand and you’ve made a reservation, a host will greet you at the drawbridge and already know your name—Welcome Mr. Tanner! She’ll be followed by another smiling person—Sit anywhere you like! Neither will mention that, by some mysterious power, your food will find you.

The hostess, on her modified iPhone, received a signal when the family was just a few paces away. Tanner family inbound! The kitchen also queued up: Two French onion soups, two roast beef sandwiches! When they sat down, a radio receiver in the table picked up the signals from their MagicBands and triangulated their location using another receiver in the ceiling. The server—as in waitperson, not computer array—knew what they ordered before they even approached the restaurant and knew where they were sitting.

But the first Wired article also highlights a less delightful side to the experience.

Restaurants have always doled out preferential treatment to the “best” customers—but now, they’ll be able to brand them with a specific dollar sign the second they walk in the door.

A concern is whether knowledge of a customer’s normal habits will be used to score them on some metric that leads to differential treatment. Maybe if they know, as you approach, that you buy cheap dishes, aren’t likely to have much social influence or – most horrific of all – tip badly, you will get a worse level of service than if you had scored the opposite way.

Some might argue that you can already see the consequences of this sort of “pre-knowledge” in a very basic way. Discount voucher sites like Groupon offer cheap deals on meals that often require you to tell the waiter in advance that you will be using such a voucher.

There are many anecdotal reports (mild example) that this can lead to worse service (and plenty of other reports suggesting this might well be because voucher users don’t understand that the server should probably be tipped on the non-voucher price!).

The Brainzooming Group summarises some more formal research:

Additional research revealed a direct link between the use of Groupon and a negative service experience. The above graph is from a study conducted by Cornell researchers who studied over 16,000 Groupon Deals in 20 US cities between January and July this year. The study found, among other things, that Groupon users averaged a 10% lower rating than those who didn’t use Groupon.

However, “restaurant prejudice” is clearly not a new thing – it existed well before OpenTable thought about opening up its data treasure trove, and even in ye olde times before Groupon. In fact, the author of the original Wired article quoted at the top was once an assistant to a “VIP” of the sort who never had trouble getting even the toughest of tables, armed with his secret phone numbers and possibly a bit of name infamy.

My first boss, an infamous man-about-town, kept a three-ring binder of unpublished phone numbers unlocking some of the toughest tables in the city.

The issue – as with a lot of big data concerns – is not that this introduces a brand new problem, but that it enables an old one at massively enhanced scale.

Now, instead of a judgement that is basically a two-way split – the 0.1% “top VIP” vs the 99.9% “normal person” – we could have “5* tipper who earns £8k a month and just paid off her mortgage” vs “unlucky downtrodden victim of capitalism who normally spends the bare minimum and drinks tap water”, scored individually for each and every OpenTable user.

But at the end of the day, how much should we care? A suboptimal restaurant experience is hardly the worst thing life can inflict on most people. And some would say that, if the data is reliable and the models are accurate (hmmm…), such businesses have every right to treat their customers in line with their behaviour, so long as they remain within consumer law.

Of course, this type of data-driven behaviour is not limited to restaurant-going – and there are risks of a far less recreational nature to consider.

Insurance is one such example. Insurance actuaries, analysts and the like have been using data for far longer than most of us to determine the services and prices they offer people. The basic theory is: if a person is more likely than average to make an expensive insurance claim, they will be charged more for the insurance product (or refused it altogether).

There is obvious business sense in this, but the need to rely on segmentation, generalised patterns and so on means that someone with little relevant history who happens to fall into a “bad group” may be unfairly penalised.
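
As a caricature of that pricing logic, consider the minimal sketch below. The segments, probabilities and costs are entirely invented for illustration – this is the general shape of segment-based pricing, not any insurer’s actual model.

```python
# Minimal caricature of segment-based premium pricing.
# All numbers and segment definitions are invented for illustration.
SEGMENT_RISK = {
    # segment: (estimated probability of a claim per year, average claim cost in £)
    "young_urban_driver": (0.25, 4000),
    "middle_aged_rural_driver": (0.08, 2500),
}

def premium(segment, loading=1.3):
    """Expected claim cost for the segment, plus a loading for costs and profit."""
    p_claim, avg_cost = SEGMENT_RISK[segment]
    return p_claim * avg_cost * loading

print(premium("young_urban_driver"))        # 1300.0
print(premium("middle_aged_rural_driver"))  # 260.0
```

The unfairness comes from the fact that an individual simply inherits their segment’s numbers, however safe or risky they personally are.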

The classic example bandied around here in the UK (where we have no mass-market need for health insurance yet) is car insurance.

Gender was traditionally an influencing factor in insurance premiums. Male drivers would be charged more than female drivers on the premise that they take more risks on average, apparently get into more accidents, are less likely to wear seat belts, are more likely to get caught driving drunk and so on.

At the population level this might therefore seem fair. At the individual level, less so in some cases – some men are safer drivers than some women, but of course the insurer has no way at first to determine which ones, so it charges all men more. Well, it did until that became illegal.

As per the BBC in 2012:

the European Court of Justice has decided that insurers will no longer be allowed to take the gender of their customers into account when setting their insurance premiums.

There are many arguments as to the fairness of this ruling which we won’t go into here, other than to note that it did not overly please insurance companies – so some seem to be using a different sort of data to semi-bypass it.

According to Stephen McDonald from the Newcastle University Business School, it looks like insurance companies are now using a proxy for gender: occupation.

Some jobs may legitimately carry higher driving risks than others. Also, even in 2015, some jobs still have a higher proportion of men or women in them than others. Can you see where we’re going with this?

Some examples from Stephen’s study:

The occupations represent different mixes of males to females, with social workers and dental nurses being mostly female, civil engineers and plumbers mostly male, and solicitors and sports and leisure assistants being gender neutral.

And how is this relevant to insurance premiums? Well:

Comparing prices from before and after the change of law, he finds that after the ban:

  • Prices are the same for men and women.
  • But prices become relatively lower for young (21 year old) drivers in occupations mostly held by women, but higher for those in predominantly male occupations. For example, premiums increased by an estimated 13% for 21 year old civil engineers, but decreased by 10% for dental nurses of the same age.

To summarise: people (especially young people) who tell an insurance company that they have a job commonly done by men are charged more than those who tell the same company that they have a job commonly done by women.

So, for anyone who really has no idea what career they’d like to follow – if you like driving and like money, perhaps dental nurse is the way to go!

Before we leave driving, let’s have a look at those dangerous motorist obstacles: potholes. A few years ago, a (somewhat dubiously sourced) survey estimated that hitting potholes caused nearly £500 million worth of damage to the UK’s car population in a single year. The associated swerving and other consequential manoeuvres can even be a matter of life and death in some cases.

So they surely need fixing – but how can we locate where they are, so that the relevant council can move in and fix them?

In reality, few motorists are likely to be bothered enough to take the time to call or visit the authorities, even if they know which one is relevant. This being 2015, the fashionable answer in many cases is of course to use an app. “Fill that hole” is one example (now backed by the Government) where you can quickly and easily report any pothole dangers.

Street Bump goes one step further, and tries to use a combination of your smartphone’s accelerometer and GPS sensors to automatically detect when you hit a pothole and then report it directly, avoiding the need for a user to manually log it.
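
I have no inside knowledge of how Street Bump actually implements this, but the general idea of accelerometer-based detection can be sketched in a few lines: watch for a sudden jolt in vertical acceleration and, when one occurs, record the current GPS coordinates. The threshold, sample format and example readings below are all invented for illustration.

```python
# Minimal sketch of accelerometer-based bump detection (not Street Bump's actual code).
def detect_bumps(samples, threshold=3.0):
    """samples: list of (vertical_acceleration_deviation_m_s2, latitude, longitude).
    Returns the coordinates of readings that look like a sharp jolt."""
    reports = []
    for accel, lat, lon in samples:
        if abs(accel) > threshold:
            reports.append((lat, lon))  # a real app would send these to the council
    return reports

readings = [(0.1, 51.5000, -0.1000),
            (0.2, 51.5001, -0.1001),
            (4.5, 51.5002, -0.1002),   # sharp jolt: probably a pothole
            (0.3, 51.5003, -0.1003)]
print(detect_bumps(readings))  # [(51.5002, -0.1002)]
```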

But how you really want to do this doesn’t involve faffing around with a smartphone at all. Why not just get one of Jaguar Land Rover’s latest developments – a car that not only detects and reports potholes automatically as you drive, but could potentially also warn the driver in advance so they have time to avoid them, tell other nearby cars about their existence and even – as driving becomes ever more automated – take over the controls and safely steer around them.

So let’s all get brand new Jaguars and the pothole problem will be solved forever! Except we won’t, will we? And why not? Well, the number one reason may be that cutting-edge car tech from Jaguar is not going to be cheap. It will probably be beyond the price range of most of us (for now), just as smartphones are beyond the price, knowledge or “interested in learning about” range of a still-significant proportion of the population. Which leads to a further “data risk”, especially when we are talking about important public services.

As Tim Harford notes in the Financial Times:

Yet what Street Bump really produces, left to its own devices, is a map of potholes that systematically favours young, affluent areas where more people own smartphones. Street Bump offers us “N = All” in the sense that every bump from every enabled phone can be recorded. That is not the same thing as recording every pothole.

There’s an obvious loop here. Areas that are more economically deprived, or where there is less digital literacy, are less likely to have people busily reporting potholes on their fancy smartphones, let alone their cutting edge Jags. But there’s little evidence that they have fewer potholes – just fewer reports.
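
The effect is easy to simulate. In the toy example below, two areas have exactly the same number of potholes but different rates of smartphone ownership; rank repairs purely by report counts and the less affluent area consistently comes last. Every number here is invented purely to illustrate the loop.

```python
import random

random.seed(42)

# Toy simulation: identical pothole counts, different smartphone ownership.
areas = {
    "affluent_area": {"potholes": 100, "smartphone_rate": 0.8},
    "deprived_area": {"potholes": 100, "smartphone_rate": 0.3},
}

REPORT_CHANCE_IF_SMARTPHONE = 0.5  # chance a given pothole gets reported by an app user

for name, area in areas.items():
    reports = sum(
        1 for _ in range(area["potholes"])
        if random.random() < area["smartphone_rate"] * REPORT_CHANCE_IF_SMARTPHONE
    )
    print(f"{name}: {area['potholes']} potholes, {reports} reports")

# Typical output: roughly 40 reports for the affluent area and 15 for the
# deprived one - despite both having exactly the same number of potholes.
```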

To steal a famous saying, here’s a case where “absence of evidence is not evidence of absence”.

If, however, we wrongly assume that the areas with the most digital pothole reports are those with the most potholes, and prioritise repairs accordingly, then one can imagine a situation where the most deprived areas get the least maintenance and become ever more troubled and undesirable, as their reputation as “bad areas” spirals downwards. Advocates of the broken windows theory might believe it could even lead to higher crime.

Potholes might seem like a relatively trivial example (if you’ve never hit one), but in an era where even the most vulnerable are being forced by the Government to magic up a computer and learn to use the internet to access the welfare they desperately need to live – despite some surveys suggesting this will range between hard and impossible for many of those who need it most – it’s not hard to imagine more serious reinforcements of inequality along these lines.

Or even matters of life and death.

Three years ago, Hurricane Sandy struck New York. It was big and bad enough that many people dialling 911 to request urgent help could not get through. Luckily, though, the NY Fire Department had a Twitter account and a member of staff monitoring it. Although in this case they explicitly requested that people not tweet for emergency help, this fine employee did what anyone with some humanity would do – and passed the urgent requests on to the front-line rescuers.

“My friends’ parents who are trapped on #StatenIsland are at 238 Weed Ave. Water almost up to 2nd flr.,” posted Michael Luo, an investigative reporter for The New York Times.

“I have contacted dispatch,” Rahimi responded within minutes. “They will try to send help as soon as they can.”

Fantastic stuff – social media activity potentially saving lives. But for those unfortunate enough not to have a smartphone fancy enough to tweet with, or lacking the knowledge or desire to do so, it introduces an innate bias in resource deployment that favours some people over others for no legitimate reason.

Moving on, but on the same subject of reinforcing inequalities: many predictive data models derive their answers by learning from, and adapting to, what happened in the past. If nothing else, it’s the obvious way to test such systems: if they can correctly predict what did happen, they may stand a good chance of predicting what will happen.
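
That “predict the past to validate” logic is standard practice: hold back a portion of the historical records, train on the rest, and check whether the model reproduces the known outcomes. A minimal sketch – using scikit-learn and a randomly generated stand-in for historical data – looks something like this:

```python
# Minimal sketch of validating a model by predicting held-back history.
# The data is randomly generated; in practice these would be past decisions
# and their known outcomes.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                 # five made-up applicant features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # outcome driven by two of them

# Hold back 30% of the "history" to test against.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
print("Accuracy on held-back history:", model.score(X_test, y_test))
```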

This means, however, that data-driven efforts are not immune from reinforcing historical prejudice either.

Let’s take a couple of serious biggies: race and gender discrimination.

Surely big, open, accessible data is exactly the tool needed to combat the horrible biases of some in society? Not necessarily; it depends entirely upon application.

As long ago as 1988, the Commission for Racial Equality (now merged into the Equality and Human Rights Commission) found St George’s Hospital Medical School guilty of racial and gender discrimination in choosing whom to admit. Some rogue racist recruiter? No, not directly anyway.

‘…a computer program used in the initial screening of applicants for places at the school unfairly discriminated against women and people with non-European sounding names’.

Why so?

‘The program was written after careful analysis of the way in which staff were making these choices and was modified until…it was giving a 90-95% correlation with the gradings of the [human] selection panel’

The data and ruleset that led to this bias weren’t as explicit as one might imagine at first sight. The computer was never given the race of the applicants – it wasn’t even recorded on the application form. However, it seemingly “learned” to proxy it effectively from surname and place of birth. The overall effect was to reduce the chances of applicants who appeared to be female or foreign making it onto the “invite to interview” list.

‘Women and those from racial minorities had a reduced chance of being interviewed independent of academic considerations’

This wasn’t a programmer or data scientist (or whatever they were called in 1988!) secretly typing in a prejudiced line of code, nor did the computer suddenly decide it didn’t like foreigners or women. The computer was not racist; but, by the nature of the algorithms it ran, it reflected a process that probably was.

‘This point is important: the program was not introducing new bias but merely reflecting that already in the system’

Fast forward more than a quarter of a century, and machine learning algorithms can now run over outrageously large amounts of data, testing and learning which of thousands of variables can automate accurate decision-making that in the past took a slow, expensive human to do.

As shown above, it’s not as simple as removing “race” or other likely sources of prejudice from the variable set.
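
Here is a toy demonstration of why that is. Train a model to mimic past (biased) decisions, withhold the protected attribute entirely, and watch it recover the bias through a correlated proxy. Everything below is synthetic and purely illustrative – no real data, model or organisation is being described.

```python
# Toy illustration: a model trained to mimic biased past decisions, with the
# protected attribute withheld, still reproduces the bias via a correlated proxy.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5000
protected = rng.integers(0, 2, n)          # protected group membership (never shown to the model)
proxy = protected + rng.normal(0, 0.3, n)  # correlated proxy, e.g. a name or postcode score
merit = rng.normal(0, 1, n)                # a genuinely relevant feature

# Historical decisions: mostly merit-based, but penalising the protected group.
past_decision = (merit - 1.0 * protected + rng.normal(0, 0.3, n) > 0).astype(int)

# Train only on merit and the proxy - "protected" itself is excluded.
model = LogisticRegression().fit(np.column_stack([merit, proxy]), past_decision)
predicted = model.predict(np.column_stack([merit, proxy]))

for group in (0, 1):
    print(f"group {group}: approval rate {predicted[protected == group].mean():.2f}")
# The approval rates differ substantially by group, even though the protected
# attribute was never an input - the model learned to use the proxy instead.
```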

Two years ago, Kosinski et al. published a paper in the Proceedings of the National Academy of Sciences that simply looked at what people had pressed ‘Like’ on in Facebook. This is information that is often publicly available, attributable to an individual, and accessible for data-related uses.

Using data from nearly 60,000 volunteers, they built a regression model that yielded some very personal insights.

‘The model correctly discriminates between homosexual and heterosexual men in 88% of cases, African Americans and Caucasian Americans in 95% of cases, and between Democrat and Republican in 85% of cases. For the personality trait “Openness”, prediction accuracy is close to the test–retest accuracy of a standard personality test’

Yes, from you pressing ‘Like’ a few times on Facebook, this model purports to be able to determine with reasonable accuracy your physical and mental traits.
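
At a very high level – and heavily simplified here – the approach is a familiar one: represent each user as a row of a matrix recording which pages they have Liked, reduce its dimensionality, and fit a regression against the trait of interest. The schematic sketch below uses a tiny invented dataset purely to show the shape of that pipeline; it is nothing like the scale or rigour of the actual study.

```python
# Schematic sketch of a Likes-to-traits pipeline (toy data, invented for illustration).
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Rows = users, columns = Facebook pages; 1 means the user Liked that page.
likes = np.array([
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1],
])
trait = np.array([1, 0, 1, 0, 1, 0])  # some binary trait reported by the volunteers

# Reduce the Like matrix to a few components, then fit a classifier on them.
model = make_pipeline(TruncatedSVD(n_components=2), LogisticRegression())
model.fit(likes, trait)
print(model.predict([[1, 0, 1, 0, 0]]))  # predicted trait for a new user's Likes
```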

The researchers were kind enough to publish a supplement that showed some of the more predictive “likes” – including such unintuitive gems as an association between high IQ and liking curly fries.

But are businesses, employers and other authorities really going to use Facebook data for important decisions? Undoubtedly. They already do.

Loan companies are an obvious example. A quick Google will reveal many organisations purporting to credit score, or have a model for credit scoring, that depends at least partially on your digital data stream.

To pick one at random, VentureBeat reports on a company called Kreditech.

‘By analyzing a pool of publicly available online information, Kreditech can predict how creditworthy you are in a matter of seconds.

The big data technology doesn’t require any external credit bureau data. Instead, it relies on Facebook, e-commerce shopping behavior, mobile phone usage, and location data.’

And whilst I have no idea about the workings of their model, even if race is not a specific variable involved in the decision, should it choose to segment based on, for instance, people who have liked “Bonfires” on Facebook then, in accordance with the Facebook Likes study above, it will de facto be adjusting for race (liking bonfires apparently being more predictive of being white than black).

Why pick the credit score example? Because some big finance companies have bad form on this sort of prejudice – the US DOJ for instance reached a $335 million settlement a few years ago because:

‘Countrywide discriminated by charging more than 200,000 African-American and Hispanic borrowers higher fees and interest rates than non-Hispanic white borrowers in both its retail and wholesale lending. The complaint alleges that these borrowers were charged higher fees and interest rates because of their race or national origin, and not because of the borrowers’ creditworthiness or other objective criteria related to borrower risk.’

Whatever the motivating factor behind those original prejudiced decisions, if this dataset of past lending to applicants is used to teach a machine how to credit-score automatically, then one can see the same risk of unjust outcomes being created – just as St George’s inadvertently reproduced racism and sexism in its data-driven recruitment filter.

Even if you are a cold, hard capitalist with no interest in social conscience: by not taking the time to consider what your model is actually doing to generate its scores, you could be jeopardising your own profits.

As noted above, some of the potential problems come from the classic ‘use the past to predict the future’ methods that underpin a lot of predictive work. Those familiar with Clayton Christensen’s much-heralded book The Innovator’s Dilemma will immediately see a problem.

From Wikipedia:

‘Christensen’s book suggests that successful companies can put too much emphasis on customers’ current needs, and fail to adopt new technology or business models that will meet their customers’ unstated or future needs.’

Limiting your marketing efforts to people who look like those who were previously customers may artificially restrict your business to a homogeneous, perhaps ever-declining, population – and that population is probably not where your best opportunity for radical growth lies.

So what’s to be done about this?

Unfortunately, when presented with “computer says no” type consumer scoring, most people are not necessarily going to understand why the computer said no. Sometimes it is literally impossible to determine that through any realistic method. Most of the responsibility therefore has to lie with those developing such models.

I recently had the pleasure of attending a fascinating talk by Hilary Mason, the Founder of Fast Forward Labs and previously a data scientist with bit.ly.

In it, and also in the book ‘Data Driven’ that she co-authored, she went through a set of questions that she likes to be asked about any data science project. One was ‘What’s the most evil thing that can be done with this?’

Partly this was to encourage more open thinking – and they do advise not to ask it if you actually work with people who are evil! – but they also noted that:

‘One of the challenges with data is the power that it can unleash for both good and bad, and data scientists may find themselves making decisions that have ethical consequences. It is essential to recognize that just because you can, doesn’t mean you should.’

Openness is an important, if tricky, issue. Where possible, good practice should mandate that predictive model builders provide clear, understandable documentation of how exactly their products work.

There are at least two tricky factors to take into account here though:

  1. If this is a commercial model, companies are not going to want to reveal the details of their secret sauce. Experian, for instance, may publish “some factors that may affect credit scores”, but they are never going to publish the full model workings for obvious reasons. However, this does not mean that Experian’s data scientists should not be producing at least clear and explicit documentation for their fellow employees, under NDAs if necessary.
  2. Some types of model are simply more impenetrable than others. A decision tree is quite easy to represent in a way that non-data-scientists can understand; a neural network is much harder, yet neural networks may sometimes produce far more accurate models. Either way, it’s not hard to document what went into the model, even if you can’t fully explain what came out (see the sketch just after this list)!
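
To make that contrast concrete, here is a small sketch using scikit-learn on invented toy data (the feature names are made up): a decision tree can be dumped as human-readable if/else rules, whereas the equivalent neural network only offers its raw weight matrices.

```python
# Sketch of the interpretability contrast, using scikit-learn on toy data.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
feature_names = ["income", "age", "tenure", "prior_defaults"]  # made-up labels

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=feature_names))  # readable if/else rules

net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0).fit(X, y)
print(net.coefs_[0].shape)  # just a matrix of weights - far harder to explain
```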

Although this is somewhat idealistic, it would also be nice if the users of such models were enabled to understand them as far as possible – and proactively made the attempt to do so.

The report on St. George’s prejudiced recruitment model above made this clear:

‘A major criticism of the staff at St. George’s was that many had no idea of the contents of the program and those who did failed to report the bias.’

In reality, one can’t determine from the article whether this is a fair criticism of the users, or actually something that should be aimed elsewhere. But it would not be impossible for any given organisation to internally promote understanding of such systems.

The report also implies one possible monitoring solution. It’s good practice to regularly monitor the output of your model for scoring accuracy – but one can also monitor the consequences of the decisions it’s making, irrespective of accuracy. This is especially worthwhile if you are, for legal or ethical reasons, particularly concerned about certain biases, even above pure model accuracy.

Worried that your model is outputting racist decisions? Well, if you can, why not correlate its results with “race” and look for patterns? Even before building it, you could remove any variables that you have determined to have a certain degree of cross-correlation with race, if you wanted to be really careful. But be aware that – depending on the task at hand – this might jeopardise model accuracy and hence be a hard sell in many environments.
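
As a minimal sketch of what such monitoring might look like (synthetic data and deliberately crude checks – real monitoring would use proper fairness metrics): compare approval rates across groups, and flag any candidate input that correlates strongly with the protected attribute before it ever reaches the model.

```python
# Crude sketch of two bias checks on a scoring model's inputs and outputs.
# All data is synthetic and purely illustrative.
import numpy as np

rng = np.random.default_rng(2)
n = 2000
race = rng.integers(0, 2, n)       # protected attribute, held separately from the model
features = {
    "income": rng.normal(30000, 8000, n),
    "likes_bonfires": (race + rng.normal(0, 0.5, n) > 0.5).astype(float),  # a proxy
}
decisions = rng.integers(0, 2, n)  # stand-in for the model's approve/decline outputs

# 1. Outcome check: do approval rates differ by group?
for group in (0, 1):
    print(f"group {group}: approval rate {decisions[race == group].mean():.2f}")

# 2. Input check: flag candidate features that correlate with the protected attribute.
for name, values in features.items():
    corr = np.corrcoef(values, race)[0, 1]
    flag = "  <-- review before use" if abs(corr) > 0.3 else ""
    print(f"{name}: correlation with protected attribute = {corr:.2f}{flag}")
```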

No-one would deny that the main point of a lot of these predictive models is to optimise for accuracy. But be aware of exactly what “accuracy” means here. History is full of prejudice, and models can certainly reproduce it.

If you’re a luxury brand trying to target the customers with the most spare money then an obvious variable might be “pay received”. But don’t forget, according to the Fawcett Society:

In modern Britain, despite the 1970 Equal Pay Act, women still earn less than men. The gender pay gap remains the clearest and most dramatic example of economic inequality for women today.

So optimising for pay may have some correlation with favouring men over women. Does this matter to you? If, in your context, it does then tread with care!

If you’re a police department, trying to target people likely to commit the most serious of crimes, then “length of prison sentence” might be a sensible input. But beware – the Wall Street Journal reports:

Prison sentences of black men were nearly 20% longer than those of white men for similar crimes in recent years, an analysis by the U.S. Sentencing Commission found.

So optimising for length of prison sentence may be unfairly biasing your “crime score” to select black people over white people.

So, try to evade potential “judge bias” by just using the fact of arrest? Sorry, that probably won’t work much better. Here’s one example why, from Jonathan Rothwell of the Brookings Institution:

Blacks remain far more likely than whites to be arrested for selling drugs (3.6 times more likely) or possessing drugs (2.5 times more likely).

Here’s the real shock: whites are actually more likely than blacks to sell drugs and about as likely to consume them.

So optimising for historical arrest statistics may also entail unfairly selecting black people over white people, because – before big data came near to police departments – some already were unfairly selecting black people over white people for other reasons. Does this matter to you? If so, once more, tread with care!

There’s probably no simple, generic solution to these issues. We are already at the point where most human users don’t really understand, in any depth, what produces the output of most data models.

The point can only be this: never assume that, just because a model uses a mathematically rigorous algorithm, an unbiased methodology and data considered to be a fair view of reality, its outcome will respect “equal rights”. In some cases there is an implicit conflict between optimising for model accuracy – which may involve internally recreating parts of the world we don’t especially like – and using data for the common good.
