Reddit, the “front page of the internet” – and a network I hardly ever dare enter for fear of being sucked into reading hundreds of comments for hours on highly pointless yet entertaining things – has had its share of controversies over the years.
The site is structurally divided into “subreddits”, which one can imagine as simple, rather old-school forums where anyone can post links and comments, and anyone else can upvote or downvote them according to whether or not they approve.
Reddit users were themselves busily engaged in a chat regarding “which popular subreddit has a really toxic community” when Ben Bell of Idibon (a company big into text analysis) decided to tackle the same question with a touch of data science.
But what is “toxic”? Here’s their definition.
- **Ad hominem attack:** a comment that directly attacks another Redditor (e.g. “your mother was a hamster and your father smelt of elderberries”) or otherwise shows contempt or disagrees in a completely non-constructive manner (e.g. “GASP are they trying CENSOR your FREE SPEECH??? I weep for you /s”)
- **Overt bigotry:** the use of bigoted (racist/sexist/homophobic etc.) language, whether targeted at a particular individual or used more generally, which would make members of the referenced group feel highly uncomfortable
Now, text sentiment analysis is far from perfect today. A couple of years ago, the CTO of DataSift – a company with a very cool social-media-data-acquisition tool – put the realistic ceiling at around 70% accuracy. The CEO of the afore-mentioned Idibon claims about 80% is possible today.
No-one is claiming anywhere near 100%, especially on determinations as subtle as toxicity and its chosen opposite, supportiveness. The learning process was therefore a mix of machine learning and human involvement: the Idibon sentiment analysis software highlighted, via the Reddit API, the subreddits most likely to be extreme, and humans classified a subset of the posts into those categories.
But what is a toxic community? It’s not simply a place with a lot of toxic comments (although that’s probably not a bad proxy). It’s a community where such nastiness is approved of or egged on, rather than ignored, frowned upon or punished. Here Reddit provides a simple mechanism to indicate this, as each user can upvote (approve of) or downvote (disapprove of) a post.
The final formula they used to score the subreddits, as per their blog again, is:
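I won’t reproduce their exact formula here, but the core idea – counting toxic comments that the community actually endorses with upvotes – can be sketched roughly as follows. This is purely illustrative: the field names, the net-upvote test and the simple fraction are my assumptions, not Idibon’s actual weighting.

```python
# Illustrative sketch only: score a community by the share of its comments
# that are both labelled toxic and endorsed (net-upvoted) by other users.
# Field names ("toxic", "ups", "downs") and the weighting are assumptions,
# not Idibon's published formula.

def toxicity_score(comments):
    """Fraction of comments that are toxic AND net-upvoted by the community."""
    if not comments:
        return 0.0
    endorsed_toxic = sum(
        1 for c in comments if c["toxic"] and c["ups"] > c["downs"]
    )
    return endorsed_toxic / len(comments)

sample = [
    {"toxic": True,  "ups": 40, "downs": 5},   # toxic and approved of
    {"toxic": True,  "ups": 2,  "downs": 30},  # toxic but frowned upon
    {"toxic": False, "ups": 90, "downs": 1},
    {"toxic": False, "ups": 10, "downs": 2},
]
print(toxicity_score(sample))  # 0.25
```

Note that the second toxic comment doesn’t count against the community: it was downvoted, i.e. the nastiness was punished rather than egged on, which is exactly the distinction the study is after.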
The full results of their analysis are kindly available for interactive visualisation, raw data download and so on here.
But in case anyone is in need of a quick offending, here were the top 5 by algorithmic toxicity. It may not be advisable to visit them on a work computer.
| Rank of bigotry | Subreddit name | Official description |
| --- | --- | --- |
| 1 | TheRedPill | Discussion of sexual strategy in a culture increasingly lacking a positive identity for men. |
| 2 | Opieandanthony | The Opie and Anthony Show |
| 3 | Atheism | The web’s largest atheist forum. All topics related to atheism, agnosticism and secular living are welcome. |
| 4 | Sex | r/sex is for civil discussions about all facets of sexuality and sexual relationships. It is a sex-positive community and a safe space for people of all genders and orientations. |
| 5 | Justneckbeardthings | A subreddit for those who adorn their necks with proud man fur. Neckbeard: A man who is socially inept and physically unappealing, especially one who has an obsessive interest in computing (Oxford Dictionary) |
[Edited to correct Ben Bell’s name and column title of table – my apologies!]
2 thoughts on “The most toxic place on Reddit”
This is Ben Bell (not Ball) – author of the Reddit toxicity study. Thanks for taking the time to write this article! Two things – first, the rankings you have of the subreddits are by bigotry, not by subreddit. Second, if you want to embed the actual charts we used into the article you can do it with the following code (we have wordpress too):
Chart 1: [iframe src="https://plot.ly/~bsbell21/210/toxicity-vs-supportiveness-by-subreddit.embed?width=700&height=650" width=700 height=650 scrolling="no" align="center"]
Chart 2: [iframe src="https://plot.ly/~bsbell21/283/bigotry-by-subreddit.embed?width=725&height=535" width=750 height=553 align="center" scrolling="no" seamless="seamless"]
You’ll have to install this plugin for it to work: https://wordpress.org/plugins/iframe/
Let me know if you have any questions!
Thanks, great to hear from the famed analysis author! Your comments are much appreciated, and I really enjoyed reading about your approach on your blog. My apologies for those errors (especially your name, extremely unforgivable!!), now corrected 🙂
I would have loved to embed your charts, thanks – but unfortunately I’m on hosted wordpress.com, which I believe means I can’t install the needed plugin, which is a real shame. If yours is also hosted wordpress.com rather than a standalone WordPress server and I’m wrong about that, then please do let me know! There are plenty of other nice chart and data sites offering iframes that I am sad to have to work without.
As to questions, your blog post was really informative on what you did, but if you have time then I’m curious (just as a thought exercise) as to whether you have any thoughts on applications of your “bigotry score”.
Several sites, perhaps most notably Twitter in recent times, would no doubt like to cut out some of the worst of the worst content that violates their rules. Right now I believe they have added new and enhanced tools for humans to report anything that shouldn’t be there – but I wonder whether you envisage potential mass use of the sort of technique you used to help them along with that effort, for instance by pre-classifying some content as potentially offensive. Perhaps Twitter users could even be given a “we calculate that you are x% bigoted” score depending on what they write and how they interact with other bigoted comments 🙂
[Of course would open up a whole new topic surrounding censorship, the filter bubble et al!]
Thanks again for writing,