The Data Is Plural newsletter provides a mass of free and fascinating data

I recently chanced upon "Data Is Plural" - an email newsletter, currently on issue 370. Each week it provides a list of, and some commentary on, "useful/curious datasets". Each issue has a ton of links to get stuck into for anyone who wants data to play or work with. To give a taster of what …

Average age at menarche by country

A question came up recently about variations in the age at menarche - the first occurrence of menstruation for a female human - with regard to the environment. A comparison by country seemed like a reasonable first step in noting whether there were in fact any significant, potentially environmental, differences in this age. A quick …

data.world: the place to go for your open data needs?

Somewhere in my outrageously long list of data-related links to check out I found "data.world". Not only is that a nice URL, it also contains a worthy service that I can imagine being genuinely useful in future, if it takes off like it should. At first glance, it's a platform for hosting data - seemingly biased towards the …

#VisualizeNoMalaria: Let’s all help build an anti-Malaria dataset

As well as just being plain old fun, data can also be an enabler for "good" in the world. Several organisations are clearly aware of this; both Tableau and Alteryx now have wings specifically for doing good. There are whole organisations set up to promote beneficial uses of data, such as DataKind, and a bunch of …

Accessing Adobe Analytics data with Alteryx

Adobe Analytics (also known as SiteCatalyst, Omniture, and various other names both past and present) is a service that tracks and reports on how people use websites and apps. It's one of the leading solutions for organisations that are interested in studying how people are actually using their digital offerings. Studying real-world usage is often far more insightful, …

Kaggle now offers free public dataset and script combos

Kaggle, a company most famous for facilitating competitions that allow organisations to solicit the help of teams of data scientists to solve their problems in return for a nice big prize, recently introduced a new section useful even for the less competitive types: "Kaggle Datasets". Here they host "high quality public datasets" you can access for free. …

How many teachers do we need? The official Governmental model

How do we know how many teachers are required to keep the UK's schools in good working order? It's an interesting question, with obvious implications for Governmental education policy with regard to teacher compensation, incentives, training places and so on. The "official" requirements are calculated via the Government's "Teacher Supply Model", which, happily, in the …

Microsoft Academic Graph: paper, journals, authors and more

The Microsoft Academic Graph is a heterogeneous graph containing scientific publication records, citation relationships between those publications, as well as authors, institutions, journals, conference "venues" and fields of study. Microsoft have been good enough to structure and release a bunch of web-crawled data around scientific papers, journals, authors, URLs, keywords, the references between them and so on for …

Free dataset: all Reddit comments available for download

As terrifying a thought as it might be, Jason from Pushshift.io has extracted pretty much every Reddit comment from 2007 through to May 2015 that isn't protected, and made it available for download and analysis. This is about 1.65 billion comments, in JSON format. It's pretty big, so you can download it via a torrent, as per the …