5 Datasets to inspire your next project

Sometimes it’s hard to think of new data science projects for your resume or just to keep you busy. Maybe it’s a lack of creativity, or maybe it’s the lack of interesting datasets out there!

We’ve decided to give you some inspiration so you can get cracking like the true data nerd you are!

1. A dataset of 40+ million tweets of COVID-19 chatter

Who doesn’t like a good Twitter dataset? The team at panacealab.org has released a dataset of tweets related to COVID-19 chatter, acquired from the Twitter stream.

The data collected from the stream captures all languages, but the most prevalent are English, Spanish, and French. They have released all tweets and retweets, and the data consists of 152,920,832 unique tweets!

With this data you could perform sentiment analysis, identify common phrases, and track how tweets have changed over time as COVID-19 spread.
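As a starting point, here is a minimal sketch of the "common phrases" and "change over time" ideas. It assumes you have already loaded tweets as (date, text) pairs; the sample rows and function names below are made up for illustration, and the real dataset's layout may differ.

```python
import re
from collections import Counter

# Hypothetical sample rows: (date, tweet text). Not taken from the real dataset.
SAMPLE_TWEETS = [
    ("2020-03-01", "Stay home and wash your hands"),
    ("2020-03-01", "Please wash your hands often"),
    ("2020-03-02", "Stay home, stay safe"),
]

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def common_bigrams(tweets, top_n=3):
    """Count adjacent word pairs across all tweets and return the most frequent."""
    counts = Counter()
    for _, text in tweets:
        words = tokenize(text)
        counts.update(zip(words, words[1:]))
    return counts.most_common(top_n)

def tweets_per_day(tweets):
    """Count how many tweets fall on each date."""
    return Counter(date for date, _ in tweets)

if __name__ == "__main__":
    print(common_bigrams(SAMPLE_TWEETS))
    print(sorted(tweets_per_day(SAMPLE_TWEETS).items()))
```

For real sentiment analysis you would swap the bigram counter for a proper sentiment model, but counting phrases per day is often enough to spot how the conversation shifts.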

You can get the data here.

2. A dataset of 27,000 games from the Steam Store

This one is a goodie!

This data was downloaded from the Steam and SteamSpy APIs. It includes release date, developers, publishers, genres, positive/negative ratings, average/median playtime, owners (an estimate from SteamSpy, and a pretty inaccurate one), and price, as well as descriptions, media data (such as links to screenshots), system requirements, and support info (like company URL and email).

This will work great for identifying trends among popular games and seeing how factors like price, genre, and playtime affect a game’s success.
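To make that concrete, here is a rough sketch that compares genres by their share of positive ratings. The column names (genres, positive_ratings, negative_ratings), the semicolon-separated genre format, and the sample rows are all assumptions for illustration, so check them against the actual file before relying on this.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in an assumed layout; the real file may use other columns.
SAMPLE_CSV = """name,genres,price,positive_ratings,negative_ratings
Game A,Action;Indie,9.99,900,100
Game B,Action,19.99,300,300
Game C,Indie,4.99,80,20
"""

def positive_share_by_genre(csv_text):
    """Return, per genre, positive ratings divided by all ratings."""
    totals = defaultdict(lambda: [0, 0])  # genre -> [positive, all ratings]
    for row in csv.DictReader(io.StringIO(csv_text)):
        positive = int(row["positive_ratings"])
        negative = int(row["negative_ratings"])
        # A game with several genres contributes its ratings to each of them.
        for genre in row["genres"].split(";"):
            totals[genre][0] += positive
            totals[genre][1] += positive + negative
    return {g: pos / total for g, (pos, total) in totals.items() if total > 0}

if __name__ == "__main__":
    for genre, share in sorted(positive_share_by_genre(SAMPLE_CSV).items()):
        print(f"{genre}: {share:.2f}")
```

The same grouping idea works for price bands or playtime buckets: replace the genre key with whichever factor you want to compare.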

You can get the data here.

3. Weather and climate public datasets from Google

What would we do without Google?

Google Cloud’s Public Datasets Program can help you better understand and predict the impact of weather and climate by providing access to some of the most valuable global public weather and climate datasets, at no cost.

Combining these public datasets with your own proprietary data can help you unlock new insights and take your work to another level.

You can get the data here.

4. The COVID-19 open research dataset

The pandemic has changed the modern world, so we had to include this one.

In response to the COVID-19 pandemic, the White House and a coalition of leading research groups have prepared the COVID-19 Open Research Dataset (CORD-19).

CORD-19 is a resource of over 47,000 scholarly articles, including over 36,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses.

You can get the data here.

5. A dataset of the Big Five Personality Test with 1M answers

Yes, you read that right. A personality dataset consisting of 1 million answers!

The Big Five personality traits, also known as the five-factor model (FFM) and the OCEAN model, is a taxonomy, or grouping, of personality traits. The five broad traits described by the theory are extraversion (also often spelled extroversion), agreeableness, openness, conscientiousness, and neuroticism.

This dataset contains 1,015,342 questionnaire answers. You will be able to do a lot with this data, such as analyzing which types of answers and personal characteristics are common among certain groups.

You can get the data here.

