Free data is everywhere! Firms publish anonymized data on kaggle.com, governments participate in open data initiatives, and banks open their payment transaction data to start-ups so they can offer new services to customers. Big firms like Facebook promote the data age with the Facebook API (e.g. see here), and entire businesses base their profit on such APIs (e.g. Pokemon Go using the Google Maps API). Additionally, most firms make use of free data, or offer free data at hackathons to get almost free insights.
As data scientists and analysts we can use this data to generate insights for firms, customers or the general public (see here for data-driven journalism). The following links are a collection of different sources I found interesting myself. Let me know in the comments if something is missing.
Data to Play with
- A good place for free and structured data is kaggle.com. Here companies host competitions involving data analysis. Mostly the objective is to predict some sort of outcome or to interpret images using machine learning (e.g. CNNs). Kaggle's big plus is a built-in console that runs Python or R code with the data sets already loaded. You can profit not only from the data, but also from the code shared by others. Registration is free, check it out!
- Great data sets of all kinds can be found at the UCI Machine Learning Repository. Well-known stuff like the Iris data set is available, but also other topics like wine and car data.
- Affilinet offers an API to pull product data (mostly for advertisers). This kind of data can be very interesting for economists doing research in competition economics and the like.
- A wide range of free data from different fields can be found on data.gov. The site is hosted by the U.S. government and contains data on health, finance, consumers and so on in the US. Depending on the topic, some data is on an individual level, while other data is more aggregated (like weather or climate data).
- Open data from the EU can be found at http://data.europa.eu/. The database contains more than 12,000 data sets.
- Local governments participate in the open data initiative too. Hamburg provides its addresses (helpful for retargeting) and weather data, and lists other local governments that also provide data sets.
- Another rich resource is the open data hosted by Amazon on AWS. They even have human genome data and satellite data.
- An extensive set of macro data can be found in the CIA Factbook. It contains relevant macro data on countries all over the world and is one of the most complete collections I am aware of.
- IMDB, the famous movie rating site, offers its data for free. The data is up to date, so have fun finding your next Friday night movie!
- This awesome page offers data about LEGO, the well-known brick producer. Find the most sold brick (with its respective color!).
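Once you have downloaded one of these data sets as a CSV file, a first question such as "which color shows up most often?" takes only a few lines of Python. The following is a minimal sketch using just the standard library. The helper `most_common_values`, the column names `part_name` and `color`, and the sample rows are all made up for illustration; they are not the actual schema of the LEGO data or of any other source listed above.

```python
import csv
import io
from collections import Counter


def most_common_values(csv_file, column, top_n=3):
    """Return the top_n most frequent values in one column of a CSV file."""
    reader = csv.DictReader(csv_file)
    # Skip rows where the column is missing or empty.
    counts = Counter(row[column] for row in reader if row.get(column))
    return counts.most_common(top_n)


# Tiny invented sample standing in for a downloaded file.
# Column names and rows are hypothetical, for illustration only.
sample = io.StringIO(
    "part_name,color\n"
    "Brick 2 x 4,Red\n"
    "Brick 2 x 4,Blue\n"
    "Plate 1 x 2,Red\n"
    "Brick 1 x 1,Red\n"
)

print(most_common_values(sample, "color", top_n=2))
# [('Red', 3), ('Blue', 1)]
```

With a real download you would pass an open file instead of the in-memory sample, for example `open("your_file.csv", newline="", encoding="utf-8")`, and use whatever column names that file actually has.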
As mentioned above, the possibilities are unlimited. You can find free and interesting data to play with everywhere. So when you want to check out a new function, do not always use built-in data like the Titanic or the Iris data set. Get your hands on some more innovative stuff and find some interesting insights. Playing around with different kinds of data makes you more versatile as an analyst and helps you find business value faster.
Most data on the internet is aggregated or at least pseudonymized, meaning you cannot identify the original person behind the data. But keep in mind that data regulation and protection are serious business, and violating these regulations is expensive for companies. Data from other units, such as sensor data created by production companies, is different and will be even more important in the future.
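To give a rough idea of what pseudonymization can look like in practice, the sketch below replaces an identifier with a keyed hash from Python's standard library. This is only one possible approach and an assumption on my side, not necessarily how any of the sources above treat their data. The function `pseudonymize`, the key and the records are invented for illustration.

```python
import hashlib
import hmac


def pseudonymize(identifier, secret_key):
    """Replace an identifier with a keyed SHA-256 digest (hex string)."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical key and records, invented for illustration only.
key = b"replace-with-a-secret-key"
records = [
    ("alice@example.com", 42.0),
    ("bob@example.com", 13.5),
    ("alice@example.com", 7.0),
]
pseudonymized = [(pseudonymize(email, key), amount) for email, amount in records]

# The same person always maps to the same token, so records stay linkable
# for analysis even though the original identifier is no longer visible.
print(pseudonymized[0][0] == pseudonymized[2][0])  # True
print(pseudonymized[0][0] == pseudonymized[1][0])  # False
```

Note that whoever holds the key can recompute the mapping, which is why pseudonymized data is usually still treated as personal data.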