Datasets
Here we list a few places from which you can access data.
International and national statistical institutions
These institutions offer access to data.
National Surveys
Many nations run large national surveys. This is a selection.
Typically, summary data are freely available, and more detailed data
can be accessed for some of these surveys after a registration
process.
Aggregator sources
Some organisations provide access to a range of datasets collated from
different sources:
- Harvard Dataverse, a
place where researchers deposit datasets used in empirical work
- Consumer Data Research
Centre, e.g. Election Data, Residence-Workplace and travel mode data
from Census, Council Tax Maps, Index of Multiple Deprivation, Population
density
- The UK Data Service,
Census data, International macrodata, Qualitative/mixed methods and UK
surveys
- Our World in Data, Data
Stories with good links to data
- Data Commons, a website to
facilitate the identification of datasets relevant to your question
- Statista is
a website that collates data from a range of different sources and
countries. With your university login you may have access to this
database.
- Economic
Data Sources collated by UC Davis
- Economic Data
Sources collated by the Economics Network
- Datasets for
published work
- Center for International
Data, excellent worldwide trade statistics and US tariff data
- UC Irvine Machine
Learning repository, hosts a range of datasets that are often used
as benchmark datasets
Not really an aggregator source, but a useful tool when looking for
data: Google
dataset search tool
Some Specific Datasets
Here are links to some specific datasets.
Non-UK data
- Facebook
data, a surprisingly rich source of data
- FRED-MD
and FRED-QD, standardised sets of monthly and quarterly US
macroeconomic data, updated monthly. They are maintained by Michael
McCracken and available from his website on FRED.
- Global Carbon Budget
Data website, CO2 emission and emission budget data
- UTD19, traffic data for a range
of cities (mainly in Europe).
- OECD database of hate
crimes
- US Center for Disease Control,
data on a wide range of diseases.
- Data on state level
U.S. healthcare, a database run by the University of Minnesota
covering U.S. healthcare
- FBI Crime Data Explorer,
U.S. crime data at different levels of aggregation.
- US Household Surveys,
a large annual US household survey, 2006 onwards
- US Consumer expenditure
survey
- US Panel Study of Income
Dynamics, Tracks consumers over time, so can be used to look at
career progression, etc.
- US Educational
Achievements, reading and maths achievement in the US, by region and
over time.
- US CDC Wonder
Database, can help you get detailed disease data by location and
time.
- Home
Mortgage Disclosure Act
- Zillow research
data, Housing: prices, sales, etc.
- Realtor.com
data, Housing characteristics (costs, physical condition, with
household characteristics).
- Time use surveys,
international
- Federal Financial
Institutions Examination Council, Panel data on banks’ balance
sheets.
- EUROSTAT, income
and living conditions in the EU
- Visual Crossing, global
weather data, including historical data.
Datasets used on ECLR
Here we provide information on some of the datasets used on the ECLR
page. Others are explained in the workthroughs they are used in.
This is a dataset often used in standard econometrics textbooks (such
as Wooldridge’s Introductory Econometrics). It provides 753
observations on women’s wages along with a range of other variables. You
can find details on the variables here. You
can get all the datafiles for Wooldridge’s book from its publisher
[Cengage](https://www.cengage.com/aise/economics/wooldridge_3e_datasets/).
Major League Baseball (mlb1.csv)
This is another dataset used in Wooldridge’s Introductory
Econometrics textbook. It has 353 observations for Major League Baseball
players in 1993 including their wages and a range of other variables.
You can find details on the variables here. You
can get all the datafiles for Wooldridge’s book from its publisher Cengage.
This dataset contains data on each police-registered road
traffic accident in Greater Manchester (UK) between 2010 and 2020. The
datafile comes from the
Government
Data website, where the government publishes detailed data on these
accidents. From this link you can download the latest dataset.
The file linked above contains data from 2010 to 2020.
The dataset contains information on more than 40,000 accidents. For
each accident there are 25 pieces of information (variables). The data
are coded information drawn from accident reports. To understand how
the variables are coded you should have the Data
Dictionary at hand.
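As a rough illustration of what working with coded variables involves, the sketch below turns numeric codes into readable labels in R. The data frame, the column name `severity` and the code-to-label mapping are assumptions made up for this example; take the actual variable names and codes from the Data Dictionary.

```r
# Hypothetical mini dataset: three accidents with a numerically coded variable.
# Column names and codes are illustrative assumptions, not the real file layout.
accidents <- data.frame(
  accident_id = c("A1", "A2", "A3"),
  severity    = c(3, 2, 3)
)

# Lookup table transcribed by hand from the data dictionary (assumed mapping)
severity_labels <- c("1" = "Fatal", "2" = "Serious", "3" = "Slight")

# Convert the numeric codes into a labelled factor
accidents$severity_label <- factor(
  accidents$severity,
  levels = names(severity_labels),
  labels = unname(severity_labels)
)

# Frequency table using the readable labels
print(table(accidents$severity_label))
```

Keeping the original coded column next to the labelled one makes it easy to check the recoding against the dictionary.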
The data here are prepared from the European Values Study. This is
a repeated cross-sectional survey administered across European
countries. The study covers a wide range of subjects like attitudes
towards work, immigration, family and the environment.
The above dataset is an Rdata object which you can load
into your R code using load("WBdata.Rdata"). This will
deliver two objects into your environment. Alongside the actual data
(wb_data) you will find wb_data_Des, which
contains some information on each of the variables.
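The following is a minimal sketch of how `load()` behaves with a file that holds two objects. So that it runs without the real file, it first creates small stand-in objects under the names `wb_data` and `wb_data_Des` and saves them to a temporary file; the contents of these stand-ins are invented for illustration and do not reflect the actual data.

```r
# Stand-in objects (made-up contents) so the sketch runs without the real file
wb_data     <- data.frame(country = c("A", "B"), gdp = c(1.0, 2.0))
wb_data_Des <- data.frame(variable    = c("country", "gdp"),
                          description = c("Country name", "GDP (illustrative)"))

# Save both objects into a single .Rdata file in a temporary folder
rdata_file <- file.path(tempdir(), "WBdata.Rdata")
save(wb_data, wb_data_Des, file = rdata_file)
rm(wb_data, wb_data_Des)

# load() restores the objects under their saved names and returns
# those names (invisibly) as a character vector
loaded_names <- load(rdata_file)
print(loaded_names)

# Loading into a separate environment avoids overwriting objects
# that already exist in your workspace
wb_env <- new.env()
load(rdata_file, envir = wb_env)
str(wb_env$wb_data)
```

With the real file you would only need the `load("WBdata.Rdata")` call, provided the file is in your working directory.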