B Data sources and final report

  • Know different data sources
  • Use libraries for direct access to databases

B.1 Data sources

In this chapter, I will introduce some interesting data sources (databases) that you can use for your final report. I offer technical assistance for the package eurostat. You can discover other packages on your own (this could be a possible challenge for the final report, see below 😄).

B.1.1 Federal Statistical Office

The Federal Statistical Office offers data about Germany in different areas through their database GENESIS. You can navigate to the data sets and download them. Pay attention to select the format flat when downloading to get a tidy data set.

B.1.2 eurostat

eurostat is the statistical office of the European Union. You will find many different data sets about Europe there. However, I strongly recommend using the dedicated R library eurostat to download the data. My experience shows that students underestimate how error-prone manual download from a large database can be! The R library eurostat has an informative web site with many tutorials. Some of the show how to use the eurostat data to visualize maps (possible report challenge).

B.1.3 gapminder

You already worked with an excerpt of data by gapminder. The complete data set contains much more. You can download more data here. gapminder organises their data in a so-called DDF Format (data description format) as tidy .csv files. A possible report challenge could be to understand this format. A complete data set is availagle via GitHub here.

B.1.4 National Oceanic and Atmospheric Administration (NOAA)

NOAA offers data on oceans, climate and weather. You can download the data sets via the library rnoaa.

B.1.5 More data sources

B.2 Research proposal

Before you write your final report, you need to submit a research proposal on ILIAS. Be sure to upload before the deadline!. You will receive feedback on it. This should avoid any misunderstanding about your final report. Please use the template provided on ILIAS for the research proposal.

B.3 Structure of the final report

B.3.1 Sturcture of the working directory

Create a new R project for your final report. You can find more about using projects here. A project helps you to structure your work properly.

Inside your project folder, create a folder for data, figures and (help) scripts (if appropriate). Save your notebooks in the root directory of your project.

B.3.2 Download and save data

If you want to use libraries to download data, then create a notebook for this task. Don’t download and analyse the data on-the-fly. Data can change and there will be a time delay between your analysis and my assessment of your report. Download and save your data in the data folder. Use the saved data for the analysis. This will ensure that your report is fully reproducible.

B.3.3 Structure of the report

Please structure your report as follows:

  • Introduction (with research question at the end)
  • Material and Methods:
    • Data description: date of download, reproducibility is crucial), description of variables and their units
    • Description of the method with references, description of the research area if appropriate
  • Results:
    • Explorative data analysis (mandatory!)
    • Further analysis to respond to your research question(s)
  • Discussion: use peer-reviewed literature to discuss your results
  • Conclusions
  • Bibliography

You can combine results and discussion to one section.

Each report should contain a challenge. This can be an extensive data tidying and wrangling or use of a new and interesting library (e.g. to plot spatial data in eurostat) etc.

You can write your report in groups of two, I even recommend to do so. In this case, please add your names to the respective parts that you are responsible to receive individual grades. Every part will be graded separately. Every group member need to do some analysis and some writing. Alternatively, give your consent to receive the same grade if you analysed the data and wrote the report together.

I don’t impose any particular length for your report. However, please be concise.

Knit the final report to an html-document. Think about whether you need to show all chunks, avoid redundancy. Compress the whole project and submit everything! Your report and your analysis must run on my computer so observe the paths!

Submit the compressed project to ILIAS before the deadline.

B.4 Evaluation critera

I will evaluate your report along the following criteria:

  1. Does the notebook run without errors?
  2. Is the the notebook structured as prescribed above?
  3. Are the research questions stated clearly?
  4. Are the methods described and necessary literature cited.
  5. Were the data downloaded and saved outside the main report in an extra notebook for full reproducibility? In case the data is downloaded by hand, this must be explained in the main report.
  6. Is the research area described?
  7. Quality of data preparation and exploratory analysis.
  8. Quality of the main analysis, which you do to respond to research questions.
  9. Quality of figures.
  10. Are the results discussed and additional literature cited? This literature should help evaluating and understanding your results.
  11. Are the conclusions sound and supported by the data and results?

You will receive a corrected report as feedback.