This post gives a brief introduction to fetching data from Eurostat for data ingestion in a continuous integration (CI) framework for Data Scientists. It may help Data Scientists who want to devise projects with a CI and deployment focus, adding extra value to the research phase of a project.
Below is a proposal for retrieving data from Eurostat with R to completely or partially automate data set updates. Although it provides useful links for a fully integrated development using SDMX Web Services and APIs, the focus is on understanding Eurostat's Bulk Download facility and on organizing the methods to fetch and prepare the data.
When facing the challenge of systematically maintaining data ingestion from Eurostat with few DevOps resources, one can use Eurostat’s Bulk Download Facility.
Updates are done twice a day, at 11:00 and 23:00, and the data is available in two formats: tsv (tab separated values) and SDMX (Statistical Data and Metadata eXchange).
These files give access to up-to-date “plain text” versions of the table of contents (TOC) of Eurostat's data structure and of the data sets through REST request patterns, which can be easily implemented in R- or Python-minded research projects.
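As a rough illustration of such a request pattern, the sketch below builds the URL for a file in the bulk listing. The base URL, the `file` query parameter, the file paths, and the helper name `bulk_url` are assumptions made for this example, not something stated above. Check them against BulkDownload_Guidelines.pdf before relying on them.

```r
# Assumed base URL of the Bulk Download listing (may have changed).
bulk_base <- "https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing"

# Hypothetical helper: build the request URL for one file in the listing.
bulk_url <- function(file, base = bulk_base) {
  paste0(base, "?file=", file)
}

# Illustrative paths (assumed layout): the TOC, one data set, one dictionary.
toc_url  <- bulk_url("table_of_contents_en.txt")
data_url <- bulk_url("data/nama_10_gdp.tsv.gz")
dic_url  <- bulk_url("dic/en/geo.dic")
```

A URL built this way could then be passed to `download.file()` or `read.delim()`. Keeping the base URL in a single variable makes it easy to adjust if the listing moves.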
There are 3 simple steps to retrieve data from Eurostat using R:
1. Search in Eurostat’s TOC to retrieve time series codes.
2. Fetch data sets or time series.
3. Retrieve related information from dictionaries.
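The three steps can be sketched in base R as follows. To keep the example runnable offline, small inline samples stand in for the downloaded files. Their layout (a tab-separated TOC with `title` and `code` columns, a TSV whose first column packs the dimensions as `unit,geo\time`, and a two-column dictionary) is an assumption about the bulk files. All values are invented, and the function names are hypothetical.

```r
# Step 1: search the TOC for series codes (simplified, invented sample).
toc_txt <- paste(c("title\tcode\ttype",
                   "GDP and main components\tnama_10_gdp\tdataset",
                   "Population on 1 January\tdemo_pjan\tdataset"),
                 collapse = "\n")
toc <- read.delim(text = toc_txt, stringsAsFactors = FALSE)

search_toc <- function(toc, pattern) {
  toc[grepl(pattern, toc$title, ignore.case = TRUE), c("title", "code")]
}

# Step 2: parse a data set into long format (assumed layout, invented values).
data_txt <- paste(c("unit,geo\\time\t2020 \t2021 ",
                    "IDX,DE\t100.0 \t110.5 p",
                    "IDX,FR\t95.0 \t: "),
                  collapse = "\n")

parse_estat_tsv <- function(txt) {
  raw <- read.delim(text = txt, check.names = FALSE, strip.white = TRUE,
                    colClasses = "character", stringsAsFactors = FALSE)
  # Dimension names come from the first header cell, before the backslash.
  dims <- strsplit(sub("\\\\.*$", "", names(raw)[1]), ",", fixed = TRUE)[[1]]
  keys <- do.call(rbind, strsplit(raw[[1]], ",", fixed = TRUE))
  colnames(keys) <- dims
  periods <- trimws(names(raw)[-1])
  do.call(rbind, lapply(seq_along(periods), function(j) {
    cell <- raw[[j + 1]]
    # Drop flags such as "p"; a bare ":" (missing value) becomes NA.
    data.frame(keys, time = periods[j],
               value = as.numeric(gsub("[^0-9.-]", "", cell)),
               stringsAsFactors = FALSE)
  }))
}

long <- parse_estat_tsv(data_txt)

# Step 3: attach labels from a dictionary (code and label, no header).
dic_txt <- "DE\tGermany\nFR\tFrance"
geo_dic <- read.delim(text = dic_txt, header = FALSE,
                      col.names = c("geo", "geo_label"),
                      stringsAsFactors = FALSE)
labelled <- merge(long, geo_dic, by = "geo", all.x = TRUE)
```

In a real pipeline the inline strings would be replaced by the downloaded files. Keeping each step as a separate function makes it possible to test the parsing logic in CI without network access.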
There is, however, a recommended preliminary step: understanding how Eurostat structures things.
Have a look at the Bulk Download Listing.
There you will easily see what information is available. In particular, review the PDF document BulkDownload_Guidelines.pdf.