1. Introduction

This work makes the normalised Digital Surface Model of building heights for the whole of Spain –MDSnE2,5– accessible and manageable in a standard format, a common Coordinate Reference System (CRS) and several resolutions. The model is distributed by the National Geographic Institute (IGN) and was generated from the first nationwide LiDAR –Light Detection and Ranging– coverage.

Starting from the original information distributed by the IGN through its Download Centre, we generate a series of layers in a single CRS –ETRS89-LAEA– at resolutions ranging from the original 2.5m x 2.5m pixels up to 1km x 1km, the cell size of the standard European reference grid (INSPIRE 2014). Any of them can be used for analysis, depending on the objective of the work. In a way, this product is the national version of the building height information of the Global Human Settlement Layer (GHSL-H), which has global coverage and is distributed by the Joint Research Centre (JRC) of the European Commission (EC).

As a by-product, we obtain two other types of information consistent with the previous one.

On the one hand, a built-up layer. At the original resolution it is binary –1/0–, depending on whether or not the pixel is considered occupied by a building; in the lower-resolution layers, obtained by aggregating the original binary layer, it gives the percentage of the cell covered by buildings. In a way, this product represents the national version of the European Settlement Map (ESM), with European coverage, or the built-up surface of the Global Human Settlement Layer (GHSL-S), with global coverage.

On the other hand, a built-up volume layer, which combines the information from the built-up surface and the building height. We obtain this layer at the original resolution, 2.5m x 2.5m, and at the same lower resolutions as the previous layers. In a way, this product represents the national version of the built-up volume of the Global Human Settlement Layer (GHSL-V), with global coverage [1].

The national product has a higher initial resolution and should also contain fewer errors, since the global products of the Global Human Settlement Layer are based on satellite images while the digital models of the IGN come from aerial orthophotographs and LiDAR sensors.

It should be noted that, although the first LiDAR coverage corresponds to the period 2009-2015, the second coverage –2015-2021– has already been completed; at the time of undertaking this work –October 2023– the publication of this new information was still pending. Since the process has been fully automated, as described below, updating the layers is (almost) immediate once the original information is available from the IGN.

The paper is structured as follows. The second section describes the original information, setting out some technical details and its coverage. The third section details the processing of this information, which involves reprojecting the original raster files and generating a complete mosaic, initially at the original resolution. This is followed by a description of the final layers produced, which are distributed in national files by variable and resolution. All the generated information is freely available for download. Finally, regional and provincial statistics are offered as an example of use. A few brief comments conclude the work.

2. Original information: Digital building height models (2.5m x 2.5m)

The National Aerial Orthophotography Plan (PNOA) is a cooperative project involving the General State Administration and the Autonomous Communities (CCAA). It began in 2004 with the aim of obtaining digital aerial orthophotographs of the entire Spanish territory, with a fixed updating period. In 2009, the LiDAR –Light Detection and Ranging– technology was incorporated into the PNOA project.

LiDAR is currently the technology that allows the most accurate capture of elevation information. Its development and consolidation throughout the 20th century have made it possible to obtain high quality information about the orography of the terrain. The fundamental objective of the use of LiDAR technology is to capture three-dimensional information for the entire Spanish territory using airborne sensors, and to generate a series of products from this capture –a point cloud with coordinates–.

The derived product used in this work, provided directly by the IGN, is the Digital normalized Surface Model of the building class (MDSnE2,5), obtained from the LiDAR information. It is a raster product that gives the height of buildings above the ground, and it is easier to access, query and distribute than the original point cloud.

From the LiDAR point cloud we can derive two basic products. On the one hand, a Digital Terrain Model, which is simply a digital model of the bare terrain, without obstacles such as trees or buildings. On the other hand, a Digital Surface Model, which is a digital model of the highest surface on the ground, that is, including obstacles, either of natural origin –vegetation– or of artificial origin –buildings–.

Given that the Digital Surface Model can be classified according to different classes, for example, buildings or vegetation, it is possible to determine the height of these obstacles simply by taking the difference between the Digital Surface Model of a given class, buildings in our case, and the Digital Terrain Model. This difference is the Digital normalized Surface Model of the building class (MDSnE2,5), which represents the height of buildings above the ground.
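The difference described above can be illustrated with a minimal sketch. The two small grids and the function name `normalised_height` are hypothetical and only serve to show the cell-by-cell subtraction; they are not taken from the IGN product, and `None` is used here as a stand-in for missing values.

```python
# Toy 3x3 grids in metres; None marks cells without a building return.
# All values are invented for illustration.
dsm_building = [
    [None, 712.4, 713.0],
    [None, 715.1, None],
    [708.9, None, None],
]
dtm = [
    [700.0, 700.5, 701.0],
    [700.2, 700.8, 701.3],
    [700.4, 701.0, 701.6],
]

def normalised_height(dsm, dtm):
    """Cell-by-cell difference DSM - DTM; missing DSM cells stay missing."""
    return [
        [None if s is None else round(s - t, 2) for s, t in zip(srow, trow)]
        for srow, trow in zip(dsm, dtm)
    ]

ndsm = normalised_height(dsm_building, dtm)
for row in ndsm:
    print(row)
```

Cells without a building remain missing, so the result only carries heights where the building class is present.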

The Digital normalized Surface Model of the building class used in this work corresponds to the first LiDAR coverage, which took place between 2009 and 2015, with a minimum point density of 0.5 points/m² and a Root Mean Square Error in the Z coordinate \(\le\) 40 cm [2].

The information is distributed by sheets of the National Topographic Map (MTN) 1/50,000 –MTN50–, an obsolete distribution system that involves handling more than 1,000 raster files in tif format [3]. The CRS is ETRS89 –European Terrestrial Reference System 1989– for the peninsula and the Balearic Islands and REGCAN95 for the Canary Islands, with UTM –Universal Transverse Mercator– projection in the corresponding zone: 29, 30 and 31 for the peninsula and the Balearic Islands and 28 for the Canary Islands [4].

In total we have 1,180 files, one for each sheet of the National Topographic Map (MTN). Some of them are redundant, since the sheets located between two zones are duplicated in the projections of both zones. Specifically, we have 46 duplicate files –corresponding to the same MTN sheet– projected in zones 29 and 30, and 23 duplicate files –corresponding to the same MTN sheet– projected in zones 30 and 31. There are therefore 69 redundant files that were not considered in the processing of the information. From a practical point of view, the files projected in zone 30 were taken in both cases. In addition, one file [5] has no information, as it contains no buildings, all values being missing. Consequently, 1,110 unique files remain to be processed: 30 files in REGCAN95-UTM28, 200 files in ETRS89-UTM29, 749 files in ETRS89-UTM30 and 131 files in ETRS89-UTM31 [6].

The 1,110 files to be processed cover the entire national territory with the exception of the autonomous cities of Ceuta and Melilla, the island of Alboran –administratively belonging to the municipality of Almeria– and the African possessions –Perejil Islet, the Rock of Velez de la Gomera, the Rock of Alhucemas, the Islands of Alhucemas and the Chafarinas Islands–. In total we have just over 600 million pixels with building height values at a resolution of 2.5m x 2.5m [7]. Some of these values are negative [8], which makes no sense, and others are implausibly high, well over 250m, when the tallest building in Spain does not exceed that height. These values, which result from the automatic treatment of the LiDAR point cloud and which, curiously, have not been filtered by the IGN, were dealt with in the processing of the information, as indicated in the following section.

The original information is not masked by administrative boundaries. Visual inspection of the border sheets shows built-up pixels in France. The processed information is not masked by administrative boundaries either, as it simply reprojects, merges and aggregates the original information. This should be taken into account when results for specific administrative boundaries are desired, as shown in section 5.
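The file bookkeeping in the previous paragraph can be checked with a few lines of arithmetic. The counts are those given in the text; the variable names are only illustrative.

```python
# File counts as stated in the text.
downloaded = 1180
duplicates_29_30 = 46   # sheets delivered in both zone 29 and zone 30
duplicates_30_31 = 23   # sheets delivered in both zone 30 and zone 31
empty_files = 1         # sheet with no buildings, all values missing

redundant = duplicates_29_30 + duplicates_30_31
unique_files = downloaded - redundant - empty_files

# Breakdown of the unique files by CRS, also from the text.
by_crs = {
    "REGCAN95-UTM28": 30,
    "ETRS89-UTM29": 200,
    "ETRS89-UTM30": 749,
    "ETRS89-UTM31": 131,
}

print(redundant, unique_files, sum(by_crs.values()))
```

Both routes give the same total of 1,110 files, so the breakdown by CRS is consistent with the number of duplicates and empty sheets reported.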

3. Processed information

The process of generating a single national building height layer with 2.5m x 2.5m resolution and in ETRS89-LAEA from the original information requires the reprojection of the 1,110 downloaded raster files, originally in four different CRS, into a single CRS and their union into a single national layer.

It is well known that there is no single way to reproject –or resample– a raster file. The process followed in this case is broadly known as warping by points (Maffenini, Schiavina, Freire, Melchiorri and Kemper 2023) and consisted of the following steps [9]:

  1. The original raster files were converted to point vector files –at cell center– and reprojected to ETRS89-LAEA [10].
  2. The vector files were individually rasterized at the original resolution, 2.5m x 2.5m. Most pixels with a height value contain a single point from the vector layer, but in 2% of cases –about 12 million points– a pixel accommodates two points of the vector information. In these cases the pixel value was averaged. In this step the negative height values were set to missing values, and values greater than 250m were truncated to that value [11].
  3. Finally, we build a mosaic by merging all the raster files from the previous step, averaging pixels when they overlap, and adjusting the extent of the final layer to the reference grid generated from the administrative contours (Goerlich 2023a) [12].
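The first two steps can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the scripts used in this work, which were written in R: the function `toy_reproject` is a hypothetical stand-in (a rotation plus a shift) for the real transformation to ETRS89-LAEA, all names are invented, and cleaning each point before averaging is an assumption about the order of operations.

```python
import math
from collections import defaultdict

RES = 2.5           # cell size in metres
MAX_HEIGHT = 250.0  # cap applied to implausibly tall values

def cell_centres(grid, x_min, y_max, res=RES):
    """Step 1a: turn every non-missing cell into an (x, y, height) point at the cell centre."""
    points = []
    for i, row in enumerate(grid):
        for j, h in enumerate(row):
            if h is not None:
                points.append((x_min + (j + 0.5) * res, y_max - (i + 0.5) * res, h))
    return points

def toy_reproject(x, y, angle_deg=3.0, dx=1000.0, dy=2000.0):
    """Step 1b: HYPOTHETICAL stand-in for the CRS transformation (rotation plus shift)."""
    a = math.radians(angle_deg)
    return (x * math.cos(a) - y * math.sin(a) + dx,
            x * math.sin(a) + y * math.cos(a) + dy)

def clean(h):
    """Negative heights become missing; heights above the cap are truncated."""
    if h < 0:
        return None
    return min(h, MAX_HEIGHT)

def rasterise_mean(points, res=RES):
    """Step 2: bin points into target cells and average when several share a cell."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y, h in points:
        h = clean(h)
        if h is None:
            continue
        cell = (math.floor(x / res), math.floor(y / res))
        sums[cell] += h
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}

# Toy 2x2 source raster with one negative and one implausibly high value.
source = [[6.0, -0.3], [300.0, None]]
points = [(*toy_reproject(x, y), h) for x, y, h in cell_centres(source, 0.0, 5.0)]
target = rasterise_mean(points)
print(sorted(target.values()))
```

Because the target grid is not aligned with the source grid, two source points can occasionally land in the same target cell, which is the situation where averaging applies. The same averaging idea carries over to the overlapping pixels of the mosaic in the third step.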

This is our base layer, which represents the height of the buildings in pixels of 2.5m x 2.5m, in a single CRS –ETRS89-LAEA– and with full coverage of Spain, with strictly positive values in the interval (0, 250]. The height is expressed in meters. We name this layer Height_epsg3035_2.5m.tif. It has 588,149,113 pixels with strictly positive height values bounded above by 250m, representing 3,676km² of built-up area, and an average height of 6.9m, approximately two floors. All other pixels in this layer are set to missing values.

All other information is derived from this base layer, which is simply the Digital normalized Surface Model of the building class –the MDSnE2,5 data set– distributed by the IGN, now in a different CRS and in a single file.

The whole process was performed using free software based on the statistical computing system R (R Core Team 2023), using the tidyverse libraries (Wickham et al. 2019) for data wrangling, the sf library (Pebesma 2018) for handling vector information and the terra (Hijmans 2023) and stars (Pebesma and Bivand 2023) libraries for handling raster information.

4. Final layers

From the main layer produced in the previous section, Height_epsg3035_2.5m.tif, we generate the following information.

All the layers are saved in tif files with the following nomenclature: <var>_epsg3035_<res>.tif, where <var> is the variable involved: Height, Built-up or Volume, for built-up volume, and <res> is the resolution: 2.5m, 5m, 10m, 20m, 50m, 100m, 200m, 250m, 500m or 1km. So, we have 30 files in total, 10 for each variable.

These files are available from Zenodo [14].

As an example, Map 1 shows the built-up percentage in 20m x 20m pixels for the urban area of Valencia, corresponding to the layer Built-up_epsg3035_20m.tif.
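As a small illustration of this nomenclature, the full list of file names can be generated from the variables and resolutions listed above. The variable names in the snippet are illustrative only.

```python
# Variables and resolutions as listed in the text.
variables = ["Height", "Built-up", "Volume"]
resolutions = ["2.5m", "5m", "10m", "20m", "50m", "100m", "200m", "250m", "500m", "1km"]

# Pattern: <var>_epsg3035_<res>.tif
file_names = [f"{var}_epsg3035_{res}.tif" for var in variables for res in resolutions]

print(len(file_names))
print(file_names[0])
```

The list contains the 30 names, starting with the base layer Height_epsg3035_2.5m.tif.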


Map 1. Built-up share at 20m x 20m pixels
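A built-up share such as the one in Map 1 can be thought of as a block aggregation of the 2.5m x 2.5m layer; a factor of 8 would give 20m x 20m cells. The sketch below uses a toy grid and a factor of 2. It assumes that the grid dimensions are multiples of the factor and that the volume of a pixel is its height times its area of 6.25m², which is one plausible reading of how surface and height are combined; the function name `aggregate` is hypothetical.

```python
PIXEL_AREA_M2 = 2.5 * 2.5  # area of one original pixel

def aggregate(height, factor):
    """Aggregate a height grid (None = no building) into blocks of factor x factor cells.

    Returns two grids: built-up share (%) and built-up volume (m^3) per block.
    Assumes the grid dimensions are exact multiples of `factor`.
    """
    n_rows, n_cols = len(height) // factor, len(height[0]) // factor
    share = [[0.0] * n_cols for _ in range(n_rows)]
    volume = [[0.0] * n_cols for _ in range(n_rows)]
    for bi in range(n_rows):
        for bj in range(n_cols):
            block = [height[bi * factor + di][bj * factor + dj]
                     for di in range(factor) for dj in range(factor)]
            built = [h for h in block if h is not None]
            share[bi][bj] = 100.0 * len(built) / len(block)
            volume[bi][bj] = sum(built) * PIXEL_AREA_M2
    return share, volume

# Toy 4x4 grid of heights in metres, aggregated to 2x2 blocks.
height = [
    [6.0, 6.0, None, None],
    [6.0, None, None, None],
    [None, None, 10.0, 10.0],
    [None, None, 10.0, 10.0],
]
share, volume = aggregate(height, 2)
print(share)
print(volume)
```

In this toy case the upper-left block is 75% built-up and the lower-right block is fully built-up, while the remaining blocks contain no buildings.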


The data at 1km x 1km resolution for the three variables were transferred to the grid statistics of Goerlich (2023a) in vector and tabulated form, so they can be readily integrated with other statistics in this format.

5. Example of use: Regional, provincial and municipal statistics

From the generated layers, we derive –using the original pixel resolution of 2.5m x 2.5m– the built-up surface and its proportion with respect to the total regional surface, the average height of the buildings and the total built-up volume within the regional –Autonomous Communities–, provincial and municipal administrative boundaries. As a by-product, we also include the total number of pixels involved in the calculations and the number of built-up pixels [15].

Regional data –Autonomous Communities– are shown in table 1. The numbers given in this table are slightly lower than those mentioned in the previous sections because they are cut by administrative boundaries, whereas the previous ones come directly from the information distributed by the IGN, which, as already mentioned, includes buildings beyond our borders.

The calculations involve just over 80 billion pixels, of which only 585,722,033 are built-up, representing a built-up area of 3,661 km².

At the national level, the built-up area is 0.72% of the total surface, but some communities register notably higher percentages. Madrid, with 2.59%, and the Canary Islands, with 2.46%, stand out, although highly touristic regions, such as the Balearic Islands and the Valencian Community, also register notable values, 1.87% and 1.77% respectively.
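A minimal sketch of this kind of zonal calculation is given below. It assumes that the administrative boundaries have already been rasterised to a zone grid aligned with the height layer, with `None` for pixels outside every unit, and that the average height is the built-up volume divided by the built-up surface. The function name `zonal_stats` and the toy data are hypothetical; the sketch is not the code used in this work.

```python
from collections import defaultdict

PIXEL_AREA_M2 = 6.25  # surface of one 2.5m x 2.5m pixel

def zonal_stats(height, zone):
    """Per-zone surface, built-up share, mean height and volume from aligned grids."""
    acc = defaultdict(lambda: {"pixels": 0, "built": 0, "volume_m3": 0.0})
    for hrow, zrow in zip(height, zone):
        for h, z in zip(hrow, zrow):
            if z is None:  # outside every administrative unit
                continue
            s = acc[z]
            s["pixels"] += 1
            if h is not None:
                s["built"] += 1
                s["volume_m3"] += h * PIXEL_AREA_M2
    result = {}
    for z, s in acc.items():
        built_m2 = s["built"] * PIXEL_AREA_M2
        result[z] = {
            "total_km2": s["pixels"] * PIXEL_AREA_M2 / 1e6,
            "built_km2": built_m2 / 1e6,
            "share_pct": 100.0 * s["built"] / s["pixels"],
            "mean_height_m": s["volume_m3"] / built_m2 if s["built"] else None,
            "volume_hm3": s["volume_m3"] / 1e6,
        }
    return result

# Toy grids: the right-hand column lies outside the administrative units.
height = [[8.0, None, 4.0], [None, 12.0, 9.0]]
zone = [["A", "A", None], ["B", "B", None]]
stats = zonal_stats(height, zone)
print(stats["A"]["share_pct"], stats["A"]["mean_height_m"])
```

Built-up pixels that fall outside the zones, such as those across the border, are simply ignored, which is why totals cut by administrative boundaries are lower than those of the unmasked layers.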

The average height of buildings at national level is almost 7 metres, slightly above 2 floors, although the Basque Country slightly exceeds 10 metres, and Madrid is close to it. At the other extreme, the regions with the lowest average building heights are Extremadura and the region of Murcia, with 5.8 and 5.4 metres respectively. According to our estimates, the national building volume amounts to 25,153hm³.


Table 1. Built-up statistics by Region –Autonomous Communities–.

| Code | Name | Pixels: total | Pixels: built-up | Surface: total (km²) | Surface: built-up (km²) | Built-up share (%) | Height (m) | Volume (hm³) |
|---|---|---|---|---|---|---|---|---|
| 01 | Andalucía | 14,015,675,866 | 106,412,386 | 87,597.97 | 665.08 | 0.759 | 6.29 | 4,184.75 |
| 02 | Aragón | 7,635,399,895 | 27,336,895 | 47,721.25 | 170.86 | 0.358 | 6.47 | 1,104.60 |
| 03 | Asturias | 1,696,623,246 | 10,271,029 | 10,603.90 | 64.19 | 0.605 | 8.34 | 535.69 |
| 04 | Illes Balears | 798,526,935 | 14,967,915 | 4,990.79 | 93.55 | 1.874 | 6.20 | 580.07 |
| 05 | Canarias | 1,191,218,325 | 29,294,704 | 7,445.11 | 183.09 | 2.459 | 5.99 | 1,097.60 |
| 06 | Cantabria | 852,811,513 | 6,196,150 | 5,330.07 | 38.73 | 0.727 | 7.63 | 295.36 |
| 07 | Castilla y León | 15,075,788,352 | 53,288,382 | 94,223.68 | 333.05 | 0.353 | 6.10 | 2,030.56 |
| 08 | Castilla-La Mancha | 12,713,325,984 | 46,267,241 | 79,458.29 | 289.17 | 0.364 | 6.02 | 1,741.00 |
| 09 | Cataluña | 5,138,402,549 | 68,280,324 | 32,115.02 | 426.75 | 1.329 | 8.25 | 3,518.78 |
| 10 | Comunidad Valenciana | 3,722,424,368 | 66,008,199 | 23,265.15 | 412.55 | 1.773 | 6.86 | 2,831.19 |
| 11 | Extremadura | 6,661,570,038 | 19,692,099 | 41,634.81 | 123.08 | 0.296 | 5.75 | 707.87 |
| 12 | Galicia | 4,734,244,185 | 47,424,389 | 29,589.03 | 296.40 | 1.002 | 6.31 | 1,870.34 |
| 13 | Madrid | 1,284,950,672 | 33,217,928 | 8,030.94 | 207.61 | 2.585 | 9.69 | 2,011.59 |
| 14 | Región de Murcia | 1,810,606,472 | 27,333,961 | 11,316.29 | 170.84 | 1.510 | 5.43 | 928.21 |
| 15 | Navarra | 1,662,561,998 | 8,803,164 | 10,391.01 | 55.02 | 0.529 | 8.16 | 448.80 |
| 16 | País Vasco | 1,157,566,860 | 16,238,349 | 7,234.79 | 101.49 | 1.403 | 10.12 | 1,027.27 |
| 17 | La Rioja | 807,180,736 | 4,688,918 | 5,044.88 | 29.31 | 0.581 | 8.17 | 239.55 |
| 18 | Ceuta y Melilla | 5,482,440 | | 34.27 | | | | |
| | España | 80,964,360,434 | 585,722,033 | 506,027.25 | 3,660.76 | 0.723 | 6.87 | 25,153.22 |

Source: MDSnE2.5 - National Geographic Institute (IGN) and own elaboration.
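The columns of Table 1 are linked by simple relations that can be checked from the pixel counts: surfaces are pixel counts scaled by 6.25m², the share is the ratio of built-up to total pixels, and the average height is consistent with the volume divided by the built-up surface. The sketch below verifies this for two rows using the figures in the table; the function name `derive` is illustrative, and reading the height as volume over surface is an interpretation of the table rather than a documented definition.

```python
PIXEL_AREA_M2 = 6.25  # surface of one 2.5m x 2.5m pixel

# Pixel counts and volumes copied from Table 1.
rows = {
    "Madrid": dict(total_px=1_284_950_672, built_px=33_217_928, volume_hm3=2_011.59),
    "España": dict(total_px=80_964_360_434, built_px=585_722_033, volume_hm3=25_153.22),
}

def derive(total_px, built_px, volume_hm3):
    """Recompute surfaces (km²), built-up share (%) and mean height (m) from one row."""
    total_km2 = total_px * PIXEL_AREA_M2 / 1e6
    built_km2 = built_px * PIXEL_AREA_M2 / 1e6
    share_pct = 100.0 * built_px / total_px
    height_m = volume_hm3 / built_km2  # hm³ / km² = 1e6 m³ / 1e6 m² = m
    return round(total_km2, 2), round(built_km2, 2), round(share_pct, 3), round(height_m, 2)

for name, row in rows.items():
    print(name, derive(**row))
```

The recomputed values match the corresponding columns of the table for both rows.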


It would be interesting to compare our estimates, derived from LiDAR, with those obtained for the same variables from Cadastral information (Goerlich 2023c; Uhl, Royé, Burghardt, Vázquez, Sanchiz and Leyk 2023).

Provincial data are shown in the annex. Regional, provincial and municipal data are available from Zenodo in Excel format.

6. Concluding comments

This work presents a re-elaboration of the Digital normalized Surface Model of the building class –the MDSnE2,5 data set– provided by the IGN, in order to facilitate its use. In particular, all building height information is provided in a single national file, in the CRS normally used in spatial analysis at European level –ETRS89-LAEA–, with a reasonable size and with minimal treatment of the data to bring it more in line with the variable it represents [16].

From this file, which is considerably easier to handle than the original data provided by the IGN, we aggregate to lower resolutions, down to the European standard grid with 1km x 1km cells.

From the original information we also obtain two derived products that are not directly offered by the IGN: a built-up surface layer and a built-up volume layer, both at all the resolutions derived for the building height layer.

Additionally, this information is transformed into statistics at administrative level: regions –Autonomous Communities–, provinces and municipalities. Given its resolution, it can also be transferred to other areas of interest at various resolutions.

The information processed in this work comes from the first LiDAR coverage, corresponding to the period 2009-2015. However, the process is (almost) fully automated, so the scripts developed can be used, (almost) without modification, to generate the same information from the second LiDAR coverage –corresponding to the period 2015-2021–, which will be available in the near future, or from the Normalised Digital Surface Model of the vegetation class –the MDSnV2,5 data set– [17].


References



Annex: Provincial statistics

Table A1. Built-up statistics by Province.

| Code | Name | Pixels: total | Pixels: built-up | Surface: total (km²) | Surface: built-up (km²) | Built-up share (%) | Height (m) | Volume (hm³) |
|---|---|---|---|---|---|---|---|---|
| 01 | Alava | 485,923,238 | 4,165,590 | 3,037.02 | 26.03 | 0.857 | 8.99 | 234.06 |
| 02 | Albacete | 2,388,279,619 | 7,643,706 | 14,926.75 | 47.77 | 0.320 | 6.47 | 309.08 |
| 03 | Alacant/Alicante | 930,905,471 | 27,048,875 | 5,818.16 | 169.06 | 2.906 | 6.19 | 1,046.36 |
| 04 | Almeria | 1,403,769,135 | 16,685,218 | 8,773.56 | 104.28 | 1.189 | 5.23 | 545.89 |
| 05 | Avila | 1,287,946,058 | 4,499,185 | 8,049.66 | 28.12 | 0.349 | 5.54 | 155.69 |
| 06 | Badajoz | 3,482,694,952 | 11,953,143 | 21,766.84 | 74.71 | 0.343 | 5.60 | 418.45 |
| 07 | Illes Balears | 798,526,935 | 14,967,915 | 4,990.79 | 93.55 | 1.874 | 6.20 | 580.07 |
| 08 | Barcelona | 1,236,983,936 | 36,079,653 | 7,731.15 | 225.50 | 2.917 | 9.30 | 2,096.79 |
| 09 | Burgos | 2,286,584,424 | 6,913,450 | 14,291.15 | 43.21 | 0.302 | 7.14 | 308.33 |
| 10 | Cáceres | 3,178,875,086 | 7,738,956 | 19,867.97 | 48.37 | 0.243 | 5.98 | 289.42 |
| 11 | Cádiz | 1,190,137,607 | 12,747,944 | 7,438.36 | 79.67 | 1.071 | 6.63 | 528.53 |
| 12 | Castellón/Castelló | 1,061,488,008 | 12,016,906 | 6,634.30 | 75.11 | 1.132 | 6.25 | 469.61 |
| 13 | Ciudad Real | 3,169,880,313 | 10,764,964 | 19,811.75 | 67.28 | 0.340 | 5.77 | 388.38 |
| 14 | Córdoba | 2,203,310,330 | 13,542,295 | 13,770.69 | 84.64 | 0.615 | 5.26 | 445.58 |
| 15 | A Coruña | 1,274,161,598 | 17,067,629 | 7,963.51 | 106.67 | 1.340 | 6.65 | 709.51 |
| 16 | Cuenca | 2,742,176,655 | 6,414,066 | 17,138.60 | 40.09 | 0.234 | 5.95 | 238.48 |
| 17 | Girona | 945,333,022 | 11,294,880 | 5,908.33 | 70.59 | 1.195 | 7.03 | 496.15 |
| 18 | Granada | 2,023,508,823 | 11,021,813 | 12,646.93 | 68.89 | 0.545 | 6.80 | 468.76 |
| 19 | Guadalajara | 1,954,145,714 | 5,678,822 | 12,213.41 | 35.49 | 0.291 | 6.36 | 225.77 |
| 20 | Guipúzcoa | 317,160,959 | 4,994,191 | 1,982.26 | 31.21 | 1.575 | 10.33 | 322.34 |
| 21 | Huelva | 1,620,350,285 | 8,201,607 | 10,127.19 | 51.26 | 0.506 | 5.92 | 303.37 |
| 22 | Huesca | 2,501,953,894 | 7,552,433 | 15,637.21 | 47.20 | 0.302 | 6.00 | 283.38 |
| 23 | Jaen | 2,159,470,722 | 8,810,787 | 13,496.69 | 55.07 | 0.408 | 6.47 | 356.36 |
| 24 | León | 2,492,446,143 | 8,999,011 | 15,577.79 | 56.24 | 0.361 | 6.75 | 379.56 |
| 25 | Lleida | 1,946,929,906 | 9,156,872 | 12,168.31 | 57.23 | 0.470 | 6.72 | 384.81 |
| 26 | La Rioja | 807,180,736 | 4,688,918 | 5,044.88 | 29.31 | 0.581 | 8.17 | 239.55 |
| 27 | Lugo | 1,577,237,906 | 9,003,047 | 9,857.74 | 56.27 | 0.571 | 5.71 | 321.23 |
| 28 | Madrid | 1,284,950,672 | 33,217,928 | 8,030.94 | 207.61 | 2.585 | 9.69 | 2,011.59 |
| 29 | Málaga | 1,169,344,307 | 14,583,206 | 7,308.40 | 91.15 | 1.247 | 7.68 | 699.76 |
| 30 | Murcia | 1,810,606,472 | 27,333,961 | 11,316.29 | 170.84 | 1.510 | 5.43 | 928.21 |
| 31 | Navarra | 1,662,561,998 | 8,803,164 | 10,391.01 | 55.02 | 0.529 | 8.16 | 448.80 |
| 32 | Ourense | 1,163,906,291 | 7,736,738 | 7,274.41 | 48.35 | 0.665 | 5.91 | 285.70 |
| 33 | Asturias | 1,696,623,246 | 10,271,029 | 10,603.90 | 64.19 | 0.605 | 8.34 | 535.69 |
| 34 | Palencia | 1,288,371,654 | 3,754,401 | 8,052.32 | 23.47 | 0.291 | 6.37 | 149.57 |
| 35 | Palmas de Gran Canaria | 651,153,919 | 14,296,564 | 4,069.71 | 89.35 | 2.196 | 6.06 | 541.66 |
| 36 | Pontevedra | 718,938,390 | 13,616,975 | 4,493.36 | 85.11 | 1.894 | 6.51 | 553.90 |
| 37 | Salamanca | 1,975,985,281 | 7,838,093 | 12,349.91 | 48.99 | 0.397 | 5.48 | 268.33 |
| 38 | Santa Cruz de Tenerife | 540,064,406 | 14,998,140 | 3,375.40 | 93.74 | 2.777 | 5.93 | 555.94 |
| 39 | Cantabria | 852,811,513 | 6,196,150 | 5,330.07 | 38.73 | 0.727 | 7.63 | 295.36 |
| 40 | Segovia | 1,107,673,668 | 4,354,447 | 6,922.96 | 27.22 | 0.393 | 5.63 | 153.18 |
| 41 | Sevilla | 2,245,784,657 | 20,819,516 | 14,036.15 | 130.12 | 0.927 | 6.43 | 836.51 |
| 42 | Soria | 1,649,140,311 | 3,117,199 | 10,307.13 | 19.48 | 0.189 | 5.42 | 105.69 |
| 43 | Tarragona | 1,009,155,685 | 11,748,919 | 6,307.22 | 73.43 | 1.164 | 7.37 | 541.03 |
| 44 | Teruel | 2,369,448,638 | 6,554,733 | 14,809.05 | 40.97 | 0.277 | 5.53 | 226.55 |
| 45 | Toledo | 2,458,843,683 | 15,765,683 | 15,367.77 | 98.54 | 0.641 | 5.88 | 579.29 |
| 46 | Valencia/València | 1,730,030,889 | 26,942,418 | 10,812.69 | 168.39 | 1.557 | 7.81 | 1,315.22 |
| 47 | Valladolid | 1,297,686,463 | 7,351,842 | 8,110.54 | 45.95 | 0.567 | 6.72 | 308.86 |
| 48 | Vizcaya | 354,482,663 | 7,078,568 | 2,215.52 | 44.24 | 1.997 | 10.64 | 470.87 |
| 49 | Zamora | 1,689,954,350 | 6,460,754 | 10,562.21 | 40.38 | 0.382 | 4.99 | 201.35 |
| 50 | Zaragoza | 2,763,997,363 | 13,229,729 | 17,274.98 | 82.69 | 0.479 | 7.19 | 594.68 |
| 51 | Ceuta | 3,203,763 | | 20.02 | | | | |
| 52 | Melilla | 2,278,677 | | 14.24 | | | | |
| | España | 80,964,360,434 | 585,722,033 | 506,027.25 | 3,660.76 | 0.723 | 6.87 | 25,153.22 |

Source: MDSnE2.5 - National Geographic Institute (IGN) and own elaboration.

  1. The latest version of the Global Human Settlement Layer, R2023, distinguishes between residential and non-residential buildings. This distinction is not possible without adding external information to the IGN digital surface models, and is therefore not incorporated in the present work. However, the IGN has other information that may be more useful for this purpose, such as the Spanish High Resolution Soil Occupation Information System 2017 (SIOSEAR2017).

  2. Technical specifications.

  3. Initially these were distributed as ASCII files, asc, in the format known as ESRI ASCII Grid, which does not include the CRS information; the CRS has to be added after reading the file, following the conventions in its name.

  4. EPSG –European Petroleum Survey Group– codes are 25829, 25830 and 25831 for ETRS89 and UTM projection in zones 29, 30 and 31, and 4083 for REGCAN95 and UTM projection in zone 28.

  5. NDSM-Edificacion-ETRS89-H31-0669B-COB1.tif

  6. Two files, however, were in a different projection from the one indicated by their name. File NDSM-Edificacion-ETRS89-H29-0001-COB1.tif should be projected in zone 29, but reading it showed that it was in zone 30; file NDSM-Edificacion-ETRS89-H31-0118B-COB1.tif should be in zone 31, but reading it showed that it was also in zone 30. A careful inspection of these files showed that their cell size was not exactly 2.5m x 2.5m but slightly larger, 2.504835m x 2.504835m in the first case and 2.501588m x 2.501588m in the second, so they are probably files reprojected from the zone given in their name to the zone obtained on reading. In both cases the projection obtained from reading the file was taken as correct, and the files were not modified.

  7. Specifically, the 1,110 files processed contain 628,005,854 pixels with height values, representing 3,925km² of built-up area. However, since the distribution files overlap to some degree, the number of distinct pixels with height information is somewhat smaller.

  8. About half a million points, roughly 0.07% of the total.

  9. Although the national layers were generated using the steps described below, all 1,180 downloaded files were also reprojected –resampled– individually to ETRS89-LAEA using the nearest neighbor method. This includes the sheets duplicated between two zones, which are now available in the same projection. These files were not used to generate the national layers, but are available upon request in tif format.

  10. They were saved in gpkg format –1,110 files–. This information is available upon request.

  11. After setting negative values to missing, the smallest strictly positive values are far too small to be considered building or construction heights: in some cases less than 1cm, not even the 🐕 house! The message is that there is probably room for improvement in adapting the data to reality –for example by filtering all values below 1m or by eliminating isolated pixels, since they represent a surface of only 6.25m², which can hardly be a building– but it was decided to manipulate the original data as little as possible. The scripts, however, make such modifications easy if necessary.

    In this step we have 615,100,263 pixels with strictly positive height values bounded above by 250m, representing 3,847km² of built-up area. These files were saved in tif format –1,110 files–. This information is available upon request.

  12. However, we do not mask the data, either by administrative contours or by the reference grid; only the extent of the layers is adjusted. Goerlich (2023a) derives a grid for Spain using European standards (INSPIRE 2014) and covering the contour of the whole country. The extent of the generated layers coincides, at all resolutions, with that of this grid.

  13. Even if only 362,118 cells with strictly positive height values fall within our reference grid determined by the Spanish contour (Goerlich 2023a).

  14. The distributed files are stored in single precision. The height and volume files are also available in double precision, although these are considerably more cumbersome to handle. The building volume calculations in the following section use the double-precision results.

  15. Of course, the surfaces are no more than the number of corresponding pixels scaled by 6.25m², which is the surface of a pixel.

  16. Other treatment is easily possible, if desired, from the intermediate working files.

  17. Scripts that perform the calculation are also available upon request.