This graph summarises part of the work carried out with the daily prices of hotel establishments (hotels and guesthouses) scraped from the internet. The data were captured with several web-scraping methods, and their quality improved as we became more familiar with the structure of the website. The studied universe is incomplete because not all establishments offer online booking: accommodation coverage currently stands at approximately 75%. The scraped data were merged with the Eustat Survey on Tourist Establishments to obtain additional information from the directory of hotels and guesthouses. The preparatory work included detecting atypical values, imputing missing data and, finally, applying several types of clustering to the daily price series.
Clustering with absolute prices: In this clustering, the hotel establishments were grouped by their absolute prices. We used the Euclidean distance to compare the series because, among other reasons, it is simple to understand and it produced the desired results. The series were then classified with the PAM method (Partitioning Around Medoids¹) from the cluster package. PAM is based on the K-means method but uses actual series (medoids) rather than averages as cluster centres.
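To make the grouping step concrete, here is a minimal Python sketch of Euclidean distance plus a simplified PAM (greedy BUILD, then SWAP until the total distance stops decreasing). It is not a reproduction of the `pam` function in the R cluster package used in the original work. It assumes each establishment's prices are already aligned on the same dates with outliers treated and missing values imputed; the function names and the price values are hypothetical.

```python
import math


def distance_matrix(series):
    """Pairwise Euclidean distances between price series (one list per establishment)."""
    return [[math.dist(a, b) for b in series] for a in series]


def total_cost(dist, medoids):
    """Sum over all series of the distance to the closest medoid."""
    return sum(min(row[m] for m in medoids) for row in dist)


def pam(dist, k, max_iter=100):
    """Simplified PAM (k-medoids): greedy BUILD, then SWAP until no improvement.

    Returns (medoid indices, cluster label for each series).
    """
    n = len(dist)
    medoids = []
    # BUILD: repeatedly add the candidate that lowers the total cost the most.
    while len(medoids) < k:
        candidates = [c for c in range(n) if c not in medoids]
        best = min(candidates, key=lambda c: total_cost(dist, medoids + [c]))
        medoids.append(best)
    cost = total_cost(dist, medoids)
    # SWAP: try replacing each medoid by each non-medoid; keep any strict improvement.
    for _ in range(max_iter):
        improved = False
        for i in range(k):
            for c in range(n):
                if c in medoids:
                    continue
                candidate = medoids[:i] + [c] + medoids[i + 1:]
                new_cost = total_cost(dist, candidate)
                if new_cost < cost:
                    medoids, cost, improved = candidate, new_cost, True
        if not improved:
            break
    # Assign every series to its nearest medoid.
    labels = [min(range(k), key=lambda j: dist[p][medoids[j]]) for p in range(n)]
    return medoids, labels


# Hypothetical daily prices (EUR) for six establishments over four days.
prices = [
    [60, 62, 61, 65],
    [58, 60, 59, 63],
    [61, 63, 62, 66],
    [150, 155, 160, 158],
    [148, 152, 158, 160],
    [152, 150, 162, 157],
]
medoids, labels = pam(distance_matrix(prices), k=2)
print(medoids, labels)
```

With absolute prices the distance is dominated by the price level, so the three cheaper establishments end up in one cluster and the three more expensive ones in the other.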
Clustering with normalised prices: In this clustering, the hotel establishments were grouped by their normalised prices. We used the Min-Max method; that is, each hotel's prices were rescaled to the range 0 to 100. Rescaling puts all hotels and guesthouses on the same footing, so that the shape of their price trend, rather than their price level, drives the analysis and the classification. As with absolute prices, we used the Euclidean distance to compare the series and the PAM method (Partitioning Around Medoids¹) from the cluster package to classify them.
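The Min-Max rescaling to 0 to 100 can be sketched as follows, applied separately to each establishment's series. The handling of a constant series (mapped to 0) is an assumption, since Min-Max is undefined when the maximum equals the minimum; the function name and the example prices are hypothetical.

```python
def min_max_scale(prices, new_min=0.0, new_max=100.0):
    """Rescale one establishment's price series linearly to [new_min, new_max]."""
    lo, hi = min(prices), max(prices)
    if hi == lo:
        # A constant series has no range; map it to the bottom of the scale
        # (an assumption: other conventions, such as the midpoint, are possible).
        return [new_min for _ in prices]
    return [new_min + (p - lo) * (new_max - new_min) / (hi - lo) for p in prices]


# Hypothetical series: a budget guesthouse and an upmarket hotel with the same pattern.
guesthouse = [40, 45, 50, 45]
hotel = [120, 135, 150, 135]
print(min_max_scale(guesthouse))  # [0.0, 50.0, 100.0, 50.0]
print(min_max_scale(hotel))       # [0.0, 50.0, 100.0, 50.0]
```

After scaling, the two establishments become identical even though their price levels differ, so a Euclidean-distance clustering of the scaled series groups them by trend rather than by price.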
Clustering with volatility: This clustering is based on volatility, which measures how much prices change from one day to the next. Its main objective is to classify the hotels and guesthouses according to whether they change their prices considerably, slightly or not at all in the short term. It relies on a measure used in economics: Close-to-Close Volatility (also written Close/Close Volatility).
The main idea of this measure is to compute the difference between the logarithms of the prices on consecutive days (that is, the logarithm of their ratio) and then calculate the standard deviation of these values.
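In symbols, for the price P_t on day t, the daily log change is r_t = ln(P_t / P_{t-1}) and the volatility is the standard deviation of the r_t. A minimal Python sketch follows, assuming one price per day with no gaps. The use of the sample standard deviation (n - 1 denominator) and the absence of any annualisation factor are assumptions, as are the function name and the example prices.

```python
import math
import statistics


def close_to_close_volatility(prices):
    """Standard deviation of day-to-day log price changes.

    r_t = ln(P_t / P_{t-1}); the result is the sample standard deviation of r_t
    (assumption: n - 1 denominator, no annualisation).
    """
    if len(prices) < 3:
        raise ValueError("need at least three prices to get two log changes")
    log_changes = [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]
    return statistics.stdev(log_changes)


# Hypothetical daily prices (EUR) for three establishments.
fixed = close_to_close_volatility([80, 80, 80, 80, 80])
stable = close_to_close_volatility([80, 81, 80, 82, 81])
volatile = close_to_close_volatility([80, 120, 75, 130, 70])
print(fixed)  # 0.0
print(stable, volatile)
```

An establishment that never changes its price has zero volatility, while one that alternates between very different prices gets a much larger value; these per-establishment values are what the volatility clustering groups into "considerably", "slightly" and "not at all".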