How to Integrate Web Scraping, Pivot Longer, and Spatial Animation using R?

Integrating web scraping, pivot longer, and spatial visualization

Hello everyone, in this post I'll show you how to create spatial visualization using R. As we know that spatial analysis is important one, however I thought that not all of us can visualize spatial data as well. I felt it as I ever did it before.

In this section, we will scrape the data first then process it well using a bit of Natural Language Processing (NLP) method. After that, we link and match the data into single data. So, we need raster data specifically Jawa Timur (shape file, *shp) and latitude-longitude of Jawa Timur's region. The main data derived from BPS Jawa Timur, you can access it on this url: jatim.bps.go.id.

At last stage, we will learn and practice how to animate the spatial data using a special package often I used named gganimate(). This package is powerfull to change your general either graph or chart into an interactive visualization.

Noted: latitude-longitude data of Jawa Timur can be found in this site either shape file of Jawa Timur

How to practice web scraping, pivot longer, and spatial visualization using R? we can follow some stages belowed:


# packages activation to support the project
library(rvest)
library(xml2)
library(stringr)
library(dplyr)
# web scraping activity to get rice production data from BPS Jawa Timur in jatim.bps.go.id
# after that we integrate it into a data frame named prodberas
# first, we should declare the url and then read it
# second, we get some nodes from pages then scrape and extract it using a bit NLP process
url <- "https://jatim.bps.go.id/indicator/53/580/1/produksi-beras-menurut-kabupaten-kota.html"
laman <- read_html(url)

kako <- laman %>% html_nodes('#tablex td:nth-child(1)') %>% html_text()
kako <- kako[1:38] %>% str_replace_all("Kabupaten ", "")

jan <- laman %>% html_nodes('#tablex td:nth-child(3)') %>% html_text() %>% str_replace_all(" ", "")
jan <- jan[1:38] %>% str_replace_all(",", ".") %>% str_replace_all("[[:space:],]", "") %>%
  as.numeric()

feb <- laman %>% html_nodes('#tablex td:nth-child(4)') %>% html_text() %>% str_replace_all(" ", "")
feb <- feb[1:38] %>% str_replace_all(",", ".") %>% str_replace_all("[[:space:],]", "") %>%
  as.numeric()

mar <- laman %>% html_nodes('#tablex td:nth-child(5)') %>% html_text() %>% str_replace_all(" ", "")
mar <- mar[1:38] %>% str_replace_all(",", ".") %>% str_replace_all("[[:space:],]", "") %>%
  as.numeric()

apr <- laman %>% html_nodes('#tablex td:nth-child(6)') %>% html_text() %>% str_replace_all(" ", "")
apr <- apr[1:38] %>% str_replace_all(",", ".") %>% str_replace_all("[[:space:],]", "") %>%
  as.numeric()

may <- laman %>% html_nodes('#tablex td:nth-child(7)') %>% html_text() %>% str_replace_all(" ", "")
may <- may[1:38] %>% str_replace_all(",", ".") %>% str_replace_all("[[:space:],]", "") %>%
  as.numeric()

jun <- laman %>% html_nodes('#tablex td:nth-child(8)') %>% html_text() %>% str_replace_all(" ", "")
jun <- jun[1:38] %>% str_replace_all(",", ".") %>% str_replace_all("[[:space:],]", "") %>%
  as.numeric()

jul <- laman %>% html_nodes('#tablex td:nth-child(9)') %>% html_text() %>% str_replace_all(" ", "")
jul <- jul[1:38] %>% str_replace_all(",", ".") %>% str_replace_all("[[:space:],]", "") %>%
  as.numeric()

ags <- laman %>% html_nodes('#tablex td:nth-child(10)') %>% html_text() %>% str_replace_all(" ", "")
ags <- ags[1:38] %>% str_replace_all(",", ".") %>% str_replace_all("[[:space:],]", "") %>%
  as.numeric()

sep <- laman %>% html_nodes('#tablex td:nth-child(11)') %>% html_text() %>% str_replace_all(" ", "")
sep <- sep[1:38] %>% str_replace_all(",", ".") %>% str_replace_all("[[:space:],]", "") %>%
  as.numeric()

okt <- laman %>% html_nodes('#tablex td:nth-child(12)') %>% html_text() %>% str_replace_all(" ", "")
okt <- okt[1:38] %>% str_replace_all(",", ".") %>% str_replace_all("[[:space:],]", "") %>%
  as.numeric()

nov <- laman %>% html_nodes('#tablex td:nth-child(13)') %>% html_text() %>% str_replace_all(" ", "")
nov <- nov[1:38] %>% str_replace_all(",", ".") %>% str_replace_all("[[:space:],]", "") %>%
  as.numeric()

des <- laman %>% html_nodes('#tablex td:nth-child(14)') %>% html_text() %>% str_replace_all(" ", "")
des <- des[1:38] %>% str_replace_all(",", ".") %>% str_replace_all("[[:space:],]", "") %>%
  as.numeric()

# the result
prodberas <- data.frame(kako, jan, feb, mar, apr, may, jun, jul, ags, sep, okt, nov, des)
prodberas
##                kako      jan      feb       mar      apr      may      jun      jul      ags      sep      okt
## 1           Pacitan    74.30  7709.22  16664.44  6732.61  2924.93  5087.10  6623.02  1620.68  1232.37  1027.38
## 2          Ponorogo  4424.66   392.08  43211.19 42660.14  7572.77  8267.44 43204.56 20959.75  5310.70   237.95
## 3        Trenggalek  1652.96   419.30  16189.83 10576.06  2963.06  1588.58 13626.96  6630.16  1996.29  2534.68
## 4       Tulungagung  6812.89  2643.75  13299.86 33108.09  8882.33  4938.71  8323.36 21894.89  6265.12  6503.48
## 5            Blitar  3404.15  3482.39  16254.15 44325.64 18977.01  4482.54  5292.56  5915.28  8545.73  7723.40
## 6            Kediri  2299.77  3721.03  29636.05 23413.95  2567.84  2419.87 18711.00  4924.18  2788.64   522.82
## 7            Malang 15132.43  5992.60  18043.45 24490.44 17434.28 12156.26 12150.65 11444.80 10094.83 10731.35
## 8          Lumajang 14922.39 13933.55  19235.73 22531.27 15099.56 17643.17 15272.37 10363.57  8701.12 12770.96
## 9            Jember 20443.23 12864.52  46680.31 95342.61 27488.63 16158.38 29038.47 49950.82 20473.92 13698.28
## 10       Banyuwangi 18766.17  8434.83  34346.28 48585.98 23469.67 14631.95 15495.68 14525.81 21508.13 36279.30
## 11        Bondowoso  8311.20  8065.34  21660.89 21202.50 18287.10 12569.63 13348.98  7798.97  6177.20  8070.38
## 12        Situbondo  3546.16  6735.77  20347.51 15265.94  4353.95  4016.70  8990.11  6144.96  2032.09  3809.36
## 13      Probolinggo  4198.92  3996.79  20497.91 29060.97 14575.22  7594.48  5409.95  6532.54  5957.53  3913.96
## 14         Pasuruan  5891.52  5711.86  21935.32 21769.23 16768.56  6611.29 18064.41  9047.51 10650.58  8608.98
## 15         Sidoarjo   991.26   501.74   4514.21 21389.91 22029.76  4940.07   574.48  4428.10 18293.93 18186.59
## 16        Mojokerto  3152.40  9898.22  26483.37 40201.48 15314.56 13043.03 13875.79 15726.87  8891.03  7746.17
## 17          Jombang  3803.64   733.65  30062.37 61866.63 17515.80  1723.15 17593.24 28494.53 20942.02  2476.25
## 18          Nganjuk  3781.75 10409.78  63505.28 37186.04  4822.87 22270.83 37777.23  9210.10  6362.32  6165.72
## 19           Madiun  5671.56   509.65  45787.74 35963.82  8664.44  3114.32 49575.30 20405.76  6573.18  2792.35
## 20          Magetan   551.37  8171.18  41572.90 14198.04  2587.40 21116.14 26640.45  4842.41  2392.88  8024.92
## 21            Ngawi  2993.39 21771.86  93477.70 42048.82  3722.63 30360.45 85766.32 24278.50  3434.18 20954.32
## 22       Bojonegoro  5406.18 42889.97 163434.99 24443.56  6339.59 60792.22 41905.41  6817.31 10586.96 14665.17
## 23            Tuban 18410.00 21536.45  88058.10 39949.99  8148.49  7861.99 24870.00 10389.18 27278.44 27106.91
## 24         Lamongan  2909.94 67690.04 145597.68 17154.44 19765.76 78837.29 52327.94 43720.09 35996.41 33517.36
## 25           Gresik   357.21 59009.17  52374.15  8063.86  8186.15 62102.27 15976.65  8462.17  3947.69 13201.76
## 26        Bangkalan   888.87 14721.98  49103.81  7209.85  2656.17 15149.51 13442.62  5616.20  2152.18  2978.37
## 27          Sampang     0.00 15458.23  56465.20  1082.92   463.28 23476.00  2264.13     0.00     0.00   391.96
## 28        Pamekasan     0.00  3091.90  52094.19  2961.37    43.74   811.53   644.54   250.28  2260.99     0.00
## 29          Sumenep  1965.03 11746.06  63974.55 18788.22  1449.88  9445.65 10243.17  6536.89  2886.58  3770.35
## 30      Kota Kediri   359.72   514.83    589.74  1382.34   357.34    48.35   360.12  1053.16   257.72    89.53
## 31      Kota Blitar    27.28     0.00    132.75  1914.09   760.00     0.00    22.50    90.07    62.90     0.00
## 32      Kota Malang   262.10   787.72    852.10   113.45   451.00   515.84   478.32   548.66   578.63   206.73
## 33 Kota Probolinggo    68.61     0.00    564.23  2070.79  1635.11   171.68     0.00     0.00    41.14   127.80
## 34    Kota Pasuruan   118.18   300.78    333.17   291.51   399.23   654.25   283.33   266.36   329.79   495.03
## 35   Kota Mojokerto     0.00   104.28      0.00   383.90   513.84    32.62   185.39   333.78   490.10   413.67
## 36      Kota Madiun     0.00     0.00   1693.11   943.40     0.00     0.00  1761.78   185.72     0.00    47.49
## 37    Kota Surabaya    56.78   567.17   1972.79   500.10   120.30    85.30   332.95   428.29   300.33    58.52
## 38        Kota Batu   396.19   420.31    178.78   207.70   318.42   510.89   362.58   401.72   205.17   252.11
##         nov      des
## 1   2300.56   218.45
## 2   5958.71 25332.77
## 3   4982.36  3680.76
## 4   4420.27  2558.67
## 5   4012.82  2008.42
## 6   3619.84  2874.46
## 7   9773.96  9386.15
## 8  12485.35 10745.68
## 9   8424.42 10144.76
## 10 16977.50 13865.72
## 11  5599.75  6725.36
## 12  3498.58  3037.76
## 13  2937.44  2385.20
## 14  9814.76  7896.41
## 15 15476.82  1004.71
## 16  2906.61  5494.43
## 17  3747.07  5731.63
## 18  9028.34  6864.81
## 19 34340.69 18477.91
## 20 18018.97  5881.24
## 21 76797.07 30889.11
## 22 17796.48 11592.78
## 23 11361.31  3126.33
## 24 21212.26  3190.63
## 25  4795.04   983.41
## 26   480.49     0.00
## 27     0.00     0.00
## 28     0.00     0.00
## 29  2052.39   283.53
## 30   284.63   590.76
## 31     0.00     0.00
## 32   640.97   783.95
## 33     0.00     0.00
## 34   583.64   156.24
## 35   112.13   336.42
## 36  1633.42   348.81
## 37   117.41   147.16
## 38   111.42   539.55
# in this step, we active two packages that important to import the map and visualize it
library(sf)
library(ggspatial)
library(readxl)
# this step aims to import the map from computer and so latlong data to be integrated
nc <- st_read(dsn = "E:/R/petajatim", layer = "jatim")
## Reading layer `jatim' from data source `E:\R\petajatim' using driver `ESRI Shapefile'
## Simple feature collection with 38 features and 17 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 110.8984 ymin: -8.780764 xmax: 116.2634 ymax: -5.527337
## CRS:           NA
latlonjatim <- read_excel("E:/R/petajatim/latlonjatim.xlsx")
# this code aims to change the column's name of latlong data from "Kako" to "NAMA_KAB"
names(latlonjatim)[names(latlonjatim) == "Kako"] <- "NAMA_KAB"

# change the region's name in nc data to be proper char
nc$NAMA_KAB <- str_to_title(nc$NAMA_KAB)

# join the nc data and latlong data
joinpeta <- left_join(nc, latlonjatim, by = "NAMA_KAB")
# Deleting "Kabupaten"
prodberas$kako <- gsub("Kabupaten ", "", prodberas$kako)

# change the pattern "Kota X" into "X (Kota)"
prodberas$kako <- gsub("Kota (\\w+)", "\\1 (Kota)", prodberas$kako)

# change column's name from "Kako" to "NAMA_KAB"
names(prodberas)[names(prodberas) == "kako"] <- "NAMA_KAB"

# join the joinpeta and prodberas
hasil <- left_join(joinpeta, prodberas, by = "NAMA_KAB")

# selecting the main column for spatial visualization
hasil <- hasil[,-c(3:17)]

# change wide table to be longer using pivot_longer()
library(tidyr)
data_pivot <- hasil %>% 
  pivot_longer(cols = -c(KODE_KAB, NAMA_KAB, geometry, lat, long), 
               names_to = "bulan", 
               values_to = "nilai")
head(data_pivot)
## Simple feature collection with 6 features and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 110.8984 ymin: -8.285555 xmax: 111.4228 ymax: -7.916109
## CRS:           NA
## # A tibble: 6 x 7
##   KODE_KAB NAMA_KAB   lat  long                                                                   geometry bulan  nilai
##      <int> <chr>    <dbl> <dbl>                                                             <MULTIPOLYGON> <chr>  <dbl>
## 1     3501 Pacitan  -8.18  111. (((111.3961 -8.193083, 111.3974 -8.196307, 111.3999 -8.199527, 111.4011 -~ jan   7.43e1
## 2     3501 Pacitan  -8.18  111. (((111.3961 -8.193083, 111.3974 -8.196307, 111.3999 -8.199527, 111.4011 -~ feb   7.71e3
## 3     3501 Pacitan  -8.18  111. (((111.3961 -8.193083, 111.3974 -8.196307, 111.3999 -8.199527, 111.4011 -~ mar   1.67e4
## 4     3501 Pacitan  -8.18  111. (((111.3961 -8.193083, 111.3974 -8.196307, 111.3999 -8.199527, 111.4011 -~ apr   6.73e3
## 5     3501 Pacitan  -8.18  111. (((111.3961 -8.193083, 111.3974 -8.196307, 111.3999 -8.199527, 111.4011 -~ may   2.92e3
## 6     3501 Pacitan  -8.18  111. (((111.3961 -8.193083, 111.3974 -8.196307, 111.3999 -8.199527, 111.4011 -~ jun   5.09e3
# change monthly format into "2022-01-01" sampai "2022-12-01"
data_pivot$bulan <- gsub("(jan|feb|mar|apr|may|jun|jul|ags|sep|okt|nov|des)", 
                         "2022-\\1-01", data_pivot$bulan)

# change the monthly name into numeric respectively
data_pivot$bulan <- gsub("jan", "01", data_pivot$bulan)
data_pivot$bulan <- gsub("feb", "02", data_pivot$bulan)
data_pivot$bulan <- gsub("mar", "03", data_pivot$bulan)
data_pivot$bulan <- gsub("apr", "04", data_pivot$bulan)
data_pivot$bulan <- gsub("may", "05", data_pivot$bulan)
data_pivot$bulan <- gsub("jun", "06", data_pivot$bulan)
data_pivot$bulan <- gsub("jul", "07", data_pivot$bulan)
data_pivot$bulan <- gsub("ags", "08", data_pivot$bulan)
data_pivot$bulan <- gsub("sep", "09", data_pivot$bulan)
data_pivot$bulan <- gsub("okt", "10", data_pivot$bulan)
data_pivot$bulan <- gsub("nov", "11", data_pivot$bulan)
data_pivot$bulan <- gsub("des", "12", data_pivot$bulan)

data_pivot$bulan <- as.POSIXct(data_pivot$bulan)
data_pivot
## Simple feature collection with 456 features and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 110.8984 ymin: -8.780764 xmax: 116.2634 ymax: -5.527337
## CRS:           NA
## # A tibble: 456 x 7
##    KODE_KAB NAMA_KAB   lat  long                                                    geometry bulan                nilai
##  *    <int> <chr>    <dbl> <dbl>                                              <MULTIPOLYGON> <dttm>               <dbl>
##  1     3501 Pacitan  -8.18  111. (((111.3961 -8.193083, 111.3974 -8.196307, 111.3999 -8.199~ 2022-01-01 00:00:00 7.43e1
##  2     3501 Pacitan  -8.18  111. (((111.3961 -8.193083, 111.3974 -8.196307, 111.3999 -8.199~ 2022-02-01 00:00:00 7.71e3
##  3     3501 Pacitan  -8.18  111. (((111.3961 -8.193083, 111.3974 -8.196307, 111.3999 -8.199~ 2022-03-01 00:00:00 1.67e4
##  4     3501 Pacitan  -8.18  111. (((111.3961 -8.193083, 111.3974 -8.196307, 111.3999 -8.199~ 2022-04-01 00:00:00 6.73e3
##  5     3501 Pacitan  -8.18  111. (((111.3961 -8.193083, 111.3974 -8.196307, 111.3999 -8.199~ 2022-05-01 00:00:00 2.92e3
##  6     3501 Pacitan  -8.18  111. (((111.3961 -8.193083, 111.3974 -8.196307, 111.3999 -8.199~ 2022-06-01 00:00:00 5.09e3
##  7     3501 Pacitan  -8.18  111. (((111.3961 -8.193083, 111.3974 -8.196307, 111.3999 -8.199~ 2022-07-01 00:00:00 6.62e3
##  8     3501 Pacitan  -8.18  111. (((111.3961 -8.193083, 111.3974 -8.196307, 111.3999 -8.199~ 2022-08-01 00:00:00 1.62e3
##  9     3501 Pacitan  -8.18  111. (((111.3961 -8.193083, 111.3974 -8.196307, 111.3999 -8.199~ 2022-09-01 00:00:00 1.23e3
## 10     3501 Pacitan  -8.18  111. (((111.3961 -8.193083, 111.3974 -8.196307, 111.3999 -8.199~ 2022-10-01 00:00:00 1.03e3
## # ... with 446 more rows
# Last step, the spatial visualization is ready to be served
library(gganimate)

x <- ggplot(data_pivot, aes(x = long, y = lat, fill = nilai)) +
  geom_sf() + scale_fill_continuous(low = "#b7ffc2", high = "#00812a") +
  ggtitle('Rice Production in Jawa Timur during January to December 2022',
          subtitle = 'source: jatim.bps.go.id | visualization: jokoding.com') +
  transition_states(states = bulan,
                    transition_length = 2, state_length = 1)
animate(x, height = 7,
        width = 9, units = "in", res = 150)
## 
Rendering [===================================================================================] at 1.1 fps ~ eta:  0s

plot of chunk unnamed-chunk-28

I think enough for this sharing, feel free to command and share this post as your support this site. Welcome to understanding and practicing!

Tidak ada komentar:

Diberdayakan oleh Blogger.