The influence of the early 2020s crises on the .cz domain
ADAM report 2/2025
To better understand how the .cz domain fared in the face of the challenges that emerged during the first half of the 2020s, I investigated whether the spread of COVID-19, the inflation rate, and the war in Ukraine were associated with the count of second-level domains under .cz, the count of Czech holders of .cz domains, and traffic under .cz. The analysis shows that all three outcomes were related, in different ways, to changes in the transmission of COVID-19 and to the rise and fall of the inflation rate in the Czech Republic. Domain and holder counts decreased during the first COVID-19 wave but increased during the second wave. Domain counts also decreased when the inflation rate was high. Traffic increased during both COVID-19 waves.
1 Introduction
The COVID-19 pandemic, increased inflation rate, and the war in Ukraine have influenced many aspects of life in the early 2020s. The goal of this report was to investigate whether any of these crises also related to trends within the .cz domain. Can we observe some associations between monthly counts of domains, counts of holders, and average traffic on one side and the spread of COVID-19, the rise and fall of inflation, and the Russo-Ukrainian war on the other?
COVID-19. One popular notion claims that the COVID-19 pandemic lockdowns forced much of social life into the online space (Zanella et al. 2024). People increasingly turned to the internet for the purposes of work, education, social life, and entertainment (Feldmann et al. 2021; Priyadarshini et al. 2022). If businesses and individuals opted to move their operations and interests online, the increased demand for online activities and presence might have influenced the count of domains in the .cz TLD. Already in April 2020, an ADAM report (Andziński et al. 2020) observed an increase in domain registrations connected to the pandemic, although it focused on second-level domains directly related to the COVID-19 pandemic. Here, I focus on the count of all second-level domains, as the pandemic probably did not prompt an increase only in domains explicitly related to COVID-19 (and because more time has passed since, allowing for such an examination). After all, many actors might have been outright forced by COVID-19 to move their interests online. Supporting this assertion, we can observe that after a period of saturation and stagnation (2017–2019), the count of second-level domains in the .cz TLD started growing once again when the COVID-19 pandemic raged most prominently (in the period between 2020 and 2022). After 2022, the domain count seems to stagnate once again.
Beside the domain count, the count of Czech holders of .cz domains shows a larger increase in 2020 than the year-on-year increases observed between 2017 and 2019.1 Additionally, traffic kept on rising throughout the years (see Figure 3), providing us with yet another perspective for the inquiry on whether the COVID-19 pandemic influenced trends in the .cz domain. Taken together, we may hypothesize that the spread of COVID-19 influenced the count of domains, the count of Czech domain holders, and traffic; namely:
H1: Transmission of COVID-19 positively associates with the domain count.
H2: Transmission of COVID-19 positively associates with the count of Czech holders.
H3: Transmission of COVID-19 positively associates with QPS.
Inflation. Global supply chain problems, increases and volatility in energy and commodity prices, the Russo-Ukrainian war, increased demand, fiscal policies reacting to the COVID-19 pandemic, and companies passing the emergent price shocks on to their customers all resulted in increased inflation (Komárek et al. 2024), which affected the Czech Republic rather harshly when compared to other European states (Czech National Bank 2022). When looking into the monthly year-on-year inflation rates, we can observe an increasing trend since summer 2021, peaking in summer 2022, and then decreasing throughout the whole year 2023, until reaching pre-2019 rates in 2024 (see Figure 5). Such increased consumer prices surely created pressure on individuals and businesses to reevaluate how they allocate resources. In turn, this pressure could have led to a diminished interest in creating new domains or renewing already registered domains, hypothetically reducing the counts of both domains and holders. As for traffic, it is less clear whether inflation should be expected to have influenced its values; however, it is worth exploring this possibility too, without stating the direction of the proposed association. This is, of course, an assertion that tries to paint a narrative for how inflation might have influenced the said variables. The price of a single .cz domain is not high enough to burden most budgets. Moreover, it is not easy to discern whether inflation itself might be the culprit or if it “merely” coincides with other trends and processes which could have been “actually” responsible for such proposed associations. Even so, it is worth investigating the inflation rate because it can provide us with a useful proxy, describing a period of socio-economic hardship and its influence on the .cz domain. With this grain of salt, it seems worthwhile to explore the following hypotheses.
H4: Inflation negatively associates with the domain count.
H5: Inflation negatively associates with the count of Czech domain holders.
H6: Inflation associates with QPS.
Russo-Ukrainian war. Lastly, although already mentioned in the previous paragraph as one of the causes of increased inflation, it might prove interesting to consider whether the Russo-Ukrainian war2 has manifested some influence on the .cz domain. In contrast to the already resolved inflation crisis, the conflict is still ongoing and may exert some influence of its own on the .cz domain. Besides causing geopolitical instability and energy crises, the Russian aggression also brought a large wave of Ukrainian war refugees seeking asylum in the Czech Republic and an information war waged largely in the online space. Consequently, new domains dedicated either to helping the Ukrainian war refugees or to disseminating information supporting either of the sides and their allies could have emerged, along with new holders and increased QPS. Another ADAM report (Quiros Segovia, Andziński, and Helebrant 2022) found a small increase in conflict-related domains around the time when the invasion began. However, whether a similar increase can be estimated over a longer period and for the total domain count remains an open question. Therefore, we can hypothesize that:
H7: The war in Ukraine positively associates with the domain count.
H8: The war in Ukraine positively associates with the count of Czech domain holders.
H9: The war in Ukraine positively associates with QPS.
Of course, the temporal dimension is important for such an analysis because the count of domains, holders, and queries naturally change over time and are determined by their own preceding historical values. Therefore, I also focused on overall temporal trends and seasonal patterns which are known to associate with domain counts (see Quiros Segovia and Řezníček 2025).
Note that most charts (or at least the focal ones) in this report are interactive—they can be zoomed, filtered (by clicking or double clicking items in the legend), and hovering the mouse cursor over datapoints reveals further information.
Sections marked by “▶ Code” can be expanded to reveal the actual R code (R Core Team 2024) used for producing the statistics, graphs, and tables.
To keep the text lighter, additional information is often collapsed within callout blocks (such as this one) which can be expanded (or collapsed) by clicking on their headers.
Code
library(dplyr)
library(readr)
library(ggplot2)
library(plotly)
library(lubridate)
library(tidyr)
library(knitr)
library(kableExtra)
library(formattable)
library(jsonlite)
library(mgcv)
library(gratia)
library(forecast)
library(forcats)
library(colorspace)
library(countrycode)
library(ggthemes)
library(scales)
library(performance)
library(purrr)
library(GGally)
library(marginaleffects)
theme_set(theme_minimal())
Code
root_endpoint <- "https://stats.nic.cz"

api_endpoint <- function(endpoint, ...) {
  pars <- list(...)
  if (length(pars) == 0) {
    return(paste0(root_endpoint, endpoint))
  }
  pstr <- mapply(function(x, y) paste(x, URLencode(toString(y)), sep = "="),
                 names(pars), pars)
  paste0(root_endpoint, endpoint, "?", paste(pstr, collapse = "&"))
}

fromJSONwithToken <- function(url, token=Sys.getenv("API_TOKEN")) {
  curl_handle <- curl::new_handle()
  if (token != "") {
    curl::handle_setheaders(curl_handle,
      "Authorization" = paste("Bearer", token)
    )
  }
  request <- curl::curl_fetch_memory(url, handle = curl_handle)
  return(jsonlite::fromJSON(rawToChar(request$content)))
}
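As a quick illustration of the URL helper above, the following sketch repeats `api_endpoint()` so that it runs on its own and prints the URLs it builds. The query parameter `zone` is purely hypothetical; whether the API accepts it is an assumption, not something this report states.

```r
# Stand-alone copy of the URL helper defined above.
root_endpoint <- "https://stats.nic.cz"

api_endpoint <- function(endpoint, ...) {
  pars <- list(...)
  if (length(pars) == 0) {
    return(paste0(root_endpoint, endpoint))
  }
  # Build "name=value" pairs, URL-encoding each value.
  pstr <- mapply(function(x, y) paste(x, utils::URLencode(toString(y)), sep = "="),
                 names(pars), pars)
  paste0(root_endpoint, endpoint, "?", paste(pstr, collapse = "&"))
}

# Without extra arguments, the endpoint is simply appended to the root.
url_plain <- api_endpoint("/fred_domains")
# `zone` is a hypothetical query parameter used only for illustration.
url_query <- api_endpoint("/fred_domains", zone = "cz")
print(url_plain)
print(url_query)
```

No request is sent here; the sketch only shows how the query string is assembled.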
2 Data
All data used in this report are available as .csv files in this report’s repository and also shown entirely in a callout block in Section 2.7.
2.1 Domains
Data on the domain count were obtained using the /fred_domains endpoint from the ADAM project’s Postgres database and operationalized as domain counts at the end of each month.
Figure 1 illustrates domain count’s overall trend, also discerning some seasonal patterns within respective years—beside the saturation of domain counts (left), we can observe the typical double-peaked seasonal pattern (right).
Code
#data imported and prepared
df_domains <- fromJSON(
  api_endpoint("/fred_domains")
) |>
  filter(zone == "cz") |>
  mutate(
    year = year(ts),
    month = month(ts),
    day = day(ts),
    ts = as.Date(ts)
  ) |>
  group_by(year, month) |>
  filter(day == max(day)) |>
  select(ts, year, month, domains) |>
  ungroup() |>
  filter(year >= 2020) |>
  filter(ts < "2024-10-01")
#and saved as a csv
write_csv(df_domains, file = "domains.csv")
Code
df_domains <- read_csv("domains.csv", show_col_types = FALSE)
Code
plot_ly(
  data = df_domains,
  type = "scatter",
  mode = "lines",
  x = ~ts,
  y = ~domains
) |>
  layout(
    xaxis = list(title = "Date"),
    yaxis = list(title = "Domain count")
  )
plot_ly(
  data = df_domains,
  type = "scatter",
  mode = "lines",
  x = ~month,
  y = ~domains,
  color = ~as.factor(year)
) |>
  layout(
    xaxis = list(title = "Month"),
    yaxis = list(title = "Domain count")
  )
2.2 Czech holders
Data on the domain holders were obtained using the /fred_holders_by_cc endpoint. Again, the values were operationalized as counts at the end of each month and only Czech holders were kept in the analysis. Figure 2 illustrates the overall trend in holder count, also discerning the seasonal patterns within respective years. While the count of holders does not seem to saturate as the count of domains does, a similar seasonal pattern is present, although a little less pronounced than in the domain count.
Code
#data imported and prepared
df_holders <- fromJSON(
  api_endpoint("/fred_holders_by_cc")
) |>
  filter(zone == "cz" & cc == "CZ") |>
  mutate(
    year = year(ts),
    month = month(ts),
    day = day(ts),
    ts = as.Date(ts)
  ) |>
  group_by(year, month) |>
  filter(day == max(day)) |>
  select(ts, year, month, holders) |>
  ungroup() |>
  filter(year >= 2020) |>
  filter(ts < "2024-10-01")
#and saved as a csv
write_csv(df_holders, file = "holders.csv")
Code
df_holders <- read_csv("holders.csv", show_col_types = FALSE)
Code
plot_ly(
  data = df_holders,
  type = "scatter",
  mode = "lines",
  x = ~ts,
  y = ~holders
) |>
  layout(
    xaxis = list(title = "Date"),
    yaxis = list(title = "CZ holder count")
  )
plot_ly(
  data = df_holders,
  type = "scatter",
  mode = "lines",
  x = ~month,
  y = ~holders,
  color = ~as.factor(year)
) |>
  layout(
    xaxis = list(title = "Month"),
    yaxis = list(title = "CZ holder count")
  )
2.3 Traffic
As for the last response variable modeled in this report, traffic data were obtained using the /cz_qps_total_1h endpoint and summarized into monthly mean values of queries per second (QPS). The data begin in September 2020, which excludes some unreliable earlier observations. Figure 3 illustrates the overall monthly trend in traffic, also discerning the seasonal patterns within respective years. The trend shows a slightly accelerating increase over time. The seasonal pattern is not as clear as for the domain and holder counts but seems to exhibit some sort of summer dip (although not in 2024).
Code
#data imported and prepared
df_qps <- fromJSON(
  api_endpoint("/cz_qps_total_1h")
) |>
  mutate(
    year = year(ts),
    month = month(ts),
    ts = as.Date(ts)
  ) |>
  group_by(year, month) |>
  reframe(
    ts = max(ts),
    year = year,
    month = month,
    qps = mean(qps)
  ) |>
  ungroup() |>
  unique() |>
  filter(ts >= "2020-09-01") |>
  filter(ts < "2024-10-01")
#and saved as a csv
write_csv(df_qps, file = "qps.csv")
Code
df_qps <- read_csv("qps.csv", show_col_types = FALSE)
Code
plot_ly(
  data = df_qps,
  type = "scatter",
  mode = "lines",
  x = ~ts,
  y = ~qps
) |>
  layout(
    xaxis = list(title = "Date"),
    yaxis = list(title = "QPS")
  )
plot_ly(
  data = df_qps,
  type = "scatter",
  mode = "lines",
  x = ~month,
  y = ~qps,
  color = ~as.factor(year)
) |>
  layout(
    xaxis = list(title = "Month"),
    yaxis = list(title = "QPS")
  )
2.4 COVID-19
For COVID-19, data by Our World in Data were used. I considered the count of new cases, the count of new deaths, and the stringency index as predictor variables, opting for new cases as they exhibited the smoothest seasonal patterns of the three (see Figure 4). Sub-figure a clearly shows two massive spikes of new COVID-19 cases in the winters of 2021 and 2022, and sub-figure b illustrates how this trend behaved seasonally, peaking in January and February (in 2021 and 2022) and gaining strength from September (in 2020 and 2021) after a seasonal decline during summer.
Code
df_covid <-
  read_csv("cz_covid.csv") |>
  select(date,
         stringency_index,
         new_cases,
         new_deaths) |>
  mutate(
    year = year(date),
    month = month(date),
    ts = as.Date(date)
  ) |>
  group_by(year, month) |>
  reframe(
    ts = max(ts),
    max_stringency = max(stringency_index),
    new_cases = sum(new_cases),
    new_deaths = sum(new_deaths)
  ) |>
  ungroup() |>
  unique()
Code
plot_ly(
  data = df_covid,
  type = "scatter",
  mode = "lines",
  x = ~ts,
  y = ~new_cases
) |>
  layout(
    xaxis = list(title = "Date"),
    yaxis = list(title = "New cases per month")
  )
plot_ly(
  data = df_covid,
  type = "scatter",
  mode = "lines",
  x = ~month,
  y = ~new_cases,
  color = ~as.factor(year)
) |>
  layout(
    xaxis = list(title = "Month"),
    yaxis = list(title = "New cases per month")
  )
2.5 Inflation
For the inflation rate, data by the Czech Statistical Office were used. I opted for the year-on-year CPI metric (the increase in CPI compared with the corresponding month of the preceding year), as it indicates the percentage change in the price level between the reference month of a given year and the same month a year earlier. Figure 5 shows inflation’s trend (and the irrelevance of seasonality).
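To make the year-on-year definition concrete, here is a minimal sketch of how such a rate can be computed from a monthly CPI index. The index values are made up for illustration; the report itself uses the rates as published by the Czech Statistical Office rather than computing them.

```r
# Hypothetical CPI index values for 24 consecutive months (illustration only).
cpi <- c(100.0, 100.2, 100.5, 100.9, 101.0, 101.3,
         101.5, 101.8, 102.0, 102.4, 102.7, 103.0,
         110.0, 111.2, 112.8, 114.5, 116.1, 117.6,
         118.0, 118.3, 119.5, 117.4, 118.9, 119.0)

# Year-on-year rate: percentage change against the same month one year earlier.
idx <- 13:24
yoy_inflation <- (cpi[idx] / cpi[idx - 12] - 1) * 100
print(round(yoy_inflation, 1))
```

Because each month is compared with the same month of the previous year, regular within-year price patterns largely cancel out, which is consistent with the lack of seasonality noted above.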
Code
df_inflation <-
  read_csv("inflation.csv") |>
  pivot_longer(
    !year, names_to = "month", values_to = "inflation"
  ) |>
  filter(year > 2018) |>
  mutate(
    ts = as.Date(paste0(year, "-", month, "-15")),
    year = as.integer(year),
    month = as.integer(month)
  ) |>
  arrange(year, month)
Code
plot_ly(
  data = df_inflation,
  type = "scatter",
  mode = "lines",
  x = ~ts,
  y = ~inflation
) |>
  layout(
    xaxis = list(title = "Date"),
    yaxis = list(title = "Inflation rate")
  )
plot_ly(
  data = df_inflation,
  type = "scatter",
  mode = "lines",
  x = ~month,
  y = ~inflation,
  color = ~as.factor(year)
) |>
  layout(
    xaxis = list(title = "Month"),
    yaxis = list(title = "Inflation rate")
  )
2.6 Russo-Ukrainian war
Lastly, the Russo-Ukrainian war was specified as a factor variable with values “Peace” (until February 23rd 2022) and “War” (since February 24th 2022).
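The coding of this factor can be sketched in base R as follows. The timestamps are hypothetical mid-month dates chosen only to show how the boundary is handled; this is an illustration rather than the report's own code.

```r
# Hypothetical timestamps around the start of the invasion (illustration only).
ts <- as.Date(c("2022-01-15", "2022-02-15", "2022-02-24", "2022-03-15"))
invasion <- as.Date("2022-02-24")

# "Peace" strictly before the invasion date, "War" from that date onward.
ukraine <- factor(ifelse(ts < invasion, "Peace", "War"),
                  levels = c("Peace", "War"))
print(table(ukraine))
```

Fixing the level order keeps “Peace” as the reference category when the factor enters a model as a parametric term.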
Code
df_crises <- df_qps |>
  left_join(df_domains, by = c("year", "month")) |>
  left_join(df_holders, by = c("year", "month")) |>
  left_join(df_covid, by = c("year", "month")) |>
  left_join(df_inflation, by = c("year", "month")) |>
  select(!c(ts.x, ts.y, ts.x.x, ts.y.y)) #drop duplicates
Code
df_crises <- df_crises |>
  mutate(
    ukraine =
      case_when(ts < "2022-02-24" ~ "Peace",
                ts >= "2022-02-24" ~ "War"), #invasion day itself counts as "War"
    ukraine = as.factor(ukraine),
    year_f = as.factor(year),
    max_stringency =
      if_else(is.na(max_stringency), 0, max_stringency),
    new_cases =
      if_else(is.na(new_cases), 0, new_cases),
    new_deaths =
      if_else(is.na(new_deaths), 0, new_deaths),
    time = seq(from = 1,
               to = nrow(df_crises),
               by = 1),
    new_cases = new_cases/1000 #turn into thousands
  ) |>
  ungroup()
2.7 The dataset
The entire dataset is available for inspection in the callout block below. Note that new cases of COVID-19 were recalculated into thousands.
Code
df_crises |>
  mutate(date = paste0(year, "-", month)) |>
  select(date, time,
         domains,
         holders,
         qps,
         new_cases,
         inflation,
         ukraine) |>
  kable(
    "html",
    digits = 2,
    col.names = c(
      "Date", "Time",
      "Domains",
      "CZ holders",
      "QPS",
      "New COVID cases",
      "Inflation",
      "War in Ukraine"
    )
  )
Date | Time | Domains | CZ holders | QPS | New COVID cases | Inflation | War in Ukraine |
---|---|---|---|---|---|---|---|
2020-9 | 1 | 1368781 | 654791 | 14894.78 | 39.25 | 3.2 | Peace |
2020-10 | 2 | 1364093 | 654611 | 13928.01 | 187.94 | 2.9 | Peace |
2020-11 | 3 | 1369032 | 656968 | 14671.87 | 269.55 | 2.7 | Peace |
2020-12 | 4 | 1370804 | 658071 | 15398.89 | 154.10 | 2.3 | Peace |
2021-1 | 5 | 1378195 | 661960 | 16585.94 | 318.62 | 2.2 | Peace |
2021-2 | 6 | 1387674 | 666320 | 16859.40 | 254.11 | 2.1 | Peace |
2021-3 | 7 | 1395295 | 669351 | 16143.66 | 284.34 | 2.3 | Peace |
2021-4 | 8 | 1399309 | 670642 | 14679.03 | 106.44 | 3.1 | Peace |
2021-5 | 9 | 1401016 | 670961 | 15334.59 | 42.92 | 2.9 | Peace |
2021-6 | 10 | 1401323 | 670946 | 15342.71 | 5.71 | 2.8 | Peace |
2021-7 | 11 | 1400490 | 670716 | 14853.23 | 5.57 | 3.4 | Peace |
2021-8 | 12 | 1403622 | 671094 | 15348.25 | 6.54 | 4.1 | Peace |
2021-9 | 13 | 1408476 | 672685 | 15507.51 | 10.86 | 4.9 | Peace |
2021-10 | 14 | 1413755 | 674090 | 15915.06 | 75.39 | 5.8 | Peace |
2021-11 | 15 | 1420857 | 675185 | 15788.99 | 369.83 | 6.0 | Peace |
2021-12 | 16 | 1424131 | 675342 | 15405.15 | 331.06 | 6.6 | Peace |
2022-1 | 17 | 1432454 | 676575 | 16097.25 | 617.10 | 9.9 | Peace |
2022-2 | 18 | 1441347 | 679048 | 15724.20 | 686.04 | 11.1 | Peace |
2022-3 | 19 | 1444213 | 679567 | 15897.77 | 253.42 | 12.7 | War |
2022-4 | 20 | 1442541 | 675817 | 15838.66 | 127.79 | 14.2 | War |
2022-5 | 21 | 1439607 | 674849 | 16318.36 | 28.96 | 16.0 | War |
2022-6 | 22 | 1438268 | 674863 | 16551.80 | 11.57 | 17.2 | War |
2022-7 | 23 | 1437182 | 674511 | 15555.57 | 75.01 | 17.5 | War |
2022-8 | 24 | 1440667 | 675363 | 17635.59 | 72.11 | 17.2 | War |
2022-9 | 25 | 1445559 | 677033 | 17168.33 | 81.06 | 18.0 | War |
2022-10 | 26 | 1452935 | 678306 | 17176.03 | 94.90 | 15.1 | War |
2022-11 | 27 | 1462843 | 679512 | 17342.93 | 21.35 | 16.2 | War |
2022-12 | 28 | 1463116 | 679025 | 17265.51 | 21.78 | 15.8 | War |
2023-1 | 29 | 1463084 | 680624 | 17330.86 | 11.38 | 17.5 | War |
2023-2 | 30 | 1470108 | 683378 | 17950.18 | 19.24 | 16.7 | War |
2023-3 | 31 | 1473706 | 684943 | 18565.98 | 22.70 | 15.0 | War |
2023-4 | 32 | 1468610 | 684686 | 19183.52 | 9.57 | 12.7 | War |
2023-5 | 33 | 1466240 | 684678 | 18901.38 | 1.64 | 11.1 | War |
2023-6 | 34 | 1467104 | 684544 | 18791.62 | 0.68 | 9.7 | War |
2023-7 | 35 | 1466822 | 683683 | 17519.86 | 0.34 | 8.8 | War |
2023-8 | 36 | 1468764 | 684997 | 18867.83 | 0.98 | 8.5 | War |
2023-9 | 37 | 1471167 | 686788 | 18599.48 | 5.38 | 6.9 | War |
2023-10 | 38 | 1472772 | 688821 | 19236.23 | 16.18 | 8.5 | War |
2023-11 | 39 | 1474194 | 690459 | 20969.48 | 24.73 | 7.3 | War |
2023-12 | 40 | 1468788 | 690326 | 19866.98 | 56.28 | 6.9 | War |
2024-1 | 41 | 1467715 | 692502 | 20806.68 | 9.49 | 2.3 | War |
2024-2 | 42 | 1471419 | 695908 | 20948.74 | 2.18 | 2.0 | War |
2024-3 | 43 | 1472117 | 696938 | 20194.06 | 0.66 | 2.0 | War |
2024-4 | 44 | 1469049 | 697046 | 19044.95 | 0.27 | 2.9 | War |
2024-5 | 45 | 1467388 | 697560 | 20573.34 | 0.27 | 2.6 | War |
2024-6 | 46 | 1465440 | 697782 | 21473.08 | 0.00 | 2.0 | War |
2024-7 | 47 | 1464980 | 698143 | 21833.11 | 0.00 | 2.2 | War |
2024-8 | 48 | 1465976 | 699412 | 21685.01 | 0.00 | 2.2 | War |
2024-9 | 49 | 1468911 | 701506 | 21624.75 | 0.00 | 2.6 | War |
3 Models
In this section, models estimating associations with domain counts are presented first (Section 3.1), followed by models focused on holder counts (Section 3.2), and then models focused on the traffic (Section 3.3). For all, the same modelling approach was used, although some models are hidden in collapsed callout blocks (available for the keen readers). The domain counts were modeled using the negative-binomial distribution, the domain holders with the Poisson distribution, and the traffic with the normal distribution. All models were fit using the mgcv package’s gamm function, which provides tools for fitting generalized additive mixed models (“Mgcv: Mixed GAM Computation Vehicle with Automatic Smoothness Estimation” 2000, ver 1.9-1; Wood 2017; Simpson 2018a, 2018b; Pedersen et al. 2019) (see callout block below).
Generalized additive models (GAMs) are often portrayed to be situated in a middle ground between interpretable but often inflexible linear models and flexible but black-boxish machine learning models. GAMs can be used to model nonlinear relationships (overcoming limits of linear models) while still providing inferential statistics and explanatory insights (avoiding the black-box nature of predictions made by machine learning models).
To capture these non-linear relationships, GAMs use smooth functions which are functions that are composed of smaller basis functions. While the smaller basis functions capture smaller fractions of the relationships, they add up into the bigger smooth function, which is in turn able to describe nonlinear relationships between the variables. In effect, the associations estimated by GAMs wiggle as the size of the relationship between variables need not be linear.
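The idea that basis functions add up into a smooth can be checked directly. The sketch below fits a small GAM to simulated data (not the report's data) and reconstructs the fitted curve as a weighted sum of the basis functions; `predict(..., type = "lpmatrix")` and `coef()` are standard mgcv and base R calls.

```r
library(mgcv)

set.seed(1)
# Simulated data with a nonlinear signal (illustration only).
x <- seq(0, 1, length.out = 200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)
m <- gam(y ~ s(x, k = 10), method = "REML")

# Each column of the linear predictor matrix is one basis function
# evaluated at the data (the first column is the intercept).
X <- predict(m, type = "lpmatrix")
beta <- coef(m)

# Weighted basis functions, one column per coefficient.
weighted_basis <- sweep(X, 2, beta, `*`)

# Summing the weighted basis functions row-wise reproduces the fitted values.
fit_from_basis <- rowSums(weighted_basis)
print(all.equal(as.numeric(fit_from_basis), as.numeric(predict(m))))
```

Plotting the columns of `weighted_basis` against `x` would show the individual pieces whose sum forms the wiggly curve.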
Additionally, generalized additive mixed models (GAMMs) offer further modeling possibilities. In this report, it is the specification of correlation structures which helps account for residual autocorrelations in the data that have not been accounted for by the smooths.
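As a minimal sketch of what such a correlation structure looks like in practice, the example below fits a GAMM with a first-order autoregressive (AR(1)) structure to simulated data. This is a deliberately simpler setup than the ARMA structures used for the models in this report, and all data and names are assumptions made for illustration.

```r
library(mgcv)  # gamm(); the correlation structure comes from nlme

set.seed(42)
n <- 120
time <- seq_len(n)
# Smooth trend plus AR(1) errors (simulated, illustration only).
e <- as.numeric(arima.sim(model = list(ar = 0.6), n = n, sd = 0.5))
d <- data.frame(time = time, y = 3 * sin(time / 20) + e)

# GAMM with a smooth trend and an AR(1) correlation structure on the residuals.
m <- gamm(y ~ s(time), data = d,
          correlation = nlme::corAR1(form = ~ time),
          method = "REML")

# gamm() returns two parts: $gam (the smooths) and $lme (incl. the correlation).
print(summary(m$gam))
phi <- coef(m$lme$modelStruct$corStruct, unconstrained = FALSE)
print(phi)
```

The estimated `phi` describes how strongly neighbouring residuals depend on each other once the smooth trend has been accounted for.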
For an introduction on GAMs, an interactive course by Noam Ross or an introductory text by Michael Clark are recommended. Furthermore, introductory lectures by Noam Ross and Gavin Simpson are also freely available.
3.1 Domains
To set some initial model specification, gamm_domains_1 estimates the domain count by a seasonal pattern (months within a year), an overall monthly trend, (thousands of) new COVID-19 cases, the rate of inflation, and whether the Russo-Ukrainian war was ongoing. I also specified a varying intercept for the respective years and a correlation structure accounting for the autocorrelation of observations. However, this initial model does not yet interact the focal predictors with the temporal variables (providing a simpler perspective which sets the ground for the following models); therefore, all its results are hidden within the collapsed callout blocks.
Code
gamm_domains_1 <- gamm(
  domains ~
    s(month, k = 12, bs = "cc") +
    s(time) +
    s(new_cases) +
    s(inflation) +
    ukraine,
  random = list(year_f = ~ 1),
  correlation =
    corARMA(form = ~ time,
            p = 2,
            q = 2),
  data = df_crises,
  family = nb,
  method = "REML"
)
saveRDS(gamm_domains_1, file = "gamm_domains_1.rds")
Code
gamm_domains_1 <- readRDS(file = "gamm_domains_1.rds")
3.1.1 Domains - GAMM 2
The model gamm_domains_2 estimates the domain count by a seasonal pattern (months within a year), an overall monthly trend, (thousands of) new COVID-19 cases, and whether the Russo-Ukrainian war was ongoing. Furthermore, I specified an interaction term between the monthly trend and new COVID-19 cases. I also specified a varying intercept for the respective years and a correlation structure accounting for the autocorrelation of observations. Due to concurvity issues, the inflation rate was dropped from the model (and is modeled separately in gamm_domains_3 below).
Concurvity is a generalization of collinearity to the framework of generalized additive models. Similarly to collinearity issues within the generalized linear models, concurvity describes a computational issue within a generalized additive model when one smooth term can be approximated by other smooth terms. Concurvity is estimated on a range from 0 (no overlap between the smooths) to 1 (complete overlap between the smooth functions). As stated by Simon Wood in the mgcv documentation, concurvity often becomes an issue when “… a smooth of space is included in a model, along with smooths of other covariates that also vary more or less smoothly in space. Similarly it tends to be an issue in models including a smooth of time, along with smooths of other time varying covariates. Concurvity can be viewed as a generalization of co-linearity, and causes similar problems of interpretation. It can also make estimates somewhat unstable (so that they become sensitive to apparently innocuous modelling details, for example).”
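For readers who want to see how concurvity can be inspected, here is a small sketch using mgcv's `concurvity()` function on simulated data in which a covariate drifts smoothly over time. The data and variable names are assumptions for illustration; the values reported for the models in this report are not reproduced here.

```r
library(mgcv)

set.seed(7)
n <- 100
time <- seq_len(n)
# A covariate that itself drifts smoothly over time (simulated, illustration only).
x <- sin(time / 15) + rnorm(n, sd = 0.05)
y <- 0.02 * time + 2 * x + rnorm(n, sd = 0.3)

m <- gam(y ~ s(time) + s(x), method = "REML")

# Rows: worst-case, observed, and estimated concurvity; columns: model terms.
cc <- concurvity(m, full = TRUE)
print(round(cc, 2))
```

Values near 1 for a term indicate that it can be largely reproduced by the remaining terms, which is the situation that led to dropping the inflation rate from the model above.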
Code
gamm_domains_2 <- gamm(
  domains ~
    s(month, k = 12, bs = "cc") +
    ti(time) +
    ti(new_cases) +
    ti(time, new_cases, k = c(5, 5)) +
    ukraine,
  random = list(year_f = ~ 1),
  correlation =
    corARMA(form = ~ time,
            p = 4,
            q = 0),
  data = df_crises,
  family = nb,
  method = "REML"
)
saveRDS(gamm_domains_2, file = "gamm_domains_2.rds")
Code
gamm_domains_2 <- readRDS(file = "gamm_domains_2.rds")
Code
fig <- draw(
  gamm_domains_2$gam, residuals = TRUE, select = 1)
ggplotly(fig)
fig <- draw(
  gamm_domains_2$gam, residuals = TRUE, select = 2)
ggplotly(fig) |>
  layout(
    xaxis = list(tickvals = ~ ticks$time,
                 ticktext = ~ ticks$ticktext)
  )
fig <- draw(
  gamm_domains_2$gam, residuals = TRUE, select = 3)
ggplotly(fig)
We can observe the typical double-peaked seasonal pattern and the monthly temporal trend plateauing since mid-2023. The smooth for new cases was significant, estimating a diminishing positive association with the domain count below some 97 thousand new cases and an increasingly negative one above. The term for the Russo-Ukrainian war was insignificant. The newly introduced interaction term was significant and is plotted in more detail below in Figure 9.
Code
#' Get CIs excluding zero and paste positive and negative associations to new columns to facilitate plotting of interaction terms fitted by mgcv GAMs.
#'
#' Works only for interaction terms as it is useful to highlight such areas when building a graph for smooth interaction terms.
#'
#' @param df is the df returned by gratia::smooth_estimates(), already containing CIs added with add_confint().
#' @param term is either "s", "te", or "ti" term used to specify the interaction in the GAM.
#' @param var1 is the first predictor in the interaction, e.g. "month".
#' @param var2 is the second predictor in the interaction, e.g. "new_cases".
get_ci_areas <- function(df, term, var1, var2){
  df |>
    mutate(
      var1_l_ci_neg = case_when(
        .smooth == paste0(term, "(", var1, ",", var2, ")") &
          .lower_ci < 0 &
          .upper_ci < 0
        ~ .lower_ci),
      var1_u_ci_neg = case_when(
        .smooth == paste0(term, "(", var1, ",", var2, ")") &
          .lower_ci < 0 &
          .upper_ci < 0
        ~ .upper_ci),
      var1_sig_neg = case_when(
        !is.na(var1_l_ci_neg) &
          !is.na(var1_u_ci_neg)
        ~ eval(parse(text = var1))
      ),
      var2_sig_neg = case_when(
        !is.na(var1_l_ci_neg) &
          !is.na(var1_u_ci_neg)
        ~ eval(parse(text = var2))
      ),
      var1_l_ci_pos = case_when(
        .smooth == paste0(term, "(", var1, ",", var2, ")") &
          .lower_ci > 0 &
          .upper_ci > 0
        ~ .lower_ci),
      var1_u_ci_pos = case_when(
        .smooth == paste0(term, "(", var1, ",", var2, ")") &
          .lower_ci > 0 &
          .upper_ci > 0
        ~ .upper_ci),
      var1_sig_pos = case_when(
        !is.na(var1_l_ci_pos) &
          !is.na(var1_u_ci_pos)
        ~ eval(parse(text = var1))
      ),
      var2_sig_pos = case_when(
        !is.na(var1_l_ci_pos) &
          !is.na(var1_u_ci_pos)
        ~ eval(parse(text = var2))
      )
    )
}
Code
gam_smooth <-
  smooth_estimates(gamm_domains_2$gam) |>
  add_confint() |>
  filter(.smooth == "ti(time,new_cases)")
gam_smooth <- get_ci_areas(gam_smooth, "ti", "time", "new_cases")
Code
ticks <- df_crises |>
  select(time, year, month) |>
  mutate(
    ticktext = paste0(year, "-", month)
  ) |>
  slice(which(row_number() %% 10 == 1))
plot_ly(
  data = gam_smooth,
  type = "contour",
  x = ~time,
  y = ~new_cases,
  z = ~.estimate,
  contours = list(
    coloring = "heatmap",
    showlabels = TRUE
  ),
  colors = "RdBu",
  reversescale = TRUE,
  lines = list(color = "black"),
  hoverinfo = "text",
  text = paste0(
    "<b>New cases</b>: ",
    round(gam_smooth$new_cases, 1),
    "<br>",
    "<b>Estimate</b>: ",
    round(gam_smooth$.estimate, 4),
    "<br>",
    "<b>95% CI</b> (",
    round(gam_smooth$.lower_ci, 4),
    ", ",
    round(gam_smooth$.upper_ci, 4),
    ")"
  )
) |>
  add_trace(
    name = "Historical",
    x = ~ df_crises$time,
    y = ~ df_crises$new_cases,
    type = "scatter",
    mode = "lines",
    #marker = list(color = "black"),
    line = list(color = "black"),
    hoverinfo = "text",
    text = paste0(
      "<b>Month</b>: ",
      round(df_crises$month, 0),
      " (", df_crises$year, ")",
      "<br>",
      "<b>New cases</b>: ",
      round(df_crises$new_cases, 1)
    )
  ) |>
  add_trace(
    name = "Positive CI",
    type = "scatter",
    mode = "markers",
    x = ~ gam_smooth$var1_sig_pos,
    y = ~ gam_smooth$var2_sig_pos,
    marker = list(color = "black",
                  opacity = 0.1,
                  symbol = "triangle-up-open"),
    showlegend = TRUE#,
    #visible = "legendonly"
  ) |>
  add_trace(
    name = "Negative CI",
    type = "scatter",
    mode = "markers",
    x = ~ gam_smooth$var1_sig_neg,
    y = ~ gam_smooth$var2_sig_neg,
    marker = list(color = "black",
                  opacity = 0.1,
                  symbol = "triangle-down-open"),
    showlegend = TRUE#,
    #visible = "legendonly"
  ) |>
  layout(
    title = "Associations with domain count",
    xaxis = list(title = "Time",
                 tickvals = ~ ticks$time,
                 ticktext = ~ ticks$ticktext),
    yaxis = list(title = "New cases (thousands)")
  ) |>
  colorbar(title = "Estimate",
           limits = c(-max(abs(gam_smooth$.estimate)),
                      max(abs(gam_smooth$.estimate))))
The interaction’s association was estimated positive around the peak of the pandemic (and also later since October 2022 but mostly far from the historically observed new COVID-19 cases denoted by the black line). A triangle-like area of negative association was estimated between some 70 thousand and 325 thousand new cases between September 2020 and December 2021 (and another top-right area far from any observations).
It is true that this report contains many charts describing results of many models. This can be overwhelming and hard to make sense of. However, when reporting the results of the models, I always follow the same approach by which I try to describe (A) diagnostics (Did the model fit well?), (B) partial smooth effects (Are there associations? How large?), and (C) predictions (What values can we predict from the estimated associations?).
Collapsed in the Results sections, the first four charts describe how well the models fit the data. These model diagnostic plots are not of primary interest but it makes sense to be open about them for those interested. Furthermore, I also report correlograms (ACF and partial ACF) which show the autocorrelation of residuals. In time series models, it is important to check whether the residuals are significantly correlated (which they should not be) to ensure that the errors in the model predictions do not exhibit strong patterns of dependence on each other. Put simply, the spikes in these charts should not cross the significance levels denoted by the blue dotted lines. If the spikes remain within these levels, the residuals are relatively independent, which suggests that the model captures the underlying data structure adequately. In most models in this report, autocorrelations were observed; therefore, I specified correlation structures which usually resolved these dependencies sufficiently.
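The logic of reading a correlogram can be sketched with two simulated residual series, one independent and one autocorrelated. The series and the lag range are assumptions for illustration; the bounds use the usual large-sample approximation of ±1.96/√n.

```r
set.seed(123)
n <- 200
# Two hypothetical residual series: independent noise and AR(1)-dependent noise.
res_white <- rnorm(n)
res_ar1 <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))

# Approximate 95% significance bounds drawn on a correlogram.
bound <- qnorm(0.975) / sqrt(n)

# Autocorrelations at lags 1..10 (element 1 of $acf is lag 0, so drop it).
acf_white <- acf(res_white, lag.max = 10, plot = FALSE)$acf[-1]
acf_ar1 <- acf(res_ar1, lag.max = 10, plot = FALSE)$acf[-1]

# Number of lags whose spike crosses the bounds in each series.
print(c(white = sum(abs(acf_white) > bound), ar1 = sum(abs(acf_ar1) > bound)))
```

Independent residuals should cross the bounds rarely (about one lag in twenty by chance), whereas the autocorrelated series crosses them clearly at short lags, which is the pattern a correlation structure is meant to remove.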
I always plot smooths estimated by the GAMMs (e.g., Figure 6). These are partial effects on the link scale (the domain and holder GAMMs are fitted using the negative binomial and Poisson distributions and use a log-link; the traffic GAMMs are fitted using the Gaussian distribution and therefore do not need any transformation when fitting). These plots show the individual component effect of a smooth function on the link scale, conditional on all other terms in the model being set to zero. These plots, then, help us interpret whether and how the predictors associate with the response variable. Below zero, the association is negative (i.e., it decreases the value of the response); above zero, the association is positive. Keep in mind that in GAMMs, the size (or we can say “strength”) of these associations is not constant. Instead, GAMMs allow these associations to vary (wiggle) depending on the values of the predictor variables. In other words, the size of the association might vary as the value of the predictor changes. Note that I also plot the interaction terms in further detail (e.g., Figure 9) as they are focal in this report.
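Because the domain and holder models use a log link, a partial effect translates to the response scale multiplicatively rather than additively. The following sketch shows the arithmetic with made-up numbers; neither the baseline nor the effect size is taken from the fitted models.

```r
# Hypothetical numbers for illustration; they are not taken from the fitted models.
baseline_domains <- 1440000   # expected count when the partial effect is zero
partial_effect <- 0.005       # partial effect of a smooth on the log (link) scale

# With a log link, a partial effect acts multiplicatively on the response scale.
multiplier <- exp(partial_effect)
expected_domains <- baseline_domains * multiplier

print(multiplier)                            # relative change
print(expected_domains - baseline_domains)   # absolute change in domains
```

The same partial effect therefore corresponds to a larger absolute change when the baseline count is higher, which is why the predictions are easier to interpret in practical terms than the partial effects themselves.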
Finally, I always plot predictions fitted from the smooths using the historically observed data. Note that I predict these values (a) from all terms included in the GAMMs, and also (b) from all terms except the interaction terms. Plotting both (a) and (b) lets us inspect how much the interaction terms add to the predictions, as we can see the difference between the two. Note that I predict only from the historically observed values of the predictors, as I’m interested in how well the models captured these data. In the end, these predictions provide us with practical values: not the often unintelligible partial effects (part B), which may be on a log scale and do not translate linearly to the original scale, but, for example, predicted domain counts.
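The comparison of predictions (a) and (b) can be sketched on simulated data. The example uses `predict()` from mgcv on a plain `gam()` rather than the `fitted_values()` call and the GAMMs used in this report; the data frame `sim`, its columns, and all object names are hypothetical.

```r
library(mgcv)

set.seed(1)
n <- 300
sim <- data.frame(time = runif(n), cases = runif(n))

# Simulated counts whose log-mean includes a time-by-cases interaction.
sim$y <- rpois(
  n,
  exp(3 + 0.5 * sim$time + 0.3 * sim$cases + 1.5 * sim$time * sim$cases)
)

fit <- gam(y ~ ti(time) + ti(cases) + ti(time, cases),
           family = poisson, data = sim, method = "REML")

# Per-term contributions on the link (log) scale; the intercept is stored
# in the "constant" attribute of the returned matrix.
terms_link <- predict(fit, type = "terms")
intercept  <- attr(terms_link, "constant")

main_terms <- c("ti(time)", "ti(cases)")
all_terms  <- c(main_terms, "ti(time,cases)")

# (a) all terms and (b) all terms except the interaction,
# both back-transformed from the log scale to counts.
pred_all   <- exp(intercept + rowSums(terms_link[, all_terms]))
pred_noint <- exp(intercept + rowSums(terms_link[, main_terms]))

# Contribution of the interaction term in practical units (counts).
head(round(pred_all - pred_noint, 1))
```

The difference between the two prediction vectors is the kind of quantity that statements such as "the model predicts more domains when the interaction is included" refer to.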
To restate, the procedure is to answer the following questions: (A) Has the model worked properly? (B) Has the model found any associations? (C) How should we understand the associations in comprehensible, practical values?
In Figure 10 below, we can inspect a time series showing predictions made by the model and compare them to the historically observed domain counts. Importantly, the plot contains two traces for the predictions: one that represents predictions made from all the terms in the model (“All”) and one excluding the interaction term (“No interaction”). This allows us to inspect the practical impact of the associations described in Figure 9 above, as the plot shows how much the domain count predictions change once we take the interaction term into account. We can see that the interaction term improves predictions during the COVID-19 peaks (i.e., the predictions are closer to the historical observations than when we exclude the interaction term), as the model predicts fewer domains during the first wave and more domains during the second wave when the interaction is included. For example, in February 2022 (peak of the second COVID-19 wave), including the interaction term leads the model to predict 7 600 more domains than when it is excluded, providing a better prediction.
Code
fitted_noint <- fitted_values(
  gamm_domains_2,
  data = df_crises,
  terms = c(
    "(Intercept)",
    "ti(new_cases)",
    "ti(time)",
    "s(month)",
    "ukraine"
  )
)

fitted_int <- fitted_values(
  gamm_domains_2,
  data = df_crises,
  terms = c(
    "(Intercept)",
    "ti(new_cases)",
    "ti(time)",
    "ti(time,new_cases)",
    "s(month)",
    "ukraine"
  )
)

plot_ly(
  name = "No interaction",
  data = fitted_noint,
  x = ~ts,
  y = ~.fitted,
  type = "scatter",
  mode = "lines",
  line = list(color = "#fc8d62",
              dash = "dot"),
  hovertemplate = paste0(
    fitted_noint$year, "-", fitted_noint$month, "<br>",
    format(fitted_noint$.fitted, big.mark = " ", scientific = FALSE),
    " domains", "<br>",
    "95% CI [",
    format(fitted_noint$.lower_ci, big.mark = " ", scientific = FALSE),
    ", ",
    format(fitted_noint$.upper_ci, big.mark = " ", scientific = FALSE),
    "]"
  )
) |>
  add_ribbons(
    name = "No interaction CI",
    x = ~ts,
    ymin = fitted_noint$.lower_ci,
    ymax = fitted_noint$.upper_ci,
    line = list(color = "#fc8d62"),
    fillcolor = "#fc8d62",
    opacity = 0.05,
    hovertemplate = paste0(
      fitted_noint$year, "-", fitted_noint$month, "<br>",
      "95% CI [",
      format(fitted_noint$.lower_ci, big.mark = " ", scientific = FALSE),
      ", ",
      format(fitted_noint$.upper_ci, big.mark = " ", scientific = FALSE),
      "]"
    )
  ) |>
  add_trace(
    name = "All",
    x = ~ts,
    y = ~fitted_int$.fitted,
    line = list(color = "#8da0cb",
                dash = "dot"),
    hovertemplate = paste0(
      fitted_noint$year, "-", fitted_int$month, "<br>",
      format(fitted_int$.fitted, big.mark = " ", scientific = FALSE),
      " domains", "<br>",
      "95% CI [",
      format(fitted_int$.lower_ci, big.mark = " ", scientific = FALSE),
      ", ",
      format(fitted_int$.upper_ci, big.mark = " ", scientific = FALSE),
      "]"
    )
  ) |>
  add_ribbons(
    name = "All CI",
    x = ~ts,
    ymin = fitted_int$.lower_ci,
    ymax = fitted_int$.upper_ci,
    line = list(color = "#8da0cb"),
    fillcolor = "#8da0cb",
    opacity = 0.05,
    hovertemplate = paste0(
      fitted_noint$year, "-", fitted_int$month, "<br>",
      "95% CI [",
      format(fitted_int$.lower_ci, big.mark = " ", scientific = FALSE),
      ", ",
      format(fitted_int$.upper_ci, big.mark = " ", scientific = FALSE),
      "]"
    )
  ) |>
  add_trace(
    name = "Historical",
    x = ~ts,
    y = ~domains,
    line = list(color = "#66c2a5",
                dash = "full"),
    hovertemplate = paste0(
      fitted_noint$year, "-", fitted_noint$month, "<br>",
      format(fitted_noint$domains, big.mark = " ", scientific = FALSE), " domains"
    )
  ) |>
  layout(
    xaxis = list(
      title = "Date"
    ),
    yaxis = list(
      title = "Domains"
    )
  )
3.1.2 Domains - GAMM 3
In gamm_domains_3, I dropped the focus on the new COVID-19 cases and instead specified an interaction term between the inflation rate and the monthly temporal trend.
Code
gamm_domains_3 <- gamm(
  domains ~
    s(month, k = 12, bs = "cc") +
    ti(time) +
    ti(inflation) +
    ti(time, inflation, k = c(10, 5)) +
    ukraine,
  random = list(year_f = ~ 1),
  correlation =
    corARMA(form = ~ time,
            p = 1,
            q = 0),
  data = df_crises,
  family = nb,
  method = "REML"
)
saveRDS(gamm_domains_3, file = "gamm_domains_3.rds")
Code
gamm_domains_3 <- readRDS(file = "gamm_domains_3.rds")
Code
fig <- draw(
  gamm_domains_3$gam, residuals = TRUE, select = 1)
ggplotly(fig)

fig <- draw(
  gamm_domains_3$gam, residuals = TRUE, select = 2)
ggplotly(fig) |>
  layout(
    xaxis = list(tickvals = ~ ticks$time,
                 ticktext = ~ ticks$ticktext)
  )

fig <- draw(
  gamm_domains_3$gam, residuals = TRUE, select = 3)
ggplotly(fig)
With the exception of the term for the Russo-Ukrainian war, all the terms were significant.
Code
gam_smooth <-
  smooth_estimates(gamm_domains_3$gam) |>
  add_confint() |>
  filter(.smooth == "ti(time,inflation)")

gam_smooth <- get_ci_areas(gam_smooth, "ti", "time", "inflation")
Code
plot_ly(
  data = gam_smooth,
  type = "contour",
  x = ~time,
  y = ~inflation,
  z = ~.estimate,
  contours = list(
    coloring = "heatmap",
    showlabels = TRUE
  ),
  colors = "RdBu",
  reversescale = TRUE,
  lines = list(color = "black"),
  hoverinfo = "text",
  text = paste0(
    "<b>Inflation</b>: ",
    round(gam_smooth$inflation, 1),
    "<br>",
    "<b>Estimate</b>: ",
    round(gam_smooth$.estimate, 4),
    "<br>",
    "<b>95% CI</b> (",
    round(gam_smooth$.lower_ci, 4),
    ", ",
    round(gam_smooth$.upper_ci, 4),
    ")"
  )
) |>
  add_trace(
    name = "Historical",
    x = ~ df_crises$time,
    y = ~ df_crises$inflation,
    type = "scatter",
    mode = "lines",
    line = list(color = "black"),
    hoverinfo = "text",
    text = paste0(
      "<b>Month</b>: ",
      round(df_crises$month, 0),
      " (", df_crises$year, ")",
      "<br>",
      "<b>Inflation</b>: ",
      round(df_crises$inflation, 1)
    )
  ) |>
  add_trace(
    name = "Positive CI",
    type = "scatter",
    mode = "markers",
    x = ~ gam_smooth$var1_sig_pos,
    y = ~ gam_smooth$var2_sig_pos,
    marker = list(color = "black",
                  opacity = 0.1,
                  symbol = "triangle-up-open"),
    showlegend = TRUE #,
    # visible = "legendonly"
  ) |>
  add_trace(
    name = "Negative CI",
    type = "scatter",
    mode = "markers",
    x = ~ gam_smooth$var1_sig_neg,
    y = ~ gam_smooth$var2_sig_neg,
    marker = list(color = "black",
                  opacity = 0.1,
                  symbol = "triangle-down-open"),
    showlegend = TRUE #,
    # visible = "legendonly"
  ) |>
  layout(
    title = "Associations with domain count",
    xaxis = list(title = "Time",
                 tickvals = ~ ticks$time,
                 ticktext = ~ ticks$ticktext),
    yaxis = list(title = "Inflation")
  ) |>
  colorbar(title = "Estimate",
           limits = c(-max(abs(gam_smooth$.estimate)),
                      max(abs(gam_smooth$.estimate))))
In Figure 12, we can observe that the model estimates significant negative associations between the domain count and the inflation rate. The associations are located around the peak of the inflation rate and at the beginning and the end of the investigated period. The model also estimates some positive associations, but these areas seem far from the historical observations.
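The "Positive CI" and "Negative CI" markers in these contour plots flag grid points where the 95% confidence interval of the interaction smooth excludes zero. The helper `get_ci_areas()` used in the report is not shown in this section; the function `flag_ci_areas()` below is a hypothetical stand-in, applied to a toy data frame with made-up values, that only illustrates the idea.

```r
# Toy grid of smooth estimates with confidence limits (illustrative values only).
grid <- data.frame(
  time      = c(1, 2, 3, 4),
  inflation = c(2, 6, 12, 16),
  .lower_ci = c(0.01, -0.02, -0.05, -0.01),
  .upper_ci = c(0.04,  0.03, -0.01,  0.02)
)

# Hypothetical helper: keep coordinates only where the interval excludes zero.
flag_ci_areas <- function(d, var1, var2) {
  pos <- d$.lower_ci > 0   # whole interval above zero
  neg <- d$.upper_ci < 0   # whole interval below zero
  d$var1_sig_pos <- ifelse(pos, d[[var1]], NA)
  d$var2_sig_pos <- ifelse(pos, d[[var2]], NA)
  d$var1_sig_neg <- ifelse(neg, d[[var1]], NA)
  d$var2_sig_neg <- ifelse(neg, d[[var2]], NA)
  d
}

flagged <- flag_ci_areas(grid, "time", "inflation")
flagged[, c("time", "var1_sig_pos", "var1_sig_neg")]
```

Grid points whose interval spans zero receive `NA` in all four columns and are therefore not drawn as markers.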
In Figure 13 below, we can see that including the interaction term seems important for predicting domain counts during the period of the highest inflation (between March 2022 and June 2023), and also at the start and the end of the investigated period. Indeed, when the interaction term is included, the model predicts 22 027 fewer domains in July 2022 (as an example) than when it is excluded, tracking the historical observations noticeably better.
Code
fitted_noint <- fitted_values(
  gamm_domains_3,
  data = df_crises,
  terms = c(
    "(Intercept)",
    "ti(inflation)",
    "ti(time)",
    "s(month)",
    "ukraine"
  )
)

fitted_int <- fitted_values(
  gamm_domains_3,
  data = df_crises,
  terms = c(
    "(Intercept)",
    "ti(inflation)",
    "ti(time)",
    "ti(time,inflation)",
    "s(month)",
    "ukraine"
  )
)

plot_ly(
  name = "No interaction",
  data = fitted_noint,
  x = ~ts,
  y = ~.fitted,
  type = "scatter",
  mode = "lines",
  line = list(color = "#fc8d62",
              dash = "dot"),
  hovertemplate = paste0(
    fitted_noint$year, "-", fitted_noint$month, "<br>",
    format(fitted_noint$.fitted, big.mark = " ", scientific = FALSE),
    " domains", "<br>",
    "95% CI [",
    format(fitted_noint$.lower_ci, big.mark = " ", scientific = FALSE),
    ", ",
    format(fitted_noint$.upper_ci, big.mark = " ", scientific = FALSE),
    "]"
  )
) |>
  add_ribbons(
    name = "No interaction CI",
    x = ~ts,
    ymin = fitted_noint$.lower_ci,
    ymax = fitted_noint$.upper_ci,
    line = list(color = "#fc8d62"),
    fillcolor = "#fc8d62",
    opacity = 0.05,
    hovertemplate = paste0(
      fitted_noint$year, "-", fitted_noint$month, "<br>",
      "95% CI [",
      format(fitted_noint$.lower_ci, big.mark = " ", scientific = FALSE),
      ", ",
      format(fitted_noint$.upper_ci, big.mark = " ", scientific = FALSE),
      "]"
    )
  ) |>
  add_trace(
    name = "All",
    x = ~ts,
    y = ~fitted_int$.fitted,
    line = list(color = "#8da0cb",
                dash = "dot"),
    hovertemplate = paste0(
      fitted_noint$year, "-", fitted_int$month, "<br>",
      format(fitted_int$.fitted, big.mark = " ", scientific = FALSE),
      " domains", "<br>",
      "95% CI [",
      format(fitted_int$.lower_ci, big.mark = " ", scientific = FALSE),
      ", ",
      format(fitted_int$.upper_ci, big.mark = " ", scientific = FALSE),
      "]"
    )
  ) |>
  add_ribbons(
    name = "All CI",
    x = ~ts,
    ymin = fitted_int$.lower_ci,
    ymax = fitted_int$.upper_ci,
    line = list(color = "#8da0cb"),
    fillcolor = "#8da0cb",
    opacity = 0.05,
    hovertemplate = paste0(
      fitted_noint$year, "-", fitted_int$month, "<br>",
      "95% CI [",
      format(fitted_int$.lower_ci, big.mark = " ", scientific = FALSE),
      ", ",
      format(fitted_int$.upper_ci, big.mark = " ", scientific = FALSE),
      "]"
    )
  ) |>
  add_trace(
    name = "Historical",
    x = ~ts,
    y = ~domains,
    line = list(color = "#66c2a5",
                dash = "full"),
    hovertemplate = paste0(
      fitted_noint$year, "-", fitted_noint$month, "<br>",
      format(fitted_noint$domains, big.mark = " ", scientific = FALSE), " domains"
    )
  ) |>
  layout(
    xaxis = list(
      title = "Date"
    ),
    yaxis = list(
      title = "Domains"
    )
  )
Code
AIC(gamm_domains_1$lme,
    gamm_domains_2$lme,
    gamm_domains_3$lme)
                   df       AIC
gamm_domains_1$lme 15 -469.2533
gamm_domains_2$lme 16 -466.1264
gamm_domains_3$lme 13 -436.1309
Code
BIC(gamm_domains_1$lme,
    gamm_domains_2$lme,
    gamm_domains_3$lme)
                   df       BIC
gamm_domains_1$lme 15 -440.8760
gamm_domains_2$lme 16 -435.8573
gamm_domains_3$lme 13 -411.5372
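To make the comparison easier to read, the reported values can be expressed as differences from the lowest value of each criterion. This is a small sketch, assuming the usual convention that lower AIC and BIC values are preferred; the data frame `ic` and its column names are introduced here for illustration only.

```r
# AIC and BIC values as reported in the output above.
ic <- data.frame(
  model = c("gamm_domains_1", "gamm_domains_2", "gamm_domains_3"),
  aic   = c(-469.2533, -466.1264, -436.1309),
  bic   = c(-440.8760, -435.8573, -411.5372)
)

# Difference from the lowest value of each criterion.
ic$delta_aic <- ic$aic - min(ic$aic)
ic$delta_bic <- ic$bic - min(ic$bic)

# Models ordered from the lowest to the highest AIC.
ic[order(ic$aic), ]
```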
3.1.3 Discussion: Domains and 2020s crises
Two hypotheses focused on the domain count (H1 and H4) found support in models which utilized terms interacting the new COVID-19 cases and inflation rate with the monthly trend (see Section 3.1.1 and Section 3.1.2). However, the interactions for the new COVID-19 cases also estimated associations suggesting the opposite. Such divergences require contemplation.
Regarding H1 (Transmission of COVID-19 positively associates with the domain count.), model gamm_domains_2 (Section 3.1.1) estimated both positive and negative associations between the count of domains and the interaction between new COVID-19 cases and the monthly trend. The positive association suggests that during the peak of the COVID-19 pandemic (winter 2022), there was a significant increase in the number of domains. However, during the first wave, the interaction term estimated a negative relationship, showing that an increase in new COVID-19 cases also associated with a decrease in the count of domains. Taken together, while the negative association found in the interaction term suggests a decrease in domains specific to the first COVID-19 wave (i.e., a slowdown of the domain market, as COVID-19 has caused in many other areas; see Czech National Bank (2020)), the interaction term also reveals a short-lived increase specific to the second COVID-19 wave. Perhaps this increase during the second wave of the pandemic was a result of necessity, as more individuals and companies opted (or were finally forced) to create domains to meet their needs. Alternatively, individuals and companies might have been better prepared to face the challenges posed by the pandemic during the second wave, moving their interests online more proficiently compared to the first wave. Conversely, the pressure might not have been large enough during the first wave (or the confusion was too large), possibly motivating rather austere and restrained measures, as it was not yet known what effects the pandemic would have on day-to-day life.
Regarding H4 (Inflation negatively associates with the domain count.), the modeling results estimate that there were fewer domains when the inflation rate reached its highest values, between 12.7% and 18%, in the period between April 2022 and April 2023 (based on the interaction term in gamm_domains_3, Section 3.1.2). For example, the model predicts 22 027 fewer domains in July 2022 than when the interaction term is excluded from making the predictions. The results thus suggest that the domain count has indeed suffered a notable decrease during this period of socio-economic hardship.3
As for H7 (The war in Ukraine positively associates with the domain count.), little support was found in the models. Once interaction terms were introduced to the models, the term remained insignificant, and the only time it was estimated as significant (model gamm_domains_1), it showed the opposite direction to the one proposed in the hypothesis.
The models also observed the typical double-peaked pattern for the seasonal variation, peaking in March and November, which can be attributed to the vacation patterns of Czech citizens (see our previous report, Quiros Segovia and Řezníček 2025). Lastly, the monthly temporal trend smooths captured the slow-down in the domain count’s trajectory (since 2023), accounting for the temporal inertia and saturation in domain counts (however, this slow-down was not estimated in gamm_domains_3).
3.2 Holders
Next, I present results of the models predicting the count of Czech domain holders. With the exception of using the Poisson distribution, specifying different correlation structures and knots for the interaction terms, the models are the same as in the domains section; therefore, I do not describe the specifications and only report the results.
Code
gamm_holders_1 <- gamm(
  holders ~
    s(month, k = 12, bs = "cc") +
    s(time) +
    s(new_cases) +
    s(inflation) +
    ukraine,
  random = list(year_f = ~ 1),
  correlation =
    corARMA(form = ~ time,
            p = 2,
            q = 1),
  data = df_crises,
  family = poisson,
  method = "REML",
  control =
    nlmeControl(maxIter = 1e8,
                msMaxIter = 1e8,
                msMaxEval = 1e8,
                msVerbose = FALSE,
                optimMethod = "L-BFGS-B")
)
saveRDS(gamm_holders_1, file = "gamm_holders_1.rds")
Code
gamm_holders_1 <- readRDS(file = "gamm_holders_1.rds")
3.2.1 Holders - GAMM 2
The second model found a significant association of holder counts with the interaction between the monthly trend and new COVID-19 cases. The main effect for the monthly trend and the seasonal pattern were also significant (other terms were insignificant).
Code
gamm_holders_2 <- gamm(
  holders ~
    s(month, k = 12, bs = "cc") +
    ti(time) +
    ti(new_cases) +
    ti(time, new_cases, k = c(10, 6)) +
    ukraine,
  random = list(year_f = ~ 1),
  correlation =
    corARMA(form = ~ time,
            p = 1,
            q = 1),
  data = df_crises,
  family = poisson,
  method = "REML",
  control =
    nlmeControl(maxIter = 1e8,
                msMaxIter = 1e8,
                msMaxEval = 1e8,
                msVerbose = FALSE,
                optimMethod = "L-BFGS-B")
)
saveRDS(gamm_holders_2, file = "gamm_holders_2.rds")
Code
fig <- draw(
  gamm_holders_2$gam, residuals = TRUE, select = 1)
ggplotly(fig)

fig <- draw(
  gamm_holders_2$gam, residuals = TRUE, select = 2)
ggplotly(fig) |>
  layout(
    xaxis = list(tickvals = ~ ticks$time,
                 ticktext = ~ ticks$ticktext)
  )

fig <- draw(
  gamm_holders_2$gam, residuals = TRUE, select = 3)
ggplotly(fig)
In Figure 16 above, the smooth for the seasonal pattern once again estimates a double-peaked association with a clear dip during summer (although the autumn peak does not seem different from zero; sub-figure a). The monthly trend shows an overall increase, with a period of slowdown somewhere between November 2021 and January 2023 (sub-figure b). The smooth for the new COVID-19 cases was insignificant.
Code
gam_smooth <-
  smooth_estimates(gamm_holders_2$gam) |>
  add_confint() |>
  filter(.smooth == "ti(time,new_cases)")

gam_smooth <- get_ci_areas(gam_smooth, "ti", "time", "new_cases")
Code
plot_ly(
  data = gam_smooth,
  type = "contour",
  x = ~time,
  y = ~new_cases,
  z = ~.estimate,
  contours = list(
    coloring = "heatmap",
    showlabels = TRUE
  ),
  colors = "RdBu",
  reversescale = TRUE,
  lines = list(color = "black"),
  hoverinfo = "text",
  text = paste0(
    "<b>New cases</b>: ",
    round(gam_smooth$new_cases, 1),
    "<br>",
    "<b>Estimate</b>: ",
    round(gam_smooth$.estimate, 4),
    "<br>",
    "<b>95% CI</b> (",
    round(gam_smooth$.lower_ci, 4),
    ", ",
    round(gam_smooth$.upper_ci, 4),
    ")"
  )
) |>
  add_trace(
    name = "Historical",
    x = ~ df_crises$time,
    y = ~ df_crises$new_cases,
    type = "scatter",
    mode = "lines",
    line = list(color = "black"),
    hoverinfo = "text",
    text = paste0(
      "<b>Month</b>: ",
      round(df_crises$month, 0),
      " (", df_crises$year, ")",
      "<br>",
      "<b>New cases</b>: ",
      round(df_crises$new_cases, 1)
    )
  ) |>
  add_trace(
    name = "Positive CI",
    type = "scatter",
    mode = "markers",
    x = ~ gam_smooth$var1_sig_pos,
    y = ~ gam_smooth$var2_sig_pos,
    marker = list(color = "black",
                  opacity = 0.1,
                  symbol = "triangle-up-open"),
    showlegend = TRUE #,
    # visible = "legendonly"
  ) |>
  add_trace(
    name = "Negative CI",
    type = "scatter",
    mode = "markers",
    x = ~ gam_smooth$var1_sig_neg,
    y = ~ gam_smooth$var2_sig_neg,
    marker = list(color = "black",
                  opacity = 0.1,
                  symbol = "triangle-down-open"),
    showlegend = TRUE #,
    # visible = "legendonly"
  ) |>
  layout(
    title = "Associations with holder count",
    xaxis = list(title = "Time",
                 tickvals = ~ ticks$time,
                 ticktext = ~ ticks$ticktext),
    yaxis = list(title = "New cases")
  ) |>
  colorbar(title = "Estimate")
The interaction term (Figure 17 above) estimates a positive association around March 2022, right after the peak of new COVID-19 cases. The smooth also estimates two areas with negative associations. The first one is located between October and December 2020, the second one between May and August 2022.
Below, Figure 18 shows that when the interaction term is included, the model predicts 2 134 more holders in March 2022 than when it is excluded. Note that Figure 17 also suggests two areas of negative association.
Code
fitted_noint <- fitted_values(
  gamm_holders_2,
  data = df_crises,
  terms = c(
    "(Intercept)",
    "ti(new_cases)",
    "ti(time)",
    "s(month)",
    "ukraine"
  )
)

fitted_int <- fitted_values(
  gamm_holders_2,
  data = df_crises,
  terms = c(
    "(Intercept)",
    "ti(new_cases)",
    "ti(time)",
    "ti(time,new_cases)",
    "s(month)",
    "ukraine"
  )
)

plot_ly(
  name = "No interaction",
  data = fitted_noint,
  x = ~ts,
  y = ~.fitted,
  type = "scatter",
  mode = "lines",
  line = list(color = "#fc8d62",
              dash = "dot"),
  hovertemplate = paste0(
    fitted_noint$year, "-", fitted_noint$month, "<br>",
    format(fitted_noint$.fitted, big.mark = " ", scientific = FALSE),
    " holders", "<br>",
    "95% CI [",
    format(fitted_noint$.lower_ci, big.mark = " ", scientific = FALSE),
    ", ",
    format(fitted_noint$.upper_ci, big.mark = " ", scientific = FALSE),
    "]"
  )
) |>
  add_ribbons(
    name = "No interaction CI",
    x = ~ts,
    ymin = fitted_noint$.lower_ci,
    ymax = fitted_noint$.upper_ci,
    line = list(color = "#fc8d62"),
    fillcolor = "#fc8d62",
    opacity = 0.05,
    hovertemplate = paste0(
      fitted_noint$year, "-", fitted_noint$month, "<br>",
      "95% CI [",
      format(fitted_noint$.lower_ci, big.mark = " ", scientific = FALSE),
      ", ",
      format(fitted_noint$.upper_ci, big.mark = " ", scientific = FALSE),
      "]"
    )
  ) |>
  add_trace(
    name = "All",
    x = ~ts,
    y = ~fitted_int$.fitted,
    line = list(color = "#8da0cb",
                dash = "dot"),
    hovertemplate = paste0(
      fitted_noint$year, "-", fitted_int$month, "<br>",
      format(fitted_int$.fitted, big.mark = " ", scientific = FALSE),
      " holders", "<br>",
      "95% CI [",
      format(fitted_int$.lower_ci, big.mark = " ", scientific = FALSE),
      ", ",
      format(fitted_int$.upper_ci, big.mark = " ", scientific = FALSE),
      "]"
    )
  ) |>
  add_ribbons(
    name = "All CI",
    x = ~ts,
    ymin = fitted_int$.lower_ci,
    ymax = fitted_int$.upper_ci,
    line = list(color = "#8da0cb"),
    fillcolor = "#8da0cb",
    opacity = 0.05,
    hovertemplate = paste0(
      fitted_noint$year, "-", fitted_int$month, "<br>",
      "95% CI [",
      format(fitted_int$.lower_ci, big.mark = " ", scientific = FALSE),
      ", ",
      format(fitted_int$.upper_ci, big.mark = " ", scientific = FALSE),
      "]"
    )
  ) |>
  add_trace(
    name = "Historical",
    x = ~ts,
    y = ~holders,
    line = list(color = "#66c2a5",
                dash = "full"),
    hovertemplate = paste0(
      fitted_noint$year, "-", fitted_noint$month, "<br>",
      format(fitted_noint$holders, big.mark = " ", scientific = FALSE), " holders"
    )
  ) |>
  layout(
    xaxis = list(
      title = "Date"
    ),
    yaxis = list(
      title = "Holders"
    )
  )
3.2.2 Holders - GAMM 3
Lastly, the third model found significant associations between the holder count and the monthly trend, the seasonal pattern, and the interaction between the monthly trend and the inflation rate.
Code
gamm_holders_3 <- gamm(
  holders ~
    s(month, k = 12, bs = "cc") +
    ti(time) +
    ti(inflation) +
    ti(time, inflation, k = c(10, 5)) +
    ukraine,
  random = list(year_f = ~ 1),
  correlation =
    corARMA(form = ~ time,
            p = 2,
            q = 1),
  data = df_crises,
  family = poisson,
  method = "REML",
  control =
    nlmeControl(maxIter = 1e8,
                msMaxIter = 1e8,
                msMaxEval = 1e8,
                msVerbose = FALSE,
                optimMethod = "L-BFGS-B")
)
saveRDS(gamm_holders_3, file = "gamm_holders_3.rds")
Code
gamm_holders_3 <- readRDS(file = "gamm_holders_3.rds")
Code
fig <- draw(
  gamm_holders_3$gam, residuals = TRUE, select = 1)
ggplotly(fig)

fig <- draw(
  gamm_holders_3$gam, residuals = TRUE, select = 2)
ggplotly(fig) |>
  layout(
    xaxis = list(tickvals = ~ ticks$time,
                 ticktext = ~ ticks$ticktext)
  )

fig <- draw(
  gamm_holders_3$gam, residuals = TRUE, select = 3)
ggplotly(fig)
In Figure 19 above, we can observe the usual temporal smooths. In Figure 20 below, the model estimates two positive and two negative areas. With the inflation rate below 8.3% and until mid-2022, the relationship is positive. Then, when inflation surges above 12.3%, the relationship becomes negative until September 2022. However, after the inflation rate peaks, the relationship suddenly turns positive again and remains different from zero above 12.3% inflation rate. Then, upon falling below 8.3% inflation rate, the relationship turns negative for the rest of the tested period.
Code
gam_smooth <-
  smooth_estimates(gamm_holders_3$gam) |>
  add_confint() |>
  filter(.smooth == "ti(time,inflation)")

gam_smooth <- get_ci_areas(gam_smooth, "ti", "time", "inflation")
Code
plot_ly(
  data = gam_smooth,
  type = "contour",
  x = ~time,
  y = ~inflation,
  z = ~.estimate,
  contours = list(
    coloring = "heatmap",
    showlabels = TRUE
  ),
  colors = "RdBu",
  reversescale = TRUE,
  lines = list(color = "black"),
  hoverinfo = "text",
  text = paste0(
    "<b>Inflation</b>: ",
    round(gam_smooth$inflation, 1),
    "<br>",
    "<b>Estimate</b>: ",
    round(gam_smooth$.estimate, 4),
    "<br>",
    "<b>95% CI</b> (",
    round(gam_smooth$.lower_ci, 4),
    ", ",
    round(gam_smooth$.upper_ci, 4),
    ")"
  )
) |>
  add_trace(
    name = "Historical",
    x = ~ df_crises$time,
    y = ~ df_crises$inflation,
    type = "scatter",
    mode = "lines",
    line = list(color = "black"),
    hoverinfo = "text",
    text = paste0(
      "<b>Month</b>: ",
      round(df_crises$month, 0),
      " (", df_crises$year, ")",
      "<br>",
      "<b>Inflation</b>: ",
      round(df_crises$inflation, 1)
    )
  ) |>
  add_trace(
    name = "Positive CI",
    type = "scatter",
    mode = "markers",
    x = ~ gam_smooth$var1_sig_pos,
    y = ~ gam_smooth$var2_sig_pos,
    marker = list(color = "black",
                  opacity = 0.1,
                  symbol = "triangle-up-open"),
    showlegend = TRUE #,
    # visible = "legendonly"
  ) |>
  add_trace(
    name = "Negative CI",
    type = "scatter",
    mode = "markers",
    x = ~ gam_smooth$var1_sig_neg,
    y = ~ gam_smooth$var2_sig_neg,
    marker = list(color = "black",
                  opacity = 0.1,
                  symbol = "triangle-down-open"),
    showlegend = TRUE #,
    # visible = "legendonly"
  ) |>
  layout(
    title = "Associations with holder count",
    xaxis = list(title = "Time",
                 tickvals = ~ ticks$time,
                 ticktext = ~ ticks$ticktext),
    yaxis = list(title = "Inflation")
  ) |>
  colorbar(title = "Estimate",
           limits = c(-max(abs(gam_smooth$.estimate)),
                      max(abs(gam_smooth$.estimate))))
In Figure 21 below, predictions are plotted.
Code
fitted_noint <- fitted_values(
  gamm_holders_3,
  data = df_crises,
  terms = c(
    "(Intercept)",
    "ti(inflation)",
    "ti(time)",
    "s(month)",
    "ukraine"
  )
)

fitted_int <- fitted_values(
  gamm_holders_3,
  data = df_crises,
  terms = c(
    "(Intercept)",
    "ti(inflation)",
    "ti(time)",
    "ti(time,inflation)",
    "s(month)",
    "ukraine"
  )
)

plot_ly(
  name = "No interaction",
  data = fitted_noint,
  x = ~ts,
  y = ~.fitted,
  type = "scatter",
  mode = "lines",
  line = list(color = "#fc8d62",
              dash = "dot"),
  hovertemplate = paste0(
    fitted_noint$year, "-", fitted_noint$month, "<br>",
    format(fitted_noint$.fitted, big.mark = " ", scientific = FALSE),
    " holders", "<br>",
    "95% CI [",
    format(fitted_noint$.lower_ci, big.mark = " ", scientific = FALSE),
    ", ",
    format(fitted_noint$.upper_ci, big.mark = " ", scientific = FALSE),
    "]"
  )
) |>
  add_ribbons(
    name = "No interaction CI",
    x = ~ts,
    ymin = fitted_noint$.lower_ci,
    ymax = fitted_noint$.upper_ci,
    line = list(color = "#fc8d62"),
    fillcolor = "#fc8d62",
    opacity = 0.05,
    hovertemplate = paste0(
      fitted_noint$year, "-", fitted_noint$month, "<br>",
      "95% CI [",
      format(fitted_noint$.lower_ci, big.mark = " ", scientific = FALSE),
      ", ",
      format(fitted_noint$.upper_ci, big.mark = " ", scientific = FALSE),
      "]"
    )
  ) |>
  add_trace(
    name = "All",
    x = ~ts,
    y = ~fitted_int$.fitted,
    line = list(color = "#8da0cb",
                dash = "dot"),
    hovertemplate = paste0(
      fitted_noint$year, "-", fitted_int$month, "<br>",
      format(fitted_int$.fitted, big.mark = " ", scientific = FALSE),
      " holders", "<br>",
      "95% CI [",
      format(fitted_int$.lower_ci, big.mark = " ", scientific = FALSE),
      ", ",
      format(fitted_int$.upper_ci, big.mark = " ", scientific = FALSE),
      "]"
    )
  ) |>
  add_ribbons(
    name = "All CI",
    x = ~ts,
    ymin = fitted_int$.lower_ci,
    ymax = fitted_int$.upper_ci,
    line = list(color = "#8da0cb"),
    fillcolor = "#8da0cb",
    opacity = 0.05,
    hovertemplate = paste0(
      fitted_noint$year, "-", fitted_int$month, "<br>",
      "95% CI [",
      format(fitted_int$.lower_ci, big.mark = " ", scientific = FALSE),
      ", ",
      format(fitted_int$.upper_ci, big.mark = " ", scientific = FALSE),
      "]"
    )
  ) |>
  add_trace(
    name = "Historical",
    x = ~ts,
    y = ~holders,
    line = list(color = "#66c2a5",
                dash = "full"),
    hovertemplate = paste0(
      fitted_noint$year, "-", fitted_noint$month, "<br>",
      format(fitted_noint$holders, big.mark = " ", scientific = FALSE), " holders"
    )
  ) |>
  layout(
    xaxis = list(
      title = "Date"
    ),
    yaxis = list(
      title = "Holders"
    )
  )
Code
AIC(gamm_holders_1$lme,
    gamm_holders_2$lme,
    gamm_holders_3$lme)
                   df       AIC
gamm_holders_1$lme 13 -472.4934
gamm_holders_2$lme 13 -481.8192
gamm_holders_3$lme 14 -478.3854
Code
BIC(gamm_holders_1$lme,
    gamm_holders_2$lme,
    gamm_holders_3$lme)
                   df       BIC
gamm_holders_1$lme 13 -447.8997
gamm_holders_2$lme 13 -457.2256
gamm_holders_3$lme 14 -451.9000
3.2.3 Discussion: Holders and 2020s crises
Regarding the holder counts, the models estimated associations which support the proposed hypotheses. At the same time, they paint a more complex picture than envisioned, as they also suggest associations going in contrary directions (similarly to the domain models), albeit with insignificant main effect smooths for the new COVID-19 cases and the inflation rate (unlike the domain models).
When inspecting model gamm_holders_2 (Section 3.2.1), the interaction term between the monthly trend and new COVID-19 cases supports hypothesis H2 (Transmission of COVID-19 positively associates with the count of Czech holders.) around March 2022, one month after the peak of the second COVID-19 wave. Furthermore, the very same interaction term also predicts a decrease of Czech holders at the beginning of the first wave. In sum, the holder count seems to be associated with the intensity of COVID-19 transmission depending on time: the results suggest a dip in holder counts at the beginning of the first wave but an increase in holder counts right after the peak of the second wave. Similarly to the domain counts, the speculation about austerity, precaution, and confusion during the first wave and necessity or preparedness during the second wave might explain these changes.
Model gamm_holders_3 (Section 3.2.2) finds somewhat perplexing results concerning hypothesis H5 (Inflation negatively associates with the count of Czech domain holders.). The interaction term between the monthly trend and the inflation rate suggests associations not only in both directions, but also reversing their directions at low and high inflation rates. Following the historically observed inflation values, the results suggest that when the inflation rate was low at the beginning of the investigated period, we should expect more domain holders (implicitly supporting H5). Then, once the inflation rate goes above some 12.3%, the association becomes negative, suggesting that high inflation values associate negatively with holder counts (supporting H5). However, the association suddenly becomes positive in the middle of the inflation peak, contrary to H5. Such a reversal seems confusing. Furthermore, as inflation gradually decreases, the positive association becomes uncertain and turns negative once the rate falls below 8.3%. Again, such a development seems confusing. In sum, while the part of the investigated period with rising inflation seems to support hypothesis H5, the part with falling inflation suggests the exact opposite. Why the associations were estimated inversely while inflation was falling remains unclear.
No support was found for the hypothesis H8 (The war in Ukraine positively associates with the count of Czech domain holders.).
Lastly, the count of holders was estimated to vary seasonally, following the typical double-peaked shape. The count of holders was also estimated to gradually increase over time, with a period of slowdown around the year 2022.
3.3 Traffic
Finally, I finish the modelling with three models estimating the traffic in the .cz domain. I fully comment only on the second model as results in the first one are not too interesting and the third model’s results seem confusing.
Code
gamm_qps_1 <- gamm(
  qps ~
    s(month, k = 12, bs = "cc") +
    s(time) +
    s(new_cases) +
    s(inflation) +
    ukraine,
  random = list(year_f = ~ 1),
  correlation =
    corARMA(form = ~ time,
            p = 1,
            q = 0),
  data = df_crises,
  family = gaussian,
  method = "REML"
)
saveRDS(gamm_qps_1, file = "gamm_qps_1.rds")
Code
gamm_qps_1 <- readRDS(file = "gamm_qps_1.rds")
3.3.1 Traffic - GAMM 2
Model gamm_qps_2 estimates traffic by a seasonal pattern (months within a year), an overall monthly trend, (thousands of) new COVID-19 cases, whether the Russo-Ukrainian war was ongoing, and by an interaction term between the monthly temporal trend and new COVID-19 cases. As before, I also specified a varying intercept for the respective years and a correlation structure accounting for the autocorrelation of observations.
Code
gamm_qps_2 <- gamm(
  qps ~
    s(month, k = 12, bs = "cc") +
    ti(time) +
    ti(new_cases) +
    ti(time, new_cases, k = c(10, 5)) +
    ukraine,
  random = list(year_f = ~ 1),
  correlation =
    corARMA(form = ~ time,
            p = 2,
            q = 2),
  data = df_crises,
  family = gaussian,
  method = "REML",
  control =
    nlmeControl(maxIter = 1e8,
                msMaxIter = 1e8,
                msMaxEval = 1e8,
                msVerbose = FALSE,
                optimMethod = "L-BFGS-B")
)
saveRDS(gamm_qps_2, file = "gamm_qps_2.rds")
Code
gamm_qps_2 <- readRDS(file = "gamm_qps_2.rds")
Code
fig <- draw(
  gamm_qps_2$gam, residuals = TRUE, select = 1)
ggplotly(fig)

fig <- draw(
  gamm_qps_2$gam, residuals = TRUE, select = 2)
ggplotly(fig) |>
  layout(
    xaxis = list(tickvals = ~ ticks$time,
                 ticktext = ~ ticks$ticktext)
  )

fig <- draw(
  gamm_qps_2$gam, residuals = TRUE, select = 3)
ggplotly(fig)
Monthly averages for QPS values do not seem to be associated with any seasonal pattern (while the smooth wiggles somewhat in the typical shape, it remains uncertain throughout the entire year). Nevertheless, QPS has been estimated to increase with the monthly trend, with the positive association gradually growing stronger. For the main effect term on the association with the new COVID-19 cases, the model estimated a diminishing positive association below some 97 thousand cases and an increasingly negative association above (although fewer observations lie therein, making the estimates increasingly less certain). The term for the Russo-Ukrainian war was insignificant.
Code
gam_smooth <-
  smooth_estimates(gamm_qps_2$gam) |>
  add_confint() |>
  filter(.smooth == "ti(time,new_cases)")
gam_smooth <- get_ci_areas(gam_smooth, "ti", "time", "new_cases")
Code
plot_ly(
  data = gam_smooth,
  type = "contour",
  x = ~time,
  y = ~new_cases,
  z = ~.estimate,
  colors = "RdBu",
  reversescale = TRUE,
  contours = list(
    coloring = "heatmap",
    showlabels = TRUE
  ),
  lines = list(color = "black"),
  hoverinfo = "text",
  text = paste0(
    "<b>Month</b>: ",
    round(gam_smooth$month, 0),
    "<br>",
    "<b>New cases</b>: ",
    round(gam_smooth$new_cases, 1),
    "<br>",
    "<b>Estimate</b>: ",
    round(gam_smooth$.estimate, 4),
    "<br>",
    "<b>95% CI</b> (",
    round(gam_smooth$.lower_ci, 4),
    ", ",
    round(gam_smooth$.upper_ci, 4),
    ")"
  )
) |>
  add_trace(
    name = "Historical",
    x = ~ df_crises$time,
    y = ~ df_crises$new_cases,
    type = "scatter",
    mode = "lines",
    line = list(color = "black"),
    hoverinfo = "text",
    text = paste0(
      "<b>Month</b>: ",
      round(df_crises$month, 0),
      " (", df_crises$year, ")",
      "<br>",
      "<b>New cases</b>: ",
      round(df_crises$new_cases, 1)
    )
  ) |>
  add_trace(
    name = "Positive CI",
    type = "scatter",
    mode = "markers",
    x = ~ gam_smooth$var1_sig_pos,
    y = ~ gam_smooth$var2_sig_pos,
    marker = list(color = "black",
                  opacity = 0.1,
                  symbol = "triangle-up-open")#,
    #visible = "legendonly"
  ) |>
  add_trace(
    name = "Negative CI",
    type = "scatter",
    mode = "markers",
    x = ~ gam_smooth$var1_sig_neg,
    y = ~ gam_smooth$var2_sig_neg,
    marker = list(color = "black",
                  opacity = 0.1,
                  symbol = "triangle-down-open")#,
    #visible = "legendonly"
  ) |>
  layout(
    title = "Associations with QPS",
    xaxis = list(title = "Time",
                 tickvals = ~ ticks$time,
                 ticktext = ~ ticks$ticktext),
    yaxis = list(title = "New cases")
  ) |>
  colorbar(title = "Estimate")
In Figure 25 above, the associations between the QPS values and the interaction of the monthly trend with new COVID-19 cases are estimated as significant across the entire surface of the plot. Notably, the smooth estimates the largest positive associations precisely at the peaks of the COVID-19 pandemic (January 2021 and January and February 2022). The interaction term also estimates a negative relationship for the period between September 2020 and September 2022 at fewer than 90 thousand new COVID-19 cases, which is where the historical observations from the seasonal summer dips lie.
In Figure 26 below, we can see that the model predicts some 2000 more QPS during both the first and the second wave when the interaction term is included.
Code
fitted_noint <- fitted_values(
  gamm_qps_2,
  data = df_crises,
  terms = c(
    "(Intercept)",
    "ti(new_cases)",
    "ti(time)",
    "s(month)",
    "ukraine"
  )
)
fitted_int <- fitted_values(
  gamm_qps_2,
  data = df_crises,
  terms = c(
    "(Intercept)",
    "ti(new_cases)",
    "ti(time)",
    "ti(time,new_cases)",
    "s(month)",
    "ukraine"
  )
)

plot_ly(
  name = "No interaction",
  data = fitted_noint,
  x = ~ts,
  y = ~.fitted,
  type = "scatter",
  mode = "lines",
  line = list(color = "#fc8d62",
              dash = "dot"),
  hovertemplate = paste0(
    fitted_noint$year, "-", fitted_noint$month, "<br>",
    format(fitted_noint$.fitted, big.mark = " ", scientific = FALSE),
    " QPS", "<br>",
    "95% CI [",
    format(fitted_noint$.lower_ci, big.mark = " ", scientific = FALSE),
    ", ",
    format(fitted_noint$.upper_ci, big.mark = " ", scientific = FALSE),
    "]"
  )
) |>
  add_ribbons(
    name = "No interaction CI",
    x = ~ts,
    ymin = fitted_noint$.lower_ci,
    ymax = fitted_noint$.upper_ci,
    line = list(color = "#fc8d62"),
    fillcolor = "#fc8d62",
    opacity = 0.05,
    hovertemplate = paste0(
      fitted_noint$year, "-", fitted_noint$month, "<br>",
      "95% CI [",
      format(fitted_noint$.lower_ci, big.mark = " ", scientific = FALSE),
      ", ",
      format(fitted_noint$.upper_ci, big.mark = " ", scientific = FALSE),
      "]"
    )
  ) |>
  add_trace(
    name = "All",
    x = ~ts,
    y = ~fitted_int$.fitted,
    line = list(color = "#8da0cb",
                dash = "dot"),
    hovertemplate = paste0(
      fitted_noint$year, "-", fitted_int$month, "<br>",
      format(fitted_int$.fitted, big.mark = " ", scientific = FALSE),
      " QPS", "<br>",
      "95% CI [",
      format(fitted_int$.lower_ci, big.mark = " ", scientific = FALSE),
      ", ",
      format(fitted_int$.upper_ci, big.mark = " ", scientific = FALSE),
      "]"
    )
  ) |>
  add_ribbons(
    name = "All CI",
    x = ~ts,
    ymin = fitted_int$.lower_ci,
    ymax = fitted_int$.upper_ci,
    line = list(color = "#8da0cb"),
    fillcolor = "#8da0cb",
    opacity = 0.05,
    hovertemplate = paste0(
      fitted_noint$year, "-", fitted_int$month, "<br>",
      "95% CI [",
      format(fitted_int$.lower_ci, big.mark = " ", scientific = FALSE),
      ", ",
      format(fitted_int$.upper_ci, big.mark = " ", scientific = FALSE),
      "]"
    )
  ) |>
  add_trace(
    name = "Historical",
    x = ~ts,
    y = ~qps,
    line = list(color = "#66c2a5",
                dash = "full"),
    hovertemplate = paste0(
      fitted_noint$year, "-", fitted_noint$month, "<br>",
      format(fitted_noint$qps, big.mark = " ", scientific = FALSE), " QPS"
    )
  ) |>
  layout(
    xaxis = list(
      title = "Date"
    ),
    yaxis = list(
      title = "QPS"
    )
  )
Code
gamm_qps_3 <- gamm(
  qps ~
    s(month, k = 12, bs = "cc") +
    ti(time) +
    ti(inflation) +
    ti(time, inflation) +
    ukraine,
  random = list(year_f = ~ 1),
  correlation =
    corARMA(form = ~ time,
            p = 4,
            q = 0),
  data = df_crises,
  family = gaussian,
  method = "REML"
)
saveRDS(gamm_qps_3, file = "gamm_qps_3.rds")
Code
gamm_qps_3 <- readRDS(file = "gamm_qps_3.rds")
Code
AIC(gamm_qps_1$lme,
    gamm_qps_2$lme,
    gamm_qps_3$lme)
               df      AIC
gamm_qps_1$lme 12 732.2652
gamm_qps_2$lme 16 720.7267
gamm_qps_3$lme 16 722.3500
Code
BIC(gamm_qps_1$lme,
    gamm_qps_2$lme,
    gamm_qps_3$lme)
               df      BIC
gamm_qps_1$lme 12 753.6755
gamm_qps_2$lme 16 749.2737
gamm_qps_3$lme 16 750.8970
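The two comparisons printed above can be summarised as differences from the best-scoring model. The sketch below is not part of the original analysis: it recomputes the differences from the printed values and adds Akaike weights as one common way to express relative support. The vector names are illustrative.

```r
# Values copied from the AIC() and BIC() outputs printed above.
aic <- c(gamm_qps_1 = 732.2652, gamm_qps_2 = 720.7267, gamm_qps_3 = 722.3500)
bic <- c(gamm_qps_1 = 753.6755, gamm_qps_2 = 749.2737, gamm_qps_3 = 750.8970)

# Differences from the lowest (best) value of each criterion.
delta_aic <- aic - min(aic)
delta_bic <- bic - min(bic)

# Akaike weights (an addition for illustration, not used in the report).
akaike_w <- exp(-delta_aic / 2) / sum(exp(-delta_aic / 2))

print(round(cbind(delta_aic, delta_bic, akaike_w), 3))
```

Both criteria score gamm_qps_2 lowest, with gamm_qps_3 about 1.6 points behind on either criterion. Such a small gap suggests treating the ranking of these two models with some caution.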
3.3.2 Discussion: Traffic and 2020s crises
Regarding the traffic in the .cz domain, the most interesting results were found in the model gamm_qps_2 (Section 3.3.1). Similarly to the results presented in previous sections, the model estimated positive associations for the interaction term between the monthly trend and new COVID-19 cases around the peaks of COVID-19 transmission, starting as soon as there were some 100 thousand new COVID-19 cases. Notably, this positive association appears at lower values of new cases and is far more widespread than the associations estimated for domains and holders (which were observed only at or around the second-wave peaks). Furthermore, when the count of new COVID-19 cases fell below some 100 thousand between the waves, the association was estimated to be negative. Thus, the hypothesis H3 (Increase in the spread of COVID-19 positively associates with QPS.) finds support when seen through the lens of the interaction. Taken together, when looking at how new cases fluctuated over the monthly time-flow, we can once again observe that traffic increased in the periods when new COVID-19 cases increased as well. In traffic’s case, these increases were observed for both waves of COVID-19, with a clear drop in and around the summer in between, and again after the second wave faded away. In sum, it seems that traffic increased noticeably during both COVID-19 waves when the transmission of COVID-19 increased too.
As for the inflation rate, the model gamm_qps_3 found associations supporting the hypothesis H6 (Increase of inflation associates with QPS.). When proposing the hypothesis in the introduction, I was unsure in which direction to envision such associations. Unfortunately, inspecting the results of the model has not shed more light on the processes which may lie behind them.
No support was found for the hypothesis H9 (The war in Ukraine positively associates with QPS).
Lastly, traffic does not seem to be associated with any seasonal pattern (although it slightly wiggles, similarly to the typical double-peaked shape). However, the monthly trend clearly shows a gradually increasing traffic volume.
4 General discussion
The goal of this report was to better understand how the .cz domain fared during the challenges that emerged during the first half of the 2020s. To do so, I investigated whether the spread of COVID-19, inflation rate, and the war in Ukraine associated with the count of second-level domains under .cz, the count of Czech holders of .cz domains, and traffic under .cz. I formulated three sets of hypotheses proposing that (a) higher transmission of COVID-19 would increase the domain and holder counts and QPS, (b) higher inflation rates would decrease domain and holder counts and associate with QPS, and that (c) the war in Ukraine would increase the count of domains and holders and also QPS. While the analysis found support for many of the hypotheses, the modelling results also estimated associations going contrary to the proposed directions. At first, this may sound like a paradox but upon a closer look the results provide a picture more complex than was originally envisioned.
One surprising finding is that the domain and holder counts behaved differently during the first and second COVID-19 wave. During the first wave, domain counts and in part also holder counts were unexpectedly estimated to decrease when the count of new COVID-19 cases in the Czech Republic increased. Here, the reason might be that individuals and companies were confused and ill-prepared to quickly adjust their activities and interests when facing the global pandemic for the first time. Perhaps, instead of rushing to register domains to realize interests in the online space, the shock and confusion initially led to austerity, inactivity, or cautious hesitation, consequently decreasing the willingness to register domains (similarly to slowdowns in other industries, see Czech National Bank (2020)). However, during the second wave, both the domain and holder counts increased when the count of new COVID-19 cases increased too, supporting hypotheses H1 and H2. Here, we may speculate that individuals and companies were better prepared to face the challenges of another COVID-19 wave, were forced to move their interests and activities online out of necessity, or both.
The results suggest a slightly different story for the traffic, as QPS increased during both COVID-19 waves and the positive associations emerged at lower levels of COVID-19 transmission than for domain and holder counts. Furthermore, the association even became negative once the COVID-19 transmission decreased between the waves and afterwards. Together, these results support the hypothesis H3. The answer to why the results suggest increases in QPS even during the first COVID-19 wave quite possibly also lies in the sudden confusion brought by the onset of the pandemic. However, unlike for the domain and holder counts, the uncertainty and social isolation may have prompted heightened demand for answers, entertainment, or escapism in the online space, which could be satisfied instantly, resulting in increased traffic. Thus, the first-wave difference from the domain and holder count associations might be that the crisis projected into traffic far more flexibly and immediately, while registering domains and becoming a holder may require more deliberation and additional costs.
Furthermore, I found that high inflation rates associated negatively with domain counts in the period between March 2022 and June 2023, supporting hypothesis H4. When inflation rates surged to their highest values around 18% in July 2022, domain counts were estimated to drop by 22 thousand. Such results suggest that the domain count reflects the socio-economic situation in the Czech Republic: once the economic hardship set in sufficiently, the domain count seemed to follow these broader developments.
Inflation was found to associate with holder counts too (supporting hypothesis H5) but the size of the association was not as large as for the domain counts. Furthermore, inflation also associated with QPS; however, it is not clear what such associations might mean, if anything at all.
Lastly, the Russo-Ukrainian war did not seem to be associated with any changes in any of the modeled variables (hypotheses H7–9). One reason for such null results might be that the online dimension of the conflict mostly focuses on disseminating (dis-)information through already established domains (e.g., news websites) and social networking services suitable for reaching large audiences.
One limitation of this analysis is that the interaction terms for inflation and new COVID-19 cases with the monthly trend were not estimated within the same models but separately, because high concurvity between the predictor variables would otherwise plague the models’ estimates. Also note that the associations described in this report are correlational, not causal. While it may seem intuitive to claim that COVID-19 and inflation influenced the count of domains and holders and the volume of QPS, one should not interpret the modeling results as such. Therefore, I usually describe the results as associations without claiming causality. If we were to properly test each predictor’s causal effect, we would need a parallel universe serving as an experimental control group where COVID-19, high inflation, and the war in Ukraine did not happen. Unfortunately, we are aware only of the universe where all these crises happened.
To conclude, the analysis presents insights into how COVID-19, inflation, and the war in Ukraine associated with trends in the .cz domain. It helps to evaluate and interpret the developments in the count of domains and holders and traffic under .cz by showing that all these variables were diversely related to the changes in the transmission of COVID-19 and the rise and fall of the inflation rate in the Czech Republic.
References
Footnotes
Before 2017, the .cz TLD was still observing an increasing trend in domain and holder counts.
One could argue that the Israeli conflicts with Hamas and Hezbollah could also be of focus. However, these conflicts have not created any immediate migration crises affecting the Czech Republic and are geographically distant. Thus, for simplicity, these conflicts were not modeled in this report.
The interaction term also estimated negative associations at the beginning and the end of the investigated period; however, it is dubious what these could mean.