r2ogs6 Ensemble Guide
2023-04-05
ensemble_workflow_vignette.Rmd
Introduction
Hi there! This is a practical guide on the OGS6_Ensemble
class from the r2ogs6
package. I will show you how you can use this class to set up ensemble runs for OpenGeoSys 6, extract the results and plot them.
If you want to follow this tutorial, you’ll need to have r2ogs6
installed and loaded. The prerequisites for r2ogs6
are described in detail in the r2ogs6 User Guide
vignette.
Additionally, you’ll need the following benchmark files:
Theis’ Problem as described here
Theis solution for well pumping as described here
General instructions on how to download OpenGeoSys 6 benchmark files can be found here.
Theis’ problem
We will consider the following parameters for our sensitivity analysis:
permeability
porosity
storage
Setup
First, we create a simulation object to base our ensemble on and read in the .prj
file.
# Change this to fit your system
testdir_path <- tempdir()
sim_path <- paste0(testdir_path, "/axisym_theis_sim")
ogs6_obj <- OGS6$new(sim_name = "axisym_theis",
sim_path = sim_path)
# Change this to fit your system
prj_path <- system.file("extdata/benchmarks/AxiSymTheis/",
"axisym_theis.prj", package = "r2ogs6")
read_in_prj(ogs6_obj, prj_path, read_in_gml = T)
Single-parameter sequential run
Let’s create a small ensemble where we only alter the value of storage
. Say we don’t want to hardcode the values, but instead examine the effects of changing storage
by 1%, 10% and 50%. We can use the percentages_mode
parameter of OGS6_Ensemble
for this. It already defaults to TRUE
, below I’m merely being explicit for demonstration purposes.
# Assign percentages
percentages <- c(-50, -10, -1, 0, 1, 10, 50)
# Define an ensemble object
ogs6_ens <-
OGS6_Ensemble$new(
ogs6_obj = ogs6_obj,
parameters = list(list(ogs6_obj$media[[1]]$properties[[4]]$value,
percentages)),
percentages_mode = TRUE)
Now you can start the simulation.
ogs6_ens$run_simulation()
lapply(ogs6_ens$ensemble, ogs6_read_output_files)
Plot results for single simulation
Our simulations (hopefully) ran successfully - great! Now it’d be nice to see some results. Say we’re interested in the pressure
data.
# This will get a combined dataframe
storage_tbl <-
ogs6_ens$get_point_data(point_ids = c(0, 1, 2),
keys = c("pressure"))
You may leave out the point_ids
parameter. It defaults to all points in the dataset - I specify it here because there’s about 500 which would slow down building this vignette.
# Let's look at the first 10 rows of the dataset
head(storage_tbl, 10)
#> # A tibble: 10 × 8
#> id x y z pressure timestep sim_id perc
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
#> 1 0 0.305 0 0 0 0 1 -50
#> 2 0 0.305 0 0 6.99 8.64 1 -50
#> 3 0 0.305 0 0 9.77 86.4 1 -50
#> 4 0 0.305 0 0 13.3 1728 1 -50
#> 5 0 0.305 0 0 16.2 24192 1 -50
#> 6 0 0.305 0 0 16.6 172800 1 -50
#> 7 0 0.305 0 0 16.6 604800 1 -50
#> 8 0 0.305 0 0 16.6 864000 1 -50
#> 9 1 305. 0 0 0 0 1 -50
#> 10 1 305. 0 0 0 8.64 1 -50
You can see there’s one row per timestep for each point ID. Say we want to plot the average pressure for each timestep
, that is, the mean of the pressure
values where the rows have the same timestep
. We’ll consider only the first simulation for now.
# Get average pressure from first simulation
avg_pr_first_sim <- storage_tbl[storage_tbl$sim_id == 1,] %>%
group_by(timestep) %>%
summarize(avg_pressure = mean(pressure))
# Plot pressure over time for first simulation
ggplot(data = avg_pr_first_sim) +
geom_point(mapping = aes(x = as.numeric(row.names(avg_pr_first_sim)),
y = avg_pressure)) + xlab("Timestep")

plot of chunk p-t-first-plot
Plot results for multiple simulations
# Get average pressure for all simulations
avg_pr_df <- storage_tbl %>%
group_by(sim_id, timestep) %>%
summarise(avg_pressure = mean(pressure))
# Plot pressure over time for all simulations
ggplot(avg_pr_df,
aes(x = as.numeric(as.factor(timestep)),
y = avg_pressure)) +
geom_point(aes(color = as.factor(sim_id))) +
geom_line(aes(color = as.factor(sim_id))) +
xlab("Timestep") +
labs(color = "sim id") +
facet_grid(rows = vars(sim_id))

plot of chunk p-t-all-plot
Now we have the average pressure
over time for each of the simulations (rows). Since they’re pretty similar, let’s put them in the same plot for a better comparison.
ggplot(avg_pr_df, aes(x = as.numeric(as.factor(timestep)),
y = avg_pressure,
group = sim_id)) +
geom_point(aes(color = as.factor(sim_id))) +
geom_line(aes(color = as.factor(sim_id))) +
labs(color = "sim id") +
xlab("Timestep")

plot of chunk p-t-all-combined-plot
Multi-parameter sequential run
So far we’ve only considered the storage
parameter. Now we want to have a look at how the other two parameters influence the simulation, so let’s put them into an ensemble together. For permeability
we can reuse the percentages
we already have. For porosity
, this doesn’t work - the original value is 1 and its values can only range between 0 and 1, so we’ll supply a shorter vector.
This time it’s important we set the OGS6_Ensemble
parameter sequential_mode
to TRUE
as this will change the supplied parameters sequentially which means in the end we have 18 (7 + 7 + 4) simulations which is equal to the sum of elements in the value vectors you supply (I’ve named them values
below for clarity, naming them is optional though).
The default FALSE
would give an error message because our value vectors do not have the same length and even if they had, it wouldn’t do what we want - the number of simulations would equal the length of one value vector (thus requiring them to be of the same length). Generally, set sequential_mode
to TRUE
if you want to examine the influence of parameters on a simulation independently. If you want to examine how the parameters influence each other as in wanting to test parameter combinations, the default mode is the way to go.
# Change this to fit your system
sim_path <- paste0(testdir_path, "/axisym_theis_sim_big")
ogs6_obj$sim_path <- sim_path
ogs6_ens_big <-
OGS6_Ensemble$new(
ogs6_obj = ogs6_obj,
parameters = list(per = list(ogs6_obj$media[[1]]$properties[[1]]$value,
values = percentages),
por = list(ogs6_obj$media[[1]]$properties[[3]]$value,
c(-50, -10, -1, 0)),
sto = list(ogs6_obj$media[[1]]$properties[[4]]$value,
values = percentages)),
sequential_mode = TRUE)
Now you can start the simulation.
ogs6_ens_big$run_simulation()
lapply(ogs6_ens_big$ensemble, ogs6_read_output_files)
This will take a short time. As soon as the simulations are done, we can extract the point data much like we did before. This time we want to plot the point x coodinates on the x axis so we’re leaving out point_ids
to get all points. Also we just want the data from the last timestep.
# Get combined dataframe
per_por_sto_df <-
ogs6_ens_big$get_point_data(
keys = c("pressure"),
start_at_timestep = ogs6_ens_big$ensemble[[1]]$pvds[[1]]$last_timestep)
Plotting time! Since we set sequential_mode
to TRUE
, the dataframe we just created contains a name
column which allows us to group by parameters. Because we’ve also set percentages_mode
to TRUE
, it also has a column perc
which allows us to group by percentages. Now we can simply use a facet grid to plot.
# Make plot
ggplot(per_por_sto_df,
aes(x = x,
y = -pressure, # Flip pressure because source term was positive
group = perc)) +
geom_point(aes(color = as.factor(perc))) +
xlab("Radius (m)") +
ylab("Head (m)") +
labs(color = "%") +
facet_grid(cols = vars(name),
labeller = as_labeller(c(per = "permeability",
por = "porosity",
sto = "storage")))

plot of chunk p-t-group-plot
Ta-Daa! We can see permeability
has the most influence on the pressure. Though they may seem suspicious, porosity
and storage
are being plotted correctly - the points are just being placed right on top of each other. Since porosity
can’t go over the value 1
which was the original value, our value vector only went from -50% to 0% which is why the line colors of porosity
and storage
differ. Maybe we want to try and use a logarithmic approach for storage
. This won’t work with the built-in functionality of OGS6_Ensemble
so we’ll set up our Ensemble a little differently.
# Calculate log value
log_val <- log(as.numeric(
ogs6_obj$media[[1]]$properties[[4]]$value),
base = 10)
# Apply changes to log value
log_vals <- vapply(percentages, function(x){
log_val + (log_val * (x / 100))
}, FUN.VALUE = numeric(1))
# Transfer back to non-logarithmic scale
back_transf_vals <- 10^log_vals
# Change sim_path to fit your system
ogs6_obj$sim_path <- paste0(testdir_path, "/axisym_theis_sim_log_storage")
# Set up new ensemble
ogs6_ens_sto <-
OGS6_Ensemble$new(
ogs6_obj = ogs6_obj,
parameters =
list(
sto = list(
ogs6_obj$media[[1]]$properties[[4]]$value,
values = back_transf_vals)
),
percentages_mode = FALSE,
sequential_mode = TRUE
)
As before, we can run the simulation right away.
ogs6_ens_sto$run_simulation()
lapply(ogs6_ens_sto$ensemble, ogs6_read_output_files)
Let’s check if we can observe any influence of storage
on pressure
now.
# Get combined dataframe
sto_df <-
ogs6_ens_sto$get_point_data(
keys = c("pressure"),
start_at_timestep = ogs6_ens_sto$ensemble[[1]]$pvds[[1]]$last_timestep)
# Supply percentages manually since we couldn't use `percentages_mode`
percs <- vapply(sto_df$sim_id,
function(x){percentages[[x]]},
FUN.VALUE = numeric(1))
ggplot(sto_df,
aes(x = x,
y = -pressure)) +
geom_point(aes(color = as.factor(percs))) +
xlab("Radius (m)") +
ylab("Head (m)") +
labs(color = "%")

plot of chunk p-t-log-plot
Theis solution for well pumping
We will consider the following parameters for our sensitivity analysis:
permeability
porosity
slope
Setup
First, we create a simulation object to base our ensemble on and read in the .prj
file. This time we want to specify that an output file only gets written at the last timestep.
# Change this to fit your system
sim_path <- paste0(testdir_path, "/theis_sim")
ogs6_obj <- OGS6$new(sim_name = "theis",
sim_path = sim_path)
# Change this to fit your system
prj_path <- system.file("extdata/benchmarks/theis_well_pumping/",
"theis.prj", package = "r2ogs6")
read_in_prj(ogs6_obj, prj_path, read_in_gml = T)
# Increase each_steps
ogs6_obj$time_loop$output$timesteps$pair$each_steps <- 200
Multi-parameter sequential run
# Assign percentages
percentages <- c(-50, -10, -1, 0, 1, 10, 50)
ogs6_ens_theis_2 <-
OGS6_Ensemble$new(
ogs6_obj = ogs6_obj,
parameters =
list(
per = list(ogs6_obj$parameters[[3]]$values,
values = percentages),
por = list(ogs6_obj$parameters[[2]]$value,
values = percentages),
slo = list(
ogs6_obj$media[[1]]$phases[[1]]$properties[[1]]$independent_variable[[2]]$slope,
values = percentages)
),
sequential_mode = TRUE
)
Now you can start the simulation.
ogs6_ens_theis_2$run_simulation()
lapply(ogs6_ens_theis_2$ensemble, ogs6_read_output_files)
When the simulations have run, we can extract and plot the results like before. To avoid cluttering the plot, we only extract the pressure
values for a single line. For this, we get the IDs of all points on the x axis.
# Extract point ids
get_point_ids_x <- function(points){
x_axis_ids <- numeric()
for(i in seq_len(dim(points)[[1]])) {
if (points[i, ][[2]] == 0 && points[i, ][[3]] == 0) {
x_axis_ids <- c(x_axis_ids, (i - 1))
}
}
return(x_axis_ids)
}
point_ids_x <- get_point_ids_x(
ogs6_ens_theis_2$ensemble[[1]]$pvds[[1]]$OGS6_vtus[[1]]$points)
# Get combined dataframe
per_por_slo_df <-
ogs6_ens_theis_2$get_point_data(
point_ids = point_ids_x,
keys = c("pressure"),
start_at_timestep = ogs6_ens_theis_2$ensemble[[1]]$pvds[[1]]$last_timestep)
# Make plot
ggplot(per_por_slo_df,
aes(x = x,
y = pressure / 9806.65, # 1mH2O = 9806.65 kPa
group = perc)) +
geom_point(aes(color = as.factor(perc))) +
xlab("Radius (m)") +
ylab("Absenkung (m)") +
labs(color = "%") +
facet_grid(cols = vars(name),
labeller = as_labeller(c(per = "permeability",
por = "porosity",
slo = "slope"
)))

plot of chunk p-x-plot
Let’s take a closer look at permeability
.
per_df <- subset(per_por_slo_df, name == "per")
# Make plot
ggplot(per_df,
aes(x = x,
y = pressure)) +
geom_point(aes(color = as.factor(perc))) +
xlab("Radius (m)") +
ylab("Head (m)") +
labs(color = "%")

plot of chunk p-x-subset-plot
Maybe we want to try and use a logarithmic approach for slope
. This won’t work with the built-in functionality of OGS6_Ensemble
so we’ll set up our Ensemble a little differently.
# Calculate log value
log_val <- log(as.numeric(
ogs6_obj$media[[1]]$phases[[1]]$properties[[1]]$independent_variable[[2]]$slope),
base = 10)
# Apply changes to log value
log_vals <- vapply(percentages, function(x){
log_val + (log_val * (x / 100))
}, FUN.VALUE = numeric(1))
# Transfer back to non-logarithmic scale
back_transf_vals <- 10^log_vals
# Change sim_path to fit your system
ogs6_obj$sim_path <- paste0(testdir_path, "/theis_sim_log_slope")
# Set up new ensemble
ogs6_ens_slo <-
OGS6_Ensemble$new(
ogs6_obj = ogs6_obj,
parameters =
list(
slo = list(
ogs6_obj$media[[1]]$phases[[1]]$properties[[1]]$independent_variable[[2]]$slope,
values = back_transf_vals)
),
percentages_mode = FALSE,
sequential_mode = TRUE
)
As before, we can run the simulation right away.
ogs6_ens_slo$run_simulation()
lapply(ogs6_ens_slo$ensemble, ogs6_read_output_files)
Let’s check if we can observe any influence of slope
on pressure
now.
# Filter point ids
point_ids_x <- get_point_ids_x(
ogs6_ens_slo$ensemble[[1]]$pvds[[1]]$OGS6_vtus[[1]]$points)
# Get combined dataframe
slo_df <-
ogs6_ens_slo$get_point_data(
point_ids = point_ids_x,
keys = c("pressure"),
start_at_timestep = ogs6_ens_slo$ensemble[[1]]$pvds[[1]]$last_timestep)
# Supply percentages manually since we couldn't use `percentages_mode`
percs <- vapply(slo_df$sim_id,
function(x){percentages[[x]]},
FUN.VALUE = numeric(1))
ggplot(slo_df,
aes(x = x,
y = pressure / 9806.65)) + # 1mH2O = 9806.65 kPa
geom_point(aes(color = as.factor(percs))) +
xlab("Radius (m)") +
ylab("Head (m)") +
labs(color = "%")

plot of chunk p-x-log-plot
Summary
The OGS6_Ensemble
class is a useful tool to set up ensemble runs for sensitivity analyses. In this vignette, we learned how to create OGS6_Ensemble
objects. We looked at how the parameters sequential_mode
and percentages_mode
influence how our ensemble object is initialised. We started simulations via OGS6_Ensemble$run_simulation()
and extracted information from the output files to plot them.