r2ogs6 Developer Guide

library(r2ogs6)

Hi there!

Welcome to my dev guide on r2ogs6. This is a collection of tips, useful info (and admittedly a few warnings) which will hopefully make your life a bit easier when developing this package.

The basics

Before we dive into any implementation details, we will first take a look at how this package is structured. r2ogs6 was developed using the workflow described here. I strongly recommend keeping it that way, as it will save you time and headaches.

…

In the main folder R/ you will find a lot of scripts, most of which can be grouped into the following categories:

export_*.R: export functions
generate_*.R: code generation
read_in_*.R: import functions
ogs6_*.R: simulation class definitions
prj_*.R: class definitions for XML tags found in a .prj file
*_utils.R: utility functions used in multiple scripts

The classes

r2ogs6 is largely built on top of S3 classes at the moment. For reasons I will elaborate on later, switching to R6 classes is a very viable option. But let’s look at what we have first.

….

Generating new classes

If you’ve familiarized yourself with OpenGeoSys 6, you know that there are a lot, and by a lot I mean a LOT, of parameters and special cases regarding the .prj XML tags. For a nice new class based on such a tag, you will have to consider all of them.

To save me (and you) a bit of typing, I’ve written a few useful functions for this.

analyse_xml()

The first and arguably most important one is analyse_xml(). It matches files in a folder, reads them in as XML and searches for XML elements of a given name. It then analyses those elements and returns useful information about them, namely the names of their attributes and child elements. It prints a summary of its findings and also returns a list which we will look at in a moment.
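The file-matching step can be sketched with base R alone. This is only an illustration of how a regular expression such as "\\.prj$" selects files, under the assumption that the pattern argument behaves like the one of list.files(). The temporary folder and the file names are made up.

```r
# Create a throwaway folder with a few made-up files
demo_dir <- file.path(tempdir(), "prj_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir,
                      c("beam.prj", "beam3d.prj", "notes.txt", "beam.prj.bak")))

# "\\.prj$" matches a literal dot followed by "prj" at the very end of the name,
# so "beam.prj.bak" and "notes.txt" are skipped
prj_files <- list.files(demo_dir, pattern = "\\.prj$")
print(prj_files)
```

Anchoring the pattern with $ is what keeps backup files like beam.prj.bak out of the analysis.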

I used this function for two things: Analysing … . Secondly, as soon as I had decided which tags should be represented by a class, I used the function output for class generation.

generate_*()

So say we have some .prj files stored in a folder. I will show the workflow on a small dataset here (that is, on a folder with only two .prj files). The path I usually passed to analyse_xml() was the directory containing all of the benchmark files for OpenGeoSys 6, which can be downloaded from here.

test_folder <- system.file("extdata/vignettes_data/analyse_xml_demo", 
                           package = "r2ogs6")

Now say we have decided we are going to make a class based on the element with tag name nonlinear_solver. For readability reasons, I will store the results of analyse_xml() in a variable and pass it to our generator function. If you want, you can skip this step and call analyse_xml() in the generator function directly.

analysis_results <- analyse_xml(path = test_folder,
                                pattern = "\\.prj$",
                                xpath = "//nonlinear_solver",
                                print_findings = TRUE)
#> 
#> I parsed 2 valid XML files matching your pattern.
#> 
#> I found at least one element named nonlinear_solver in the following file(s):
#> beam.prj 
#> beam3d.prj 
#> 
#> In total, I found 5 element(s) named nonlinear_solver.
#> 
#> These are the child elements I found:
#>                 name ex_occ p_occ total total_mean
#> 1               name      2   0.4     2        0.4
#> 2               type      2   0.4     2        0.4
#> 3           max_iter      2   0.4     2        0.4
#> 4      linear_solver      2   0.4     2        0.4
#> 5 maximum_iterations      1   0.2     1        0.2
#> 6    error_tolerance      1   0.2     1        0.2
#> 7            damping      1   0.2     1        0.2

First, I define my path and specify that only files with the ending .prj will be parsed. I’m looking for elements named nonlinear_solver, and I’m looking for them in the whole document. This often isn’t the best option since sometimes nodes may have the same name but contain different things depending on their exact position in the document, which is also the case here. To narrow it down further, change xpath accordingly.

analysis_results <- analyse_xml(path = test_folder,
                                pattern = "\\.prj$",
                                xpath = "/OpenGeoSysProject/nonlinear_solvers/nonlinear_solver",
                                print_findings = TRUE)
#> 
#> I parsed 2 valid XML files matching your pattern.
#> 
#> I found at least one element named nonlinear_solver in the following file(s):
#> beam.prj 
#> beam3d.prj 
#> 
#> In total, I found 2 element(s) named nonlinear_solver.
#> 
#> These are the child elements I found:
#>            name ex_occ p_occ total total_mean
#> 1          name      2   1.0     2        1.0
#> 2          type      2   1.0     2        1.0
#> 3      max_iter      2   1.0     2        1.0
#> 4 linear_solver      2   1.0     2        1.0
#> 5       damping      1   0.5     1        0.5
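To get a feeling for the columns of that table, here is a small base R sketch that produces the same kind of summary from made-up data. The meaning of the columns is my reading of the printed output, not taken from the analyse_xml() source: I assume ex_occ counts the elements that contain a child at least once, p_occ is that count divided by the number of elements, total counts all occurrences and total_mean is total divided by the number of elements.

```r
# Hypothetical child names of two <nonlinear_solver> elements (made-up data)
children_per_element <- list(
  c("name", "type", "max_iter", "linear_solver", "damping"),
  c("name", "type", "max_iter", "linear_solver")
)

n_elements <- length(children_per_element)
all_children <- unlist(children_per_element)
child_names <- unique(all_children)

# ex_occ: number of elements containing the child at least once (assumed meaning)
ex_occ <- vapply(child_names, function(nm) {
  sum(vapply(children_per_element, function(ch) nm %in% ch, logical(1)))
}, numeric(1))

# total: number of occurrences over all elements (assumed meaning)
total <- vapply(child_names, function(nm) sum(all_children == nm), numeric(1))

occurrence_table <- data.frame(
  name = child_names,
  ex_occ = ex_occ,
  p_occ = ex_occ / n_elements,
  total = total,
  total_mean = total / n_elements,
  row.names = NULL
)

# Most frequent children first, rarest last
occurrence_table <- occurrence_table[order(-occurrence_table$p_occ), ]
print(occurrence_table)
```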

Now we can be sure our future class will be generated from the correct parameters. analyse_xml() returns a named list invisibly. Let’s have a short look at it.

analysis_results
#> $xpath
#> [1] "/OpenGeoSysProject/nonlinear_solvers/nonlinear_solver"
#> 
#> $children
#>          name          type      max_iter linear_solver       damping 
#>          TRUE          TRUE          TRUE          TRUE         FALSE 
#> 
#> $attributes
#> logical(0)
#> 
#> $both_sorted
#>          name          type      max_iter linear_solver       damping 
#>          TRUE          TRUE          TRUE          TRUE         FALSE

You can see the list contains the xpath parameter passed to analyse_xml(), along with three named logical vectors called children, attributes and both_sorted respectively. They can be read like this: If an attribute or a child of the element specified by xpath occurred in every matched element (TRUE), it is a required parameter for the new class. Otherwise (FALSE), it is an optional parameter. The logical vectors are sorted by frequency of occurrence, so the rarest children and attributes go to the very end of their vector. Now, let’s generate some code!
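Before we do, here is a quick sketch of how that reading translates into a function signature. The list is rebuilt by hand from the printed output above so the snippet runs on its own. The way the signature string is assembled is my own illustration and not necessarily how the generator functions do it internally.

```r
# Rebuilt by hand from the printed analysis_results above
analysis_results <- list(
  xpath = "/OpenGeoSysProject/nonlinear_solvers/nonlinear_solver",
  children = c(name = TRUE, type = TRUE, max_iter = TRUE,
               linear_solver = TRUE, damping = FALSE),
  attributes = logical(0),
  both_sorted = c(name = TRUE, type = TRUE, max_iter = TRUE,
                  linear_solver = TRUE, damping = FALSE)
)

# TRUE means "always occurred", so the parameter is required
required <- names(which(analysis_results$both_sorted))
# FALSE means "sometimes missing", so the parameter gets a NULL default
optional <- names(which(!analysis_results$both_sorted))

signature <- paste(c(required, paste0(optional, " = NULL")), collapse = ", ")
print(signature)
```

Because the vectors are sorted by occurrence, the optional parameters automatically end up at the end of the signature.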

For S3 classes, we generate a constructor like this:

generate_constructor(params = analysis_results,
                     print_result = TRUE)
#> new_prj_nonlinear_solver <- function(name,
#> type,
#> max_iter,
#> linear_solver,
#> damping = NULL) {
#> structure(list(name = name,
#> type = type,
#> max_iter = max_iter,
#> linear_solver = linear_solver,
#> damping = damping,
#> xpath = "nonlinear_solvers/nonlinear_solver",
#> attr_names = c(),
#> flatten_on_exp = character()
#> ),
#> class = "prj_nonlinear_solver"
#> )
#> }
#>

For S3 classes, we generate a helper like this:

generate_helper(params = analysis_results,
                print_result = TRUE)
#> #'prj_nonlinear_solver
#> #'@description tag: nonlinear_solver
#> #'@param name
#> #'@param type
#> #'@param max_iter
#> #'@param linear_solver
#> #'@param damping Optional: 
#> #'@export
#> prj_nonlinear_solver <- function(name,
#> type,
#> max_iter,
#> linear_solver,
#> damping = NULL) {
#> 
#> # Add coercing utility here
#> 
#> new_prj_nonlinear_solver(name,
#> type,
#> max_iter,
#> linear_solver,
#> damping)
#> }
#>
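What could go where the stub says "Add coercing utility here"? The following is a minimal sketch, assuming values arrive as strings after reading XML. The constructor is copied from the output above (only reformatted). The coercion and checks in the helper are hypothetical and use plain base R instead of the package's own utilities, and the argument values in the call are purely illustrative.

```r
# Constructor as printed by generate_constructor() above
new_prj_nonlinear_solver <- function(name, type, max_iter, linear_solver,
                                     damping = NULL) {
  structure(list(name = name,
                 type = type,
                 max_iter = max_iter,
                 linear_solver = linear_solver,
                 damping = damping,
                 xpath = "nonlinear_solvers/nonlinear_solver",
                 attr_names = c(),
                 flatten_on_exp = character()),
            class = "prj_nonlinear_solver")
}

# Helper with hypothetical coercion and validation filled in
prj_nonlinear_solver <- function(name, type, max_iter, linear_solver,
                                 damping = NULL) {
  # Numbers read from XML are strings, so coerce them first
  max_iter <- as.numeric(max_iter)
  if (!is.null(damping)) {
    damping <- as.numeric(damping)
  }

  # Fail early on obviously wrong input
  stopifnot(is.character(name), length(name) == 1,
            is.character(type), length(type) == 1,
            !is.na(max_iter), max_iter >= 0)

  new_prj_nonlinear_solver(name, type, max_iter, linear_solver, damping)
}

# Illustrative values only
solver <- prj_nonlinear_solver(name = "basic_newton",
                               type = "Newton",
                               max_iter = "50",
                               linear_solver = "general_linear_solver")
str(unclass(solver))
```

Keeping the constructor dumb and putting all checks into the helper means the import code can call the helper with raw strings while internal code can still use the constructor directly.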

For R6 classes, we generate a class definition like this:

generate_R6(params = analysis_results,
            print_result = TRUE)
#> OGS6_nonlinear_solver <- R6::R6Class("OGS6_nonlinear_solver",
#> public = list(
#> #'@description
#> #'Creates new OGS6_nonlinear_solverobject
#> #'@param name
#> #'@param type
#> #'@param max_iter
#> #'@param linear_solver
#> #'@param damping Optional: initialize = function(name,
#> type,
#> max_iter,
#> linear_solver,
#> damping = NULL){
#> self$name <- name
#> self$type <- type
#> self$max_iter <- max_iter
#> self$linear_solver <- linear_solver
#> self$damping <- damping
#> }
#> ),
#> 
#> active = list(
#> #'@field name
#> #'Access to private parameter '.name'
#> name = function(value) {
#> if(missing(value)) {
#> private$.name
#> }else{
#> private$.name <- value
#> }
#> },
#> 
#> #'@field type
#> #'Access to private parameter '.type'
#> type = function(value) {
#> if(missing(value)) {
#> private$.type
#> }else{
#> private$.type <- value
#> }
#> },
#> 
#> #'@field max_iter
#> #'Access to private parameter '.max_iter'
#> max_iter = function(value) {
#> if(missing(value)) {
#> private$.max_iter
#> }else{
#> private$.max_iter <- value
#> }
#> },
#> 
#> #'@field linear_solver
#> #'Access to private parameter '.linear_solver'
#> linear_solver = function(value) {
#> if(missing(value)) {
#> private$.linear_solver
#> }else{
#> private$.linear_solver <- value
#> }
#> },
#> 
#> #'@field damping
#> #'Access to private parameter '.damping'
#> damping = function(value) {
#> if(missing(value)) {
#> private$.damping
#> }else{
#> private$.damping <- value
#> }
#> },
#> 
#> #'@field is_subclass
#> #'Access to private parameter '.is_subclass'
#> is_subclass = function() {
#> private$.is_subclass
#> },
#> 
#> #'@field subclasses_names
#> #'Access to private parameter '.subclasses_names'
#> subclasses_names = function() {
#> private$.subclasses_names
#> },
#> 
#> #'@field attr_names
#> #'Access to private parameter '.attr_names'
#> attr_names = function() {
#> private$.attr_names
#> }
#> ),
#> 
#> private = list(
#> .name = NULL,
#> .type = NULL,
#> .max_iter = NULL,
#> .linear_solver = NULL,
#> .damping = NULL,
#> .is_subclass = TRUE,
#> .subclasses_names = character(),
#> .attr_names = c(),
#> )
#> )
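For comparison, here is a trimmed-down variant of such a class that runs as-is, assuming the R6 package is installed. It is a hand-written sketch with only three of the fields and a hypothetical class name, not generator output. It shows the two access patterns used in the stub: read-write active fields backed by a private value, and a read-only field.

```r
library(R6)  # assumes the R6 package is installed

# Hypothetical, shortened variant of the generated stub
OGS6_nonlinear_solver_demo <- R6Class("OGS6_nonlinear_solver_demo",
  public = list(
    # The assignments below go through the active fields
    initialize = function(name, max_iter, damping = NULL) {
      self$name <- name
      self$max_iter <- max_iter
      self$damping <- damping
    }
  ),
  active = list(
    # Read-write field: returns the private value or replaces it
    name = function(value) {
      if (missing(value)) {
        private$.name
      } else {
        private$.name <- value
      }
    },
    max_iter = function(value) {
      if (missing(value)) {
        private$.max_iter
      } else {
        private$.max_iter <- value
      }
    },
    damping = function(value) {
      if (missing(value)) {
        private$.damping
      } else {
        private$.damping <- value
      }
    },
    # Read-only field: assigning to it raises an error
    attr_names = function(value) {
      if (!missing(value)) {
        stop("attr_names is read-only", call. = FALSE)
      }
      private$.attr_names
    }
  ),
  private = list(
    .name = NULL,
    .max_iter = NULL,
    .damping = NULL,
    .attr_names = character()
  )
)

# Illustrative values only
solver <- OGS6_nonlinear_solver_demo$new(name = "basic_newton", max_iter = 50)
solver$damping <- 0.5
print(solver$damping)
```

The setter branch of each active field is the natural place for validation, since every assignment (including the ones in initialize()) passes through it.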

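Ta-daa, you now have some nice stubs. Copy them into a script in the R folder of this package, add some documentation and validation, and you’re almost done. Do check the R6 stub before using it: in the output printed above, initialize ends up on the same line as the roxygen comment for damping, and the private list ends with a trailing comma. Both need fixing by hand before the class definition will work.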

Integrating new classes

Now that we have a class, we need to tell the package it exists. That way, when we’re reading in or exporting a .prj file, the package knows to automatically turn the content of our nonlinear_solver tag into an object of our new class and the other way around. To achieve this, execute the code in data_raw/xpaths_for_classes.R. This updates the xpaths_for_classes parameter, adding an entry for your class. Afterwards, run xpaths_for_classes[["your_class_name"]]. It should return the xpath parameter of your class like so:

xpaths_for_classes[["prj_process"]]

# A class can have multiple xpaths if the represented node occurs at different positions.
xpaths_for_classes[["prj_convergence_criterion"]]
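To illustrate the idea behind such a registry, here is a self-contained sketch with a hand-made stand-in list. The object name xpaths_demo, the class prj_demo_multi, its xpaths and the helper class_for_xpath() are all hypothetical. Only the nonlinear_solver xpath is taken from the constructor output shown earlier.

```r
# Hypothetical stand-in for the package's registry
xpaths_demo <- list(
  prj_nonlinear_solver = "nonlinear_solvers/nonlinear_solver",
  # A class can be registered under several xpaths
  prj_demo_multi = c("some_parent/demo", "other_parent/child/demo")
)

# Forward lookup: class name -> xpath(s)
print(xpaths_demo[["prj_nonlinear_solver"]])

# Reverse lookup: which class represents the node at a given xpath?
class_for_xpath <- function(xpath, registry) {
  hits <- vapply(registry, function(paths) xpath %in% paths, logical(1))
  names(registry)[hits]
}

print(class_for_xpath("other_parent/child/demo", xpaths_demo))
```

The reverse lookup is what an import routine needs: it sees a node at some position in the document and has to decide which class to instantiate.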

If the class you’ve created is a .prj top level class or a child of a top level wrapper node like processes, add a corresponding OGS6 private parameter and an active field. For example, the processes node is represented as a list, so I added the private parameter .processes = list() and the active field processes.

A lot of things in the r2ogs6 package work in a way that is a bit “meta”. Often, functions are called via eval(parse(text = call_string)), where call_string has, for example, been concatenated from info about the parameter names of a certain class. This saves a lot of code regarding import, export and script generation but requires that you’ve made the respective info available as shown here.
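If you haven't seen this pattern before, the following toy example shows the mechanics. The function make_solver() and the values list are made up for the illustration. The actual call strings in the package are assembled from class info and will look different.

```r
# Hypothetical stand-in for a generated constructor
make_solver <- function(name, type, damping = NULL) {
  list(name = name, type = type, damping = damping)
}

# Values as they might come out of a parsed XML node (made-up)
values <- list(name = "basic_newton", type = "Newton")

# Build the call as a string from the parameter names ...
arg_names <- names(values)
args_string <- paste0(arg_names, " = values[[\"", arg_names, "\"]]",
                      collapse = ", ")
call_string <- paste0("make_solver(", args_string, ")")
print(call_string)

# ... and evaluate it
solver <- eval(parse(text = call_string))

# For simple cases like this one, do.call() gives the same result
# without any string handling
solver_do_call <- do.call(make_solver, values)
stopifnot(identical(solver, solver_do_call))
```

When debugging this kind of code, printing call_string before it is evaluated is usually the fastest way to see what went wrong.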

So we’ve analysed some files, generated some code, created a new class and registered it with the package… what now? That’s it actually, that’s the workflow. Well, at least it’s supposed to be.

Recursive function guide

If that wasn’t it, I’m afraid you might have to take a look at the functions handling import, export and benchmark script generation. These are a bit tricky because they use recursion, which so far has proven to be efficient structure-wise but is not exactly fun to think about.
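To warm up for the recursive style, here is a toy function that turns a nested list into an XML-like string. It is a simplified sketch under my own assumptions and not the package's export code: the function name to_xml_string() is hypothetical, attributes are ignored and special characters are not escaped.

```r
to_xml_string <- function(x, tag, indent = 0) {
  pad <- strrep("  ", indent)

  # Base case: a leaf value becomes <tag>value</tag>
  if (!is.list(x)) {
    return(paste0(pad, "<", tag, ">", x, "</", tag, ">"))
  }

  # Recursive case: drop unset (NULL) optionals, then convert every child
  x <- Filter(Negate(is.null), x)
  children <- vapply(seq_along(x), function(i) {
    to_xml_string(x[[i]], names(x)[i], indent + 1)
  }, character(1))

  paste0(pad, "<", tag, ">\n",
         paste(children, collapse = "\n"), "\n",
         pad, "</", tag, ">")
}

# Illustrative values only
solvers <- list(
  nonlinear_solver = list(name = "basic_newton",
                          type = "Newton",
                          max_iter = 50,
                          damping = NULL)
)
cat(to_xml_string(solvers, "nonlinear_solvers"), "\n")
```

The thing to keep in mind when reading the real functions is the same as here: find the base case first, then check what each recursive call is handed.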

read_in

to_node

generate_benchmark_script

Conclusion

I hope you’ve taken away some helpful information from this short guide. If you make changes to improve the workflow, please update this vignette for the next dev!

2023-04-05