AquaSensR requires two input files to use the functions in the package: a continuous monitoring data file and a data quality objectives (DQO) file.
The DQO file is an Excel workbook (.xlsx). The continuous monitoring data file can be an Excel workbook (.xlsx), a CSV file (.csv), or a comma-delimited text file (.txt). This vignette describes how to import and check each input dataset. The input datasets must follow the specified format exactly. Example files with the correct format are included with the package and are used throughout.
Load the package in an R session after installation:
library(AquaSensR)
First, specify the location of the two files by saving their paths to R variables. In practice you will supply paths to your own files, for example:
contpth <- "path/to/your/ContinuousData.xlsx"
dqopth <- "path/to/your/DQO.xlsx"
The examples below use the files included with the package:
contpth <- system.file("extdata/ExampleCont1.xlsx", package = "AquaSensR")
dqopth <- system.file("extdata/ExampleDQO.xlsx", package = "AquaSensR")
Use readASRcont() to import continuous monitoring data. The function reads the file, automatically runs a series of checks via checkASRcont(), and then formats the result for downstream use. The tz argument sets the time zone for the output DateTime column (see OlsonNames() for valid values). The default is Eastern Standard Time without daylight saving time (Etc/GMT+5), so tz only needs to be set if your data use a different time zone. For example, if your data are in local time and the time zone observes daylight saving time, consider a time zone such as America/New_York, which adjusts for it automatically.
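The difference between the two time zones mentioned above can be seen with base R alone. The following is a minimal sketch that does not call AquaSensR; the timestamps are made up for illustration, and how readASRcont() applies tz internally is not shown here.

```r
# One instant in summer and one in winter, expressed in the default zone.
# "Etc/GMT+5" is a fixed offset five hours behind UTC (no daylight saving time).
summer <- as.POSIXct("2024-07-01 12:00:00", tz = "Etc/GMT+5")
winter <- as.POSIXct("2024-01-15 12:00:00", tz = "Etc/GMT+5")

# Display the same instants in a zone that observes daylight saving time
summer_ny <- format(summer, tz = "America/New_York", usetz = TRUE)
winter_ny <- format(winter, tz = "America/New_York", usetz = TRUE)

print(summer_ny)  # wall clock is one hour ahead of the fixed-offset zone
print(winter_ny)  # wall clock matches the fixed-offset zone

# Confirm that both names are available in the local time zone database
c("Etc/GMT+5", "America/New_York") %in% OlsonNames()
```

If the logger clock was never adjusted for daylight saving time, the fixed-offset zone is usually the closer description of the data; if it followed local wall-clock time, a regional zone is the better fit.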
AquaSensR accepts two input formats for the date and time information. The examples below demonstrate both.
Format 1: separate Date and Time columns (ExampleCont1.xlsx):
contdat <- readASRcont(contpth)
#> Running checks on continuous data...
#> Checking column names... OK
#> Checking Date, Time are present... OK
#> Checking at least one parameter column is present... OK
#> Checking date format... OK
#> Checking time format... OK
#> Checking for missing values... OK
#> Checking parameter columns for non-numeric values... OK
#>
#> All checks passed!
Format 2: combined DateTime column (ExampleCont2.xlsx):
contpth2 <- system.file("extdata/ExampleCont2.xlsx", package = "AquaSensR")
contdat2 <- readASRcont(contpth2)
#> Running checks on continuous data...
#> Checking column names... OK
#> Checking DateTime is present... OK
#> Checking at least one parameter column is present... OK
#> Checking DateTime format... OK
#> Checking for missing values... OK
#> Checking parameter columns for non-numeric values... OK
#>
#> All checks passed!
Both calls return identically structured output (see Output format below).
The continuous monitoring data file must follow one of two accepted schemas. Additional unrecognised columns will trigger an error.
Format 1: separate Date and Time columns
| Column | Description |
|---|---|
| Date | Observation date, parseable by lubridate::parse_date_time() in year-first (e.g., 2024-06-01), month-first (e.g., 06/01/2024), or day-first (e.g., 01/06/2024) formats |
| Time | Observation time in 24-hour (e.g., 16:30:33), 12-hour AM/PM (e.g., 4:30:33 PM), or Excel-native format (e.g., 1899-12-31 16:30:33) |
| At least one parameter column | Column name must match a Parameter entry in paramsASR (e.g., Water_Temp_C) |
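Because both month-first and day-first dates are accepted, the same text can describe two different days. The sketch below uses base R with explicit formats to illustrate the point; it is not the package's parsing code, and how AquaSensR resolves an ambiguous value is not covered here. It also shows that an Excel-native time carries a placeholder date alongside the clock time.

```r
# The same text read under two different date orders
x <- "01/06/2024"
month_first <- as.Date(x, format = "%m/%d/%Y")  # January 6, 2024
day_first   <- as.Date(x, format = "%d/%m/%Y")  # June 1, 2024

month_first
day_first

# Year-first (ISO 8601) text parses without a format hint
year_first <- as.Date("2024-06-01")

# An Excel-native time value: keep only the clock time
excel_time <- as.POSIXct("1899-12-31 16:30:33", tz = "UTC")
clock_time <- format(excel_time, "%H:%M:%S")
clock_time
```

Using a single, consistent date order within a file (year-first where possible) avoids this ambiguity.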
Format 2: combined DateTime column
| Column | Description |
|---|---|
| DateTime | Combined date and time with the date in year-first (e.g., 2024-06-01 16:30:33), month-first (e.g., 06/01/2024 16:30:33), or day-first format, combined with 24-hour or 12-hour AM/PM time (e.g., 2024-06-01 4:30:33 PM) |
| At least one parameter column | Column name must match a Parameter entry in paramsASR (e.g., Water_Temp_C) |
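As an illustration of the two layouts, the sketch below builds the same two observations in each format with base R and writes them to temporary CSV files. The values are made up, and the files are not passed to readASRcont() here, so whether they pass every check is an assumption rather than a demonstrated result.

```r
# Format 1: separate Date and Time columns plus one parameter column
fmt1 <- data.frame(
  Date = c("2024-06-01", "2024-06-01"),
  Time = c("16:30:33", "16:30:43"),
  Water_Temp_C = c(24.2, 24.3)
)

# Format 2: the same observations with a combined DateTime column
fmt2 <- data.frame(
  DateTime = paste(fmt1$Date, fmt1$Time),
  Water_Temp_C = fmt1$Water_Temp_C
)

# Write each layout to a temporary CSV file
path1 <- tempfile(fileext = ".csv")
path2 <- tempfile(fileext = ".csv")
write.csv(fmt1, path1, row.names = FALSE)
write.csv(fmt2, path2, row.names = FALSE)

# Inspect the header row of each file
readLines(path1, n = 1)
readLines(path2, n = 1)
```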
Currently, AquaSensR allows the following parameters. Note the inclusion of the units in the parameter name. Make sure the parameter name matches the units used in your data.
| Description | Required column name | Units |
|---|---|---|
| Air Temp (C) | Air_Temp_C | deg C |
| Air Temp (F) | Air_Temp_F | deg F |
| Air BP (psi) | Air_BP_psi | psi |
| Air BP (mmHg) | Air_BP_mmHg | mmHg |
| Chlorophyll-a (μg/l) | Chlorophylla_ug_l | ug/l |
| Chlorophyll-a (RFU) | Chlorophylla_RFU | RFU |
| Pheophytin (μg/l) | Pheophytin_ug_l | ug/l |
| Pheophytin (RFU) | Pheophytin_RFU | RFU |
| pCO2 (ppm) | pCO2_ppm | ppm |
| Conductivity (μS/cm) | Conductivity_uS_cm | uS/cm |
| Salinity (ppt) | Salinity_ppt | ppt |
| Specific Conductance (μS/cm) | Sp_Conductance_uS_cm | uS/cm |
| Cyanobacteria (μg/l) | Cyanobacteria_ug_l | ug/l |
| Phycocyanin (μg/l) | Phycocyanin_ug_l | ug/l |
| Phycoerythrin (μg/l) | Phycoerythrin_ug_l | ug/l |
| DO (mg/l) | DO_mg_l | mg/l |
| DO Adjusted (mg/l) | DO_adj_mg_l | mg/l |
| DO (% Sat) | DO_pctsat | % |
| CDOM (mg/l) | CDOM_mg_l | mg/l |
| FDOM (mg/l) | FDOM_mg_l | mg/l |
| E. coli (#/100ml) | E_coli_#_100ml | #/100ml |
| E. coli (CFU/100ml) | E_coli_CFU_100ml | CFU/100ml |
| Discharge (cfs) | Discharge_cfs | cfs |
| Nitrate (μg/l) | Nitrate_ug_l | ug/l |
| PAR (μmol/m2/s) | PAR_umol_m2_s | umol/m2/s |
| pH | pH_SU | None |
| TDS (mg/l) | TDS_mg_l | mg/l |
| TSS (mg/l) | TSS_mg_l | mg/l |
| Turbidity (NTU) | Turbidity_NTU | NTU |
| Turbidity (FNU) | Turbidity_FNU | FNU |
| Gage Height (ft) | Gage_Height_ft | ft |
| Sensor Depth (ft) | Sensor_Depth_ft | ft |
| Water Pressure (psi) | Water_Pressure_psi | psi |
| Water Pressure (mmHg) | Water_Pressure_mmHg | mmHg |
| Water Temp (C) | Water_Temp_C | deg C |
| Water Temp (F) | Water_Temp_F | deg F |
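Column names can be screened against the table before import. The sketch below is a base R illustration only: allowed holds a hand-copied subset of the names above, and find_unrecognised() is a hypothetical helper, not an AquaSensR function.

```r
# Hand-copied subset of allowed names (hypothetical; the full list is in the table)
allowed <- c(
  "Date", "Time", "DateTime",
  "Water_Temp_C", "DO_mg_l", "DO_pctsat", "pH_SU", "Salinity_ppt"
)

# Hypothetical helper: return the column names that are not in the allowed set
find_unrecognised <- function(col_names, allowed_names) {
  setdiff(col_names, allowed_names)
}

# "WaterTemp" lacks the unit suffix and "Notes" is not a parameter
my_cols <- c("Date", "Time", "WaterTemp", "DO_mg_l", "Notes")
bad_cols <- find_unrecognised(my_cols, allowed)
bad_cols
```

Any name returned by the helper would need to be renamed to match the table or removed from the file.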
The list above can also be viewed in R with the paramsASR dataset, which is included in the package and used for the checks.
paramsASR
#> # A tibble: 36 × 6
#> `Parameter Group` Parameter uom Label `WQX Parameter` `WQX Unit of measure`
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Air Temp Air_Temp… deg C Air … Temperature, a… deg C
#> 2 Air Temp Air_Temp… deg F Air … Temperature, a… deg F
#> 3 Barometric Press… Air_BP_p… psi Air … Barometric pre… psi
#> 4 Barometric Press… Air_BP_m… mmHg Air … Barometric pre… mmHg
#> 5 Chlorophyll Chloroph… ug/l Chlo… Chlorophyll a … ug/l
#> 6 Chlorophyll Chloroph… RFU Chlo… Chlorophyll a … RFU
#> 7 Chlorophyll Pheophyt… ug/l Pheo… Pheophytin a ug/l
#> 8 Chlorophyll Pheophyt… RFU Pheo… Pheophytin a RFU
#> 9 CO2 pCO2_ppm ppm pCO2… Partial Pressu… ppm
#> 10 Conductivity Conducti… uS/cm Cond… Conductivity uS/cm
#> # ℹ 26 more rows
The readASRcont() function imports the data and runs a series of checks using the checkASRcont() function. Most checks stop with an informative error if they fail, except the check for missing values, which produces a warning because missing values can occur in continuous data. The checks evaluate the following:
- Column names: every column is Date, Time, DateTime, or a recognised parameter from paramsASR.
- Date and time columns present: the data include Date and Time (Format 1) or DateTime (Format 2).
- At least one parameter column: at least one column name matches an entry in paramsASR$Parameter.
- Date format: values in Date are parseable by lubridate::parse_date_time() in year-first, month-first, or day-first formats.
- Time format: values in Time are parseable by lubridate::parse_date_time() in 24-hour, 12-hour AM/PM, or Excel-native formats.
- DateTime format: values in DateTime are parseable by lubridate::parse_date_time() with year-first, month-first, or day-first date order combined with 24-hour or 12-hour AM/PM time.
- Missing values: NA values in parameter columns produce a warning listing the affected columns and row numbers. Missing values in DateTime, Date, or Time columns remain an error.
- Non-numeric values: parameter columns contain only numeric values.
Adding an unrecognised column causes checkASRcont() to stop immediately. The following example demonstrates this using the Format 1 file.
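# read only the header row to get the column names
nms <- names(readxl::read_excel(contpth, n_max = 0))
# read date and time columns as text and let readxl guess the rest
col_types <- ifelse(nms %in% c("Date", "Time", "DateTime"), "text", "guess")
contdat_raw <- suppressWarnings(
  readxl::read_excel(
    contpth,
    col_types = col_types,
    na = c("NA", "na", ""),
    guess_max = Inf
  )
)
# add a column that is not a recognised parameter
contdat_raw$BadColumn <- 1
checkASRcont(contdat_raw)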
#> Running checks on continuous data...
#> Error:
#> ! Checking column names...
#> Please correct the column names or remove: BadColumn
After passing all checks, readASRcont() returns a data frame with the same structure regardless of input format:
- DateTime: time-zone-aware POSIXct column
head(contdat)
#> # A tibble: 6 × 8
#> DateTime Water_Temp_C DO_pctsat DO_mg_l Conductivity_uS_cm TDS_mg_l
#> <dttm> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2024-08-14 13:56:33 24.2 76.9 6.44 410. 266
#> 2 2024-08-14 13:56:43 24.2 76.7 6.43 410. 266
#> 3 2024-08-14 13:56:53 24.2 76.6 6.42 410. 266
#> 4 2024-08-14 13:57:03 24.2 76.5 6.41 410. 266
#> 5 2024-08-14 13:57:13 24.2 76.3 6.4 409 266
#> 6 2024-08-14 13:57:23 24.2 76.3 6.39 409. 266
#> # ℹ 2 more variables: Salinity_ppt <dbl>, pH_SU <dbl>
head(contdat2)
#> # A tibble: 6 × 8
#> DateTime Water_Temp_C DO_pctsat DO_mg_l Conductivity_uS_cm TDS_mg_l
#> <dttm> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2024-08-14 13:56:33 24.2 76.9 6.44 410. 266
#> 2 2024-08-14 13:56:43 24.2 76.7 6.43 410. 266
#> 3 2024-08-14 13:56:53 24.2 76.6 6.42 410. 266
#> 4 2024-08-14 13:57:03 24.2 76.5 6.41 410. 266
#> 5 2024-08-14 13:57:13 24.2 76.3 6.4 409 266
#> 6 2024-08-14 13:57:23 24.2 76.3 6.39 409. 266
#> # ℹ 2 more variables: Salinity_ppt <dbl>, pH_SU <dbl>
The data quality objectives file contains the thresholds used by the quality control checks applied to each parameter (see the quality control vignette for details). Use readASRdqo() to import the data quality objectives. The function reads the workbook, runs checks via checkASRdqo(), and returns a formatted data frame.
dqodat <- readASRdqo(dqopth)
#> Running checks on data quality objectives...
#> Checking column names... OK
#> Checking all columns present... OK
#> Checking at least one parameter is present... OK
#> Checking parameter format... OK
#> Checking Flag column... OK
#> Checking columns for non-numeric values... OK
#>
#> All checks passed!
The workbook must contain exactly the following columns (all required; thresholds you do not want to apply should be left blank / NA):
| Column | Description |
|---|---|
| Parameter | Parameter name matching paramsASR$Parameter |
| Flag | Flag level for the thresholds in the row, either “Fail” or “Suspect” |
| GrMin | Gross range, lower threshold |
| GrMax | Gross range, upper threshold |
| Spike | Spike, absolute step size for a flag |
| FlatN | Flatline, run length at which a flag is triggered |
| FlatDelta | Flatline, the run range (max minus min) must be strictly less than this value to continue the run; a change equal to or greater than FlatDelta resets the run |
| RoCStDv | Rate of change, multiplier applied to the rolling SD (flag if \|diff\| > SD × RoCStDv) |
| RoCHours | Rate of change, look-back window length in hours |
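To make the gross range and spike thresholds concrete, the following base R sketch applies them to a short series of made-up temperature readings. It is an illustration of what the columns mean, not the package's flagging code: the threshold values are illustrative, and the handling of boundary values and of the first reading are assumptions. The quality control vignette describes the actual checks.

```r
# Made-up water temperature readings (deg C) at a regular time step
temp <- c(24.2, 24.3, 24.2, 29.1, 24.4, 24.3)

# Illustrative thresholds for one DQO row
gr_min <- -0.5  # GrMin
gr_max <- 28    # GrMax
spike  <- 1.5   # Spike

# Gross range: value falls outside GrMin..GrMax
# (assumption: values equal to a threshold are not flagged)
gross_flag <- temp < gr_min | temp > gr_max

# Spike: absolute step from the previous reading exceeds Spike
# (assumption: the first reading has no predecessor and is never flagged)
step <- c(NA, abs(diff(temp)))
spike_flag <- !is.na(step) & step > spike

data.frame(temp, gross_flag, spike_flag)
```

In this series the fourth reading exceeds GrMax, and both the jump up to it and the drop back down exceed the Spike step size.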
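The readASRdqo() function imports the data quality objectives and runs a series of checks using the checkASRdqo() function. The function stops with an informative error if any check fails. The checks evaluate the following:
- Column names: only the columns listed in the table above are present.
- All columns present: every column listed in the table above is included.
- At least one parameter: at least one entry in the Parameter column matches the Parameter column in paramsASR.
- Parameter format: all entries in the Parameter column should match those in the Parameter column in paramsASR.
- Flag column: the Flag column should contain only “Fail” or “Suspect” entries.
- Non-numeric values: all columns other than Parameter and Flag should be numeric values.
Supplying an unrecognised parameter name fails the parameter format check: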
# import the data for the example
dqodat_raw <- suppressWarnings(
  readxl::read_excel(dqopth, na = c("NA", "na", ""), guess_max = Inf)
)
# introduce a typo in the Parameter column
dqodat_raw$Parameter[1] <- "WaterTemp"
checkASRdqo(dqodat_raw)
#> Running checks on data quality objectives...
#> Checking column names... OK
#> Checking all columns present... OK
#> Checking at least one parameter is present... OK
#> Error:
#> ! Checking parameter format...
#> Incorrect parameter format: WaterTemp
After passing all checks, readASRdqo() returns a data frame with the columns listed in the format requirements table above, with all threshold columns coerced to numeric.
head(dqodat)
#> # A tibble: 6 × 9
#> Parameter Flag GrMin GrMax Spike FlatN FlatDelta RoCStDv RoCHours
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Water_Temp_C Suspect -0.5 28 1.5 60 0.01 6 25
#> 2 Water_Temp_C Fail -1 30 2 100 0.01 8 25
#> 3 DO_pctsat Suspect 0 100 10 30 0.01 6 25
#> 4 DO_pctsat Fail -1 120 25 60 0.01 NA NA
#> 5 DO_mg_l Suspect 2 16 2 30 0.01 6 25
#> 6 DO_mg_l Fail 1 18 4 60 0.01 NA NA
The remaining functions in AquaSensR can now be used after the continuous data and data quality objectives files are successfully imported.