Quality control overview

The quality control (QC) functions in AquaSensR can be used once the required data are successfully imported into R (see the inputs vignette for details). This vignette covers the primary functions of the QC workflow:

  • utilASRflag(): Applies four independent QC checks to a selected parameter and returns a data frame of flag results.
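  • anlzASRflag(): Produces an interactive time-series plot of those flags for visual review.
  • editASRflag(): Opens a Shiny application for reviewing flags and interactively removing observations.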

Load the data

The examples throughout this vignette use the example files bundled with the package. Import both files before proceeding:

contpth <- system.file("extdata/ExampleCont1.xlsx", package = "AquaSensR")
dqopth <- system.file("extdata/ExampleDQO.xlsx", package = "AquaSensR")

contdat <- readASRcont(contpth)
#> Running checks on continuous data...
#>  Checking column names... OK
#>  Checking Date, Time are present... OK
#>  Checking at least one parameter column is present... OK
#>  Checking date format... OK
#>  Checking time format... OK
#>  Checking for missing values... OK
#>  Checking parameter columns for non-numeric values... OK
#> 
#> All checks passed!
dqodat <- readASRdqo(dqopth)
#> Running checks on data quality objectives...
#>  Checking column names... OK
#>  Checking all columns present... OK
#>  Checking at least one parameter is present... OK
#>  Checking parameter format... OK
#>  Checking Flag column... OK
#>  Checking columns for non-numeric values... OK
#> 
#> All checks passed!

contdat is a data frame with a DateTime column and one numeric column per parameter. dqodat contains the parameter-specific data quality objectives (DQOs) for each check. See the inputs vignette for more information on the required formats.

utilASRflag() to flag continuous data

utilASRflag() is the primary QC function. It applies four independent checks to the chosen parameter in contdat.

Arguments

Three arguments are required for the function:

Argument | Description
cont | contdat data frame returned by readASRcont()
dqo | dqodat data frame returned by readASRdqo()
param | Name of the parameter column to evaluate (must match a column in contdat and a Parameter entry in dqodat)

Basic usage

Pass the two data frames and the name of the parameter to evaluate:

flagdat <- utilASRflag(contdat, dqodat, param = "Water_Temp_C")
head(flagdat)
#> # A tibble: 6 × 6
#>   DateTime            Water_Temp_C gross_flag spike_flag roc_flag flat_flag
#>   <dttm>                     <dbl> <chr>      <chr>      <chr>    <chr>    
#> 1 2024-08-14 13:56:33         24.2 pass       pass       pass     pass     
#> 2 2024-08-14 13:56:43         24.2 pass       pass       pass     pass     
#> 3 2024-08-14 13:56:53         24.2 pass       pass       pass     pass     
#> 4 2024-08-14 13:57:03         24.2 pass       pass       pass     pass     
#> 5 2024-08-14 13:57:13         24.2 pass       pass       pass     pass     
#> 6 2024-08-14 13:57:23         24.2 pass       pass       pass     pass

Output

utilASRflag() returns a data frame with the following columns:

Column | Description
DateTime | Observation timestamp
(value of param) | Values of the evaluated parameter; the column is named after the parameter (e.g., Water_Temp_C)
gross_flag | Flag from the gross range check
spike_flag | Flag from the spike check
roc_flag | Flag from the rate-of-change check
flat_flag | Flag from the flatline check

Each flag column contains one of three values: "pass", "suspect", or "fail". The checks are independent of one another, so a single observation can receive any combination of flags across the four columns.

If no row in dqodat matches param, the function issues a warning, leaves all flags as "pass", and continues.
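Because the four flag columns are independent, a common first step in review is to pull out every row with at least one non-"pass" flag. The following is a minimal base-R sketch on a hand-made data frame that mimics the column layout above; the values are invented for illustration and are not package output.

```r
# Toy stand-in for the output of utilASRflag(); all values are invented
toy_flags <- data.frame(
  DateTime = as.POSIXct("2024-08-14 13:56:33", tz = "UTC") + seq(0, 40, by = 10),
  Water_Temp_C = c(24.2, 24.3, 31.0, 24.2, 24.2),
  gross_flag = c("pass", "pass", "suspect", "pass", "pass"),
  spike_flag = c("pass", "pass", "fail", "fail", "pass"),
  roc_flag = c("pass", "pass", "pass", "pass", "pass"),
  flat_flag = c("pass", "pass", "pass", "pass", "pass"),
  stringsAsFactors = FALSE
)

flag_cols <- c("gross_flag", "spike_flag", "roc_flag", "flat_flag")

# TRUE for rows where at least one check returned something other than "pass"
any_flag <- rowSums(toy_flags[, flag_cols] != "pass") > 0
flagged_rows <- toy_flags[any_flag, ]

# Worst severity per row across the four checks (pass < suspect < fail)
sev_levels <- c("pass", "suspect", "fail")
sev_rank <- sapply(toy_flags[, flag_cols], function(x) match(x, sev_levels))
worst <- sev_levels[apply(sev_rank, 1, max)]
```

Here flagged_rows holds the third and fourth observations, and worst summarises each row by its most severe flag. The same indexing should work on a real flagdat, assuming it keeps the four flag column names shown above.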

QC checks explained

AquaSensR implements four QC checks that reflect widely used sensor data quality standards. The underlying concepts and code borrow heavily from the ContDataQC package. All threshold values are set in the data quality objectives file and can be customised per parameter. These thresholds will likely need manual adjustment to avoid false positives and false negatives. Importantly, the flags require manual verification and should not be used to automatically exclude data without review.

Any threshold value set to NA in the dqodat file is silently skipped such that the corresponding severity level is not applied and affected observations remain "pass" for that check. This applies to the "Suspect" and "Fail" rows independently, so individual checks or severity levels can be disabled selectively by leaving their threshold columns blank in the input file.
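As a rough illustration of the layout this implies, the sketch below builds a toy DQO table with one "Suspect" and one "Fail" row for a single parameter and reports which thresholds are active. Only a subset of the threshold columns is shown, the numbers are invented, and the real file may differ in column order or content (see the inputs vignette).

```r
# Toy DQO table; threshold values are invented for illustration
toy_dqo <- data.frame(
  Parameter = c("Water_Temp_C", "Water_Temp_C"),
  Flag = c("Suspect", "Fail"),
  GrMin = c(0, -5),
  GrMax = c(30, 35),
  Spike = c(NA, 5),  # NA: no "suspect" level for the spike check
  stringsAsFactors = FALSE
)

# Logical matrix: TRUE where a threshold is set, FALSE where it is NA (skipped)
thresh_cols <- c("GrMin", "GrMax", "Spike")
active <- !is.na(toy_dqo[, thresh_cols])
rownames(active) <- toy_dqo$Flag
active
```

In this toy table the spike check can only ever return "pass" or "fail", because the "Suspect" row leaves Spike blank.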

1. Gross range

DQO columns: GrMin, GrMax (thresholds differ by row: Flag = "Fail" vs Flag = "Suspect")

Flag column: gross_flag

The gross range check tests whether each observation falls within absolute physical or sensor limits. It is the broadest of the four checks and is intended to catch values that are simply impossible or outside the expected operating range of the instrument.

Each observation is compared to the thresholds in the two data quality objectives rows for that parameter:

  • Values below GrMin or above GrMax in the "Fail" row return "fail"
  • Values below GrMin or above GrMax in the "Suspect" row but within the fail bounds return "suspect"

The fail thresholds define hard physical limits (e.g., water temperature cannot be below −5 °C for a freshwater deployment). The suspect bounds are narrower and sit inside the fail bounds, so they flag readings that are unusual but not impossible.

Any threshold can be set to NA in the data quality objectives file to skip that particular flag.
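The logic described above can be sketched in a few lines of base R. This is an illustrative reimplementation, not the package's code: gross_sketch() is a hypothetical helper, the threshold values are invented, and treating each NA bound as "not set" is an assumption about how skipped thresholds behave.

```r
# Hypothetical helper illustrating the gross range logic
gross_sketch <- function(x, susp_min, susp_max, fail_min, fail_max) {
  # TRUE where x lies outside [lo, hi]; an NA bound is treated as not set
  outside <- function(lo, hi) {
    below <- if (is.na(lo)) rep(FALSE, length(x)) else x < lo
    above <- if (is.na(hi)) rep(FALSE, length(x)) else x > hi
    below | above
  }
  flag <- rep("pass", length(x))
  flag[which(outside(susp_min, susp_max))] <- "suspect"
  flag[which(outside(fail_min, fail_max))] <- "fail"  # fail overrides suspect
  flag
}

temp <- c(12.1, 27.5, -6.0, 41.0, 18.3)
gross_flags <- gross_sketch(temp, susp_min = 0, susp_max = 25,
                            fail_min = -5, fail_max = 35)
gross_flags
#> [1] "pass"    "suspect" "fail"    "fail"    "pass"
```

Applying the suspect bounds first and the fail bounds second means a value outside both sets of bounds ends up as "fail", which matches the "within the fail bounds" wording for the suspect level.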

Quickly view how many flags of each type were generated by the gross range check:

# Count observations by gross range flag
table(flagdat$gross_flag)
#> 
#>    pass suspect 
#>     923       4

2. Spike

DQO columns: Spike (threshold differs by row: Flag = "Fail" vs Flag = "Suspect")

Flag column: spike_flag

The spike check detects sudden, anomalous jumps (either up or down) between consecutive observations. It computes the absolute difference between each reading and the one immediately before it, then compares that difference to the thresholds in the two data quality objectives rows for that parameter:

  • |diff| ≥ Spike in the "Suspect" row returns "suspect"
  • |diff| ≥ Spike in the "Fail" row returns "fail"

The first observation in each series has no predecessor and is always left as "pass". Because the spike check flags the observation at the large step, a single anomalous reading embedded in otherwise stable data will generate two flagged observations — one for the step up (or down) to the outlier, and one for the step back to baseline.

The spike thresholds are absolute. For example, a 5 °C step is flagged regardless of whether the surrounding series is calm or noisy. The rate-of-change check (below) evaluates potentially spurious changes when relative variability matters.
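A small sketch helps show why one outlier produces two flags. The helper spike_sketch() below is hypothetical and the thresholds are invented; letting "fail" take precedence over "suspect" when both thresholds are met is an assumption, not something confirmed from the package source.

```r
# Hypothetical helper illustrating the spike logic
spike_sketch <- function(x, susp, fail) {
  flag <- rep("pass", length(x))
  # Absolute lag-1 difference; the first observation has no predecessor
  step <- c(NA, abs(diff(x)))
  if (!is.na(susp)) flag[which(step >= susp)] <- "suspect"
  if (!is.na(fail)) flag[which(step >= fail)] <- "fail"
  flag
}

# A single outlier (26.0) embedded in otherwise stable readings
x <- c(20.0, 20.1, 26.0, 20.2, 20.1)
spike_flags <- spike_sketch(x, susp = 3, fail = 5)
spike_flags
#> [1] "pass" "pass" "fail" "fail" "pass"
```

Both the step up to 26.0 and the step back down to 20.2 exceed the threshold, so the outlier and the reading after it are flagged.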

Quickly view how many flags of each type were generated by the spike check:

table(flagdat$spike_flag)
#> 
#>    fail    pass suspect 
#>       1     925       1

3. Rate of change

DQO columns: RoCStDv, RoCHours (thresholds differ by row: Flag = "Fail" vs Flag = "Suspect")

Flag column: roc_flag

The rate-of-change (RoC) check is an adaptive counterpart to the spike check. Rather than comparing against a fixed step size, the check determines whether a step is large relative to the recent variability in the series.

For each observation the function:

  1. Collects all values within a trailing RoCHours-hour window ending just before that timestamp (the current observation is excluded).
  2. Computes the standard deviation (SD) of those preceding values.
  3. Multiplies the SD by RoCStDv to produce a contextual threshold.
  4. Flags the observation if the absolute lag-1 difference exceeds that threshold — "suspect" using the "Suspect" row thresholds and "fail" using the "Fail" row thresholds.

At least two values must fall within the window before a standard deviation can be computed; observations with fewer window values are not flagged. Each row is evaluated independently, so either or both severity levels can be active. Setting RoCStDv or RoCHours to NA for a row skips that severity level entirely.

The key advantage over the spike check is sensitivity scaling. During a “calm” period, a small absolute change can exceed the threshold, while during a naturally variable period (e.g., diurnal temperature swings) the threshold rises accordingly.
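The four steps above can be sketched for a single severity level as follows. roc_sketch() is a hypothetical helper, not the package's implementation; it assumes the sample standard deviation (sd()), a window that includes readings exactly RoCHours old, and a strict "greater than" comparison.

```r
# Hypothetical helper illustrating the rate-of-change logic for one severity row
roc_sketch <- function(time, x, n_sd, hours) {
  flag <- rep(FALSE, length(x))
  if (is.na(n_sd) || is.na(hours)) return(flag)  # NA disables this level
  for (i in seq_along(x)[-1]) {
    # Age of every reading relative to the current timestamp, in hours
    age_h <- as.numeric(difftime(time[i], time, units = "hours"))
    # Trailing window: strictly earlier than the current reading
    win <- x[age_h > 0 & age_h <= hours]
    if (length(win) < 2) next  # SD needs at least two values
    flag[i] <- abs(x[i] - x[i - 1]) > n_sd * sd(win)
  }
  flag
}

# Ten-minute readings: a calm period followed by a 1.4-unit step
time <- as.POSIXct("2024-08-14 00:00:00", tz = "UTC") + 600 * (0:7)
x <- c(20.0, 20.1, 20.0, 20.1, 20.0, 20.1, 21.5, 21.4)
roc_flag <- roc_sketch(time, x, n_sd = 3, hours = 1)
which(roc_flag)
#> [1] 7
```

Only the seventh reading is flagged. The eighth is not, because the step at reading seven has already inflated the window's standard deviation, which illustrates how the threshold adapts to recent variability.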

Quickly view how many flags of each type were generated by the rate of change check:

table(flagdat$roc_flag)
#> 
#> pass 
#>  927

4. Flatline

DQO columns: FlatN, FlatDelta (thresholds differ by row: Flag = "Fail" vs Flag = "Suspect")

Flag column: flat_flag

The flatline check identifies periods where a sensor appears to be stuck at a constant value, which can occur from sensor fouling, burial, or loss of power. The check counts the length of “runs” of near-identical consecutive values and flags observations whose run length reaches a specified count.

A run is defined by a minimum length (FlatN) and tolerance (FlatDelta), each read from the appropriate data quality objectives row. An observation extends the current run only when the range (max minus min) of all values in the run so far — including the new observation — is strictly less than FlatDelta. A change equal to FlatDelta is not treated as flat and resets the run. When the condition fails the run length resets to 1 starting from the current observation. The range-based approach prevents both single large jumps and slow cumulative drift from accumulating run length.

  • Run length ≥ FlatN (using FlatDelta tolerance) from the "Suspect" row returns "suspect"
  • Run length ≥ FlatN (using FlatDelta tolerance) from the "Fail" row returns "fail"

The suspect and fail thresholds are evaluated independently using their respective delta tolerances, so the two run lengths may differ. Either row can have NA values to skip that level.
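The range-based run counting can be sketched in base R as below. flat_run_length() is a hypothetical helper written from the description above, not the package's code, and the tolerance and count values are invented.

```r
# Hypothetical helper: run length of near-identical values using a range test
flat_run_length <- function(x, delta) {
  run <- integer(length(x))
  lo <- hi <- NA_real_
  for (i in seq_along(x)) {
    # Range of the current run if the new observation were added to it
    new_lo <- min(lo, x[i], na.rm = TRUE)
    new_hi <- max(hi, x[i], na.rm = TRUE)
    if (i > 1 && (new_hi - new_lo) < delta) {
      run[i] <- run[i - 1] + 1L  # observation extends the run
      lo <- new_lo
      hi <- new_hi
    } else {
      run[i] <- 1L               # run resets at the current observation
      lo <- hi <- x[i]
    }
  }
  run
}

x <- c(10.00, 10.01, 10.00, 10.01, 10.50, 10.50, 10.50)
run <- flat_run_length(x, delta = 0.05)
run
#> [1] 1 2 3 4 1 2 3
flat_idx <- which(run >= 4)  # observations reaching a run length of 4

# Slow drift: each step is small, but the cumulative range resets the run
drift_run <- flat_run_length(c(1.00, 1.03, 1.06, 1.09), delta = 0.05)
```

With a minimum length of 4, only the fourth observation is flagged in the first series. In the drift series the run never grows beyond 2, because the range of the run reaches the tolerance even though each individual step is below it.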

Quickly view how many flags of each type were generated by the flatline check:

table(flagdat$flat_flag)
#> 
#> pass 
#>  927

anlzASRflag() to visualise flag results

The flags generated by utilASRflag() can be viewed using the anlzASRflag() function. This produces an interactive time-series plot:

anlzASRflag(flagdat)

The plot shows all observations as a continuous line. Non-passing observations are overlaid as coloured markers, with colour encoding the check type and shape encoding the severity:

Check | Colour
Gross range | Red
Spike | Orange
Rate of change | Purple
Flatline | Blue

Severity | Marker shape
Suspect | Upward triangle
Fail | Cross (×)

An observation flagged by more than one check appears as overlying markers for each check, so that all potential issues remain visible. Hovering over a marker reveals the check name, severity, parameter value, and timestamp. Items in the legend can be clicked to toggle visibility of a check or severity level, which is useful for reviewing specific flags in a busy plot. The plot can also be zoomed and panned to focus on specific periods.

A second parameter can be overlaid on the plot by passing a two-column data frame (with DateTime and the parameter of interest) to the overlay argument. The overlay is drawn as a light blue line on a right-side y-axis, making it easy to see whether flagged observations in one parameter co-occur with changes in another.

overlay_df <- contdat[, c("DateTime", "DO_pctsat")]
anlzASRflag(flagdat, overlay = overlay_df)

editASRflag() to review and clean data interactively

editASRflag() opens a Shiny application that lets you inspect the flag plot for every parameter and selectively remove observations before exporting the cleaned data back to R. The app uses utilASRflag() and anlzASRflag() under the hood to generate the flags and plots, but adds interactive selection tools to facilitate data cleaning.

The app can be opened by providing contdat and dqodat as arguments to editASRflag(). The app lets you interactively evaluate your data until you click Done / Close, at which point the cleaned data are returned to your R session.

cleaned <- editASRflag(contdat, dqodat)

Interface overview

Screenshot of the editASRflag Shiny app showing the flag plot and left sidebar.

The main editASRflag interface. The left sidebar contains parameter selection, overlay options, linked-removal controls, and the removed-points table. The flag plot for the selected parameter is in the centre. The DQO Settings panel (right, not shown here) is accessed by clicking the toggle on the right edge of the plot area.

Selecting and removing points

Zoom and pan with the plot toolbar (visible when the pointer hovers over the plot, top-right corner) to focus on regions of interest before selecting. Three removal methods are available:

  • Click: remove a single point directly on the line or flag marker.
  • Box Select: drag a rectangle over a region to remove multiple points at once.
  • Lasso Select: draw a free-form outline around the points you want to remove.

After a box or lasso selection, double-click the plot background to clear the selection highlight before starting a new one. Each removal action is logged in the Removed Points table in the left sidebar (scroll down to view it, or widen the sidebar by dragging its edge to the right).

Control | Action
Parameter | Switch between parameters. Prev/Next buttons cycle through all parameters.
Overlay | Display a second parameter from contdat on a right-side axis.
USGS Overlay | Enter a USGS site number and select a parameter type, then click Load to fetch continuous data from NWIS and display it on the secondary axis. Loading USGS data clears any contdat overlay. Selecting a contdat overlay clears the USGS data. Site numbers can be found using the NWIS Mapper.
Linked Removal | When checked, propagate every removal to all other parameters simultaneously. Undo restores all parameters together in the same batch.
Undo Last Removal | Restore the most recently removed point or selection batch. Linked parameters are restored together.
Start Over | Restore all removed points for every parameter and reset all DQO thresholds to their original values.
Export Progress | Save the current cleaned data and DQO thresholds as Excel files in a ZIP archive. If any points have been removed, a removed-observations file is included.
Done / Close | Stop the app and return the cleaned data.

The USGS Overlay feature uses readASRusgs() internally to pull unit-value (continuous) data from the NWIS API over the same date range as contdat. Supported parameter types are streamflow (00060), gage height (00065), and precipitation (00045). The fetched time series is displayed on the secondary y-axis in the same position as a contdat overlay but is retrieved live when Load is clicked. Users without an internet connection or outside NWIS coverage can still use the contdat Overlay selector instead.

DQO Settings panel

Clicking the toggle on the right edge of the plot area opens the DQO Settings panel. The panel shows the numeric QC thresholds from dqodat for the currently selected parameter across all four checks and both severity levels.

Screenshot of the editASRflag DQO Settings panel with numeric inputs for gross range, spike, rate of change, and flatline thresholds.

The DQO Settings panel, showing editable Suspect and Fail threshold inputs for each of the four QC checks.

Editing thresholds and clicking Apply re-computes flags for the current parameter while preserving any points already removed. Reset to original reverts the inputs to the values from the original dqo file and re-evaluates all flags. Threshold edits are per-parameter and independent.

Return value

editASRflag() returns a named list with three elements:

Element | Description
contdat | The original data frame sorted by DateTime, with all removed observations replaced by NA.
dqodat | The DQO thresholds data frame reflecting any edits made in the DQO Settings panel. If no edits were made the values are identical to the input.
removed | A stacked data frame of every removed observation, with columns Parameter, DateTime, and all four flag columns.

# View the cleaned continuous data
View(cleaned$contdat)

# Inspect the final DQO thresholds used
cleaned$dqodat

# View what was removed and its flags
View(cleaned$removed)

Removed observations in contdat are set to NA rather than dropped, so every timestamp is retained and the time series remains regular and aligned across all parameters.
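Because removed values are kept as NA, base R is enough to summarise the cleaned data. The sketch below uses a toy data frame standing in for cleaned$contdat (invented values); it assumes the imported data had no missing values, so every NA corresponds to a removed observation.

```r
# Toy stand-in for cleaned$contdat; values are invented for illustration
toy_contdat <- data.frame(
  DateTime = as.POSIXct("2024-08-14 13:56:33", tz = "UTC") + seq(0, 30, by = 10),
  Water_Temp_C = c(24.2, NA, 24.2, 24.2),
  DO_pctsat = c(98.1, NA, NA, 97.9)
)

param_cols <- setdiff(names(toy_contdat), "DateTime")

# Number of removed (NA) observations per parameter
n_removed <- colSums(is.na(toy_contdat[, param_cols, drop = FALSE]))

# Optionally drop timestamps where every parameter was removed
keep <- rowSums(!is.na(toy_contdat[, param_cols, drop = FALSE])) > 0
trimmed <- toy_contdat[keep, ]
```

Dropping fully empty timestamps is optional and gives up the regular spacing that the NA approach preserves, so it is probably best reserved for exports where gaps are acceptable.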