Tag Archive for: Compliance

12 ways commercial SaaS can save your complex environmental data (part 2/4)

Continued from Part 1

2) Data quality is better with databases

Since 2002, a dedicated group of Locus employees has been involved with migrating data into EIM from spreadsheets provided to us by customers and their consultants. As such, we have firsthand experience with the types of data quality issues that arise when using spreadsheets for entering and storing environmental data.

Here is just a small selection of these issues:

  • Locations with multiple variations of the same ID/name (e.g., MW-1, MW-01, MW 1, MW1, etc.)
  • Use of multiple codes for the same entity (e.g., SW and SURFW for surface water samples)
  • Loss of significant figures for numeric data
  • Special characters (such as commas) that may cause cells to break unintentionally over rows when moving data into another application
  • Excel’s frustrating insistence (unless a cell format has been explicitly specified) on converting CAS numbers like “7440-09-7 (Potassium)” into dates (“9/7/7440”)
  • Bogus dates like “November 31” in columns that do not have date formats applied to them
  • Loss of leading zeros associated with cost codes and project numbers (e.g., “005241”) that have only numbers in them but must be stored as text fields
  • The inability to enforce uniqueness, leading to duplicate entries
  • Null values in key fields (because entries cannot be marked as required)
  • Hidden rows and/or columns that can cause data to be shifted unintentionally or modified erroneously
  • Bogus numerical values (e.g., “1..3”, “.1.2”) stored in text fields
  • Inconsistent use of lab qualifiers: in some cases, these appear concatenated in the same Excel column (e.g., “10U, <5”), while in other cases they appear in separate columns
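Several of the issues above can be caught with small automated checks before data is loaded. The following is a minimal sketch in Python, not the logic Locus uses during migrations. The function names, the choice of "MW-1" as the canonical form of a location ID, and the MM/DD/YYYY date format are all assumptions made for illustration.

```python
import re
from datetime import datetime

def normalize_location_id(raw):
    """Collapse variants such as 'MW-01', 'MW 1', and 'mw1' to 'MW-1' (assumed convention)."""
    match = re.fullmatch(r"\s*([A-Za-z]+)[\s\-_]*0*(\d+)\s*", raw)
    if match is None:
        return raw.strip().upper()  # leave unrecognized patterns for manual review
    prefix, number = match.groups()
    return f"{prefix.upper()}-{number}"

def is_valid_number(text):
    """Accept plain decimal numbers; reject strings such as '1..3' or '.1.2'."""
    pattern = r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?"
    return re.fullmatch(pattern, text.strip()) is not None

def is_valid_date(text, fmt="%m/%d/%Y"):
    """Reject impossible calendar dates such as November 31."""
    try:
        datetime.strptime(text, fmt)
    except ValueError:
        return False
    return True

variants = ["MW-1", "MW-01", "MW 1", "MW1"]
distinct_ids = {normalize_location_id(v) for v in variants}
```

Here `distinct_ids` ends up with a single entry, which shows that the four spellings refer to one location. A real migration would still need a person to confirm that such variants truly are the same well.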

With some planning and discipline, you can avoid some of these problems in Excel. For example, you can create dropdown list boxes to limit the entries in a cell to certain values. However, this is not standard practice as most spreadsheets we receive come with few constraints built into them.

While databases are indeed not immune to data quality issues, it is much easier for database designers to impose effective constraints on users’ entries. Tasks such as limiting the values in a column to selected entries, ensuring that values are valid dates or numbers, forcing values to be entered in selected fields, and preventing duplicate records from being entered are all easy to implement and standard practice in databases.

However, properly designed databases can do even more. They can check that various combinations of values make sense—for example:

  • They can prevent users from entering analysis dates that are earlier than the associated sample dates.
  • They can verify that numerical entries are within a permitted range of values and make sense based on past entries. This check is so popular that it’s even part of our Locus Mobile app for collecting field data.
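Both kinds of checks can be sketched in a few lines of Python with the built-in sqlite3 and statistics modules. Everything here is illustrative: the `lab_result` table, the pH column and its 0 to 14 range, and the three-standard-deviation rule for comparing against past entries are assumptions, not a description of how Locus EIM or Locus Mobile implements these checks.

```python
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE lab_result (
        field_sample_id TEXT NOT NULL,
        sample_date     TEXT NOT NULL,   -- ISO text dates sort chronologically
        analysis_date   TEXT NOT NULL,
        ph              REAL CHECK (ph BETWEEN 0 AND 14),
        CHECK (analysis_date >= sample_date)
    )
""")

def insert_result(row):
    """Return True if the row is accepted, False if a CHECK constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO lab_result VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

def is_plausible(new_value, past_values, tolerance=3.0):
    """Flag values more than `tolerance` standard deviations from the past mean."""
    if len(past_values) < 2:
        return True  # not enough history to judge
    mean = statistics.mean(past_values)
    spread = statistics.stdev(past_values)
    if spread == 0:
        return new_value == mean
    return abs(new_value - mean) <= tolerance * spread

ok = insert_result(("MW-01-12", "2017-04-17", "2017-04-20", 7.1))
too_early = insert_result(("MW-01-12", "2017-04-17", "2017-04-10", 7.1))
out_of_range = insert_result(("MW-01-12", "2017-04-17", "2017-04-20", 71.0))
```

The first insert succeeds, while the other two are rejected: one because the analysis date precedes the sample date, the other because the value falls outside the permitted range. The `is_plausible` helper covers the softer "does this make sense given past entries" question, which usually warrants a warning rather than an outright rejection.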

Databases also provide the ability to verify the completeness of your data:

  • Have all samples been collected?
  • Have all analyses been performed on a sample?
  • Are there any analytes missing from the laboratory’s findings?

You can specify such queries to run at any time. Replicating these checks within Excel, while not impossible, is simply not something most Excel users have the time, skill, or desire to build.
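A completeness check of this kind might look like the following sketch, again using Python's sqlite3 module. The `requested_analysis` table, the `parameter` column, and the sample values are hypothetical. The query lists every requested analyte for which the laboratory has not reported a result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE requested_analysis (field_sample_id TEXT, parameter TEXT);
    CREATE TABLE field_sample_result (field_sample_id TEXT, parameter TEXT, result REAL);

    -- Made-up data: three analyses requested, one reported so far
    INSERT INTO requested_analysis VALUES
        ('MW-01-12', 'Potassium'), ('MW-01-12', 'Sodium'), ('MW-02-12', 'Potassium');
    INSERT INTO field_sample_result VALUES
        ('MW-01-12', 'Potassium', 4.2);
""")

# Requested analyses with no matching result row are still outstanding
missing = conn.execute("""
    SELECT req.field_sample_id, req.parameter
    FROM requested_analysis AS req
    LEFT JOIN field_sample_result AS res
           ON res.field_sample_id = req.field_sample_id
          AND res.parameter = req.parameter
    WHERE res.field_sample_id IS NULL
    ORDER BY req.field_sample_id, req.parameter
""").fetchall()
```

With this data, `missing` contains Sodium for MW-01-12 and Potassium for MW-02-12. Once written, the query can be rerun after every lab delivery.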


3) It’s easier to prevent data duplication and redundancy when your data resides in your database

One of the most striking differences between spreadsheets and databases is the prevalence of redundant information in spreadsheets. Consider, for example, these three tables in EIM:

  1. LOCATION
  2. FIELD_SAMPLE
  3. FIELD_SAMPLE_RESULT

In this subset of their columns, “PK” signifies that the column is a member of the “primary key” of the table. The combination of values in these columns must be unique for any given record.

Complex data - Table - Primary key

The two columns LOCATION_ID and SITE_ID can be used to link (join) the information in the FIELD_SAMPLE table to the LOCATION table. Furthermore, FIELD_SAMPLE_ID and SITE_ID can be used to link the information in FIELD_SAMPLE_RESULT to FIELD_SAMPLE. Because these links exist, we only need to store the above attributes of a given location or field sample once, in one table. This is very different from how data is handled in a single spreadsheet.
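To make the joins concrete, here is a simplified sketch using Python's sqlite3 module. The three table names and the key columns come from the discussion above. The remaining columns, the site ID, and the result values are made up, and the real EIM tables have many more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE location (
        site_id TEXT NOT NULL, location_id TEXT NOT NULL, location_type TEXT,
        PRIMARY KEY (site_id, location_id)
    );
    CREATE TABLE field_sample (
        site_id TEXT NOT NULL, field_sample_id TEXT NOT NULL,
        location_id TEXT NOT NULL, sample_date TEXT,
        PRIMARY KEY (site_id, field_sample_id)
    );
    CREATE TABLE field_sample_result (
        site_id TEXT NOT NULL, field_sample_id TEXT NOT NULL,
        parameter TEXT NOT NULL, result REAL,
        PRIMARY KEY (site_id, field_sample_id, parameter)
    );

    -- Each location and sample attribute is stored exactly once
    INSERT INTO location VALUES ('SITE1', 'MW-01', 'WELL');
    INSERT INTO field_sample VALUES ('SITE1', 'MW-01-12', 'MW-01', '04/17/2017');
    INSERT INTO field_sample_result VALUES
        ('SITE1', 'MW-01-12', 'Potassium', 4.2),
        ('SITE1', 'MW-01-12', 'Sodium', 18.0);
""")

# Join the three tables on their shared key columns to build a flat report
rows = conn.execute("""
    SELECT l.location_id, l.location_type, s.field_sample_id,
           s.sample_date, r.parameter, r.result
    FROM field_sample_result AS r
    JOIN field_sample AS s
      ON s.site_id = r.site_id AND s.field_sample_id = r.field_sample_id
    JOIN location AS l
      ON l.site_id = s.site_id AND l.location_id = s.location_id
    ORDER BY r.parameter
""").fetchall()
```

The query returns one flat row per result, each carrying the location type and sample date, even though those values were entered only once.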

Let’s compare how the data in a few of these columns might appear in a single spreadsheet compared to a database. We’ll look at the spreadsheet first:

Complex data - Location Table

Next, let’s see how this information would be stored in a database. Here we can see more fields since we’re not as constrained by width.

First, the LOCATION table:

Complex data - Location ID Table

Then, FIELD_SAMPLE:

Complex data - Field Sample Table

Lastly, FIELD_SAMPLE_RESULT:

Complex data - Field Sample Result Table

Note that one of the most striking differences between the spreadsheet and the database tables above is how much redundant information the spreadsheet contains. The Location Type of “WELL” is repeated in every record where location MW-01 appears, and the sample date of “04/17/2017” is repeated wherever sample MW-01-12 is present. This redundancy is one of the most significant drawbacks of using spreadsheets to store large amounts of data when many of the data values themselves (e.g., LOCATION_ID and FIELD_SAMPLE_ID above) have multiple attributes that need to be stored as well.

Most spreadsheet data that we have received for import into EIM have consisted of either:

  1. Multiple worksheets of the same or similar formats, all containing a combination of sampling and analytical data
  2. A single worksheet containing tens of thousands of rows of such data

Occasionally, customers have sent us multiple spreadsheets containing very different types of data, with one or more hosting sample and analytical results, and others containing location, well construction, or other supporting data. However, this is atypical; in most of the migrations that we have performed, redundant data is pervasive in the spreadsheet’s contents and inconsistencies in entries are common.

Entering new records in a spreadsheet structured like the example above requires that the attributes entered for LOCATION_ID and FIELD_SAMPLE_ID be consistent across all records whose values are the same in these columns.

The real problems surface when you have to edit records. You must correctly identify all affected records and change them all identically and immediately.

Sounds relatively straightforward, doesn’t it?

In fact, judging by what we have seen in our data migrations, discrepancies invariably creep into spreadsheets when edits are attempted. These discrepancies must be resolved when moving the data into a database where constraints prohibit, for example, a single sample from having multiple sample dates, times, purposes, etc.
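Finding such discrepancies in a flat spreadsheet export can be automated. The sketch below is plain Python with made-up rows. The `find_conflicts` helper is hypothetical and simply reports every key value that appears with more than one value of a given attribute.

```python
from collections import defaultdict

# Made-up rows, as they might appear in a single flat worksheet
flat_rows = [
    {"FIELD_SAMPLE_ID": "MW-01-12", "SAMPLE_DATE": "04/17/2017", "PARAMETER": "Potassium"},
    {"FIELD_SAMPLE_ID": "MW-01-12", "SAMPLE_DATE": "04/17/2017", "PARAMETER": "Sodium"},
    {"FIELD_SAMPLE_ID": "MW-01-12", "SAMPLE_DATE": "04/18/2017", "PARAMETER": "Calcium"},
    {"FIELD_SAMPLE_ID": "MW-02-12", "SAMPLE_DATE": "04/17/2017", "PARAMETER": "Potassium"},
]

def find_conflicts(rows, key, attribute):
    """Return {key value: sorted attribute values} where one key has several values."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[key]].add(row[attribute])
    return {k: sorted(v) for k, v in seen.items() if len(v) > 1}

conflicts = find_conflicts(flat_rows, "FIELD_SAMPLE_ID", "SAMPLE_DATE")
```

Here `conflicts` flags sample MW-01-12, which carries two different sample dates. Deciding which date is correct still requires going back to the source records.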

In addition, audit trails are all but nonexistent in Excel. Many users save each edited version under a new filename as a crude form of audit tracking, which can quickly lead to a data management nightmare with no documented history of changes. Just as important, almost all our customers, especially those involved with regulatory reporting, require audit tracking. It is typically required for sites that may be involved in litigation, where decisions about health and safety risks call for defensible and unimpeachable data.
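One common way for a database to keep an audit trail is a trigger that records every change. The sketch below uses Python's sqlite3 module to show the general idea. It is an assumption-laden illustration, not a description of how Locus EIM implements audit tracking, and the `field_sample_audit` table and trigger name are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE field_sample (field_sample_id TEXT PRIMARY KEY, sample_date TEXT);

    CREATE TABLE field_sample_audit (
        field_sample_id TEXT,
        old_sample_date TEXT,
        new_sample_date TEXT,
        changed_at      TEXT DEFAULT CURRENT_TIMESTAMP
    );

    -- Record the old and new value whenever a sample date is edited
    CREATE TRIGGER field_sample_audit_trg
    AFTER UPDATE OF sample_date ON field_sample
    BEGIN
        INSERT INTO field_sample_audit (field_sample_id, old_sample_date, new_sample_date)
        VALUES (OLD.field_sample_id, OLD.sample_date, NEW.sample_date);
    END;

    INSERT INTO field_sample VALUES ('MW-01-12', '2017-04-17');
    UPDATE field_sample SET sample_date = '2017-04-18' WHERE field_sample_id = 'MW-01-12';
""")

audit_rows = conn.execute("""
    SELECT field_sample_id, old_sample_date, new_sample_date
    FROM field_sample_audit
""").fetchall()
```

After the update, `audit_rows` holds one entry showing the old and new dates. A production system would typically also record who made the change and why.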


4) Entity relationships are more manageable in databases

The discussion of data duplication and redundancy touches on another significant difference between databases and spreadsheets—how entity relationships are handled.

Excel stores data in a two-dimensional grid. While it is possible to create relationships between data in different worksheets, this is not the norm and there are many limitations. More often, as we have stated elsewhere, Excel users tend to store their data in a single spreadsheet that grows increasingly unwieldy and hard to read as records are added to it.

Let’s consider some of the relationships that characterize environmental sampling and analytical data:

  • Sampling locations are associated with sites or facilities—or, for our water utility customers, individual water systems. They may also belong to one or more planned sampling routes.
  • Different sampling locations have their own analytical and field measurement requirements.
  • Individual samples may be associated with one or more specific permits or regulatory requirements.
  • Trip, field, and equipment rinsate samples are linked to one or more regular field samples.
  • Analytical results are assigned to analysis lots and sample delivery groups (SDGs) by the laboratory.
  • Analysis lots and SDGs are the vehicle for linking laboratory QC samples to regular samples.
  • Analytical parameters are associated with one or more regulatory limits.
  • Individual wells are linked to specific boreholes and one or more aquifers.

Modeling and building these relationships in Excel would be quite difficult. Moreover, they would likely lack most of the checks that a DBMS offers, like preventing orphans (e.g., a location referenced in the FIELD_SAMPLE table that has no entry in the LOCATION table).
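The orphan check mentioned above corresponds to a foreign key constraint. Here is a minimal sketch with Python's sqlite3 module, using simplified versions of the LOCATION and FIELD_SAMPLE tables. The columns shown and the sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
    CREATE TABLE location (
        site_id TEXT NOT NULL, location_id TEXT NOT NULL,
        PRIMARY KEY (site_id, location_id)
    );
    CREATE TABLE field_sample (
        site_id TEXT NOT NULL, field_sample_id TEXT NOT NULL, location_id TEXT NOT NULL,
        PRIMARY KEY (site_id, field_sample_id),
        FOREIGN KEY (site_id, location_id) REFERENCES location (site_id, location_id)
    );
    INSERT INTO location VALUES ('SITE1', 'MW-01');
    INSERT INTO field_sample VALUES ('SITE1', 'MW-01-12', 'MW-01');
""")

# Location MW-09 has no entry in the location table, so this sample would be an orphan
try:
    with conn:
        conn.execute("INSERT INTO field_sample VALUES ('SITE1', 'MW-09-12', 'MW-09')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

The second sample is refused because its location does not exist, so `orphan_rejected` is true. A spreadsheet has no built-in equivalent of this check.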


5) Data reporting and integration is faster and easier with databases

How do you create a report in Excel? If you’re working with a single spreadsheet, you use the “Data Filter” and “Sort” options to identify the records of interest, then move the columns around to get them in the desired sequence. This might involve hiding some columns temporarily.

If you make a copy of your data, you can delete records and columns that you don’t want to show. If your data is stored in multiple spreadsheets, you can pull information from one sheet to another to create a report that integrates the different types of data housed in these spreadsheets. But this is a somewhat tedious process for all but the simplest of reports.

Let’s contrast this drudgery with the simplicity and power offered by relational databases.

In Locus EIM, for example, you pick the primary and secondary filter categories that you want to use to restrict your output to the records of interest. Then, you select the specific values for these data filter categories (usually from dropdowns or list-builder widgets). There is no limit on how many categories you can filter on.

Typically, you then choose a date range. Lastly, you pick which data columns you want to view, and in what order. These columns can come from many different tables in the database. For ease of selection, these also appear in dropdowns or list-builder widgets.

When you have made your filter selections, Locus EIM pulls up the records matching your selection criteria in a data grid. You can further filter the records by values in specific columns in this grid, or hide or rearrange columns. If you want to share or keep a record of these data, you can export the contents of the displayed grid to a text file, Excel, XML, PDF, or copy to your clipboard.

The list of reports spans all the major types of data stored in Locus EIM, including location and sample collection information, chain of custody and requested analyses data, analytical results, field measurements, and well and borehole data. Additional reports provide options to perform statistical calculations, trend analyses, and comparisons with regulatory and other limits.

In short, when it comes to generating reports, databases are superior to spreadsheets in almost every aspect. However, that doesn’t mean spreadsheets have no role to play. Many Locus EIM users charged with creating an ad hoc report prefer to download their selected output to Excel, where they apply final formatting and add a title and footer. With some of the newer reporting tools, however, such as Locus EIM’s new enhanced formatted reports, that functionality is also built into the DBMS. The more sophisticated the database, the more advanced and robust the available reporting options will be.

12 reasons why commercial SaaS databases are ideal

Make sure to read the entire series to find out about 12 reasons commercial SaaS databases excel at managing complex environmental data!

About the author—Gregory Buckle, PhD, Locus Technologies

Dr. Buckle has more than 30 years of experience in the environmental field, most of which have been devoted to the design, development, and implementation of environmental database management systems. When he joined Locus in 1999, he was responsible for building and deploying Locus’ cloud-based EIM software. He was also instrumental in customizing EIM for the water utility industry and developing EIM’s powerful Sample Planning and Data Validation modules. The latest iteration of the Sample Planning module that Dr. Buckle built is currently being used by Los Alamos National Laboratory and San Jose Water Company to plan and schedule thousands of samples per year.


About the author—Marian Carr, Locus Technologies

Ms. Carr is responsible for managing overall customer solution deployments and customer relationships with Locus’ government accounts. Her career at Locus includes heading the product development team of the award-winning cloud-based environmental ePortal solution as well as maintaining and growing key customer accounts with Locus’ Fortune 100 enterprise deployments. In addition, Ms. Carr was instrumental in driving the growth and adoption of the Locus EIM platform with key federal and water organizations.


 


    12 ways commercial SaaS can save your complex environmental data (part 1/4)

    Do you currently use a system of Excel spreadsheets to store your environmental data? If so, ask yourself the following questions:

    • Do you find yourself having to make the same changes in multiple spreadsheets?
    • Is your spreadsheet growing unwieldy and difficult to manage?
    • Are you finding that you’re spending more and more time scrolling through your spreadsheet, looking for specific information?
    • Do you have to jump through hoops to view specific subsets of data?
    • Do multiple people sometimes need access to the data at the same time? Or, are your colleagues continually asking you to provide them with copies or subsets of the data in your spreadsheet?
    • Are there redundancies in your data? Is the same information repeated on multiple rows of your spreadsheet?
    • Do you ever encounter erroneous entries that have been typed in by hand?
    • Are you concerned about the long-term security of your data?
    • Do you often wonder exactly where your data are?
    • Does someone else really own your data (perhaps your IT department)?

    If you answer “yes” to any of these questions, you might be outgrowing your homegrown system of Excel spreadsheets.  It may be time to consider a more mature tool to manage and store your environmental data.

    The advantages of databases over spreadsheets for managing complex data

    Before we look at other options, let’s examine the differences in how data are stored and managed in spreadsheets and databases.

    A spreadsheet consists of rows and columns. At the intersection of each are cells that store data values. Some cells can refer to other cells, and some cells can perform processing on other individual (or groups of) cell values.

    In contrast, a database is made up of named tables that contain records. Each record has columns in which values are stored.  Each table stores information on a particular type of entity. For environmental data, this could be field samples, sampling locations, analytical results, regulatory limits, or laboratory methods. Typically, one or more columns in each record store values that uniquely identify an instance of the entity. In the case of a field sample, this could be the “field sample ID”; for a location, the “location ID”.

    Complex data - Excel spreadsheets

    Locus user tips
    In Locus EIM, Site ID is also part of the primary key for locations and field samples to accommodate customers with multiple waste sites, facilities, or water systems.

    As we move to analytical or field measurements, we have to use more columns to uniquely identify a record (e.g., date, time, field sample or location ID, parameter). The remaining columns in a table that are not part of the “primary key” identify other attributes of the entity.  For samples, these attributes include sample date and time, sample matrix, sample purpose, sampling event, sampling program, etc.

    If you think of a data table as a grid with rows and columns, it seems very similar to a spreadsheet, but there’s a fundamental difference. With a spreadsheet, how you view or report the data is dictated by how it appears in the spreadsheet (WYSIWYG). If you need to view the data differently, you must reformat the spreadsheet. In contrast, you can view information stored in a database (or serve it up in a report) in multiple ways that don’t necessarily depend on how the data is stored in the underlying tables.

    Databases, which are often referred to by the acronym DBMS (Database Management Systems), offer many other advantages over spreadsheets when dealing with complex data.

    Here are 12 key areas where databases—especially cloud databases built for industry-specific needs—surpass their spreadsheet counterparts.

    Locus user tips
    Pay close attention to section 2 on “Data quality”. Over the years, Locus has helped many of our customers move their data from spreadsheets into Locus EIM. Invariably, these migrations have unearthed many data issues that went undetected until we had to map and move the data into Locus EIM.

    12 reasons why commercial SaaS databases are ideal

    If, at the end of this guide, you’re still not convinced of the advantages of databases over spreadsheets for data storage, consider Microsoft’s recommendations as to when to use its low-end DBMS (Access) and when to use Excel.

    Microsoft emphasizes that Excel can store large amounts of data in worksheets. However, it notes that Excel is not intended to serve as a database, but is optimized for data analysis and calculation.

    According to Microsoft:

    Use Access when you:

    • Anticipate many people working in the database and you want robust options that safely handle updates to your data, such as record locking and conflict resolution.
    • Anticipate the need to add more tables to a dataset that originated as a flat or nonrelational table.
    • Want to run complex queries.
    • Want to produce a variety of reports or mailing labels.

    Use Excel when you:

    • Require a flat or nonrelational view of your data instead of a relational database that uses multiple tables, and when your data is mostly numeric.
    • Frequently run calculations and statistical comparisons on your data.
    • Want to perform sophisticated “what-if” analysis operations on your data, such as statistical, engineering, and regression analysis.
    • Want to keep track of items in a simple list, either for personal use or for limited collaboration purposes.

    In this 4-part blog series, we’ll explore in detail each of the 12 key areas where cloud-based environmental databases excel over home-grown spreadsheets.

    Let’s get started!


    1) Data entry is better with databases

    If you use spreadsheets to manage your environmental information, how do you get data into them?

    If you’re collecting the same information every week, month, quarter, or year, perhaps you have a template that you use. You might fill in only the data fields that change from one event to another, then append the rows in this template to an existing worksheet, or insert them into a new one. Alternatively, you might copy a set of rows in your spreadsheet, and then edit any fields with values that have changed.

    In the case of analytical data, if you don’t have to type in the data manually, perhaps your lab provides data in a spreadsheet that mirrors the structure of your spreadsheet, allowing you to cut and paste it without edits.

    Each of these methods of entering data has limitations and risks:

    • Manual entry inevitably introduces errors, unless someone is independently checking every entry for accuracy.
    • Copying and editing are notoriously prone to mistakes. It is too easy to overlook fields that should be updated in the copied records.
    • Getting a lab to send you data in a spreadsheet whose structure mirrors yours can be problematic, even more so if you deal with different labs for different types of analyses. Even then, there is no check on the validity of the laboratories’ entries.
      • Are all date and number fields actually the correct data types?
      • Do all required fields have values in them?

    Databases provide various means of data input. Two of the most commonly used methods are form entry (for when you need to enter a few records at a time) and EDDs (Electronic Data Deliverables), which are text or zipped files containing tens, hundreds, or even thousands of data records for upload.

    Flexible form configuration as a standard database feature

    Databases provide unlimited flexibility in designing forms—with searchable lookup fields, advanced form controls, sophisticated styling, context-sensitive help, data validation, event handlers, and the ability to conditionally display individual or blocks of fields, based on the user’s selections.

    Locus user tips
    Locus EIM offers over 30 forms for entering and editing water systems, locations, sample collection, chains of custody, analytical data, field measurements, water levels, boreholes, well construction and other information. All of these input forms have multiple dropdown list boxes for the display of lookup values and online help.  You can easily hide unused forms for your organization to simplify the system interface and menu structure.

    Better, faster batch data loading with EDDs

    The real strength of databases comes about from their ability to load and process EDDs. Each record in an EDD typically consists of 10-50 fields (e.g., in the case of laboratory analyses: Field Sample ID, Analytical Method, Analysis Date, Lab Result, Units, etc.).  The data in these EDDs can be checked for incorrect data types, missing required values, entries that are restricted by lookup tables or LOVs (Lists of Values), and duplicates.
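The four kinds of EDD checks listed above can be sketched in plain Python. This is an illustration only, not the Locus EIM EDD loader. The field names come from the example in the text, while the CSV layout, the date format, the list of permitted units, and the choice of duplicate key are assumptions.

```python
import csv
import io
from datetime import datetime

REQUIRED_FIELDS = ["Field Sample ID", "Analytical Method", "Analysis Date", "Lab Result", "Units"]
ALLOWED_UNITS = {"mg/L", "ug/L"}  # hypothetical list of values (LOV)

def check_edd(text):
    """Return a list of (line_number, message) problems found in a CSV EDD."""
    problems = []
    seen = set()
    reader = csv.DictReader(io.StringIO(text))
    for line_number, record in enumerate(reader, start=2):  # line 1 is the header
        for field in REQUIRED_FIELDS:
            if not (record.get(field) or "").strip():
                problems.append((line_number, f"missing {field}"))
        try:
            datetime.strptime(record.get("Analysis Date") or "", "%m/%d/%Y")
        except ValueError:
            problems.append((line_number, "invalid Analysis Date"))
        try:
            float(record.get("Lab Result") or "")
        except ValueError:
            problems.append((line_number, "invalid Lab Result"))
        if record.get("Units") not in ALLOWED_UNITS:
            problems.append((line_number, "Units not in list of values"))
        key = (record.get("Field Sample ID"), record.get("Analytical Method"))
        if key in seen:
            problems.append((line_number, "duplicate record"))
        seen.add(key)
    return problems

# Made-up EDD: line 3 repeats line 2, and line 4 has three bad values
sample_edd = """Field Sample ID,Analytical Method,Analysis Date,Lab Result,Units
MW-01-12,METHOD-A,04/20/2017,4.2,mg/L
MW-01-12,METHOD-A,04/20/2017,4.2,mg/L
MW-02-12,METHOD-A,11/31/2017,1..3,ppm
"""
problems = check_edd(sample_edd)
```

Running the checker on the sample reports the duplicate on line 3 and the invalid date, result, and units on line 4, so the whole file can be returned to the lab before anything reaches the database.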

    Locus user tips
    Locus EIM’s powerful EDD loader can upload and error check several thousand records in under a minute. Labs need not all use the same format – the data will still end up in the same place in the database.  In fact, Locus EIM even has a special lab interface so (with your permission) your labs can upload their own EDDs.  This lab interface shows only a very small part of Locus EIM, namely the EDD Loader and selected LOVs that lab users would need to know.

     


     




     


      EHS Compliance Software: The difference between configurability and customization

      As you shop around for EHS compliance software, you’re quite likely to hear two similar words: “configurable” and “customizable.” You might hear these two words in answer to your question, “Can your software do _______ ?” Your implementation success will depend on which of the two words you give more weight to when selecting a vendor. Therefore, it is important to understand the difference between them.

      Configurable means the software can do what you’re asking it to do “out of the box” with a few simple keystrokes. The software is designed to be easily modified by an end user (a user developer) who has no programming background. For example, if exceeding a water quality limit for a certain parameter is called an “exceedance” in your software but your new water utility customer uses the term “outlier”, configurable software lets you change the word on the form from “exceedance” to “outlier” without any programming or recompiling of code, and without needing assistance from your software vendor. Often, the software will feature configuration options or a configuration workbench where you simply input all such terms and titles using a series of dropdown menus or drag-and-drop functionality. In other words, features and functions of the software are configurable if they are part of the off-the-shelf product.

      Customization is a completely different matter. Unlike configuration, customization requires additional (and expensive) programming, typically performed by software developers, and the cost is usually passed on to the client. It also takes longer and requires you to execute a change order, which is never a pleasant process.

      Understanding the difference between configurability and customization also brings awareness of the total cost of ownership (TCO) of your EHS software. Configurability is rolled into the software and has no additional fees. Customization requires expensive programming, usually for an additional charge (think “change order”). It is good practice to ask your software vendor upfront which features are configurable and which are customizable. The entire focus of EHS software selection should be on configurability.

      I have seen many customers, along with their consultants and research analysts, make the cardinal mistake of focusing on the features and functionality that exist in the software off the shelf without asking a single question about configurability. No wonder so many EHS software implementations fail or cost orders of magnitude more than the winning bid. What matters is not the features and functionality that exist in EHS applications today, but how easily non-developers can add, build, or configure features, functionality, or whole new applications that may not be present yet. It is about the flexibility of the platform, not the rigidity of applications.

       

      Locus Platform EHS configuration workbench custom workflows

       

      When you’re selecting configurable EHS software, make sure to consider this: If you have domain expertise in EHS and you know how to build a PowerPoint presentation, or you can draw a flowchart, or you can build a spreadsheet using formulae, with sorting tables and charts, then you can build any feature and functionality into your EHS software—provided the software is configurable off-the-shelf.

      To put it in simple terms, you are a user developer. You will save your company lots of money and headache and avoid tons of change orders. I should also note that most of the end-user configurable software is built on multi-tenant SaaS architecture and offers drag-and-drop functionality.

      Locus application support services for configurable EHS software

      Locus is ready for e-Manifest

      EPA is establishing a national system for tracking hazardous waste shipments electronically. This system, known as “e-Manifest,” will modernize the nation’s cradle-to-grave hazardous waste tracking process. EPA is on schedule to launch e-Manifest on June 30, 2018.


      e-Manifest infographic

      Download the latest fact sheets for e-Manifest stakeholders.
      These fact sheets provide an overview of the e-Manifest program and the impacts it will have on each stakeholder. Each fact sheet outlines basic information about the e-Manifest system, how the specific stakeholder will be impacted, and what actions they need to take to use the e-Manifest system.

      Learn more about Locus’ Waste Management application: https://www.locustec.com/applications/ehs-compliance/waste-management/
