
Predicting Water Quality with Machine Learning

At Locus Technologies, we’re always looking for innovative ways to help water users make better use of their data. One way to do that is with machine learning, a versatile tool for analyzing environmental data such as water quality, and one that can form the backbone of AI systems which help manage and monitor water. When done correctly, it can even predict how the quality of a water system will change over time.

To explore machine learning for water data, we are going to use some groundwater data collected from Locus EIM, which can be loaded into Locus Platform with our API. Using this data, which includes various water quality measurements such as turbidity, we will build a model that estimates the pH of the water source from the other parameters, with an error of about 1 pH unit. For the purpose of this post, we will be building the model in Python, utilizing a Jupyter Notebook environment.

When building a machine learning model, the first thing you need to do is get to know your data a bit. In this case, our EIM water data has 16,114 separate measurements. Plus, each of these measurements has a lot of info, including the Site ID, Location ID, the Field Parameter measured, the Measurement Date and Time, the Field Measurement itself, the Measurement Units, Field Sample ID and Comments, and the Latitude and Longitude. So, we need to do some janitorial work on our data. We can get rid of some columns we don’t need and separate the field measurements based on which specific parameter they measure and the time they were taken. Now, we have a datasheet with the columns Location ID, Year, Measurement Date, Measurement Time, Casing Volume, Dissolved Oxygen, Flow, Oxidation-Reduction Potential, pH, Specific Conductance, Temperature, and Turbidity, where the last eight are the parameters which had been measured. A small section of it is below.

Locus Machine Learning - Data
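The reshaping step described above can be sketched with pandas. This is a minimal illustration rather than the code we ran: the column names follow the fields listed in this post, the five sample rows are made up, and averaging duplicate readings is an assumption.

```python
import pandas as pd

# Hypothetical long-format export: one row per field measurement.
long_df = pd.DataFrame({
    "Location ID": ["MW-1", "MW-1", "MW-1", "MW-2", "MW-2"],
    "Measurement Date": ["2018-03-01"] * 3 + ["2018-03-02"] * 2,
    "Measurement Time": ["09:15"] * 3 + ["10:40"] * 2,
    "Field Parameter": ["pH", "Temperature", "Turbidity", "pH", "Temperature"],
    "Field Measurement": [7.1, 15.2, 3.4, 6.8, 14.9],
})

# One row per sampling event, one column per measured parameter.
wide_df = long_df.pivot_table(
    index=["Location ID", "Measurement Date", "Measurement Time"],
    columns="Field Parameter",
    values="Field Measurement",
    aggfunc="mean",  # assumption: average any duplicate readings
).reset_index()
wide_df.columns.name = None

# Derive the Year column from the measurement date.
wide_df["Year"] = pd.to_datetime(wide_df["Measurement Date"]).dt.year
print(wide_df)
```

Parameters that were not measured during a sampling event simply come out as missing values, which is why the next cleanup steps are needed.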

Alright, now our data is better organized, and we can move over to Jupyter Notebook. But we still need to do a bit more maintenance. By looking at the specifics of our data set, we can see one major problem immediately. As shown in the picture below, the Casing Volume parameter has only 6 values. Since so much is missing, this parameter is useless for prediction, and we’ll remove it from the set.

Locus Machine Learning - Data
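One way to spot a column like Casing Volume is to count the non-missing values per column and drop anything below a cutoff. The sketch below uses a tiny made-up table, and the 50% cutoff is a hypothetical choice, not a rule from our analysis.

```python
import numpy as np
import pandas as pd

# Small made-up table standing in for the real dataset.
df = pd.DataFrame({
    "Casing Volume": [np.nan, np.nan, 2.5, np.nan, np.nan],
    "pH": [7.1, 6.8, np.nan, 7.4, 7.0],
    "Temperature": [15.2, 14.9, 16.1, np.nan, 15.5],
})

# Number of non-missing values in each column.
print(df.count())

# Drop columns where less than half of the rows have a value (hypothetical cutoff).
min_fraction = 0.5
sparse_cols = [c for c in df.columns if df[c].count() / len(df) < min_fraction]
df = df.drop(columns=sparse_cols)
print("Dropped:", sparse_cols)
```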

We can check the set and see that some of our measurements have missing data. In fact, 261 of them have no data for pH. To train a model, we need data which has a result for our target, so these rows must be thrown out. Then, our dataset will have a value for pH in every row, but might still have missing values in the other columns. We can deal with these missing values in a number of ways, and it might be worth dropping columns which are missing too much, like we did with Casing Volume. Luckily, none of our other parameters has that problem, so for this example I filled in the empty spaces in each of the other columns with the average of that column’s measurements. However, if you do this, you need to eliminate any major outliers first, since they would skew this average.
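Here is a minimal sketch of that cleanup on made-up numbers. The post does not say how outliers were identified, so the 1.5 × IQR rule used here is an assumption; any reasonable outlier test could take its place.

```python
import numpy as np
import pandas as pd

# Made-up sample: one row has no pH, and one temperature is clearly a bad reading.
df = pd.DataFrame({
    "pH": [7.1, np.nan, 6.8, 7.4, 7.0, 6.9, 7.2, 7.3],
    "Temperature": [15.0, 14.0, np.nan, 16.0, 15.0, 900.0, 14.0, 16.0],
    "Turbidity": [3.0, 2.0, 4.0, np.nan, 5.0, 4.0, 3.0, 5.0],
})

# 1. Rows without the target value cannot be used for training.
df = df.dropna(subset=["pH"]).reset_index(drop=True)
features = [c for c in df.columns if c != "pH"]

# 2. Blank out major outliers so they cannot skew the column averages.
#    Assumption: an outlier lies more than 1.5 * IQR outside the quartiles.
for col in features:
    q1, q3 = df[col].quantile([0.25, 0.75])
    fence = 1.5 * (q3 - q1)
    outlier = (df[col] < q1 - fence) | (df[col] > q3 + fence)
    df.loc[outlier, col] = np.nan

# 3. Fill the remaining gaps with each column's average.
df[features] = df[features].fillna(df[features].mean())
print(df)
```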

Once your data is usable, then it is time to start building a model! You can start off by creating some helpful graphs, such as a correlation matrix, which can show the relationships between parameters.

Locus Machine Learning - Corr
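With pandas, a correlation matrix takes a single call. The data below is synthetic, generated only so the example runs on its own, and the relationship built into it is invented for the demonstration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; the negative relationship is deliberately built in.
rng = np.random.default_rng(0)
temperature = rng.normal(15, 2, 200)
df = pd.DataFrame({
    "Temperature": temperature,
    "Dissolved Oxygen": 12 - 0.3 * temperature + rng.normal(0, 0.2, 200),
    "Turbidity": rng.normal(4, 1, 200),
})

# Pairwise Pearson correlation between every pair of columns.
corr = df.corr()
print(corr.round(2))
```

Values near 1 or -1 indicate a strong linear relationship, while values near 0 indicate little linear relationship. The resulting table can be handed to a plotting library to draw it as a heatmap.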

For this example, we will build our model with the library Keras. Once the features and targets have been chosen, we can construct a model with code such as this:

Locus Machine Learning - Construct

This code will create a sequential deep learning model with 4 layers. The first three all have 64 nodes, and of them, the initial two use a rectified linear unit activation function, while the third uses a sigmoid activation function. The fourth layer has a single node and serves as the output.
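Since the original code was published as an image, here is a reconstruction sketch of a model matching that description. The names `LAYER_SPECS` and `build_model` are hypothetical, and the optimizer, learning rate, loss, and metric are assumptions, as is the use of the Keras bundled with TensorFlow.

```python
# (units, activation) for each Dense layer, as described in this post.
LAYER_SPECS = [(64, "relu"), (64, "relu"), (64, "sigmoid"), (1, None)]


def build_model(n_features, learning_rate=0.001):
    """Build and compile the 4-layer sequential network for pH regression."""
    # Imported here so the layer spec can be inspected without TensorFlow installed.
    from tensorflow import keras

    model = keras.Sequential([keras.Input(shape=(n_features,))])
    for units, activation in LAYER_SPECS:
        model.add(keras.layers.Dense(units, activation=activation))

    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),  # assumption
        loss="mse",       # assumption: mean squared error
        metrics=["mae"],  # assumption: also track mean absolute error
    )
    return model
```

Calling `build_model(n_features=6)` would give a model that expects six input parameters per row; the right number depends on which feature columns you keep.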

Our model must be trained on the data, which is usually split into training and test sets. In this case, we will put 80% of the data into the training set and 20% into the test set. From the training set, 20% will be used as a validation subset. During the fit, the model examines the data points and their corresponding pH values and learns a relationship between them. With Keras, you can save a history of how the error decreases throughout the fit, which is useful to plot when analyzing results. We can see that for our model, the training error gradually decreases as it learns a relationship between the parameters.

Locus Machine Learning - Construct
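The split and fit can be sketched as follows. The helper names `train_test_split_80_20` and `fit_with_history` are hypothetical, the arrays are stand-ins, and the number of epochs is an assumption. The fit helper expects an already compiled Keras model and relies on the `validation_split` argument of `model.fit`, which holds out the last fraction of the training rows.

```python
import numpy as np


def train_test_split_80_20(X, y, seed=0):
    """Shuffle the rows, then hold out the last 20% as the test set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    cut = len(X) * 8 // 10
    train, test = order[:cut], order[cut:]
    return X[train], X[test], y[train], y[test]


def fit_with_history(model, X_train, y_train, epochs=100):
    """Fit a compiled Keras model, using 20% of the training rows for validation."""
    history = model.fit(
        X_train, y_train, validation_split=0.2, epochs=epochs, verbose=0
    )
    # Per-epoch values such as "loss" and "val_loss", ready for plotting.
    return history.history


# Stand-in arrays: 100 rows, 7 feature columns, and one pH value per row.
X = np.arange(700, dtype=float).reshape(100, 7)
y = np.linspace(5.0, 9.0, 100)
X_train, X_test, y_train, y_test = train_test_split_80_20(X, y)
print(X_train.shape, X_test.shape)
```

Shuffling before the split matters here because the validation rows are taken from the end of the training set, so an unshuffled, date-ordered dataset would validate only on the most recent measurements.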

The end result is a trained model which has been evaluated on the test set. When we ran the code, the test set error was 1.11. As we are predicting pH, a full point of error could be fairly large, but the precision required of any model will depend on the situation. This error could be reduced by modifying the model itself, for example by adjusting the learning rate or restructuring the layers.

Locus Machine Learning - Error
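For reference, two common ways to express a test set error in pH units are the mean absolute error (MAE) and the root mean squared error (RMSE). The sketch below computes both on made-up values; `regression_errors` is a hypothetical helper, and this is a generic illustration rather than the exact calculation behind the figure above.

```python
import numpy as np


def regression_errors(y_true, y_pred):
    """Return mean absolute error and root mean squared error."""
    residuals = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return {
        "mae": float(np.mean(np.abs(residuals))),
        "rmse": float(np.sqrt(np.mean(residuals ** 2))),
    }


# Made-up true and predicted pH values.
errors = regression_errors([7.0, 6.5, 8.0, 7.5], [7.5, 6.0, 9.5, 7.0])
print(errors)
```

RMSE penalizes large misses more heavily than MAE, so comparing the two gives a quick hint about whether a few bad predictions dominate the error.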

You can also graph the true target values with the model’s predictions, which can help when analyzing where the model can be improved. In our case, pH values in the middle of the range seem fairly accurate, but towards the higher values they become more unreliable.

Locus Machine Learning - Predict
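Alongside the plot, the same question can be answered numerically by grouping the test rows by true pH and computing the error within each group. A minimal sketch with made-up values; `mae_by_bin` and the bin edges are hypothetical.

```python
import numpy as np


def mae_by_bin(y_true, y_pred, edges):
    """Mean absolute error of the predictions within each range of true values."""
    y_true = np.asarray(y_true, dtype=float)
    abs_err = np.abs(np.asarray(y_pred, dtype=float) - y_true)
    # Bin i holds values with edges[i - 1] <= value < edges[i].
    bin_index = np.digitize(y_true, edges)
    result = {}
    for i in range(1, len(edges)):
        in_bin = bin_index == i
        if in_bin.any():
            result[(edges[i - 1], edges[i])] = float(abs_err[in_bin].mean())
    return result


# Made-up values in which the predictions drift for higher pH.
y_true = [6.2, 6.8, 7.1, 7.4, 8.3, 8.9]
y_pred = [6.4, 6.7, 7.0, 7.6, 7.3, 7.4]
table = mae_by_bin(y_true, y_pred, edges=[6.0, 7.0, 8.0, 9.0])
for (low, high), mae in table.items():
    print(f"pH {low}-{high}: MAE {mae:.2f}")
```

A bin with a much larger error than the others usually means the model saw few training examples in that range.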

So what do we do now that we have this model? In a sense, what is the point of machine learning? Well, one of the major strengths of this technology is the predictive capabilities it has. Say that we later acquire some data on a water source without information on the pH value. As long as the rest of the data is intact, we can predict what that value should be. Machine learning can also be incorporated into examination of things such as time series, to forecast a trend of predictions. Overall, machine learning is a very important part of data analytics and the development of powerful AI systems, and its importance will only increase in the future.

What’s next?

As the technology around machine learning and artificial intelligence evolves, Locus will be working to integrate these tools into our EHS software. More accurate predictions will lead to more insightful data, empowering our customers to make better business decisions.

 

Artificial Intelligence and EHS Compliance Revisited, Part 4: AI, Big Data + Multitenancy = The Perfect System

This article was originally published in 2019. It has been updated to reflect the realities of AI for EHS in 2025.

AI and Big Data to Drive EHS Decisions via Multitenant SaaS

Data now streams from devices as ordinary as fire hydrants, but raw data is of little benefit unless the company that owns it has a way to integrate it into its record system and pair it with regulatory databases and GIS. That is where advances in SaaS tools and data source mashups have positioned some EHS software companies to capitalize on AI.

Humans are not very good at analyzing large datasets. This is particularly true of planetary-scale data, which is now growing exponentially as we work to understand the causes of climate change and fight it. Faced with a proliferation of new regulations and pressure to make their companies “sustainable,” EHS departments keep adding more and more compliance officers, managers, and outside consultants, instead of investing in technology that can help them. Soon, they will be able to rely on AI technology to stay on top of the ever-changing regulatory landscape, but only if they have software that was built to accommodate AI.

Locus - Big Data - IoT - AI

AI, in addition to being faster and more accurate, should make compliance easier. Companies spend too much time and effort on comprehensive quarterly or annual reporting, only to have to duplicate the work for the next reporting period. An integrated approach, aided by AI, will automate these repetitive tasks and is easier than having separate analyses performed on every silo of information before having a conversation with regulators.

In summary, whether it is being used to help with GHG emissions monitoring and reporting, water quality management, waste management, incident management, or other general compliance functions, AI can improve efficiency, weed out false-positive results, cut costs and make better use of managers’ time and company resources.

Complex data - Data redundancy

Another advantage of AI, assuming it is deployed properly, concerns its inherent neutrality in data evaluation and decision making. Time and time again we read in the papers about psychological studies and surveys that show people on opposite sides of a question or topic cannot even agree on the “facts.” It should not be surprising then to find that EHS managers and engineers are often limited by their biases. As noted in the best-selling book “Thinking, Fast and Slow” by Daniel Kahneman, a laureate of the Nobel Memorial Prize in Economic Sciences, people making decisions frequently see what they want, ignore probabilities, and minimize risks that uproot their hopes. Even worse, they are often confident even when they are wrong. Algorithms with AI built in are more likely to detect our errors than we are. AI-driven intelligent databases are now becoming powerful enough to help us remove human biases from our decision-making. For that reason, large datasets, applied analytics, and advanced charting and data visualization tools will soon be driving daily EHS decisions.

In the past, companies almost exclusively relied upon on-premise software (or single-tenant cloud software, which is not much different from on-premise). Barriers were strewn everywhere. Legacy systems did not talk to one another, since few of them had interfaces to other systems. Getting data into third-party apps usually required the information to be first exported in a prescribed format, then imported to a third-party app for further processing and analysis. Sometimes data was duplicated across multiple systems and apps to avoid the headache of moving data from one to another. As the world moves to the multitenant SaaS cloud, all this is now changing. Customers are now being given the opportunity to analyze not just their own company’s data, but also data from other companies and from different but potentially related categories, via mashups. As customers do so, interesting patterns are beginning to emerge.

The explosion of content—especially unstructured content—is an opportunity and an obstacle for every business today.

The emergence of artificial intelligence is a game-changer for enterprise EHS and content management because it can deliver business insights at scale and make EHS compliance more productive. There are numerous advantages when you combine the leading multitenant EHS software with AI:

  • Ability to handle the explosion of unstructured content where legacy on-premise EHS solutions can’t.
  • AI can organize, illuminate, and extract valuable business insights if all your content is managed in one secure location in the cloud.
  • Locus software helps you take advantage of the latest AI developments and apply them to all your content.

As noted in a NAEM white paper, Why Companies Replace Their EHS&S Software Systems, people rank the ability to integrate with other systems as a top priority. Once the ability to share and consolidate data is available, they are positioned to leverage AI every day.

This concludes the four-part blog series on Big Data, IoT, AI, and multitenancy. We look forward to feedback on our ideas and are interested in hearing where others see the future of AI in EHS software – contact us for more discussion or ideas! Read the full Series: Part One, Part Two, Part Three.

 

Artificial Intelligence and EHS Compliance Revisited, Part 3: Multitenancy and AI

Key Benefits of Multitenant Cloud Architecture for EHS SaaS 

This article was originally published in 2019, and it has been updated to reflect the realities of AI for EHS in 2025.

Multitenancy offers distinct benefits over traditional, single-tenant software hosting. A multitenant SaaS provider’s resources are focused on maintaining a single, current version of the application, rather than having its resources diluted in an attempt to support multiple software versions for its customers. If a provider is not using multitenancy, it may be hosting or supporting thousands of single-tenant customer implementations. By doing so, a provider cannot aggregate information across customers and extract knowledge from large data sets as every customer may be housed on a different server and possibly a different version of software. For these reasons, it is almost impossible and prohibitively expensive to deliver modern artificial intelligence (AI) tools via single tenancy.

Locus Multi-Tenant Software

Multitenancy has other advantages as well. Because every customer is on the same version of the software and the same instance, machine learning (a prerequisite for AI) can happen more quickly as large datasets are constantly fed into one system. A multitenant SaaS vendor can integrate and deploy new AI features more quickly, more frequently, and to all customers at once. Lastly, a single software version creates more of a sense of community among users and makes it easier for customers to share their lessons learned with one another (if they choose to do that). Most of today’s vendors in the EHS&S software space cannot offer AI, sustain their businesses, and grow unless they are a true multitenant SaaS provider.

Leveraging AI for EHS Compliance, ESG Reporting, and More

Almost 30 years after the publication of our paper on the hazardous data explosion, SaaS technologies combined with other advancements in big data processing are rising to the challenge of successfully processing, analyzing, and interpreting large quantities of environmental and sustainability data. It is finally time to stop saying that AI is a promising technology of the future. It’s here.

Gone are the days when EHS software was just a database. There are two factors that are fueling the adoption of AI technologies for EHS compliance and water quality management. First, there is Big Data and all the burdens and opportunities that come with it. Second, there is the move to true multitenant SaaS solutions, which enables the intake and dissection of data from multiple digital sources (streaming data) from multiple customers, all in real-time.

AI has entered the mainstream with the backing and advocacy of companies like Microsoft, IBM, Google, and Salesforce, which are heavily investing in the technology and generating lots of buzz. It is remarkable to observe how quickly AI is proliferating in so many verticals. Think back to this old CBS 60 Minutes segment and compare that to how most of us leverage AI tools on a daily basis now.

For our purposes, let’s look at where AI is likely to be applied in the EHS space. The mission-critical problem for EHS enterprise software companies is finding solutions that both enhance compliance and reduce manual labor and costs. This is where AI will play a major role. So far, companies have largely focused on aggregating their data in one or more record systems; they have done little to interpret that data without human interaction. To keep up with the ever-growing body of environmental regulations, companies have been throwing people at the problem, but that is not sustainable.

Locus Artificial Intelligence

AI and natural language processing (NLP) systems have matured enough to read through the legalese of regulations, couple them with a company’s monitoring and emissions data, and generate suggestions for actions based on relevant regulations and data. Take, for example, the continuous emissions monitoring systems (CEMS) installed at many plants to monitor air emissions in real time, or a drinking water supply system that monitors water quality. In each of these systems, there are too many transactions taking place to monitor manually and ascertain which ones are compliant and which ones are not. It is an onerous task to figure out every exceedance on a case-by-case basis. Intelligent databases with a built-in AI layer can interpret data on arrival and signal when emissions exceed prescribed limits or when other things go wrong. The main driver behind applying AI to EHS compliance is to lower costs and increase the quality of EHS compliance, data management, and interpretation, and ultimately, to avoid all fines for exceedances.
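The simplest part of interpreting data on arrival is a rule check of each incoming reading against its prescribed limit. The sketch below shows only that rule-based piece, not an AI layer and not Locus software; the `Reading` type, parameter names, and limit values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    source: str
    parameter: str
    value: float


# Hypothetical limits for illustration; real limits come from permits and regulations.
LIMITS = {"NOx_ppm": 50.0, "turbidity_NTU": 1.0}


def check_on_arrival(reading, limits=LIMITS):
    """Return an alert message if the reading exceeds its limit, otherwise None."""
    limit = limits.get(reading.parameter)
    if limit is None or reading.value <= limit:
        return None
    return f"{reading.source}: {reading.parameter} = {reading.value} exceeds limit {limit}"


# A short stand-in for a stream of incoming readings.
stream = [
    Reading("Stack 2", "NOx_ppm", 42.0),
    Reading("Stack 2", "NOx_ppm", 57.5),
    Reading("Plant A", "turbidity_NTU", 0.3),
]
alerts = [msg for r in stream if (msg := check_on_arrival(r)) is not None]
print(alerts)
```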

For example, a large water utility company has to wade through thousands of analytical results to look for outliers among the few dozen chemicals it is required to monitor for compliance. Some of these may be false positives, but that still leaves some results to be investigated. Each of those investigations can take time. However, if a software algorithm has access to analytical results and can determine that the problem rests with a test in the lab, that problem can be solved quickly, almost without human interaction. That is powerful.

Combing through this data by hand or via spreadsheet could take days and still leave the outcome uncertain. Hundreds of billable hours can be wasted with no guaranteed result. Using AI-driven SaaS software to determine which outliers need investigation allows compliance managers, engineers, and chemists to focus their expertise on just those cases, rather than spending time on the remaining ones that the AI engine indicates need no further examination.
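As a simple statistical stand-in for that kind of triage, the sketch below flags results whose modified z-score (based on the median and the median absolute deviation) is unusually large for their analyte. The analyte names and values are made up, `flag_for_review` is a hypothetical helper, and the 3.5 threshold is a common convention rather than a regulatory rule. A production system would use far richer context than this.

```python
import numpy as np


def flag_for_review(results, threshold=3.5):
    """Map each analyte to the indices of results that look like outliers."""
    flagged = {}
    for analyte, values in results.items():
        values = np.asarray(values, dtype=float)
        median = np.median(values)
        mad = np.median(np.abs(values - median))
        if mad == 0:
            continue  # no spread to measure against
        modified_z = 0.6745 * (values - median) / mad
        hits = np.flatnonzero(np.abs(modified_z) > threshold)
        if hits.size:
            flagged[analyte] = hits.tolist()
    return flagged


# Made-up analytical results for two analytes.
results = {
    "arsenic_ug_L": [2.1, 2.3, 1.9, 2.2, 2.0, 9.8],
    "nitrate_mg_L": [4.0, 4.2, 3.9, 4.1, 4.3, 4.0],
}
to_review = flag_for_review(results)
print(to_review)
```

Only the flagged results go to a person for investigation; everything else passes through without manual review.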

Predictive analytics based on Big Data and AI will also make customer data (legacy and new) work harder for customers than any team of consultants. An analogy came to me after watching that old 60 Minutes story: in the same way that the clinical center in North Carolina used AI to improve cancer treatment for its patients, engineers and geologists can use it to select the site remedy best suited to given site conditions, leading to a faster and less expensive cleanup with minimal long-term monitoring requirements.

Another example where AI will add value is in the area of enterprise carbon management. SaaS software is capable of integrating data from multiple sources, analyzing and aggregating it. This aggregated information can then be distributed to a company’s divisions or regulatory agencies for final reporting and validation/verification, all in real-time. This approach can save companies lots of time and resources. Companies will be able to access information from thousands of emission sources across the states, provinces, and even countries where their plants are located. Because each plant is likely to have its own set of regulatory drivers and reporting requirements, these would have to be incorporated into the calculation and reporting engine. After data from each plant is uploaded to a central processing facility, the information would be translated into a “common language,” the correct calculation formulae and reporting requirements applied, and the results then returned to each division in a format suitable for reporting internally and externally.
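The “common language” step can be pictured as normalizing every plant’s submission to one unit before aggregating. A minimal sketch, with hypothetical record fields and plant names; it covers unit conversion only and leaves out the jurisdiction-specific calculation formulae mentioned above.

```python
from collections import defaultdict

# Kilograms per reporting unit (lb and short ton use the exact avoirdupois definitions).
KG_PER_UNIT = {"kg": 1.0, "tonne": 1000.0, "lb": 0.45359237, "short_ton": 907.18474}


def to_tonnes(amount, unit):
    """Convert a reported mass to metric tonnes."""
    return amount * KG_PER_UNIT[unit] / 1000.0


def totals_by_division(records):
    """Sum emissions per division after converting every record to tonnes."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["division"]] += to_tonnes(rec["co2e"], rec["unit"])
    return dict(totals)


# Made-up submissions from three plants, each reporting in its own unit.
records = [
    {"division": "North America", "plant": "Plant A", "co2e": 2000.0, "unit": "lb"},
    {"division": "North America", "plant": "Plant B", "co2e": 1.5, "unit": "tonne"},
    {"division": "Europe", "plant": "Plant C", "co2e": 750.0, "unit": "kg"},
]
totals = totals_by_division(records)
print(totals)
```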

Blockchain for EHS: Looking ahead

Blockchain can further augment the power of AI for EHS monitoring and compliance. Blockchain’s decentralized approach coupled with AI will bring another revolution to EHS compliance and water monitoring.

Blockchain technology

Parts one, two, and four of this blog series complete the overview of Big Data, IoT, AI, and multitenancy. We look forward to feedback on our ideas and are interested in hearing where others see the future of AI in EHS software – contact us for more discussion or ideas!