Data management in Life Cycle Assessment (LCA) often involves navigating a complex landscape of different formats, databases, and classification systems. Matching flows between datasets and ensuring consistent interpretation across systems can be a significant challenge. For the GTDR project at Aalborg University, we developed a solution to address these challenges: py-semantic-taxonomy (pyst).
Pyst is an open-source software system designed to store, maintain, and publish semantic taxonomies. In this article, we'll explore how this innovative approach can transform LCA data management and improve the accuracy and interoperability of sustainability assessments.
The Challenge of LCA Data Exchange and Usage
Life Cycle Assessment practitioners face several recurring challenges when working with or building on data from different sources. These challenges not only create inefficiencies but can also impact the accuracy and reproducibility of LCA studies. When practitioners need to manually match items across systems or simplify their models to accommodate database limitations, valuable information can be lost.
Manual matching of flows across different systems
LCA data comes in various formats, and each format and data publisher uses a slightly different set of significant fields and field labels. Because these differences often don't follow regular patterns, manual matching is needed to specify that datasets in separate databases describe the same underlying object.
Example: When linking Agribalyse to its original ecoinvent input data, we needed to translate from Agribalyse names like:
"Swath, by rotary windrower {GLO}| market for | Cut-off, S - Copied from Ecoinvent"
to the ecoinvent data attributes:
'activity name': 'swath, by rotary windrower',
'geography': 'CA-QC',
'reference product': 'swath, by rotary windrower'
Pyst can't do manual matches for you yet (though we have a tool for that), but it can publish these matches using whatever fields you need to describe that particular match, while keeping an audit log of who added or changed that match, and make all such matches accessible via a web interface and an API.
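Part of this translation can be pre-processed before a human reviews it. The sketch below is a hypothetical helper, not part of pyst: it assumes that names follow the pattern "product {location}| activity type | system model", which is a guess based on the single example above. Lowercasing the first letter is also an assumption drawn from that example.

```python
import re

# Assumed pattern: "<product> {<location>}| <activity> | <rest>"
# Inferred from one example name; real data may deviate from it.
NAME_PATTERN = re.compile(
    r"^(?P<product>.+?)\s*\{(?P<location>[^}]+)\}\|"
    r"\s*(?P<activity>[^|]+?)\s*\|\s*(?P<rest>.+)$"
)


def parse_agribalyse_name(name: str) -> dict:
    """Split an Agribalyse-style process name into candidate fields."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Name does not follow the assumed pattern: {name!r}")
    fields = match.groupdict()
    # The ecoinvent attributes in the example start in lowercase.
    fields["product"] = fields["product"][0].lower() + fields["product"][1:]
    return fields


parsed = parse_agribalyse_name(
    "Swath, by rotary windrower {GLO}| market for | Cut-off, S - Copied from Ecoinvent"
)
print(parsed)
```

Note that parsing only recovers what is written in the name. It yields the location "GLO", while the ecoinvent attributes listed above use "CA-QC", which is why a reviewed and published match is still needed.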
Loss of specificity when linking foreground systems to background databases
When building LCA models, analysts often build their inventories directly using the nomenclature and classification systems of their current background database versions. However, any such modelling will almost always lose information, as the background processes and names do not follow the identification systems or standards that the client uses internally or specifies in purchase orders.
Pyst allows you to build your inventory against existing international or technical standards, or to add and use your existing ERP or PLM identifiers. You can then resolve the specific material or energy providers when doing calculations. Because Pyst is oriented around hierarchical taxonomies, it can automatically search up or down the hierarchy to find the best supplier match for a given background database version. The matching algorithm can also be tuned to include additional metadata, if desired. These matches can also be calculated in advance or manually adjusted, if needed.
Example: When building a life cycle inventory for a specialized steel component, an analyst might specify "Flat-rolled products of stainless steel, of a width of >= 600 mm, cold-reduced, perforated" (HS code 72 19 90 20) based on actual manufacturing data. However, the background database version might only have a more generic "steel, chromium steel 18/8" category. Pyst allows specifying the correct, detailed input, while using the generic product in LCA calculations.
This dynamic matching allows analysts to specify inputs at the highest and most accurate level of detail, and to include any additional data needed to understand the input or how best to match it in future analyses. In the above example, the Pyst matching data could also be extended to automatically include machining processes for the rolling or perforation of the steel.
Dynamic matching also eases calculations with other background databases or new background database versions. In most cases, these only need to be added to the taxonomy, and the search algorithm can then link your studies to the new database and use it in calculations.
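The idea of searching up the hierarchy can be sketched in a few lines. This is not pyst's matching algorithm; it is a minimal illustration that assumes each concept has at most one broader concept. The identifiers, the BROADER and AVAILABLE tables, and the find_provider function are all hypothetical.

```python
# Hypothetical taxonomy fragment: concept -> broader (parent) concept.
BROADER = {
    "hs:72199020": "hs:721990",
    "hs:721990": "hs:7219",
    "hs:7219": "hs:72",
}

# Hypothetical links from taxonomy concepts to products in one background
# database version. Only a generic concept has a provider here.
AVAILABLE = {
    "hs:7219": "steel, chromium steel 18/8",
}


def find_provider(concept, broader, available):
    """Walk up the hierarchy until a concept with a linked provider is found.

    Returns (matched_concept, provider, steps_up), or None if the top of the
    hierarchy is reached without finding a provider.
    """
    steps = 0
    seen = set()
    while concept is not None and concept not in seen:
        if concept in available:
            return concept, available[concept], steps
        seen.add(concept)  # guard against accidental cycles
        concept = broader.get(concept)
        steps += 1
    return None


result = find_provider("hs:72199020", BROADER, AVAILABLE)
print(result)
```

Returning the number of steps taken makes the loss of specificity explicit, so a study can report how far each input was generalized. Adding a new database version would, under these assumptions, only mean supplying another AVAILABLE table.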
Difficulty maintaining relationships between items when database versions change
Some background databases make frequent minor changes to their metadata, and these changes are not always presented in easily applicable ways. Pyst can't generate mappings across versions on its own, but it can publish these mappings in a transparent and usable fashion, accessible in both human- and machine-readable formats.
Example: Pyst can build on our set of matching products, including flowmapper and ecoinvent_migrate, to generate and publish complete and accurate mappings across versions. Here are some examples from ecoinvent. First, a disaggregation of an activity from 3.10.1 to 3.11:
'source': {
'unit': 'kg',
'name': 'rye production, organic',
'reference product': 'straw, organic',
'location': 'CH'
},
'targets': [
{
'name': 'rye grain production, winter, organic, hill region',
'allocation': 0.2571
}, {
'name': 'rye grain production, winter, organic, mountain region',
'allocation': 0.0653
}, {
'name': 'rye grain production, winter, organic, plain region',
'allocation': 0.6775
}
]
Next, an undocumented change in an elementary flow name from 3.9.1 to 3.10:
'source': {
'name': 'Benzene, ethyl-',
'uuid': '4abbbc48-a8d0-4c3a-9dea-4e67f1505230'
},
'target': {
'name': 'Ethyl benzene'
},
'comment': 'Flow attribute change not listed in change report'
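Once published, a mapping like the rye disaggregation above can be applied mechanically. The sketch below assumes the mapping is available as a Python dictionary in the shape shown (abridged here); disaggregate is a hypothetical helper, not a function from pyst or ecoinvent_migrate.

```python
import math

# Abridged copy of the rye production mapping shown above.
mapping = {
    "source": {"name": "rye production, organic", "unit": "kg"},
    "targets": [
        {"name": "rye grain production, winter, organic, hill region",
         "allocation": 0.2571},
        {"name": "rye grain production, winter, organic, mountain region",
         "allocation": 0.0653},
        {"name": "rye grain production, winter, organic, plain region",
         "allocation": 0.6775},
    ],
}


def disaggregate(amount, mapping, tolerance=1e-3):
    """Split `amount` across targets in proportion to their allocation factors."""
    total = sum(t["allocation"] for t in mapping["targets"])
    # Published factors are rounded, so only check that they sum to roughly 1.
    if not math.isclose(total, 1.0, abs_tol=tolerance):
        raise ValueError(f"Allocation factors sum to {total}, expected 1")
    # Normalize so the split amounts add up to the original amount.
    return {t["name"]: amount * t["allocation"] / total
            for t in mapping["targets"]}


split = disaggregate(10.0, mapping)
for name, value in split.items():
    print(f"{value:.4f} {mapping['source']['unit']}  {name}")
```

The normalization step is a design choice made for this sketch: the three factors above sum to 0.9999, so dividing by their total keeps the mass balance intact.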
Limited flexibility to accommodate different levels of detail in inventories and inventory mapping
Current best practice in the LCA community is to share mapping data across databases or database versions as tabular (Excel) files (see examples from GLAD and US EPA). These files could be standardized, but in practice each publisher uses its own layout, so they require custom code and tests for safe usage. Moreover, tabular datasets by their nature have a fixed set of columns, preventing the addition of extra useful data.
In contrast, pyst allows any additional data that helps in understanding a concept, or a relationship between concepts, to be attached to it. This additional data can follow a semantic web ontology, but this is not a requirement.
Example: Pyst allows the elementary flow nickel (2+) to be described with reference to multiple systems and chemical properties:
'skos:prefLabel': [
{'@value': 'nickel(2+)', '@language': 'en'},
{'@value': 'никель(2+)', '@language': 'ru'}
],
'skos:altLabel': [{
'@value': 'Nickel, ion',
'@language': 'en',
'rdfs:comment': 'Ecoinvent version 3.8'
}],
'CHEMINF:000407': '49786',
'CHEMINF:000007': '[Ni++]',
'CHEMINF:000402': '+2',
'skos:exactMatch': 'https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:49786',
'skos:narrowMatch': 'https://glossary.ecoinvent.org/ids/56815b4f-6138-4e0b-9fac-c94fd6b102b3',
'PATO:0000125': '58.69340',
'CHEMINF:000218': '57.93424'
This data would not be read directly by humans, but instead processed before display. For example, the identifier "CHEMINF:000218" - part of the Chemical Information Ontology - would be expanded to "Monoisotopic mass descriptor" if the interface language was English.
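A minimal sketch of that processing step, assuming a lookup table from compact identifiers to labels. In practice the labels would come from the ontology itself; here LABELS is a hypothetical hard-coded table holding only the one label stated above, and identifiers without a known label are passed through unchanged.

```python
# Hypothetical label table; only the entry stated in the text is filled in.
LABELS = {
    ("CHEMINF:000218", "en"): "Monoisotopic mass descriptor",
}


def display_fields(record, language, labels=LABELS):
    """Replace compact identifiers used as keys with human-readable labels."""
    return {
        labels.get((key, language), key): value
        for key, value in record.items()
    }


record = {"CHEMINF:000218": "57.93424", "CHEMINF:000402": "+2"}
print(display_fields(record, "en"))
# {'Monoisotopic mass descriptor': '57.93424', 'CHEMINF:000402': '+2'}
```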
Pyst also allows for conditional matching - i.e. term A is equivalent to term B, but only in the presence of additional qualifying information. For example, pyst can store a unit conversion from volume to mass of natural gas, with the conversion factor dependent on the origin of the natural gas and when it was extracted.
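One way to picture a conditional match is as a conversion that carries its own qualifying conditions. The sketch below is an illustration only and does not show how pyst stores such data: the class, the function names, the origins, and the numeric factors are all hypothetical placeholders, not measured densities.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConditionalConversion:
    """A volume-to-mass conversion that only applies under given conditions."""
    origin: str
    year_from: int
    year_to: int
    kg_per_m3: float


# Placeholder entries for illustration only; not real data.
CONVERSIONS = [
    ConditionalConversion("region-a", 2000, 2010, 0.80),
    ConditionalConversion("region-a", 2011, 2020, 0.78),
    ConditionalConversion("region-b", 2000, 2020, 0.75),
]


def find_conversion(origin: str, year: int) -> Optional[ConditionalConversion]:
    """Return the conversion whose qualifying conditions are all satisfied."""
    for conv in CONVERSIONS:
        if conv.origin == origin and conv.year_from <= year <= conv.year_to:
            return conv
    return None


def volume_to_mass(volume_m3: float, origin: str, year: int) -> float:
    """Convert a gas volume to mass, but only if a qualified factor exists."""
    conv = find_conversion(origin, year)
    if conv is None:
        # Without the qualifying information the terms are not equivalent.
        raise LookupError(f"No conversion for origin={origin!r}, year={year}")
    return volume_m3 * conv.kg_per_m3


print(volume_to_mass(100.0, "region-a", 2015))
```

Raising an error when no condition matches, instead of falling back to a default factor, keeps the "only in the presence of additional qualifying information" rule explicit.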
Poor discovery of primary and interoperability data
The LCA community has embraced the use of universally unique identifiers (UUIDs), and these are a great advancement over previous practice. However, UUIDs are not self-describing - you need to know where to go to look up what they mean. The semantic web instead uses internationalized resource identifiers (IRIs), like "https://example.com/flows/nickel%282%2B%29". IRIs resolve to web resources which give complete data on the terms being referenced, making it much easier to translate identifiers into their underlying data.
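The example IRI above contains a percent-encoded label. As a small illustration using only Python's standard library, the path segment can be produced from the readable term and decoded back again. The base "https://example.com/flows/" is the placeholder from the text, not a real pyst endpoint, and the two function names are hypothetical.

```python
from urllib.parse import quote, unquote

BASE = "https://example.com/flows/"  # placeholder base from the text above


def term_to_iri(term: str) -> str:
    """Percent-encode a term so it can be used as a single path segment."""
    return BASE + quote(term, safe="")


def iri_to_term(iri: str) -> str:
    """Recover the readable term from the last path segment of the IRI."""
    return unquote(iri.rsplit("/", 1)[-1])


iri = term_to_iri("nickel(2+)")
print(iri)               # https://example.com/flows/nickel%282%2B%29
print(iri_to_term(iri))  # nickel(2+)
```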
Pyst also has a pluggable semantic search engine, making it easier to navigate complex taxonomies. Semantic search allows you to search for "Kodak" and get terms related to film and camera manufacturing, or for "bronze" and get terms related to metallurgy, statue carving, and ammunition manufacture.
Semantic Taxonomies: A Flexible and Practical Solution
What Are Semantic Taxonomies?
Semantic taxonomies provide a hierarchical structure for organizing and relating concepts while storing data using standardized ontologies. They allow for precise classification, description of the hierarchy, and correspondence across taxonomies. The pyst system primarily leverages the SKOS (Simple Knowledge Organization System) ontology, which is part of the semantic web family of standards and is used by international organizations including the European Union and FAO.
SKOS is particularly well-suited for LCA applications because it:
- Supports hierarchical relationships (broader/narrower concepts)
- Accommodates temporal correspondence (newer/older versions)
- Enables pair-wise matching between concepts across different schemes
- Allows for rich metadata, including multilingual labels, descriptions, and synonyms
- Is extensible without requiring new data format definitions
What is Py-semantic-taxonomy (Pyst)?
Pyst is an open-source software system with a server and a client library, with documentation available at docs.pyst.dev. The Pyst server provides both a web interface and an API for working with these taxonomies. The pyst client library, also written in Python, provides simple functions to access the API endpoints. The API is RESTful, making it easy to integrate pyst with existing LCA software and workflows.
Future Possibilities
While the current implementation of pyst already offers significant value, there are several exciting directions for future development:
Integration with Machine Learning
The rich semantic structure provided by pyst creates opportunities for advanced matching algorithms. Machine learning approaches could leverage this structure to suggest potential matches between concepts, further reducing manual work.
Extended Data Transformation
Future versions could incorporate more sophisticated data transformation and export capabilities, including automatic generation of correspondences between taxonomies and direct integration with data analysis tools and pipelines.
Conclusion
At Cauldron, we have a lot of experience working with LCA data interoperability, and have used or even written our own formats in the past. Pyst is the culmination of our experience writing software and supporting user needs in this domain over the last 15 years. While there is certainly lots of room for improvement, the current software is production-ready and is a substantial improvement over existing formats or tools. By leveraging semantic web standards, particularly SKOS, pyst provides a flexible, interoperable solution that preserves data specificity while enabling practical calculations across different systems and database versions.
As the LCA field continues to evolve, with increasing demands for transparency, reproducibility, and interoperability, tools like Pyst will play a crucial role in enabling more efficient and accurate sustainability assessments.
At Cauldron Solutions, we're proud to contribute to the advancement of LCA methodology through open-source solutions like pyst. Our work with Aalborg University on the GTDR project demonstrates our commitment to developing practical tools that address real-world challenges in sustainability assessment. Please feel free to contact us to discuss how we can deploy or customize Pyst for your sustainability assessment needs.