The Use of AI for Adding Sense in Characterizing Environmental Microplastics and Trash.
The paper I'll briefly discuss in this post is this one: Microplastics and Trash Cleaning and Harmonization (MaTCH): Semantic Data Ingestion and Harmonization Using Artificial Intelligence, Hannah Hapich, Win Cowger, and Andrew B. Gray, Environmental Science & Technology 2024, 58 (46), 20502-20512.
I don't have a lot of time on my hands right now; I'm behind on everything in my life, particularly as I'm losing sleep over the impending collapse of the United States. In any case, the paper is open to the public for reading, so I'll just briefly excerpt it and show some pictures.
The ubiquity of microplastics and the vast number of papers on the subject are overwhelming. In this paper, the authors suggest a way to harmonize the data so that the literature can be compared and made sense of.
The introduction states the problem:
Standards have been developed for managing mandated trash assessment data (8) and tabulating microplastics data with respect to specific reporting guidelines. (9) Such strategies serve as hubs for accumulating already standardized data and are often specific to certain geographic regions, study media, or government protocols. (8,10,16) Certain database structures may be better suited for data at the sample level (reported as a concentration) or the particle level (information reported for individual particles). Standardization is limited by the rate at which scientists, government organizations, nongovernmental organizations (NGOs), or industry adapt to such protocols. Additionally, most protocols do not have a strategy to utilize data that does not fit their standardized structures, which alienates potentially useful data. In these cases, users must perform data harmonization manually. It is particularly important in the field of microplastics monitoring to utilize existing databases due to the cost and time prohibitive nature of the field, wherein it commonly requires up to thousands of dollars and tens of hours to process a single sample, making data from each monitoring study highly valuable...
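To make the sample-level versus particle-level distinction in that excerpt concrete, here is a minimal Python sketch. It is not from the paper: the `Particle` record, its fields, and the `to_sample_level` helper are hypothetical names I chose for illustration, and the sample volumes are made up.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Particle:
    """One particle-level record (hypothetical fields, for illustration)."""
    sample_id: str
    morphology: str


def to_sample_level(particles, volumes_l):
    """Aggregate particle records into particles per litre for each sample.

    volumes_l maps sample_id -> sampled volume in litres (assumed known).
    Samples with no recorded particles get a concentration of 0.
    """
    counts = Counter(p.sample_id for p in particles)
    return {sid: counts[sid] / volumes_l[sid] for sid in volumes_l}


# Made-up example data: two samples, three particles.
particles = [
    Particle("S1", "fiber"),
    Particle("S1", "fragment"),
    Particle("S2", "fiber"),
]
volumes_l = {"S1": 4.0, "S2": 2.0}
concentrations = to_sample_level(particles, volumes_l)
```

The aggregation only runs one way: particle-level records can be rolled up into a concentration, but a reported concentration cannot be expanded back into individual particles, which is one reason a single database structure does not suit both kinds of data.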
Another excerpt:
The field of microplastics and trash is not the first to encounter such issues. Many divisions of the environmental and biological sciences have similar problems, which will worsen over time with ever-growing datasets and a focus on curating big data to identify knowledge gaps and answer key questions. (17) Previous work has assessed the use of natural language processing (NLP) algorithms as a means for information retrieval to assemble databases and organize their taxonomic structures. (18,19) Until recently, the technology available consisted of different pattern matching and syntactic/semantic parsing, some of which rely on extracting exact matches, and most have a narrow application range tailored to a specific subfield. (19) Results from early exploration of NLP for scientific data curation were discouraging (20) and may have led to underutilization.
NLP technology has vastly improved in accuracy and efficiency just over the past few years, primarily a result of increases in computing power and the development of open-source artificial intelligence (AI) software capable of employing transformers and embeddings. (21) Transformers are a type of neural network structure able to interpret data nonsequentially... (22)
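The contrast the excerpt draws between exact matching and embedding-based approaches can be sketched in a few lines of Python. This is a toy under loud assumptions and not the authors' method: the vocabulary `STANDARD_TERMS` is hypothetical, and the character-trigram counts in `trigram_vector` are only a stand-in for real embeddings, which would come from a trained transformer model and are meant to capture similarity of meaning, not just spelling.

```python
import math
from collections import Counter

# Hypothetical standard vocabulary (illustrative only, not taken from the paper).
STANDARD_TERMS = ["fiber", "fragment", "film", "foam", "pellet"]


def trigram_vector(text):
    """Toy stand-in for an embedding: counts of character trigrams."""
    padded = f"  {text.lower().strip()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def exact_match(label, vocabulary=STANDARD_TERMS):
    """Older-style lookup: succeeds only on an exact string match."""
    return label if label in vocabulary else None


def nearest_term(label, vocabulary=STANDARD_TERMS):
    """Return the vocabulary term whose vector is most similar to the label's."""
    vec = trigram_vector(label)
    return max(vocabulary, key=lambda term: cosine(vec, trigram_vector(term)))


# Labels as they might appear in differently formatted datasets (made up).
for raw in ["Fibres", "fragments", "Pellets"]:
    print(raw, "| exact:", exact_match(raw), "| nearest:", nearest_term(raw))
```

The exact lookup returns nothing for any of these labels, while the nearest-neighbour lookup still maps each one to a standard term. A real embedding model replaces `trigram_vector` in this picture; the comparison step by vector similarity stays the same in spirit.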
Figures from the text: [three figures with captions; the images and caption text are not reproduced here, see the open-access paper]
The authors briefly discuss the limitations of their approach and suggest future refinements to this work.