#dataharmonization
Working with multi-regional data? 🗺️

Learn how Maelstrom Research tackles data harmonization using CanPath as a case study.

📅 June 10 | 🕐 1 PM EDT

🔗 Register >> us02web.zoom.us/webinar/regi...

#HealthData #DataHarmonization #CohortResearch
May 9, 2025 at 4:01 PM
Run HR surveys or collect customer feedback? Automate text analysis and topic clustering in bulk spreadsheets using cutting-edge LLM-powered workflows.
🔗 matasoft.hr/qtrendcontro...
#HRAutomation #NLP #SurveyTools #DataHarmonization #Automation #SmartTools #LLM #CloudComputing #OnPremAI #ScalableAI
Matasoft's AI-Driven Spreadsheet Processing Services and Software
Transform your business data workflows with Matasoft’s AI-driven spreadsheet processing services and software. (Un)Perplexed Spready, powered by Perplexity AI, automates data extraction, categorizatio...
matasoft.hr
September 12, 2025 at 3:49 PM
Hate wrangling spreadsheets with different data schemas? (Un)Perplexed Spready normalizes structures and names, harmonizing workbooks in bulk.
🔗 matasoft.hr/qtrendcontro...
#DataHarmonization #Automation #SmartTools #LLM #CloudComputing #OnPremAI #ScalableAI #BigData #DataProcessing #AI #Data
September 11, 2025 at 5:32 PM
All normalized to biomedical ontologies, such as MONDO and UBERON.

All traceable to the source.

Read the full preprint: biorxiv.org/content/10.1...

#FAIRdata #metadata #biomedicalAI #healthcaredata #dataharmonization #AI
biorxiv.org
June 19, 2025 at 7:07 PM
JMIR Formative Res: Automated Data Harmonization in Clinical Research: Natural Language Processing Approach #DataHarmonization #ClinicalResearch #NaturalLanguageProcessing #NLP #MachineLearning
Automated Data Harmonization in Clinical Research: Natural Language Processing Approach
Background: Integrating data is essential for advancing clinical and epidemiological research. However, because datasets often describe variables (e.g., demographics and health conditions) in diverse ways, integrating and harmonizing variables across research studies remains a major bottleneck.
Objective: The objective was to assess a natural language processing (NLP)-based method that automates variable harmonization, providing a scalable approach to integrating multiple datasets.
Methods: We developed a fully connected neural network (FCN) method, enhanced with contrastive learning, that uses domain-specific embeddings from the BioBERT language representation model. We applied it to three cardiovascular datasets: the Atherosclerosis Risk in Communities (ARIC) study, the Framingham Heart Study (FHS), and the Multi-Ethnic Study of Atherosclerosis (MESA). We used metadata variable descriptions and curated harmonized concepts as ground truth, and we framed the problem as a paired sentence classification task. We compared the accuracy of this method with a logistic regression baseline. To assess the generalizability of the trained models, we also evaluated their performance by separating the three datasets when preparing the training and validation sets.
Results: The FCN achieved a top-5 accuracy of 98.95% (95% CI: 98.31%–99.47%) and an AUC of 0.990 (95% CI: 0.988–0.991). It outperformed the standard logistic regression model, which had a top-5 accuracy of 22.23% (95% CI: 19.91%–24.87%) and an AUC of 0.824 (95% CI: 0.815–0.834). The contrastive learning variant also outperformed logistic regression but scored slightly below the base FCN, with a top-5 accuracy of 89.88% (95% CI: 87.88%–91.68%) and an AUC of 0.977 (95% CI: 0.975–0.979).
Conclusions: This novel approach provides a scalable solution for harmonizing metadata across large-scale cohort studies. The proposed method significantly improves performance over the baseline by using learned representations to categorize harmonized concepts more accurately for cardiovascular disease and stroke cohorts.
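The abstract frames harmonization as paired sentence classification and reports top-5 accuracy. The Python sketch below is a minimal illustration of how a top-k metric can be computed over (variable description, harmonized concept) pairs. It does not reproduce the authors' model. The token-overlap scorer `jaccard_score`, the helper `top_k_accuracy`, and the toy variables are hypothetical stand-ins for the BioBERT/FCN classifier and the ARIC/FHS/MESA metadata.

```python
from typing import Callable, Sequence


def jaccard_score(description: str, concept: str) -> float:
    """Hypothetical stand-in for a learned pair classifier: token-set Jaccard overlap."""
    a = set(description.lower().split())
    b = set(concept.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def top_k_accuracy(
    descriptions: Sequence[str],
    true_concepts: Sequence[str],
    candidate_concepts: Sequence[str],
    score: Callable[[str, str], float],
    k: int = 5,
) -> float:
    """Fraction of descriptions whose true concept ranks among the k highest-scoring candidates."""
    if not descriptions:
        raise ValueError("need at least one description")
    hits = 0
    for desc, truth in zip(descriptions, true_concepts):
        # Score every (description, concept) pair, then rank candidates by score.
        # sorted() is stable, so tied candidates keep their original list order.
        ranked = sorted(candidate_concepts, key=lambda c: score(desc, c), reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(descriptions)


# Toy, invented example data (not from the study).
concepts = [
    "systolic blood pressure",
    "diastolic blood pressure",
    "body mass index",
    "current smoking status",
    "age at enrollment",
    "total cholesterol",
]
descriptions = [
    "seated systolic blood pressure mmHg",
    "participant age at exam enrollment",
    "cigarette smoking status current",
    "smoker indicator yes no",  # no token overlap: all scores tie at 0.0
]
truths = [
    "systolic blood pressure",
    "age at enrollment",
    "current smoking status",
    "current smoking status",
]

if __name__ == "__main__":
    print("top-1:", top_k_accuracy(descriptions, truths, concepts, jaccard_score, k=1))
    print("top-5:", top_k_accuracy(descriptions, truths, concepts, jaccard_score, k=5))
```

With this toy scorer, the last description matches only by list position. It misses at k=1 but still counts at k=5. This shows how a top-5 metric can be much more forgiving than top-1 accuracy.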
dlvr.it
August 27, 2025 at 2:40 PM
Merging biological datasets, whether public, proprietary, or platform-specific, is no simple task.

It takes real expertise to resolve inconsistencies, integrate the data seamlessly, and extract insights.

#bioinformatics #computationalbiology #dataintegration #dataharmonization
July 3, 2025 at 5:18 AM
TOMORROW: Learn how Maelstrom Research tackles data harmonization using CanPath as a case study.

🗓️ June 10 | 🕐 1 PM EDT

🔗 Register: canpath.ca/2025/05/less...
#DataHarmonization #HealthData #CohortResearch
Lessons from CanPath on demystifying data harmonization - CanPath - Canadian Partnership for Tomorrow’s Health
Data harmonization is essential for multi-cohort research. Learn more about best practices for harmonizing large-scale cohort data.
canpath.ca
June 9, 2025 at 7:03 PM