Automatic identification of variables in epidemiological datasets using logic regression

BMC Medical Informatics and Decision Making

26 April 2017
  • Baldassarre D, Amato M, Tremoli E

For an individual participant data (IPD) meta-analysis, multiple datasets must be transformed into a consistent format, e.g. by using uniform variable names. When large numbers of datasets have to be processed, this can be a time-consuming and error-prone task. Automated or semi-automated identification of variables can help to reduce the workload and improve data quality. For semi-automation, high sensitivity in recognising matching variables is particularly important: it makes it possible to build software that, for each target variable, presents a shortlist of candidate source variables from which the user picks the matching one, with only a low risk that the correct source variable is missing from the list.

Reference
Lorenz MW, Abdi NA, Scheckenbach F, Pflug A, Bülbül A, Catapano AL, Agewall S, Ezhov M, Bots ML, Kiechl S, Orth A; PROG-IMT study group (Baldassarre D, Amato M, Tremoli E among the collaborators). Automatic identification of variables in epidemiological datasets using logic regression. BMC Med Inform Decis Mak 2017 Apr 13;17(1):40. doi: 10.1186/s12911-017-0429-1.
