Reference Deduplicator

The Reference Deduplicator is a web-based application that removes duplicates from a set of bibliographic references. It reads source references in the standard RIS format, removes those determined to be duplicates, and exports the remaining unique references.

  • Accuracy: The application uses a variety of data normalization techniques and reference matching algorithms to identify duplicates with improved sensitivity and precision. Its accuracy was tested on references drawn mostly from the health sciences.
  • Transparency: With the application's detailed technical documentation, one can always pinpoint the reason why a reference was removed as a duplicate.
  • Reproducibility: The application’s code base is versioned and the deduplication process is automatic. The same source reference set always yields the same unique reference set when processed with the same version of the code base.
  • Verifiability: Each deduplication results page can be saved for offline viewing or for archival purposes. One can always retrospectively verify which references were removed as duplicates and the reasons for their removal.
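
The specific normalization techniques and matching algorithms used by the application are not described on this page. As a rough illustration of the general workflow only (read RIS, normalize, match, export), here is a minimal Python sketch. Everything in it is an assumption made for illustration, not the application's actual method: the function names (parse_ris, normalize, match_key, deduplicate, to_ris), the choice to match on DOI or on normalized title plus year, and the rule of keeping the first occurrence.

```python
import re
import unicodedata

# An RIS line is a two-character tag, two spaces, a hyphen, then the value.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def parse_ris(text):
    """Parse RIS text into a list of records (dict: tag -> list of values)."""
    records, current = [], {}
    for line in text.splitlines():
        match = TAG_LINE.match(line.rstrip())
        if not match:
            continue  # this sketch ignores blank and continuation lines
        tag, value = match.groups()
        if tag == "ER":  # end of record
            if current:
                records.append(current)
            current = {}
        else:
            current.setdefault(tag, []).append(value.strip())
    if current:
        records.append(current)
    return records


def normalize(value):
    """Lowercase, strip diacritics, and collapse punctuation and spacing."""
    text = unicodedata.normalize("NFKD", value)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def first(record, *tags):
    """Return the first value found under any of the given tags, or ''."""
    for tag in tags:
        if record.get(tag):
            return record[tag][0]
    return ""


def match_key(record):
    """Hypothetical matching rule: DOI if present, else title plus year."""
    doi = first(record, "DO").lower().strip()
    doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", "", doi)
    if doi:
        return ("doi", doi)
    title = normalize(first(record, "TI", "T1"))
    if not title:
        return None  # too little information to match safely
    return ("title-year", title, first(record, "PY", "Y1")[:4])


def deduplicate(records):
    """Keep the first record per key; log (removed, kept, reason) indexes."""
    seen, unique, removed = {}, [], []
    for index, record in enumerate(records):
        key = match_key(record)
        if key is not None and key in seen:
            removed.append((index, seen[key], key[0]))
        else:
            if key is not None:
                seen[key] = index
            unique.append(record)
    return unique, removed


def to_ris(records):
    """Serialize records back to RIS text."""
    lines = []
    for record in records:
        for tag, values in record.items():
            lines.extend(f"{tag}  - {value}" for value in values)
        lines.extend(["ER  - ", ""])
    return "\n".join(lines)
```

In this sketch the removal log pairs each removed record with the record it matched and the rule that fired, which is one simple way to make each removal traceable. Because the procedure is deterministic, the same input always produces the same output. A real matcher would need to handle cases this sketch does not, such as one copy of a reference carrying a DOI while the other lacks it.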

The Reference Deduplicator is currently an internal tool used at the Cushing/Whitney Medical Library of Yale University.

Lei Wang designed and coded the application. Kelly Perry conducted thorough tests. Justin DeMayo provided server administration support. The Deduplicator was built upon years of valuable practice and experience in reference deduplication by members of the Cross-Departmental Team (CDT) at the Cushing/Whitney Medical Library. Thank you, Vasean Daniels, Khadija El-Hazimy, Pamela Gibson, Mary Hughes, Dorota Peglow, Vermetha Polite, and Chris Zollo!

Download Citation for the Reference Deduplicator (.ris)