Do not let Excel deplete your gene list

24 November 2015 - 3:25pm by Rolando Garcia-Milian

Last night, while preparing an RNAseq dataset for functional analysis, I ran into this problem again. When opening high-throughput data results in Excel, be aware that by default the software converts some gene symbols into dates (see the examples in the table below). These conversions are not reversible, so the original name cannot be recovered. Zeeberg et al. reported this problem back in 2004. If you are not aware of this and proceed with the functional analysis, the genes that were converted into dates will not be recognized and will be left out of the analysis. If you think this will never happen to you, note that this error has been found in a project as important as The Cancer Genome Atlas.
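If you suspect that a gene list has already passed through Excel, a quick check can flag the damaged entries before you run the functional analysis. The snippet below is only a minimal sketch in Python: the function name `find_date_like` and the sample column are made up for illustration, and the pattern covers only text such as "1-Sep" or "Sep-01" (optionally with a year), not dates stored in other formats.

```python
import re

# Month abbreviations that appear when a date is written as text, e.g. "1-Sep".
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Matches "1-Sep" / "01-Sep" and "Sep-01" / "Sep-1", optionally followed
# by a year ("1-Sep-15"). Other date formats are NOT covered by this sketch.
DATE_LIKE = re.compile(
    rf"^(?:\d{{1,2}}-(?:{MONTHS})|(?:{MONTHS})-\d{{1,2}})(?:-\d{{2,4}})?$",
    re.IGNORECASE,
)

def find_date_like(symbols):
    """Return the entries that look like dates rather than gene symbols."""
    return [s for s in symbols if DATE_LIKE.match(s.strip())]

# Illustrative column: three values have the date-like shape.
column = ["TP53", "1-Sep", "BRCA1", "2-Mar", "Dec-01", "SEPT2"]
suspicious = find_date_like(column)
print(suspicious)
```

Any value this reports has to be fixed from the original, pre-Excel file, because the date alone does not tell you which symbol it came from.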

One way to avoid this, from the end-user bioinformatics perspective, is to define the column containing the gene symbols as “Text” under “Column data format”, as shown in the figure below. Whenever possible, it is recommended to use unique identifiers (Ensembl IDs, Gene IDs, Affymetrix IDs, etc.) rather than gene symbols. If you are not sure of the official symbol of a gene, you can always look it up in the Gene database (NCBI, NIH).
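If you only need to pull the gene symbols out of a results file, you can also skip Excel altogether. This is a different route from the "Text" column setting described above, but it rests on the same idea: every value is treated as plain text. The following is a rough sketch using Python's standard `csv` module; the column name `gene_symbol`, the helper `read_symbols`, and the sample data are assumptions for illustration only.

```python
import csv
import io

# Hypothetical tab-delimited results; in practice this would be a file on disk
# opened with open("results.txt", newline="").
raw = "gene_symbol\tlog2FC\nSEPT2\t1.8\nMARCH1\t-0.7\nTP53\t2.1\n"

def read_symbols(handle, column="gene_symbol"):
    """Read one column from a tab-delimited table, keeping every value as text."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row[column] for row in reader]

# The csv module never guesses data types, so the symbols come back unchanged.
symbols = read_symbols(io.StringIO(raw))
print(symbols)
```

Because nothing is reinterpreted along the way, symbols such as SEPT2 or MARCH1 stay exactly as they were written in the file.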

For questions, consultations, or help with your functional analysis, please do not hesitate to contact me.

Examples of human gene symbols that Excel converts into dates.