TY - JOUR
T1 - A transcriptional sketch of a primary human breast cancer by 454 deep sequencing
AU - Guffanti, Alessandro
AU - Iacono, Michele
AU - Pelucchi, Paride
AU - Kim, Namshin
AU - Soldà, Giulia
AU - Croft, Larry J.
AU - Taft, Ryan J.
AU - Rizzi, Ermanno
AU - Askarian-Amiri, Marjan
AU - Bonnal, Raoul J.
AU - Callari, Maurizio
AU - Mignone, Flavio
AU - Pesole, Graziano
AU - Bertalot, Giovanni
AU - Bernardi, Luigi
AU - Albertini, Alberto
AU - Lee, Christopher
AU - Mattick, John S.
AU - Zucchi, Ileana
AU - De Bellis, Gianluca
N1 - Funding Information:
This work was supported by the following research grants: CARIPLO grant 2006-0772 'Genomic, epigenetic and transcriptional analysis of tumors by deep sequencing' to IZ and GdB; Italian Fund for Basic Research grant 'Large Laboratories' RBLA03ER38 to GdB; Net2Drug grant n. 037590 to IZ. PP fellowship is supported by the CARIPLO-NOBEL grant to IZ. Bioinformatic analysis and validation strategies are based on the methods developed in the research grant 'Identification of new cancer biomarkers through bioinformatics and application to tumor prognosis and therapy' assigned to AG by Italian Cancer Research Association in 2004.
Funding Information:
JSM and LJC are supported by grants from the Australian Research Council (FF0561986 and S00001543) and the National Health and Medical Research Council (DP456080). RJT is supported by a United States National Science Foundation Graduate Research Fellowship.
PY - 2009/4/20
Y1 - 2009/4/20
N2 - Background: The cancer transcriptome is difficult to explore due to the heterogeneity of quantitative and qualitative changes in gene expression linked to the disease status. An increasing number of "unconventional" transcripts, such as novel isoforms, non-coding RNAs, somatic gene fusions and deletions, have been associated with the tumoral state. Massively parallel sequencing techniques provide a framework for exploring the transcriptional complexity inherent to cancer with a limited laboratory and financial effort. We developed a deep sequencing and bioinformatics analysis protocol to investigate the molecular composition of a breast cancer poly(A)+ transcriptome. This method utilizes a cDNA library normalization step to diminish the representation of highly expressed transcripts and biology-oriented bioinformatic analyses to facilitate detection of rare and novel transcripts. Results: We analyzed over 132,000 Roche 454 high-confidence deep sequencing reads from a primary human lobular breast cancer tissue specimen, and detected a range of unusual transcriptional events that were subsequently validated by RT-PCR in eight additional primary human breast cancer samples. We identified and validated one deletion, two novel ncRNAs (one intergenic and one intragenic), ten previously unknown or rare transcript isoforms and a novel gene fusion specific to a single primary tissue sample. We also explored the non-protein-coding portion of the breast cancer transcriptome, identifying thousands of novel non-coding transcripts and more than three hundred reads corresponding to the non-coding RNA MALAT1, which is highly expressed in many human carcinomas.
Conclusion: Our results demonstrate that combining 454 deep sequencing with a normalization step and careful bioinformatic analysis facilitates the discovery and quantification of rare transcripts or ncRNAs, and can be used as a qualitative tool to characterize transcriptome complexity, revealing many hitherto unknown transcripts, splice isoforms, gene fusion events and ncRNAs, even at a relatively low sequence sampling.
AB - Background: The cancer transcriptome is difficult to explore due to the heterogeneity of quantitative and qualitative changes in gene expression linked to the disease status. An increasing number of "unconventional" transcripts, such as novel isoforms, non-coding RNAs, somatic gene fusions and deletions, have been associated with the tumoral state. Massively parallel sequencing techniques provide a framework for exploring the transcriptional complexity inherent to cancer with a limited laboratory and financial effort. We developed a deep sequencing and bioinformatics analysis protocol to investigate the molecular composition of a breast cancer poly(A)+ transcriptome. This method utilizes a cDNA library normalization step to diminish the representation of highly expressed transcripts and biology-oriented bioinformatic analyses to facilitate detection of rare and novel transcripts. Results: We analyzed over 132,000 Roche 454 high-confidence deep sequencing reads from a primary human lobular breast cancer tissue specimen, and detected a range of unusual transcriptional events that were subsequently validated by RT-PCR in eight additional primary human breast cancer samples. We identified and validated one deletion, two novel ncRNAs (one intergenic and one intragenic), ten previously unknown or rare transcript isoforms and a novel gene fusion specific to a single primary tissue sample. We also explored the non-protein-coding portion of the breast cancer transcriptome, identifying thousands of novel non-coding transcripts and more than three hundred reads corresponding to the non-coding RNA MALAT1, which is highly expressed in many human carcinomas.
Conclusion: Our results demonstrate that combining 454 deep sequencing with a normalization step and careful bioinformatic analysis facilitates the discovery and quantification of rare transcripts or ncRNAs, and can be used as a qualitative tool to characterize transcriptome complexity, revealing many hitherto unknown transcripts, splice isoforms, gene fusion events and ncRNAs, even at a relatively low sequence sampling.
UR - http://www.scopus.com/inward/record.url?scp=65549155761&partnerID=8YFLogxK
U2 - 10.1186/1471-2164-10-163
DO - 10.1186/1471-2164-10-163
M3 - Article
SN - 1471-2164
VL - 10
JO - BMC Genomics
JF - BMC Genomics
M1 - 163
ER -