Please use this identifier to cite or link to this item: https://hdl.handle.net/10495/12525
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Isaza Ramírez, Sebastián | -
dc.contributor.advisor | Aedo Cobo, José Edinson | -
dc.contributor.author | Guerra Soler, Aníbal José | -
dc.date.accessioned | 2019-11-29T16:49:54Z | -
dc.date.available | 2019-11-29T16:49:54Z | -
dc.date.issued | 2019 | -
dc.identifier.citation | Guerra-Soler, A. J. (2019). Efficient Storage of Genomic Sequences in High Performance Computing Systems. (Doctoral thesis). Universidad de Antioquia. Medellín, Colombia. | spa
dc.identifier.uri | http://hdl.handle.net/10495/12525 | -
dc.description.abstract | ABSTRACT: In this dissertation, we address the challenges of genomic data storage in high performance computing systems. In particular, we focus on developing a referential compression approach for Next Generation Sequencing data stored in FASTQ format files. The amount of genomic data available for researchers to process has increased exponentially, bringing enormous challenges for its efficient storage and transmission. General-purpose compressors offer only limited performance on genomic data, hence the need for specialized compression solutions. Two trends have emerged as alternatives to harness the particular properties of genomic data: non-referential and referential compression. Non-referential compressors offer higher compression ratios than general-purpose compressors, but still fall short of what a referential compressor could theoretically achieve. However, the effectiveness of referential compression depends on selecting a good reference and on having enough computing resources available. This thesis presents one of the first referential compressors for FASTQ files. We first present a comprehensive analytical and experimental evaluation of the most relevant tools for genomic raw data compression, which led us to identify the main needs and opportunities in this field. As a consequence, we propose a novel compression workflow that aims at improving the usability of referential compressors. Subsequently, we discuss the implementation and performance evaluation of the core of the proposed workflow: a referential compressor for reads in FASTQ format that combines local read-to-reference alignments with a specialized binary-encoding strategy. The compression algorithm, named UdeACompress, achieved very competitive compression ratios when compared to the best compressors in the current state of the art, while showing reasonable execution times and memory use. In particular, UdeACompress outperformed all competitors when compressing long reads, typical of the newest sequencing technologies. Finally, we study the main aspects of data-level parallelism in the Intel AVX-512 architecture in order to develop a parallel version of the UdeACompress algorithms and reduce the runtime. Through the use of SIMD programming, we managed to significantly accelerate the main bottleneck found in UdeACompress, the suffix array construction. | spa
dc.format.extent | 130 | spa
dc.format.mimetype | application/pdf | spa
dc.language.iso | spa | spa
dc.type.hasversion | info:eu-repo/semantics/draft | spa
dc.rights | Attribution-NonCommercial-NoDerivs 2.5 Colombia (CC BY-NC-ND 2.5 CO) | *
dc.rights | info:eu-repo/semantics/openAccess | spa
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/2.5/co/ | *
dc.subject.lcsh | Performance - evaluation | -
dc.title | Efficient Storage of Genomic Sequences in High Performance Computing Systems | spa
dc.type | info:eu-repo/semantics/doctoralThesis | spa
dc.publisher.group | Sistemas Embebidos e Inteligencia Computacional (SISTEMIC) | spa
oaire.version | http://purl.org/coar/version/c_b1a7d7d4d402bcce | spa
dc.rights.accessrights | http://purl.org/coar/access_right/c_abf2 | spa
thesis.degree.name | Doctor en Ingeniería Electrónica | spa
thesis.degree.level | Doctorate | spa
thesis.degree.discipline | Facultad de Ingeniería. Doctorado en Ingeniería electrónica | spa
thesis.degree.grantor | Universidad de Antioquia | spa
dc.rights.creativecommons | https://creativecommons.org/licenses/by-nc-nd/4.0/ | spa
dc.publisher.place | Medellín, Colombia | spa
dc.type.coar | http://purl.org/coar/resource_type/c_db06 | spa
dc.type.redcol | https://purl.org/redcol/resource_type/TD | spa
dc.type.local | Thesis/Degree work - Monograph - Doctorate | spa
dc.subject.proposal | Genomic sequences | spa
dc.subject.proposal | Parallel computing | spa
dc.subject.proposal | Reads alignment | spa
dc.subject.proposal | Reads compression | spa
dc.subject.proposal | Referential compression | spa
dc.subject.proposal | SIMD programming | spa
dc.subject.lcshuri | http://id.loc.gov/authorities/subjects/sh2010105499 | -
Appears in collections: Doctorados de la Facultad de Ingeniería

Files in this item:
File | Description | Size | Format
GuerraSolerAnibal_2019_EfficientStorageGenomic.pdf | Doctoral thesis | 6.88 MB | Adobe PDF


This item is licensed under a Creative Commons license.