Please use this identifier to cite or link to this item: https://hdl.handle.net/10495/37581
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Orozco Arroyave, Juan Rafael | -
dc.contributor.advisor | Vasquez Correa, Juan Camilo | -
dc.contributor.author | Moreno Acevedo, Santiago Andres | -
dc.date.accessioned | 2023-12-13T16:22:31Z | -
dc.date.available | 2023-12-13T16:22:31Z | -
dc.date.issued | 2023 | -
dc.identifier.uri | https://hdl.handle.net/10495/37581 | -
dc.description.abstract | ABSTRACT: Information Extraction (IE) is an area of Natural Language Processing (NLP) that has gained interest in the research community for its real-world applications, such as legal settings, where document analysis is very important. So far, IE has been studied extensively in general contexts with ideal data and many samples per class. However, real-world contexts offer neither large amounts of data nor balance among classes. Therefore, it is necessary to develop models that can handle real-world data problems. This master's thesis investigates techniques and methods for handling limited and unbalanced data in NLP contexts. The goal is to implement these techniques and methods in a software tool that can automatically extract information from documents. With this aim, two NLP tasks were studied: Named Entity Recognition (NER) and Relation Classification (RC). Different methods were analyzed, including both architectural and data-related approaches. To address class imbalance, several loss functions were explored to create a model that prioritizes samples that are hard to classify. Additionally, data augmentation strategies were employed to address the limited-data problem. A methodology for NER was developed, integrating data augmentation strategies and the focal loss function into a benchmark model. For RC, we identified a state-of-the-art architecture that uses the focal loss function and performs well with limited data. The outcomes for NER and RC were satisfactory at the end of the work. Finally, both the NER methodology and the RC architecture were integrated into a software tool that performs NER and RC automatically for any given document. This work is the first stage in creating an automatic document analysis tool. | spa
dc.format.extent | 85 | spa
dc.format.mimetype | application/pdf | spa
dc.language.iso | eng | spa
dc.type.hasversion | info:eu-repo/semantics/draft | spa
dc.rights | info:eu-repo/semantics/openAccess | spa
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/2.5/co/ | *
dc.title | Automatic information extraction in business document | spa
dc.type | info:eu-repo/semantics/masterThesis | spa
dc.publisher.group | Grupo de Investigación en Telecomunicaciones Aplicadas (GITA) | spa
oaire.version | http://purl.org/coar/version/c_b1a7d7d4d402bcce | spa
dc.rights.accessrights | http://purl.org/coar/access_right/c_abf2 | spa
thesis.degree.name | Magister en Ingeniería de Telecomunicaciones | spa
thesis.degree.level | Maestría | spa
thesis.degree.discipline | Facultad de Ingeniería. Maestría en Ingeniería de Telecomunicaciones | spa
thesis.degree.grantor | Universidad de Antioquia | spa
dc.rights.creativecommons | https://creativecommons.org/licenses/by-nc-sa/4.0/ | spa
dc.publisher.place | Medellín, Colombia | spa
dc.type.coar | http://purl.org/coar/resource_type/c_bdcc | spa
dc.type.redcol | https://purl.org/redcol/resource_type/TM | spa
dc.type.local | Tesis/Trabajo de grado - Monografía - Maestría | spa
dc.subject.decs | Procesamiento de Lenguaje Natural | -
dc.subject.decs | Natural Language Processing | -
dc.subject.lemb | Inteligencia artificial | -
dc.subject.lemb | Machine learning | -
dc.subject.lemb | Aprendizaje Profundo | -
dc.subject.lemb | Deep Learning | -
dc.subject.lemb | Minería de datos | -
dc.subject.lemb | Data mining | -
dc.subject.proposal | Artificial Intelligence | spa
dc.subject.proposal | Business Intelligence | spa
dc.subject.proposal | Information Extraction | spa
Appears in collections: Maestrías de la Facultad de Ingeniería

Files in this item:
File | Description | Size | Format
MorenoSantiago_2023_InformationExtractionDeepLearningNaturalLanguageProcessing.pdf | Master's thesis | 2.39 MB | Adobe PDF


This item is subject to a Creative Commons license.