Please use this identifier to cite or link to this item:
https://hdl.handle.net/10495/37581
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Orozco Arroyave, Juan Rafael | - |
dc.contributor.advisor | Vasquez Correa, Juan Camilo | - |
dc.contributor.author | Moreno Acevedo, Santiago Andres | - |
dc.date.accessioned | 2023-12-13T16:22:31Z | - |
dc.date.available | 2023-12-13T16:22:31Z | - |
dc.date.issued | 2023 | - |
dc.identifier.uri | https://hdl.handle.net/10495/37581 | - |
dc.description.abstract | ABSTRACT: Information Extraction (IE) is a Natural Language Processing (NLP) topic that has gained interest in the research community for its real-world applications, such as legal settings where document analysis is very important. So far, IE has been extensively studied in general contexts with ideal data and many samples per class. However, real-world contexts have neither large amounts of data nor balance among classes. Therefore, it is necessary to develop models that can handle real-world data problems. This master's thesis investigates techniques and methods for handling limited and unbalanced data in NLP contexts. The goal is to implement these techniques and methods in a software tool that can automatically extract information from documents. With this aim, two NLP tasks were studied: Named Entity Recognition (NER) and Relation Classification (RC). Different methods were analyzed, including both architectural and data-related approaches. To address class imbalance, several loss functions were explored to create a model that prioritizes samples that are hard to classify. Additionally, data augmentation strategies were employed to address the limited-data problem. A methodology for NER was developed, integrating data augmentation strategies and the focal loss function into a benchmark model. For RC, we identified a state-of-the-art architecture that uses the focal loss function and performs well with limited data. The outcomes for NER and RC were satisfactory at the end of the work. Finally, both the NER methodology and the RC architecture were integrated into a software tool that enables automatic NER and RC tasks for any given document. This work is the first stage in creating an automatic document analysis tool. | spa |
dc.format.extent | 85 | spa |
dc.format.mimetype | application/pdf | spa |
dc.language.iso | eng | spa |
dc.type.hasversion | info:eu-repo/semantics/draft | spa |
dc.rights | info:eu-repo/semantics/openAccess | spa |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/2.5/co/ | * |
dc.title | Automatic information extraction in business document | spa |
dc.type | info:eu-repo/semantics/masterThesis | spa |
dc.publisher.group | Grupo de Investigación en Telecomunicaciones Aplicadas (GITA) | spa |
oaire.version | http://purl.org/coar/version/c_b1a7d7d4d402bcce | spa |
dc.rights.accessrights | http://purl.org/coar/access_right/c_abf2 | spa |
thesis.degree.name | Magister en Ingeniería de Telecomunicaciones | spa |
thesis.degree.level | Maestría | spa |
thesis.degree.discipline | Facultad de Ingeniería. Maestría en Ingeniería de Telecomunicaciones | spa |
thesis.degree.grantor | Universidad de Antioquia | spa |
dc.rights.creativecommons | https://creativecommons.org/licenses/by-nc-sa/4.0/ | spa |
dc.publisher.place | Medellín, Colombia | spa |
dc.type.coar | http://purl.org/coar/resource_type/c_bdcc | spa |
dc.type.redcol | https://purl.org/redcol/resource_type/TM | spa |
dc.type.local | Tesis/Trabajo de grado - Monografía - Maestría | spa |
dc.subject.decs | Procesamiento de Lenguaje Natural | - |
dc.subject.decs | Natural Language Processing | - |
dc.subject.lemb | Inteligencia artificial | - |
dc.subject.lemb | Machine learning | - |
dc.subject.lemb | Aprendizaje Profundo | - |
dc.subject.lemb | Deep Learning | - |
dc.subject.lemb | Minería de datos | - |
dc.subject.lemb | Data mining | - |
dc.subject.proposal | Artificial Intelligence | spa |
dc.subject.proposal | Business Intelligence | spa |
dc.subject.proposal | Information Extraction | spa |
Appears in collections: | Maestrías de la Facultad de Ingeniería |
Files in this item:
File | Description | Size | Format | |
---|---|---|---|---|
MorenoSantiago_2023_InformationExtractionDeepLearningNaturalLanguageProcessing.pdf | Master's thesis | 2.39 MB | Adobe PDF | View/Open |
This item is licensed under a Creative Commons license.