Automatic information extraction in business document

Moreno Acevedo, Santiago Andres

Please use this identifier to cite or link to this item: https://hdl.handle.net/10495/37581

Full metadata record

DC field	Value	Language
dc.contributor.advisor	Orozco Arroyave, Juan Rafael	-
dc.contributor.advisor	Vasquez Correa, Juan Camilo	-
dc.contributor.author	Moreno Acevedo, Santiago Andres	-
dc.date.accessioned	2023-12-13T16:22:31Z	-
dc.date.available	2023-12-13T16:22:31Z	-
dc.date.issued	2023	-
dc.identifier.uri	https://hdl.handle.net/10495/37581	-
dc.description.abstract	ABSTRACT: Information Extraction (IE) is a Natural Language Processing (NLP) task that has gained interest in the research community for its real-world applications, for example in legal settings, where document analysis is essential. So far, IE has been studied extensively in general contexts with idealized data that offer many samples per class. Real-world contexts, however, provide neither large amounts of data nor balanced classes, so models are needed that can cope with these data problems. This master's thesis investigates techniques and methods for handling limited and unbalanced data in NLP and implements them in a software tool that automatically extracts information from documents. Two NLP tasks were studied: Named Entity Recognition (NER) and Relation Classification (RC). Both architectural and data-related approaches were analyzed. To address class imbalance, several loss functions were explored with the goal of obtaining a model that prioritizes hard-to-classify samples; to address the scarcity of data, data augmentation strategies were employed. For NER, a methodology was developed that integrates data augmentation strategies and the focal loss function into a benchmark model. For RC, we identified a state-of-the-art architecture that uses the focal loss function and performs well with limited data. The results for both NER and RC were satisfactory. Finally, the NER methodology and the RC architecture were integrated into a software tool that performs NER and RC automatically on any given document. This work is the first stage toward an automatic document analysis tool.	spa
dc.format.extent	85	spa
dc.format.mimetype	application/pdf	spa
dc.language.iso	eng	spa
dc.type.hasversion	info:eu-repo/semantics/draft	spa
dc.rights	info:eu-repo/semantics/openAccess	spa
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/2.5/co/	*
dc.title	Automatic information extraction in business document	spa
dc.type	info:eu-repo/semantics/masterThesis	spa
dc.publisher.group	Grupo de Investigación en Telecomunicaciones Aplicadas (GITA)	spa
oaire.version	http://purl.org/coar/version/c_b1a7d7d4d402bcce	spa
dc.rights.accessrights	http://purl.org/coar/access_right/c_abf2	spa
thesis.degree.name	Magister en Ingeniería de Telecomunicaciones	spa
thesis.degree.level	Master's	spa
thesis.degree.discipline	Facultad de Ingeniería. Maestría en Ingeniería de Telecomunicaciones	spa
thesis.degree.grantor	Universidad de Antioquia	spa
dc.rights.creativecommons	https://creativecommons.org/licenses/by-nc-sa/4.0/	spa
dc.publisher.place	Medellín, Colombia	spa
dc.type.coar	http://purl.org/coar/resource_type/c_bdcc	spa
dc.type.redcol	https://purl.org/redcol/resource_type/TM	spa
dc.type.local	Thesis/Degree work - Monograph - Master's	spa
dc.subject.decs	Procesamiento de Lenguaje Natural	-
dc.subject.decs	Natural Language Processing	-
dc.subject.lemb	Inteligencia artificial	-
dc.subject.lemb	Machine learning	-
dc.subject.lemb	Aprendizaje Profundo	-
dc.subject.lemb	Deep Learning	-
dc.subject.lemb	Minería de datos	-
dc.subject.lemb	Data mining	-
dc.subject.proposal	Artificial Intelligence	spa
dc.subject.proposal	Business Intelligence	spa
dc.subject.proposal	Information Extraction	spa
Appears in collections:	Master's programs, Faculty of Engineering
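The abstract states that the focal loss function was used so that the NER and RC models prioritize hard-to-classify samples under class imbalance. The record does not specify which variant or hyperparameters the thesis used, so the following is only a minimal sketch, assuming the standard multiclass formulation of Lin et al. (2017), FL(p_t) = -α_t (1 - p_t)^γ log(p_t), where p_t is the predicted probability of the true class. The function names, the default γ = 2.0, and the optional per-class weights `alpha` are illustrative assumptions, not the author's implementation.

```python
import math
from typing import Optional, Sequence


def softmax(logits: Sequence[float]) -> list:
    """Numerically stable softmax over one vector of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Standard cross-entropy for one sample: -log p_t."""
    return -math.log(softmax(logits)[target])


def focal_loss(
    logits: Sequence[float],
    target: int,
    gamma: float = 2.0,  # assumed focusing parameter; gamma = 0 recovers cross-entropy
    alpha: Optional[Sequence[float]] = None,  # hypothetical per-class weights for imbalance
) -> float:
    """Focal loss for one sample: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    p_t = softmax(logits)[target]
    alpha_t = 1.0 if alpha is None else alpha[target]
    # The modulating factor (1 - p_t)**gamma shrinks the loss of confident,
    # easy samples, so hard samples dominate the total training loss.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    easy = ([4.0, 0.0, 0.0], 0)  # true class already has high probability
    hard = ([0.0, 1.0, 0.0], 0)  # true class has low probability
    for name, (logits, y) in [("easy", easy), ("hard", hard)]:
        ce = cross_entropy(logits, y)
        fl = focal_loss(logits, y)
        print(f"{name}: CE={ce:.4f}  FL={fl:.4f}  FL/CE={fl / ce:.4f}")
```

With γ = 2, the easy sample's loss drops to roughly 0.1% of its cross-entropy value, while the hard sample keeps about 60% of it. This reweighting toward difficult examples is the behavior the abstract attributes to the focal loss.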

Files in this item:

File	Description	Size	Format
MorenoSantiago_2023_InformationExtractionDeepLearningNaturalLanguageProcessing.pdf	Master's thesis	2.39 MB	Adobe PDF


This item is licensed under a Creative Commons license.