Thesis: Obtención de información a partir de la componente visual en presentaciones en video (Obtaining information from the visual component of video presentations)
Date
2023
Authors
Program
Ingeniería Civil Informática
Department
Campus
Campus Santiago San Joaquín
Abstract
A steady increase in the creation of new companies puts pressure on obtaining the financing needed to develop their ideas and products. The company Digevo incubates startups and evaluates their viability through an extensive form that is filled out manually. The aim is therefore to extract the relevant information from video pitches automatically, including from the visual content of the presentations: a method is sought to obtain that information directly from the video, so that content which would otherwise be lost can be used.
A library was created to manipulate the frames of a video and obtain a transcription of the text presented. Starting from videos in various formats, it extracts the frames and cleans them to eliminate redundant information, then identifies distinct slides and transcribes them to text using OCR systems. It then applies a structuring process that includes grouping texts according to their spatial layout on the slides, as well as lemmatization or stemming of the text. The results showed good and similar performance across the OCR systems used, and for some parameters left to free choice in the process, optimized values were obtained for the training sample. In the testing and validation phase, a low loss was obtained in the slide estimations, and an accuracy of approximately 60% was achieved for the final transcriptions.
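As an illustration of the "identify distinct slides" step described in the abstract, the following is a minimal sketch, not the thesis library itself. It assumes frames are already decoded to grayscale pixel grids and that two frames belong to the same slide when their mean absolute pixel difference stays under a threshold. The names `mean_abs_diff` and `distinct_slides`, the `threshold` parameter, and its default value are hypothetical; the abstract does not state which similarity measure or parameters the author used.

```python
from typing import List, Sequence

# Assumption: a frame is a grayscale image given as rows of 0-255 pixel values.
Frame = Sequence[Sequence[int]]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute per-pixel difference between two same-sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def distinct_slides(frames: List[Frame], threshold: float = 10.0) -> List[int]:
    """Return the indices of frames that start a new slide.

    A frame is kept when it differs from the last kept frame by more than
    `threshold` (a hypothetical tuning parameter, not taken from the thesis).
    """
    kept: List[int] = []
    for i, frame in enumerate(frames):
        if not kept or mean_abs_diff(frames[kept[-1]], frame) > threshold:
            kept.append(i)
    return kept


# Tiny 2x2 "frames": one slide, a slightly noisy copy of it, then a new slide twice.
slide_a = [[0, 0], [0, 0]]
slide_a_noisy = [[1, 0], [0, 2]]
slide_b = [[200, 200], [200, 200]]
slide_indices = distinct_slides([slide_a, slide_a_noisy, slide_b, slide_b])
```

In a full pipeline, each kept frame would then be passed to an OCR system and the resulting text grouped and stemmed or lemmatized, as the abstract outlines. Comparing against the last kept frame rather than the immediately preceding one avoids slow transitions slipping under the threshold one small step at a time.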
Keywords
Text processing, Python, Software engineering
