Applications of Topology to Natural Language Processing

Abstract

A common model in text classification is the co-occurrence network: frequent terms in a domain-specific corpus are represented as the vertices of a graph, and edges connect terms that appear in the same document. A novel way to describe the co-occurrence networks of a corpus is through simplicial complexes. These complexes can be equipped with a filtration given by the frequencies with which groups of words appear together across texts (the weighted rank clique filtration) and, in some cases, with a second filtration given by the document dates (a temporal filtration). Persistent homology can then be computed in both cases. The resulting homology cycles describe new relationships between the terms of the co-occurrence network. We will see examples of how to apply these techniques to identify sets of terms that belong to the same category (Betti zero) and how cycles can be interpreted (Betti one) in certain contexts.
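
As a rough illustration of the Betti zero part of this idea, the following Python sketch builds a co-occurrence network from a toy corpus and counts connected components as edges are added in decreasing order of co-occurrence frequency. It is a minimal sketch under stated assumptions, not the method presented in the talk: the corpus `toy_docs` and the function names are hypothetical, documents are assumed to be pre-tokenized lists of terms, and only pairwise co-occurrences (edges) are used, which is enough for Betti zero but not for Betti one.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_weights(docs):
    """Count, for each pair of terms, how many documents contain both."""
    weights = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            weights[(a, b)] += 1
    return weights


def betti0_by_threshold(docs):
    """Map each frequency threshold w to the number of connected
    components of the graph that keeps edges with weight >= w."""
    weights = cooccurrence_weights(docs)
    terms = sorted({t for doc in docs for t in doc})
    parent = {t: t for t in terms}  # union-find forest over the terms

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = len(terms)
    result = {}
    # Assumption: the filtration adds the most frequent pairs first.
    for w in sorted(set(weights.values()), reverse=True):
        for (a, b), weight in weights.items():
            if weight == w:
                ra, rb = find(a), find(b)
                if ra != rb:
                    parent[ra] = rb
                    components -= 1
        result[w] = components
    return result


# Hypothetical toy corpus: each document is a list of terms.
toy_docs = [
    ["homology", "simplex", "filtration"],
    ["homology", "simplex"],
    ["corpus", "token"],
    ["corpus", "token", "filtration"],
]

if __name__ == "__main__":
    for threshold, count in betti0_by_threshold(toy_docs).items():
        print(threshold, count)
```

In this toy corpus, groups of terms that stay connected at high thresholds are candidates for belonging to the same category. Computing Betti one would additionally require the higher-dimensional simplices (cliques) and a persistent homology library, which this sketch does not attempt.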

Date
Oct 11, 2018 10:20 AM
Location
Villahermosa, Tabasco