The present research aims mainly to establish an error-tolerant procedure that extracts information from Natural Language (NL) communication standard documents and stores error knowledge along the way. The error knowledge contains information about the detected errors and inconsistencies as well as the actions taken to solve them, and it acts as a key tool for resolving those errors at the various levels of the procedure. Within this scope, the search for errors and inconsistencies is based on comparing the results of two NLP tools, parsing and chunking. Information Extraction (IE) techniques, aided by specifically developed heuristic algorithms, are used. The approach has been applied to two differently written texts describing the Alternating Bit Protocol (ABP), and a Semantic Net is automatically extracted from them. The error knowledge tells the user which fragments of the text contained inconsistent structures or words and whether and how they were solved. The implemented algorithm solved inconsistencies related to words tagged differently by the NLP tools and exposed other errors caused by the use of complex syntactic structures. Specific metrics were extracted that made it possible to identify some features of the texts.
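The core idea of comparing two taggings and recording error knowledge can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not describe the heuristics, so the lookup-table rule, the `ErrorRecord` fields, the function name `compare_taggings`, and the Penn Treebank style tags are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ErrorRecord:
    """One entry of the (hypothetical) error knowledge store."""
    position: int      # token index in the sentence
    word: str
    parser_tag: str    # tag reported by the parsing tool
    chunker_tag: str   # tag reported by the chunking tool
    action: str        # what was done about the disagreement
    solved: bool


def compare_taggings(parser_tags, chunker_tags, preferred=None):
    """Compare two (word, tag) sequences token by token.

    `preferred` is an assumed heuristic: a dict mapping a lowercased word
    to the tag that should win when the two tools disagree.
    Returns the reconciled tagging and the list of error records.
    """
    preferred = preferred or {}
    if len(parser_tags) != len(chunker_tags):
        raise ValueError("the two taggings must cover the same tokens")

    reconciled = []
    errors = []
    pairs = zip(parser_tags, chunker_tags)
    for i, ((word, p_tag), (other_word, c_tag)) in enumerate(pairs):
        if word != other_word:
            raise ValueError(f"token mismatch at position {i}")
        if p_tag == c_tag:
            reconciled.append((word, p_tag))
            continue
        # The tools disagree: try the heuristic, and log the outcome either way.
        fix: Optional[str] = preferred.get(word.lower())
        if fix in (p_tag, c_tag):
            reconciled.append((word, fix))
            errors.append(ErrorRecord(i, word, p_tag, c_tag, f"kept tag {fix}", True))
        else:
            reconciled.append((word, None))
            errors.append(ErrorRecord(i, word, p_tag, c_tag, "left unresolved", False))
    return reconciled, errors


# Made-up tool outputs for a short ABP-style sentence.
parser_out = [("The", "DT"), ("sender", "NN"), ("sends", "VBZ"), ("frames", "NNS")]
chunker_out = [("The", "DT"), ("sender", "NN"), ("sends", "NNS"), ("frames", "VBZ")]

tags, error_knowledge = compare_taggings(parser_out, chunker_out, {"sends": "VBZ"})
for record in error_knowledge:
    print(record)
```

In this sketch every disagreement produces a record whether or not it was solved, which mirrors the abstract's point that the user is told both what was inconsistent and what was done about it.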
Keywords: Error tolerant process; Heuristic algorithm; Industrial communication standard; Information Extraction; Natural language processing; NLP tools; Semantic Network; Setting chunking; Syntactical patterns
14th International Conference on Soft Computing Models in Industrial and Environmental Applications, Seville (Spain). 13 May 2019
Publication date: May 2019.
S. León, J.A. Rodríguez-Mondéjar, C. Puente, Inconsistency detection on data communication standards using information extraction techniques: the ABP case, 14th International Conference on Soft Computing Models in Industrial and Environmental Applications - SOCO 2019. ISBN: 978-3-030-20055-8, pp. 291-300, Seville, Spain, 13-15 May 2019