--- Forwarded Message from Karen Smith ---

>Date: Tue, 20 Jul 1999 15:37:30 -1000
>From: Karen Smith
>Subject: ERGO: Parser Integrity
------------------
>To the readers.
>
I am a linguist working at Ergo Linguistic Technologies in Honolulu, HI. We are currently attempting to refresh and update our collection of parsers and parser web sites. We currently have the following parsers in our offices: Davy Temperley, Daniel Sleator, and John Lafferty's "The Link Grammar Parser" from Carnegie Mellon University, "LFG" from Xerox PARC, "Apple Pie Parser" from NYU, "ENGCG Constraint Grammar Parser of English" from Lingsoft, Inc., "The Functional Dependency Grammar Parser" of Atro Voutilainen and Mikko Silvonen from Finland, Georgetown University's "Natural Language Processing Parser", Stanford University's "LinGO Parser", Prospero Software's "Parser Version 1.0 for DOS", "The FranklinParser" from Proximity Technology, Inc., and "Natural Language Parser Demo" from The University of Finland's Natural Language Processing Department.

If anyone knows of any other parsers, especially from universities or high technology development corporations like IBM or Microsoft, please let me know. We are also looking for software tools which use parsers as an internal component. We will post a complete list of these tools and the relevant websites on our homepage on a "related sites" link. All feedback is welcomed.

The standards by which each of the parsers listed below was judged can be found at the Ergo Linguistic Technologies website under "parser contest". There you will find a full explanation of what Ergo Linguistic Technologies feels the standards for parsing technology should be. Basically, the analysis is broken into seven different areas, each having several objectives which need to be met.
The seven categories are as follows: structural analysis of strings; evaluation of strings; manipulation of strings; question/answer, statement/response repartee; recognition of the essential identity of synonymous structures; navigation and control; and lexicography.

CARNEGIE MELLON UNIVERSITY [LINK GRAMMAR]

The Link Grammar Parser is "a syntactic parser of English, based on link grammar, an original theory of English syntax." This parser identifies parts of speech, parts of sentence, internal clauses and sentence type, but does not identify both tense and voice of the main clause and internal clauses (it identifies tense only). It recognizes acceptable strings, gives the number of correct parses that succeeded, identifies phrases of acceptable parses, and gives the number of unacceptable parses that were tried, but does not give the exact time of the parses in seconds or reject unacceptable strings. It performs no manipulation of strings. It identifies whether a string is a yes/no question, a wh-question, or a command, but does not have any other statement/response, question/answer repartee. It demonstrates no recognition of the essential identity of synonymous structures and demonstrates no navigation and control functions. The lexicon has 60,000 words and the core vocabulary is suitable to a wide variety of applications. The parser recognizes single and multi-word items and recognizes a variety of grammatical features. It does not have tools to facilitate the addition, modification, or deletion of lexical entries and it cannot mark and link synonyms and classes of lexical items. The output of this parser is in the form of a tree diagram consisting of a series of linkages. Each link is marked with Link Grammar's own proprietary labels. This system was found to be rather hard to follow, since at every link one must refer back to a previous page to uncover the meaning of that particular link.

http://bobo.link.cs.cmu.edu/grammar/html/intro.html

LINGSOFT, INC.
[ENGCG CONSTRAINT GRAMMAR PARSER OF ENGLISH]

This parser, developed at the Department of General Linguistics at the University of Helsinki, gives a morphological analysis of running English text. It identifies the parts of speech and parts of the sentence, but does not identify internal clauses, sentence type or tense and voice of the main clause or internal clauses. It does identify the phrases of successful parses, but does not recognize acceptable strings, reject unacceptable strings, give the correct number of parses that succeeded or the number of unacceptable parses that were tried. It also does not give the exact time of parses in seconds. This parser performs no manipulation of strings. It is capable of identifying whether a string is a statement, yes/no question, wh-question or a command, but demonstrates no other question/answer, statement/response repartee. This parser also recognizes the heads of phrases with and without associated modifiers, but it has no other recognition of the essential identity of synonymous structures. It distinguishes commands from questions and statements, but does not distinguish commands for OS characters or programs, does not provide a sufficiently detailed analysis of commands to allow proper responses, and does not recognize synonymous commands. There is no data available on the size of the lexicon, but it does recognize single and multi-word items, recognizes a variety of grammatical features, and has a core vocabulary that is suitable to a wide variety of applications. However, it cannot mark and link synonyms and classes of lexical items, and it does not have tools to facilitate the addition, modification, and deletion of lexical entries. The output of this parser is in the form of a list which provides a part of speech and part of sentence analysis.
http://www.lingsoft.fi/cgi-pub/engcg

UNIVERSITY OF HELSINKI [FUNCTIONAL DEPENDENCY GRAMMAR PARSER FOR ENGLISH]

This parser gives a surface-syntactic analysis of a running text. It identifies parts of speech and parts of the sentence, but does not identify internal clauses, sentence type or tense and voice of the main clause or internal clauses. It does identify the phrases of successful parses, but does not recognize acceptable strings, reject unacceptable strings, give the correct number of parses that succeeded or the number of unacceptable parses that were tried. It also does not give the exact time of parses in seconds. This parser performs no manipulation of strings, and has no question/answer, statement/response repartee. Furthermore, it does not recognize the essential identity of synonymous structures and demonstrates no navigation and control functions. No information was available on the size of the lexicon, but it does recognize single and multi-word items, a variety of grammatical features, and seems to have a core vocabulary that is suitable to a wide variety of applications. However, it does not have any tools to facilitate the addition, modification, or deletion of lexical entries and it is unable to mark and link synonyms and classes of lexical items. The output of this parser provides a part of speech and some part of sentence analysis in the form of a list.

http://www.ling.helsinki.fi/~tapanain/dg/eng/demo.html

GEORGETOWN UNIVERSITY [NATURAL LANGUAGE PROCESSING PARSER MODULARITY DEMONSTRATION]

This parser identifies parts of speech and parts of the sentence, but does not identify internal clauses, sentence type or tense and voice of the main clause or internal clauses. It does recognize acceptable strings and reject unacceptable strings, and it gives the number of correct parses that succeeded, but not the number of unacceptable parses that were tried.
It also identifies the phrases of acceptable parses and gives the exact time of parses in seconds. This parser demonstrates no manipulation of strings or question/answer, statement/response repartee. It also cannot recognize the essential identity of synonymous structures and demonstrates no navigation and control functions. The lexicon does not contain a minimum of 50,000 words, but rather has only 23,000 entries. However, it does recognize single and multi-word items as well as a variety of grammatical features. It also has tools which facilitate the addition, modification, and deletion of lexical items. However, its core vocabulary is not suitable to a wide variety of applications and it is unable to mark and link synonyms and classes of lexical items. The output of this parser provides a part of speech analysis for each word in the sentence in the form of a list.

http://www.georgetown.edu/cgi-bin/compling/slctscr.pl

STANFORD UNIVERSITY [LINGO]

Linguistic Grammars Online, or LinGO, is a "multi-purpose broad-coverage grammar of English". This parser identifies parts of speech and tense and voice of the main clause, but the output from the parse is not very clear. It does not identify parts of the sentence, internal clauses, the tense and voice of internal clauses, or sentence type. It recognizes acceptable and unacceptable strings, gives the number of correct parses that succeeded, and identifies the phrases of successful parses. However, it does not give the number of unacceptable parses that were tried or the exact time of parses in seconds. This parser is able to identify tense and voice in sentences with and without internal clauses, but demonstrates no other manipulation of strings. It identifies tense in questions, but does not identify the appropriate tense for responses. It shows no other question/answer, statement/response repartee.
It also shows no recognition of the essential identity of synonymous structures and demonstrates no navigation or control functions. There was no information available on the size of the lexicon; however, many words were found in the dictionary. It recognizes single and multi-word items and recognizes a variety of grammatical functions. However, it does not have tools to facilitate the addition, modification, and deletion of lexical entries and it is not able to mark and link synonyms and classes of lexical items. The output of this parser provides a part of speech analysis; however, it is somewhat hard to follow and no explanation of the labels was given. Upon corresponding with Rob Malouf via email about this, I was referred to another web address containing a document which explains the labels more thoroughly. This explanation can be found at ftp://ftp-csli.stanford.edu/linguistics/sag/mrs.ps.gz

http://hpsg.stanford.edu.8000/lingo/parser.html

PROSPERO SOFTWARE [PARSER VERSION 1.0 FOR DOS]

This parser is able to identify parts of speech, but it is not able to identify parts of a sentence, internal clauses, sentence type, or tense and voice of the main clause or internal clauses. This parser shows no evaluation of strings or manipulation of strings. It does identify tense in questions, but does not identify the appropriate tense for responses. It demonstrates no other question/answer, statement/response repartee and demonstrates no recognition of the essential identity of synonymous structures or navigation and control functions. This parser has a large dictionary with several hundred thousand entries, well above the suggested 50,000. It recognizes single and multi-word units as well as a variety of grammatical features.
The core vocabulary is suitable to a wide variety of applications; however, the parser does not have tools to facilitate the addition, modification or deletion of lexical entries and it is unable to mark and link synonyms and classes of lexical items. This parser's output provides a part of speech analysis in the form of a list.

http://www.prosperosoftware.com/np1id2.html

PROXIMITY TECHNOLOGY, INC. [FRANKLIN PARSER]

This parser can be found in Ken Litkowski's Dictionary Maintenance Programs, also referred to as DIMAP. This parser identifies parts of speech, parts of a sentence, and internal clauses, but it is not able to identify sentence type or the tense and voice of main and internal clauses. The Franklin parser does not recognize acceptable strings or reject unacceptable strings. It does not give the number of correct parses that succeeded, the number of unacceptable parses that were tried, or the exact time of parses in seconds. However, it does identify the phrases of successful parses. It does not show any manipulation of strings, question/answer, statement/response repartee, recognition of the essential identity of synonymous structures, or navigation and control functions. The dictionary includes more than 120,000 headwords and the core vocabulary is suitable to a wide variety of applications. The parser recognizes single and multi-word items as well as a variety of grammatical features, but it is not able to mark and link synonyms and classes of lexical items. This parser's output provides a part of speech and part of sentence analysis in the form of a chart.

http://proximity.franklin.com/parse.htm

UNIVERSITY OF FINLAND'S NATURAL LANGUAGE PROCESSING DEPT. [NATURAL LANGUAGE PARSER]

This parser has not yet been thoroughly examined. Preliminary assessments show that the parser identifies parts of speech and parts of sentence, but does not identify internal clauses, sentence type, or the tense and voice of main and internal clauses.
It also recognizes acceptable strings and rejects unacceptable strings. This parser is case sensitive. It gives the correct number of parses that succeeded, but does not give the number of unacceptable parses that were tried or the exact time of parses in seconds. It shows no manipulation of strings, question/answer, statement/response repartee or recognition of the essential identity of synonymous structures. The lexicon uses a collection of dictionaries such as CUVOALD, WordNet, and Link Grammar, so the core vocabulary is suitable for a wide variety of applications, and it recognizes a variety of grammatical features and single and multi-word items. A more complete analysis of this parser will be completed in the near future. The output of this parser provides a part of speech and some part of sentence analysis in the form of a tree diagram.

http://pointti.vip.fi/nlpd.html

XEROX PARC [LFG PARSER]

This parser is currently undergoing evaluation and a complete analysis will be posted to our website when it is available. We are in the process of contacting Xerox PARC for more information about this product.

ftp://ftp.parc.xerox.com/pub/lfg/

NEW YORK UNIVERSITY [APPLE PIE PARSER]

This parser is currently undergoing analysis. When an analysis is available, it will be posted to our website.

http://cs.nyu.edu/cs/projects/proteus/app/index.html

UNIVERSITY OF MANITOBA [MINIPAR]

This parser was downloaded, but the demo could not be opened. Our programmer is currently working on the problem. When an analysis is available, it will be posted to the website.

http://www.cs.umanitoba.ca/~lindek/minipar/htm

and of course our own parser at ...

ERGO LINGUISTIC TECHNOLOGIES
http://www.ergo-ling.com/

For those of you who would like to look at and compare parsers but are unfamiliar with parsing, you can go to the Ergo web site "Parsing Contest" page to find good test sentences and a discussion of standards for comparing parsers.
It should take just a few hours to go through, look at, and try all these parsers.
>
> Karen Smith
> Linguist
> Ergo Linguistic Technologies
> 2800 Woodlawn Dr., Ste. 175
> Honolulu, HI 96822
> Tel (808) 539-3920
> Fax (808) 539-3924
http://www.ergo-ling.com/