Natural Language Processing is a branch of Artificial Intelligence concerned with using computers to automatically process and understand natural languages. Natural language refers to languages such as English, French, German, etc., rather than artificial computer programming languages. There are a number of reasons why this will be an important technology in the 21st century. First, computers are gaining increasing importance in our society, and being able to communicate with them in a natural way, using spoken and written language, will become more desirable. Second, we are producing very large amounts of online electronic information; we require tools which can automatically process this information, to summarise it, to answer questions about it, to translate it, to find relevant documents within it. The staggering rise of Google demonstrates the importance of this kind of technology.

The proposed research concerns the processing of a particular kind of text, namely the scientific articles produced by the biological research community. Biology produces an enormous number of new articles each year, far too many for any one individual to keep up to date with. Automatic computer tools are required which can process this information. For example, a biologist might want to know whether there is a paper on the Web answering a particular question about some gene.

Sophisticated text processing, such as translating a document from one language to another, summarising documents, or answering questions, requires sophisticated language processing tools. A very useful tool for these kinds of tasks is a parser, which automatically determines the grammatical structure of a sentence and how the words in the sentence are related. For example, it would determine the verbs in the sentence, and how the nouns are related to the verbs. This information is needed if a computer is to be able to understand the text.

The Natural Language Processing community now has very good parsing technology. However, the existing parsers are good at analysing certain kinds of text, such as newspapers, but not so good at other kinds of text, such as biology research papers. The reason is that the parsers have learned about language from linguistic resources created by humans, and the resources are based on newspaper text. Creating these resources from scratch for biology would take too long, and so the proposed research will investigate ways in which parsers tuned for newspaper text can be ported to handle biological text.