FRDCSA | external codebases | bios-1.1.0

Project Description

Bios is a suite of syntactic and semantic analyzers that covers the most common tools needed for shallow analysis of English text. The following tools are currently included:

(*) A smart tokenizer that recognizes abbreviations, SGML tags, etc.
(*) A part-of-speech (POS) tagger, implemented as a wrapper around the TnT tagger by Thorsten Brants.
(*) A syntactic chunker that uses the labels promoted by the CoNLL chunking evaluations (http://www.cnts.ua.ac.be/conll2000/chunking).
(*) Named-Entity Recognition and Classification (NERC) for the CoNLL entity types plus an additional 11 numerical entity types.
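
To make the CoNLL chunk labels concrete, here is a minimal Python sketch of how B-/I-/O chunk tags in the CoNLL-2000 style can be grouped into phrases. It is not part of Bios and does not use its API: the function name bio_to_chunks and the hand-tagged sample sentence are assumptions made for illustration only.

```python
from typing import List, Tuple


def bio_to_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group tokens into (label, text) chunks from B-X / I-X / O tags.

    Hypothetical helper for illustration; not a Bios function.
    """
    chunks: List[Tuple[str, str]] = []
    label = None          # label of the chunk currently being built
    words: List[str] = []  # tokens collected for that chunk

    for token, tag in zip(tokens, tags):
        # I-X continues the open chunk only if the label matches.
        if tag.startswith("I-") and label == tag[2:]:
            words.append(token)
            continue
        # Anything else closes the open chunk, if there is one.
        if label is not None:
            chunks.append((label, " ".join(words)))
            label, words = None, []
        # B-X (or a stray I-X) opens a new chunk; O opens nothing.
        if tag != "O":
            label, words = tag[2:], [token]

    # Flush the last chunk at the end of the sentence.
    if label is not None:
        chunks.append((label, " ".join(words)))
    return chunks


if __name__ == "__main__":
    # Hand-tagged sample input, assumed for this sketch.
    tokens = ["He", "reckons", "the", "current", "account", "deficit", "will", "narrow", "."]
    tags = ["B-NP", "B-VP", "B-NP", "I-NP", "I-NP", "I-NP", "B-VP", "I-VP", "O"]
    for chunk_label, text in bio_to_chunks(tokens, tags):
        print(chunk_label, text)
```

In this scheme, B-NP opens a noun phrase, I-NP continues it, and O marks a token outside any chunk, so the sample groups "the current account deficit" into a single NP.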

This page is part of the FWeb package.
Last updated Sat Oct 26 16:56:45 EDT 2019.