workhorse is a system to set up dedicated servers for the creation
of tagged, analyzed, and understood texts and other linguistic resources.
For datasets, we have Wikipedia, Project Gutenberg, and
hopefully full-text books from Google Books, all appropriately licensed.
We aim to develop a highly annotated, freely available
corpus of marked-up texts that have been processed with a wide
variety of state-of-the-art systems.
We also aim to apply natural
language understanding, knowledge base population, and other
techniques to the texts to derive useful knowledge.
- Our system should pass all downloaded and visited webpages through an analysis system, probably using a Squid proxy.
That system can check for things like repository links, formalize the text's knowledge using workhorse, and so on.
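One piece of that per-page analysis could be a repository-link detector. A minimal sketch, assuming the proxy hook hands us the fetched HTML as a string; the hosting sites and patterns below are illustrative, not a fixed list:

```python
import re

# Illustrative patterns for common repository hosts; extend as needed.
REPO_PATTERNS = [
    r"https?://github\.com/[\w.-]+/[\w.-]+",
    r"https?://gitlab\.com/[\w.-]+/[\w.-]+",
    r"https?://bitbucket\.org/[\w.-]+/[\w.-]+",
]

def find_repo_links(html: str) -> list[str]:
    """Return the unique repository URLs found in a fetched page."""
    links = []
    for pattern in REPO_PATTERNS:
        links.extend(re.findall(pattern, html))
    return sorted(set(links))
```

A real deployment would run this from a Squid ICAP service or url_rewrite helper, but the scanning logic itself is proxy-agnostic.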
- workhorse should understand the difference between a sentence, a paragraph, and other text units. This information should come from the iaec/universal-parser components.
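Until the universal-parser output is wired in, a placeholder segmenter shows the intended data shape: paragraphs delimited by blank lines, each holding a list of sentences. This is a naive sketch, not the iaec/universal-parser logic:

```python
import re

def segment(text: str) -> list[list[str]]:
    """Split raw text into paragraphs (blank-line delimited),
    each paragraph being a list of sentences (naive punctuation split)."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [re.split(r"(?<=[.!?])\s+", p) for p in paragraphs]
```

The nested-list shape is the point: downstream annotators can then attach labels at either the sentence or the paragraph level.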
- We need to develop tools to more easily manage the execution of the NLP systems for workhorse.
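Such a management tool could start as a registry of command lines run over stdin/stdout. A minimal sketch; the registry contents are hypothetical stand-ins, not actual workhorse NLP systems:

```python
import subprocess

# Hypothetical registry; real entries would be the workhorse NLP systems' commands.
SYSTEMS = {
    "lowercase": ["tr", "[:upper:]", "[:lower:]"],
}

def run_system(name: str, text: str) -> str:
    """Run one registered NLP system over the text via its command line,
    raising if the command exits nonzero."""
    result = subprocess.run(SYSTEMS[name], input=text,
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Keeping each system behind a plain command line makes it easy to add, remove, or time out systems without touching the pipeline code.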
- Edit workhorse to save the results of the analysis, not just the KNext output.
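One way to persist everything, not just the KNext output, is a per-document JSON sidecar keyed by analyzer name. A sketch under that assumption; the `.analysis.json` naming convention is hypothetical:

```python
import json
from pathlib import Path

def save_results(doc_path: str, results: dict) -> Path:
    """Store every analyzer's output next to the source document,
    rather than keeping only the KNext output."""
    out = Path(doc_path).with_suffix(".analysis.json")
    out.write_text(json.dumps(results, indent=2))
    return out
```

The `results` dict would map analyzer names (e.g. "knext", "pos") to whatever each system produced, so new analyzers need no schema changes.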
- Develop a tool to record our decision-making on different problems.
For instance, in trying to determine where to host the new aloysius system, I ruled out a merger with services.frdcsa.org because that system currently runs on justin, which is insecure.
That said, I could move it to workhorse if I could find a way to route it.
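A first cut of that decision-recording tool could just append structured records to a log file. A minimal sketch; the field names and JSON-lines format are assumptions, not an existing workhorse interface:

```python
import json
import time

def record_decision(log_path, problem, decision, ruled_out):
    """Append one decision record, e.g. why aloysius was not merged
    into services.frdcsa.org (its host was considered insecure)."""
    entry = {
        "time": time.strftime("%Y-%m-%d %H:%M:%S"),
        "problem": problem,
        "decision": decision,
        "ruled_out": ruled_out,  # list of [option, reason] pairs
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```

Because each line is independent JSON, the log can later be queried or migrated into a proper decision-support system without conversion pain.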
- Use puck with workhorse.
- Troubleshoot why any significant throughput to the workhorse computer degrades the rest of the internet connection.
- For workhorse, use GATE as the corpus manager.
- Buy hard drives and add them to ai.frdcsa.org, workhorse.frdcsa.org, and node
- Set up FWeb2 to display datasets for workhorse, such as KNext-processed texts.
- For the workhorse system: http://www2003.org/cdrom/papers/refereed/p831/p831-dill.html
This page is part of the FWeb package.
Last updated Sat Oct 26 16:49:17 EDT 2019