The seeker algorithm is relatively straightforward. Keywords and URLs
are used to seed the search. The keywords are submitted to online
search engines to retrieve web pages, through a module which learns
effective queries. The URLs are spidered: fetching is prioritized
based on the expectation that a site is a project URL or a metasite,
as classified by WebKB tools. In this way, a database of project URLs
is built up.
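The seeker loop above can be sketched as follows. This is a hedged illustration, not the actual FWeb implementation: the names (classify_page, seek), the canned WEB data, and the string-matching classifier are all stand-ins for real fetching and the WebKB classifier.

```python
from collections import deque

# Toy "web": URL -> (page text, outgoing links). A real seeker
# would fetch pages over HTTP; canned data keeps this runnable.
WEB = {
    "http://example.org/ai-list": ("metasite listing AI projects",
                                   ["http://example.org/proj-a",
                                    "http://example.org/proj-b"]),
    "http://example.org/proj-a": ("project homepage for tool A", []),
    "http://example.org/proj-b": ("project homepage for tool B", []),
}

def classify_page(text):
    """Stand-in for the WebKB-style classifier: label a page as a
    project site, a metasite (link hub), or irrelevant."""
    if "project homepage" in text:
        return "project"
    if "metasite" in text:
        return "metasite"
    return "other"

def seek(seed_urls):
    """Spider outward from the seeds, following links only through
    pages classified as metasites, and collect project URLs."""
    frontier = deque(seed_urls)
    seen = set(frontier)
    projects = []
    while frontier:
        url = frontier.popleft()
        text, links = WEB.get(url, ("", []))
        label = classify_page(text)
        if label == "project":
            projects.append(url)
        elif label == "metasite":
            for link in links:
                if link not in seen:
                    seen.add(link)
                    frontier.append(link)
    return projects

print(seek(["http://example.org/ai-list"]))
```

Starting from the metasite seed, the loop discovers both project pages; the collected list is the seed of the project-URL database.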
Next, we use information extraction to populate knowledge bases (KBs)
about software systems, then use these to initiate further searches.
Eventually we would like to extend this to a set of tactics for
retrieving all information related to packaging and systems
integration.
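The extraction-then-search step might look like the following sketch. The "X is a Y" regex and the query template are illustrative assumptions, not the actual FWeb extraction rules.

```python
import re

def extract_facts(text):
    """Very naive information extraction: 'X is a Y.' -> (X, Y)."""
    return re.findall(r"(\w+) is an? ([\w ]+?)\.", text)

def queries_from_kb(facts):
    """Turn each extracted (name, kind) fact into a follow-on
    search-engine query."""
    return ['"%s" %s download' % (name, kind) for name, kind in facts]

facts = extract_facts("Weka is a machine learning toolkit. "
                      "Soar is a cognitive architecture.")
print(queries_from_kb(facts))
```

Each fact harvested from a project page becomes a new query, closing the loop between the KB and the search module.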
- Use data on which files we have found to be useful to the FRDCSA, and use their descriptions and related information to guide a focused crawler toward similar material.
- See about using my browser's link history to build a model of interest, and using it to train a focused crawler.
- Write a better focused crawler
- Come up with a new name for the crawler.
- Sorcerer, a crawler
- Combine focused crawler: deb http://combine.it.lth.se/ debian/
- Ask William Cohen's graduate students about how to write a focused crawler for AI software.
- Should look into a domain-specific web crawler.
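The first two TODO items above could be prototyped roughly as follows: build an interest model from descriptions of files already known to be useful (or from browser-history page titles), then use it to order the crawl frontier. The keyword-overlap scoring here is a deliberately simple assumption; a real focused crawler would use a trained classifier.

```python
import heapq

def interest_model(descriptions):
    """Collect the vocabulary of known-useful descriptions."""
    vocab = set()
    for d in descriptions:
        vocab.update(d.lower().split())
    return vocab

def score(text, vocab):
    """Fraction of a page's words that appear in the model."""
    words = text.lower().split()
    return sum(w in vocab for w in words) / len(words) if words else 0.0

def prioritize(candidates, vocab):
    """Return candidate (url, text) pairs as URLs, best-scoring first."""
    heap = [(-score(text, vocab), url) for url, text in candidates]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

vocab = interest_model(["planner for automated reasoning",
                        "theorem prover library"])
ranked = prioritize([("http://a.example", "cooking recipes blog"),
                     ("http://b.example", "an automated theorem prover")],
                    vocab)
print(ranked)
```

Pages overlapping the descriptions of useful files float to the top of the frontier, steering the crawl toward similar material.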
This page is part of the FWeb package.
Last updated Sat Oct 26 16:51:03 EDT 2019