Crawler

Brief: A focused crawler for package retrieval
Jump To: Parent Description

  • The seeker algorithm is relatively straightforward. Both keywords and URLs are used to seed the search. Keywords are submitted to online search engines to retrieve web pages, via a module that learns effective queries; URLs are spidered directly. Speculative fetching is performed when a site is expected to be a project URL or a metasite, as classified by WebKB tools. In this way, a database of project URLs is built up. Next, we use information extraction to populate KBs about software systems, then use these to initiate further searches. Eventually we would like to extend this into a set of tactics for retrieving all information related to packaging and systems integration.
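
The seeker loop described above might be sketched as follows. This is a minimal illustration, not the actual implementation: `search_engine_query`, `classify`, and `spider` are hypothetical placeholders standing in for the query-learning module, the WebKB-style classifier, and the spidering component.

```python
from collections import deque

def search_engine_query(keyword):
    # Placeholder: would query an online search engine and return result URLs.
    return [f"http://example.org/{keyword}"]

def classify(url):
    # Placeholder for the WebKB-style classifier; labels a page as a
    # project site, a metasite (a page linking to many projects), or other.
    return "project" if "pkg" in url else "metasite"

def spider(url):
    # Placeholder: would fetch the page and return its outgoing links.
    return []

def seek(keywords, seed_urls, limit=100):
    """Collect candidate project URLs from keyword searches and spidering."""
    frontier = deque(seed_urls)
    for kw in keywords:                 # keywords seed the search via engines
        frontier.extend(search_engine_query(kw))
    seen, projects = set(), []
    while frontier and len(seen) < limit:
        url = frontier.popleft()
        if url in seen:
            continue
        seen.add(url)
        label = classify(url)
        if label == "project":
            projects.append(url)          # record in the project-URL database
        elif label == "metasite":
            frontier.extend(spider(url))  # speculative fetch of linked pages
    return projects
```

In this sketch, pages classified as metasites expand the frontier, while pages classified as project sites are recorded; the resulting project-URL list is what the information-extraction stage would then consume.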