FRDCSA | minor codebases | Software Finder

Software Finder

Architecture Diagram: GIF


Project Description

I added a lot of new features to RADAR today. First, radar-web-search has been extended. This program lets one search the net for a topic, say "event extraction", and then builds a large query intended to find software related to that topic. It searches Yahoo with that query and examines each result page to see whether any software is linked from it:

andrewdo@box:/var/lib/myfrdcsa/codebases/internal/event-system/IE$ radar-web-search "event extraction"

QUERY: "event extraction" system OR java OR project OR library OR php OR web OR framework OR open OR manager OR linux OR engine OR net OR server OR management OR game OR tool OR tools OR client OR simple OR editor OR cms OR database OR file OR generator OR software OR network OR xml OR python OR based OR source OR plugin OR data OR amp OR language OR application OR control OR online OR toolkit OR interface OR 3d OR irc OR eclipse OR free OR api OR windows OR code OR os OR perl OR virtual OR development OR gui OR driver OR content OR module OR mail OR image OR suite OR player OR portal OR monitor OR platform OR simulator OR script OR object OR log OR media OR text OR easy OR browser OR search OR service OR viewer OR de OR chat OR remote OR parser OR mysql OR time OR bot OR mobile OR converter OR sql OR daemon OR tracker OR rpg OR programming OR test OR gnu OR environment OR class OR utility OR gnome OR compiler OR internet OR 0 OR user OR utilities OR html OR package OR desktop

      Result: #1
      Url:http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=15494078
      http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=15494078
      Summary: With the explosion of molecular data, tools developed
      by computer scientists are ... BIND-The Biomolecular Interaction
      Network Database. Nucleic Acids Research. ...
      Title: PASBio: predicate-argument structures for event extraction in molecular ...
      $VAR1 = [];

      Result: #2
      Url:http://nlp.cs.nyu.edu/info-extr/
      http://nlp.cs.nyu.edu/info-extr/
      Summary: This system combines a web crawler (which searches for reports of outbreaks on a ... engine, and a data base browser to examine the extracted events (Proteus Project ...
      Title: Proteus Project: Information Extraction
      $VAR1 = [];
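The two steps described above, building the OR query around the quoted topic and checking a result page for software links, can be sketched in Python as follows. This is not the RADAR code (which is Perl): the keyword list is abbreviated, and the archive suffixes and the helper names `build_query` and `find_software_links` are hypothetical, chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Abbreviated, hypothetical keyword list; the real query uses many more terms.
SOFTWARE_KEYWORDS = ["system", "java", "project", "library", "tool", "software"]

# Assumed file suffixes that suggest a downloadable software release.
ARCHIVE_SUFFIXES = (".tar.gz", ".tgz", ".tar.bz2", ".zip", ".jar")


def build_query(topic, keywords=SOFTWARE_KEYWORDS):
    """Quote the topic, then OR together the software-related keywords."""
    return '"%s" %s' % (topic, " OR ".join(keywords))


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def find_software_links(html, base_url):
    """Return the links on a result page whose path looks like an archive."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [url for url in collector.links
            if urlparse(url).path.lower().endswith(ARCHIVE_SUFFIXES)]


if __name__ == "__main__":
    print(build_query("event extraction"))
    page = '<a href="/dl/ie-1.0.tar.gz">tarball</a> <a href="about.html">about</a>'
    print(find_software_links(page, "http://example.org/proj/"))
```

`build_query("event extraction")` produces a string in the same shape as the QUERY line above, just with fewer keywords. Matching on the URL path rather than the full URL keeps query strings from hiding an archive suffix.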
    

So, that's what it does. One problem was that it looked only one link deep for tar.gz files, zip files, and the like. I wanted it to look further, but following every link would have been expensive in both bandwidth and time. So I downloaded a dataset from:

http://cybermetrics.wlv.ac.uk/database

which contains a large collection of web links. I then rated the last directory or file component of each URL that linked to a set of files by how many "desirable" files were found there.

      e.g., for the URL above it would be "database"

      17.0028327481393        jars    40      9       211
      14.3298883847943        download.html   116     168     498
      12.9734278116825        edit.html       67      33      67
      10.8830909612802        patches 8       2       368
      10.4918735220202        Debug   38      16      42
      10.0194507146122        golem   29      6       30
      8.97441185481296        canaries13      20      0       20
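The rating step can be sketched in Python as below, assuming the link dataset is available as (source URL, target URL) pairs. The page does not say what the three count columns in the table mean or how the score was computed, so the desirable-file suffixes, the helper names, and the scoring formula here are all hypothetical stand-ins and will not reproduce the numbers above.

```python
import math
from collections import defaultdict
from urllib.parse import urlparse

# Assumed suffixes for "desirable" (software-looking) link targets.
DESIRABLE_SUFFIXES = (".tar.gz", ".tgz", ".tar.bz2", ".zip", ".jar")


def last_component(url):
    """Last directory or file name of the URL path, e.g. 'database'."""
    return urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]


def is_desirable(url):
    return urlparse(url).path.lower().endswith(DESIRABLE_SUFFIXES)


def rate_components(link_pairs):
    """Rate each last path component of the linking (source) URLs.

    link_pairs: iterable of (source_url, target_url).
    Returns {component: (score, desirable_count, total_count)},
    ordered from highest to lowest score.
    """
    desirable = defaultdict(int)
    total = defaultdict(int)
    for source, target in link_pairs:
        comp = last_component(source)
        total[comp] += 1
        if is_desirable(target):
            desirable[comp] += 1
    ratings = {}
    for comp, n in total.items():
        d = desirable[comp]
        # Hypothetical score: smoothed fraction of desirable links,
        # weighted by the log of how many desirable links were seen.
        score = (d + 1) / (n + 2) * math.log1p(d)
        ratings[comp] = (score, d, n)
    return dict(sorted(ratings.items(), key=lambda kv: kv[1][0], reverse=True))


if __name__ == "__main__":
    pairs = [
        ("http://a.org/proj/jars/", "http://a.org/proj/jars/x.jar"),
        ("http://a.org/proj/jars/", "http://a.org/proj/jars/y.jar"),
        ("http://a.org/about.html", "http://a.org/index.html"),
    ]
    for comp, (score, d, n) in rate_components(pairs).items():
        print(f"{score:.4f}\t{comp}\t{d}\t{n}")
```

The smoothing keeps a component seen only once from outranking one that reliably leads to many archives; any monotone score that rewards both a high fraction and a high count would serve the same purpose.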
    

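I added these ratings to radar-web-search, so it can now speculatively search one extra ply (one more level of links), following the links whose last path component is rated as likely to lead to software. It has already helped find some new software.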
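Choosing which links to fetch one ply deeper might look like the Python sketch below. It assumes the ratings are available as a mapping from last path component to score; the threshold of 10.0, the cap of five links, and the name `links_to_follow` are hypothetical parameters, not values taken from radar-web-search.

```python
from urllib.parse import urlparse


def last_component(url):
    """Last directory or file name of the URL path, e.g. 'jars'."""
    return urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]


def links_to_follow(links, component_scores, threshold=10.0, max_links=5):
    """Pick which links on a page are worth fetching one ply deeper.

    component_scores maps a last path component to its rating.
    Links whose component is unrated or below threshold are skipped,
    and at most max_links of the best-rated links are returned.
    """
    scored = []
    for url in links:
        score = component_scores.get(last_component(url))
        if score is not None and score >= threshold:
            scored.append((score, url))
    scored.sort(reverse=True)  # highest score first
    return [url for _, url in scored[:max_links]]


if __name__ == "__main__":
    # Three scores copied from the table above; the links are made up.
    scores = {"jars": 17.0028327481393,
              "download.html": 14.3298883847943,
              "canaries13": 8.97441185481296}
    page_links = ["http://x.org/p/download.html", "http://x.org/p/jars/",
                  "http://x.org/p/canaries13", "http://x.org/p/about.html"]
    print(links_to_follow(page_links, scores))
```

Capping the number of followed links bounds the bandwidth cost per page, which is the reason for not simply following everything.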

Second, I added the ability to search within PDFs and other documents and extract the URLs they contain, so that research papers (which often link to systems, or at least name them) can be searched as well. This is currently a separate script that will eventually be integrated with RADAR:

/var/lib/myfrdcsa/codebases/internal/radar/scripts/get-software-by-searching-publications.pl
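The URL-extraction step could be sketched in Python as below; this is not the Perl script named above. It assumes the PDF is first converted to text with poppler's `pdftotext` (passing `-` as the output file writes to stdout), and the regular expression is a rough, hypothetical pattern that will miss URLs a PDF has hyphenated across line breaks.

```python
import re
import subprocess

# Rough URL pattern: stops at whitespace and common closing delimiters.
URL_RE = re.compile(r"""(?:https?|ftp)://[^\s<>"')\]]+""", re.IGNORECASE)

# Sentence punctuation that often trails a URL in running text.
TRAILING_PUNCT = ".,;:"


def pdf_to_text(path):
    """Convert a PDF to plain text with pdftotext ('-' means stdout)."""
    result = subprocess.run(["pdftotext", path, "-"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def extract_urls(text):
    """Return unique URLs in order of first appearance."""
    urls = []
    for match in URL_RE.finditer(text):
        url = match.group(0).rstrip(TRAILING_PUNCT)
        if url not in urls:
            urls.append(url)
    return urls


if __name__ == "__main__":
    sample = ("See http://nlp.cs.nyu.edu/info-extr/. The tool "
              "(http://example.org/tool.tar.gz) is described there.")
    print(extract_urls(sample))
```

The extracted URLs can then be fed to the same software-link check used for ordinary web result pages.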

Capabilities

  • For software-finder, figure out which links to follow on pages that do not themselves contain software links.
  • Add CLI options to software-finder, such as the number of results to list.
  • Use Google Sets to discover more capabilities like "question answering" and "fact extraction", then run them through software-finder.
    ("completed" "52289")


This page is part of the FWeb package.
Last updated Sat Oct 26 16:46:53 EDT 2019 .