Software Finder

Brief: Searches the web for software that performs a specific task

  • I added a lot of new features to RADAR today. First, radar-web-search has been extended. This program lets you search the net for a topic, say, "event extraction". It builds one large query that combines the topic with a long list of software-related keywords, sends that query to Yahoo, and then looks at each result page to see whether any software is linked:

    andrewdo@box:/var/lib/myfrdcsa/codebases/internal/event-system/IE$ radar-web-search "event extraction"

    QUERY: "event extraction" system OR java OR project OR library OR php OR web OR framework OR open OR manager OR linux OR engine OR net OR server OR management OR game OR tool OR tools OR client OR simple OR editor OR cms OR database OR file OR generator OR software OR network OR xml OR python OR based OR source OR plugin OR data OR amp OR language OR application OR control OR online OR toolkit OR interface OR 3d OR irc OR eclipse OR free OR api OR windows OR code OR os OR perl OR virtual OR development OR gui OR driver OR content OR module OR mail OR image OR suite OR player OR portal OR monitor OR platform OR simulator OR script OR object OR log OR media OR text OR easy OR browser OR search OR service OR viewer OR de OR chat OR remote OR parser OR mysql OR time OR bot OR mobile OR converter OR sql OR daemon OR tracker OR rpg OR programming OR test OR gnu OR environment OR class OR utility OR gnome OR compiler OR internet OR 0 OR user OR utilities OR html OR package OR desktop

          Result: #1
          Url:http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=15494078
          http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=15494078
          Summary: With the explosion of molecular data, tools developed by computer scientists are ... BIND-The Biomolecular Interaction Network Database. Nucleic Acids Research. ...
          Title: PASBio: predicate-argument structures for event extraction in molecular ...
          $VAR1 = [];
    
          Result: #2
          Url:http://nlp.cs.nyu.edu/info-extr/
          http://nlp.cs.nyu.edu/info-extr/
          Summary: This system combines a web crawler (which searches for reports of outbreaks on a ... engine, and a data base browser to examine the extracted events (Proteus Project ...
          Title: Proteus Project: Information Extraction
          $VAR1 = [];
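
    The QUERY line above has a simple shape: the quoted topic followed by keywords joined with OR. A minimal sketch of building such a string follows. It is an illustration only, not the actual radar-web-search code; the function name build_query and the shortened keyword list are assumptions made for the example.

```python
def build_query(topic, keywords):
    """Quote the topic and append the keywords joined by OR.

    Mirrors the shape of the QUERY line shown above; how RADAR really
    assembles its query is not documented here, so this is a guess.
    """
    phrase = '"{}"'.format(topic)
    if not keywords:
        return phrase
    return phrase + " " + " OR ".join(keywords)


# Hypothetical, shortened keyword list. The real one is far longer.
SOFTWARE_KEYWORDS = ["system", "java", "project", "library"]

if __name__ == "__main__":
    print(build_query("event extraction", SOFTWARE_KEYWORDS))
```

    The resulting string can then be handed to whatever search API is in use.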
        

    So, that's what it does. One problem was that it only looked one layer deep for tar.gz, zip, and similar files. I wanted it to look further, but following every link would have been expensive in both bandwidth and time. So I downloaded a dataset from:

    http://cybermetrics.wlv.ac.uk/database

    which contains a large set of web links. For each URL that linked to a set of files, I then rated its last directory or file name by how many "desirable" files were found there.

          i.e., for the URL above, the rated name would be "database"
    
          17.0028327481393        jars    40      9       211
          14.3298883847943        download.html   116     168     498
          12.9734278116825        edit.html       67      33      67
          10.8830909612802        patches 8       2       368
          10.4918735220202        Debug   38      16      42
          10.0194507146122        golem   29      6       30
          8.97441185481296        canaries13      20      0       20
        

    So I added that, and now it can speculatively search one extra ply, following only the links whose last path component rates well. It has already helped to find some new software.
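
    The rating idea can be sketched roughly as follows. This is not the formula behind the scores listed above, which is not given here. The score used below (fraction of desirable targets, weighted by the log of the link count), the archive suffix list, the function names, and the example.org URLs are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from urllib.parse import urlparse

# Assumed set of "desirable" file types; the real list is not documented.
ARCHIVE_SUFFIXES = (".tar.gz", ".tgz", ".tar.bz2", ".zip")


def last_segment(url):
    """Return the last directory or file name in the URL path."""
    path = urlparse(url).path.rstrip("/")
    return path.rsplit("/", 1)[-1]


def is_desirable(url):
    """True if the URL path ends in one of the assumed archive suffixes."""
    return urlparse(url).path.lower().endswith(ARCHIVE_SUFFIXES)


def rate_segments(link_pairs):
    """Rate each source page's last path segment.

    link_pairs is an iterable of (source_url, target_url) tuples.
    The score is a hypothetical stand-in for the real rating.
    """
    hits = defaultdict(int)
    total = defaultdict(int)
    for source, target in link_pairs:
        segment = last_segment(source)
        if not segment:
            continue
        total[segment] += 1
        if is_desirable(target):
            hits[segment] += 1
    return {
        seg: (hits[seg] / total[seg]) * math.log(1 + total[seg])
        for seg in total
    }


def links_worth_following(links, scores, min_score):
    """Keep links whose last segment rates at least min_score, best first."""
    def score(link):
        return scores.get(last_segment(link), 0.0)

    return sorted((l for l in links if score(l) >= min_score),
                  key=score, reverse=True)
```

    With ratings like these, the crawler only spends bandwidth on links such as "download.html" that have tended to lead to archives, instead of fetching every link on a page.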

    Secondly, I added the ability to search within PDFs and other documents and extract the URLs from them, so that research papers (which often link to systems, or at least name them) can be searched as well. This is a separate script that will eventually be integrated with RADAR:

    /var/lib/myfrdcsa/codebases/internal/radar/scripts/get-software-by-searching-publications.pl
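
    The URL-extraction step might look something like the sketch below. It assumes the document has already been converted to plain text by some external tool, and it is not taken from the script named above. The regular expression and the function name extract_urls are illustrative choices.

```python
import re

# Match http(s) URLs up to whitespace, brackets, or quotes.
URL_PATTERN = re.compile(r"""https?://[^\s<>()\[\]'"]+""")

# Punctuation that usually belongs to the sentence, not the URL.
TRAILING_PUNCTUATION = ".,;:!?"


def extract_urls(text):
    """Return the distinct URLs found in text, in order of first appearance."""
    seen = set()
    urls = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(TRAILING_PUNCTUATION)
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


if __name__ == "__main__":
    sample = ("The Proteus system (http://nlp.cs.nyu.edu/info-extr/) is "
              "described. See also http://cybermetrics.wlv.ac.uk/database.")
    for url in extract_urls(sample):
        print(url)
```

    Each extracted URL could then be checked for linked software in the same way as a search result page.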