This system trains SVMs on Debian package tag (debtags) data to generate a tagger for arbitrary software descriptions.
This is only a proof of concept; I would like to move to a more involved system.
Results with the SVM are qualitatively better than with Bayesian classification.
$VAR1 = {
  'Contents' => 'A Simple RTS is a project that was started as a \'resume demo\'. The source will be publicly available as well as a blog detailing the construction from start to finish of development.',
  'EstimatedCats' => [ 'role::program' ],
  'ActualCats' => [],
  'Name' => 'simplerts#436367'
};

$VAR1 = {
  'Contents' => 'Find out the number of audio tracks on a CDROM. numtracks is a simple command-line application that will print the number of tracks on the CD-ROM that is currently in the drive. This is useful for automated MP3-ripping of an entire album, via a simple Perl script, cdparanoia, and bladeenc/gogo. ',
  'EstimatedCats' => [ 'role::program', 'interface::commandline' ],
  'ActualCats' => [],
  'Name' => 'numtracks #471879'
};

$VAR1 = {
  'Contents' => 'The Spacecraft Simulation Framework is a collection of toolkits & libraries to assist in the modeling, testing & analysis of spacecraft.',
  'EstimatedCats' => [ 'scope::utility', 'interface::commandline' ],
  'ActualCats' => [],
  'Name' => 'spacecraft#439728'
};

$VAR1 = {
  'Contents' => 'UrlPlug provides link-browsing and editing for URL named resources in Eclipse. The UrlPlug view allows you to integrate filesystem and world-wide-web resources, and browse seamlessly between them. Useful for developers of web apps and content.',
  'EstimatedCats' => [ 'role::program', 'x11::application' ],
  'ActualCats' => [],
  'Name' => 'url-plug#453694'
};

$VAR1 = {
  'Contents' => 'A GNOME Status docklet for XMMS. The XMMS status plugin provides a monitor for the state of XMMS which docks into the GNOME/KDE panel. ',
  'EstimatedCats' => [ 'suite::gnome' ],
  'ActualCats' => [],
  'Name' => 'xmmsstatusplugin #474491'
};

$VAR1 = {
  'Contents' => 'Graphical interface for ModelTest and MrModelTest written in Python, for Windows, Linux and Mac OSX',
  'EstimatedCats' => [],
  'ActualCats' => [],
  'Name' => 'modelpie#394713'
};

$VAR1 = {
  'Contents' => 'A driver for DSL USB modems based on the Analog chipset Eagle 8051 (ADImodem). Eagle-usb is a Linux 2.4 driver for DSL USB modems based on the Analog chipset Eagle 8051 (ADImodem). The Sagem F@st 800, USRobotics Sureconnect 9000, Comtrend ct-350 / ct-361, Elcon 111U, and others are reported to work. For Linux 2.6 and later, please refer to the uEagle-ATM project. ',
  'EstimatedCats' => [ 'implemented-in::c', 'hardware::{modem,modem:dsl,usb}', 'hardware::storage', 'use::driver' ],
  'ActualCats' => [],
  'Name' => 'eagle-usb #499429'
};

$VAR1 = {
  'Contents' => 'ZBASS is a small, minimal music player that can play WAV, MP3 and OGG music files. It uses the Winamp Visualization plugins for Visualization. Features an Equalizer, Internet Radio, CD Player and Plugins (experimental for now.) Powered by Bass Sound Syst',
  'EstimatedCats' => [ 'works-with-format::oggvorbis', 'sound::player', 'use::playing', 'works-with::audio', 'works-with-format::mp3' ],
  'ActualCats' => [],
  'Name' => 'zbass#466740'
};

$VAR1 = {
  'Contents' => 'A least cost router for German modem users. optisurf is a smartsurfer or least cost router for modem users living in Germany. It is an extension to the kppp application from KDE. ',
  'EstimatedCats' => [ 'role::program', 'scope::application', 'scope::utility', 'network::client', 'interface::commandline' ],
  'ActualCats' => [],
  'Name' => 'optisurf #499402'
};

$VAR1 = {
  'Contents' => 'An application which helps you catalog your Classical CDs. ClassiCollect is an SQL schema and data entry frontend specifically designed for cataloging a Classical CD collection. The frontend is designed to require as little typing as possible on the user\'s part, with reference tables for quick selection of composer, title, label, etc. ',
  'EstimatedCats' => [ 'uitoolkit::gtk', 'x11::application', 'use::editing', 'role::plugin', 'interface::x11' ],
  'ActualCats' => [],
  'Name' => 'classicollect #491504'
};
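
As a rough illustration of the approach described at the top (one SVM per tag, trained on description text), here is a minimal Python sketch. It is not the code that produced the output above, which appears to come from Perl. The four training descriptions are invented, and the function names, the binary bag-of-words features, the Pegasos-style training loop and the missing bias term are all assumptions made for the example.

```python
import random
import re

# Invented toy training data: (description, tags). Real input would be
# Debian package descriptions with their tags; these four are made up.
TRAIN = [
    ("Command-line tool that rips audio tracks from a CD",
     {"interface::commandline", "works-with::audio"}),
    ("Graphical audio player for MP3 and Ogg files",
     {"interface::x11", "works-with::audio"}),
    ("Command-line utility that renames files in batch",
     {"interface::commandline"}),
    ("Graphical editor for vector drawings",
     {"interface::x11"}),
]


def features(text):
    """Binary bag-of-words: the set of lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def train_svm(examples, lam=0.01, epochs=200, seed=0):
    """Linear SVM (hinge loss, L2 penalty) fitted with Pegasos-style SGD.

    examples: list of (token_set, label) pairs with label +1 or -1.
    Returns a dict mapping token -> weight (no bias term, by assumption).
    """
    rng = random.Random(seed)
    w, t = {}, 0
    for _ in range(epochs):
        for x, y in rng.sample(examples, len(examples)):  # shuffled pass
            t += 1
            eta = 1.0 / (lam * t)                # decaying step size
            margin = y * sum(w.get(tok, 0.0) for tok in x)
            for tok in w:                        # shrink: L2 penalty step
                w[tok] *= 1.0 - eta * lam
            if margin < 1.0:                     # hinge loss is active
                for tok in x:
                    w[tok] = w.get(tok, 0.0) + eta * y
    return w


def train_tagger(train):
    """Train one binary SVM per tag (one-vs-rest)."""
    docs = [(features(text), tags) for text, tags in train]
    all_tags = sorted(set().union(*(tags for _, tags in docs)))
    return {
        tag: train_svm([(x, 1 if tag in tags else -1) for x, tags in docs])
        for tag in all_tags
    }


def predict_tags(models, text):
    """Return every tag whose SVM score is strictly positive."""
    x = features(text)
    return sorted(
        tag for tag, w in models.items()
        if sum(w.get(tok, 0.0) for tok in x) > 0.0
    )


models = train_tagger(TRAIN)
print(predict_tags(models, "numtracks prints the number of audio tracks on the CD-ROM"))
```

Each tag gets its own classifier, so a description can receive several tags or none. Because this sketch has no bias term, a description made only of unseen words scores zero everywhere and gets no tags at all.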