"Sometimes helping others is the surest way of helping yourself." - Snow Buddies

Its mission, plain and simple, is to maximize problem-solving capabilities; more precisely, to improve the human condition. Software can solve an incredible range of real-life problems, and with Free Software we can copy that software endlessly, free of charge, i.e. at zero marginal cost.

Theoretical Motivation

Algorithmic Information Theory warns: "Program size is a constraint on program capability." Longer programs are not necessarily better; merely tacking some characters onto the end of a program does not improve it. But size is ultimately necessary, because otherwise the required information cannot fit into the program. Since better systems must be larger, and since we cannot write everything ourselves, a good strategy is to conglomerate existing systems.

There are many practical benefits to packaging this software. Once a package has been made, that functionality is universally accessible, whereas previously it belonged only to those with the time and know-how to get it compiled and working. Even people who do know how to do this no longer have to reinvent the wheel: instead of NxM configuration attempts (N systems times M users), only N are needed.

There is a hidden benefit to packaging these systems as well: it becomes easier both to develop and to use systems that depend on them, that build on them. So it is software reuse in a larger sense.

I would like to start a group of people that generates at least one package every day (if possible). When the automated tools are finished, it should be possible to generate literally hundreds of "rough-quality" packages per day.

"FRDCSA" stands for "Formalized Research Database: Cluster, Study and Apply".

Cluster: The FRDCSA is a large knowledge base of software (codebases) that is actively maintained and expanded using a variety of methods (see RADAR Internal Codebase).
Study: Depending on licensing, packages or installers are semi-automatically created for each codebase (see Packager Internal Codebase).

Apply: Packaged systems are then used to solve existing problems (see Architect Internal Codebase).

Overview of the CSA Subsystem

The part of the FRDCSA that is responsible for collecting and packaging software is the CSA toolchain, which stands for Cluster, Study and Apply. The CSA toolchain consists primarily of the internal codebases CSO, RADAR, Packager and Architect, plus hundreds of dependencies, such as other FRDCSA systems, FLOSSMole, FOSSology, the Debian packaging tools, various Perl modules, and so on.

CSO stands for Comprehensive Software Ontology; it maintains an ontology of available software (currently about 150,000 - 300,000 systems). A shared priority queue should be developed, which roughly orders package creation. The software is then retrieved by the RADAR system and put into a local directory. The Packager system then generates a semi-automatic package of the system, and the packages are uploaded to an unofficial repository. Architect is responsible for analyzing system capabilities and matching them to project requirements.

So far, 376 packages have been made. We have not yet begun to automatically iterate over existing codebases, except in the case of CPAN, where Jos I. Bouman has generated hundreds of packages. We wish to do all of CPAN, but require more computers and some labor.

The Detailed Explanation

While much of the work has been completed, I propose to work with people on tools that:

Create a Comprehensive Software Ontology (CSO) of existing software, including license information, project capabilities, and all sorts of other information about the projects, and make tools that use this information. A large number of projects can be queued from the FLOSSMole data.
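The shared priority queue mentioned in the overview above could be sketched minimally as follows. This is an illustrative assumption, not the actual CSO schema: the fields "popularity" and "has_free_license" and the scoring rule are made up for the example.

```python
import heapq

# Hypothetical sketch: order package-creation candidates by a rough score.
# Field names ("popularity", "has_free_license") are illustrative assumptions,
# not the real CSO ontology fields.

def score(candidate):
    # Lower tuple sorts first: prefer freely licensed, then more popular systems.
    return (0 if candidate["has_free_license"] else 1,
            -candidate["popularity"])

class PackageQueue:
    def __init__(self):
        self._heap = []

    def push(self, candidate):
        # Include the name as a tie-breaker so dicts are never compared directly.
        heapq.heappush(self._heap, (score(candidate), candidate["name"], candidate))

    def pop(self):
        return heapq.heappop(self._heap)[2]

q = PackageQueue()
q.push({"name": "weka", "popularity": 900, "has_free_license": True})
q.push({"name": "mnop", "popularity": 50, "has_free_license": False})
q.push({"name": "quac", "popularity": 120, "has_free_license": True})

order = [q.pop()["name"] for _ in range(3)]
print(order)  # -> ['weka', 'quac', 'mnop']
```

RADAR and Packager would then consume candidates from the front of such a queue.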
The CSO system (http://frdcsa.onshore.net/frdcsa/internal/cso) currently has a MySQL database of all the software from Freshmeat, SourceForge, and several other major software indexes, drawing on data from the FLOSSMole project. (Linking redundant entries could be done using the (non-free) MNOP tool (http://www.autonlab.org/autonweb/10514), but its authors have not released their software.) The problem is similar to the problem of "Web People Search" (http://nlp.cs.swarthmore.edu/semeval/tasks/task13/summary.shtml).

Create a web spider that can automatically index metasites like these (http://ml-site.grantingersoll.com/index.php?title=Existing_Learning_Tools, http://mloss.org/software/), detecting duplicates using alias detection (the same technique used to differentiate people), and begin adding those repositories to the CSO. The process of finding and inserting these metasites is more complex. An information extraction technique called MDR (http://citeseer.ist.psu.edu/liu03mining.html) has been mostly implemented for extracting software from metasites, and automatic detection of metasites using a project called WebKB (http://www.cs.cmu.edu/~webkb/) has been experimented with. The most practical method is simply to do it semi-automatically using the RADAR system (http://frdcsa.onshore.net/frdcsa/internal/radar). But the best method would be for people to submit metasites to a repository, which would then be processed automatically or semi-automatically.

When software systems have been found and their ontological relations mapped out, it becomes possible to make deductions about whether their capabilities may be incorporated into our systems, i.e., we can determine to what extent it is possible to use a codebase, according to its licensing restrictions.
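The duplicate-linking step above could be caricatured as simple name canonicalization. This is only a sketch: real alias detection, as in the Web People Search task, would also compare descriptions, homepages, and authors, and the example entries are invented.

```python
import re
from collections import defaultdict

# Minimal sketch of linking redundant index entries by name normalization.
# The entries below are invented examples, not real CSO rows.

def canonical(name):
    # Lowercase, collapse punctuation to spaces, strip trailing version numbers.
    name = name.lower()
    name = re.sub(r"[^a-z0-9]+", " ", name).strip()
    name = re.sub(r"\b\d+(\s\d+)*$", "", name).strip()
    return name

entries = [
    ("freshmeat", "GNU-Chess"),
    ("sourceforge", "gnu chess 5.0"),
    ("freshmeat", "Weka"),
]

clusters = defaultdict(list)
for index, name in entries:
    clusters[canonical(name)].append((index, name))

# The two "gnu chess" entries collapse into one cluster.
print(dict(clusters))
```

Clusters with more than one member are candidate duplicates for human review.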
So, for instance, if the license permits, packages are created for various architectures using our Packager system, and these are uploaded to our separate use-at-your-own-risk archive of rough-quality packages for various systems (Linux (Debian, Gentoo, RHN, apt-rpm), BSD (FreeBSD), etc.). The alien tool is used to translate from Debian to other formats; we will probably extend alien to improve its accuracy and add formats, such as for emerge and ports. We can also begin manually tweaking these packages for inclusion in the main repositories of Debian, Gentoo, Red Hat, etc. We should improve Packager by enhancing the coverage of automated, scripted and data-driven methods of transforming upstream sources into source packages (a.k.a. packaging the software).

The CSO must manage software licensing information and determine which software is appropriately licensed for packaging into different archives: whether it is free, non-free, or simply a package that generates an installer. Fortunately, the FOSSology project has taken care of analyzing licenses. For software which is not licensed in compliance with the DFSG, we systematically petition the authors to rerelease the software under a compatible license.

Once the codebases are found online, their capabilities need to be indexed/formalized. That is where Architect comes in. Architect is the next logical step of the RADAR/Packager toolchain: RADAR is a tool for automatically finding software, Packager is a tool for automatically packaging software, and Architect is a tool for automatically applying software - that is, for planning how the functionality of a given piece of software could be automatically applied to a certain problem domain. For instance, it seems fairly evident that we should have Q/A technology working with man pages (like the UMich demonstration).
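The license-based archive decision described above might be sketched as a simple lookup. This is an assumption-laden toy: a real implementation would rely on FOSSology's license analysis, and the license lists here are simplified illustrations, not an authoritative DFSG determination.

```python
# Illustrative sketch of routing a codebase to an archive section by license.
# The license sets below are simplified assumptions for the example only.

DFSG_FREE = {"GPL-2.0", "GPL-3.0", "BSD-3-Clause", "MIT", "Artistic"}
REDISTRIBUTABLE_NONFREE = {"CC-BY-NC"}

def archive_section(license_id):
    if license_id in DFSG_FREE:
        return "main"       # full binary package in the free archive
    if license_id in REDISTRIBUTABLE_NONFREE:
        return "non-free"   # packageable, but outside the free archive
    return "installer"      # ship only a package that downloads/installs it

print(archive_section("MIT"))          # -> main
print(archive_section("CC-BY-NC"))     # -> non-free
print(archive_section("proprietary"))  # -> installer
```

The "installer" fallback mirrors the Debian convention of installer packages for software that cannot itself be redistributed.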
(A feat which has since been implemented with QUAC.) Architect would be charged with recording or discovering such an application and with semi-automatically applying it. So Architect is obviously the fulfillment of the initial charge to "Cluster (RADAR), Study (Packager) and Apply (Architect)".

How does Architect work? We are developing a tool called Formalize which is responsible for mapping such information to a formal representation; it can serve as a component of a Recognizing Textual Entailment system. The Architect system is responsible for knowing how to combine internal and external codebases to meet requirements / wishlist items. BOSS oversees aspects of project development, and Code-Monkey handles some aspects of programming. Requirements are managed by PSE (the Planning, Scheduling and Execution system) and, more recently, by FreeKBS, a nifty knowledge management system.

Whenever the user has a new idea for a capability, they bring up the UniLang client. UniLang is a multi-agent system with automatic message routing, auto-vivification of agents, logging, RPC, etc. Ideas are recorded into UniLang, whose motto is "no sooner said than done": rather than having to find the appropriate place to put a message, UniLang automatically routes it to the proper agent. These agents are useful because the systems can use each other's functionality. The entries recorded into UniLang are analyzed by several other systems, such as Architect, FWeb, and PSE. To mark a given capability completed, the user uses the FreeKBS system.

Packaging the Internal Codebases Themselves

Packaging the internal codebases themselves proved slightly more difficult than simply using the existing Packager system, so a new expert system was developed for this particular case, the capabilities of which will be migrated to Packager once they mature. This system is called Task1, indicating the overall importance of this task.
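UniLang's automatic routing and agent auto-vivification, described above, can be caricatured in a few lines. The keyword rules and the choice of a fallback agent are illustrative assumptions; the real UniLang protocol is not shown here.

```python
# Toy sketch of UniLang-style routing: keyword rules send each free-text
# entry to an agent, creating ("auto-vivifying") agents on first use.
# Keywords, agent names, and the fallback are assumptions for illustration.

ROUTES = [
    ("package", "Packager"),
    ("requirement", "PSE"),
    ("capability", "Architect"),
]

class Agent:
    def __init__(self, name):
        self.name = name
        self.inbox = []

    def receive(self, message):
        self.inbox.append(message)

class Router:
    def __init__(self):
        self.agents = {}

    def agent(self, name):
        # Auto-vivify: create the agent the first time it is addressed.
        if name not in self.agents:
            self.agents[name] = Agent(name)
        return self.agents[name]

    def route(self, message):
        for keyword, agent_name in ROUTES:
            if keyword in message.lower():
                self.agent(agent_name).receive(message)
                return agent_name
        # Unrecognized entries stay with a default agent for later triage.
        self.agent("UniLang").receive(message)
        return "UniLang"

r = Router()
print(r.route("Please package the new Perl module"))      # -> Packager
print(r.route("New capability idea: QA over man pages"))  # -> Architect
```

The user never chooses a destination; the router's rules decide, which is the sense of "no sooner said than done".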
(Demonstrate Task1.)

Redistributing the FRDCSA is a very large task, because its subsystems are all highly interconnected and have literally hundreds of dependencies; while I have software that automatically computes these, there is still a lot of work with which I would appreciate help.

However, with power comes responsibility, and I fear that these systems will be misused. So it is very important to work on software that actually supports the values you profess to maintain. For instance, the meal planner, which solves diet and nutrition problems, could be used by aggressive militaries to maintain their armies of oppression. While I want to use the GPL, I am tempted to use the Hacktivismo License, as it denies use to abusers of human rights, spy groups, etc. My indecisiveness here, in conjunction with the technical difficulty, is the primary reason that the system has never been fully released.

We will now play around with the system:
# Record a capability request.
# Package a Perl module.
# Package a system.

To be covered in the next lecture: other important FRDCSA systems, various hacks, the hacklab.

Any questions?