Well, my parameters are that I tend to like to promote and develop free software, that I am opposed to propaganda/persuasion/marketing/evangelism, and that I have my own agenda, which is the completion and promotion (already have to be paraconsistent, I guess) of my own project, the FRDCSA. So what I propose is that I simply help develop an open source tool for the automatic generation of argument maps, and that you use it within debategraph.org eventually, if it works well enough. The problem is twofold: (1) I don't really have experience with existing argument mapping systems, and (2) I am perpetually broke from always working on free software. I mention (2) because you say "ideal form." So if you can pay me, that's great. If not, that's also great, but the work must necessarily move more slowly as a result. I have written a paper about the technology here: http://www.frdcsa.org/index.php?limitstart=10 I am distrustful of these tools being used to gauge consensus, because that would just aid a propagandist in manipulating the arguments.

From a technical point of view, I am thinking that there are a number of technologies that would make this good. First of all, I think there needs to be a logical fallacy recognizer. Perhaps that could work by loading the arguments into a logical representation and seeing whether the conclusion is entailed; however, we would have to handle alethic modal logic in that case. What we need is a database of marked-up arguments from text. I should then be able to train a combination of various technologies, from topic mapping to Topic Detection and Tracking (TDT) (for making segmentation judgements, etc.), on such a dataset. Perhaps you already have such a dataset. In addition to these methods, I think the best way to proceed is by training classifiers over a logical representation of text, such as Logic Form or any NLU format; I have my own that I am working on. In just saying this, I realize that perhaps one way we can extract arguments for and against is to take the debate, extract concepts, and then generate statements of the form "I am in favor of $X" and "I am against $X", etc. Then you would use Recognizing Textual Entailment (RTE) systems to identify which texts imply which position. There are also, of course, Sentiment Analysis tools, which would probably do that more efficiently; for instance, there is an accessible system called OpinionFinder (from UPitt), and there are probably more such systems out there now. We could combine their results and use voting (I sketch this idea below).

In order to have all of this running correctly, you pretty much need my FRDCSA system, as it has wrappers for many or all of these systems. I could release them (they are GPLv3) individually, but it may just make more sense to ask The Perl Foundation to approve my grant to release everything on CPAN. Email them if desired. Now, I have been gathering a variety of systems for the Formalization component. As there are rarely any existing systems for exactly what I am trying to do, I usually just name things as they are; the formalization system is therefore called "Formalize". What Formalize must do is encode the various arguments into a logic. For argument mapping this need not be done with 100% precision, except for the parts that influence how the argument is mapped into the argument map. There is also Rhetorical Structure Theory (RST), which could be useful for annotating text, as there is RSTTool, which helps, and a modified version of it as well.
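To make the stance-extraction and voting idea above a bit more concrete, here is a rough sketch, in Python purely for illustration. The judge callables are placeholders for whatever real systems (OpinionFinder, an RTE engine, etc.) actually get wrapped; none of the names here refer to an actual API.

```python
# Rough sketch only: build "for"/"against" hypothesis sentences for each
# extracted concept, ask each judge (an RTE wrapper, a sentiment wrapper,
# etc.) which hypothesis the text supports, and take a simple majority vote.
# The judge callables are hypothetical stand-ins for real systems.

from collections import Counter
from typing import Callable, Dict, Iterable


def stance_hypotheses(concept: str) -> Dict[str, str]:
    """Hypothesis sentences of the form discussed above, for one concept."""
    return {
        "for": f"I am in favor of {concept}.",
        "against": f"I am against {concept}.",
    }


def classify_stance(text: str, concept: str,
                    judges: Iterable[Callable[[str, str], bool]]) -> str:
    """Return 'for', 'against', or 'unknown' by majority vote.

    Each judge takes (text, hypothesis) and returns True if it thinks the
    text entails (or expresses) the hypothesis.
    """
    votes = Counter()
    for judge in judges:
        for label, hypothesis in stance_hypotheses(concept).items():
            if judge(text, hypothesis):
                votes[label] += 1
    return votes.most_common(1)[0][0] if votes else "unknown"


# Toy usage with a trivial keyword "judge", just to show the plumbing:
naive = lambda text, hyp: ("favor" in hyp) == ("support" in text.lower())
print(classify_stance("I strongly support a carbon tax.", "a carbon tax", [naive]))
```

The point is only the plumbing: concepts come in, hypothesis statements go out to however many RTE/sentiment systems we have, and their votes are combined.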
In addition, we'll probably use CoNLL-style semantic role labellers like Shalmaneser. Again, I already have these tools. All of these will serve as input. In terms of formatting the text, we'll need some additional systems: mainly normalization (elimination of idiomatic expressions and paraphrasing), hopefully using the EAT thesaurus (and a tool from MIT, I think called AnalogySpace) for extracting connotation, which is important for capturing flavor. Paraphrasing can be done with DIRT, and entailment by using XWordNet and either C&C's Nutcracker or my own as-yet-unfinished Entailment Recognizer. There should also be a lexical entailment system; there was a class assignment to implement such a system, so one could be available. In addition to idiomatic normalization there should be refactoring of statements, and there are two wonderful, recently accessible systems for this. The first is KNext, which extracts implicit assumptions from text (this would be necessary, as otherwise people can load invalid assumptions into their speech; for instance, the question "When did you stop working on AI?" implicitly assumes you were previously working on AI). Additionally, for identifying independent clauses there is the FactualStatementExtractor from CMU. CoNLL also has a system for "hinge" recognition, which would be useful.

The end result of all these systems would be a logical representation; the representation could be over modal, higher-order, and/or first-order logic. At present I only have the Vampire FOL system working with the knowledge base. Once arguments have been mapped into their logic form, I think their extraction and representation become much easier. Extraction could be done by posing a set of questions along the debate map graph and using the entailment recognizer (sketched at the end of this message). The answers, as extracted, would then be put through a natural language generation system for proper formatting.

I neglected to mention several systems that are relevant to this whole process, but this pretty much outlines the architecture. In order to have this working effectively, it presumes the completion of a few different systems. Also, I have completely neglected to mention anything about Document Management, which is critical to this whole enterprise, other than TDT and topic classification; focused crawling could be used here to great success. Formalize also requires successful cross-document coreference resolution, and there are three systems for that (ARKref, Reconcile, and BART). Among the systems that need to be completed is the UniLang cross-computer messaging system. The long and short of it is that, although getting all of this to work correctly would be time-consuming and difficult, we could get the process started, especially in terms of collecting annotated datasets, and then focus on best-of-breed systems that handle increasingly difficult tasks; in other words, we could do it piecemeal. The other big area is how to introduce automated judgements into the user interface, so that users are aware of mistakes and can correct them, and corrections can be adjudicated by people and formally verified by the system.
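Here is the extraction sketch I promised, again in Python and again only illustrative. The entails callable stands in for whatever entailment/proving backend ends up being used (Nutcracker, my Entailment Recognizer, or Vampire over logic forms); the node identifiers and relation labels are made up for the example.

```python
# Rough sketch only: walk the nodes of a debate-map graph, pose each node's
# claim as a hypothesis, and ask an entailment recognizer whether the
# formalized text supports the claim or its negation. The `entails`
# callable is a hypothetical placeholder for the real backend.

from typing import Callable, Dict, List, Tuple


def map_text_to_debate_graph(
        formalized_text: str,
        node_claims: Dict[str, str],
        entails: Callable[[str, str], bool]) -> List[Tuple[str, str]]:
    """Return (node_id, relation) pairs, where relation is one of
    'supports', 'attacks', or 'neutral'."""
    relations = []
    for node_id, claim in node_claims.items():
        if entails(formalized_text, claim):
            relations.append((node_id, "supports"))
        elif entails(formalized_text, f"It is not the case that {claim}"):
            relations.append((node_id, "attacks"))
        else:
            relations.append((node_id, "neutral"))
    return relations
```

Each resulting supports/attacks judgement could then be handed to the natural language generation step for formatting, and surfaced in the user interface so that people can confirm or correct it, as described above.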