FRDCSA | minor codebases | Sayer

[Project image]

Architecture Diagram: GIF
Code: GitHub


Project Description

sayer gets its name because it builds a context by asserting interesting facts about arbitrary Perl data structures. sayer (along with thinker) is one of the most interesting projects of the FRDCSA. It indexes arbitrary Perl data structures and attempts to derive interesting information and conclusions about that data using machine learning. For instance, if your data structure consisted of a string, and that string contained a paragraph of text, sayer would apply a decision tree or similar set of tests to determine that it was indeed as we described. It represents this relation as a graph, with vertices as data points (i.e. the input and "true") and edges as function calls. All data, of course, is stored in a database. This graph data is then used as input to classifiers that attempt to distill summarily interesting information about said data. For instance, if it is a sentence, sayer may well wish to perform various NLP procedures, extracting things like named entities and recursively analyzing those within its attention span. It uses Perl as a knowledge representation interlingua. The architecture is expansive, complex, and beautiful, and integrates many other FRDCSA systems.
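As a rough illustration of the core loop described above - memoizing function calls over data points and recording each call as a graph edge persisted to a database - here is a minimal sketch. The real sayer is written in Perl; the `Sayer` class, its schema, and the `is_paragraph` test below are hypothetical stand-ins, not the actual implementation.

```python
import hashlib
import json
import sqlite3


class Sayer:
    """Minimal sketch: memoize function calls over data points and record
    each call as a graph edge (input -> output, labeled by the function
    name), persisted to a database."""

    def __init__(self, db_path=":memory:"):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS edges "
            "(input_id TEXT, function TEXT, output TEXT, "
            "PRIMARY KEY (input_id, function))")

    @staticmethod
    def data_id(datum):
        # Canonical id for an arbitrary (JSON-serializable) data structure.
        return hashlib.sha1(
            json.dumps(datum, sort_keys=True).encode()).hexdigest()

    def assertion(self, datum, fn):
        """Apply fn to datum, caching the result; repeated calls hit the DB."""
        key = self.data_id(datum)
        row = self.db.execute(
            "SELECT output FROM edges WHERE input_id=? AND function=?",
            (key, fn.__name__)).fetchone()
        if row:
            return json.loads(row[0])
        result = fn(datum)
        self.db.execute("INSERT INTO edges VALUES (?,?,?)",
                        (key, fn.__name__, json.dumps(result)))
        return result


# One test in a hypothetical decision tree: is this string a paragraph?
def is_paragraph(text):
    return isinstance(text, str) and text.count(" ") > 10 and "." in text
```

A call like `Sayer().assertion(some_string, is_paragraph)` records the edge (input, "is_paragraph", result) and answers from the cache on subsequent calls; the stored edges are the graph data the description says is fed to further classifiers.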


  • Does sayer correctly handle hashes, given that their ordering will change each time?
  • Resolve issue with KNext being slow, presumably due to a large sayer cache.
  • Resolve issue with sayer being slow, presumably due to a large sayer cache.
  • For FreeKBS2, we could have indexed versions of statements that refer to the new sayer cached versions of them.
  • Make sayer not dependent on a DB when the data is not big enough to warrant one; have the data stored on the file system or something similar.
  • Write a search engine for sayer.
  • nlu should use an algorithm which takes all the objects in sayer that are plain-text instances and matches them against strings. For instance, if the sayer data points were 'the' and 'there', and the text read "there's a lot of stuff", it would assert the matches for 'the' and 'there'. Obviously this needs to be constrained somehow, as the matches would often be individual common words, so there might be an interestingness or relevance constraint - or maybe some kind of procedural semantics.
  • Use KBFS, have it learn when to automatically attribute facts to files based on certain correlations. Also use sayer, sayer-Learner, and thinker to learn when these facts apply to the files. Then, for instance, do automatic classification of text files into subject headings. Ultimately organize all of the research papers and documents I have into a coherent, cohesive whole.
  • Make sayer queryable, especially through Cyc.
  • Get nlu's sayer working correctly; it seems to always have the same entry id.
  • Use the SubL from CycL function mechanism defined in the paper other-ways-to-extend-cyc.pdf in order to allow Cyc to access sayer information.
  • All of sayer should be searchable with an index.
  • Add crypto-signing to knowledge from sayer and nlu, etc.
  • See about integrating sayer into Cyc via the Semantic Knowledge Source Integration.
  • KMax can have the sayer id of a buffer.
  • Use deep learning combined with sayer.
  • Use KBFS to tag which files are known to be software archives, versus which are known to be data sets, versus which are not known to be either, versus which we know nothing about at all. This database could be bootstrapped by iterating over /var/lib/myfrdcsa/codebases/external versus /var/lib/myfrdcsa/codebases/datasets, and then tagging several of neither kind. Then extract features of those data points, using sayer, etc, to do machine learning to automatically classify items as being software collections or not.
  • Do taint analysis with sayer information.
  • Add privacy controls to sayer information.
  • Make it easy to analyze where data gets written by sayer et al., and to remove it - for instance, when private information is processed by it.
  • Build the sayer/nlu/KBFS system that asserts information about files and explores all the possible things to assert about them.
  • nlu/KBFS/sayer should practice as in the field of deduction.
  • nlu/KBFS/sayer should say: "Consider this file", and then begin making notes about it.
  • Add a mysql or sayer backend to PerlLib::Collection.
  • Have the option of passing context information about the item as derived from KBFS/sayer, such as, for instance, whether it is the top of the stack.
  • Have the option to query the commands that can be run on the given entries in the freekbs2-stack - for instance, whether they have been processed with nlu/sayer/KBFS.
  • Offer the ability to correct automatic annotations by nlu/sayer/KBFS.
  • Write a converter for converting existing formats like the storagefile for file based PerlLib::Collection to mysql or sayer based.
  • auto-packager should use data enrichment of the package orig.tar.gz, debian/*, and included patches via sayer/thinker, nlu, and kbfs as input features to various machine learning systems in order to determine how to automatically package something for Debian. Brilliant, though difficult.
  • Add profiling to sayer/Capability::TextAnalysis.
  • Add the ability for nlu to use sayer information in its output.
  • There was a paper on exploiting information using MDPs or something for attacking systems; I imagine that same technique could be used for the sayer/thinker/Learner/suppositional-decomposer systems in order to optimize the exploration of the "hypothesis space".
  • Create a PerlLib::Collection type that lives in sayer and one in KBS2.
  • Combine sayer with CSO, to have a system that asserts knowledge about software packages and archive files, etc. That's KBFS, so finish it.
  • Analyze the congruence between REDIS and sayer.
  • Reset all of the sayer data on nlu because it's all messed up; figure out where it's going wrong and right the wrong.
  • Add to ppi-convert-script-to-module the ability to recognize conflicts with included modules, like $self->sayer(sayer->new), and to get the inits correct.
  • Add a feature to sayer for execution time.
  • Set up a sayer amortizer, which amortizes calls to systems that have a huge overhead.
  • Use feature learners as part of sayer.
  • Add a capability to sayer to mark data that does not go through external services.
  • Write the sayer web interface.
  • Make sayer thread safe.
  • sayer can use microtheories.
  • Can use sayer to record what texts we have read.
  • Perhaps sayer can archive the full source of any function, to help determine determinism.
  • sayer should use a function version notation, so that when it is updated, results are recached - to prevent mistakes.
  • There are similarities between sayer and the nlp system.
  • Construct a feature learner to learn which features to use for sayer.
  • sayer: have it consider any general patterns that a function is being called with.
  • sayer: that's what Learner does - it memoizes function calls. That's what we need in order to train on the input information.
  • sayer: we can train a system based on existing programming systems.
  • sayer: the problem is similar to the unilang classification problem - of course I knew this. However, like the problem of multiple dispatch, it could be solved by training a learner; likewise the problem of longest-token matching for the Perl 6 parser. All related: the problem of figuring out what function to call based on the type of the inputs.
  • sayer: perhaps we should train a neural network to choose which functions to call in order to solve specific algorithmic problems.
  • sayer: The notion of what kind of features to extract is highly variable. For instance, if you were looking at integers, there are thousands and thousands of number-theoretic features that could be used. So the idea is, for the part of sayer that looks at features to decide what else to do, it should make use of a general-purpose feature extractor based on this concept that there are so many potential features. In fact, the space of features is roughly synonymous with that of the predicates.
  • sayer should model which data is compatible as input to which functions.
    ("comment" "51418" "Would be easier in some sense with a strongly typed language.")
  • Note the similarities between sayer and Make.
  • sayer should even be able to disambiguate cases like "Geeze, I thought..."
  • sayer is related to the unilang message classification.
  • As an example of sayer's capabilities, it should be able to guess at the meaning of license plates and business names, for instance "addadi"
  • sayer should, for example, have the capability to parse Perl code, kind of like PPI (it is a document parser), except that it handles multiple contexts. For instance, if you have a variable $Ingy, it would not only treat it as a variable but would also note the reference to Ingy.dot.net.
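The plain-text matching idea in the list above (assert which known sayer data points occur in a new string, subject to some interestingness constraint) could be sketched roughly as follows. This is illustrative Python, not the actual Perl implementation, and `min_len` is a crude stand-in for a real relevance constraint.

```python
def match_datapoints(text, datapoints, min_len=4):
    """Return (datapoint, offset) pairs for every occurrence of a known
    plain-text data point in `text`. `min_len` filters out short common
    words, standing in for the interestingness/relevance constraint the
    list item says is needed."""
    matches = []
    for dp in datapoints:
        if len(dp) < min_len:
            continue  # too short/common to be interesting
        start = text.find(dp)
        while start != -1:
            matches.append((dp, start))
            start = text.find(dp, start + 1)
    return matches
```

With data points 'the' and 'there' and the text "there's a lot of stuff", an unconstrained run (`min_len=1`) asserts both matches, exactly as in the list item's example, while the default constraint filters out 'the'.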

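The function-version notation mentioned in the list (recache results when a function is updated, to prevent stale mistakes) might look something like this hypothetical sketch: hash the function's compiled body and make the hash part of the cache key. Hashing bytecode plus constants is crude; a real system could hash the source text or use an explicit version string instead.

```python
import hashlib


def function_version(fn):
    """Derive a short version tag from a function's compiled body so a
    cache can detect when the function changed and recompute stale
    results. Bytecode plus constants is a rough fingerprint only."""
    code = fn.__code__
    payload = code.co_code + repr(code.co_consts).encode()
    return hashlib.sha1(payload).hexdigest()[:12]


def cache_key(datum_id, fn):
    # A cached result is reused only when both the function name and its
    # version tag match; editing the function invalidates old entries.
    return (datum_id, fn.__name__, function_version(fn))
```

Two functions with identical bodies share a version tag, while changing a constant or an operation changes the tag and thereby forces recaching.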
This page is part of the FWeb package.
Last updated Sat Oct 26 16:46:13 EDT 2019.