
Paperless Office

Architecture Diagram: GIF
Code: GitHub

Project Description

I have been writing a system that partially satisfies the notion of an "Open Source PaperPort Equivalent", but it does a lot of things that I don't think PaperPort does. For instance, it has automatic document classification, syncs with your filing cabinet, and extracts dates, filling a calendar with date mentions for easy checking of due dates. It has semantic web integration and can do a lot of sophisticated natural language processing, such as extracting to-do lists from documents, spam detection, and urgency classification, along with planning, scheduling and execution features. It also has workflow support: you can set due dates and document and task interdependencies (e.g. this document has to be sent to so-and-so and a reply received before we can fill out that document).
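The document and task interdependencies described above can be illustrated with a small sketch. This is not the system's actual code: the task names and the `execution_order` and `ready_tasks` helpers are hypothetical, and it assumes the interdependencies form a directed acyclic graph that Python's standard `graphlib` module (Python 3.9+) can sort.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks that must finish first.
# "Send this document, receive a reply, then fill out the next document."
dependencies = {
    "send_form_a": set(),
    "receive_reply": {"send_form_a"},
    "fill_out_form_b": {"receive_reply"},
}

def execution_order(deps):
    """Return the tasks in an order that respects every prerequisite."""
    return list(TopologicalSorter(deps).static_order())

def ready_tasks(deps, done):
    """Return the unfinished tasks whose prerequisites are all in the `done` set."""
    return sorted(
        task for task, prereqs in deps.items()
        if task not in done and prereqs <= done
    )

order = execution_order(dependencies)
next_up = ready_tasks(dependencies, {"send_form_a"})
```

Here `order` lists the three tasks from sending to filling out, and `next_up` contains only the reply step, since the final form is still blocked on it. A real scheduler would also need to weigh due dates, which this sketch leaves out.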

There are many more options and plans for this system than can easily be described at this moment. Eventually, much of the way it handles documents will be similar to KBFS. It can be used to maintain the reading lists for CLEAR and Study. It will integrate with SPSE2 and PICVis when they are complete enough to represent various domains, and it may even partially merge with them.


  • paperless-office export roundtrip to nlu-mf
  • Add something to paperless-office so that when it OCRs a document, it records that the document has been OCRed.
  • Figure out why paperless-office OCR is so crappy right now.
  • Create a mode for paperless-office that automatically detects if you scanned the same side twice by mistake.
  • Write something for paperless-office that fixes the problem it had with not scanning properly.
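One possible approach to the "same side scanned twice" item is to compare the OCR text of consecutive pages. The following is only a sketch under stated assumptions: the function names and the 0.9 threshold are hypothetical, OCR text for each page is assumed to be available as a string, and the comparison uses `difflib.SequenceMatcher` from the Python standard library rather than anything paperless-office currently does.

```python
from difflib import SequenceMatcher

def normalize(text):
    """Lowercase the text and collapse whitespace so OCR layout noise matters less."""
    return " ".join(text.lower().split())

def looks_like_same_side(text_a, text_b, threshold=0.9):
    """Guess whether two OCR results come from the same physical side.

    The threshold is an assumed value and would need tuning on real scans.
    Blank pages are never flagged, since two empty sides are not a mistake.
    """
    a, b = normalize(text_a), normalize(text_b)
    if not a or not b:
        return False
    return SequenceMatcher(None, a, b, autojunk=False).ratio() >= threshold

front = "Invoice 1234   Amount due: $50.00"
front_again = "invoice 1234 amount due: $50.00"
back = "Terms and conditions apply to all purchases."

duplicate = looks_like_same_side(front, front_again)
distinct = looks_like_same_side(front, back)
```

Text comparison only works when OCR output is reasonably reliable. If OCR quality is the problem, comparing the scanned images directly would be a more robust choice.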

This page is part of the FWeb package.
Last updated Sat Oct 26 16:43:32 EDT 2019.