Node:Top, Next:, Previous:(dir), Up:(dir)


Node:Introduction, Next:, Previous:Top, Up:Top

Introduction

LEARNER allows ordinary Web users ("teachers") to teach a computer things they know and allows anyone to ask it questions about what others have taught it.

The home page for the project is at http://www.media.mit.edu/~timc/learner . It contains a link to a running LEARNER that you can teach and ask questions.

We hope that by capturing things people know and capturing "how they conclude things from other things" we can pave the way to smarter, easier to use information repositories, computers, and other devices.

To put it another way, LEARNER is a system for capturing declarative knowledge and inference rules. A fundamental feature of the system is to "tease" the knowledge out of the user by posing plausible questions.

Importantly, the number and quality of follow-up questions LEARNER can ask goes up as the amount of knowledge in the system goes up. Our approach is to intelligently re-use the knowledge that users put in to get more good information.

However, the mechanisms of "intelligently re-using inputted knowledge" are necessarily diverse.1

From the outset, LEARNER has been designed to be able to capture the large diversity of reasoning methods. It is open source, has a general plug-in architecture for question-posing modules, and has an extensive, well-documented API.

Even though it does not get sensory information the way people do, it can accumulate and use a vast amount of knowledge and rules. With plenty of luck, LEARNER can become a collaborative creation of mankind that rivals any other artifact in its usefulness.

Motivation

Simply put, there currently are no good tools for dealing with assertion-level unstructured data.

This is a generally acknowledged problem for any knowledge-intensive business. For example, DARPA recognizes the importance of being able to gather and reason with heterogeneous, unstructured knowledge in its recent Rapid Knowledge Formation (RKF) initiative.

However, the enabling technologies such as:

have matured over the past 20 years to a point where robust, new applications can be built on top of them.

Furthermore, the project of collecting knowledge and even the project of developing the set of methods to process such knowledge cannot be effectively tackled by a small team. Thus, the approach almost inevitably must be "distributed" or, more specifically, "community-based". Such approaches have become feasible only recently thanks to the World Wide Web and the advent of the Open-Source movement.

It is the goal of the LEARNER project to break new ground in collecting large repositories of unstructured assertions and enable reasoning in them. The underlying technologies and approaches exist. What is needed now is an organized effort of individuals skilled in this field.

The LEARNER technology is still very young. However, there are many huge opportunities that it could, with maturity, address very effectively. Here are some examples of potentially very successful and lucrative applications:

• Self-service Help Desks
Responding to Natural Language queries with ever increasing precision to automate customer support and help-desk operations.
• Voice Command and Control
As an enabling technology for controlling computers and other devices with your voice. Continuous speech recognition has made great advances in recent years. The remaining gap is to figure out, given the recognized speech, what the user actually wanted to do, i.e. which commands should actually be executed.

Generally, when speaking to each other, people can communicate effectively because the listener is assumed to have a lot of commonsense knowledge and some reasoning ability. Computers and other devices can potentially join the class of intelligent listeners if they are equipped with a large common sense knowledge base.

• An Information Repository for "Assertion-Sized Knowledge"
Currently, databases address the issue of storing and manipulating structured knowledge. A lot of valuable knowledge is simply too heterogeneous to be stored that way. That is why the looser organizational approach of the World Wide Web has been so successful. LEARNER can capture roughly sentence-level information and be as useful on that level as the Web is on the document-level.

The rest of the manual describes how LEARNER works so you can use and extend the project.


Node:Installing Learner, Next:, Previous:Introduction, Up:Top

Installing Learner

Before installing any LEARNER software, you must download and install a large database containing the WordNet lexical database and a released part of the CYC ontology. This combined database is called BRICOLAGE or BRICO, and it is in the format used by the Framerd Scheme interpreter.

As of this writing, you can find this database at

<http://framerd.org/download.html>

by clicking on the .tar.gz link after the words "BRICO ontology".

Now unpack this .tar.gz file, preferably under a new /usr/local/share/brico/ directory. The rest of these installation notes assume that you used this specific directory, such that the file

/usr/local/share/brico/brico/brico.pool

exists.2

Once the BRICO database has been installed, you can install the LEARNER software. As of this writing, the LEARNER software is available for download at

<http://mit.edu/fredette/www/learner/>

To run the LEARNER, you need to download and install a total of four software packages from this site, in the following order:

  1. <link-4.1+learner0.1.tar.gz> is a modified version of the Link Grammar Parser, an English language parser developed at Carnegie Mellon University.

    Our modifications include some additional functionality, and the ability to be integrated with FramerD.

    Unpack this .tar.gz file and follow the instructions in the INSTALL file to compile and install this package.

  2. <framerd-2.2preA+learner0.1.tar.gz> is a modified version of a distributed object-oriented database FramerD, which includes an extended Scheme implementation and was developed at the Media Lab of the Massachusetts Institute of Technology.

    Our modifications include new functions to integrate the Link Grammar Parser.

    Unpack this .tar.gz file and follow the instructions in the INSTALL file to compile and install this package.

  3. <learnerdict-0.1.tar.gz> is an English dictionary used by the LEARNER software.

    Unpack this .tar.gz file under the exact same directory under which you unpacked the BRICO database.

  4. <learner-0.1.tar.gz> is the LEARNER software itself.

    Unpack this .tar.gz file and follow the instructions in the INSTALL file to build this package. The LEARNER does not need to be installed.

    Please note: If you did not install the BRICO database and English dictionary under the /usr/local/share/brico directory (for example, because you store them on a large second disk or on the network), you will need to tell LEARNER's configure script where to find the database. For example, if you installed BRICO under /usr/bigdisk/brico, you would do:

    ./configure --with-brico=/usr/bigdisk/brico
    


Node:Running Learner, Next:, Previous:Installing Learner, Up:Top

Running Learner on Your System

For these early releases of the LEARNER, you interact with it entirely at a FramerD command line. You can reach this command line by running the command fdscript at your shell prompt, or, if FramerD found an Emacs when it was installed, you can get a captive command line buffer in Emacs by doing M-x fdscript.

The first fdscript command you should run should change the working directory to where you built the LEARNER. If you unpacked and built the LEARNER under the directory /usr/home/test/learner, do:

(cd "/usr/home/test/learner")

There are a few parameters you can set to control how the LEARNER runs:

%keep-pools-and-indices Variable
This is set in learner-init-server.fdx. If it is #f, all the pools and indices LEARNER uses will be wiped out if they are present (and empty ones will be created in their place).

Otherwise, the LEARNER assumes indices and pools are in place and uses those.

%prefetch-brico Variable
When defined and not #f, will prefetch every topic mentioned in LEARNER's database from BRICO. This takes some time but speeds up future operation. It is convenient to turn this off if you are relaunching the system frequently.

If running in client-server configuration, see learner-init-server.fdx to turn prefetching off in launching the server.

%run-as-standalone Variable
When #f, loads the system as the client, assuming a server process has been started separately. The server process can be started with the command fdserver learner-server.fdz --local.

%use-generalization Variable
When not #f, enables another mechanism for generating questions. Roughly, given cats have tails and dogs have tails, uses BRICO's hierarchy to arrive at the hypothesis that (and ask whether) all pets have tails. This feature is still in tuning and should probably be left "off" for beginner users.

%user Variable
This is the default username to use. If not set, "unknown" will be used. When using with a web interface in a multi-user situation, the username should be provided in the 'user slot of some API calls. See The Learner Interface API Functions, for an explanation of which API calls need the 'user slot.

%verbose Variable
When not #f, causes various information to be printed. Furthermore, if (contains? 2 %verbose), more information (the level 2 printouts) will be output. The most output is obtained by setting %verbose as follows: (define %verbose (choice 1 2 3 4 5)).

For example, you may set the variables and load the system as follows:

(define %run-as-standalone #t)
(define %user "guest")
(define %prefetch-brico #t)
(define %verbose #t)
(define %allow-slotted-frame-questions #t)
(define %use-generalization #f)
(define launch-learner
  (lambda ()
    (if %run-as-standalone
        (load-library "learner-init-server.fdx")
        ;; run as client of client-server
        ;; NOTE: the server should be started before running the
        ;; following line.  Server can be started with, approx.:
        ;; 'fdserver learner-server.fdz --local'
        (load-library "learner-use-server.fdx"))
    (load-library "learner-init-client.fdx")))
(launch-learner)

If you would like to run it in the client-server configuration, you will need to start the server in a separate shell with

fdserver learner-server.fdz --local

(you may wish to set %prefetch-brico and %verbose in learner-init-server.fdx). When launching the client (inside Emacs if you wish) make sure to set:

(define %run-as-standalone #f)

You can have several client processes talking to the same server. All the database updates and retrievals are done by the server to avoid cache coherency problems.

The client exports the API functions, as listed in learner-client-exports.txt. Those are accessible by issuing dtcall commands.3

Next, you can make sure that things are functional by taking the following steps:

  1. load the assertion file learner-initial-assertions.fdx,
  2. load the Iframe definition file learner-initial-iframes.fdx, and
  3. evaluate the contents of learner-tests.fdx.

Or, you can go on to the 30-second tutorial (see Command Line Interface).

Note that as one of the first users of the LEARNER, your contribution makes a difference. The knowledge you contribute at this formative stage will shape the future direction of the development of the LEARNER.

We hope you enjoy experimenting with and extending the LEARNER. LEARNER can achieve its potential only with the contributions of many, so we strongly encourage you to contribute any significant changes back to the LEARNER community. See Wishlist, for possible directions.

Should you accumulate a significant knowledge base with your copy of the LEARNER, we strongly encourage you to share all or (if you must) part of it with the LEARNER community by contacting us.


Node:Web Interface, Next:, Previous:Running Learner, Up:Top

Using the Web Interface

Starting a Conversation

You can start by asserting something (in the "say something interesting" box) or by clicking on one of the "hot topics".

You can also get more information on any specific topic by clicking on it when it is a hyperlink, or by entering it in the "Summarize the topic blank" field.

Finally, you can make any existing topic the topic of conversation by entering it in the blank in the "Make blank the new topic" section.

Tips on using the Interface

Here are a few tips on how to (and how not to) enter knowledge so LEARNER can make the best use of it:

• Respond to the questions (guesses) that come up.
For the right guesses, change the radio button next to them to Yes, and for the wrong ones, change it to No. It's OK to leave some as Don't know.
• Do not worry about the trailing ?.
Trailing ? will be stripped off from your replies.
• Provide "because" reasons.
With either a Yes or a No reply, you can (and, when it makes sense, you are encouraged to) provide a reason in the "because" field.4

For example, when presented with the question a car has a tail?, you are encouraged to respond with:

a car has a tail? <No> because a car is not an animal <Yes>5

Please note: only enter "because" reasons that can stand on their own. Do not rely on the words it or they.

Wrong:

a car has a tail? <No> because it is not an animal <Yes>

• Correct Learner's guesses whenever possible.
Rather than just answer <Yes> or <No>, correct the system when possible. For example, if the system says:
a car uses electricity?

You are encouraged to edit the line to read:

a car uses gasoline
... and select the <Yes> radio button.
• Do Not Over-Capitalize
Do not capitalize the first word in a sentence unless you would capitalize that word in the middle of a sentence. For example:
Ford makes minivans
But there is no capitalization in:
sun rays are sometimes blocked by clouds

Do not forget to select the appropriate <Yes> or <No> radio button.

• Keep Adding Knowledge
The more the system knows, the better it can be at making analogies. To help make the guesses more on target, try to type in some new facts every two or three question-answering sessions.

To use the interface most effectively, it is best to have the guess and its "because" field fit on a single line. You may need to resize your browser window and/or your browser's font sizes to achieve this.

The interface also allows you to query what LEARNER knows in the input box labeled "try to answer a question". See Asking Learner Questions, for an explanation (with examples) of how to pose the questions.


Node:Command Line Interface, Next:, Previous:Web Interface, Up:Top

Using the Command Line Interface


Node:30-second tutorial, Next:, Previous:Command Line Interface, Up:Command Line Interface

The 30-second tutorial

To run the examples given here, start up FDscript, load the LEARNER code, and make sure the database contains only the assertions from learner-initial-assertions.fdx and the Iframes from learner-initial-iframes.fdx. You can do this by starting with an empty database and loading these two files.

These are the functions you can use to talk to the system from the FDscript command prompt or from Emacs if you are running FDscript within Emacs:

show-topics
show-topic-summary
say-topic
say
say-Iframe!
find-answers

See Learner Interface API, for definitions of these functions.

At this point, you know enough to start talking to the system directly.

As you say more things to the system, it uses them to make better guesses about what is and what is not true. So, you're making the system smarter with every little bit of knowledge you put in. Every little bit really helps - so you can make this effort a success while just having fun!


Node:adding Iframes, Next:, Previous:30-second tutorial, Up:Command Line Interface

Teaching Learner "Thinking Rules"

A more direct, powerful way to make the system smarter is to teach the system "Thinking Rules" in addition to just facts.

For example, in response to the system guessing that a car has a tail (or for any other reason), you may tell the system

(say-Iframe! '(("a car is not an animal")) '("a car has a tail" 0))

creating an Iframe with the 'what being "a ?car? is an animal <not> => a ?car? has a tail".


Node:querying, Previous:adding Iframes, Up:Command Line Interface

Asking Learner Questions

One way to ask the system about what it already knows is to use the find-answers command.

LEARNER can handle both simple interrogatives (e.g. What do cats eat?) and fill-in-the-blank questions.

Here are some examples:

(find-answers "What do cats eat?")
produces:
;; Finding answers for "cats eat ?X?"
;; Found an answer:
"cats eat mice" is true

(find-answers "?Xes? eat ?Y?")
produces:
;; Finding answers for: "?Xes? eat ?Y?"
;; Found these answers:
"dogs eat mice" is true
"cats eat mice" is true
"dogs eat meat" is true

(find-answers "turtles eat mice")
produces:
;; Finding answers for: "turtles eat mice"
;; Found an answer:
"turtles eat mice" is not true
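
The fill-in-the-blank form can be illustrated with a small Python sketch. It is not LEARNER's implementation: the real system parses sentences, handles interrogatives, and can infer answers, whereas this toy version only matches ?NAME? blanks against stored strings, one word per blank. The names pattern_to_regex, find_answers and SAMPLE_KB are hypothetical and exist only for this illustration.

```python
import re

# Hypothetical toy knowledge base: (assertion text, truth value) pairs.
SAMPLE_KB = [
    ("dogs eat mice", True),
    ("cats eat mice", True),
    ("dogs eat meat", True),
    ("turtles eat mice", False),
]

def pattern_to_regex(pattern):
    """Compile a pattern such as 'cats eat ?X?' into a regular expression.

    Assumption: every ?NAME? blank stands for exactly one word.
    """
    pieces = []
    for part in re.split(r"(\?\w+\?)", pattern):
        blank = re.fullmatch(r"\?(\w+)\?", part)
        if blank:
            # A blank becomes a named group that matches a single word.
            pieces.append("(?P<%s>\\w+)" % blank.group(1))
        else:
            # Literal text must match exactly.
            pieces.append(re.escape(part))
    return re.compile("".join(pieces))

def find_answers(pattern, kb):
    """Return every (text, truth) pair in kb whose text fits the pattern."""
    regex = pattern_to_regex(pattern)
    return [(text, truth) for text, truth in kb if regex.fullmatch(text)]

for text, truth in find_answers("cats eat ?X?", SAMPLE_KB):
    print('"%s" is %s' % (text, "true" if truth else "not true"))
```

A pattern without blanks, such as "turtles eat mice", simply looks up that exact statement and reports its stored truth value.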


Node:Learner Algorithms, Next:, Previous:Command Line Interface, Up:Top

How Does Learner Work?

The LEARNER has a set of features that allow it to ask relevant follow-up questions when obtaining information from the user. This set can be expanded as needed; the system itself ships with a core set of plug-ins. We demonstrate what these plug-ins allow LEARNER to do in a series of examples.


Node:similarity, Next:, Previous:Learner Algorithms, Up:Learner Algorithms

Making Similarity Judgments

If two topics are similar, similar statements will be true about them. For example, a "spoon" and a "fork" are similar, and it is true that "you can eat with a spoon" and "you can eat with a fork", and "forks are usually made of metal" and "spoons are usually made of metal".

Furthermore, similarity drops off gradually. For example, a "spoon" and a "shovel" are both inanimate objects that are tools. In some general sense, they are more similar than, say, a "spoon" and a "rabbit". On the level of assertions, we observe that you can say "you can eat with a spoon" and "you can dig with a shovel", but it is awkward to phrase something similar about a "rabbit".

Based on these two observations, it should be possible to go the other way. That is, similarity of topics can be derived from similarity of assertions about them!

That is what LEARNER does.

Similarity is (or should be) used throughout the system to drive creation of new hypotheses, estimate plausibility of an answer or a newly acquired fact, retrieve most relevant information, organize how retrieved information is presented, and so on.

In our experience, most of the things LEARNER needs to do can be reduced to some combination of similarity and inference computation. So, having a good similarity function is pretty important. similar-topics-hash is the function that implements similarity in the LEARNER.

Topics similar to a given topic (the source topic) are computed by taking the following steps:

  1. For the source topic, identify all the assertions about it (the source assertions).

    See scored-assertions-on-topic.

  2. For each source assertion, find all assertions that exceed a certain similarity threshold.

    See similar-Uframes-set.

  3. In each of the "similar assertions", find the topic that is in the same role as the source topic is in the corresponding source assertion.

    For example, for a source topic "dog", "dogs eat mice" may be an assertion about dogs (a source assertion). Then, "cats eat mice" would be a "similar assertion", and "cats" in it would be in the same role as "dogs" is in "dogs eat mice". That would add weight to the similarity of "cat" and "dog".

    See corresponding-item.

  4. All of the similarity scores are added up, arriving at a hashtable of topics, each similar to the source topic, and each associated with a similarity weight.
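
The four steps above can be sketched in Python. This is a simplified illustration, not the code behind similar-topics-hash: assertions are plain word tuples, a topic's "role" is assumed to be its position in the tuple (LEARNER uses parse roles), and the assertion score is a hypothetical count of shared words. All names below are made up for the example.

```python
from collections import defaultdict

# Hypothetical toy knowledge base; each assertion is a tuple of words.
ASSERTIONS = [
    ("dogs", "eat", "mice"),
    ("cats", "eat", "mice"),
    ("dogs", "eat", "meat"),
    ("cats", "have", "tails"),
    ("turtles", "have", "shells"),
]

def assertion_similarity(a, b, slot):
    """Hypothetical score: words shared outside the topic's slot."""
    if len(a) != len(b):
        return 0
    return sum(1 for i, (x, y) in enumerate(zip(a, b)) if i != slot and x == y)

def similar_topics(source, assertions, threshold=2):
    """Map each topic similar to `source` to an accumulated weight."""
    weights = defaultdict(int)
    for src in assertions:
        # Step 1: keep only the assertions about the source topic.
        if source not in src:
            continue
        slot = src.index(source)
        for cand in assertions:
            if cand is src:
                continue
            # Step 2: keep candidates at or above the similarity threshold.
            score = assertion_similarity(src, cand, slot)
            if score < threshold:
                continue
            # Step 3: take the topic that fills the same role (same slot).
            other = cand[slot]
            if other != source:
                # Step 4: add up the scores per similar topic.
                weights[other] += score
    return dict(weights)

print(similar_topics("dogs", ASSERTIONS))
```

With this toy data, "cats" is the only topic that shares a sufficiently similar assertion with "dogs".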

Step number 2 above was to find the most similar assertions given an assertion. That in itself is a multi-step process (that can be improved by a willing contributor!). Currently, similar assertions to a given source assertion (uframe) are computed by taking the following steps:

  1. Given the uframe, compute its significant keys (words) and their significant links (parse links).
  2. Retrieve all uframes that have at least one significant word in common with the source frame, making the candidate frames.
  3. Score each candidate frame comparing significant atoms in the source frame with the corresponding significant atoms in the candidate frame as follows:

Explicit statements about what is similar do not currently affect the internal similarity measure, but they could, so that saying "a spoon is similar to a fork" (or finding that out from BRICO or from mining the World Wide Web) would prompt exploration of how they are similar.

Relevant functions:

scored-assertions-on-topic topic Function
Given a topic, returns conses of assertions and strengths. The strength for an assertion is the strength the topic would have if we called assertion-topics on the assertion.

corresponding-item topic uframe similar-uframe Function
Given a topic, a uframe containing the topic, and a similar-uframe, returns the topic from similar-uframe that is in the same role, i.e. corresponds to topic in the uframe by its parsing role.

Does not base-formify the topic it returns.

similar-Uframes-set uframe threshold . exclude-topics Function
Given a uframe, a threshold, and an optional choice of exclude-topics, returns the choice of uframes at least threshold similar to uframe; excludes any frames that mention one or more of the exclude-topics.

similar-uframes-of->string uframe Function
Given a uframe, returns a string describing uframes at least similarity-threshold (hardcoded to a certain value) similar to it. Accounts for the probability classes of the frames. For example:
(lineout
  (similar-uframes-of->string (the 'uframe "dogs eat mice")))
;; Related to what you said:
  ("cats eat mice" . 26) ("cat <-> dog")
  ("dogs eat meat" . 13)

On each line, the number associated with the string is the similarity score, and the optional parenthetical argument lists which things in equivalent positions in the two assertions were found to be BRICO-similar.


Node:analogy, Next:, Previous:similarity, Up:Learner Algorithms

Making Analogies

The key task of the system is to pose plausible questions on the topic of conversation. How can we come up with these?

There are a few options. One is to re-use what other users have put in, verifying it verbatim. While perhaps needed, it does not lead to a very interesting system, as there is no expansion of the knowledge base.

Another is to seed LEARNER's questions with the statements that seem plausible based on, for example, mining the World Wide Web. That could be a very interesting direction, especially if what was extracted was sufficiently clean.

Yet another approach would be to rely on the LEARNER's inference mechanism to generate new statements. The problem here is that, in areas that are new to LEARNER, we often do not have enough inferences to make this the "weight bearing" mechanism.

Finally, there is a very good avenue, the one the system actually uses. This avenue is to generate new statements about a topic by analogy from known statements about similar topics.

So, fundamentally, the analogy mechanism resides on top of the similarity mechanism. Analogy is potentially a very powerful and sophisticated tool. Here, we describe its current implementation in the LEARNER.

The preceding section on Making Similarity Judgments (see Making Similarity Judgments) showed how to compute topics similar to a given topic. Let's call such topics friends of the source topic.

To analogize the statements from similar topics (friends) to the target topic, we basically take all the assertions that are true about the friends, change them to be about the target topic, and sum up assertions from all the friends, giving more weight to statements that came from better friends (more similar topics), and letting assertions that are true about some friends and not true about others partially cancel each other out.

More formally, we take the following steps:

  1. For each friend, retrieve all assertions that mention it, omitting the uncertain ones and the ones that already mention the target topic.7
  2. In each of the retrieved assertions, replace the friend with the target topic, conjugating nouns to be plural or singular as needed. This forms the analogized assertions.

    Each analogized assertion has a score associated with it: the "better the friend" (i.e. the more similar the friend is to the target topic), the more weight the assertions analogized from it get.

  3. Collapse the identical analogized assertions that were formed from different source topics, adding up their weights and accounting for probability-classes being the same or opposite.
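
As a rough illustration of these three steps, here is a Python sketch. It makes several simplifying assumptions that are not taken from the LEARNER source: assertions are word tuples, the probability class is reduced to +1 (true), -1 (not true) or 0 (uncertain), and no singular/plural conjugation is attempted. The function name analogize and the sample data are hypothetical.

```python
from collections import defaultdict

def analogize(target, friends, assertions):
    """Build weighted hypotheses about `target` from its friends.

    friends:    dict mapping each friend topic to its similarity weight.
    assertions: list of (words, truth) pairs, truth in {+1, -1, 0}.
    Returns a dict mapping each analogized assertion to a signed score.
    """
    scores = defaultdict(float)
    for friend, weight in friends.items():
        for words, truth in assertions:
            # Step 1: keep assertions that mention the friend, are not
            # uncertain, and do not already mention the target topic.
            if friend not in words or target in words or truth == 0:
                continue
            # Step 2: replace the friend with the target topic.
            analog = tuple(target if w == friend else w for w in words)
            # Step 3: collapse identical results; opposite probability
            # classes partially cancel because their signs differ.
            scores[analog] += weight * truth
    return dict(scores)

friends = {"dogs": 3.0, "fish": 1.0}
assertions = [
    (("dogs", "have", "fur"), 1),
    (("fish", "have", "fur"), -1),
    (("dogs", "have", "tails"), 1),
    (("dogs", "chase", "cats"), 1),   # already mentions the target
    (("fish", "like", "worms"), 0),   # uncertain, so it is skipped
]
print(analogize("cats", friends, assertions))
```

In this example the hypothesis "cats have fur" gets support from the close friend "dogs" and is only partly offset by the weaker friend "fish".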

Furthermore, it is an upcoming feature that assertions that are true about more things get more weight - so that the system progresses from the more general to the more specific questions in its learning about a topic.

More modules for generating plausible things to ask can, and will, with time, be added.

Once the assertions are computed, they get "cleaned up" to exclude asking what is already known, what is inferable from other things we are already asking, etc. See finalizing questions, for details.


Node:inference, Previous:analogy, Up:Learner Algorithms

Making Inferences

This capability is built on top of Iframes. See Iframes, for details on working with inferences.

Inference is used in several places.


Node:Learner Data Flow, Next:, Previous:Learner Algorithms, Up:Top

Learner Data Flow Architecture

LEARNER has a plug-in architecture for generating questions. That means that separate users can experiment with their own question-formulating modules and take advantage of the overall framework to organize and present questions for them.

Fundamentally, LEARNER's "read-eval-print loop" takes the following steps:

  1. accept the current input,
  2. identify the current set of "hot" topics being discussed,
  3. create questions for the current topics and assertions,
  4. finalize and present the questions to prompt the next input.
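
The loop above, together with the plug-in idea, can be sketched in Python as follows. This only illustrates the data flow: the generator signature, the naive topic spotting, and every name in the block are assumptions made for this example and do not reflect LEARNER's actual plug-in API.

```python
def hot_topics(text, known_topics):
    """Step 2: naive topic spotting by membership in a known set."""
    return [word for word in text.split() if word in known_topics]

def finalize(questions):
    """Step 4: merge duplicate questions and order them by score."""
    merged = {}
    for text, score in questions:
        merged[text] = merged.get(text, 0.0) + score
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))

def respond(text, known_topics, generators):
    """One pass of the loop for a single input (step 1 is `text`)."""
    topics = hot_topics(text, known_topics)
    questions = []
    # Step 3: every plug-in contributes (question, score) pairs.
    for generate in generators:
        questions.extend(generate(topics))
    return finalize(questions)

# Two hypothetical question-generating plug-ins.
def tail_generator(topics):
    return [("%s have tails?" % t, 2.0) for t in topics]

def diet_generator(topics):
    return ([("%s eat meat?" % t, 1.0) for t in topics]
            + [("%s have tails?" % t, 0.5) for t in topics])

for question, score in respond("cats chase mice", {"cats", "mice"},
                               [tail_generator, diet_generator]):
    print(score, question)
```

Because each plug-in is just a callable that returns scored questions, a new generator can be added to the list without touching the rest of the loop.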

We now look in more detail at the steps of creating the questions and finalizing them.


Node:creating questions, Next:, Previous:Learner Data Flow, Up:Learner Data Flow

Generating Questions

When you say things to the system, you are making it more knowledgeable. Through its built-in analogy and generalization mechanisms, it can also think of (hypothesize) new things. Fundamentally, these new hypotheses form the basis of the questions the system is asking.

Here, we describe the several specific mechanisms for creating new questions that ship with the LEARNER distribution.


Node:finalizing questions, Previous:creating questions, Up:Learner Data Flow

Pruning and Ordering the Questions

The LEARNER is set up to support multiple independent question generators. This is to allow independent groups to experiment with question-generating and to make it simple to increase LEARNER's question-asking prowess.

To make question-generation easy, a lot of the "clean-up" functionality is offloaded into a common finalizing stage. All question-generators feed their outputs into the finalizing stage and this stage generates and outputs the final set of questions.

Finalization consists of the following actions:

Combining

Several questions that are asking the same (or nearly the same) thing are combined into a single question. This is useful when separate mechanisms generate the same question. The scores of the questions are combined, paying attention to the probability classes of the statements (a positive and a negative conjecture cancel each other's scores out).

Pruning

A question is dropped altogether. There may be several reasons for this:


Ordering

Given a great many questions, in which order do we ask them? The question generators produce questions already paired with scores, but the other work done by the finalizer can further alter the scores.

The pruning of questions to "those which do not follow from answers to other questions" deserves further explanation.

For example, if LEARNER is considering asking about the truth of the following statements:

Question 1: "cats have paws"
Question 2: "cats have sharp claws on their paws"

and it already knows that "?snakes? have ?paws? <not> => ?snakes? have sharp claws on their ?paws? <not>" (i.e. if you do not have X, you cannot have sharp claws on X), then it will not ask Question 2 together with Question 1.

This is an important feature. If the interface makes it easy to enter "because" reasons (Iframes), then filtering out of the dependent questions will lead to spontaneous structuring of the dialogue. That is, the system will evolve from a mass of questions towards a decision tree type dialogue. The initial questions will be the more general ones, and depending on their answers, the more specific ones will become relevant.

This mechanism helps make LEARNER a powerful knowledge acquisition tool that improves as it accumulates more knowledge.
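
The filtering of dependent questions can be sketched in Python. The sketch assumes the rules have already been instantiated for the topic at hand, as (premise, conclusion) pairs meaning "if the premise is not true, the conclusion is not true"; LEARNER itself stores such rules as Iframes with variables. The function name prune_dependent is hypothetical.

```python
def prune_dependent(questions, rules):
    """Drop questions whose answer may follow from another question
    that is being asked in the same batch.

    questions: list of question strings.
    rules:     set of (premise, conclusion) pairs, already instantiated.
    """
    asked = set(questions)
    kept = []
    for question in questions:
        depends_on_other = any(
            conclusion == question and premise in asked and premise != question
            for premise, conclusion in rules
        )
        if not depends_on_other:
            kept.append(question)
    return kept

rules = {("cats have paws", "cats have sharp claws on their paws")}
batch = [
    "cats have paws",
    "cats have sharp claws on their paws",
    "cats eat mice",
]
print(prune_dependent(batch, rules))
```

Once "cats have paws" has been answered and is no longer part of the batch, the question about claws is no longer filtered out and can be asked in a later round.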


Node:Learner Interface API, Next:, Previous:Learner Data Flow, Up:Top

Learner Interface API

The Learner Interface API covers the functions you should use to interact with the LEARNER, both from a command line and from a front end.

The subset that should be used from the command line was described in the tutorial on using the command line interface (see The 30-second tutorial).

The front-end API functions are a subset of the interface API. The functions exported to front-ends are as follows:

functions for adding to the system:
add-to-kb-start
add-to-kb-with-string-output
add-to-kb-end-with-string-output
say-iframe!-with-string-output
say-topic-with-string-output

functions for browsing:
find-answers-with-string-output
similar-uframes-of->string
show-topics-with-string-output
show-topic-summary-with-string-output

These functions allow you to build front ends for the LEARNER without the need to understand its architecture or algorithms.

The file learner-client-exports.txt dictates which functions are exported.


Node:terms, Next:, Previous:Learner Interface API, Up:Learner Interface API

The Syntax of a Term

In describing the API functions, we introduce the notion of a term. A term is the input you can give to the system to describe a frame - either to retrieve a frame that already exists or a frame you would like to create. The function that parses these terms is term->protoframe; it is defined below. Interface functions rely on it to parse the terms they receive as arguments.

A term can be one of the following:

term->protoframe term . other-slots-values Function
Given a user-inputtable description of a frame term, creates and returns a protoframe described by term and including other-slots-values.

This function is verbose, i.e. it will output lines describing the problem if the inputted expression cannot be parsed.


Node:interface API functions, Previous:terms, Up:Learner Interface API

The Learner Interface API Functions

This section defines all the commands that constitute the interface API of the LEARNER.

add-to-kb . term Function
Given a term, this function adds it to the knowledge base, producing lineouts if there are any problems (e.g. can't parse, the system already believes the opposite, etc.)

Registers the assertion in this user's history.

If term has a 'user slot, makes that user the 'source of the frame that is added.

See also add-to-kb-with-string-output.

Please note: When specifying slotted frames, if the atoms mentioned in the term do not exist, neither the atom nor the frame gets created. This is because parsing provides a part of speech and some other useful information, so we prefer for all atoms to be created as a result of parsing and then used in slotted frames as needed.

add-to-kb-with-string-output . term Function
A version of add-to-kb that returns its output as a string rather than printing it out.

Examples:

(add-to-kb-with-string-output "a cat has a tail" 'user "test")
(add-to-kb-with-string-output "cat" "have" "tail" 0.9 'user "test")

add-to-kb-start . slots-values Function
This function should be called before adding zero or more assertions to the system. See also add-to-kb-end-with-string-output. The set of assertions enclosed between these two commands is treated as one input session to which LEARNER responds.

slots-values should contain the 'user slot and value when running under multiple users.

add-to-kb-end . slots-values Function
This function should be called after all new things the user has said in one submission have been added to the knowledge base. This function produces the output to present as the reaction to what the user has said.

slots-values should contain the 'user slot and value when running under multiple users.

add-to-kb-end-with-string-output . slots-values Function
A version of add-to-kb-end that returns its output as a string rather than printing it out.

find-answers . term Function
Given a term, interprets it as a question to the system and tries to find or infer knowledge that would constitute an answer to this question.

If term has a 'user slot and a frame is created to be answered later, makes that user the 'source of the frame.

See The Syntax of a Term, for an explanation of how to specify terms.

For example, if you ask (find-answers "?Xes? have tails") or (find-answers "a ?X? has a tail"), with just the initial database loaded, you will see output similar to:

;; Finding answers for "?Xes? have tails"
;; Found these answers:
a dog has a tail
a cat has a tail
;; Nothing (void) was returned

find-answers-with-string-output Function
A version of find-answers that returns its output as a string rather than printing it out.

say . term Function
This is a wrapper for asserting a single assertion and getting a reply from the system. Given a term, adds it to the KB, registers it in the history, and poses the relevant questions.

See also:

add-to-kb-start
add-to-kb-with-string-output
add-to-kb-end-with-string-output

See Iframe Functions, for the definition of say-iframe! and say-iframe!-with-string-output.

say-topic arg Function
This is the main way a user can set the current topic of conversation. Given an arg (an atom frame or a string) this sets the current topic to be the base-form of arg.

say-topic-with-string-output Function
A version of say-topic that returns its output as a string rather than printing it out.

show-topic-summary string . [slots] Function
Shows the summary of the topic indicated by the string and the optional slots.

show-topic-summary-with-string-output topic-str Function
A version of show-topic-summary that returns its output as a string rather than printing it out.

show-topics min max Function
Given a min and a max, this top-level command outputs topics about which at least min and at most max uframes are known. If max is -1, no upper limit is used.
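
The min/max convention can be illustrated with a small sketch. It is written in Python rather than FDscript so that it runs on its own; the function name topics_in_range and the dictionary of per-topic uframe counts are assumptions made for illustration and are not part of the LEARNER API.

```python
def topics_in_range(uframe_counts, min_count, max_count):
    """Return topics with at least min_count and at most max_count uframes.

    A max_count of -1 means "no upper limit", mirroring show-topics.
    uframe_counts is a hypothetical mapping from topic to number of uframes.
    """
    return sorted(
        topic
        for topic, count in uframe_counts.items()
        if count >= min_count and (max_count == -1 or count <= max_count)
    )


# Hypothetical counts, for illustration only.
counts = {"cat": 12, "dog": 7, "mouse": 2}
bounded = topics_in_range(counts, 5, 10)
unbounded = topics_in_range(counts, 5, -1)
```

With these made-up counts, bounded holds only "dog", while unbounded holds both "cat" and "dog".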

show-topics-with-string-output Function
A version of show-topics that returns its output as a string rather than printing it out.

similar-uframes-of->string uframe Function
Given a uframe, returns a string describing the uframes whose similarity to it is at least similarity-threshold (a hardcoded value).

Accounts for probability classes of the frames.


Node:FramerD and Link Grammar Parser, Next:, Previous:Learner Interface API, Up:Top

FramerD and Link Grammar Parser

LEARNER depends on two major pieces of software: FramerD is a Scheme interpreter married to a flexible database implementation, and the Link Grammar Parser is an English parser from CMU.


Node:FramerD, Next:, Previous:FramerD and Link Grammar Parser, Up:FramerD and Link Grammar Parser

FramerD

FramerD is a distributed object-oriented database used by LEARNER. FramerD is available under the LGPL and includes persistent storage and indexing facilities that can scale to very large database sizes, as well as FDscript, a language that is a superset of Scheme.

LEARNER is written in FDscript.

FramerD introduces the concepts of frames, slots, and slotmaps which we use in describing how LEARNER works.

FramerD also comes with a version of the WordNet lexical database and a released part of the CYC ontology combined and converted into the FramerD format (the database is called BRICOLAGE or BRICO). For now, LEARNER uses the WordNet component only.

FramerD also has many attractive features:

To do the more advanced things with the LEARNER, you will need to understand FDscript.

FramerD documentation, covering the database implementation and the FDscript language, was available at the time of writing at <http://framerd.org>.


Node:Link Grammar Parser, Previous:FramerD, Up:FramerD and Link Grammar Parser

Link Grammar Parser

The Link Grammar Parser is a constraint-based English-language parser that tries to assign a consistent set of linkages between all words in a sentence.

The Link Grammar Parser is an impressive system in its own right. The parser is written in C and source code is freely available for non-commercial purposes.

Complete distribution and documentation of the link grammar parser was available at the time of writing at http://www.link.cs.cmu.edu/link.

Here is an example of how the parser would parse the sentence "cats eat mice":

   +-Sp-+--Op-+
   |    |     |
cats.n eat mice.n

The above parsing contains the following information about the word "cats":

LEARNER currently only accepts sentences that can be parsed completely. When multiple parsings can be found by the parser, the LEARNER uses the first one.

This can lead to some unexpected results. For example, in the sentence "cats have sharp claws", "sharp" gets parsed as a noun in the first returned parsing. This does not, however, cause any known difficulties with the operation of the LEARNER.

According to the authors of the Link Grammar Parser, a "statistical version of the parser" is under development and may become available sometime in the future to address this.


Node:Frames, Next:, Previous:FramerD and Link Grammar Parser, Up:Top

How Frames Work

All frames in LEARNER, be they for representing atoms, assertions, or parser links, have some fundamental mechanisms (such as inheritance) and policies (such as rules on mutating frames) that apply to them. We start by overviewing these mechanisms and policies and then go on to describe how more specific types of frames work.

Frames in LEARNER have an inheritance mechanism. Namely, the ifget command works the same way FDscript's built-in fget command does, except it will recursively follow a frame's 'inherits-from slot until it fails or gets to a frame that has a value for that slot.
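
The lookup rule just described can be sketched as follows. This is a Python model, not LEARNER code: frames are represented as plain dictionaries, the slot name "inherits-from" is taken from the text, and the cycle guard is an added assumption.

```python
def ifget(frame, slot):
    """Return the value of slot, following 'inherits-from' links as needed.

    Returns None when no frame in the chain has a value (the lookup fails).
    """
    seen = set()
    while frame is not None and id(frame) not in seen:
        seen.add(id(frame))  # guard against accidental inheritance cycles
        if slot in frame:
            return frame[slot]
        frame = frame.get("inherits-from")
    return None


# A stored frame and a protoframe that inherits from it (illustrative data).
stored = {"what": "cats eat mice", "probability": 1.0}
proto = {"inherits-from": stored, "source": "test"}
```

Here ifget(proto, "what") finds the value on the stored frame, while ifget(proto, "source") is answered by the protoframe itself.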

It is a policy that frames which are OIDs are not to be mutated. Mutating them would lead to difficulties with the need to update their indexing and with knowing what the original author asserted. Rather, protoframes that inherit from OIDs are created. Mutation of protoframes is permitted where appropriate, as in committing scheduled changes (see Variables). To effect this policy, use the LEARNER function fset-safe!.

Relevant functions: ifget


Node:atoms and topics, Next:, Previous:Frames, Up:Frames

Representing Atoms and Topics

LEARNER's knowledge consists of assertions, but to work with these effectively, we organize assertions around topics. Currently, only nouns in their base form can be topics, but gerunds ("skating" in "some people like skating") and noun phrases ("beach chair") can also be added.

Interactions with LEARNER revolve around topics and similarity is also measured between a pair of topics (the similarity of sentences helps compute similarity of topics).

Assertions are said to have the main topic (sometimes not present) and, more generally, topics.

Related functions:

assertion-main-topic assertion Function
Given an assertion, returns its main topic, if any.

assertion-topics assertion Function
Given an assertion, returns a choice of conses. The first element of each cons is a topic, and the second is a score of how much the assertion is about that topic.

topic-total-mentions Function
Given a topic, returns how many assertions are indexed by it.

topic-frequency topic Function
Given a topic, returns how many certain-yes? assertions mention it.

topic-absolute-frequency-weight topic Function
Given a topic, returns its weight (decreases with frequency, i.e. is less for common words).


Node:assertion-level frames, Previous:atoms and topics, Up:Frames

Representing English and Internal Knowledge

The preceding chapter explained how frames are used to represent individual atoms and "topics". This chapter explains how frames are used to represent compound structures to hold assertions.

There are two types of frames for holding assertion-level information. One is a Uframe (for "Utterance frame"), used to hold information in a natural language, and the other is a slotted frame, to hold information for internal processing.

In this section, we explain the common features of all frames representing assertion-level information. More details about the specifics of each type are available in the sections that follow.


Node:Uframes, Next:, Previous:assertion-level frames, Up:assertion-level frames

Representing English - Uframes

A Uframe (for an "Utterance frame") is a structure for holding utterances in English (or, potentially, any natural language).

Uframes are created from the output of the link grammar parser and roughly mirror it, although they have additional slots.

A Uframe contains the following slots:

'parsed
'simplified
'utterance-type
'significant-keys-links-hash

For efficient processing, we envision that Uframes will be recognized into internal data structures - slotted frames.


Node:slotted frames, Next:, Previous:Uframes, Up:assertion-level frames

Representing Derived Knowledge - Slotted Frames

An SRVframe is an example of a slotted frame. SRV stands for Subject-Relation-Value. Accordingly, an SRVframe has three slots: the 'subject, 'relation, and 'value.

See Variables, for a description of frames with and without substitutions.


Node:assertion functions, Previous:slotted frames, Up:assertion-level frames

Functions Related to Assertions

add-to-kb, add-to-kb-with-string-output, and say are used to add assertions.

similarity: similar-uframes


Node:Variables, Next:, Previous:Frames, Up:Top

Variables

Frames may be assertions if they have no substitutable atoms, or templates if they do. When a frame has a substitutable atom (a variable), the atom is shown surrounded with question marks. Here is a quick example of making the atom "cat" variable:

(let ((frame1 (frame-atom-is-substitution
                 (the 'uframe "cats eat mice")
                 (a "cat"))))
   (frame-finalize! frame1)
   frame1)

The above example returns a protoframe with the 'what being "?cats? eat mice".
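
To make the notation concrete, here is a small Python sketch of how a 'what string with question marks could be produced. It is only a model: the token list pairing each surface word with its base form, and the names render_what and is_template, are assumptions for illustration and do not come from the LEARNER sources.

```python
def render_what(tokens, variables):
    """Render a frame's text, surrounding variable atoms with question marks.

    tokens is a list of (surface_word, base_form) pairs; variables is the
    set of base forms that are substitutable.
    """
    return " ".join(
        "?" + surface + "?" if base in variables else surface
        for surface, base in tokens
    )


def is_template(tokens, variables):
    """A frame is a template if at least one of its atoms is a variable."""
    return any(base in variables for _, base in tokens)


tokens = [("cats", "cat"), ("eat", "eat"), ("mice", "mouse")]
template_text = render_what(tokens, {"cat"})
assertion_text = render_what(tokens, set())
```

In this model, marking the atom "cat" as a variable affects the surface word "cats", which is why the text reads "?cats? eat mice".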

Whether a frame is an assertion also determines which index it is indexed in (see Indexing).

Note that for efficiency reasons, LEARNER has a system of scheduling updates to a frame, keeping a log of changes to be made. The log may contain directives such as:

You need to commit a log before examining the slots that the scheduled changes affect; frame-update! and frame-finalize! do that. The ! functions (such as frame-variables! and ifget!) are the same as their non-! counterparts, but they also update the frames they operate on. Generally, updating a frame that has previously been updated is a low-cost operation that does not mutate the frame.
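
The idea of scheduling changes and committing them later can be sketched in Python as below. This is a simplified model under stated assumptions: the class ScheduledFrame, its method names, and the (slot, value) form of a log entry are all hypothetical, and LEARNER's actual log directives may look quite different.

```python
class ScheduledFrame:
    """A frame whose changes are logged first and applied on commit."""

    def __init__(self, slots):
        self.slots = dict(slots)
        self.log = []  # pending (slot, value) changes, oldest first

    def schedule(self, slot, value):
        """Record a change without applying it yet."""
        self.log.append((slot, value))

    def update(self):
        """Commit all pending changes; return how many were applied."""
        applied = len(self.log)
        for slot, value in self.log:
            self.slots[slot] = value
        self.log = []
        return applied

    def get(self, slot):
        """Read a slot without committing (may not reflect pending changes)."""
        return self.slots.get(slot)

    def get_updated(self, slot):
        """Commit first, then read; models the '!' variants."""
        self.update()
        return self.slots.get(slot)


frame = ScheduledFrame({"what": "cats eat mice"})
frame.schedule("what", "?cats? eat mice")
stale = frame.get("what")
fresh = frame.get_updated("what")
second_update = frame.update()
```

The second call to update finds an empty log and applies nothing, which mirrors the remark that updating an already updated frame is cheap.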


Node:substitution functions, Previous:Variables, Up:Variables

Substitution Functions

variable? frame Function
Returns #t for a template and #f for an assertion.

frame-atom-is-substitution frame atom Function
Given a frame and an atom, this returns a proto-frame where atom is now a variable (i.e. candidate for substitution).

frame-atoms-are-substitutions frame atoms Function
Given a frame and a list of atoms, this returns a proto-frame where atoms are now variables (i.e. candidates for substitution).

frame-atom-is-not-substitution frame atom Function
Given a frame and an atom, this returns a proto-frame where atom is no longer a candidate for substitution.

frame-atoms-are-not-substitutions frame atoms Function
Given a frame and a list of atoms, this returns a proto-frame where atoms are no longer candidates for substitution.

frame-substitute-atom frame atom-from args Function
Given a frame, an atom-from, and the substitution args (which in the simplest case is just atom-to), returns a proto-frame where that substitution has been scheduled.

frame-substitute-atoms frame atom-map Function
Given a frame and an atom-map, returns a proto-frame where these substitutions have been scheduled.


Node:Indexing, Next:, Previous:Variables, Up:Top

Indexing

The LEARNER currently uses six index files:

assertions.index
Indexes the assertions (Uframes and slotted frames) that LEARNER knows about. This includes uncertain assertions (those with 'probability close to .5).
inferences-lhs.index
Indexes Iframes by their left-hand sides.
inferences-rhs.index
Indexes Iframes by their right-hand sides.
memoization.index
Holds results of computation that never change but that may take a while to compute.
templates.index
Indexes "templates" - Uframes and slotted frames that have variables in them.
types.index
Indexes all frames by their 'type and 'what.

Uframes are indexed by the pairs (Wframe . #t) and by the (Wframe . link-with-direction). For example, the Uframe (the 'uframe "cats eat mice") will be indexed by the following pairs (see footnote 9):

(@3af3492b/57802a"WFRAME: eat" . #t)
(@3af3492b/57802a"WFRAME: eat" . S-)
(@3af3492b/57802a"WFRAME: eat" . O+)
(@3af3492b/57802d"WFRAME: mouse" . #t)
(@3af3492b/57802d"WFRAME: mouse" . O-)
(@3af3492b/578014"WFRAME: cat" . #t)
(@3af3492b/578014"WFRAME: cat" . W-)
(@3af3492b/578014"WFRAME: cat" . S+)

A pair such as (@3af3492b/57802a"WFRAME: eat" . S-) means that this frame can be retrieved by a low-level FDscript call (find-frames assertions-index (the 'wframe "eat") 'S-) or by a higher-level LEARNER call (all-relevant-assertions (the 'wframe "eat") 'S-).
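
The pairs listed above can be reproduced from the parser's links with a short Python sketch. Several things here are assumptions: the function name, the use of base-form strings in place of Wframes, the wall link labelled "Wd" (the example parse in this manual does not show it), and the rule that only the upper-case part of a link label is indexed (Sp becomes S).

```python
import re


def uframe_index_keys(links):
    """Return the set of (word, key) index pairs for a list of parser links.

    Each link is (left_word, right_word, label). None stands for the wall.
    The left end of a link gets label+"+", the right end gets label+"-".
    """
    keys = set()
    for left, right, label in links:
        # Assumption: drop the lower-case subscript of the label (Sp -> S).
        base = re.match(r"[A-Z]+", label).group(0)
        for word, direction in ((left, "+"), (right, "-")):
            if word is None:
                continue  # the wall is not a word, so it gets no entries
            keys.add((word, True))  # models the (Wframe . #t) pair
            keys.add((word, base + direction))
    return keys


# Links for "cats eat mice", with words given in their base form.
links = [(None, "cat", "Wd"), ("cat", "eat", "Sp"), ("eat", "mouse", "Op")]
keys = uframe_index_keys(links)
```

Under these assumptions the sketch yields eight pairs, matching the eight entries listed for "cats eat mice".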

Slotted frames are indexed by the pair (slot-name . wframe). For example, an SRVframe "[cat|have|tail]" will have ('subject . @3af3492b/578014"WFRAME: cat") as one of its indices.
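
For slotted frames the scheme is simpler, as this Python sketch suggests. The helper names and the parsing of the bracketed "[subject|relation|value]" notation are assumptions for illustration; plain strings stand in for Wframes.

```python
def parse_srv(text):
    """Split the '[subject|relation|value]' notation into its three slots."""
    subject, relation, value = text.strip("[]").split("|")
    return {"subject": subject, "relation": relation, "value": value}


def slotted_index_keys(frame):
    """Return one (slot-name, atom) index pair per slot of the frame."""
    return [(slot, atom) for slot, atom in frame.items()]


srv = parse_srv("[cat|have|tail]")
srv_keys = slotted_index_keys(srv)
```

In this model the frame can be found under ("subject", "cat"), ("relation", "have"), and ("value", "tail").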


Node:Iframes, Next:, Previous:Indexing, Up:Top

Iframes

An Iframe is basically a "rule" of a rule-based system. It has the two structure slots 'left-terms and 'right-term. These hold its LHS (left hand side, preconditions) and RHS (right hand side, postcondition).

An Iframe's LHS is indexed in inferences-lhs.index and its RHS in inferences-rhs.index (see Indexing).

The 'left-terms value is a list of frame templates, and the 'right-term value is a single frame, typically a template (having an assertion is allowed, but special provisions not to index it in assertions need to be made).

All of the terms on the left hand side must contain "substitutions" - atoms where anything can be plugged in (see Variables).

For a rule, when the LHS is satisfied, the RHS should also hold true.

Here is a 'what of a sample Iframe:

"a ?cat? is sleeping => a ?cat? is awake <not>"

This Iframe implies that if X is sleeping, X is not awake.

This frame can be instantiated with a call

(frame-plug-in-atom
  (the 'Iframe "a ?cat? is sleeping => a ?cat? is awake [p=0]")
  (the 'Wframe "cat")
  (the 'Wframe "dog"))
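
As a rough illustration of what instantiation does to the rule's text, the following Python sketch replaces a marked variable with another atom. It works on plain strings only and ignores morphology (for instance "?cats?" versus the atom "cat"), so it is a simplification and not a description of how frame-plug-in-atom is implemented; the name plug_in_atom is hypothetical.

```python
def plug_in_atom(what, variable, replacement):
    """Replace every occurrence of ?variable? in the text with replacement."""
    marked = "?" + variable + "?"
    return what.replace(marked, replacement)


rule = "a ?cat? is sleeping => a ?cat? is awake <not>"
instance = plug_in_atom(rule, "cat", "dog")
```

Here instance reads "a dog is sleeping => a dog is awake <not>", that is, the same rule stated about a dog.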

See Asking Learner Questions, for more on inputting Iframes.


Node:Iframe functions, Previous:Iframes, Up:Iframes

Iframe Functions

LEARNER provides functions for creating, retrieving, and doing inference with Iframes.

say-Iframe! lhs-expr rhs-expr . [other-slots-values] Function
Given lhs-terms-lst, rhs-term, and optional other-slots-values, adds and returns the Iframe.
Deciding what the variables are:
All variables indicated in any term (LHS or RHS) become the Iframe's variables. If lhs-terms-lst and rhs-term indicate no variables, variables are created by taking all the nouns that are both in LHS and RHS. For example, asserting (say-Iframe! '(("a cat is not a dog")) '("a cat cannot bark")) creates an Iframe such as:

"IFRAME: a ?cat? is a dog <not> => a ?cat? can bark <not>"

Controlling whether constituent frames are also added to kb:
When 'assert-constituents is in other-slots-values with a value of #t or #f, the function asserts (or does not assert) the constituent frames to the KB accordingly. Otherwise (when 'assert-constituents has another value or is not present), it decides whether to assert the constituents as follows:
  • If either LHS or RHS mentions any of X, Y, Z, Xes, Ys, Zs, it does not assert.
  • Otherwise, it asserts.

For example, asserting (say-Iframe! '(("cats eat mice")) '("cats are carnivores")) will add both assertions "cats eat mice" and "cats are carnivores" to the KB.
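
The decision rule for asserting constituents can be written out as a small Python sketch. The function name, the dictionary standing in for other-slots-values, and the case-sensitive word matching are assumptions; Python's True and False stand in for #t and #f.

```python
import re

# The placeholder names listed in the rule above.
PLACEHOLDERS = {"X", "Y", "Z", "Xes", "Ys", "Zs"}


def should_assert_constituents(lhs_terms, rhs_term, other_slots=None):
    """Decide whether the constituent frames should also be asserted."""
    flag = (other_slots or {}).get("assert-constituents")
    if flag is True or flag is False:
        return flag  # an explicit #t or #f always wins
    words = set()
    for term in list(lhs_terms) + [rhs_term]:
        words.update(re.findall(r"[A-Za-z]+", term))
    # Any placeholder on either side means the terms are not real assertions.
    return not (words & PLACEHOLDERS)


concrete = should_assert_constituents(["cats eat mice"], "cats are carnivores")
generic = should_assert_constituents(["?X? is a cat"], "?X? has a tail")
forced = should_assert_constituents(
    ["?X? is a cat"], "?X? has a tail", {"assert-constituents": True}
)
```

With these inputs, concrete is true, generic is false, and forced is true because the explicit slot value overrides the default rule.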

User:
If other-slots-values have a 'user slot, makes that user the 'source of the Iframe (and constituents) added.

relevant-iframes side frame Function
Searches the database for all the Iframes that contain a frame matching frame on the specified side, which must be either 'lhs or 'rhs. The function relevant-iframes returns a choice of frames, and may return frames that do not match frame, but look like they would.

say-iframe!-with-string-output Function
A version of say-iframe! that returns its output as a string rather than printing it out.

infer-one-iframe iframe premise-frame Function
Given an iframe and a premise-frame, returns the frame inferred on the RHS. The iframe may have an additional test (an arbitrary FDscript expression) specified. If present, that test has to evaluate to a true value in order for the inference to happen. This also computes the RHS frame's 'ask-string if such a slot is present.

infer-from premise-frame [iframe-filtering-predicate] Function
Given a premise-frame and an optional iframe-filtering-predicate, applies all relevant iframes (that pass the one-place iframe-filtering-predicate if it is specified).

Returns 'right-term of matched iframes with the substitutions that created the match made.
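
A single inference step of this kind can be modelled in Python as below. The sketch is a deliberate simplification: rules are plain strings with one LHS term, matching is done with regular expressions on the text rather than on frames, and a variable matches exactly one word. The names template_to_regex, infer_one, and infer_from (as a Python function) are illustrative only.

```python
import re

VAR = re.compile(r"\?(\w+)\?")


def template_to_regex(template):
    """Turn 'a ?cat? is sleeping' into a regex with one group per variable."""
    parts, pos, seen = [], 0, set()
    for match in VAR.finditer(template):
        parts.append(re.escape(template[pos:match.start()]))
        name = match.group(1)
        if name in seen:
            parts.append("(?P=%s)" % name)  # repeated variable: same binding
        else:
            parts.append("(?P<%s>\\w+)" % name)
            seen.add(name)
        pos = match.end()
    parts.append(re.escape(template[pos:]))
    return re.compile("^" + "".join(parts) + "$")


def infer_one(rule, premise):
    """Apply one 'LHS => RHS' rule to a premise; return the RHS or None."""
    lhs, rhs = [side.strip() for side in rule.split("=>", 1)]
    match = template_to_regex(lhs).match(premise)
    if match is None:
        return None
    return VAR.sub(lambda var: match.group(var.group(1)), rhs)


def infer_from(premise, rules, keep=None):
    """Apply every rule that passes the optional one-place filter keep."""
    results = []
    for rule in rules:
        if keep is not None and not keep(rule):
            continue
        inferred = infer_one(rule, premise)
        if inferred is not None:
            results.append(inferred)
    return results


rules = [
    "a ?cat? is sleeping => a ?cat? is awake <not>",
    "a ?cat? is a dog <not> => a ?cat? can bark <not>",
]
inferred = infer_from("a dog is sleeping", rules)
```

Only the first rule matches the premise "a dog is sleeping", so inferred contains the single string "a dog is awake <not>".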


Node:Learner and the Community, Next:, Previous:Iframes, Up:Top

Learner and the Community


Node:contributors, Next:, Previous:Learner and the Community, Up:Learner and the Community

Population Contributing to the Learner

LEARNER is, in a way, a bet on people. Much of LEARNER's power comes from having access to an "oracle" - the contributing population collectively holds an enormous body of diverse knowledge. As people contribute it, LEARNER hangs on to it, ever rising in its sophistication.

The opportunity to solve hard problems with incremental contributions of many has not existed historically. One of the project's goals is to explore and to learn more about the collaborative approach to solving problems.

The demands of the project are diverse, and successful growth of the LEARNER requires contributions on different levels.

Basically, the needs form a pyramid:

                                   core

                              write plug-ins

                        contribute inference rules

                 contribute and verify simple assertions

Luckily, we can expect the contributors to naturally be distributed in roughly such a pyramid as well.

This is because the prior experience required is an inverse pyramid:

some lisp / scheme experience, knowledge representation / AI background

      knowledge rep / AI background or interest, some programming

           analytical skills, familiarity with reasoning

                       possess common sense

The amount of effort required to contribute to the lowest level of the pyramid is also much less (and has smaller overall effect) than to contribute to a higher level.

To put it another way, the natural distribution of contributors is likely to roughly match the need.


Node:wishlist, Next:, Previous:contributors, Up:Learner and the Community

Wishlist

An upcoming feature of the LEARNER is persistent (on disk) storage of the assertions LEARNER would like to find out. This repository will bear the dramatic name "the purgatory". There are several reasons a frame can be added to the purgatory:


Node:bug reports, Previous:wishlist, Up:Learner and the Community

Bug Reports

You can contact the maintainers at timc@alum.mit.edu.

When reporting a problem, please include as much of the following as is relevant so that we can address it:

Your patches are also welcome and will be incorporated in a timely fashion.


Node:Related Work, Next:, Previous:Learner and the Community, Up:Top

Related Work

There are some simpler systems on the web for your enjoyment.


Node:History, Next:, Previous:Related Work, Up:Top

History and Acknowledgements

The concept of the LEARNER was originally developed by Tim and Anatoli Chklovski, as were the similarity, analogy, and question-pruning algorithms.

Matthew Fredette and Tim Chklovski have cooperated on the LEARNER implementation.

We are grateful to Alex Vasserman, who has contributed some Link Grammar Parser glue code.

We are grateful to Push Singh for his feedback and sharing his experiences on the OpenMind Commonsense project with us.

We are also grateful to Kenneth Haase for his continued support and development of FramerD and his responsiveness in personal communication.


Node:Function Index, Next:, Previous:History, Up:Top

Function Index


Node:Concept Index, Previous:Function Index, Up:Top

Concept Index


Footnotes

  1. In line with Marvin Minsky's general thesis in his seminal Society of Mind.

  2. It is possible to override this with the --with-brico option to LEARNER's configure script - see the installation item for the LEARNER for more information.

  3. See the FDscript documentation for more details.

  4. This is a simplified interface for entering inference frames. See Teaching Learner "Thinking Rules", for the full set of options in specifying Iframes.

  5. This said that a car does not have a tail because a car is not an animal. If you prefer, you can enter the equivalent "a car does not have a tail <Yes> because a car is an animal <No>"; negating the statement and using the radio button do the same thing.

  6. This statement is stored negated as "a cat can bark" with 'probability 0.
    Negated statements can also be entered as (say "a cat can bark" 0).

  7. The latter filter prevents creating strange assertions such as "mice eat mice".

  8. The term substitution is used interchangeably with variable.

  9. The exact numbers after the @ signs will vary.