--- The Detailed Node Listing ---
Using the Command Line Interface
How Does Learner Work?
Learner Data Flow Architecture
Learner Interface API
FramerD and Link Grammar Parser
How Frames Work
Representing English and Internal Knowledge
Variables
Iframes
Learner and the Community
LEARNER allows ordinary Web users ("teachers") to teach a computer things they know and allows anyone to ask it questions about what others have taught it.
The home page for the project is at http://www.media.mit.edu/~timc/learner . It contains a link to a running LEARNER that you can teach and ask questions.
We hope that by capturing things people know and capturing "how they conclude things from other things" we can pave the way to smarter, easier to use information repositories, computers, and other devices.
To put it another way, LEARNER is a system for capturing declarative knowledge and inference rules. A fundamental feature of the system is to "tease" the knowledge out of the user by posing plausible questions.
Importantly, the number and quality of follow-up questions LEARNER can ask goes up as the amount of knowledge in the system goes up. Our approach is to intelligently re-use the knowledge that users put in to get more good information.
However, the mechanisms of "intelligently re-using inputted knowledge" are necessarily diverse.
From the outset, LEARNER has been designed to be able to capture the large diversity of reasoning methods. It is open source, has a general plug-in architecture for question-posing modules, and an extensive and well documented API.
Even though it does not get sensory information the way people do, it can
accumulate and use a vast amount of knowledge and rules. With plenty of
luck, LEARNER can become a collaborative creation of mankind that rivals
any other artifact in its usefulness.
Simply put, there currently are no good tools for dealing with
assertion-level unstructured data.
This is a generally acknowledged problem for any knowledge-intensive business.
For example, DARPA recognizes the importance of being able to
gather and reason with heterogeneous, unstructured knowledge in its recent
Rapid Knowledge Formation (RKF) initiative.
However, the enabling technologies such as:
have matured over the past 20 years to a point where robust, new
applications can be built on top of them.
Furthermore, the project of collecting knowledge and even the
project of developing the set of methods to process such knowledge
cannot be effectively tackled by a small team. Thus, it almost
inevitably must be the "distributed" or, more specifically,
"community-based" approach. Such approaches have become feasible only
recently thanks to the World Wide Web and the advent of the Open-Source
movement.
It is the goal of the LEARNER project to break new ground in
collecting large repositories of unstructured assertions and enable
reasoning in them. The underlying technologies and approaches exist.
What is needed now is an organized effort of individuals skilled in this
field.
The LEARNER technology is still very young. However, there are many
huge opportunities that it could, with maturity, address very effectively.
Here are some examples of potentially very successful and lucrative
applications:
Generally, when speaking to each other, people can communicate effectively
because the listener is assumed to have a lot of commonsense knowledge and
some reasoning ability. Computers and other devices can potentially join
the class of intelligent listeners if they are equipped with a large
common sense knowledge base.
The rest of the manual describes how LEARNER works so you can use and
extend the project.
Motivation
Installing Learner
Before installing any LEARNER software, you must download and install a large database containing the WordNet lexical database and a released part of the CYC ontology. This combined database is called BRICOLAGE or BRICO, and it is in the format used by the FramerD Scheme interpreter.
As of this writing, you can find this database at
<http://framerd.org/download.html>
by clicking on the .tar.gz link after the words "BRICO
ontology".
Now unpack this .tar.gz file, preferably under a new
/usr/local/share/brico/ directory. The rest of these
installation notes assume that you used this specific directory, such
that the file
/usr/local/share/brico/brico/brico.pool
exists.
Once the BRICO database has been installed, you can install the LEARNER software. As of this writing, the LEARNER software is available for download at
<http://mit.edu/fredette/www/learner/>
To run the LEARNER, you need to download and install a total of four software packages from this site, in the following order:
link-4.1+learner0.1.tar.gz is a modified version of
the Link Grammar Parser,
an English language parser developed at Carnegie Mellon University.
Our modifications include some additional functionality, and the ability to be integrated with FramerD.
Unpack this .tar.gz file and follow the instructions in the
INSTALL file to compile and install this package.
framerd-2.2preA+learner0.1.tar.gz is a modified
version of a distributed object-oriented database
FramerD, which includes an extended
Scheme implementation and was developed at the Media Lab of the
Massachusetts Institute of Technology.
Our modifications include new functions to integrate the Link Grammar Parser.
Unpack this .tar.gz file and follow the instructions in the
INSTALL file to compile and install this package.
learnerdict-0.1.tar.gz is an English dictionary used
by the LEARNER software.
Unpack this .tar.gz file under the exact same directory under
which you unpacked the BRICO database.
learner-0.1.tar.gz is the LEARNER software itself.
Unpack this .tar.gz file and follow the instructions in the
INSTALL file to build this package. The LEARNER does not
need to be installed.
Please note: If you did not install the BRICO database and
English dictionary under the /usr/local/share/brico directory
(for example, because you store them on a large second disk or on the
network), you will need to tell LEARNER's configure script where to find
the database. For example, if you installed BRICO under
/usr/bigdisk/brico, you would do:
./configure --with-brico=/usr/bigdisk/brico
For these early releases of the LEARNER, you interact with it entirely at a FramerD command line. You can reach this command line by running the command fdscript at your shell prompt, or, if FramerD found an Emacs when it was installed, you can get a captive command line buffer in Emacs by doing M-x fdscript.
The first fdscript command you should run should change the
working directory to where you built the LEARNER. If you unpacked
and built the LEARNER under the directory
/usr/home/test/learner, do:
(cd "/usr/home/test/learner")
There are a few parameters you can set to control how the LEARNER runs:
| %keep-pools-and-indices | Variable |
This is set in learner-init-server.fdx. If it is #f,
all the pools and indices LEARNER uses will be wiped out if they are
present (and empty ones will be created in their place).
Otherwise, the LEARNER assumes indices and pools are in place and uses those. |
| %prefetch-brico | Variable |
When defined and not #f, will prefetch every topic mentioned in
LEARNER's database from BRICO. This takes some time but
speeds up future operation. It is convenient to turn this off if you
are relaunching the system frequently.
If running in client-server configuration, see |
| %run-as-standalone | Variable |
When #f, loads the system as the client, assuming a server process
has been started separately. The server process can be started with the
command fdserver learner-server.fdz --local.
|
| %use-generalization | Variable |
When not #f, enables another mechanism for generating questions.
Roughly, given cats have tails and dogs have tails, uses
BRICO's hierarchy to arrive at the hypothesis that (and ask
whether) all pets have tails. This feature is still in tuning
and should probably be left "off" for beginner users.
|
| %user | Variable |
This is the default username to use. If not set, "unknown" will be
used. When using with a web interface in a multi-user situation, the username
should be provided in the 'user slot of some API calls.
See The Learner Interface API Functions, for
an explanation of which API calls need the 'user slot.
|
| %verbose | Variable |
When not #f, causes various information to be printed.
Furthermore, if (contains? 2 %verbose), more information (the
level 2 printouts) will be printed. The most output is obtained by
setting %verbose as follows: (define %verbose (choice 1 2 3 4 5)).
|
For example, you may set the variables and load the system as follows:
(define %run-as-standalone #t)
(define %user "guest")
(define %prefetch-brico #t)
(define %verbose #t)
(define %allow-slotted-frame-questions #t)
(define %use-generalization #f)
(define launch-learner
  (lambda ()
    (if %run-as-standalone
        (load-library "learner-init-server.fdx")
        ;; run as client of client-server
        ;; NOTE: the server should be started before running the
        ;; following line. Server can be started with, approx.:
        ;; 'fdserver learner-server.fdz --local'
        (load-library "learner-use-server.fdx"))
    (load-library "learner-init-client.fdx")))
(launch-learner)
If you would like to run it in the client-server configuration, you will need to start the server in a separate shell with
fdserver learner-server.fdz --local
(you may wish to set %prefetch-brico and %verbose in learner-init-server.fdx). When launching the client (inside Emacs if you wish) make sure to set:
(define %run-as-standalone #f)
You can have several client processes talking to the same server. All the database updates and retrievals are done by the server to avoid cache coherency problems.
The client exports the API functions listed in
learner-client-exports.txt. Those are accessible by issuing
dtcall commands.
Next, you can make sure that things are functional by taking the following steps:
learner-initial-assertions.fdx,
learner-initial-iframes.fdx,
and
learner-tests.fdx.
Or, you can go on to the 30-second tutorial (see Command Line Interface).
Note that as one of the first users of the LEARNER, your contribution makes a difference. The knowledge you contribute at this formative stage will shape the future direction of the development of the LEARNER.
We hope you enjoy experimenting with and extending the LEARNER. LEARNER can achieve its potential only with the contributions of many, so we strongly encourage you to contribute any significant changes back to the LEARNER community. See Wishlist, for possible directions.
Should you accumulate a significant knowledge base with your copy
of the LEARNER, we strongly encourage you to share all or (if you
must) part of it with the LEARNER community by contacting us.
Web Interface
You can start by asserting something (in the "say something interesting" box) or by clicking on one of the "hot topics".
You can also get more information on any specific topic by clicking on it when it is a hyperlink, or by entering it in the "Summarize the topic blank" field.
Finally, you can make any existing topic the topic of conversation by entering it in the blank in the "Make blank the new topic" section.
Here are a few tips on how to (and how not to) enter knowledge so LEARNER can make the best use of it:
Yes,
and for the wrong ones, change it to No. It's OK to leave
some as Don't know.
?.
? will be stripped off from your replies.
Yes or No reply, you can (and, when it makes sense,
you are encouraged to) provide a reason in the "because" field.
For example, when presented with a question a car has a tail?,
you are encouraged to respond with:
a car has a tail? <No> because a car is not an animal <Yes>
Please note: only enter "because" reasons that can stand on their own. Do not rely on the words it or they.
Wrong:
a car has a tail? <No> because it is not an animal <Yes>
<Yes> or <No>, correct the system
when possible. For example, if the system says:
a car uses electricity?
You are encouraged to edit the line to read:
a car uses gasoline... and select the
<Yes> radio button.
Ford makes minivans
But there is no capitalization in:
sun rays are sometimes blocked by clouds
Do not forget to select the appropriate <Yes> or <No>
checkbox.
To use the interface most effectively, it is best to have the guess and its "because" field fit on a single line. You may need to resize your browser window and/or your browser's font sizes to achieve this.
The interface also allows you to query what LEARNER knows in the input box labeled "try to answer a question". See Asking Learner Questions, for an explanation (with examples) of how to pose the questions.
To run the examples given here, start up FDscript, load the LEARNER code, and make sure the database contains only the assertions from learner-initial-assertions.fdx and the Iframes from learner-initial-iframes.fdx. You can do this by starting with an empty database and loading these two files.
These are the functions you can use to talk to the system from the FDscript command prompt or from Emacs if you are running FDscript within Emacs:
show-topics show-topic-summary say-topic say say-Iframe! find-answers
See Learner Interface API, for definitions of these functions.
Use the show-topics command to see what the LEARNER knows about. For example, (show-topics 4 4) shows the topics the system knows exactly four things about. On the initial database, this command produces the following output:
;; There are 3 results
("cup" . 4)
("elephant" . 4)
("fork" . 4)
;; Nothing (void) was returned
(show-topics 2 -1) shows the topics the system knows at least two things about.
show-topic-summary command.
For example, (show-topic-summary "cat") produces:
(show-topic-summary "cat")
;; Topic: cat (8 'yes' uframes, 0 'no' uframes, 0 srvframes known)
;; Similar topics:
("dog" . 5)
("bear" . 2)
("car" . 1)
("elephant" . 1)
;; Statements that have 'cat' as a topic:
("a cat has a tail" . 10)
("a cat is a pet" . 10)
("a cat is an animal" . 10)
("cats can scratch with their paws" . 10)
("cats drink milk" . 10)
("cats eat mice" . 10)
("cats have paws" . 10)
("cats have sharp claws on their paws" . 10)
;; Nothing (void) was returned
The number next to each "similar topic" is the similarity strength. The more the system knows, the closer to human judgment these will generally be.
The number next to each "statement that has 'topic' as a topic" is the strength of the topic in the assertion. For example, having "cat" in the subject position merits a higher score than having it in the object position.
say-topic command.
For example, (say-topic "cat") produces something like:
;; I would like to know:
(sc: 3) Please confirm, deny, or correct: a cat barks when it is angry?
(sc: 3) Please confirm, deny, or correct: a cat can bark?
(sc: 3) Please confirm, deny, or correct: a cat can bite you?
(sc: 3) Please confirm, deny, or correct: cats chew on bones?
(sc: 3) Please confirm, deny, or correct: cats drink water?
(sc: 3) Please confirm, deny, or correct: cats eat meat?
(sc: 2) Please confirm, deny, or correct: a Porsche is a fast cat?
(sc: 2) Please confirm, deny, or correct: a cat can be fast?
(sc: 2) Please confirm, deny, or correct: a cat is an object?
(sc: 2) Please confirm, deny, or correct: a cat is brown?
(sc: 2) Please confirm, deny, or correct: a cat is dangerous?
(sc: 2) Please confirm, deny, or correct: a cat is similar to a wolf?
(sc: 2) Please confirm, deny, or correct: a fast cat has a big engine?
(sc: 2) Please confirm, deny, or correct: a fast cat has large tires?
(sc: 2) Please confirm, deny, or correct: a wolf is similar to a cat?
(sc: 2) Please confirm, deny, or correct: an cat has tusks?
(sc: 2) Please confirm, deny, or correct: an cat is very heavy?
(sc: 2) Please confirm, deny, or correct: an cat is very large?
;; Nothing (void) was returned
This is the system guessing at what may be true about cats. Based on
similarity with dogs, the system has correctly guessed that
"cats drink water", but incorrectly that "a cat can bark".
say command. You can type:
(say "cats drink water")
(say "a cat can not bark")
At this point, you know enough to start talking to the system directly.
As you say more things to the system, it uses them to make better guesses about what is and what is not true. So, you're making the system smarter with every little bit of knowledge you put in. Every little bit really helps - so you can make this effort a success while just having fun!
A more direct, powerful way to make the system smarter is to teach the system "Thinking Rules" in addition to just facts.
For example, in response to the system guessing that a car has a tail
(or for any other reason), you may tell the system
(say-Iframe! '(("a car is not an animal")) '("a car has a tail" 0))
creating an Iframe with the 'what being "a ?car? is an animal <not> => a ?car? has a tail".
A way to ask the system about what it already knows is using the
find-answers command.
LEARNER can handle both simple interrogatives (e.g.
What do cats eat?) and fill-in-the-blank questions.
Here are some examples:
(find-answers "What do cats eat?") produces:
;; Finding answers for "cats eat ?X?"
;; Found an answer:
"cats eat mice" is true

(find-answers "?Xes? eat ?Y?") produces:
;; Finding answers for: "?Xes? eat ?Y?"
;; Found these answers:
"dogs eat mice" is true
"cats eat mice" is true
"dogs eat meat" is true

(find-answers "turtles eat mice") produces:
;; Finding answers for: "turtles eat mice"
;; Found an answer:
"turtles eat mice" is not true
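The fill-in-the-blank behaviour can be illustrated with a small sketch. It is written in Python rather than FDscript, and it is not how LEARNER implements find-answers: the names pattern_to_regex and find_matches are hypothetical, each ?variable? is assumed to stand for exactly one word, and no parsing, plural handling, or inference is attempted.

```python
import re

# Matches a variable such as ?X? or ?Xes? and captures its name.
VARIABLE = re.compile(r"\?([A-Za-z_]\w*)\?")

def pattern_to_regex(pattern):
    """Turn 'cats eat ?X?' into a regex with one named group per variable.

    Assumption: each variable matches exactly one word, and a variable
    name is used at most once in a pattern.
    """
    # re.split with one capture group alternates literal text and names.
    parts = VARIABLE.split(pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)          # literal text
        else:
            regex += r"(?P<%s>\w+)" % part    # variable slot
    return re.compile("^" + regex + "$")

def find_matches(pattern, assertions):
    """Return (assertion, bindings) for every assertion matching the pattern."""
    compiled = pattern_to_regex(pattern)
    results = []
    for assertion in assertions:
        match = compiled.match(assertion)
        if match:
            results.append((assertion, match.groupdict()))
    return results

if __name__ == "__main__":
    kb = ["dogs eat mice", "cats eat mice", "dogs eat meat"]
    for assertion, bindings in find_matches("?Xes? eat ?Y?", kb):
        print(assertion, bindings)
```

A pattern without variables, such as "turtles eat mice", simply becomes an exact-match lookup in this sketch.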
The LEARNER has a set of features that allow it to ask relevant follow-up questions when obtaining information from the user. This set can be expanded as needed; the system itself ships with a core set of plug-ins. We demonstrate what these plug-ins allow LEARNER to do in a series of examples.
If two topics are similar, similar statements will be true about them. For example, a "spoon" and a "fork" are similar, and it is true that "you can eat with a spoon" and "you can eat with a fork", and "forks are usually made of metal" and "spoons are usually made of metal".
Furthermore, similarity drops off gradually. For example, a "spoon" and a "shovel" are both inanimate objects that are tools. In some general sense, they are more similar than, say, a "spoon" and a "rabbit". On the level of assertions, we observe that you can say "you can eat with a spoon" and "you can dig with a shovel", but it is awkward to phrase something similar about a "rabbit".
Based on these two observations, it should be possible to go the other way. That is, similarity of topics can be derived from similarity of assertions about them!
That is what LEARNER does.
Similarity is (or should be) used throughout the system to drive creation of new hypotheses, estimate plausibility of an answer or a newly acquired fact, retrieve most relevant information, organize how retrieved information is presented, and so on.
In our experience, most of the things LEARNER needs to do can be
reduced to some combination of similarity and inference computation. So,
having a good similarity function is pretty important.
similar-topics-hash is the function that implements similarity in the
LEARNER.
Topics similar to a given topic (the source topic) are computed by taking the following steps:
See scored-assertions-on-topic.
See similar-Uframes-set.
For example, for a source topic "dog", "dogs eat mice" may be an assertion about dogs (a source assertion). Then, "cats eat mice" would be a "similar assertion", and "cats" in it would be in the same role as "dogs" is in "dogs eat mice". That would add weight to similarity of "cat" and "dog".
See corresponding-item.
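The idea of deriving topic similarity from assertions can be sketched as follows. This is a simplified Python illustration, not the similar-topics-hash implementation: the function names are hypothetical, an assertion is treated as a plain sequence of words, "same role" is approximated by "same word position in an otherwise identical assertion", and every shared assertion adds a weight of 1.

```python
from collections import defaultdict

def contexts(topic, assertions):
    """Return each assertion about `topic` with the topic replaced by None."""
    found = set()
    for assertion in assertions:
        words = assertion.split()
        if topic in words:
            found.add(tuple(None if w == topic else w for w in words))
    return found

def similar_topics(topic, assertions):
    """Score other topics by how many source contexts they also fill.

    Assumption: a word "corresponds" to the topic when replacing it with
    a blank yields exactly one of the topic's own contexts.
    """
    source = contexts(topic, assertions)
    scores = defaultdict(int)
    for assertion in assertions:
        words = assertion.split()
        for i, word in enumerate(words):
            if word == topic:
                continue
            candidate = tuple(None if j == i else w for j, w in enumerate(words))
            if candidate in source:
                scores[word] += 1
    return dict(scores)

if __name__ == "__main__":
    kb = ["dogs eat mice", "cats eat mice", "cats drink milk",
          "dogs drink water", "bears eat meat"]
    print(similar_topics("dogs", kb))
```

In this toy knowledge base only "cats" shares a context with "dogs" ("_ eat mice"), so it is the only similar topic reported.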
Step number 2 above was to find the most similar assertions given an assertion. That in itself is a multi-step process (that can be improved by a willing contributor!). Currently, similar assertions to a given source assertion (uframe) are computed by taking the following steps:
Explicit statements about what is similar do not currently affect the internal
similarity measure, but they could, so that saying
"a spoon is similar to a fork" (or finding that out from
BRICO or from mining the World Wide Web) would prompt exploration of
how they are similar.
Relevant functions:
| scored-assertions-on-topic topic | Function |
Given a topic, returns conses of assertions and strengths. The strength
for an assertion is what strength of the topic would be if we called
assertion-topics on the assertion.
|
| corresponding-item topic uframe similar-uframe | Function |
|
Given a topic, a uframe containing the topic, and a
similar-uframe, returns the topic from similar-uframe that
is in same role, i.e. corresponds to topic in the uframe by
its parsing role.
Does not |
| similar-Uframes-set uframe threshold . exclude-topics | Function |
| Given a uframe, a threshold, and an optional choice of exclude-topics, returns the choice of uframes at least threshold similar to uframe; excludes any frames that mention one or more of the exclude-topics. |
| similar-uframes-of->string uframe | Function |
Given a uframe, returns a string describing uframes at least
similarity-threshold (hardcoded to a certain value) similar to it.
Accounts for probability classes of the frames
(lineout
(similar-uframes-of->string (the 'uframe "dogs eat mice")))
;; Related to what you said:
("cats eat mice" . 26) ("cat <-> dog")
("dogs eat meat" . 13)
On each line, the number associated with the string is the similarity score, and the optional parenthetical argument lists which things in equivalent positions in the two assertions were found to be BRICO-similar. |
The key task of the system is to pose plausible questions on the topic of conversation. How can we come up with these?
There are a few options. One is to re-use what other users have put in, verifying it verbatim. While perhaps needed, it does not lead to a very interesting system, as there is no expansion of the knowledge base.
Another is to seed LEARNER's questions with the statements that seem plausible based on, for example, mining the World Wide Web. That could be a very interesting direction, especially if what was extracted was sufficiently clean.
Yet another approach would be to rely on the LEARNER's inference mechanism to generate new statements. The problem here is that, in areas that are new to LEARNER, we often do not have enough inferences to make this the "weight bearing" mechanism.
Finally, there is a very good avenue, the one the system actually uses. This avenue is to generate new statements about a topic by analogy from known statements about similar topics.
So, fundamentally, the analogy mechanism resides on top of the similarity mechanism. Analogy is potentially a very powerful and sophisticated tool. Here, we describe its current implementation in the LEARNER.
The preceding section on Making Similarity Judgments (see Making Similarity Judgments) showed how to compute topics similar to a given topic. Let's call such topics friends of the source topic.
To analogize the statements from similar topics (friends) to the target topic, we basically take all the assertions that are true about the friends, change them to be about the target topic, and sum up assertions from all the friends, giving more weight to statements that came from better friends (more similar topics), and letting assertions that are true about some friends and not true about others partially cancel each other out.
More formally, we take the following steps:
Each analogized assertion has a score associated with it - the "better the friend" (i.e. the more similar the source topic), the more weight assertions analogized from it get.
Furthermore, it is an upcoming feature that assertions that are true about more things get more weight - so that the system progresses from the more general to the more specific questions in its learning about a topic.
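The weighting described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not LEARNER's code: the function name analogize is hypothetical, assertions are represented as string templates with a {} placeholder for the topic, and a statement known to be false about a friend contributes the friend's similarity with a negative sign.

```python
from collections import defaultdict

def analogize(target, friends, facts):
    """Transfer assertions from similar topics ("friends") to `target`.

    friends: dict mapping friend topic -> similarity strength
    facts:   dict mapping topic -> list of (template, is_true) pairs,
             where the template contains '{}' in place of the topic
    Returns a dict mapping each analogized statement to its summed score.
    """
    scores = defaultdict(float)
    for friend, similarity in friends.items():
        for template, is_true in facts.get(friend, []):
            statement = template.format(target)
            # Better friends weigh more; true and false votes partially cancel.
            scores[statement] += similarity if is_true else -similarity
    return dict(scores)

if __name__ == "__main__":
    friends = {"dog": 5, "bear": 2}
    facts = {
        "dog": [("a {} can bark", True), ("a {} drinks water", True)],
        "bear": [("a {} can bark", False)],
    }
    print(analogize("cat", friends, facts))
```

Here "a cat can bark" ends up with a lower score than "a cat drinks water" because the two friends disagree about barking.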
More modules for generating plausible things to ask can, and will, with time, be added.
Once the assertions are computed, they get "cleaned up" to exclude asking what is already known, what is inferable from other things we are already asking, etc. See finalizing questions, for details.
This capability is built on top of Iframes. See Iframes, for details on working with inferences.
Inference is used in several places.
find-answers command. Currently, it simply tries to look it up in the
knowledge base.
LEARNER has a plug-in architecture for generating questions. That means that separate users can experiment with their own question-formulating modules and take advantage of the overall framework to organize and present questions for them.
Fundamentally, LEARNER's "read-eval-print loop" takes the following steps:
We now look in more detail at the steps of creating the questions and finalizing them.
When you say things to the system, you are making it more knowledgeable. Through its built-in analogy and generalization mechanisms, it can also think of (hypothesize) new things. Fundamentally, these new hypotheses form the basis of the questions the system is asking.
Here, we describe the several specific mechanisms for creating new questions that ship with the LEARNER distribution.
The LEARNER is set up to support multiple independent question generators. This is to allow independent groups to experiment with question-generating and to make it simple to increase LEARNER's question-asking prowess.
To make question-generation easy, a lot of the "clean-up" functionality is offloaded into a common finalizing stage. All question-generators feed their outputs into the finalizing stage and this stage generates and outputs the final set of questions.
Finalization consists of the following actions:
Several questions that are asking the same (or nearly the same)
thing are combined into a single question. This is useful when separate
mechanisms generate the same question. The scores of the questions are
combined, paying attention to the probability classes of the statements
(a positive and a negative conjecture cancel each other's scores out).
A question is dropped altogether. There may be several reasons for this:
Given a great many questions, in which order do we ask them? The question generators produce questions already paired with scores, but the other work done by the finalizer can further alter the scores.
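A minimal Python sketch of such a finalizing stage is given below. It is an assumption-laden illustration, not the shipped finalizer: the name finalize and the tuple layout are hypothetical, duplicate questions are detected by exact string equality, positive and negative conjectures are combined by signed addition, and only one reason for dropping a question (it is already known) is modelled.

```python
def finalize(candidates, known):
    """Merge, filter, and order candidate questions.

    candidates: list of (statement, is_positive, score) from any generator
    known:      set of statements already in the knowledge base
    Returns a list of (statement, combined_score), strongest first.
    """
    combined = {}
    for statement, is_positive, score in candidates:
        signed = score if is_positive else -score
        # Questions asking the same thing are merged; opposite
        # conjectures cancel each other's scores.
        combined[statement] = combined.get(statement, 0) + signed

    # Drop questions whose answer is already known.
    questions = [(s, sc) for s, sc in combined.items() if s not in known]

    # Order by strength of the combined score; ties are broken alphabetically.
    questions.sort(key=lambda q: (-abs(q[1]), q[0]))
    return questions

if __name__ == "__main__":
    candidates = [
        ("cats drink water", True, 3),
        ("cats drink water", True, 2),
        ("a cat can bark", True, 3),
        ("a cat can bark", False, 1),
        ("cats eat mice", True, 4),
    ]
    print(finalize(candidates, known={"cats eat mice"}))
```

Keeping this logic in one place means a new question generator only has to emit scored statements.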
The pruning of questions to "those which do not follow from answers to other questions" deserves further explanation.
For example, if LEARNER is considering asking about the truth of the following statements:
Question 1: "cats have paws"
Question 2: "cats have sharp claws on their paws"
and it already knows that "?snakes? have ?paws? <not> => ?snakes? have sharp claws on their ?paws? <not>" (i.e. if you do not have X, you cannot have sharp claws on X), then it will not ask Question 2 together with Question 1.
This is an important feature. If the interface makes it easy to enter "because" reasons (Iframes), then filtering out of the dependent questions will lead to spontaneous structuring of the dialogue. That is, the system will evolve from a mass of questions towards a decision tree type dialogue. The initial questions will be the more general ones, and depending on their answers, the more specific ones will become relevant.
This mechanism helps make LEARNER a powerful knowledge acquisition tool that improves as it accumulates more knowledge.
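The pruning of dependent questions can be sketched as follows. This Python fragment is only an illustration: the name prune_dependent is hypothetical, rules are assumed to be already instantiated for the current topic as (premise, dependent) pairs rather than stored as Iframes with variables, and the rules are assumed not to be circular.

```python
def prune_dependent(questions, rules):
    """Drop questions whose answer may follow from another asked question.

    questions: list of statements the system is considering asking
    rules:     list of (premise, dependent) pairs, meaning the answer to
               `premise` can settle `dependent`
    """
    asked = set(questions)
    dependent = {
        consequence
        for premise, consequence in rules
        if premise in asked and consequence in asked and premise != consequence
    }
    # Preserve the original order of the remaining questions.
    return [q for q in questions if q not in dependent]

if __name__ == "__main__":
    questions = ["cats have paws", "cats have sharp claws on their paws"]
    rules = [("cats have paws", "cats have sharp claws on their paws")]
    print(prune_dependent(questions, rules))
```

Once the user answers the remaining general question, the dropped specific question can be reconsidered in a later round, which is what produces the decision-tree-like dialogue.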
Learner Interface API covers the functions you should use to interact with the LEARNER, both from a command line and from a front end.
The subset that should be used from the command line was described in the tutorial on using the command line interface (see The 30-second tutorial).
The front-end API functions are a subset of the interface API. The functions exported to front-ends are as follows:
functions for adding to the system:
add-to-kb-start
add-to-kb-with-string-output
add-to-kb-end-with-string-output
say-iframe!-with-string-output
say-topic-with-string-output

functions for browsing:
find-answers-with-string-output
similar-uframes-of->string
show-topics-with-string-output
show-topic-summary-with-string-output
These functions allow you to build front ends for the LEARNER without the need to understand its architecture or algorithms.
The file learner-client-exports.txt dictates which functions are
exported.
In describing the API functions, we introduce the notion of a term. A
term is the input you can give to the system to describe a frame
- either to retrieve a frame that already exists or a frame you would
like to create. The function that parses these terms is term->protoframe; it is defined below. Interface functions rely on it to parse the
terms they receive as arguments.
A term can be one of the following:
(list string [probability] [slot value]*)
The string may contain variables, which are denoted by surrounding them in
?. For example:
'("cats eat mice")
(list "cats eat ?mice?")
(list "cats can bark" 0)
(list "pigs cannot fly" 'slot1 'value1 'slot2 'value2)
are all valid terms denoting Uframes.
For example,
'("cats" "eat" "mice")
is a valid term denoting an SRVframe.
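To make the shape of a term concrete, here is a rough Python sketch that splits a term into its parts. It does not reproduce term->protoframe: the names Sym and parse_term are hypothetical, Scheme symbols such as 'slot1 are modelled with a small Sym class, one leading string is taken to mean a Uframe and three leading strings an SRVframe, and no default probability is assumed.

```python
class Sym(str):
    """Stand-in for a Scheme symbol such as 'user or 'slot1."""

def parse_term(term):
    """Split a term into kind, text, optional probability, and slots."""
    items = list(term)

    # Leading plain strings (not symbols) are the text of the frame.
    texts = []
    while items and type(items[0]) is str:
        texts.append(items.pop(0))
    if len(texts) == 1:
        kind = "uframe"
    elif len(texts) == 3:
        kind = "srvframe"
    else:
        raise ValueError("expected one string (Uframe) or three (SRVframe)")

    # An optional number directly after the text is the probability.
    probability = None
    if items and isinstance(items[0], (int, float)) and not isinstance(items[0], bool):
        probability = items.pop(0)

    # Whatever remains must be alternating slot/value pairs.
    if len(items) % 2 != 0:
        raise ValueError("slots and values must come in pairs")
    slots = {}
    for i in range(0, len(items), 2):
        if not isinstance(items[i], Sym):
            raise ValueError("slot names must be symbols")
        slots[str(items[i])] = items[i + 1]

    return {"kind": kind, "text": texts, "probability": probability, "slots": slots}

if __name__ == "__main__":
    print(parse_term(["cats can bark", 0]))
    print(parse_term(["cats", "eat", "mice"]))
    print(parse_term(["pigs cannot fly", Sym("slot1"), Sym("value1")]))
```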
| term->protoframe term . other-slots-values | Function |
|
Given a user-inputtable description of a frame term, creates and returns
a protoframe described by term and including other-slots-values.
This function is verbose, i.e. it will output lines describing the problem if the inputted expression cannot be parsed. |
This section defines all the commands that constitute the interface API of the LEARNER.
| add-to-kb . term | Function |
|
Given a term, this function adds it to the knowledge base, producing
lineouts if there are any problems (e.g. can't parse, the system already
believes the opposite, etc.)
Registers the assertion in this user's history.
If term has a
See also
Please note: When specifying slotted frames, if the atoms mentioned in the term do not exist, neither the atom nor the frame gets created. This is because parsing provides a part of speech and some other useful information, so we prefer for all atoms to be created as a result of parsing and then used in slotted frames as needed. |
| add-to-kb-with-string-output . term | Function |
A version of add-to-kb that returns its output as a string rather
than printing it out.
Examples:
(add-to-kb-with-string-output "a cat has a tail" 'user "test")
(add-to-kb-with-string-output "cat" "have" "tail" 0.9 'user "test") |
| add-to-kb-start . slots-values | Function |
This function should be called before adding zero or more assertions to the
system. See also add-to-kb-end-with-string-output. The set of
assertions enclosed between these two commands is treated as one input
session to which LEARNER responds.
slots-values should contain the |
| add-to-kb-end . slots-values | Function |
|
This function should be called after all new things the user has said in one
submission have been added to the knowledge base. This function produces the
output to present as the reaction to what the user has said.
slots-values should contain the |
| add-to-kb-end-with-string-output . slots-values | Function |
| find-answers . term | Function |
|
Given a term, interprets it as a question to the system and tries
to find or infer knowledge that would constitute an answer to this question.
If term has a
See The Syntax of a Term, for an explanation of how to specify terms.
For example, if you ask (find-answers "?Xes? have tails") or (find-answers "a ?X? has a tail"), with just the initial database loaded, you will see output similar to:
;; Finding answers for "?Xes? have tails"
;; Found these answers:
a dog has a tail
a cat has a tail
;; Nothing (void) was returned |
| find-answers-with-string-output | Function |
A version of find-answers that returns its output as a string rather
than printing it out.
|
| say . term | Function |
|
This is a wrapper for asserting a single assertion and getting a reply from
the system. Given a term, adds it to the KB, registers it in
the history, and poses the relevant questions.
See also: add-to-kb-start, add-to-kb-with-string-output, add-to-kb-end-with-string-output |
See Iframe Functions, for the definition of
say-iframe! and say-iframe!-with-string-output.
| say-topic arg | Function |
| This is the main way a user can set the current topic of conversation. Given an arg (an atom frame or a string) this sets the current topic to be the base-form of arg. |
| say-topic-with-string-output | Function |
A version of say-topic that returns its output as a string rather than
printing it out.
|
| show-topic-summary string . [slots] | Function |
| Shows the summary of the topic indicated by the string and the optional slots. |
| show-topic-summary-with-string-output topic-str | Function |
A version of show-topic-summary that returns its output as a string
rather than printing it out.
|
| show-topics min max | Function |
Given a min and a max, this top-level command outputs the topics about
which at least min and at most max uframes are known. If max
is -1, no upper limit is used.
|
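The min/max rule of show-topics can be sketched in Python as follows. This is an illustrative model only: the function name topics_in_range and the dictionary of per-topic uframe counts are assumptions, not part of LEARNER.

```python
def topics_in_range(uframe_counts, min_count, max_count):
    """Topics with at least min_count and at most max_count known uframes.

    A max_count of -1 means that no upper limit is applied.
    """
    return sorted(
        topic
        for topic, n in uframe_counts.items()
        if n >= min_count and (max_count == -1 or n <= max_count)
    )

# Hypothetical counts of uframes known per topic.
counts = {"cat": 12, "dog": 7, "beach": 2}
bounded = topics_in_range(counts, 5, 10)
unbounded = topics_in_range(counts, 5, -1)
```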
| show-topics-with-string-output | Function |
A version of show-topics that returns its output as a string rather
than printing it out.
|
| similar-uframes-of->string uframe | Function |
|
Given a uframe, returns a string describing the uframes whose similarity
to it is at least similarity-threshold (a hardcoded value).
Accounts for probability classes of the frames. |
LEARNER depends on two major pieces of software: FramerD is a Scheme interpreter married to a flexible database implementation, and the Link Grammar Parser is an English parser from CMU.
FramerD is a distributed object-oriented database used by LEARNER. FramerD is available under the LGPL and includes persistent storage and indexing facilities that can scale to very large database sizes, as well as FDscript, a language that is a superset of Scheme.
LEARNER is written in FDscript.
FramerD introduces the concepts of frames, slots, and slotmaps which we use in describing how LEARNER works.
FramerD also comes with a version of the WordNet lexical database and a released part of the CYC ontology combined and converted into the FramerD format (the database is called BRICOLAGE or BRICO). For now, LEARNER uses the WordNet component only.
FramerD also has many attractive features:
To do more advanced things with LEARNER, you will need to understand FDscript.
FramerD documentation, covering the database implementation and the
FDscript language, was available at the time of writing at
<http://framerd.org>.
The Link Grammar Parser is a constraint-based English-language parser that tries to assign a consistent set of linkages between all words in a sentence.
The Link Grammar Parser is an impressive system in its own right. The parser is written in C and source code is freely available for non-commercial purposes.
Complete distribution and documentation of the link grammar parser was available at the time of writing at http://www.link.cs.cmu.edu/link.
Here is an example of how the parser would parse the sentence
"cats eat mice":
  +-Sp-+--Op-+
  |    |     |
cats.n eat mice.n
The above parsing contains the following information about the word "cats":
cats is a noun - cats.n,
cats has the subject role in the sentence - it is on the left side
of an S* link; another way to say this is that cats forms an
S- link with the word eat,
cats and eat are plural - they are linked by the
Sp link in which the lowercase p denotes plurality.
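To make the reading of the parse above concrete, here is a small Python sketch that stores the linkage as (label, left word, right word) triples and recovers the same three facts about each word. The representation and the function describe are assumptions made for illustration; they are not the parser's or LEARNER's actual data structures.

```python
# Linkage for "cats eat mice" as (label, left word, right word) triples.
links = [("Sp", "cats.n", "eat"), ("Op", "eat", "mice.n")]

def describe(word, links):
    """Collect what the linkage says about one word (illustrative only)."""
    facts = []
    if word.endswith(".n"):
        facts.append("noun")                 # the .n suffix marks a noun
    for label, left, right in links:
        if not label.startswith("S"):
            continue                         # only S* links carry subject info
        if word == left:
            facts.append("subject of " + right)
        if word in (left, right) and "p" in label[1:]:
            facts.append("plural via " + label)  # lowercase p denotes plurality
    return facts

cats_facts = describe("cats.n", links)
```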
LEARNER currently only accepts sentences that can be parsed completely. When multiple parsings can be found by the parser, the LEARNER uses the first one.
This can lead to some unexpected results. For example in the sentence
cats have sharp claws, sharp gets parsed as a noun in the first
returned parsing. This does not, however, cause any known difficulties with
the operation of the LEARNER.
According to the authors of the Link Grammar Parser, a "statistical version of the parser" is under development and may become available sometime in the future to address this.
All frames in LEARNER, be they for representing atoms, assertions, or parser links, have some fundamental mechanisms (such as inheritance) and policies (such as rules on mutating frames) that apply to them. We start by overviewing these mechanisms and policies and then go on to describe how more specific types of frames work.
Frames in LEARNER have an inheritance mechanism. Namely, the ifget
command works the same way FDscript's built-in fget command does,
except it will recursively follow a frame's 'inherits-from slot until
it fails or gets to a frame that has a value for that slot.
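The lookup rule can be modelled in a few lines of Python. This is a sketch under stated assumptions: frames are plain dictionaries, the function is named ifget only to mirror the text, and a missing value is reported as None, whereas FDscript would fail instead.

```python
def ifget(frame, slot):
    """Return the value of slot, following 'inherits-from' links as needed."""
    seen = set()
    while frame is not None and id(frame) not in seen:
        seen.add(id(frame))                  # guard against inheritance cycles
        if slot in frame:
            return frame[slot]
        frame = frame.get("inherits-from")   # climb to the parent frame
    return None

original = {"what": "cats eat mice", "probability": 1.0}
proto = {"inherits-from": original, "probability": 0.5}

inherited = ifget(proto, "what")         # found on the parent frame
overridden = ifget(proto, "probability")  # found on the protoframe itself
```

The same arrangement lets a protoframe override a slot while the frame it inherits from stays untouched.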
It is a policy that frames which are OIDs are not to be mutated. Mutating them
would lead to difficulties with the need to update their indexing and with
knowing what the original author asserted. Rather, protoframes that inherit
from OIDs are created. Mutation of protoframes is permitted where appropriate,
as in committing scheduled changes (see Variables). To effect this policy,
use the LEARNER function fset-safe!.
Relevant functions: ifget
LEARNER's knowledge consists of assertions, but to work with these effectively, we organize assertions around topics. Currently, only nouns in their base form can be topics, but gerunds ("skating" in "some people like skating") and noun phrases ("beach chair") can also be added.
Interactions with LEARNER revolve around topics and similarity is also measured between a pair of topics (the similarity of sentences helps compute similarity of topics).
An assertion is said to have a main topic (sometimes absent) and, more generally, topics.
Related functions:
| assertion-main-topic assertion | Function |
| Given an assertion, returns its main topic, if any |
| assertion-topics assertion | Function |
| Given an assertion, returns a choice of conses; the first element of each is a topic, and the second is a score of how much the assertion is about that topic |
| topic-total-mentions | Function |
| Given a topic, returns how many assertions are indexed by it |
| topic-frequency topic | Function |
Given a topic, returns how many certain-yes? assertions mention it
|
| topic-absolute-frequency-weight topic | Function |
| Given a topic, returns its weight (decreases with frequency, i.e. is less for common words) |
The preceding chapter explained how frames are used to represent individual atoms and "topics". This chapter explains how frames are used to represent compound structures to hold assertions.
There are two types of frames for holding assertion-level information. One is a Uframe (for "Utterance frame"), used to hold information in a natural language, and the other is a slotted frame, to hold information for internal processing.
In this section, we explain the common features of all frames representing assertion-level information. More details about the specifics of each type are available in the sections that follow.
A Uframe (for an "Utterance frame") is a structure for holding utterances in English (or, potentially, any natural language).
Uframes are created from the output of the link grammar parser and roughly mirror it, although they have additional slots.
A Uframe contains the following slots:
'parsed
'simplified
'utterance-type
'significant-keys-links-hash
For efficient processing, we envision that Uframes will be recognized into internal data structures - slotted frames.
An SRVframe is an example of a slotted frame. SRV stands for
Subject-Relation-Value. Accordingly, an SRVframe has three slots: the
'subject, 'relation, and 'value.
See Variables, for a description of frames with and without substitutions.
add-to-kb, add-to-kb-with-string-output, and say are used
to add assertions.
similarity: similar-uframes
Frames may be assertions if they have no substitutable atoms, or templates if they do. When a frame has a substitutable atom (a variable), the atom is shown surrounded with question marks. Here is a quick example of making the atom "cat" variable:
(let ((frame1 (frame-atom-is-substitution
               (the 'uframe "cats eat mice")
               (a "cat"))))
  (frame-finalize! frame1)
  frame1)
The above example returns a protoframe whose 'what is "?cats? eat mice".
Whether a frame is an assertion also determines which index it is indexed in (see Indexing).
Note that for efficiency reasons, LEARNER has a system of scheduling updates to a frame, keeping a log of changes to be made. The log may contain directives such as:
cat a variable",
dog for cat", and
dog as a variable"8
You need to commit a log before examining the slots that the scheduled
changes affect. frame-update! and frame-finalize! do that. The
! functions (such as frame-variables! and ifget!) are the
same as their non-! counterparts, but they update the frames they
operate on. Generally, updating a frame that has previously been updated is
a low-cost operation that does not mutate the frame.
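The idea of scheduling changes and committing them later can be sketched in Python as below. Everything here is hypothetical: the class PendingFrame, its operation names, and the use of base-form words are stand-ins chosen for illustration, not LEARNER's actual log format.

```python
class PendingFrame:
    """Toy frame that logs changes and applies them only when finalized."""

    def __init__(self, words):
        self.words = list(words)
        self.variables = set()
        self.log = []                      # scheduled, not yet applied, changes

    def schedule(self, op, *args):
        self.log.append((op, args))
        return self

    def finalize(self):
        """Commit the log; with an empty log this does nothing."""
        for op, args in self.log:
            if op == "make-variable":
                self.variables.add(args[0])
            elif op == "unmake-variable":
                self.variables.discard(args[0])
            elif op == "substitute":
                old, new = args
                self.words = [new if w == old else w for w in self.words]
        self.log = []
        return self

    def what(self):
        """Render the frame, showing variables between question marks."""
        return " ".join("?%s?" % w if w in self.variables else w
                        for w in self.words)

frame = PendingFrame(["cat", "eat", "mouse"])
frame.schedule("make-variable", "cat").schedule("substitute", "mouse", "rat")
before = frame.what()          # the log has not been committed yet
after = frame.finalize().what()
```

Reading the frame before finalize still shows the old state, which is why the log has to be committed before the affected slots are examined.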
| variable? frame | Function |
Returns #t for a template and #f for an assertion.
|
| frame-atom-is-substitution frame atom | Function |
| Given a frame and an atom, this returns a proto-frame where atom is now a variable (i.e. candidate for substitution). |
| frame-atoms-are-substitutions frame atoms | Function |
| Given a frame and a list of atoms, this returns a proto-frame where atoms are now variables (i.e. candidates for substitution). |
| frame-atom-is-not-substitution frame atom | Function |
| Given a frame and an atom, this returns a proto-frame where atom is no longer a candidate for substitution. |
| frame-atoms-are-not-substitutions frame atoms | Function |
| Given a frame and a list of atoms, this returns a proto-frame where atoms are no longer candidates for substitution. |
| frame-substitute-atom frame atom-from args | Function |
| Given a frame, an atom-from, and the substitution args (which in the simplest case is just atom-to), returns a proto-frame where that substitution has been scheduled. |
| frame-substitute-atoms frame atom-map | Function |
| Given a frame and an atom-map, returns a proto-frame where these substitutions have been scheduled. |
The LEARNER currently uses six index files:
assertions.index
'probability
close to .5).
inferences-lhs.index
inferences-rhs.index
memoization.index
templates.index
types.index
'type and 'what.
Uframes are indexed by the pairs (Wframe . #t) and by the pairs
(Wframe . link-with-direction). For example, the
Uframe (the 'uframe "cats eat mice") will be indexed
by9:
(@3af3492b/57802a"WFRAME: eat" . #t)
(@3af3492b/57802a"WFRAME: eat" . S-)
(@3af3492b/57802a"WFRAME: eat" . O+)
(@3af3492b/57802d"WFRAME: mouse" . #t)
(@3af3492b/57802d"WFRAME: mouse" . O-)
(@3af3492b/578014"WFRAME: cat" . #t)
(@3af3492b/578014"WFRAME: cat" . W-)
(@3af3492b/578014"WFRAME: cat" . S+)
A pair such as (@3af3492b/57802a"WFRAME: eat" . S-) means that this
frame can be retrieved by a low-level FDscript call
(find-frames assertions-index (the 'wframe "eat") 'S-)
or by a higher-level LEARNER call
(all-relevant-assertions (the 'wframe "eat") 'S-).
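The way such index keys follow from a parse can be sketched in Python. The sketch rests on assumptions read off the example listing rather than from LEARNER's source: words are reduced to base forms, link labels are cut down to their uppercase part ("Sp" becomes "S"), the word on the left of a link gets the "+" direction and the word on the right gets "-", and the "W-" entry comes from a link to the left wall (written here as a hypothetical "Wd" link whose left end is None).

```python
import re

def uframe_index_keys(links):
    """Build (word, True) and (word, link-with-direction) keys from links.

    links holds (label, left_base_form, right_base_form); None is the wall.
    """
    keys = []

    def add(key):
        if key not in keys:                  # keep each key only once
            keys.append(key)

    for label, left, right in links:
        base = re.match(r"[A-Z]+", label).group(0)   # "Sp" -> "S"
        if left is not None:
            add((left, True))
            add((left, base + "+"))          # left end of the link
        if right is not None:
            add((right, True))
            add((right, base + "-"))         # right end of the link
    return keys

links = [("Wd", None, "cat"), ("Sp", "cat", "eat"), ("Op", "eat", "mouse")]
keys = uframe_index_keys(links)
```

For this input the eight keys produced correspond to the eight pairs listed above, with plain strings standing in for the Wframes.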
Slotted frames are indexed by the pair (slot-name . wframe).
For example, an SRVframe "[cat|have|tail]" will have
('subject . @3af3492b/578014"WFRAME: cat") as one of its indices.
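A comparable Python sketch for slotted frames follows. The dictionary standing in for an SRVframe, the function slotted_index_keys, and the in-memory index are all illustrative assumptions.

```python
from collections import defaultdict

def slotted_index_keys(frame):
    """One (slot-name, atom) key per slot of a slotted frame."""
    return [(slot, atom) for slot, atom in frame.items()]

# A stand-in for the SRVframe "[cat|have|tail]".
srv = {"subject": "cat", "relation": "have", "value": "tail"}

index = defaultdict(list)
for key in slotted_index_keys(srv):
    index[key].append(srv)                # file the frame under each of its keys

hits = index[("subject", "cat")]          # retrieve frames whose subject is cat
```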
An Iframe is basically a "rule" of a rule-based system. It has
the two structure slots 'left-terms and 'right-term. These
hold its LHS (left hand side, preconditions) and RHS
(right hand side, postcondition).
An Iframe's LHS is indexed in iframes-lhs.index and
its RHS in iframes-rhs.index.
The 'left-terms value is a list of frame templates, and the
'right-term value is a single frame, typically a template (having
an assertion is allowed, but special provisions not to index it in
assertions need to be made).
All of the terms on the left hand side must contain "substitutions" - atoms where anything can be plugged in (see Variables).
For a rule, when the LHS is satisfied, the RHS should also hold true.
Here is a 'what of a sample Iframe:
"a ?cat? is sleeping => a ?cat? is awake <not>"
This Iframe implies that if X is sleeping, X is not awake.
This frame can be instantiated with a call
(frame-plug-in-atom
 (the 'Iframe "a ?cat? is sleeping => a ?cat? is awake [p=0]")
 (the 'Wframe "cat")
 (the 'Wframe "dog"))
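What instantiating a rule amounts to can be modelled with a short Python sketch. It is a string-level toy under several assumptions: a rule is a tuple of LHS templates, one RHS template, and the RHS probability; the helper names are hypothetical; and infer_one_sketch handles only rules with a single LHS term containing a single variable. LEARNER's own functions operate on frames, not on strings.

```python
import re

def plug_in_atom(template, old_atom, new_atom):
    """Replace the variable ?old_atom? by the plain atom new_atom."""
    return template.replace("?%s?" % old_atom, new_atom)

def instantiate(rule, old_atom, new_atom):
    """Plug new_atom into every term of the rule."""
    lhs, rhs, probability = rule
    return ([plug_in_atom(t, old_atom, new_atom) for t in lhs],
            plug_in_atom(rhs, old_atom, new_atom),
            probability)

def infer_one_sketch(rule, premise):
    """If premise fits the single LHS template, return (conclusion, probability)."""
    lhs, _, _ = rule
    prefix, var, suffix = re.match(r"^(.*)\?(\w+)\?(.*)$", lhs[0]).groups()
    fits = (premise.startswith(prefix) and premise.endswith(suffix)
            and len(premise) > len(prefix) + len(suffix))
    if not fits:
        return None
    atom = premise[len(prefix):len(premise) - len(suffix)]
    _, conclusion, probability = instantiate(rule, var, atom)
    return conclusion, probability

# "a ?cat? is sleeping => a ?cat? is awake" with the RHS held at probability 0.
rule = (["a ?cat? is sleeping"], "a ?cat? is awake", 0.0)
instance = instantiate(rule, "cat", "dog")
conclusion = infer_one_sketch(rule, "a dog is sleeping")
```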
See Asking Learner Questions, for more on inputting Iframes.
LEARNER provides functions for creating, retrieving, and doing inference with Iframes.
| say-Iframe! lhs-expr rhs-expr . [other-slots-values] | Function |
Given lhs-expr, rhs-expr, and optional other-slots-values,
adds and returns the Iframe.
|
| relevant-iframes side frame | Function |
Searches the database for all the Iframes that contain a frame matching
frame on the specified side, which must be either 'lhs
or 'rhs. The function relevant-iframes returns a choice of
frames, and may return frames that do not match frame, but look like
they would.
|
| say-iframe!-with-string-output | Function |
A version of say-iframe! that returns its output as a string rather
than printing it out.
|
| infer-one-iframe iframe premise-frame | Function |
Given an iframe and a premise-frame, returns the frame inferred on
the RHS. The iframe may have an additional test (an arbitrary
FDscript expression) specified. If present, that test has to evaluate to a
true value in order for the inference to happen. This also computes the
RHS frame's 'ask-string if such a slot is present.
|
| infer-from premise-frame [iframe-filtering-predicate] | Function |
|
Given a premise-frame and an optional iframe-filtering-predicate,
applies all relevant iframes (that pass the one-place
iframe-filtering-predicate if it is specified).
Returns |
LEARNER is, in a way, a bet on people. Much of LEARNER's power comes from having access to an "oracle" - the contributing population collectively holds an enormous body of diverse knowledge. As people contribute it, LEARNER hangs on to it, ever rising in its sophistication.
The opportunity to solve hard problems with incremental contributions of many has not existed historically. One of the project's goals is to explore and to learn more about the collaborative approach to solving problems.
The demands of the project are diverse, and successful growth of the LEARNER requires contributions on different levels.
Basically, the needs form a pyramid:
core
write plug-ins
contribute inference rules
contribute and verify simple assertions
Luckily, we can expect the contributors to naturally be distributed in roughly such a pyramid as well.
This is because the prior experience required is an inverse pyramid:
some Lisp / Scheme experience, knowledge representation / AI background
knowledge representation / AI background or interest, some programming
analytical skills, familiarity with reasoning
possess common sense
The amount of effort required to contribute to the lowest level of the pyramid is also much less (and has a smaller overall effect) than that required to contribute to a higher level.
To put it another way, the natural distribution of contributors is likely to
roughly match the need.
An upcoming feature of the LEARNER is persistent (on disk) storage of the assertions LEARNER would like to find out. This repository will bear the dramatic name "the purgatory". There are several reasons a frame can be added to the purgatory:
You can contact the maintainers at timc@alum.mit.edu.
When reporting a problem, please include as much of the following as is relevant so that we can address it:
Your patches are also welcome and will be incorporated in a timely fashion.
There are some simpler systems on the web for your enjoyment.
The concept of the LEARNER was originally developed by Tim and Anatoli Chklovski, as were the similarity, analogy, and question-pruning algorithms.
Matthew Fredette and Tim Chklovski have cooperated on the LEARNER implementation.
We are grateful to Alex Vasserman, who has contributed some Link Grammar Parser glue code.
We are grateful to Push Singh for his feedback and sharing his experiences on the OpenMind Commonsense project with us.
We are also grateful to Kenneth Haase for his continued support and development of FramerD and his responsiveness in personal communication.
add-to-kb: interface API functions
add-to-kb-end: interface API functions
add-to-kb-end-with-string-output: interface API functions
add-to-kb-start: interface API functions
add-to-kb-with-string-output: interface API functions
assertion-main-topic: atoms and topics
assertion-topics: atoms and topics
corresponding-item: similarity
find-answers: interface API functions
find-answers-with-string-output: interface API functions
frame-atom-is-not-substitution: substitution functions
frame-atom-is-substitution: substitution functions
frame-atoms-are-not-substitutions: substitution functions
frame-atoms-are-substitutions: substitution functions
frame-substitute-atom: substitution functions
frame-substitute-atoms: substitution functions
infer-from: Iframe functions
infer-one-iframe: Iframe functions
relevant-iframes: Iframe functions
say: interface API functions
say-Iframe!: Iframe functions
say-iframe!-with-string-output: Iframe functions
say-topic: interface API functions
say-topic-with-string-output: interface API functions
scored-assertions-on-topic: similarity
show-topic-summary: interface API functions
show-topic-summary-with-string-output: interface API functions
show-topics: interface API functions
show-topics-with-string-output: interface API functions
similar-uframes-of->string: interface API functions, similarity
similar-Uframes-set: similarity
term->protoframe: terms
topic-absolute-frequency-weight: atoms and topics
topic-frequency: atoms and topics
topic-total-mentions: atoms and topics
variable?: substitution functions
In line with Marvin Minsky's general thesis in his seminal Society of Mind.
It is possible to override this with the
--with-brico option to LEARNER's configure script - see
the installation item for the LEARNER for more information.
See the FDscript documentation for more details.
This is a simplified interface for entering inference frames. See Teaching Learner "Thinking Rules", for the full set of options in specifying Iframes.
This said that a car does not have a tail because a car is not an animal. If you prefer, you can enter the equivalent a car does not have a tail <Yes> because a car is an animal <No> - negating the statement or using the radio button does the same thing.
This statement is stored negated
as "a cat can bark" with 'probability 0.
Negated statements can also be entered as (say "a cat can bark" 0).
The latter filter prevents creating strange assertions such as "mice eat mice".
The term substitution is used interchangeably with variable.
The exact numbers after the @ signs will vary.