2. Position
InfoSleuth is being used as a significant component of EDEN, a
collaborative effort of several government and non-government
agencies to allow common access to their combined store of
environmental data related to remediation and control techniques.
This problem is cast as a typical multidatabase problem. The
InfoSleuth agents use focused ontologies as a semantic framework for
integrating information from multiple sources. Useful general-purpose
agents include those that extract information from individual
databases, relational query processors, and domain-specific value
mappers. The current EDEN demonstration enables multiple
environmental databases throughout the US and Europe to be accessed
via a web browser.
Under a joint project with the US Department of Agriculture (USDA),
MCC has developed an application that supports the laboratory
protocols required to interpret genetic material taken from
livestock. DNA sequencing machines produce imaged sequence files that
must undergo a series of analysis steps, such as conversion to a
sequence of ATGC bases, extraction of component sequences (vectors),
and comparison with other sequences or genes from livestock and other
species that have been entered into genomic databases from the
worldwide genetic research community. InfoSleuth agents automate this
process. This application includes a "workflow" or "planning"
component not yet found in the EDEN application described above,
which places stronger demands on the longevity and
cooperative behaviors of agents.
A third application of InfoSleuth has been acquiring,
integrating, and monitoring technical competitive intelligence (CI)
information from open sources. A primary activity in the CI domain is
to correlate information from open sources, discover trends and
associations across these sources, and detect significant shifts in
trends over time. Thus, this application focuses more on analysis and
extraction of information.
2.2 Dynamic Systems and Instability
2.3 `e' Properties for Agent-Based Systems
References
2.1 Background
Agent systems are ideal venues for enabling the integration and
interoperation of diverse information sources, as evidenced by our
work in InfoSleuth. InfoSleuth [1] has been
successfully used for integration of heterogeneous databases. Its
dynamic architecture, based on semantic brokering and a global
ontology, is well-suited to this type of application, even when the
component databases may leave the system at unpredictable
times. InfoSleuth has also been successfully deployed for applications
such as business intelligence and patent validation, which require
analysis of large volumes of text data. However, as InfoSleuth has
matured, we have migrated the InfoSleuth technology to support
increasingly long-lived and demanding information-oriented tasks, with
mixed success. These applications are typically less tolerant of
occasional system failures.
Dynamic agent systems, whether information-oriented or not, operate in
a very unstable environment. On the agent side, we have agents with
varying capabilities entering and leaving the agent community,
potentially during the execution of some relevant tasks. These
arrivals and departures may be deliberate, but there may also be
unexpected events or faults. For instance, the hardware, operating
system or virtual machine may fail or operate incorrectly.
Alternatively, an agent that implements a new or improved service may
enter the system, and that service may improve the quality of the
current tasks. Also, a remote agent may respond with some unexpected or
incorrect result.
On the user side, the users not only bring their usual flakiness, but
the agent system compounds it by blinding them, to some degree, to
its actual capabilities and internal operation. For
example, users may under-specify requests, resulting in long processing
times. Alternatively, users may specify inappropriate requests, due
to a lack of understanding of the current capabilities of the agent
system. Also, users may change their minds on exactly what they want.
The agent system may not give the users adequate feedback to be able
to correct these problems.
We present several design considerations with respect to the
reliability issues discussed above. An agent-oriented information
system should be eclectic, in that its pieces can be put
together in different ways to satisfy different
needs. It should be ergonomic, in that adjusting the
agent-based system to fit different needs is easy and
comfortable. It should be exposed, in that the internal
operation of the agent-based system can be explained to its
users as needed.
Eclectic: A typical agent-oriented information system should be
able to service a wide variety of information tasks, including
short-term queries over multiple and diverse information sources,
subscriptions to classes of information with personalized information
filtering, and ongoing comparisons and trend analysis. Because many
information-processing functions are usable across several of
these areas, agents should be able to fit themselves together in
different ways to satisfy different needs as specified by different
user requests.
One issue to be addressed is that tasks follow dynamic patterns
of interaction throughout an agent system -- the system
must be able to reassemble itself in different ways to satisfy
different needs. This requires some level of planning and/or process
enactment within the agent system, either explicit or implied.
The second issue is that, since different agents will fill similar
tasks at different times, the agent-based system must provide a fairly
sophisticated ability to match agents to required tasks. This may be
combined with an ability within certain agents to negotiate over
the terms for executing specific tasks.
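The matching requirement above can be illustrated with a minimal sketch. It assumes a toy broker in which agents advertise a service name plus the ontology classes they cover. The names `AgentAd`, `Broker`, and the example agents are hypothetical and are not InfoSleuth's actual interfaces; a real broker would reason over much richer descriptions and might negotiate terms.

```python
from dataclasses import dataclass, field


@dataclass
class AgentAd:
    """Hypothetical advertisement an agent registers with a broker."""
    name: str
    service: str                      # e.g. "query", "trend_analysis"
    ontology_classes: set = field(default_factory=set)


class Broker:
    """Toy broker: keeps current advertisements and matches requests to them."""

    def __init__(self):
        self._ads = {}

    def advertise(self, ad):
        self._ads[ad.name] = ad

    def withdraw(self, name):
        # Agents may leave at any time; withdrawing an unknown agent is a no-op.
        self._ads.pop(name, None)

    def match(self, service, needed_classes):
        """Return names of agents offering `service` that cover all needed classes."""
        needed = set(needed_classes)
        return sorted(
            ad.name
            for ad in self._ads.values()
            if ad.service == service and needed <= ad.ontology_classes
        )


broker = Broker()
broker.advertise(AgentAd("db_agent_1", "query", {"site", "contaminant"}))
broker.advertise(AgentAd("db_agent_2", "query", {"site"}))
broker.advertise(AgentAd("trend_agent", "trend_analysis", {"contaminant"}))

first = broker.match("query", {"site", "contaminant"})
broker.withdraw("db_agent_1")         # the only fully capable agent departs
second = broker.match("query", {"site", "contaminant"})
print(first, second)
```

Because matching is recomputed against the current advertisements, the same request can be served by different agents at different times, or by none once the only capable agent has left.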
Ergonomic: For an agent system to be agile, it must allow for
easy adjustment of the agents to fit the individualized needs of the
user. This in turn means that agent systems must conform to common
paradigms of interaction so that they are `good fits' with respect
to each other, and do not cause unnecessary stress on individual
agents. The fit must hold both when the agents interact without
faults and when faults occur.
At least two issues need to be addressed consistently by the agents in
the agent community. One is when agents keep their results in memory,
and when they persistently store intermediate results. In
situations where agents operate primarily in memory, both the
agent that generates a result and the agent that uses that result as
input must be up at the time the result is transmitted. It then makes
sense to consider various wait-and-retry strategies. These issues do
not impact more distributed systems that make the intermediate results
persistent, e.g. by posting them to a virtual warehouse that other
agents can access.
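One possible wait-and-retry strategy for the in-memory case is sketched below, assuming exponential backoff. The function `send_with_retry`, the exception `AgentUnavailable`, and the simulated consumer are hypothetical names introduced for illustration only; the text does not prescribe a particular strategy.

```python
import time


class AgentUnavailable(Exception):
    """Raised when the receiving agent is not currently up."""


def send_with_retry(send, result, attempts=4, initial_delay=0.01, sleep=time.sleep):
    """Try to deliver `result`, waiting and retrying while the consumer is down.

    `send` is a hypothetical callable that raises AgentUnavailable on failure.
    The delay doubles after each failed attempt (exponential backoff).
    Returns the number of attempts used; re-raises after the final failure.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            send(result)
            return attempt
        except AgentUnavailable:
            if attempt == attempts:
                raise
            sleep(delay)
            delay *= 2


# Simulated consumer that is down for the first two delivery attempts.
received = []
calls = {"n": 0}


def flaky_send(result):
    calls["n"] += 1
    if calls["n"] <= 2:
        raise AgentUnavailable("consumer not up")
    received.append(result)


# A no-op sleep keeps the demonstration instantaneous.
used = send_with_retry(flaky_send, {"rows": 3}, sleep=lambda _: None)
print(used, received)
```

The retry budget bounds how long the producing agent holds the result in memory; once it is exhausted the failure is surfaced to the caller rather than silently dropped.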
The second issue is that there can be `transactional groups' of
activities. If there is no semantic requirement that a set of
operations is atomic, then it is relatively easy to design a model in
which the task can be restarted to pick up where it left off by
looking at the warehouse contents. Conversely, if there are
transactional requirements (e.g. several groups of inserts must all be
successfully completed) then the agents' control model will also
require transaction management, if only to deal with a situation where
an agent goes down in the middle of a transaction.
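The contrast between restartable tasks and transactional groups can be sketched as follows. This is a minimal illustration that assumes an in-memory stand-in for the warehouse; `Warehouse`, `run_restartable`, `put_all`, and the step names are hypothetical, and a real system would need durable storage and proper recovery logging.

```python
class Warehouse:
    """Toy stand-in for a persistent store of intermediate results."""

    def __init__(self):
        self.committed = {}

    def has(self, key):
        return key in self.committed

    def put(self, key, value):
        self.committed[key] = value

    def put_all(self, items):
        # All-or-nothing: stage every value first, publish only if none failed.
        staged = {}
        for key, produce in items:
            staged[key] = produce()   # may raise; nothing is committed yet
        self.committed.update(staged)


def run_restartable(steps, warehouse):
    """Run non-atomic steps in order, skipping any whose result is already stored."""
    executed = []
    for key, produce in steps:
        if warehouse.has(key):
            continue                  # completed before the restart
        warehouse.put(key, produce())
        executed.append(key)
    return executed


# Non-transactional case: "extract" survived an earlier crash, so only the
# remaining step runs when the task is restarted.
wh = Warehouse()
wh.put("extract", ["row1", "row2"])
steps = [
    ("extract", lambda: ["row1", "row2"]),
    ("map_values", lambda: ["ROW1", "ROW2"]),
]
resumed = run_restartable(steps, wh)


# Transactional case: a failure in one insert must leave none of them visible.
def failing_insert():
    raise RuntimeError("agent went down mid-transaction")


try:
    wh.put_all([("insert_a", lambda: 1), ("insert_b", failing_insert)])
except RuntimeError:
    pass

print(resumed, sorted(wh.committed))
```

In the restartable case the warehouse contents alone determine where to resume; in the transactional case the partially completed group leaves no trace, so the whole group can be retried.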
Exposed: Users must be exposed at an appropriate level to what
functionality is currently available to them before, during, and
after the execution of their tasks.
With this, there are also two issues that need to be addressed. One is
that the users may be unaware of the exact nature of the currently
and potentially available agents in the agent community, and may need
this information before they can specify tasks to the system in a
meaningful way. This awareness is crucial, especially in a dynamic agent system,
to keep the users from under-specifying or inappropriately specifying
their requests. Also, it serves as a good venue for notifying the user
of meaningful "improvements" to the system.
The second issue is that the user may receive a response concerning
some task, but that response may make no sense to them. Dealing with
this in an agent system is complicated because the system itself has
managed the whens, wheres and hows of the actual processing of the
request, leaving the user in the dark. If the agent system can explain
what happened, this facilitates both the ability of the users to make
intelligent use of the system, and the users' comfort level with the
system.
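One way to make such explanations possible is to record, per task, which agent did what and why. The sketch below assumes a simple append-only trace; `TaskTrace`, `TraceEvent`, and the recorded events are hypothetical and only indicate the kind of information that might be kept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEvent:
    step: int
    agent: str
    action: str
    detail: str


class TaskTrace:
    """Records which agent did what for one task so it can be explained later."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.events = []

    def record(self, agent, action, detail=""):
        self.events.append(TraceEvent(len(self.events) + 1, agent, action, detail))

    def explain(self):
        """Render the recorded events as a short, user-readable account."""
        lines = [f"Task {self.task_id}:"]
        for e in self.events:
            suffix = f" ({e.detail})" if e.detail else ""
            lines.append(f"  {e.step}. {e.agent} {e.action}{suffix}")
        return "\n".join(lines)


trace = TaskTrace("q-17")
trace.record("broker", "selected db_agent_2", "db_agent_1 unavailable")
trace.record("db_agent_2", "returned 0 rows", "no sites match the request")
print(trace.explain())
```

With a trace like this, an empty answer can be accompanied by an account of which agents were chosen and why, rather than leaving the user to guess.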
In light of the previous discussion, we propose the following research
questions:
Eclectic: What are the best methods for planning and/or
process enactment?
What are good paradigms for describing and matching information
processing services?
What impact does the need to fit agents together have on agent
communication languages and conversations?
Ergonomic: How do you handle issues involved with
longer-running tasks that may outlive any specific agent involved in
the task? How do you deal gracefully with failure? What types of
transaction/recovery paradigms work in this environment?
Under what circumstances is it best to communicate intermediate
results directly between agents, as opposed to making the results
persistent?
Exposed: At what level is providing the user an
understanding of the agent system behavior helpful, and how is this
information best presented to the user?
What information needs to be maintained during task execution to
provide the user with adequate explanation of what happened?
[1] M. Nodine et al., Active information gathering in InfoSleuth,
International Journal of Cooperative Information Systems 9(1/2):3-28, 2000.