Note: This document is relevant for the versions of Twisted that were current before IPC10. Even at the time of its release, errata were issued to make it current. It remains unaltered for historical purposes but is no longer accurate.
Twisted is a framework for writing asynchronous, event-driven networked programs in Python -- both clients and servers. In addition to abstractions for low-level system calls like select(2) and socket(2), it also includes a large number of utility functions and classes, which make writing new servers easy. Twisted includes support for popular network protocols like HTTP and SMTP, support for GUI frameworks like GTK+/GNOME and Tk, and many other classes designed to make network programs easy. Whenever possible, Twisted uses Python's introspection facilities to save the client programmer as much work as possible. Even though Twisted is still a work in progress, it is already usable for production systems -- it can be used to bring up a Web server, a mail server or an IRC server in a matter of minutes, and requires almost no configuration.
Keywords: internet, network, framework, event-based, asynchronous
Python lends itself to writing frameworks. Python has a simple class model, which facilitates inheritance. It has dynamic typing, which means code needs to assume less. Python also has built-in memory management, which means application code does not need to track ownership. Thus, when writing a new application, a programmer often finds himself writing a framework to make writing this kind of application easier. Twisted evolved from the need to write high-performance interoperable servers in Python, and to make them easy to use (and difficult to use incorrectly).
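There are three ways to write network programs: handle each connection in a separate process, handle each connection in a separate thread, or use non-blocking system calls to handle many connections in one thread.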
When dealing with many connections in one thread, the scheduling is the responsibility of the application, not the operating system, and is usually implemented by calling a registered function when each connection is ready for reading or writing -- commonly known as event-driven, or callback-based, programming.
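The callback style described above can be sketched with nothing but the standard library. This is an illustration in modern Python, not Twisted's event loop: the names `on_readable` and `run_once` are invented for the example, and a connected socket pair stands in for a real network connection.

```python
import selectors
import socket

selector = selectors.DefaultSelector()
received = []  # filled in by the callback below


def on_readable(conn):
    """Callback invoked by the loop when conn is ready for reading."""
    data = conn.recv(1024)
    if data:
        received.append(data)
    else:
        # An empty read means the peer closed the connection.
        selector.unregister(conn)
        conn.close()


def run_once(timeout=1.0):
    """One loop iteration: wait for ready connections, call their callbacks."""
    for key, _events in selector.select(timeout):
        callback = key.data
        callback(key.fileobj)


# A connected socket pair stands in for a real network connection.
left, right = socket.socketpair()
right.setblocking(False)
selector.register(right, selectors.EVENT_READ, on_readable)

left.sendall(b"hello")
run_once()   # the callback reads the data
left.close()
run_once()   # the callback sees end-of-file and unregisters the socket
```

The application, not the operating system, decides which callback runs next; each callback does a small amount of work and returns to the loop.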
Since multi-threaded programming is often tricky, even with high level abstractions, and since forking Python processes has many disadvantages, like Python's reference counting not playing well with copy-on-write and problems with shared state, it was felt the best option was an event-driven framework. A benefit of such an approach is that by letting other event-driven frameworks take over the main loop, server and client code are essentially the same -- making peer-to-peer a reality. While Twisted includes its own event loop, Twisted can already interoperate with GTK+'s and Tk's mainloops, as well as provide an emulation of event-based I/O for Jython (specific support for the Swing toolkit is planned). Client code is never aware of the loop it is running under, as long as it is using Twisted's interface for registering for interesting events.
Some examples of programs which were written using the Twisted framework are twisted.web (a web server), twisted.mail (a mail server, supporting both SMTP and POP3, as well as relaying), twisted.words (a chat application supporting integration between a variety of IM protocols, like IRC, AOL Instant Messenger's TOC and Perspective Broker, a remote-object protocol native to Twisted), im (an instant messenger which connects to twisted.words) and faucet (a GUI client for the twisted.reality interactive-fiction framework).
Twisted can be useful for any network or GUI application
written in Python.
However, event-driven programming still contains some tricky aspects. As each callback must finish as soon as possible, it is not possible to keep persistent state in function-local variables. In addition, some programming techniques, such as recursion, are impossible to use. Event-driven programming has a reputation of being hard to use due to the frequent need to write state machines. Twisted was built with the assumption that with the right library, event-driven programming is easier than multi-threaded programming. Twisted aims to be that library.
Twisted includes both high-level and low-level support for protocols. Most protocol implementations in Twisted live in a package which tries to implement "mechanisms, not policy". On top of those implementations, Twisted includes usable implementations of those protocols: for example, connecting the abstract HTTP protocol handler to a concrete resource-tree, or connecting the abstract mail protocol handler to deliver mail to maildirs according to domains. Twisted tries to come with as much functionality as possible out of the box, while not constraining a programmer to a choice between using a possibly-inappropriate class and rewriting the non-interesting parts himself.
Twisted also includes Perspective Broker, a simple remote-object framework, which allows Twisted servers to be divided into separate processes as the end deployer (rather than the original programmer) finds most convenient. This allows, for example, Twisted web servers to pass requests for specific URLs to co-operating servers, so permissions are granted according to the needs of the specific application, instead of being forced into giving all the applications all permissions. The co-operation is truly symmetrical, although typical deployments (such as the one which the Twisted web site itself uses) use a master/slave relationship.
Twisted is not alone in the niche of a Python network framework. One of the better known frameworks is Medusa. Medusa is used, among other things, as Zope's native server serving HTTP, FTP and other protocols. However, Medusa is no longer under active development, and the Twisted development team had a number of goals which would necessitate a rewrite of large portions of Medusa. Twisted separates protocols from the underlying transport layer. This separation has the advantages of reusability (for example, using the same clients and servers over SSL) and testability (because it is easy to test the protocol with a much lighter test harness) among others. Twisted also has a very flexible main-loop which can interoperate with third-party main-loops, making it usable in GUI programs too.
Python comes out of the box with "batteries included". However, it seems that many Python projects rewrite some basic parts: logging to files, parsing options and high level interfaces to reflection. When the Twisted project found itself rewriting those, it moved them into a separate subpackage, which does not depend on the rest of the Twisted framework. Hopefully, people will use twisted.python more and solve interesting problems instead. Indeed, it is one of Twisted's goals to serve as a repository for useful Python code.
One useful module is twisted.python.reflect, which has methods like prefixedMethods, which returns all methods with a specific prefix. Even though some modules in Python itself implement such functionality (notably, urllib2), they do not expose it as a function usable by outside code. Another useful module is twisted.python.hook, which can add pre-hooks and post-hooks to methods in classes.
    # Add all method names beginning with opt_ to the given
    # dictionary. This cannot be done with dir(), since
    # it does not search in superclasses
    dct = {}
    reflect.addMethodNamesToDict(self.__class__, dct, "opt_")

    # Sum up all lists, in the given class and superclasses,
    # which have a given name. This gives us "different class
    # semantics": attributes do not override, but rather append
    flags = []
    reflect.accumulateClassList(self.__class__, 'optFlags', flags)

    # Add lock-acquire and lock-release to all methods which
    # are not multi-thread safe
    for methodName in klass.synchronized:
        hook.addPre(klass, methodName, _synchPre)
        hook.addPost(klass, methodName, _synchPost)

Listing 1: Using twisted.python.reflect and twisted.python.hook
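To make the reflection helpers less abstract, here is a minimal sketch of how such functions could be written in modern Python. These are not the twisted.python.reflect or twisted.python.hook implementations: the names `prefixed_method_names`, `accumulate_class_list` and `add_pre_post` are invented for the example, and running the post-hook even when the method raises is an assumption of this sketch.

```python
def prefixed_method_names(cls, prefix):
    """Return the suffixes of all methods of cls, inherited ones included,
    whose names start with prefix."""
    names = set()
    for klass in cls.__mro__:
        for name, value in vars(klass).items():
            if name.startswith(prefix) and callable(value):
                names.add(name[len(prefix):])
    return sorted(names)


def accumulate_class_list(cls, attr):
    """Concatenate the list named attr from every base class and cls itself,
    so subclasses append to the inherited list instead of overriding it."""
    result = []
    for klass in reversed(cls.__mro__):
        result.extend(vars(klass).get(attr, []))
    return result


def add_pre_post(cls, method_name, pre, post):
    """Replace cls.method_name with a wrapper that calls pre, then the
    original method, then post."""
    original = getattr(cls, method_name)

    def wrapper(self, *args, **kwargs):
        pre(self)
        try:
            return original(self, *args, **kwargs)
        finally:
            post(self)

    setattr(cls, method_name, wrapper)


class Base:
    optFlags = [["quiet", "q"]]

    def opt_verbose(self):
        pass


class Child(Base):
    optFlags = [["profile", "p"]]

    def opt_plugin(self, name):
        pass

    def run(self):
        return "ran"
```

Walking the method resolution order by hand is what makes inherited `opt_` methods and inherited list entries visible, which is the point of the "different class semantics" mentioned in the listing.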
The twisted.python subpackage also contains a high-level interface to getopt which supplies as much power as plain getopt while avoiding long if/elif chains and making many common cases easier to use. It uses the reflection interfaces in twisted.python.reflect to find which options the class is interested in, and constructs the argument to getopt. Since in the common case options' values are just saved in instance attributes, it is very easy to indicate interest in such options. However, for the cases where custom code needs to be run for an option (for example, counting how many -v options were given to indicate verbosity level), it will call a method named after the option.
    import os
    import sys

    from twisted.python import usage

    class ServerOptions(usage.Options):
        # These are (short and long) options which
        # have no argument. The corresponding attribute
        # will be true iff this option was given
        optFlags = [['nodaemon','n'],
                    ['profile','p'],
                    ['threaded','t'],
                    ['quiet','q'],
                    ['no_save','o']]
        # These are options which require an argument
        # The default is used if no such option was given
        # Note: since options can only have string arguments,
        # putting a non-string here is a reliable way to detect
        # whether the option was given
        optStrings = [['logfile','l',None],
                      ['file','f','twistd.tap'],
                      ['python','y',''],
                      ['pidfile','','twistd.pid'],
                      ['rundir','d','.']]

        # For options which can be given multiple times
        # or have other unusual semantics, a method will be called
        # Twisted assumes that the option needs an argument if and only if
        # the method is defined to accept an argument.
        def opt_plugin(self, pkgname):
            pkg = __import__(pkgname)
            self.python = os.path.join(os.path.dirname(
                             os.path.abspath(pkg.__file__)), 'config.tac')

        # Most long options based on methods are aliased to short
        # options. If there is only one letter, Twisted knows it is a short
        # option, so it is "-g", not "--g"
        opt_g = opt_plugin

    try:
        config = ServerOptions()
        config.parseOptions()
    except usage.error, ue:
        print "%s: %s" % (sys.argv[0], ue)
        sys.exit(1)

Listing 2: twistd's Usage Code
Unlike getopt, Twisted has a useful abstraction for the non-option arguments: they are passed as arguments to the parsedArgs method. This means too many arguments, or too few, will cause a usage error, which will be flagged. If an unknown number of arguments is desired, explicitly using a tuple catch-all argument will work.
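The idea of declaring options as class attributes and letting the method signature check the positional arguments can be sketched on top of the standard getopt module. This is a hypothetical miniature, not twisted.python.usage: the class `Options`, its `opt_flags` and `opt_strings` attributes and the `parse_args` hook are names chosen for the example, and turning every `TypeError` from `parse_args` into a usage error is a simplification.

```python
import getopt


class Options:
    """Hypothetical miniature of a declarative option parser built on getopt."""

    opt_flags = []    # [long, short] pairs for options without an argument
    opt_strings = []  # [long, short, default] triples for options with one

    def __init__(self):
        for long_name, _short in self.opt_flags:
            setattr(self, long_name, False)
        for long_name, _short, default in self.opt_strings:
            setattr(self, long_name, default)

    def parse_options(self, argv):
        short_spec, long_spec, targets = "", [], {}
        for long_name, short in self.opt_flags:
            long_spec.append(long_name)
            targets["--" + long_name] = (long_name, True)
            if short:
                short_spec += short
                targets["-" + short] = (long_name, True)
        for long_name, short, _default in self.opt_strings:
            long_spec.append(long_name + "=")
            targets["--" + long_name] = (long_name, False)
            if short:
                short_spec += short + ":"
                targets["-" + short] = (long_name, False)
        opts, args = getopt.getopt(argv, short_spec, long_spec)
        for opt, value in opts:
            attr, is_flag = targets[opt]
            setattr(self, attr, True if is_flag else value)
        # Positional arguments are matched against parse_args' signature,
        # so too many or too few are reported as a usage error.
        try:
            self.parse_args(*args)
        except TypeError as exc:
            raise getopt.GetoptError(str(exc))

    def parse_args(self):
        """Override to accept positional arguments."""


class CopyOptions(Options):
    opt_flags = [["verbose", "v"]]
    opt_strings = [["mode", "m", "text"]]

    def parse_args(self, source, destination):
        self.source, self.destination = source, destination
```

A subclass that wants any number of positional arguments would declare `def parse_args(self, *names)`, which is the tuple catch-all mentioned above.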
The formats of configuration files have shown two visible trends over the years. On the one hand, more and more programmability has been added, until sometimes they become a new language. The extreme end of this trend is using a regular programming language, such as Python, as the configuration language. On the other hand, some configuration files have become more and more machine editable, until they become miniature database formats. The extreme end of that trend is using a generic database tool.
Both trends stem from the same rationale -- the need to use a powerful general purpose tool instead of hacking domain specific languages. Domain specific languages are usually ad-hoc and not well designed, having neither the power of general purpose languages nor the predictable machine editable format of generic databases.
Twisted combines these two trends. It can read the configuration either from a Python file, or from a pickled file. To some degree, it integrates the approaches by auto-pickling state on shutdown, so the configuration files can migrate from Python into pickles. Currently, there is no way to go back from pickles to equivalent Python source, although it is planned for the future. As a proof of concept, the RPG framework Twisted Reality already has facilities for creating Python source which evaluates into a given Python object.
    from twisted.internet import main
    from twisted.web import proxy, server
    site = server.Site(proxy.ReverseProxyResource('www.yahoo.com', 80, '/'))
    application = main.Application('web-proxy')
    application.listenOn(8080, site)

Listing 3: The configuration file for a reverse web proxy
Twisted's main program, twistd, can receive either a pickled twisted.internet.main.Application or a Python file which defines a variable called application. The application can be saved at any time by calling its save method, which can take an optional argument to save to a different file name. It would be fairly easy, for example, to have a Twisted server which saves the application every few seconds to a file whose name depends on the time. Usually, however, one settles for the default behavior which saves to a shutdown file. Then, if the shutdown configuration proves suitable, the regular pickle is replaced by the shutdown file. Hence, on the fly configuration changes, regardless of complexity, can always persist.
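The save-then-promote workflow can be illustrated with the standard pickle module. This sketch makes several assumptions: the state is a plain dictionary rather than a Twisted Application object, the file names `name.tap` and `name-shutdown.tap` are chosen for the example, and `promote_shutdown` is a hypothetical helper for the step the administrator performs by hand.

```python
import os
import pickle
import tempfile


def tap_path(directory, name, tag=None):
    """Build 'name.tap' or 'name-tag.tap' inside directory."""
    filename = "%s-%s.tap" % (name, tag) if tag else "%s.tap" % name
    return os.path.join(directory, filename)


def save(state, directory, name, tag=None):
    """Pickle the application state and return the path written."""
    path = tap_path(directory, name, tag)
    with open(path, "wb") as f:
        pickle.dump(state, f)
    return path


def load(path):
    with open(path, "rb") as f:
        return pickle.load(f)


def promote_shutdown(directory, name):
    """Replace the regular pickle with the shutdown pickle."""
    os.replace(tap_path(directory, name, "shutdown"), tap_path(directory, name))


workdir = tempfile.mkdtemp()
state = {"name": "web-proxy", "ports": [8080]}
save(state, workdir, "web-proxy")              # the initial configuration

state["ports"].append(8443)                    # reconfigured on the fly
save(state, workdir, "web-proxy", "shutdown")  # written at shutdown
promote_shutdown(workdir, "web-proxy")         # the administrator accepts it
restored = load(tap_path(workdir, "web-proxy"))
```

Keeping the shutdown pickle separate until it is explicitly promoted means a bad on-the-fly change never overwrites the last known good configuration.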
There are several client/server protocols which let a suitably privileged user access the application variable and change it on the fly. The first, and least common denominator, is telnet. The administrator can telnet into Twisted, and issue Python statements to her heart's content. For example, one can add ports to listen on to the application, reconfigure the web servers and make various other changes by simply accessing __main__.application. Some proofs of concept for a simple suite of command-line utilities to control a Twisted application were written, including commands which allow an administrator to shut down the server or save the current state to a tap file. These are especially useful on Microsoft Windows(tm) platforms, where the normal UNIX way of communicating shutdown requests via signals is less reliable.
If reconfiguration on the fly is not necessary, Python itself can be used as the configuration editor. Loading the application is as simple as unpickling it, and saving it is done by calling its save method. It is quite easy to add more services or change existing ones from the Python interactive mode.
A more sophisticated way to reconfigure the application on the fly is via the manhole service. Manhole is a client/server protocol based on top of Perspective Broker, Twisted's translucent remote-object protocol which will be covered later. Manhole has a graphical client called gtkmanhole which can access the server and change its state. Since Twisted is modular, it is possible to write more services for user friendly configuration. For example, through-the-web configuration is planned for several services, notably mail.
For cases where a third party wants to distribute both the code for a server and a ready to run configuration file, there is the plugin configuration. Philosophically similar to the --python option to twistd, it simplifies the distribution process. A plugin is an archive which is ready to be unpacked into the Python module path. In order to keep a clean tree, twistd extends the module path with some Twisted-specific paths, like the directory TwistedPlugins in the user's home directory. When a plugin is unpacked, it should be a Python package which includes, alongside __init__.py, a file named config.tac. This file should define a variable named application, in a similar way to files loaded with --python. The plugin way of distributing configurations is meant to reduce the temptation to put large amounts of code inside the configuration file itself.
Putting class and function definitions inside the configuration files would make the persistent servers which are auto-generated on shutdown useless, since they would not have access to the classes and functions defined inside the configuration file. Thus, the plugin method is intended so classes and functions can still be in regular, importable, Python modules, while still allowing third parties to distribute powerful configurations. Plugins are used by some of the Twisted Reality virtual worlds.
Port is the Twisted class which represents a socket listening on a port. Currently, Twisted supports both internet and unix-domain sockets, and there are SSL classes with an identical interface. A Port is only responsible for handling the transport layer. It calls accept on the socket, checks that it actually wants to deal with the connection and asks its factory for a protocol. The factory is usually a subclass of twisted.protocols.protocol.Factory, and its most important method is buildProtocol. This should return something that adheres to the protocol interface, and is usually a subclass of twisted.protocols.protocol.Protocol.
    from twisted.protocols import protocol
    from twisted.internet import main, tcp

    class Echo(protocol.Protocol):
        def dataReceived(self, data):
            self.transport.write(data)

    factory = protocol.Factory()
    factory.protocol = Echo
    port = tcp.Port(8000, factory)
    app = main.Application("echo")
    app.addPort(port)
    app.run()

Listing 4: A Simple Twisted Application
The factory is responsible for two tasks: creating new protocols, and keeping global configuration and state. Since the factory builds the new protocols, it usually makes sure the protocols have a reference to it. This allows protocols to access, and change, the configuration. Keeping state information in the factory is the primary reason for keeping an abstraction layer between ports and protocols. Examples of configuration information are the root directory of a web server or the user database of a telnet server. Note that it is possible to use the same factory in two different Ports. This can be used to run the same server bound to several different addresses but not to all of them, or to run the same server on a TCP socket and a UNIX domain socket.
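The division of labour between factory and protocol can be shown without any networking. The classes below are stand-ins written for this illustration, not Twisted's Factory and Protocol: the snake_case method names, the `ListTransport` test double and the `accept` helper (which plays the role of a port accepting a connection) are all assumptions of the sketch.

```python
class Protocol:
    """Per-connection object; factory and transport are filled in for it."""
    factory = None
    transport = None

    def data_received(self, data):
        raise NotImplementedError


class Factory:
    """Builds one protocol per connection and holds the shared state."""
    protocol = Protocol

    def build_protocol(self):
        proto = self.protocol()
        proto.factory = self   # lets the protocol reach shared configuration
        return proto


class ListTransport:
    """Test double that records what a protocol writes."""

    def __init__(self):
        self.written = []

    def write(self, data):
        self.written.append(data)


class Greeter(Protocol):
    def data_received(self, data):
        self.factory.seen += 1
        self.transport.write(self.factory.greeting + data)


class GreeterFactory(Factory):
    protocol = Greeter

    def __init__(self, greeting):
        self.greeting = greeting   # configuration shared by all connections
        self.seen = 0              # state shared by all connections


def accept(factory):
    """Roughly what a port does for each new connection."""
    proto = factory.build_protocol()
    proto.transport = ListTransport()
    return proto
```

Because the shared state lives on the factory, the same `GreeterFactory` instance could be handed to several listening ports and every connection would still see one counter and one greeting.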
A protocol begins and ends its life with connectionMade and connectionLost; both are called with no arguments. connectionMade is called when a connection is first established. By then, the protocol has a transport attribute. The transport attribute is a Transport -- it supports write and loseConnection. Neither of these methods ever blocks: write actually buffers data which will be written only when the transport is signalled ready for writing, and loseConnection marks the transport for closing as soon as there is no buffered data. Note that transports do not have a read method: data arrives when it arrives, and the protocol must be ready for its dataReceived method, or its connectionLost method, to be called. The transport also supports a getPeer method, which returns parameters about the other side of the transport. For TCP sockets, this includes the remote IP and port.
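The non-blocking behaviour of write and loseConnection comes down to a buffer and a flag. The class below is a sketch written for this explanation, not Twisted's transport: `BufferedTransport`, `do_write` and `wants_write` are hypothetical names, and ignoring writes made after `lose_connection` is an assumption of the example.

```python
import socket


class BufferedTransport:
    """write() only buffers; the event loop calls do_write() when the
    socket is ready for writing."""

    def __init__(self, sock):
        self.sock = sock
        self.buffer = b""
        self.closing = False
        self.closed = False

    def write(self, data):
        if not self.closing:
            self.buffer += data

    def lose_connection(self):
        # Close now if nothing is pending, otherwise once the buffer drains.
        self.closing = True
        if not self.buffer:
            self._close()

    def wants_write(self):
        """Tells the loop whether to watch this socket for writability."""
        return bool(self.buffer)

    def do_write(self):
        sent = self.sock.send(self.buffer)
        self.buffer = self.buffer[sent:]
        if self.closing and not self.buffer:
            self._close()

    def _close(self):
        if not self.closed:
            self.closed = True
            self.sock.close()


# A connected socket pair stands in for a TCP connection.
near, far = socket.socketpair()
transport = BufferedTransport(near)
transport.write(b"abc")
transport.lose_connection()
closed_before_flush = transport.closed   # data is still buffered here
transport.do_write()                     # what the loop would call
arrived = far.recv(16)
far.close()
```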
    # A tcp port-forwarder
    # A StupidProtocol sends all data it gets to its peer.
    # A StupidProtocolServer connects to the host/port,
    # and initializes the client connection to be its peer
    # and itself to be the client's peer
    from twisted.internet import tcp
    from twisted.protocols import protocol

    class StupidProtocol(protocol.Protocol):
        def connectionLost(self):
            self.peer.loseConnection()
            del self.peer
        def dataReceived(self, data):
            self.peer.write(data)

    class StupidProtocolServer(StupidProtocol):
        def connectionMade(self):
            clientProtocol = StupidProtocol()
            clientProtocol.peer = self.transport
            self.peer = tcp.Client(self.factory.host, self.factory.port,
                                   clientProtocol)

    # Create a factory which creates StupidProtocolServers, and
    # has the configuration information they assume
    def makeStupidFactory(host, port):
        factory = protocol.Factory()
        factory.host, factory.port = host, port
        factory.protocol = StupidProtocolServer
        return factory

Listing 5: TCP forwarder code
While Twisted has the ability to let other event loops take over for integration with GUI toolkits, it usually uses its own event loop. The event loop code uses global variables to maintain interested readers and writers, and uses Python's select() function, which can accept any object which has a fileno() method, not only raw file descriptors. Objects can use the event loop interface to indicate interest in either reading from or writing to a given file descriptor. In addition, for those cases where time-based events are needed (for example, queue flushing or periodic POP3 downloads), Twisted has a mechanism for repeating events at known delays. While far from being real-time, this is enough for most programs' needs.
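A loop of this shape fits in a few lines of standard-library Python. The sketch below is not Twisted's loop: the module-level `readers` list, the `call_later` and `iterate` functions and the `Reader` class are invented for the example. It relies only on the documented fact that select.select() accepts objects with a fileno() method.

```python
import heapq
import itertools
import select
import socket
import time


class Reader:
    """Any object with fileno() can be handed to select.select()."""

    def __init__(self, sock, log):
        self.sock, self.log = sock, log

    def fileno(self):
        return self.sock.fileno()

    def do_read(self):
        self.log.append(("read", self.sock.recv(1024)))


readers = []                   # objects interested in reading
delayed = []                   # heap of (due_time, sequence, callable)
_sequence = itertools.count()  # tie-breaker so callables are never compared


def call_later(delay, func):
    heapq.heappush(delayed, (time.monotonic() + delay, next(_sequence), func))


def iterate(max_wait=1.0):
    """One pass of the loop: run due timed events, then wait for I/O."""
    now = time.monotonic()
    while delayed and delayed[0][0] <= now:
        heapq.heappop(delayed)[2]()
    timeout = max_wait
    if delayed:
        # Do not sleep past the next timed event.
        timeout = min(max_wait, max(0.0, delayed[0][0] - now))
    ready, _, _ = select.select(readers, [], [], timeout)
    for reader in ready:
        reader.do_read()


log = []
near, far = socket.socketpair()
readers.append(Reader(far, log))
near.sendall(b"ping")
call_later(0.0, lambda: log.append(("timer", None)))
iterate()
```

A repeating event can be built on this by having the scheduled function call `call_later` again before it returns.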
Unfortunately, handling arbitrary data chunks is a hard way to code a server. This is why Twisted has many classes sitting in submodules of the twisted.protocols package which give a higher level interface to the data. For line oriented protocols, LineReceiver translates the low-level dataReceived events into lineReceived events. However, the first naive implementation of LineReceiver proved to be too simple. Protocols like HTTP/1.1 or Freenet have packets which begin with header lines that include length information, and then byte streams. LineReceiver was rewritten to have a simple interface for switching at the protocol layer between line-oriented parts and byte-stream parts.
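Switching between line mode and raw mode is easier to see in code than in prose. The receiver below is a sketch written for this text and does not reproduce Twisted's LineReceiver: the snake_case method names and the toy framing used by `LengthPrefixed` (a line holding a byte count, followed by that many raw bytes) are assumptions chosen to keep the example short.

```python
class LineReceiver:
    """Splits incoming chunks into lines, with an escape hatch to raw mode."""

    delimiter = b"\r\n"

    def __init__(self):
        self._buffer = b""
        self._raw = False

    def set_raw_mode(self):
        self._raw = True

    def set_line_mode(self, extra=b""):
        """Return to line mode; extra is over-read data to parse as lines."""
        self._raw = False
        if extra:
            self.data_received(extra)

    def data_received(self, data):
        self._buffer += data
        while self._buffer:
            if self._raw:
                chunk, self._buffer = self._buffer, b""
                self.raw_data_received(chunk)
            else:
                line, sep, rest = self._buffer.partition(self.delimiter)
                if not sep:
                    break   # incomplete line: wait for more data
                self._buffer = rest
                self.line_received(line)


class LengthPrefixed(LineReceiver):
    """Reads a line holding a byte count, then that many raw bytes."""

    def __init__(self):
        super().__init__()
        self.messages = []
        self._remaining = 0
        self._body = b""

    def line_received(self, line):
        self._remaining = int(line)
        self._body = b""
        if self._remaining:
            self.set_raw_mode()
        else:
            self.messages.append(b"")

    def raw_data_received(self, data):
        body, extra = data[:self._remaining], data[self._remaining:]
        self._body += body
        self._remaining -= len(body)
        if self._remaining == 0:
            self.messages.append(self._body)
            self.set_line_mode(extra)   # hand back whatever was over-read
```

The important detail is `extra`: a chunk may contain the end of one body and the start of the next header line, so the raw-mode handler has to give the surplus back to the line parser.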
Another format which is gaining popularity is Dan J. Bernstein's netstring format. This format keeps ASCII text as ASCII, but allows arbitrary bytes (including nulls and newlines) to be passed freely. However, netstrings were never designed to be used in event-based protocols where over-reading is unavoidable. Twisted makes sure no user will have to deal with the subtle problems of handling netstrings in event-driven programs by providing NetstringReceiver.
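The over-reading problem shows up as soon as a chunk boundary falls inside a netstring or a chunk carries more than one. The parser below is an illustrative sketch, not Twisted's NetstringReceiver: the class layout, the `string_received` hook and the limit on length digits are assumptions, while the wire format (decimal length, a colon, the bytes, a comma) is the netstring format itself.

```python
class NetstringReceiver:
    """Incremental parser for '<length>:<bytes>,' that tolerates arbitrary
    chunk boundaries."""

    MAX_LENGTH_DIGITS = 9   # guard against absurd length prefixes

    def __init__(self):
        self._buffer = b""
        self.strings = []

    def string_received(self, data):
        self.strings.append(data)

    def data_received(self, data):
        self._buffer += data
        while True:
            head, colon, rest = self._buffer.partition(b":")
            if len(head) > self.MAX_LENGTH_DIGITS or (head and not head.isdigit()):
                raise ValueError("bad netstring length")
            if not colon:
                return   # length prefix not complete yet
            if not head:
                raise ValueError("missing netstring length")
            length = int(head)
            if len(rest) < length + 1:
                return   # payload or trailing comma not here yet
            if rest[length:length + 1] != b",":
                raise ValueError("missing netstring terminator")
            self.string_received(rest[:length])
            # Keep whatever was over-read for the next pass of the loop.
            self._buffer = rest[length + 1:]
```

Note that the payload may itself contain commas and colons; only the declared length decides where it ends.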
For even higher levels, there are the protocol-specific protocol classes. These translate low-level chunks into high-level events such as "HTTP request received" (for web servers), "approve destination address" (for mail servers) or "get user information" (for finger servers). Many RFCs have been thus implemented for Twisted (at latest count, more than 12 RFCs have been implemented). One of Twisted's goals is to be a repository of event-driven implementations for various protocols in Python.
    class DomainSMTP(SMTP):

        def validateTo(self, helo, destination):
            try:
                user, domain = string.split(destination, '@', 1)
            except ValueError:
                return 0
            if domain not in self.factory.domains:
                return 0
            if not self.factory.domains[domain].exists(user, domain, self):
                return 0
            return 1

        def handleMessage(self, helo, origin, recipients, message):
            # No need to check for existence -- only recipients which
            # we approved at the validateTo stage are passed here
            for recipient in recipients:
                user, domain = string.split(recipient, '@', 1)
                self.factory.domains[domain].saveMessage(origin, user,
                                                         message, domain)

Listing 6: Implementation of virtual domains using the SMTP protocol class
Copious documentation on writing new protocol abstractions exists, since this is the largest amount of code written -- much like most operating system code is device drivers. Since many different protocols have already been implemented, there are also plenty of examples to draw on. Usually implementing the client side of a protocol is particularly challenging, since protocol designers tend to assume much more state kept on the client side of a connection than on the server side.
twisted.tap Package and mktap
Since one of Twisted's configuration formats is pickles, which are tricky to edit by hand, Twisted evolved a framework for creating such pickles. This framework is contained in the twisted.tap package and the mktap script. New servers, or new ways to configure existing servers, can easily participate in the twisted.tap framework by creating a twisted.tap submodule.
All twisted.tap submodules must conform to a rigid interface. The interface defines functions to accept the command line parameters, and functions to take the processed command line parameters and add servers to twisted.internet.main.Application. Existing twisted.tap submodules use twisted.python.usage, so the command line format is consistent between different modules.
The mktap utility gets some generic options, and then the name of the server to build. It imports a same-named twisted.tap submodule, and lets it process the rest of the options and parameters. This makes sure that the process configuring the main.Application is agnostic about where it is used. This allowed mktap to grow the --append option, which appends to an existing pickle rather than creating a new one. This option is frequently used to post-add a telnet server to an application, for net-based on the fly configuration later.
When running mktap under UNIX, it saves the user id and group id inside the tap. Then, when feeding this tap into twistd, it changes to this user/group id after binding the ports. Such a feature is necessary in any production-grade server, since ports below 1024 require root privileges to use on UNIX -- but applications should not run as root. In case changing to the specified user causes difficulty in the build environment, it is also possible to give those arguments to mktap explicitly.
    from twisted.internet import tcp, stupidproxy
    from twisted.python import usage

    usage_message = """
    usage: mktap stupid [OPTIONS]

    Options are as follows:
            --port <#>, -p:         set the port number to <#>.
            --host <host>, -h:      set the host to <host>
            --dest_port <#>, -d:    set the destination port to <#>
    """

    class Options(usage.Options):
        optStrings = [["port", "p", 6666],
                      ["host", "h", "localhost"],
                      ["dest_port", "d", 6665]]

    def getPorts(app, config):
        s = stupidproxy.makeStupidFactory(config.host, int(config.dest_port))
        return [(int(config.port), s)]

Listing 7: twisted.tap.stupid
The twisted.tap framework is one of the reasons servers can be set up with little knowledge and time. Simply running mktap with arguments can bring up a web server, a mail server or an integrated chat server -- with hardly any need for maintenance. As a working proof of concept, the tap2deb utility exists to wrap up tap files in Debian packages, which include scripts for running and stopping the server and which interact with init(8) to make sure servers are automatically run on start-up. Such programs can also be written to interface with the Red Hat Package Manager or the FreeBSD package management systems.
    % mktap --uid 33 --gid 33 web --static /var/www --port 80
    % tap2deb -t web.tap -m 'Moshe Zadka <moshez@debian.org>'
    % su
    password:
    # dpkg -i .build/twisted-web_1.0_all.deb

Listing 8: Bringing up a web server on a Debian system
Sometimes, threads are unavoidable or hard to avoid. Many legacy programs which use threads want to use Twisted, and some vendor APIs have no non-blocking version -- for example, most database systems' APIs. Twisted can work with threads, although it supports only one thread in which the main select loop is running. It can use other threads to simulate a non-blocking API over a blocking API -- it spawns a thread to call the blocking API, and when it returns, the thread calls a callback in the main thread. Threads can call callbacks in the main thread safely by adding those callbacks to a list of pending events. When the main thread is between select calls, it searches through the list of pending events, and executes them. This is used in the twisted.enterprise package to supply an event-driven interface to databases, which uses Python's DB API.
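The hand-off from a worker thread back to the loop thread can be sketched with a thread-safe queue. This is an illustration, not Twisted's threading support: `call_in_thread`, `run_pending` and `slow_query` are hypothetical names, and a `queue.Queue` stands in for the list of pending events described above.

```python
import queue
import threading

pending = queue.Queue()            # callbacks waiting to run in the loop thread
loop_thread = threading.get_ident()


def call_in_thread(blocking_func, callback, *args):
    """Run blocking_func(*args) in a worker thread and arrange for
    callback(result) to run later in the loop thread."""
    def worker():
        result = blocking_func(*args)
        # Never run the callback here; hand it over to the loop thread.
        pending.put(lambda: callback(result))

    thread = threading.Thread(target=worker)
    thread.start()
    return thread


def run_pending():
    """Called by the loop thread between select() calls."""
    while True:
        try:
            event = pending.get_nowait()
        except queue.Empty:
            return
        event()


results = []


def slow_query(value):
    return value * 2   # stands in for a blocking database call


def on_result(value):
    # Record the result and whether we are running in the loop thread.
    results.append((value, threading.get_ident() == loop_thread))


worker_thread = call_in_thread(slow_query, on_result, 21)
worker_thread.join()   # a real loop would keep iterating instead of joining
run_pending()
```

Only the queue is touched from both threads, so application callbacks never need their own locking.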
Twisted tries to optimize for the common case -- no threads. If there is need for threads, a special call must be made to inform the twisted.python.threadable module that threads will be used. This module is implemented differently depending on whether threads will be used or not. The decision must be made before importing any modules which use threadable, and so is usually done in the main application. For example, twistd has a command line option to initialize threads.
Twisted also supplies a module which supports a threadpool, so the common task of implementing non-blocking APIs above blocking APIs will be both easy and efficient. Threads are kept in a pool, and dispatch requests are handled by threads which are not working. The pool supports a maximum number of threads, and will throw exceptions when there are more requests than allowable threads.
One of the difficulties of multi-threaded systems is using locks to avoid race conditions. Twisted uses a mechanism similar to Java's synchronized methods. A class can declare a list of methods which cannot safely be called at the same time from two different threads. A function in threadable then uses twisted.python.hook to transparently add lock/unlock around these methods. This allows Twisted classes to be written without thought about threading, except for one localized declaration which does not entail any performance penalty for the single-threaded case.
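A declaration-driven lock wrapper of this kind is short to write in modern Python. The sketch below is not the twisted.python.threadable code: `synchronize` is a hypothetical helper, and using one re-entrant lock per class (rather than per instance) is a simplification made for the example.

```python
import functools
import threading


def synchronize(cls):
    """Wrap every method named in cls.synchronized so it runs under a lock."""
    lock = threading.RLock()   # one re-entrant lock shared by the class

    def add_lock(method):
        @functools.wraps(method)
        def locked(self, *args, **kwargs):
            with lock:
                return method(self, *args, **kwargs)
        return locked

    for name in cls.synchronized:
        setattr(cls, name, add_lock(getattr(cls, name)))
    return cls


class Counter:
    # The one localized declaration: these methods are not thread safe.
    synchronized = ["increment"]

    def __init__(self):
        self.value = 0

    def increment(self):
        current = self.value
        self.value = current + 1


synchronize(Counter)   # skipped entirely in a single-threaded program

counter = Counter()


def hammer():
    for _ in range(1000):
        counter.increment()


threads = [threading.Thread(target=hammer) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

If `synchronize` is never called, the class keeps its original unwrapped methods, which is how the single-threaded case avoids paying for locks.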
Mail servers have a history of security flaws. Sendmail is by now the poster boy of security holes, but no mail servers, bar maybe qmail, are free of them. As Dan Bernstein of qmail fame said, mail cannot be simply turned off -- even the simplest organization needs a mail server. Since Twisted is written in a high-level language, many problems which plague other mail servers, notably buffer overflows, simply do not exist. Other holes are avoidable with correct design. Twisted Mail is a project trying to see if it is possible to write a high quality, high performance mail server entirely in Python.
Twisted Mail is built on the SMTP server and client protocol classes. While these present a level of abstraction from the specific SMTP line semantics, they do not contain any message storage code. The SMTP server class does know how to divide responsibility between domains. When a message arrives, it analyzes the recipient's address, tries matching it with one of the registered domains, and then passes validation of the address and saving the message to the correct domain, or refuses to handle the message if it cannot handle the domain. It is possible to specify a catch-all domain, which will usually be responsible for relaying mails outwards.
While correct relaying is planned for the future, at the moment we have only so-called "smarthost" relaying. All e-mail not recognized by a local domain is relayed to a single outside upstream server, which is supposed to relay the mail further. This is the configuration for most home machines, which are Twisted Mail's current target audience.
Since the people involved in Twisted's development were reluctant to run code that runs as a super user, or with any special privileges, it had to be considered how delivery of mail to users is possible. The solution decided upon was to have Twisted deliver to its own directory, which should have very strict permissions, and have users pull the mail using some remote mail access protocol like POP3. This means only a user would write to his own mail box, so no security holes in Twisted would be able to adversely affect a user.
Future plans are to use a Perspective Broker-based service to hand mail to a user's personal server using a UNIX domain socket, as well as to add some more conventional delivery methods, as scary as they may be.
Because the default configuration of Twisted Mail is to be an integrated POP3/SMTP server, it is ideally suited for the so-called POP toaster configuration, where there are a multitude of virtual users and domains, all using the same IP address and computer to send and receive mails. It is fairly easy to configure Twisted as a POP toaster. There are a number of deployment choices: one can append a telnet server to the tap for remote configuration, or simple scripts can add and remove users from the user database. The user database is saved as a directory, where file names are keys and file contents are values, so concurrency is not usually a problem.
    % mktap mail -d foobar.com=$HOME/Maildir/ -u postmaster=secret -b \
                 -p 110 -s 25
    % twistd -f mail.tap

Bringing up a simple mail-server
Twisted's native mail storage format is Maildir, a format that requires no locking and is safe and atomic. Twisted supports a number of standardized extensions to Maildir, commonly known as Maildir++. Most importantly, it supports deletion as simply moving to a subfolder named Trash, so mail is recoverable if accessed through a protocol which allows multiple folders, like IMAP. However, Twisted itself does not yet support any such protocol.
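The reason Maildir needs no locking is that a message is written under tmp/ and only then renamed into new/, so a reader never sees a partial file. The sketch below shows that delivery step with the standard library. It is not Twisted Mail's code: `init_maildir` and `deliver` are hypothetical helpers, and the unique-name scheme (time, process id, counter, host name) is one common convention assumed for the example.

```python
import itertools
import os
import socket
import tempfile
import time

_delivery_counter = itertools.count()


def init_maildir(path):
    """Create the three standard Maildir subdirectories."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(path, sub), exist_ok=True)


def deliver(maildir, message_bytes):
    """Write the message under tmp/, then rename it into new/."""
    host = socket.gethostname().replace("/", "_").replace(":", "_")
    unique = "%d.%d_%d.%s" % (time.time(), os.getpid(),
                              next(_delivery_counter), host)
    tmp_path = os.path.join(maildir, "tmp", unique)
    with open(tmp_path, "wb") as f:
        f.write(message_bytes)
        f.flush()
        os.fsync(f.fileno())   # make sure the data is on disk before renaming
    new_path = os.path.join(maildir, "new", unique)
    os.rename(tmp_path, new_path)   # atomic within one filesystem
    return new_path


mailbox = tempfile.mkdtemp()
init_maildir(mailbox)
message = b"Subject: hi\n\nbody\n"
first_path = deliver(mailbox, message)
second_path = deliver(mailbox, message)
```

Because every delivery gets its own file name, two deliveries running at once cannot overwrite each other.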
Twisted was originally designed to support multi-player games; a simulated "real world" environment. Experience with game systems of that type is enlightening as to the nature of computing on the whole. Almost all services on a computer are modeled after some simulated real-world activity. For example, e-"mail", or "document publishing" on the web. Even "object-oriented" programming is based around the notion that data structures in a computer simulate some analogous real-world objects.
All such networked simulations have a few things in common. They each represent a service provided by software, and there is usually some object where "global" state is kept. Such a service must provide an authentication mechanism. Often, there is a representation of the authenticated user within the context of the simulation, and there are also objects aside from the user and the simulation itself that can be accessed.
For most existing protocols, Twisted provides these abstractions through twisted.internet.passport. This is so named because the most important common functionality it provides is authentication. A simulation "world" as described above -- such as an e-mail system, document publishing archive, or online video game -- is represented by a subclass of Service, the authentication mechanism by an Authorizer (which is a set of Identities), and the user of the simulation by a Perspective. Other objects in the simulation may be represented by arbitrary Python objects, depending upon the implementation of the given protocol.
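The relationship between these four abstractions can be sketched in plain Python. The classes below only borrow the names from the text; their methods are hypothetical and are not the twisted.internet.passport API. In particular, letting the Authorizer hold the services and comparing plain-text passwords are simplifications made for this illustration.

```python
import hmac


class Perspective:
    """A user's presence inside one service."""

    def __init__(self, name, service):
        self.name = name
        self.service = service


class Service:
    """One simulation 'world', e.g. a mail system or a game."""

    def __init__(self, name):
        self.name = name
        self.perspectives = {}

    def get_perspective(self, name):
        if name not in self.perspectives:
            self.perspectives[name] = Perspective(name, self)
        return self.perspectives[name]


class Identity:
    """A name and password, plus the perspectives it may use."""

    def __init__(self, name, password):
        self.name = name
        self._password = password
        self.keys = set()  # (service name, perspective name) pairs

    def add_key(self, service_name, perspective_name):
        self.keys.add((service_name, perspective_name))

    def check_password(self, password):
        return hmac.compare_digest(self._password.encode("utf-8"),
                                   password.encode("utf-8"))


class Authorizer:
    """A set of identities (simplified: it also knows the services)."""

    def __init__(self):
        self.identities = {}
        self.services = {}

    def add_identity(self, identity):
        self.identities[identity.name] = identity

    def add_service(self, service):
        self.services[service.name] = service

    def login(self, name, password, service_name, perspective_name):
        identity = self.identities.get(name)
        if identity is None or not identity.check_password(password):
            raise PermissionError("bad identity or password")
        if (service_name, perspective_name) not in identity.keys:
            raise PermissionError("identity may not use that perspective")
        return self.services[service_name].get_perspective(perspective_name)
```

The point of the separation is that one Identity may hold keys to perspectives in several services, so authentication is done once per application rather than once per protocol.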
New problem domains, however, often require new protocols, and re-implementing these abstractions each time can be tedious, especially when it's not necessary. Many efforts have been made in recent years to create generic "remote object" or "remote procedure call" protocols, but in developing Twisted, these protocols were found to require too much overhead in development, be too inefficient at runtime, or both.
Perspective Broker is a new remote-object protocol designed to be lightweight, to impose minimal constraints upon the development process, and to use Python's dynamic nature to good effect, while remaining relatively efficient in terms of bandwidth and CPU utilization. twisted.spread.pb serves as a reference implementation of the protocol, but implementation of Perspective Broker in other languages is already underway. spread is the twisted subpackage dealing with remote calls and objects, and has nothing to do with the spread toolkit.
Perspective Broker extends twisted.internet.passport's abstractions to be concrete objects rather than design patterns. Rather than having a Protocol implementation translate between sequences of bytes and specifically named methods (as in the other Twisted Protocols), Perspective Broker defines a direct mapping between network messages and quasi-arbitrary method calls.
In a server application where a large number of clients may be interacting at once, it is not feasible to have an arbitrarily large number of OS threads blocking and waiting for remote method calls to return. Additionally, the ability for any client to call any method of an object would present a significant security risk. Therefore, rather than attempting to provide a transparent interface to remote objects, twisted.spread.pb is "translucent", meaning that while remote method calls have different semantics than local ones, the similarities in semantics are mirrored by similarities in the syntax. Remote method calls impose as little overhead as possible in terms of volume of code, but "as little as possible" is unfortunately not "nothing".
twisted.spread.pb defines a method naming standard for each type of remotely accessible object. For example, if a client requests a method call with an expression such as myPerspective.doThisAction(), the remote version of myPerspective would be sent the message perspective_doThisAction. Depending on the manner in which an object is accessed, other method prefixes may be observe_, view_, or remote_. Any method present on a remotely accessible object, and named appropriately, is considered to be published. Since this is accomplished with getattr, the definition of "present" is not limited to methods defined on the class: instances may have arbitrary callable objects associated with them as long as the name is correct, just as with normal Python objects.
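The naming rule can be sketched as a small dispatcher. The `Published` base class and its `dispatch` method are hypothetical names invented for this illustration; only the prefix convention and the use of getattr come from the text.

```python
class Published:
    """Hypothetical base class: only prefixed attributes are reachable."""

    prefix = "remote_"

    def dispatch(self, message, *args, **kwargs):
        # getattr looks at the instance first and then the class, so a
        # callable stored on the instance is published just like a method.
        method = getattr(self, self.prefix + message, None)
        if not callable(method):
            raise AttributeError("no published method %r" % (message,))
        return method(*args, **kwargs)


class Echoer(Published):
    def remote_echo(self, text):
        return text

    def secret(self):
        # No prefix, so no message name can ever resolve to this method.
        return "not published"


echoer = Echoer()
# Attach a callable to the instance; it is published because of its name.
echoer.remote_shout = lambda text: text.upper()
```

With these definitions, `echoer.dispatch("echo", "hi")` and `echoer.dispatch("shout", "hi")` both succeed, while `echoer.dispatch("secret")` raises `AttributeError` because the lookup is for `remote_secret`.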
Remote method calls are made on remote reference objects (instances of pb.RemoteReference) by calling a method with an appropriate name. However, that call will not block -- if you need the result from a remote method call, you pass in one of the two special keyword arguments to that method -- pbcallback or pberrback. pbcallback is a callable object which will be called when the result is available, and pberrback is a callable object which will be called if there was an exception thrown either in transmission of the call or on the remote side.
In the case that neither pberrback nor pbcallback is provided, twisted.spread.pb will optimize network usage by not sending confirmations of messages.
    # Server Side
    class MyObject(pb.Referenceable):
        def remote_doIt(self):
            return "did it"

    # Client Side
        ...
        def myCallback(result):
            print result # result will be 'did it'
        def myErrback(stacktrace):
            print 'oh no, mr. bill!'
            print stacktrace
        myRemoteReference.doIt(pbcallback=myCallback,
                               pberrback=myErrback)

Listing 9: A remotely accessible object and accompanying call
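The non-blocking calling convention can be tried without a network using a loopback stand-in. `LoopbackReference` and its `pump` method are hypothetical and exist only in this sketch; the real pb.RemoteReference sends messages over a connection. What the sketch preserves is that the call returns immediately and the result arrives later through pbcallback or pberrback.

```python
import traceback


class LoopbackReference:
    """Hypothetical stand-in for a remote reference; no network involved."""

    def __init__(self, target):
        self._target = target
        self._pending = []

    def __getattr__(self, name):
        # Only reached for names not found normally, i.e. remote messages.
        def call(*args, pbcallback=None, pberrback=None):
            # Queue the message and return at once: the caller never blocks.
            self._pending.append((name, args, pbcallback, pberrback))
        return call

    def pump(self):
        """Deliver queued messages, as an event loop would do later."""
        pending, self._pending = self._pending, []
        for name, args, callback, errback in pending:
            try:
                result = getattr(self._target, "remote_" + name)(*args)
            except Exception:
                if errback is not None:
                    errback(traceback.format_exc())
            else:
                if callback is not None:
                    callback(result)


class MyObject:
    def remote_doIt(self):
        return "did it"

    def remote_fail(self):
        raise ValueError("boom")
```

A call such as `ref.doIt(pbcallback=results.append)` returns `None` right away; the list is only filled once `ref.pump()` runs, which mirrors the way results arrive from the event loop rather than from the call itself.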
Considering the problem of remote object access in terms of a simulation demonstrates a requirement for the knowledge of an actor with certain actions or requests. Often, when processing a message, it is useful to know who sent it, since different results may be required depending on the permissions or state of the caller.
A simple example is a game where a certain object is invisible, but players with the "Heightened Perception" enchantment can see it. When answering the question "What objects are here?" it is important for the room to know who is asking, to determine which objects they can see. Parallels to the differences between "administrators" and "users" on an average multi-user system are obvious.
Perspective Broker is named for the fact that it does not broker only objects, but views of objects. As a user of the twisted.spread.pb module, it is quite easy to determine the caller of a method. All you have to do is subclass Viewable.

    # Server Side
    class Greeter(pb.Viewable):
        def view_greet(self, actor):
            return "Hello %s!\n" % actor.perspectiveName

    # Client Side
        ...
        remoteGreeter.greet(pbcallback=sys.stdout.write)
        ...

Listing 10: An object responding to its calling perspective

Before any arguments sent by the client, the actor (specifically, the Perspective instance through which this object was retrieved) will be passed as the first argument to any view_xxx methods.
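The invisible-object example from earlier can be written against this calling rule. Everything below is an illustrative sketch in plain Python: `Room`, the `heightened_perception` flag, and this `ViewPoint` wrapper are hypothetical and do not reproduce the twisted.spread.pb classes.

```python
class Perspective:
    def __init__(self, name, heightened_perception=False):
        self.name = name
        self.heightened_perception = heightened_perception


class Room:
    """Server-side object whose answers depend on who is asking."""

    def __init__(self):
        # Maps object name -> True if the object is invisible.
        self.objects = {"lamp": False, "ghost": True}

    def view_look(self, actor):
        return sorted(
            name for name, invisible in self.objects.items()
            if not invisible or actor.heightened_perception
        )


class ViewPoint:
    """Binds a viewable object to the perspective that retrieved it."""

    def __init__(self, perspective, viewable):
        self.perspective = perspective
        self.viewable = viewable

    def call(self, message, *args):
        method = getattr(self.viewable, "view_" + message)
        # The actor is inserted before any client-supplied arguments, so
        # the client cannot claim to be somebody else.
        return method(self.perspective, *args)
```

Two clients sending the same `look` message to the same room then receive different answers, because the server, not the client, supplies the actor.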
In a simulation of any decent complexity, client and server will wish to share structured data. Perspective Broker provides a mechanism for both transferring (copying) and sharing (caching) that state.
Whenever an object is passed as an argument to or returned from a remote method call, that object is serialized using twisted.spread.jelly, a serializer similar in some ways to Python's native pickle. Originally, pickle itself was going to be used, but there were several security issues with the pickle code as it stands. It is on these issues of security that pickle and twisted.spread.jelly part ways.
While twisted.spread.jelly handles a few basic types such as strings, lists, dictionaries and numbers automatically, all user-defined types must be registered both for serialization and unserialization. This registration process is necessary on the sending side in order to determine if a particular object is shared, and whether it is shared as state or behavior. On the receiving end, it's necessary to prevent arbitrary code from being run when an object is unserialized -- a significant security hole in pickle for networked applications.
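The idea of registering types before they may cross the wire can be sketched with the standard json module. This is not jelly's wire format or API; `register`, `serialize`, and `unserialize` are hypothetical names, and the sketch only demonstrates the whitelist principle: basic types pass through, registered classes are rebuilt from state, and anything else is refused.

```python
import json

_registry = {}


def register(cls):
    """Explicitly allow instances of cls to be sent and received."""
    _registry[cls.__name__] = cls
    return cls


def serialize(obj):
    def encode(o):
        # Called by json only for objects it cannot handle natively.
        if _registry.get(type(o).__name__) is type(o):
            return {"__class__": type(o).__name__, "state": vars(o)}
        raise TypeError("unregistered type %s" % type(o).__name__)
    return json.dumps(obj, default=encode)


def unserialize(data):
    def decode(d):
        if "__class__" not in d:
            return d
        cls = _registry.get(d["__class__"])
        if cls is None:
            # Never import or look up a class just because the peer named it.
            raise ValueError("unregistered type %r" % (d["__class__"],))
        obj = cls.__new__(cls)  # no constructor code runs on received data
        obj.__dict__.update(d["state"])
        return obj
    return json.loads(data, object_hook=decode)


@register
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
```

The contrast with pickle is that the receiver decides which classes may be instantiated, so a hostile peer cannot cause arbitrary callables to run during unserialization.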
On the sending side, the registration is accomplished by making the object you want to serialize a subclass of one of the "flavors" of object that are handled by Perspective Broker. A class may be Referenceable, Viewable, Copyable or Cacheable. These four classes correspond to different ways that the object will be seen remotely. Serialization flavors are mutually exclusive -- these 4 classes may not be mixed in with each other.
Referenceable: The remote side will refer to this object directly. Methods with the prefix remote_ will be callable on it. No state will be transferred.
Viewable: The remote side will refer to a proxy for this object, which indicates what perspective accessed this, as discussed above. Methods with the prefix view_ will be callable on it, and have an additional first argument inserted (the perspective that called the method). No state will be transferred.
Copyable: Each time this object is serialized, its state will be copied and sent. No methods are remotely callable on it. By default, the state sent will be the instance's __dict__, but a method getStateToCopyFor(perspective) may be defined which returns an arbitrary serializable object for state.
Cacheable: The first time this object is serialized, its state will be copied and sent. Each subsequent time, however, a reference to the original object will be sent to the receiver. No methods will be remotely callable on this object. By default, again, the state sent will be the instance's __dict__, but a method getStateToCacheAndObserveFor(perspective, observer) may be defined to return alternative state. Since the state for this object is only sent once, the observer argument is an object representative of the receiver's representation of the Cacheable after unserialization -- method calls to this object will be resolved to methods prefixed with observe_, on the receiver's RemoteCache of this object. This may be used to keep the receiver's cache up-to-date as relevant portions of the Cacheable object change.
The previous samples of code have shown how an individual object will interact over a previously-established PB connection. In order to get to that connection, you need to do some set-up work on both the client and server side; PB attempts to minimize this effort.
There are two different approaches for setting up a PB server, depending on your application's needs. In the simplest case, where your application does not deal with the abstractions above -- services, identities, and perspectives -- you can simply publish an object on a particular port.
    from twisted.spread import pb
    from twisted.internet import main
    class Echoer(pb.Root):
        def remote_echo(self, st):
            print 'echoing:', st
            return st
    if __name__ == '__main__':
        app = main.Application("pbsimple")
        app.listenOn(8789, pb.BrokerFactory(Echoer()))
        app.run()

Listing 11: Creating a simple PB server
Listing 11 shows how to publish a simple object which responds to a single message, "echo", and returns whatever argument is sent to it. There is very little to explain: the "Echoer" class is a pb.Root, which is a small subclass of Referenceable designed to be used for objects published by a BrokerFactory, so Echoer follows the same rule for remote access that Referenceable does. Connecting to this service is almost equally simple.
    from twisted.spread import pb
    from twisted.internet import main
    def gotObject(object):
        print "got object:",object
        object.echo("hello network", pbcallback=gotEcho)
    def gotEcho(echo):
        print 'server echoed:',echo
        main.shutDown()
    def gotNoObject(reason):
        print "no object:",reason
        main.shutDown()
    pb.getObjectAt("localhost", 8789, gotObject, gotNoObject, 30)
    main.run()

Listing 12: A client for Echoer objects.
The utility function pb.getObjectAt retrieves the root object from a hostname/port-number pair and makes a callback (in this case, gotObject) if it can connect and retrieve the object reference successfully, and an error callback (gotNoObject) if it cannot connect or the connection times out.
gotObject receives the remote reference, and sends the echo message to it. This call is visually noticeable as a remote method invocation by the distinctive pbcallback keyword argument. When the result from that call is received, gotEcho will be called, notifying us that in fact, the server echoed our input ("hello network").
While this setup might be useful for certain simple types of applications where there is no notion of a "user", the additional complexity necessary for authentication and service segregation is worth it. In particular, re-use of server code for things like chat (twisted.words) is a lot easier with a unified notion of users and authentication.
    from twisted.spread import pb
    from twisted.internet import main
    class SimplePerspective(pb.Perspective):
        def perspective_echo(self, text):
            print 'echoing',text
            return text
    class SimpleService(pb.Service):
        def getPerspectiveNamed(self, name):
            return SimplePerspective(name, self)
    if __name__ == '__main__':
        import pbecho
        app = main.Application("pbecho")
        pbecho.SimpleService("pbecho",app).getPerspectiveNamed("guest")\
            .makeIdentity("guest")
        app.listenOn(pb.portno, pb.BrokerFactory(pb.AuthRoot(app)))
        app.save("start")

Listing 13: A PB server using twisted's "passport" authentication.
In terms of the "functionality" it offers, this server is identical. It provides a method which will echo some simple object sent to it. However, this server provides it in a manner which will allow it to cooperate with multiple other authenticated services running on the same connection, because it uses the central Authorizer for the application.
On the line that creates the SimpleService, several things happen.
A SimpleService is created and added to the Application instance.
A SimplePerspective is created, via the getPerspectiveNamed method.
The SimplePerspective has an Identity generated for it, and persistently added to the Application's Authorizer. The created identity will have the same name as the perspective ("guest"), and the password supplied (also, "guest"). It will also have a reference to the service "pbecho" and a perspective named "guest", by name. The Perspective.makeIdentity utility method prevents having to deal with the intricacies of the passport Authorizer system when one doesn't require strongly separate Identities and Perspectives.
Also, this server does not run itself, but instead persists to a file which can be run with twistd, offering all the usual amenities of daemonization, logging, etc. Once the server is run, connecting to it is similar to the previous example.
    from twisted.spread import pb
    from twisted.internet import main
    def success(message):
        print "Message received:",message
        main.shutDown()
    def failure(error):
        print "Failure...",error
        main.shutDown()
    def connected(perspective):
        perspective.echo("hello world",
                         pbcallback=success,
                         pberrback=failure)
        print "connected."
    pb.connect(connected, failure, "localhost", pb.portno,
               "guest", "guest", "pbecho", "guest", 30)
    main.run()

Listing 14: Connecting to an Authorized Service
This introduces a new utility -- pb.connect. This function takes a long list of arguments and manages the handshaking and challenge/response aspects of connecting to a PB service perspective, eventually calling back to indicate either success or failure. In this particular example, we are connecting to localhost on the default PB port (8787), authenticating to the identity "guest" with the password "guest", requesting the perspective "guest" from the service "pbecho". If this can't be done within 30 seconds, the connection will abort.
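For readers unfamiliar with challenge/response authentication, the general idea is sketched below. This is a generic illustration, not PB's actual handshake: the function names are hypothetical, and the use of HMAC-SHA256 is an assumption chosen for the sketch. The property it shows is that the password itself never travels over the connection.

```python
import hashlib
import hmac
import os


def make_challenge():
    """Server side: produce a fresh random challenge for each login."""
    return os.urandom(16)


def respond(challenge, password):
    """Client side: prove knowledge of the password without sending it."""
    return hmac.new(password.encode("utf-8"), challenge,
                    hashlib.sha256).hexdigest()


def verify(challenge, response, password):
    """Server side: recompute the expected response and compare."""
    expected = respond(challenge, password)
    return hmac.compare_digest(expected, response)
```

Because the challenge differs on every attempt, a response captured by an eavesdropper cannot simply be replayed to log in later.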
In these examples, I've attempted to show how Twisted makes event-based scripting easier; this facilitates the ability to run short scripts as part of a long-running process. However, event-based programming is not natural to procedural scripts; it is more generally accepted that GUI programs will be event-driven whereas scripts will be blocking. An alternative client to our SimpleService using GTK illustrates the seamless meshing of Twisted and GTK.
    from twisted.internet import main, ingtkernet
    from twisted.spread.ui import gtkutil
    import gtk
    ingtkernet.install()
    class EchoClient:
        def __init__(self, echoer):
            l.hide()
            self.echoer = echoer
            w = gtk.GtkWindow(gtk.WINDOW_TOPLEVEL)
            vb = gtk.GtkVBox(); b = gtk.GtkButton("Echo:")
            self.entry = gtk.GtkEntry(); self.outry = gtk.GtkEntry()
            w.add(vb)
            map(vb.add, [b, self.entry, self.outry])
            b.connect('clicked', self.clicked)
            w.connect('destroy', gtk.mainquit)
            w.show_all()
        def clicked(self, b):
            txt = self.entry.get_text()
            self.entry.set_text("")
            self.echoer.echo(txt, pbcallback=self.outry.set_text)
    l = gtkutil.Login(EchoClient, None, initialService="pbecho")
    l.show_all()
    gtk.mainloop()

Listing 15: A Twisted GUI application
Although PB will be interesting to those people who wish to write custom clients for their networked applications, many prefer or require a web-based front end. Twisted's built-in web server has been designed to accommodate this desire, and the presentation framework that one would use to write such an application is twisted.web.widgets. Web.Widgets has been designed to work in an event-based manner, without adding overhead to the designer or the developer's work-flow.
Surprisingly, asynchronous web interfaces fit very well into the normal uses of purpose-built web toolkits such as PHP. Any experienced PHP, Zope, or WebWare developer will tell you that separation of presentation, content, and logic is very important. In practice, this results in a "header" block of code which sets up various functions which are called throughout the page, some of which load blocks of content to display. While PHP does not enforce this, it is certainly idiomatic. Zope enforces it to a limited degree, although it still allows control structures and other programmatic elements in the body of the content.
In Web.Widgets, strict enforcement of this principle coincides very neatly with a "hands-free" event-based integration, where much of the work of declaring callbacks is implicit. A "Presentation" has a very simple structure for evaluating Python expressions and giving them a context to operate in. The "header" block which is common to many templating systems becomes a class, which represents an enumeration of events that the template may generate, each of which may be responded to either immediately or latently.
For the sake of simplicity, as well as maintaining compatibility for potential document formats other than HTML, Presentation widgets do not attempt to parse their template as HTML tags. The structure of the template is "HTML Text %%%%python_expression()%%%% more HTML Text". Every set of 4 percent signs (%%%%) switches back and forth between evaluation and printing.
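The switching rule is simple enough to sketch in a few lines. The `render` function below is a hypothetical illustration, not the Web.Widgets implementation: it splits the template on the four-percent-sign marker, prints the even-numbered pieces verbatim, and evaluates the odd-numbered pieces as Python expressions.

```python
def render(template, namespace):
    """Alternate between literal text and evaluated expressions."""
    parts = template.split("%%%%")
    output = []
    for index, part in enumerate(parts):
        if index % 2 == 0:
            # Even pieces lie outside the markers: plain text.
            output.append(part)
        else:
            # Odd pieces lie between markers: Python expressions.
            # eval is acceptable only because the template is trusted
            # code written by the site's own developers.
            value = eval(part, {"__builtins__": {}}, namespace)
            output.append(str(value))
    return "".join(output)
```

For example, `render("Hello %%%%name%%%%!", {"name": "web"})` produces `Hello web!`. Since only expressions are evaluated, loops and conditionals have to live in the methods that the expressions call, which is the restriction discussed next.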
No control structures are allowed in the template. This was originally thought to be a potentially major inconvenience, but with use of the Web.Widgets code to develop a few small sites, it has seemed trivial to encapsulate any table-formatting code within a method; especially since those methods can take string arguments if there's a need to customize the table's appearance.
The namespace for evaluating the template expressions is obtained by scanning the class hierarchy for attributes, and getting each of those attributes from the current instance. This means that all methods will be bound methods, so indicating "self" explicitly is not required. While it is possible to override the method for creating namespaces, using this default has the effect of associating all presentation code for a particular widget in one class, along with its template. If one is working with a non-programmer designer, and the template is in an external file, it is always very clear to the designer what functionality is available to them in any given scope, because there is a list of available methods for any given class.
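A sketch of how such a namespace might be assembled follows. `make_namespace` is a hypothetical helper, not the Web.Widgets code, and skipping names that begin with an underscore is an assumption made to keep the example small. The important detail is that each name found on the class hierarchy is fetched from the instance, which is what turns functions into bound methods.

```python
def make_namespace(instance):
    """Collect public class-level names, resolved against the instance."""
    namespace = {}
    for cls in type(instance).__mro__:
        for name in vars(cls):
            if name.startswith("_") or name in namespace:
                continue
            # getattr on the instance binds methods, so template
            # expressions can call greet() without mentioning self.
            namespace[name] = getattr(instance, name)
    return namespace


class BasePage:
    title = "base"

    def greet(self):
        return "hello from " + self.title


class FrontPage(BasePage):
    title = "front"
```

Evaluating the expression `greet()` in `make_namespace(FrontPage())` yields `hello from front`, and the keys of the namespace double as the list of names a template designer may use.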
A convenient event to register for would be a response from the PB service that we just implemented. We can use the Deferred class in order to indicate to the widgets framework that certain work has to be done later. This is a Twisted convention which one can currently use in PB as well as webwidgets; any framework which needs the ability to defer a return value until later should use this facility. Elements of the page will be rendered from top to bottom as data becomes available, so the page will not be blocked on rendering until all deferred elements have been completed.
    from twisted.spread import pb
    from twisted.python import defer
    from twisted.web import widgets
    class EchoDisplay(widgets.Presentation):
        template = """<H1>Welcome to my widget, displaying %%%%echotext%%%%.</h1>
        <p>Here it is: %%%%getEchoPerspective()%%%%</p>"""
        echotext = 'hello web!'
        def getEchoPerspective(self):
            d = defer.Deferred()
            pb.connect(d.callback, d.errback, "localhost", pb.portno,
                       "guest", "guest", "pbecho", "guest", 1)
            d.addCallbacks(self.makeListOf, self.formatTraceback)
            return ['<b>',d,'</b>']
        def makeListOf(self, echoer):
            d = defer.Deferred()
            echoer.echo(self.echotext, pbcallback=d.callback, pberrback=d.errback)
            d.addCallbacks(widgets.listify, self.formatTraceback)
            return [d]
    if __name__ == "__main__":
        from twisted.web import server
        from twisted.internet import main
        a = main.Application("pbweb")
        gdgt = widgets.Gadget()
        gdgt.widgets['index'] = EchoDisplay()
        a.listenOn(8080, server.Site(gdgt))
        a.run()

Listing 16: an event-based web widget.
Each time a Deferred is returned as part of the page, the page will pause rendering until the deferred's callback method is invoked. When that callback is made, it is inserted at the point in the page where rendering left off.
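The pause-and-resume behaviour can be sketched with a stripped-down deferred and a page that renders a list of elements. Both classes are simplified stand-ins written for this illustration, not the twisted.python.defer or twisted.web.widgets code, and error handling (the errback path) is left out.

```python
class Deferred:
    """Minimal stand-in: a result that will be supplied later."""

    def __init__(self):
        self.called = False
        self.result = None
        self._callbacks = []

    def add_callback(self, fn):
        if self.called:
            fn(self.result)
        else:
            self._callbacks.append(fn)

    def callback(self, result):
        self.called = True
        self.result = result
        callbacks, self._callbacks = self._callbacks, []
        for fn in callbacks:
            fn(result)


class Page:
    """Renders elements top to bottom, pausing at unfinished deferreds."""

    def __init__(self, elements, write):
        self.elements = list(elements)
        self.write = write
        self.position = 0
        self.finished = False

    def render(self):
        while self.position < len(self.elements):
            element = self.elements[self.position]
            if isinstance(element, Deferred):
                if not element.called:
                    # Pause here; rendering resumes when the result arrives.
                    element.add_callback(lambda _result: self.render())
                    return
                # Splice the result in at the point where rendering stopped.
                result = element.result
                replacement = result if isinstance(result, list) else [result]
                self.elements[self.position:self.position + 1] = replacement
                continue
            self.write(str(element))
            self.position += 1
        self.finished = True
```

Everything before the first unfinished deferred is written immediately, so a slow element delays only what comes after it. Because a result may itself be a list containing further deferreds, chains such as the two-step lookup in Listing 16 fall out of the same loop.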
If necessary, there are options within web.widgets to allow a widget to postpone or cease rendering of the entire page -- for example, it is possible to write a FileDownload widget, which will override the rendering of the entire page and replace it with a file download.
The final goal of web.widgets is to provide a framework which encourages the development of usable library code. Too much web-based code is thrown away due to its particular environment requirements or stylistic preconceptions it carries with it. The goal is to combine the fast-and-loose iterative development cycle of PHP with the ease of installation and use of Zope's "Product" plugins.
It is unfortunately well beyond the scope of this paper to cover all the functionality that Twisted provides, but it serves as a good overview. It may seem as though twisted does anything and everything, but there are certain features we never plan to implement because they are simply outside the scope of the project.
Despite the multiple ways to publish and access objects, Twisted does not have or support an interface definition language. Some developers on the Twisted project have experience with remote object interfaces that require explicit specification of all datatypes during the design of an object's interface. We feel that such interfaces are in the spirit of statically-typed languages, and are therefore suited to the domain of problems where statically-typed languages excel. Twisted has no plans to implement a protocol schema or static type-checking mechanism, as the efficiency gained by such an approach would be quickly lost again by requiring the type conversion between Python's dynamic types and the protocol's static ones. Since one of the key advantages of Python is its extremely flexible dynamic type system, we felt that a dynamically typed approach to protocol design would share some of those advantages.
Twisted does not assume that all data is stored in a relational database, or even an efficient object database. Currently, Twisted's configuration state is all stored in memory at run-time, and the persistent parts of it are pickled at one go. There are no plans to move the configuration objects into a "real" database, as we feel it is easier to keep a naive form of persistence for the default case and let application-specific persistence mechanisms handle persistence. Consequently, there is no object-relational mapping in Twisted; twisted.enterprise is an interface to the relational paradigm, not an object-oriented layer over it.
There are other things that Twisted will not do as well, but these have been frequently discussed as possibilities for it. The general rule of thumb is that if something will increase the required installation overhead, then Twisted will probably not do it. Optional additions that enhance integration with external systems are always welcome: for example, database drivers for Twisted or a CORBA IDL for PB objects.
Twisted is still a work in progress. The number of protocols in the world is infinite for all practical purposes, and it would be nice to have a central repository of event-based protocol implementations. Better integration with frameworks and operating systems is also a goal. Examples of integration opportunities are automatic creation of installers for "tap" files (for Red Hat Packager-based distributions, FreeBSD's package management system or Microsoft Windows(tm) installers), and integration with other event-dispatch mechanisms, such as win32's native message dispatch.
A still-nascent feature of Twisted, which this paper only touches briefly upon, is twisted.enterprise: it is planned that Twisted will have first-class database support some time in the near future. In particular, integration between twisted.web and twisted.enterprise is planned, to give developers the SQL conveniences that they are used to from other frameworks.
Another direction that we hope Twisted will progress in is standardization and porting of PB as a messaging protocol. Some progress has already been made in that direction, with XEmacs integration nearly ready for release as of this writing.
Tighter integration of protocols is also a future goal, such as an FTP server that can serve the same resources as a web server, or a web server that allows users to change their POP3 password. While Twisted is already a very tightly integrated framework, there is always room for more integration. Of course, all this should be done in a flexible way, so the end-user will choose which components to use -- and have those components work well together.
As shown, Twisted provides a lot of functionality to the Python network programmer, while trying to be in his way as little as possible. Twisted gives good tools both to someone trying to implement a new protocol and to someone trying to use an existing protocol. Twisted allows developers to prototype and develop object communication models with PB, without designing a byte-level protocol. Twisted tries to have an easy way to record useful deployment options, via the twisted.tap and plugin mechanisms, while making it easy to generate new forms of deployment. And last but not least, even though Twisted is written in a high-level language and uses its dynamic facilities to give an easy API, it has performance which is good enough for most situations -- for example, the web server can easily saturate a T1 line serving dynamic requests on low-end machines.
While still an active project, Twisted can already be used for production programs. Twisted can be downloaded from the main Twisted site (http://www.twistedmatrix.com) where there is also documentation for using and programming Twisted.
We wish to thank Sean Riley, Allen Short, Chris Armstrong, Paul Swartz, Jürgen Hermann, Benjamin Bruheim, Travis B. Hartwell, and Itamar Shtull-Trauring for being a part of the Twisted development team with us.
Thanks also to Jason Asbahr, Tommi Virtanen, Gavin Cooper, Erno Kuusela, Nick Moffit, Jeremy Fincher, Jerry Hebert, Keith Zaback, Matthew Walker, and Dan Moniz, for providing insight, commentary, bandwidth, crazy ideas, and bug-fixes (in no particular order) to the Twisted team.