Here are some general suggestions/notes about improving Lisp
programming style, readability, correctness and efficiency. These are written by
Mark Kantrowitz and Barry
Margolin and come from the Lisp FAQ.
In addition, Hallvard Tretteberg's Lisp
Style Guide covers some of the same material.
There are also several books that cover Lisp programming style.
General Programming Style Rules
Often Misused Operators
Readability
Lisp Idioms
Documentation
Macros
File Modularization
Stylistic Preferences
Correctness and Efficiency Issues
The following operators are often abused or misunderstood by novices.
Think twice before using any of these operators.
- MAPCAN is used with a function to return a variable number of
items to be included in an output list. When the function returns zero
or one items, the function serves as a filter. For example,
    (mapcan #'(lambda (x) (when (and (numberp x) (evenp x)) (list x)))
            '(1 2 3 4 x 5 y 6 z 7))
- Comment your code. Use three semicolons in the left margin before
the definition for major explanations. Use two semicolons that
float with the code to explain the routine that follows. Two
semicolons may also be used to explain the following line when the
comment is too long for the single semicolon treatment. Use
a single semicolon to the right of the code to explain a particular
line with a short comment. The number of semicolons used roughly
corresponds with the length of the comment. Put at least one blank
line before and after top-level expressions.
- Include documentation strings in your code. This lets users
get help while running your program without having to resort to
the source code or printed documentation.
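A minimal sketch of the comment and documentation-string conventions
described above. HYPOTENUSE is a hypothetical function invented for
this illustration and is not part of the FAQ.

```lisp
;;; Geometry helpers.
;;; Three semicolons in the left margin introduce a major explanation.

(defun hypotenuse (a b)
  "Return the length of the hypotenuse of a right triangle with legs A and B."
  ;; Two semicolons float with the code and describe what follows:
  ;; sum the squares of the legs, then take the square root.
  (let ((sum (+ (* a a) (* b b))))      ; one semicolon: short remark
    (sqrt sum)))
```

With the documentation string in place, a user can usually retrieve it
at run time with (documentation 'hypotenuse 'function), although an
implementation is permitted to discard documentation strings.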
- Never use a macro instead of a function for efficiency reasons.
Declaim the function as inline -- for example,
(DECLAIM (INLINE ..))
This is not a magic bullet -- be forewarned that inline
expansions can often increase the code size dramatically. INLINE
should be used only for short functions where the tradeoff is
likely to be worthwhile: inner loops, types that the compiler
might do something smart with, and so on.
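A minimal sketch of this advice, assuming a hypothetical helper SQUARE
that does not appear in the original text. Whether the compiler
actually expands the call inline is implementation-dependent.

```lisp
;; SQUARE and SUM-OF-SQUARES are hypothetical names used for illustration.
;; The DECLAIM comes before the DEFUN so the compiler can record the
;; definition for later inline expansion.
(declaim (inline square))
(defun square (x)
  "Return X multiplied by itself."
  (* x x))

;; Because SQUARE is a function rather than a macro, it can still be
;; passed to MAPCAR or FUNCALL.
(defun sum-of-squares (numbers)
  "Return the sum of the squares of the numbers in the list NUMBERS."
  (reduce #'+ (mapcar #'square numbers)))
```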
- When defining a macro that provides an implicit PROGN, use the
&BODY lambda-list keyword instead of &REST.
- Use gensyms for bindings within a macro, unless the macro lets
the user explicitly specify the variable. For example:
    (defmacro foo ((iter-var list) body-form &body body)
      (let ((result (gensym "RESULT")))
        `(let ((,result nil))
           (dolist (,iter-var ,list ,result)
             (setq ,result ,body-form)
             (when ,result
               ,@body)))))
This avoids errors caused by collisions during macro expansion
between variable names used in the macro definition and in the
supplied body.
- Use a DO- prefix in the name of a macro that does some kind of
iteration, WITH- when the macro establishes bindings, and
DEFINE- or DEF- when the macro creates some definitions. Don't
use the prefix MAP- in macro names, only in function names.
- Don't create a new iteration macro when an existing function
or macro will do.
- Don't define a macro where a function definition will work just
as well -- remember, you can FUNCALL or MAPCAR a function but
not a macro.
- The LOOP and SERIES macros generate efficient code. If you're
writing a new iteration macro, consider learning to use one
of them instead.
- If your program involves macros that are used in more than one
file, it is generally a good idea to put such macros in a separate
file that gets loaded before the other files. The same thing applies
to primitive functions. If a macro is complicated, the code that
defines the macro should be put into a file by itself. In general, if
a set of definitions forms a cohesive and "independent" whole, they
should be put in a file by themselves, and maybe even in their own
package. It isn't unusual for a large Lisp program to have files named
"site-dependent-code", "primitives.lisp", and "macros.lisp". If a file
contains primarily macros, put "-macros" in the name of the
file.
- Use (SETF (CAR ..) ..) and (SETF (CDR ..) ..) in preference to
RPLACA and RPLACD. Likewise (SETF (GET ..) ..) instead of PUT.
- Use INCF, DECF, PUSH and POP instead of the corresponding
SETF forms.
- Many programmers religiously avoid using CATCH, THROW, BLOCK,
PROG, GO and TAGBODY. Tags and go-forms should only be necessary
to create extremely unusual and complicated iteration constructs. In
almost every circumstance, a ready-made iteration construct or
recursive implementation is more appropriate.
- Don't use LET* where LET will do. Don't use LABELS where FLET
will do. Don't use DO* where DO will do.
- Don't use DO where DOTIMES or DOLIST will do.
- If you like using MAPCAR instead of DO/DOLIST, use MAPC when
no result is needed -- it's more efficient, since it doesn't
cons up a list. If a single cumulative value is required, use
REDUCE. If you are seeking a particular element, use FIND,
POSITION, or MEMBER.
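The choices in the item above can be sketched as follows. The variable
*SCORES* and its contents are hypothetical sample data, not part of
the original text.

```lisp
(defparameter *scores* '(3 8 5 12)
  "Hypothetical sample data for this illustration.")

;; No result needed: MAPC calls the function for effect only and does
;; not cons up a result list.
(mapc #'(lambda (score) (format t "~&score: ~D~%" score)) *scores*)

;; A single cumulative value: REDUCE.
(reduce #'+ *scores*)                   ; the sum, 28
(reduce #'max *scores*)                 ; the largest score, 12

;; Seeking a particular element: FIND, POSITION, or MEMBER.
(find-if #'evenp *scores*)              ; first even score, 8
(position 5 *scores*)                   ; zero-based index of 5, which is 2
(member 5 *scores*)                     ; tail starting at 5, (5 12)
```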
- If using REMOVE and DELETE to filter a sequence, don't use the
:test-not keyword or the REMOVE-IF-NOT or DELETE-IF-NOT functions.
Use COMPLEMENT to complement the predicate and the REMOVE-IF
or DELETE-IF functions instead.
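As a small sketch of the COMPLEMENT idiom, assuming a hypothetical
function KEEP-EVEN that is not part of the original text:

```lisp
(defun keep-even (numbers)
  "Return a list of the even integers in NUMBERS, a list of integers."
  ;; Has the same effect as (remove-if-not #'evenp numbers), but is
  ;; written with REMOVE-IF and a complemented predicate.
  (remove-if (complement #'evenp) numbers))
```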
- Use complex numbers to represent points in a plane.
- Don't use lists where vectors are more appropriate. Accessing the
nth element of a vector is faster than finding the nth element
of a list, since the latter requires pointer chasing while the
former requires simple addition. Vectors also take up less space
than lists. Use adjustable vectors with fill-pointers to
implement a stack, instead of a list -- using a list continually
conses and then throws away the conses.
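One possible sketch of such a vector-based stack follows. The names
MAKE-STACK, STACK-PUSH and STACK-POP are hypothetical and chosen only
for this illustration.

```lisp
(defun make-stack (&optional (capacity 16))
  "Return an empty stack: an adjustable vector with a fill pointer of 0."
  (make-array capacity :adjustable t :fill-pointer 0))

(defun stack-push (item stack)
  "Push ITEM onto STACK, growing the vector if it is full. Return STACK."
  (vector-push-extend item stack)
  stack)

(defun stack-pop (stack)
  "Remove and return the top item of STACK. Signals an error if STACK is empty."
  (vector-pop stack))
```

Since the fill pointer marks the top of the stack, LENGTH reports the
number of items currently on it.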
- When adding an entry to an association list, use ACONS, not
two calls to CONS. This makes it clear that you're using an alist.
- If your association list has more than about 10 entries in it,
consider using a hash table. Hash tables are often more efficient.
(See also [2-2].)
- When you don't need the full power of CLOS, consider using
structures instead. They are often faster, take up less space, and
are easier to use.
- Use PRINT-UNREADABLE-OBJECT when writing a print-function.
- Use WITH-OPEN-FILE instead of OPEN and CLOSE.
- When a HANDLER-CASE clause is executed, the stack has already
unwound, so dynamic bindings that existed when the error
occurred may no longer exist when the handler is run. Use
HANDLER-BIND if the handler needs those bindings.
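The difference can be observed with a small experiment. Everything
named here (*DEPTH*, FAIL-DEEP and the two DEPTH-SEEN-BY- functions)
is hypothetical and exists only for this sketch.

```lisp
(defvar *depth* 0
  "Hypothetical special variable used to observe dynamic bindings.")

(defun fail-deep ()
  "Signal an error while *DEPTH* is dynamically bound to 3."
  (let ((*depth* 3))
    (error "failure at depth ~D" *depth*)))

(defun depth-seen-by-handler-case ()
  "Return the value of *DEPTH* seen by a HANDLER-CASE clause."
  ;; The clause runs after the stack has unwound, so the binding made
  ;; in FAIL-DEEP is gone and the global value is seen.
  (handler-case (fail-deep)
    (error () *depth*)))

(defun depth-seen-by-handler-bind ()
  "Return the value of *DEPTH* seen by a HANDLER-BIND handler."
  (let ((seen nil))
    (handler-case
        (handler-bind ((error #'(lambda (condition)
                                  (declare (ignore condition))
                                  ;; Runs before unwinding, so the
                                  ;; binding from FAIL-DEEP is visible.
                                  (setf seen *depth*))))
          (fail-deep))
      ;; The handler above declines by returning, so this outer clause
      ;; performs the actual exit from FAIL-DEEP.
      (error () seen))))
```

Under these assumptions the first function returns 0 and the second
returns 3.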
- When using CASE and TYPECASE forms, if you intend for the form
to return NIL when all cases fail, include an explicit OTHERWISE
clause. If it would be an error to return NIL when all cases
fail, use ECASE, CCASE, ETYPECASE or CTYPECASE instead.
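A brief sketch of both variants, using the hypothetical functions
COLOR-CODE and STRICT-COLOR-CODE:

```lisp
(defun color-code (color)
  "Return a numeric code for the symbol COLOR, or NIL if it is unknown."
  (case color
    (red 1)
    (green 2)
    ;; NIL is the intended fallback, so it is stated explicitly.
    (otherwise nil)))

(defun strict-color-code (color)
  "Like COLOR-CODE, but signal a TYPE-ERROR if COLOR is unknown."
  (ecase color
    (red 1)
    (green 2)))
```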
- Use local variables in preference to global variables whenever
possible. Do not use global variables in lieu of parameter passing.
Global variables can be used in the following circumstances:
- When one function needs to affect the operation of
another, but the second function isn't called by the first.
(For example, *load-pathname* and *break-on-warnings*.)
- When a called function needs to affect the current or future
operation of the caller, but it doesn't make sense to accomplish
this by returning multiple values.
- To provide hooks into the mechanisms of the program.
(For example, *evalhook*, *, /, and +.)
- Parameters which, when their value is changed, represent a
major change to the program.
(For example, *print-level* and *print-readably*.)
- For state that persists between invocations of the program.
Also, for state which is used by more than one major program.
(For example, *package*, *readtable*, *gensym-counter*.)
- To provide convenient information to the user.
(For example, *version* and *features*.)
- To provide customizable defaults.
(For example, *default-pathname-defaults*.)
- When a value affects major portions of a program, and passing
this value around would be extremely awkward. (The example
here is output and input streams for a program. Even when
the program passes the stream around as an argument, if you
want to redirect all output from the program to a different
stream, it is much easier to just rebind the global
variable.)
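The stream example in the last item can be sketched as follows,
assuming two hypothetical functions, REPORT and REPORT-TO-STRING,
that are not part of the original text:

```lisp
(defun report (name)
  "Print a greeting for NAME to *STANDARD-OUTPUT*."
  (format t "hello, ~A" name))

(defun report-to-string (name)
  "Return as a string whatever REPORT prints for NAME."
  (with-output-to-string (stream)
    ;; Rebind the global stream for the dynamic extent of this LET
    ;; only. REPORT itself needs no stream argument.
    (let ((*standard-output* stream))
      (report name))))
```

Once the LET is exited, *STANDARD-OUTPUT* has its previous value
again, so callers outside REPORT-TO-STRING are unaffected.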
- Beginning students, especially ones accustomed to programming
in C, Pascal, or Fortran, tend to use global variables to hold or pass
information in their programs. This style is considered ugly by
experienced Lisp programmers. Although assignment statements can't
always be avoided in production code, good programmers take advantage
of Lisp's functional programming style before resorting to SETF and
SETQ. For example, they will nest function calls instead of using a
temporary variable and use the stack to pass multiple values. When
first learning to program in Lisp, try to avoid SETF/SETQ and their
cousins as much as possible. And if a temporary variable is necessary,
bind it to its first value in a LET statement, instead of letting it
become a global variable by default. (If you see lots of compiler
warnings about declaring variables to be special, you're probably
making this mistake. If you intend a variable to be global, it should
be defined with a DEFVAR or DEFPARAMETER statement, not left to the
compiler to fix.)
- In CLtL2, IN-PACKAGE does not evaluate its argument. Use DEFPACKAGE
to define a package and declare the external (exported)
symbols from the package.
- The ARRAY-TOTAL-SIZE-LIMIT may be as small as 1024, and the
CALL-ARGUMENTS-LIMIT may be as small as 50.
- Novices often mistakenly quote the keys of a CASE form.
For example, (case x ('a 3) ..) is incorrect. Because 'a reads as
(QUOTE A), the clause matches both A and QUOTE, so it would return
3 if x were the symbol QUOTE. Use (case x (a 3) ..) instead.
- Avoid using APPLY to flatten lists. Although
(apply #'append list-of-lists)
may look like a call with only two arguments, it becomes a
function call to APPEND, with the LIST-OF-LISTS spread into actual
arguments. As a result it will have as many arguments as there are
elements in LIST-OF-LISTS, and hence may run into problems with the
CALL-ARGUMENTS-LIMIT. Use REDUCE or MAPCAN instead:
(reduce #'append list-of-lists :from-end t)
(mapcan #'copy-list list-of-lists)
The second will often be more efficient (see note below about choosing
the right algorithm). Beware of calls like (apply f (mapcar ..)).
- NTH must cdr down the list to reach the elements you are
interested in. If you don't need the structural flexibility of
lists, try using vectors and the ELT function instead.
- CASE statements can be vectorized if the keys are consecutive
numbers. Such CASE statements can still have OTHERWISE clauses.
To take advantage of this without losing readability, use #. with
symbolic constants:
    (eval-when (:compile-toplevel :load-toplevel :execute)
      (defconstant RED 1)
      (defconstant GREEN 2)
      (defconstant BLUE 3))
    (case color
      (#.RED ...)
      (#.GREEN ...)
      (#.BLUE ...)
      ...)
- Don't use quoted constants where you might later destructively
modify them. For example, rather than writing '(c d) in
    (defun foo ()
      (let ((var '(c d)))
        ...))
write (list 'c 'd). Using a quote here can lead to
unexpected results later. If you later destructively modify the
value of var, this is self-modifying code! Some Lisp compilers
will complain about this, since they like to make constants
read-only. Modifying constants has undefined results in ANSI CL.
See also the answer to question [3-13].
Similarly, beware of shared list structure arising from the use
of backquote. Any sublist in a backquoted expression that doesn't
contain any commas can share with the original source structure.
- Don't proclaim unsafe optimizations, such as
    (proclaim '(optimize (safety 0) (speed 3) (space 1)))
since this yields a global effect. Instead, add the
optimizations as local declarations to small pieces of
well-tested, performance-critical code:
    (defun well-tested-function ()
      (declare (optimize (safety 0) (speed 3) (space 1)))
      ...)
Such optimizations can remove run-time type-checking; type-checking
is necessary unless you've very carefully checked your code
and added all the appropriate type declarations.
- Some programmers feel that you shouldn't add declarations to
code until it is fully debugged, because incorrect
declarations can be an annoying source of errors. They recommend
using CHECK-TYPE liberally instead while you are developing the code.
On the other hand, if you add declarations to tell the
compiler what you think your code is doing, the compiler can
then tell you when your assumptions are incorrect.
Declarations also make it easier for another programmer to read
your code.
- Declaring the type of variables to be FIXNUM does not
necessarily mean that the results of arithmetic involving the
fixnums will be a fixnum; it could be a BIGNUM. For example,
(declare (type fixnum x y))
(setq z (+ (* x x) (* y y)))
could result in z being a BIGNUM. If you know the limits of your
numbers, use a declaration like
(declare (type (integer 0 100) x y))
instead, since most compilers can then do the appropriate type
inference, leading to much faster code.
- Don't change the compiler optimization with an OPTIMIZE
proclamation or declaration until the code is fully debugged
and profiled. When first writing code you should say
(declare (optimize (safety 3))) regardless of the speed setting.
- Depending on the optimization level of the compiler, type
declarations are interpreted either as (1) a guarantee from
you that the variable is always bound to values of that type,
or (2) a desire that the compiler check that the variable is
always bound to values of that type. Use CHECK-TYPE if (2) is
your intention.
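A minimal sketch of the CHECK-TYPE alternative, assuming a
hypothetical function SCALE that is not part of the original text:

```lisp
(defun scale (factor value)
  "Return VALUE multiplied by FACTOR, which must be a non-negative integer."
  ;; CHECK-TYPE performs the check at run time regardless of the
  ;; optimization settings and signals a correctable TYPE-ERROR when
  ;; FACTOR has the wrong type.
  (check-type factor (integer 0 *))
  (* factor value))
```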
- If you get warnings about unused variables, add IGNORE
declarations if appropriate or fix the problem. Letting such
warnings stand is a sloppy coding practice.
To produce efficient code,
- choose the right algorithm. For example, consider seven possible
implementations of COPY-LIST:
    (defun copy-list (list)
      (let ((result nil))
        (dolist (item list result)
          (setf result (append result (list item))))))

    (defun copy-list (list)
      (let ((result nil))
        (dolist (item list (nreverse result))
          (push item result))))

    (defun copy-list (list)
      (mapcar #'identity list))

    (defun copy-list (list)
      (let ((result (make-list (length list))))
        (do ((original list (cdr original))
             (new result (cdr new)))
            ((null original) result)
          (setf (car new) (car original)))))

    (defun copy-list (list)
      (when list
        (let* ((result (list (car list)))
               (tail-ptr result))
          (dolist (item (cdr list) result)
            (setf (cdr tail-ptr) (list item))
            (setf tail-ptr (cdr tail-ptr))))))

    (defun copy-list (list)
      (loop for item in list collect item))

    (defun copy-list (list)
      (if (consp list)
          (cons (car list)
                (copy-list (cdr list)))
          list))
The first uses APPEND to tack the elements onto the end of the list.
Since APPEND must traverse the entire partial list at each step, this
yields a quadratic running time for the algorithm. The second
implementation improves on this by iterating down the list twice; once
to build up the list in reverse order, and the second time to reverse
it. The efficiency of the third depends on the Lisp implementation,
but it is usually similar to the second, as is the fourth. The fifth
algorithm, however, iterates down the list only once. It avoids the
extra work by keeping a pointer (reference) to the last cons of the
list and RPLACDing onto the end of that. Use of the fifth algorithm
may yield a speedup. Note that this contradicts the earlier dictum to
avoid destructive functions. To make more efficient code one might
selectively introduce destructive operations in critical sections of
code. Nevertheless, the fifth implementation may be less efficient in
Lisps with cdr-coding, since it is more expensive to RPLACD cdr-coded
lists. Depending on the implementation of NREVERSE, however,
the fifth and second implementations may be doing the same
amount of work. The sixth example uses the LOOP macro, which usually
expands into code similar to the third. The seventh example copies
dotted lists, and runs in linear time, but isn't tail-recursive.
There is a long-running discussion of whether pushing items
onto a list and then applying NREVERSE to the result is faster or
slower than the alternatives. According to Richard C. Waters (Lisp
Pointers VI(4):27-34, October-December 1993), the NREVERSE strategy is
slightly faster in most Lisp implementations. But the speed difference
either way isn't much, so he argues that one should pursue the option
that yields the clearest and simplest code, namely using NREVERSE.
Here's code for a possible implementation of NREVERSE. As is
evident, most of the alternatives to using NREVERSE involve
essentially the same code, just reorganized.
    (defun nreverse (list)
      ;; REVERSED is the partially reversed list,
      ;; CURRENT is the current cons cell, which will be reused, and
      ;; REMAINING are the cons cells which have not yet been reversed.
      (do* ((reversed nil)
            (current list remaining)
            (remaining (cdr current) (cdr current)))
           ((null current)
            reversed)
        ;; Reuse the cons cell at the head of the list:
        ;; reversed := ((car current) . reversed)
        (setf (cdr current) reversed)
        (setf reversed current)))
- use type declarations liberally in time-critical code, but
only if you are a seasoned Lisp programmer. Appropriate type
declarations help the compiler generate more specific and
optimized code. They also let the reader know what assumptions
were made. For example, if you only use fixnum arithmetic,
adding declarations can lead to a significant speedup. If you
are a novice Lisp programmer, you should use type declarations
sparingly, as there may be no checking to see if the
declarations are correct, and optimized code can be harder to
debug. Wrong declarations can lead to errors in otherwise
correct code, and can limit the reuse of code in other
contexts. Depending on the Lisp compiler, it may also
be necessary to declare the type of results using THE, since
some compilers don't deduce the result type from the inputs.
- check the code produced by the compiler by using the
DISASSEMBLE function.
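As a rough sketch of how the last two items fit together, the
following defines a hypothetical function ADD-SMALL with a bounded
integer declaration and then asks the implementation to show the
code it generated. The format and content of the listing are
implementation-specific.

```lisp
(defun add-small (x y)
  "Return the sum of X and Y, both declared to be integers from 0 to 100."
  (declare (type (integer 0 100) x y)
           (optimize (speed 3)))
  (+ x y))

;; Print the compiled code for ADD-SMALL to *STANDARD-OUTPUT*.
(disassemble 'add-small)
```

Comparing the listing with and without the type declaration is one
way to see whether the compiler made use of it.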