rc: hacking notes

input.c cleanup

With rc-1.7.1 released, I'm starting to look at wider issues. Top of my list is to improve the way rc interacts with line editing libraries. Partly this is because I want to support the BSD editline library. Partly I want to improve the support for readline, which I have a nagging feeling is still buggy in places. Partly I want to clean up some of the ugly code in input.c: there are only three appearances of #if READLINE || EDITLINE, which isn't too bad, but one of these surrounds part of an if / else clause in a way that triggers my "yeuch!" detectors.

As a first step, and as part of my longer term goal of improving the modularization of rc, I have created input.h, the interface file for the functions in input.c, now separate from rc.h. Thus, these functions are only known in those other source files which actually need them (which is a fairly disparate bunch). Time permitting, I will eventually remove almost everything from rc.h.

Now, as well as function interfaces, the new input.h defines a couple of global variables. One is rcrc, which is only used to suppress error messages when builtins.c::b_dot() is called from input.c::doit() to source the user's .rcrc file. Although it might seem like good code reuse to call b_dot() here, I'm unconvinced. It means that we have to fake up an argv array in doit(). And b_dot() does things like pushing an exception to save $*, and making various checks on whether we're interactive or not: all totally pointless in this case.

So now, doit() simply opens the .rcrc file directly (and calls itself to parse it). This, in my opinion, tidies up both doit() and b_dot() slightly, as well as removing the global variable. As a bonus, I changed the code so that only ENOENT is silently ignored. If, for example, the user has no read permission on their .rcrc, they now get an error message.

Anyone who can find a use for the newly available $* in .rcrc will receive the dubious distinction of it being detailed here!

On further reflection, this code doesn't belong in input.c anyway. Unless there's something subtle going on that I'm missing, it makes more sense in main.c. This also eliminates another global variable - dashell.

2003-09-00

rc-1.7.1 released

2003-07-17

history bug fixes

Such a small program; so many bugs.

Callum Gibson sent a patch to fix a core dump in his previous patch for "stuttering" colons (the core dump was triggered by specifying more colons than there are possible substitutions).

While testing that patch, I came across the following fork() bomb:

; > $history
; -

The code in history that skips prior invocations of the history program itself works most of the time, but if all the entries in the history file are history itself, it returns the first one anyway.

While fixing that, I noticed the horrid code in history.c::getcommand(), which would merrily read and write the character before the allocated hist buffer. The least disruptive change I could make was to add a "sentinel", so these accesses are now valid. (It's a rather unusual sentinel: its value is immaterial, just its location is used.)

2002-11-27

sigaction() tidy up

Jeremy Fitzhardinge pointed out a couple of errors around sigaction() that were reported by the remarkable valgrind.

While looking at signal.c, and SUSv3, I noticed the curious use of SA_INTERRUPT. Notwithstanding that I wrote this code, I actually have no idea where SA_INTERRUPT comes from: both SUSv3 and BSD have SA_RESTART instead (with the inverted semantics), and Linux comments SA_INTERRUPT as a "historical no-op". Memo to self: must smoke better drugs.

Since the configury checked for both sigaction() and SA_INTERRUPT before enabling the sigaction() code, we were using the old signal(), and the expensive test for restartable syscalls, and the grody code to make syscalls not restart, in many places where they were not necessary. For example, on NetBSD/i386, the development version of rc runs its configure script in 3.3s as opposed to 9.3s for rc-1.7. In addition, the binary is 175 bytes smaller. Oh, and the configure script itself is over 1k smaller.

2002-08-15

rc-1.7 released

2002-06-18

version tidy up

Erik Quanstrom brought this up; Paul Haahr and Carlo Strozzi made useful comments.

I'd made the version variable far too magical. I'd added it to var.c::varlookup(), next to apids and status. It was also marked "not for export" in hash.c::var_exportable().

Initially I planned simply to give it a default value on startup (in main.c::main()), but this means that any descendant rc processes inherit the value, which is not useful. So, there's a new array of variable names and flags (struct nameflag maybeexport[] in hash.c). It contains entries only for version and prompt (which seemed like it would benefit from the same treatment). The flag is set to FALSE in main.c::assigndefault(), and to TRUE in var.c::varassign().

While I was looking at this, I noticed that bqstatus and status were exportable. I changed them to be not exportable.

2002-04-03

limit code tidy up

Thanks to Chris Siebenmann for pointing out the flaws in the old code. The fundamental problem was that builtins.c::parselimit() used negative values for error returns, but on some systems RLIM_INFINITY is negative (e.g. -1 on Linux; -3 on Solaris.) The result is that on these systems, any attempt to set a resource to unlimited fails:

; limit stacksize unlimited
bad limit

The parselimit() code has other flaws. For example, an attempt to parse futhark results in a return value of -1024; parsing boring gives -1073741824. (In each case, the final character is taken as a multiplier.) All of these are considered bad limits. And 1:fifteen parses as 59!

Clearly, overloading the return value from parselimit() was a mistake. I followed the obvious route, changing parselimit() into a predicate that returns TRUE or FALSE according to whether it could parse its argument or not, with the result of the parse returned in a pointer argument. Along the way I noticed that the return value of parselimit() is stuffed into an int variable. This error doubtless goes back to my autoconfiscation of rc, when I introduced the use of rlim_t.

I also taught rc about some new limits. RLIMIT_AS is the SUSv2 spelling of RLIMIT_VMEM; in Solaris, they are synonymous (despite being documented separately). Linux has both RLIMIT_AS and RLIMIT_RSS, which apparently do have subtly different meanings. Linux also adds RLIMIT_NPROC, RLIMIT_MEMLOCK, and RLIMIT_LOCKS.

There are no standard English names for the newer limits. Here's a table which shows what I used, and compares it to several other popular shells. "N/A" means that the limit is not available in this operating system; "unimp" means that the limit is not implemented in this shell.

macro rc Linux csh NetBSD csh Solaris csh bash
CPU cputime -t
FSIZE filesize -f
DATA datasize -d
STACK stacksize -s
CORE coredumpsize -c
NOFILE descriptors openfiles descriptors -n
AS (or VMEM) memoryuse unimp N/A memorysize -v
RSS memoryrss memoryuse N/A -m
NPROC maxproc N/A -u
MEMLOCK memorylocked N/A -l
LOCKS filelocks unimp N/A N/A unimp

So, at the time of writing, rc has one feature that bash doesn't!

2001-11-23

Large file support

Thanks to Scott Schwartz for pointing out the need for this, and Chris Siebenmann for helping me with the implementation. Seems to be trivial, using the AC_SYS_LARGEFILE macro available in recent versions of autoconf.

There was just one wrinkle: under Linux at least, it is vital to define _FILE_OFFSET_BITS before including (any header file which includes) <code class="file">features.h``. At one point, I had the moral equivalent of this, broken, code:

#include <assert.h>
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>

2001-10-25

fn prompt tailspin

Amazingly, I'm not aware of this having been reported as a bug before. I came across it while doing something completely different.

If there is a semantic error in fn prompt, rc goes into a tailspin, and must be killed. For example:

; fn prompt { echo (a b)^(c d e) }
bad concatenation
bad concatenation
... [ ad infinitum ]

A semantic error is anything that throws an eError exception. Considerably more insidious cases can be devised: fn prompt { echo $a^$b } may work fine till a and b take on unfortunate values.

My initial fix was to push a new exception on the stack before calling fn prompt, but on reflection I realised that a simple flag variable achieves the same goal.

2001-10-05

The noisy lists bug

Again, this isn't exactly a bug, more of a misfeature:

; fn x { echo (a b c d e) }
; whatis x
fn x {echo ((((a b) c) d) e)}

This was easy to fix. Simply keep a state variable inside footobar.c::Tconv() so that we only print the parentheses once.

You might be wondering why I chose a static variable here, but introduced the "alternate form" for the very similar case of quoting within variable names. To be honest, I didn't realise the similarity of the two cases at the time, and just coded them with whatever came to hand. Ex post facto, it seems to me that there is an important difference: in this case, only the handling of nLappend is affected, and the scope of the inlist variable can thus be made extremely tight. In the other case, there is an interaction between nVar and nWord processing, so a state variable would need to have the function scope of dollar anyway.

2001-10-03

The sneaky parens bug

On output (Tconv()), rc would quote a string if and only if it was quoted on input. Wolfgang Zekoll discovered that it's possible to sneak a quote-worthy string in by other means: specifically, by surrounding a variable name with parentheses. (I suspect there may be other ways, but I haven't discovered any.) Like this:

; (a.b) = foo
; fn x { echo $(a.b) }
; x
foo
; whatis x
fn x {echo $a.b}

Note that the last line is wrong: on input this is parsed as echo $a^.b. This means that the function x doesn't work in descendant rc processes.

The fix is in two parts. First, rc now scans strings for metacharacters to decide if they need quoting or not. This is handled by quotep() (which I put in lex.c, since it accesses nw[] and dnw[] which, I claim, should be static to that file&mdash;at present they're not, but maybe one day.) This is made uglier by the different quoting rules needed for variable names and other words, which is all down to free carets. I'm afraid I introduced an "alternate form" for Tconv() (i.e. fmtprint(... "%#T" ...)) which turns on the variable name quoting rules.

Now there is almost no need for the distinction between the Node types nWord and nQword, so I abolished it. (Note that in 7 out of 9 places, they were treated identically. The eighth was in Tconv(), which was fixed as described above. The ninth we shall come to shortly.)

Removing this distinction tidied up the lexer and parser slightly: previously, the lexer returned a quoted string as 'foo, which resulted in an unnatural do... while loop in the lexer, and a mysterious +1 in the parser. Unfortunately (here comes the ninth case) it also broke here documents: <<'EOF' is a very different thing from <<EOF. Darn.

So I added a q field to struct Word to hold this information. The lexer sets q if the string was quoted; the only place it is tested is in heredoc.c::qdoc().

And that's it. On reflection, the bug could probably have been fixed simply by resetting dollar when the lexer hits (. However, I think the new code is cleaner (and the compiler agrees: the binary is slightly smaller). It also means that rc now uses "minimal quoting" on output:

; fn x { echo 'foo' }
; whatis x
fn x {echo foo}

Earlier versions of rc would have maintained the quotes on 'foo'.

Incidentally, the "hilarious quoting bug", mentioned in trip.rc comes from here. In tree.c::mk(), I added the new member to nWord, but omitted to update the size of it. Hilarity ensues: on my development machine, the result was that alternate members of a list were quoted:

; x=('#' '#' '#')
; whatis x
x=('#' # '#')

I don't suppose this particular bug will ever occur again, but if it does, the new test will catch it.

2001-10-02

The Ctrl-A bug

(Actually it's a misfeature, not a bug.)

This is relatively straightforward. I stole Wconv() from es to output a list with ^A and ^B quoting. The functions are short enough that I don't think it would be worth trying to make Lconv() more generic, to handle all possible cases of outputting lists.

I rewrote footobar.c::parse_var() completely, removing the "Hacky routine... [that] scans backwards". The new code is straightforward, and I don't believe it's any less efficient than the old: both basically examine the string twice.

2001-09-27

Return to the Top

Problems? Comments? Questions? Contact us by email!