
1.3.6 Seeing Stars

We need to describe shell pattern matching for those new to it. It’s one of the more powerful things that the shell (the command processor) does for the user—and it makes all the other commands seem that much more powerful.

When you type a command like we did previously:

mv /usr/oldproject/*.java .

the asterisk character (called a “star” for short) matches any sequence of characters, including none. Combined with .java, it matches every file in the /usr/oldproject directory whose name ends with .java.
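To try this safely, here is a minimal sketch that recreates the situation in a scratch directory. The directory layout and file names (Account.java, User.java, README) are made up for illustration; only the mv line mirrors the command above.

```shell
# Work in a scratch directory so nothing real is touched.
demo=$(mktemp -d)
mkdir "$demo/oldproject" "$demo/newproject"
touch "$demo/oldproject/Account.java" "$demo/oldproject/User.java" "$demo/oldproject/README"

cd "$demo/newproject"
# The star stands for any run of characters, so both .java files are moved.
mv "$demo"/oldproject/*.java .

moved=$(echo *)                  # what arrived in the current directory
left=$(ls "$demo/oldproject")    # what stayed behind
echo "moved: $moved"
echo "left behind: $left"
```

The README stays where it was, because its name does not end with .java.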

There are two significant things to remember about this feature. First, the star and the other shell pattern matching characters (described below) do not mean the same as the regular expressions in vi or other programs or languages. Shell pattern matching is similar in concept, but quite different in specifics.

Second, the pattern matching is done by the shell, the command interpreter, before the arguments are handed off to the specific command. Any text with these special characters is replaced, by the shell, with one or more filenames that match the pattern. This means that all the other Linux commands (mv, cp, ls, and so on) never see the special characters; they don’t do the pattern matching, the shell does. The shell just hands them a list of filenames.

The significance here is that this functionality is available to any and every command, including shell scripts and Java programs that you write, with no extra effort on your part. It also means that the syntax for specifying multiple files doesn’t change between commands, since the commands don’t implement that syntax; it’s all taken care of in the shell before they ever see it. Any command that can handle multiple filenames on a command line can benefit from this shell feature.
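One way to convince yourself that the shell, not the command, does the matching is to write a tiny command of your own and look at what it receives. The sketch below uses a hypothetical function, show_args, as a stand-in for any script or program; the file names are also invented.

```shell
demo=$(mktemp -d)
cd "$demo"
touch A.java B.java C.java

# Hypothetical stand-in for any command: it reports what it actually received.
show_args() {
    echo "$# argument(s): $*"
}

expanded=$(show_args *.java)     # the shell replaces the pattern before the call
literal=$(show_args '*.java')    # quoted, so the pattern is passed through as typed
echo "$expanded"
echo "$literal"
```

The function contains no pattern matching code at all, yet the first call receives three filenames. Only when the pattern is quoted does the function ever see the star.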

If you’re familiar with MS-DOS commands, consider the way pattern matching works (or doesn’t work) there. The limited pattern matching you have available for a dir command in MS-DOS doesn’t work with other commands unless the programmer who wrote that command also implemented the same pattern matching feature.

What are the other special characters for pattern matching with filenames? Two other constructs worth knowing are the question mark and the square brackets. The “?” will match any single character.

The [...] construct is a bit more complicated. In its simplest form, it matches any of the characters inside; for example, [abc] matches any of a or b or c. So Version[123].java would match a file called Version2.java but not those called Version12.java or VersionC.java. The pattern Version*.java would match all of those. The pattern Version?.java would match all except Version12.java, since it has two characters where the ? matches only one.
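The three patterns just discussed can be checked against the three example files. This is a small sketch in a scratch directory; echo is used only because it prints whatever list of names the shell hands it.

```shell
demo=$(mktemp -d)
cd "$demo"
touch Version2.java Version12.java VersionC.java

set_match=$(echo Version[123].java)   # exactly one character, and it must be 1, 2, or 3
star_match=$(echo Version*.java)      # any run of characters
one_match=$(echo Version?.java)       # exactly one character, whatever it is

echo "[123]: $set_match"
echo "*:     $star_match"
echo "?:     $one_match"
```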

The brackets can also match a range of characters, as in [a-z] or [0-9]. If the first character inside the brackets is a “^” or a “!”, then (think “not”) the meaning is reversed, and it will match anything but those characters. So Version[^0-9].java will match VersionC.java but not Version1.java. How would you match a “-”, without it being taken to mean a range? Put it first inside the brackets. How would you match a “^” or “!” without it being understood as the “not”? Don’t put it first.
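Ranges, negation, and the literal hyphen can be tried the same way. The sketch below uses “!” for negation, since that form is accepted by every POSIX shell; the file names, including the odd Version-.java, are invented for the demonstration.

```shell
demo=$(mktemp -d)
cd "$demo"
touch Version1.java VersionC.java

digits=$(echo Version[0-9].java)       # a range: any single digit
not_digits=$(echo Version[!0-9].java)  # "!" first reverses the meaning

# Add a file with a hyphen in that position, then match it literally.
touch Version-.java
hyphen=$(echo Version[-x].java)        # "-" first is a plain hyphen, not a range

echo "$digits"
echo "$not_digits"
echo "$hyphen"
```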

Some sequences are so common that a shorthand syntax is included. Some other sequences are not sequential characters and are not easily expressed as a range, so a shorthand is included for those, too. The syntax for these special sequences is [:name:], written inside the square brackets, where name is one of: alnum, alpha, ascii, blank, cntrl, digit, graph, lower, print, punct, space, upper, xdigit. Inside the brackets, [:alpha:] matches any alphabetic character and [:punct:] matches any punctuation character, so Version[[:digit:]].java matches Version1.java just as Version[0-9].java does. We think you get the idea.

Escape at Last


Of course there are always times when you want the special character to be just that character, without its special meaning to the shell. In that case you need to escape the special meaning, either by preceding it with a backslash or by enclosing the expression in single quotes. The commands rm Account\$1.class or rm 'Account$1.class' would remove the file even though it has a dollar sign in its name (which would normally be interpreted by the shell as a variable). Any character sequence in single quotes is left alone by the shell; no special substitutions are done. Double quotes still do some substitutions inside them, such as shell variable substitution, so if you want literal values, use the single quotes.
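The difference between the quoting styles is easy to see with echo. In this sketch the second file, Account.class, is an invented bystander that shows what the unquoted and double-quoted forms would hit by mistake. The set -- line is there only to make sure $1 is empty, as it normally is at an interactive prompt.

```shell
demo=$(mktemp -d)
cd "$demo"
touch 'Account$1.class' Account.class

set --                               # clear $1 so the demo behaves like a fresh prompt

unquoted=$(echo Account$1.class)     # $1 is substituted (with nothing)
backslash=$(echo Account\$1.class)   # the backslash keeps the dollar sign literal
single=$(echo 'Account$1.class')     # single quotes: nothing is substituted
double=$(echo "Account$1.class")     # double quotes still substitute variables

echo "$unquoted"
echo "$backslash"
echo "$single"
echo "$double"

rm 'Account$1.class'                 # removes the file with the dollar sign
remaining=$(echo *)
echo "remaining: $remaining"
```

Had the rm been typed with no quoting, or with double quotes, it would have removed Account.class instead.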


As a general rule, if you are typing a filename which contains something other than alphanumeric characters, underscores, or periods, you probably want to enclose it in single quotes, to avoid any special shell meaning.
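As a small illustration of that rule, consider a name containing a space and parentheses. The file name here is hypothetical. Typed without quotes, the space would split it into two arguments and the parentheses would be reported as a syntax error.

```shell
demo=$(mktemp -d)
cd "$demo"
touch 'Draft Report(v2).txt'         # hypothetical awkward file name

# In single quotes, the whole name reaches ls as one argument.
listed=$(ls 'Draft Report(v2).txt')
echo "$listed"

# The same quoting makes removal safe.
rm 'Draft Report(v2).txt'
remaining=$(ls)
```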