Introducing the Shell
Overview
Teaching: 5 min
Exercises: 0 min
Questions
What is a command shell and why would I use one?
Objectives
Explain how the shell relates to the keyboard, the screen, the operating system, and users’ programs.
Explain when and why command-line interfaces should be used instead of graphical interfaces.
We are all familiar with graphical user interfaces (GUI - windows, icons and pointers). They are easy to learn and fantastic for simple tasks where a vocabulary consisting of “click” translates easily into “do the thing I want”. But this magic relies on wanting a simple set of things, and having programs that can do exactly those things.
If you wish to do complex, purpose-specific things it helps to have a richer means of expressing your instructions to the computer. It doesn’t need to be complicated or difficult, just a vocabulary of commands and a simple grammar for using them.
This is what the shell provides - a simple language and a command-line interface to use it through.
The heart of a command-line interface is a read-evaluate-print loop, or REPL, so called because when you type a command and press the Enter (or Return) key, the shell:
- Reads it
- Executes (or “evaluates”) it
- Prints the output
and then prints the prompt and waits for you to enter another command.
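To make the loop concrete, here is a toy version written in the shell's own language. It is only a sketch: the function name mini_repl and the toy$ prompt are made up for illustration, and a real shell such as Bash does far more than this (quoting rules, history, job control and so on).

```shell
# A toy read-evaluate-print loop. mini_repl is a hypothetical name used
# only for this illustration; it is not a standard command.
mini_repl() {
    # Print a prompt, then read one line of input.
    # The loop ends when there is no more input (for example, Ctrl-D).
    while printf 'toy$ ' && IFS= read -r line; do
        # Evaluate the line as a command; anything it prints goes to the screen.
        eval "$line"
    done
}
```

If you paste this into a shell and then run mini_repl, each line you type is read, executed, and its output printed before the toy prompt reappears.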
A shell is a program like any other. What’s special about it is that its job is to run other programs rather than to do calculations itself. The most popular Unix shell is Bash, the Bourne Again SHell (so-called because it’s derived from a shell written by Stephen Bourne). Bash is the default shell on most modern implementations of Unix and in most packages that provide Unix-like tools for Windows.
What does it look like?
A typical shell command and output looks something like this:
bash-3.2$
bash-3.2$ ls -F /
Applications/         System/
Library/              Users/
Network/              Volumes/
bash-3.2$
The first line shows only a prompt, indicating that the shell is waiting for input. Your shell may use different text for the prompt. Most importantly: when typing commands, either from these lessons or from other sources, do not type the prompt, only the commands that follow it.
The part that you type (in this example ls -F /) typically has the following structure: a command, some flags (also called options or switches) and an argument. Flags start with a dash (-) and change the behaviour of a command. Arguments tell the command what to operate on (e.g. files and directories). Sometimes flags and arguments are referred to as parameters. A command can be called with more than one flag and more than one argument, but a command doesn’t always require an argument or a flag.
In the example above, our command is ls, with a flag -F and an argument /. Each part is separated by spaces: if you omit the space between ls and -F the shell will look for a command called ls-F, which doesn’t exist. Also, capitalization matters: LS is different to ls.
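As a rough illustration of that structure, the lines below call ls with different combinations of flags and arguments. The -a flag (which also shows hidden entries) and the choice of directories are assumptions made for this example; the output will differ from one machine to another.

```shell
ls            # command only: list the current directory
ls -F         # one flag, no argument
ls -F /       # one flag and one argument, as in the example above
ls -F -a /    # two flags and one argument
ls -F / .     # one flag and two arguments (the root and the current directory)
```

In every case the pieces are separated by spaces, and the command name comes first.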
Next we see the output that our command produced. In this case it is a listing
of files and directories in a location called
/ - we’ll cover what all these mean
later today. Those with a Mac might recognize the output in this example.
Finally, the shell again prints the prompt and waits for you to type the next command.
Open a shell window and try entering
ls -F / for yourself (don’t forget that spaces
and capitalization are important!).
How does the shell know what ls and its flags mean?
Every command is a program stored somewhere on the computer, and the shell keeps a list of places to search for commands (the list is in a variable called PATH, but those are concepts we’ll meet later and not too important at the moment). Recall that commands, flags and arguments are separated by spaces.
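If you are curious, the sketch below peeks at that search list and asks the shell where it finds ls. The variable name ls_location is made up for this example, and the location reported will vary between systems (it is often something like /bin/ls).

```shell
# Print the list of directories the shell searches, separated by colons.
echo "$PATH"

# Ask the shell what it would run for the command name "ls".
ls_location=$(command -v ls)
echo "ls is found at: $ls_location"
```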
So let’s look at the REPL (read-evaluate-print loop) in more detail. Notice that the “evaluate” step is made of two parts:
- Read what was typed (ls -F / in our example). The shell uses the spaces to split the line into the command, flags, and arguments.
- Evaluate:
  a. Find a program called ls
  b. Execute it, passing it the flags and arguments (-F and /) to interpret as the program sees fit
- Print the output produced by the program
and then print the prompt and wait for you to enter another command.
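The splitting step can be imitated with a small function. This is a toy under stated assumptions: show_words is a hypothetical name, and the rule "starts with a dash means flag" is only a convention. The real shell simply hands all the words to the program, which decides for itself how to interpret them.

```shell
# Toy illustration (show_words is a made-up name, not a real command).
show_words() {
    # The first word is the command name.
    echo "command: $1"
    shift
    # Label every remaining word as a flag (leading dash) or an argument.
    for word in "$@"; do
        case "$word" in
            -*) echo "flag: $word" ;;
            *)  echo "argument: $word" ;;
        esac
    done
}

# The shell has already split this line on spaces before show_words runs.
show_words ls -F /
```

This prints one line for the command (ls), one for the flag (-F) and one for the argument (/).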
Command not found
If the shell can’t find a program whose name is the command you typed, it will print an error message like:
$ ls-F
-bash: ls-F: command not found
Usually this means that you have mis-typed the command - in this case we omitted the space between ls and -F.
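Alongside the error message, the shell records a numeric exit status for every command it runs. In Bash and other POSIX-style shells that status is 127 when the command cannot be found. The sketch below captures it; the variable name status is just a choice made for this example.

```shell
# Try to run the mistyped command and remember its exit status.
status=0
ls-F || status=$?

# A status of 127 is the shell's way of saying "command not found".
echo "exit status: $status"
```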
Is it difficult?
It isn’t difficult, but it is a different model of interacting than a GUI, and that will take some effort - and some time - to learn. A GUI presents you with choices and you select one. With a CLI the choices are combinations of commands and parameters, more like words in a language than buttons on a screen. They are not presented to you so you must learn a few, like learning some vocabulary in a new language. But a small number of commands gets you a long way. Most people need to look up documentation or search the web for Unix commands they don’t know off the top of their head.
Flexibility and automation
The grammar of a shell allows you to combine existing tools into powerful pipelines and handle large volumes of data automatically. Sequences of commands can be written into a script, improving the reproducibility of workflows and allowing you to repeat them easily.
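Pipelines are covered properly later in the material. As a small taste, this sketch joins a few standard tools with the pipe symbol (|), which passes the output of one command to the next. The choice of the / directory and the variable name entry_count are arbitrary.

```shell
# Count the entries in the root directory by piping ls into wc.
entry_count=$(ls / | wc -l)
echo "entries in /: $entry_count"

# Show the three entries that come last alphabetically.
ls / | sort -r | head -n 3
```

Saving lines like these in a file turns them into a script that can be rerun whenever needed.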
In addition, the command line is often the easiest way to interact with remote machines and supercomputers. Familiarity with the shell is essential for using Pawsey resources, and the shell is used throughout Pawsey training.
As clusters and cloud computing systems become more popular for scientific data crunching, being able to interact with the shell is becoming a necessary skill. We can build on the command-line skills covered here to tackle a wide range of scientific questions and computational challenges.
There are many tricks and techniques that can make using a shell easier or more efficient, in many different situations.
Sometimes we need to stop a command that is running, because it is taking too long, or perhaps we realise that it is not the correct command, or it needs to be run with different arguments. Ctrl-C will send a signal to stop a running command. Press Ctrl-C once, and wait until you are returned to a prompt.
Tab autocompletion is helpful to complete the names of long commands, or even longer complex filenames. To use it, start typing the name and press the tab key. If the shell can unambiguously figure out the command or filename from what you have typed, it will complete it for you. If not, you can press tab twice to display a list of possible options.
To repeat a command that you have previously run in your current shell or even a previous time you ran the shell, you can access the shell history. Press the up arrow key to scroll upward through your most recent commands, as far back as your saved history goes. Once you have the one you want, you can press Enter to execute it again, or change it to suit what you need this time.
Nelle’s Pipeline: Example used in material
Nelle Nemo, a marine biologist, has just returned from a six-month survey of the North Pacific Gyre, where she has been sampling gelatinous marine life in the Great Pacific Garbage Patch. She has 1520 samples in all and now needs to:
1. Run each sample through an assay machine that will measure the relative abundance of 300 different proteins. The machine’s output for a single sample is a file with one line for each protein.
2. Calculate statistics for each of the proteins separately using a program her supervisor wrote called goostats.
3. Write up results.
Her supervisor would really like her to do this by the end of the month
so that her paper can appear in an upcoming special issue of ‘Aquatic Goo Letters’.
It takes about half an hour for the assay machine to process each sample. The good news is that it only takes two minutes to set each one up. Since her lab has eight assay machines that she can use in parallel, this step will “only” take about two weeks.
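The shell can check this estimate with its built-in integer arithmetic. The sketch below only counts machine running time and ignores the two-minute setup; the variable names are made up for the example.

```shell
samples=1520
minutes_per_sample=30
machines=8

# Total running time is shared evenly across the eight machines.
machine_minutes=$(( samples * minutes_per_sample / machines ))
machine_hours=$(( machine_minutes / 60 ))
echo "about $machine_hours hours of running time per machine"
```

That comes to about 95 hours per machine. The two-week figure presumably assumes the machines run only during ordinary working days rather than around the clock, though the text does not say so.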
The bad news is that if she has to run
goostats by hand,
she’ll have to enter filenames and click “OK” 1520 times.
At 30 seconds per sample,
the whole process will take more than 12 hours
(and that’s assuming the best-case scenario where she is ready to enter the next file name
as soon as the previous sample analysis has finished).
This zero-breaks always-ready scenario is only achievable by a machine, so it would
likely take much longer than 12 hours, not to mention that
the chances of her typing all of those commands correctly are practically zero.
Missing that paper deadline is looking increasingly likely.
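The 12-hour figure can be verified the same way, using the numbers given above (1520 samples at 30 seconds each). The variable names are again just choices made for this sketch.

```shell
samples=1520
seconds_per_sample=30

# Convert the total number of seconds into whole hours and leftover minutes.
total_seconds=$(( samples * seconds_per_sample ))
hours=$(( total_seconds / 3600 ))
minutes=$(( (total_seconds % 3600) / 60 ))
echo "$hours hours $minutes minutes"
```

The result is 12 hours 40 minutes of uninterrupted typing and clicking.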
The next few lessons will explore what she should do instead. More specifically, they explain how she can use a command shell to automate the repetitive steps in her processing pipeline so that her computer can work 24 hours a day while she writes her paper. As a bonus, once she has put a processing pipeline together, she will be able to use it again whenever she collects more data.
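As a preview of where this is heading, the sketch below repeats one step for every file in a directory. Everything in it is a stand-in: the sample file names are invented, the files are empty, and an echo line takes the place of the real analysis, because how goostats is actually invoked has not been described yet.

```shell
# Create a scratch directory with three pretend sample files.
workdir=$(mktemp -d)
touch "$workdir/sample1.txt" "$workdir/sample2.txt" "$workdir/sample3.txt"

# Repeat the same step for every .txt file, with no retyping.
processed=0
for datafile in "$workdir"/*.txt; do
    echo "would analyse $(basename "$datafile")"
    processed=$(( processed + 1 ))
done
echo "processed $processed files"

# Tidy up the scratch directory.
rm -r "$workdir"
```

The loop is the same three lines whether there are 3 files or 1520.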
A shell is a program whose primary purpose is to read commands and run other programs.
The shell’s main advantages are its high action-to-keystroke ratio, its support for automating repetitive tasks, and its capacity to access networked machines.
The shell’s main disadvantages are its primarily textual nature and how cryptic its commands and operation can be.
Shell tips:
a) CTRL+C: to kill / exit current process
b) Tab autocomplete: esp. useful to enter long complex filenames
c) UP arrow: gives you previous commands entered