When you’re done reading this article, you’ll be ready to become a sed expert. Yes, you read that right. So let’s hit the ground running.
What is sed?
Ever wanted to edit a file without having to open it in a text editor? Or maybe you want to modify several files and don’t want to do it manually?
With sed, you just issue a short command in the terminal and, bam, you’ve changed the contents of a file. Think of it as CTRL+R on super steroids.
sed is capable of simple (regex-based) search and replace, deleting lines, adding lines, or even reversing the lines of a file.
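FYI: I’ve been using the word “file” here, and that includes stdin: you can pipe the output of another command into sed (add 2>&1 to that command if you want its stderr included too).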
The simplest usage
Those who have used sed before might be familiar with this command, which performs a regex-based text replacement. It replaces “hello” with “world”:
$ echo 'hello world' | sed 's/hello/world/'
world world
Or clearing all lines that start with a # (in a horrendous way!):
$ cat input.txt | sed 's/^#.*//'
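The pipeline above works, but the cat is unnecessary (sed can read the file itself) and the comment lines are only emptied, not removed. A minimal sketch of a cleaner alternative, using the d command covered later in this article and a made-up input.txt:

```shell
# Hypothetical sample file, just for this demo
printf '# a comment\ncode line\n# another comment\nmore code\n' > input.txt

# The "horrendous" way: comment lines are emptied but remain as blank lines
sed 's/^#.*//' input.txt

# Cleaner: delete the lines that start with # altogether
sed '/^#/d' input.txt    # prints: code line, more code (one per line)
```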
We shouldn’t be satisfied with this! sed can do so much more. Only knowing how to “search-and-replace” with sed is like owning a Ferrari but never taking it to the racetrack.
So let’s take a step back and look at the anatomy of a sed expression. It will be dense reading, but bear with me; it will be totally worth it.
Expressions
sed expressions take the following form:
[address]command[options]
- address: Optional. A filter selecting which lines to apply the command to.
- command: Mandatory. A single letter specifying what to do with the selected lines.
- options: Optional. Each command has its own options, which can look quite different.
Looking back at our initial example, this time with the g option added:
 s/hello/world/g
^^\____________/
|| ^
|| |
|| +- Options of command `s`
|+- Single-letter command `s`, which means substitute
+- Address: none given, which is fine since it's optional
Here we can see that the s/hello/world/g expression omits the address and uses the s command with /hello/world/g as its options.
What the heck is an address, though?
Address
The address itself has its own general syntax:
addr1[,addr2][!]
Here, each addr can be either a line number or a regular expression, and the optional ! (bang) reverses the meaning ;)
Examples using line numbers:
address | meaning |
---|---|
1 | Matches only line number 1 |
54 | Matches only line number 54 |
1,5 | Matches lines 1 to 5 (inclusive) |
1,5! | Matches all lines except lines 1 to 5 |
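A quick way to try these line-number addresses, assuming seq is available (as on most Linux systems) to generate the numbers 1 to 10 as input, and using -n together with p (both covered below) so that only the matched lines are printed:

```shell
# Only line 1
seq 10 | sed -n '1p'      # prints: 1
# Lines 1 to 5, inclusive
seq 10 | sed -n '1,5p'    # prints: 1 2 3 4 5 (one per line)
# Every line except 1 to 5
seq 10 | sed -n '1,5!p'   # prints: 6 7 8 9 10 (one per line)
```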
Examples using regular expressions:
address | meaning |
---|---|
/abc/ | Matches lines containing “abc”. |
\%abc% | Matches lines containing “abc”. Instead of delimiting the regular expression with /<regex>/, it uses \c<regex>c, where c can be any character (here %). Handy when the regex itself contains slashes. |
/abc/! | Matches lines that do not contain “abc”. |
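The same idea with regular expressions, as a minimal sketch on a throwaway file (sample.txt is just a made-up name). The second command uses % as the delimiter, which saves escaping the slashes in /usr/:

```shell
# Sample input: three lines, one of them a path full of slashes
printf 'abc\nxyz\n/usr/bin/abc\n' > sample.txt

# Lines containing "abc"
sed -n '/abc/p' sample.txt       # prints: abc, /usr/bin/abc
# Custom delimiter %: no need to write /\/usr\//
sed -n '\%/usr/%p' sample.txt    # prints: /usr/bin/abc
# Lines NOT containing "abc"
sed -n '/abc/!p' sample.txt      # prints: xyz
```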
Mixed examples:
address | meaning |
---|---|
/abc/,+2 | Matches a line containing “abc” and the next 2 lines (GNU extension; plain /abc/,2 means “up to line 2” instead) |
2,/abc/ | Matches from line 2 through the next line containing “abc” |
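The mixed ranges can be checked with seq as well. This sketch assumes GNU sed, because the +2 form is a GNU extension; the middle command shows what plain /3/,2 does when line 2 has already gone by (only the matching line is selected):

```shell
# From the first line containing "3", plus the next 2 lines (GNU sed)
seq 10 | sed -n '/3/,+2p'   # prints: 3 4 5 (one per line)
# addr2 is a line number that is already behind us: only line 3 matches
seq 10 | sed -n '/3/,2p'    # prints: 3
# From line 2 up to the next line containing "4"
seq 10 | sed -n '2,/4/p'    # prints: 2 3 4 (one per line)
```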
Pimp up the s command with addresses
Now you should know what this will do:
sed '2,4s/hello/world/'
Yep, it will substitute “hello” with “world” on lines 2 to 4 (the first occurrence on each line). You can go bananas combining addresses with the s command.
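To see the address restricting the substitution, here is a sketch with five identical lines (hello.txt is a made-up file name), once with the 2,4 range and once with its negation via !:

```shell
# Five identical lines of input
printf 'hello\nhello\nhello\nhello\nhello\n' > hello.txt

# Substitute only on lines 2 to 4
sed '2,4s/hello/world/' hello.txt    # prints: hello world world world hello
# Substitute everywhere EXCEPT lines 2 to 4
sed '2,4!s/hello/world/' hello.txt   # prints: world hello hello hello world
```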
Tip: use the ! for extra points with your peers; it is easy to miss in the documentation, and I don’t know why the man pages don’t make it clearer. Ugh!
Command
There are quite a lot of commands to familiarize yourself with. All of them are a single character, sometimes followed by a backslash (\). Here are some useful commands to try out:
s
s/regex/replacement/flags
This is the most useful and best-known sed command. It is also the one with the most options.
You can refer to the sed manual for detailed explanations of all flags. Here I will mention some useful tips for the s command:
1. Use groups and references
s/\(hello\) \(world\)/\2 \1/
  |         |         |  |
  |         |         |  +-> \1: reference to group 1
  |         |         +----> \2: reference to group 2
  |         +--------------> \(world\): group 2
  +------------------------> \(hello\): group 1
This example swaps the words “hello” and “world”.
Groups are created by surrounding parts of your regular expression with escaped parentheses \( \). Then, in the replacement, you can refer to a group with \N syntax, where N is the group number (\1 for the first group, \2 for the second, and so on up to \9).
2. g: Apply to all occurrences in each line
More often than not, this is what you want, so your expression will usually look like this:
s/search/replace/g
3. i (or I): Case-insensitive matching (a GNU sed extension)
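The three tips above can be tried out directly in a terminal. A short sketch with echo as input; the I flag assumes GNU sed, so it may not work with other sed implementations:

```shell
# 1. Groups and references: swap two words
echo 'hello world' | sed 's/\(hello\) \(world\)/\2 \1/'   # prints: world hello

# Three groups rearranged: YYYY-MM-DD becomes DD.MM.YYYY
echo '2024-01-31' | sed 's/\([0-9]*\)-\([0-9]*\)-\([0-9]*\)/\3.\2.\1/'   # prints: 31.01.2024

# 2. g: without it, only the first match on each line is replaced
echo 'a a a' | sed 's/a/b/'    # prints: b a a
echo 'a a a' | sed 's/a/b/g'   # prints: b b b

# 3. I (GNU sed): case-insensitive matching, combined here with g
echo 'Hello HELLO hello' | sed 's/hello/bye/gI'   # prints: bye bye bye
```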
d
1,5d
Deletes the lines matched by the address. See how addresses can be super useful?
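A small sketch of d with a line-number range, and the same range negated with ! to keep only lines 1 to 5:

```shell
# Delete lines 1 to 5
seq 7 | sed '1,5d'    # prints: 6 7 (one per line)
# Delete everything except lines 1 to 5
seq 7 | sed '1,5!d'   # prints: 1 2 3 4 5 (one per line)
```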
p
sed -n '1,5p'
Prints only lines 1 to 5. The -n flag tells sed not to print anything by default.
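The effect of -n is easiest to see side by side. In this sketch, the second command omits -n, so line 2 shows up twice: once from p and once from sed’s default printing:

```shell
# With -n, only what p prints shows up
seq 7 | sed -n '1,5p'   # prints: 1 2 3 4 5 (one per line)
# Without -n, matched lines are printed twice
seq 3 | sed '2p'        # prints: 1 2 2 3 (one per line)
```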
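n
Fast-forward one line: print the pattern space (unless -n was given) and replace it with the next line of input. You’ll understand this better in the next section.
n;n;s/a/b/
This command replaces the first “a” with “b” on every third line.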
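Here is n;n;s/a/b/ run on six lines of “a”, as a small sketch. In each cycle, the first n prints the current line and loads the next one, the second n does the same, and s then only ever sees every third line:

```shell
# Six lines of "a"; every third one becomes "b"
printf 'a\na\na\na\na\na\n' | sed 'n;n;s/a/b/'   # prints: a a b a a b (one per line)
```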
Internal workings of sed
Understanding what sed does under the hood will take your sed-itious work to the next level!
First, accept this fact: sed has two buffers:
- pattern space
- hold space
Those are simply “variables” that hold some information.
Also accept that sed runs in cycles. Each cycle does this:
- Read a line from the input stream. A line is a sequence of characters ended by a newline (\n).
- Remove the trailing newline.
- Store the line in the pattern space.
- For each command, check whether the line matches that command’s address.
- If it matches, run the command. Commands may change the contents of the pattern and hold spaces.
- Print out the content of the pattern space (unless -n was given), followed by a newline.
- Delete the content of the pattern space, but keep the hold space untouched.
- Repeat.
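One way to watch the hold space survive between cycles while the pattern space is reset is the x command, which is not covered in this article; it exchanges the contents of the pattern and hold spaces. A small sketch:

```shell
# Each cycle swaps the current line (pattern space) with the previous line
# (hold space). The hold space starts empty, so the first line printed is
# empty, and the last input line ("3") stays in the hold space unprinted.
seq 3 | sed 'x'   # prints: an empty line, then 1, then 2
```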
g and h commands
Say we have a simple input and want to shuffle the lines as follows:
input          result
----------------------
line 1         line 2
line 2         line 3
line 3  ---->  line 4
line 4         line 1
line 5         line 5
We want to cut out line 1 and paste it after line 4. Easy.
sed '1h;1d;4p;4g' input.txt
Internally, sed will perform the following operations:
cycle (line) | command                | pattern | hold   | output
-------------|------------------------|---------|--------|---------
1            | read one line          | line 1  |        |
1            | 1h                     | line 1  | line 1 |
1            | 1d (end cycle)         |         | line 1 |
1 - end      | no print (d)           |         | line 1 | (nothing)
2            | read one line          | line 2  | line 1 |
2            | 1h;1d;4p;4g (no match) | line 2  | line 1 |
2 - end      | print pattern          | line 2  | line 1 | line 2\n
3            | read one line          | line 3  | line 1 |
3            | 1h;1d;4p;4g (no match) | line 3  | line 1 |
3 - end      | print pattern          | line 3  | line 1 | line 3\n
4            | read one line          | line 4  | line 1 |
4            | 1h;1d (no match)       | line 4  | line 1 |
4            | 4p                     | line 4  | line 1 | line 4\n
4            | 4g                     | line 1  | line 1 |
4 - end      | print pattern          | line 1  | line 1 | line 1\n
5            | read one line          | line 5  | line 1 |
5            | 1h;1d;4p;4g (no match) | line 5  | line 1 |
5 - end      | print pattern          | line 5  | line 1 | line 5\n
In the step-by-step above, we go through each line of input and each command that is executed on it. The command 1h copies the content of the first line into the hold space. To prevent line 1\n from being printed, we delete it from the pattern space with 1d. This command also skips the remaining commands and immediately starts a new cycle, without printing the pattern space.
When we reach the 4th cycle, 4g replaces the pattern space with the contents of the hold space (g copies hold to pattern, the reverse of h). But before doing that, we print out the original content with 4p.
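To reproduce the walkthrough yourself, here is a sketch that creates the five-line input.txt from the example and runs the command on it:

```shell
# The five-line input from the example
printf 'line 1\nline 2\nline 3\nline 4\nline 5\n' > input.txt

# h: copy line 1 to the hold space, d: drop it, 4p: print line 4,
# 4g: replace line 4's pattern space with the held line 1
sed '1h;1d;4p;4g' input.txt   # prints: line 2, line 3, line 4, line 1, line 5 (one per line)
```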
Remember: read into pattern space, operate, print out the pattern space. Simple.