sed and awk#

Concepts#

sed — Stream Editor#

sed processes text line by line, applying transformations. It reads input, applies rules, and writes the result to stdout. The original file is unchanged unless you use the in-place flag (-i).

The most common use of sed is search and replace.

sed Substitution#

The core syntax:

sed 's/pattern/replacement/' file
#    │ │       │            │
#    │ │       │            └─ input
#    │ │       └─ what to replace with
#    │ └─ what to find (regex)
#    └─ s = substitute command

By default, s replaces only the first occurrence on each line. Add g for all occurrences:

sed 's/old/new/' file        # first occurrence per line
sed 's/old/new/g' file       # all occurrences per line (global)
sed 's/old/new/gi' file      # all occurrences, case insensitive (the i flag is a GNU extension)
sed 's/old/new/2' file       # second occurrence per line only

sed Examples#

# Replace "foo" with "bar"
echo "foo foo foo" | sed 's/foo/bar/'
# bar foo foo (only first)

echo "foo foo foo" | sed 's/foo/bar/g'
# bar bar bar (all)

# Remove lines starting with # (comments)
sed '/^#/d' config.txt

# Delete empty lines
sed '/^$/d' file.txt

# Remove leading whitespace ([[:space:]] is portable; \t inside brackets only works in GNU sed)
sed 's/^[[:space:]]*//' file.txt

# Remove trailing whitespace
sed 's/[[:space:]]*$//' file.txt

# Add a prefix to every line
sed 's/^/PREFIX: /' file.txt

# Print only lines 5 through 10
sed -n '5,10p' file.txt

# Print only lines matching a pattern
sed -n '/error/p' log.txt

sed In-Place Editing#

# Edit the file directly (modify the original)
sed -i 's/old/new/g' file.txt

# Create a backup before modifying
sed -i.bak 's/old/new/g' file.txt
# Creates file.txt.bak with the original, modifies file.txt

macOS note: the BSD sed shipped with macOS requires an argument after -i, so write sed -i '' 's/old/new/g' file.txt (the empty string means "no backup"). GNU sed on Linux takes sed -i 's/old/new/g' file.txt with no separate argument.
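One form that both GNU sed and the BSD sed on macOS accept is a backup suffix attached directly to -i, with the backup removed afterwards. The following is a minimal sketch under that assumption; the scratch file created with mktemp and its contents are illustrative only.

```shell
# Illustrative scratch file; the name and contents are assumptions for this sketch.
tmpfile=$(mktemp)
printf 'old value\nanother old line\n' > "$tmpfile"

# -i.bak (suffix attached, no space) is accepted by both GNU and BSD sed.
sed -i.bak 's/old/new/g' "$tmpfile"

# Remove the backup once the edit has been checked.
rm -f "$tmpfile.bak"

edited=$(cat "$tmpfile")
printf '%s\n' "$edited"
# new value
# another new line

rm -f "$tmpfile"
```

Keeping the backup until the result has been inspected is the safer habit; the rm step is only there to show the full round trip.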

sed Delete and Insert#

# Delete line 3
sed '3d' file.txt

# Delete lines 3 through 7
sed '3,7d' file.txt

# Delete lines matching a pattern
sed '/pattern/d' file.txt

# Insert a line before line 3 (one-line form, GNU sed)
sed '3i\New line here' file.txt

# Append a line after line 3 (one-line form, GNU sed)
sed '3a\New line after' file.txt
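The one-line i\ and a\ forms above rely on GNU sed behavior. POSIX puts the text on the line after the backslash, and that two-line form should also work with BSD sed. A small sketch with made-up input lines:

```shell
# POSIX form: the text to insert goes on the line after the backslash.
inserted=$(printf 'one\ntwo\nthree\n' | sed '2i\
before two')
printf '%s\n' "$inserted"
# one
# before two
# two
# three

# Same idea with a (append): the text lands after the addressed line.
appended=$(printf 'one\ntwo\nthree\n' | sed '2a\
after two')
printf '%s\n' "$appended"
# one
# two
# after two
# three
```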

sed Using Different Delimiters#

When your pattern contains /, use a different delimiter:

# Instead of escaping: sed 's/\/usr\/local\/bin/\/opt\/bin/g'
# Use a different delimiter:
sed 's|/usr/local/bin|/opt/bin|g' file.txt
sed 's#/usr/local/bin#/opt/bin#g' file.txt
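As a quick check of the alternate delimiter, the sketch below rewrites a path inside a sample string. The PATH value is invented for illustration.

```shell
# Sample input; the value is an assumption made up for this example.
path_line='PATH=/usr/local/bin:/usr/bin'

# With | as the delimiter, the slashes in the pattern need no escaping.
rewritten=$(printf '%s\n' "$path_line" | sed 's|/usr/local/bin|/opt/bin|g')
printf '%s\n' "$rewritten"
# PATH=/opt/bin:/usr/bin
```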

sed Multiple Commands#

# Use -e for multiple operations
sed -e 's/foo/bar/g' -e 's/baz/qux/g' file.txt

# Or use semicolons
sed 's/foo/bar/g; s/baz/qux/g' file.txt
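With either syntax, the commands run in order on each line, and each command sees the result of the one before it. A short sketch of why the order can matter:

```shell
# Independent substitutions: both apply to the same line.
chained=$(echo "foo baz" | sed 's/foo/bar/g; s/baz/qux/g')
echo "$chained"
# bar qux

# Dependent substitutions: the second command matches the text
# that the first one just produced, so foo ends up as baz.
cascaded=$(echo "foo" | sed -e 's/foo/bar/' -e 's/bar/baz/')
echo "$cascaded"
# baz
```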

awk — Pattern Scanning and Processing#

awk is a full programming language for text processing, but you mostly use it for extracting and formatting columnar data. awk processes input line by line, splitting each line into fields.

awk Basics#

awk '{print $1}' file.txt
#    │     │
#    │     └─ print the first field
#    └─ action to perform on each line

By default, awk splits lines on runs of whitespace (spaces and tabs) and ignores leading and trailing whitespace. Each piece becomes a numbered field:

$0  = entire line
$1  = first field
$2  = second field
$NF = last field
NR  = line number (record number)
NF  = number of fields on current line

Example:

echo "Alice 30 Engineer" | awk '{print $1}'
# Alice

echo "Alice 30 Engineer" | awk '{print $1, $3}'
# Alice Engineer

echo "Alice 30 Engineer" | awk '{print $NF}'
# Engineer (last field)
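NR and NF can be printed like any other value. The sketch below uses two made-up records with different field counts to show how NF, and therefore $NF, changes per line.

```shell
# Two sample records; the second has only two fields.
counts=$(printf 'Alice 30 Engineer\nBob 25\n' | awk '{print NR, NF, $NF}')
printf '%s\n' "$counts"
# 1 3 Engineer
# 2 2 25
```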

awk Field Separator#

# Use -F to set a different delimiter
awk -F: '{print $1}' /etc/passwd           # split on :
awk -F, '{print $2}' data.csv              # split on comma
awk -F'\t' '{print $1}' file.tsv           # split on tab
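To try -F without touching a real file, the sketch below feeds awk one passwd-style record (the record itself is sample data). Setting FS in a BEGIN block is an equivalent way to choose the separator.

```shell
# A sample record in /etc/passwd layout.
record='root:x:0:0:root:/root:/bin/bash'

# Split on ":" with -F.
fields=$(printf '%s\n' "$record" | awk -F: '{print $1, $7}')
printf '%s\n' "$fields"
# root /bin/bash

# Same result by assigning FS before any input is read.
fields_fs=$(printf '%s\n' "$record" | awk 'BEGIN {FS=":"} {print $1, $7}')
printf '%s\n' "$fields_fs"
# root /bin/bash
```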

awk Patterns (Conditions)#

# Print only lines matching a pattern
awk '/error/' log.txt

# Print fields 1 and 3 where field 3 > 100
awk '$3 > 100 {print $1, $3}' data.txt

# Print the username (field 1) where the shell (field 7) is "/bin/bash"
awk -F: '$7 == "/bin/bash" {print $1}' /etc/passwd

# Print lines longer than 80 characters
awk 'length > 80' file.txt

# Print specific line numbers
awk 'NR >= 5 && NR <= 10' file.txt
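The examples above follow one rule: a pattern with no action prints the whole line, and a pattern followed by an action runs that action only on matching lines. A sketch with invented sample data:

```shell
# Sample data: a name and a number per line (made up for this example).
data='disk1 120
disk2 80
disk3 250'

# Pattern plus action: print only the name when field 2 exceeds 100.
big=$(printf '%s\n' "$data" | awk '$2 > 100 {print $1}')
printf '%s\n' "$big"
# disk1
# disk3

# Line-number range, with the line number prepended to each match.
ranged=$(printf '%s\n' "$data" | awk 'NR >= 2 && NR <= 3 {print NR ": " $0}')
printf '%s\n' "$ranged"
# 2: disk2 80
# 3: disk3 250
```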

awk Built-in Variables#

Variable	Meaning
`$0`	Entire current line
`$1, $2, ...`	Fields
`$NF`	Last field
`NR`	Current line number
`NF`	Number of fields on current line
`FS`	Field separator (default: whitespace)
`OFS`	Output field separator (default: space)
`RS`	Record separator (default: newline)
`ORS`	Output record separator (default: newline)

awk BEGIN and END#

# BEGIN runs before processing, END runs after
awk 'BEGIN {print "=== Report ==="} {print $1} END {print "=== Done ==="}' file.txt

# Calculate sum of a column
awk '{sum += $2} END {print "Total:", sum}' data.txt

# Count lines
awk 'END {print NR, "lines"}' file.txt

# Calculate average
awk '{sum += $3; count++} END {print "Average:", sum/count}' data.txt
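The average one-liner divides by count, which is zero when the input is empty. One possible guard, sketched here with made-up input, is to test count in the END block. The "n/a" placeholder text is an arbitrary choice.

```shell
# The awk program is kept in a variable so it can be reused on two inputs.
avg_prog='{sum += $3; count++}
END {
  if (count > 0) print "Average:", sum / count
  else           print "Average: n/a"
}'

# Three sample rows; column 3 holds 10, 20 and 30.
average=$(printf 'a x 10\nb y 20\nc z 30\n' | awk "$avg_prog")
printf '%s\n' "$average"
# Average: 20

# Empty input: the guard avoids a division by zero.
empty=$(printf '' | awk "$avg_prog")
printf '%s\n' "$empty"
# Average: n/a
```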

awk Formatting#

# printf for formatted output
awk '{printf "%-20s %5d\n", $1, $2}' data.txt
# Left-aligned name (20 characters wide), right-aligned integer (5 characters wide)

# Change output field separator
awk -F: 'BEGIN {OFS=","} {print $1, $3, $7}' /etc/passwd
# Outputs: username,uid,shell
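To see exactly what the width specifiers do, the sketch below adds | markers around the padded number. The sample values are made up.

```shell
# %-10s pads the name on the right to 10 characters; %5d pads the number on the left to 5.
row=$(echo "Alice 30" | awk '{printf "%-10s|%5d|\n", $1, $2}')
printf '%s\n' "$row"
# Alice     |   30|

# OFS is inserted wherever print separates its arguments with a comma.
joined=$(echo "a:b:c" | awk -F: 'BEGIN {OFS=","} {print $1, $3}')
printf '%s\n' "$joined"
# a,c
```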

awk Practical One-Liners#

# Sum a column of numbers
awk '{s+=$1} END {print s}' numbers.txt

# Print the first line seen for each distinct value of column 1
awk '!seen[$1]++' file.txt

# Swap two columns
awk '{print $2, $1}' file.txt

# Count occurrences of values in a column
awk '{count[$1]++} END {for (k in count) print k, count[k]}' file.txt

# Convert simple CSV to TSV (no quoted fields or embedded commas)
awk -F, 'BEGIN {OFS="\t"} {$1=$1; print}' file.csv
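The $1=$1 in the CSV conversion is what makes OFS take effect: awk only rebuilds $0 with the output separator after a field has been assigned. The sketch below uses "-" as the separator so the difference is visible, and also shows why splitting on a bare comma miscounts quoted CSV fields.

```shell
# No field assignment: $0 is printed untouched, so OFS is never applied.
unchanged=$(echo "a,b,c" | awk -F, 'BEGIN {OFS="-"} {print}')
printf '%s\n' "$unchanged"
# a,b,c

# Assigning a field (even to itself) rebuilds $0 with OFS between fields.
rebuilt=$(echo "a,b,c" | awk -F, 'BEGIN {OFS="-"} {$1=$1; print}')
printf '%s\n' "$rebuilt"
# a-b-c

# Limitation: a comma inside quotes is still treated as a separator.
nfields=$(echo '"Smith, J",42' | awk -F, '{print NF}')
printf '%s\n' "$nfields"
# 3
```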

Lab#

Exercise 1: sed Substitution#

mkdir -p ~/lab/sedawk
cd ~/lab/sedawk

cat > config.txt << 'EOF'
# Server Configuration
host = localhost
port = 8080
protocol = http
database = myapp_dev
log_level = debug
# End of config
EOF

# Replace localhost with 192.168.1.100
sed 's/localhost/192.168.1.100/' config.txt

# Replace debug with info
sed 's/debug/info/' config.txt

# Multiple replacements
sed -e 's/localhost/192.168.1.100/' -e 's/debug/info/' -e 's/http/https/' config.txt

# Remove comments
sed '/^#/d' config.txt

# Remove comments AND empty lines
sed '/^#/d; /^$/d' config.txt

Exercise 2: sed In-Place Editing#

cd ~/lab/sedawk

# Make a copy to work with
cp config.txt config_edit.txt

# Edit in place with backup
sed -i.bak 's/8080/9090/g' config_edit.txt

# Compare the backup (original) with the edited file
diff config_edit.txt.bak config_edit.txt

# Verify
cat config_edit.txt

Exercise 3: awk Field Extraction#

cd ~/lab/sedawk

# Extract usernames from /etc/passwd
awk -F: '{print $1}' /etc/passwd | head -10

# Extract username and shell
awk -F: '{print $1, $7}' /etc/passwd | head -10

# With nice formatting
awk -F: '{printf "%-20s %s\n", $1, $7}' /etc/passwd | head -10

# Only users with /bin/bash
awk -F: '$7 == "/bin/bash" {print $1}' /etc/passwd

Exercise 4: awk Calculations#

cd ~/lab/sedawk

# Create sample data
cat > sales.txt << 'EOF'
Alice 150 200 175
Bob 200 180 220
Carol 175 190 210
Dave 220 240 180
Eve 190 170 200
EOF

# Print each name and its first value (field 2)
awk '{print $1, $2}' sales.txt

# Calculate row totals
awk '{total = $2 + $3 + $4; print $1, total}' sales.txt

# Calculate column average
awk '{sum += $2} END {print "Average Q1:", sum/NR}' sales.txt

# Find the maximum value in column 2 and the name that goes with it (assumes positive values)
awk 'BEGIN {max=0} $2 > max {max=$2; name=$1} END {print "Top Q1:", name, max}' sales.txt
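Starting from max=0 works for sales.txt because every value is positive. If all values were negative, that version would report 0 with no name. One alternative, sketched here with made-up negative numbers, is to seed max from the first record.

```shell
# NR == 1 seeds max from the first record; later records replace it only if larger.
top=$(printf 'Alice -5\nBob -2\nCarol -9\n' |
  awk 'NR == 1 || $2 > max {max = $2; name = $1} END {print "Top:", name, max}')
printf '%s\n' "$top"
# Top: Bob -2
```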

Exercise 5: awk with Patterns#

cd ~/lab/sedawk

# Create a log file
cat > access.log << 'EOF'
192.168.1.100 GET /index.html 200
192.168.1.101 GET /about.html 200
192.168.1.100 POST /login 401
10.0.0.5 GET /admin 403
192.168.1.102 GET /index.html 200
10.0.0.5 GET /admin 403
192.168.1.100 GET /dashboard 200
192.168.1.101 POST /api/data 500
EOF

# Show only errors (status >= 400)
awk '$4 >= 400' access.log

# Count requests per IP
awk '{count[$1]++} END {for (ip in count) print ip, count[ip]}' access.log

# Count requests per status code
awk '{count[$4]++} END {for (code in count) print code, count[code]}' access.log | sort

# Show only POST requests
awk '$2 == "POST"' access.log

# Show unique IPs
awk '{print $1}' access.log | sort -u
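The order in which for (ip in count) visits keys is not specified, so the per-IP counts can come out in any order. A sketch of one way to get a stable ranking is to sort by count, then by IP. The log lines are copied from the exercise data, and LC_ALL=C is an assumption made here to fix the tie-break order.

```shell
# Same eight lines as access.log, inlined so the example is self-contained.
log='192.168.1.100 GET /index.html 200
192.168.1.101 GET /about.html 200
192.168.1.100 POST /login 401
10.0.0.5 GET /admin 403
192.168.1.102 GET /index.html 200
10.0.0.5 GET /admin 403
192.168.1.100 GET /dashboard 200
192.168.1.101 POST /api/data 500'

# Tally per IP, then sort by count (numeric, descending) and by IP for ties.
ranked=$(printf '%s\n' "$log" |
  awk '{count[$1]++} END {for (ip in count) print ip, count[ip]}' |
  LC_ALL=C sort -k2,2nr -k1,1)
printf '%s\n' "$ranked"
# 192.168.1.100 3
# 10.0.0.5 2
# 192.168.1.101 2
# 192.168.1.102 1
```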

Exercise 6: Combining sed and awk#

cd ~/lab/sedawk

# Extract only the values from config.txt (remove comments, get right side of =)
grep -v "^#" config.txt | grep "=" | sed 's/.*= //'

# Same thing with awk
grep -v "^#" config.txt | grep "=" | awk -F' = ' '{print $2}'

# Create a CSV (username,uid,shell) of regular users: UID >= 1000 and a shell other than /usr/sbin/nologin
awk -F: 'BEGIN {OFS=","} $3 >= 1000 && $7 != "/usr/sbin/nologin" {print $1,$3,$7}' /etc/passwd

# Clean up
cd ~
rm -rf ~/lab/sedawk
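The grep and sed pipeline from Exercise 6 can also be written as a single awk program, since awk can filter and extract in one pass. This is a sketch, not the only way to do it; the config lines are a shortened copy of config.txt, inlined so it runs after the cleanup step.

```shell
# Shortened sample of the lab's config.txt.
config='# Server Configuration
host = localhost
port = 8080
# End of config'

# Skip comment lines, keep lines that split into exactly two fields on " = ",
# and print the right-hand side.
values=$(printf '%s\n' "$config" | awk -F' = ' '!/^#/ && NF == 2 {print $2}')
printf '%s\n' "$values"
# localhost
# 8080
```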

Review#

1. What is the basic sed substitution syntax?

sed 's/pattern/replacement/flags' file, where s is the substitute command, the g flag replaces all occurrences per line (not just the first), and the i flag (a GNU extension) makes the match case-insensitive.

2. How do you edit a file in place with sed?

sed -i 's/old/new/g' file.txt. Add a backup extension for safety: sed -i.bak 's/old/new/g' file.txt creates a backup before modifying.

3. How does awk split lines into fields?

By default, awk splits on whitespace. $1 is the first field, $2 the second, $NF the last. Use -F to set a different delimiter: awk -F: '{print $1}' /etc/passwd.

4. What do awk's BEGIN and END blocks do?

BEGIN runs once before processing any lines (useful for initialization, headers). END runs once after all lines are processed (useful for summaries, totals, averages).

5. How do you delete lines matching a pattern with sed?

sed '/pattern/d' file.txt. For example, sed '/^#/d' config.txt deletes all comment lines.

6. How do you use a different delimiter in sed when the pattern contains `/`?

Use any other character as the delimiter: sed 's|/old/path|/new/path|g' or sed 's#/old/path#/new/path#g'.

7. How do you count occurrences of a value in a column with awk?

awk '{count[$1]++} END {for (k in count) print k, count[k]}' file.txt — this uses an associative array to tally each value in the first column, then prints the results.
