Pipes and redirection

< directs the contents of a file into a program as stdin.

| pipes data from one program to another, forming a pipeline.

> (or 1>) will redirect a pipeline's stdout to overwrite a given file location.

>> will redirect stdout to append to a given file location.

2> will redirect a pipleline's stderr.

&> will redirect both stdout and stderr to the same file location.

grep

Searches for a given pattern line-by-line in an input file or stdin stream and prints matching lines to stdout. Useful as a filter operation in a pipeline.

sed

Performs a line-by-line regex-based find/replace on an input file or stdin stream and prints output to stdout. Useful as a mapping operation in a pipeline. -i enables in-place find/replace on a file.

awk

Automatically parses line-by-line file or stdin stream input with delimiters (eg a CSV file) into multiple input variables, which can be manipulated with and output to stdout. Useful in pipelines as a mapping operation over tabular/record-based data.

Perl

With the help of specific command line flags, Perl one-liners can reproduce and extend the functionality of grep, sed, or awk.

Perl "micro-scripts" can be particularily useful for easily combining find/replace operations using regular expressions with math operations. For example, to convert 30 fps video timestamps from a frame-based format to one based on milliseconds:

perl -pe 's/(\d\d);(\d\d);(\d\d);(\d\d)/"$1:$2:$3,".(sprintf "%03d", $4 * 1000 \/ 30)/ge'

00;00;03;25 is converted into 00:00:03,833. -p runs the operation repeatedly on each line piped to stdin and sends the result to stdout (like sed or awk would).

read

Reads a line from stdin. Mainly useful in shell scripts for accepting interactive input or piping stream output into a while-loop for more sophisticated processing.

sort

Concatenates and sorts the input stream (or files) and returns the results to stdout. Can take a delimeter and sort on specified columns of tabular data.

uniq

Eliminates adjacent duplicate lines from input stream (or file) and returns results to output stream (or file). Can also prepend the count of adjacent lines (useful for "histogramming" categorical data in combination with sort).

cut

Removes subsets of lines passed in through a file or stream. Useful for filtering out unnessecary columns from tabular datasets.