In a previous post, I asked about Bash commands to align text columns against one another by row. It has since become clear to me that the task (i.e., aligning text columns of different size and content by row) is much more complex than I initially anticipated, and that the proposed answer, while acceptable for the previous post, is insufficient for most empirical data sets. I would therefore like to ask the community about the following pseudocode. Specifically, I would like to know whether, and in what way, it could be optimized.
Assume a file with n columns of strings. Some strings may be missing, others may be duplicated. The longest column need not be the first one listed in the file, but it shall serve as the reference column. The row order of this reference column must be maintained.
> cat file # where n=3; first row contains column headers
CL1 CL2 CL3
foo foo bar
bar baz qux
baz qux
qux foo
    bar
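To make the input concrete, here is a rough sketch of how such a file could be read into per-column lists and how the reference column could be picked. It assumes the file is fixed-width, i.e., every cell starts at the same offset as its column header, so that the lone `bar` in the last row sits under CL2. The function name `read_columns` and the `sample` list are made up for illustration.

```python
def read_columns(lines):
    """Read fixed-width text lines into {column name: list of non-empty strings}."""
    header = lines[0]
    names = header.split()
    # Assumption: each cell starts at the same offset as its column header.
    starts, pos = [], 0
    for name in names:
        pos = header.index(name, pos)
        starts.append(pos)
        pos += len(name)
    ends = starts[1:] + [None]
    columns = {name: [] for name in names}
    for line in lines[1:]:
        for name, start, end in zip(names, starts, ends):
            cell = line[start:end].strip()
            if cell:  # a blank slice means the string is missing in this row
                columns[name].append(cell)
    return columns


sample = [
    "CL1 CL2 CL3",
    "foo foo bar",
    "bar baz qux",
    "baz qux",
    "qux foo",
    "    bar",
]
columns = read_columns(sample)
# The reference column is the longest one; max() returns the first on ties.
reference = max(columns, key=lambda name: len(columns[name]))
print(reference)  # CL2
```

If the real files are not fixed-width, the missing cells cannot be recovered from whitespace alone and a different delimiter (e.g., tabs) would be needed.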
Pseudocode attempt 1 (totally inadequate):
Shuffle columns so that they are ordered by size (i.e., longest column is first in matrix)
Rownames = strings of first column (i.e., of longest column)
For (rowname among rownames) {
    For (colname among columns 2:end) {
        if (string in current cell == rowname) {keep string in location}
        if (string in current cell != rowname) {
            if (string in current cell == rowname of next row) {add row to bottom of table; move each string of current column one row down}
            if (string in current cell != rowname of next row) {add row to bottom of table; move each string of all other columns one row down}
        }
    }
}
Order columns by size:
> cat file_columns_ordered_by_size
CL2 CL1 CL3
foo foo bar
baz bar qux
qux baz
foo qux
bar
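The ordering step on its own is simple. A minimal sketch, assuming the columns are already available as lists of strings (the name `order_by_size` is hypothetical):

```python
def order_by_size(columns):
    """Return (name, strings) pairs with the longest column first."""
    # sorted() is stable, so columns of equal length keep their file order.
    return sorted(columns.items(), key=lambda item: len(item[1]), reverse=True)


columns = {
    "CL1": ["foo", "bar", "baz", "qux"],
    "CL2": ["foo", "baz", "qux", "foo", "bar"],
    "CL3": ["bar", "qux"],
}
ordered = order_by_size(columns)
print([name for name, _ in ordered])  # ['CL2', 'CL1', 'CL3']
```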
Sought output:
> my_code_here file_columns_ordered_by_size
CL2 CL1 CL3
foo foo
    bar bar
baz baz
qux qux qux
foo
bar
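For comparison, here is a rough sketch of a different approach from the pseudocode above: instead of shifting cells row by row, each column is aligned against the current row labels with Python's `difflib.SequenceMatcher`, and strings that find no match get a new row. This is a heuristic, not a guaranteed optimal alignment, and the name `align_columns` as well as the placement of unmatched strings after a replaced run are my own arbitrary choices.

```python
from difflib import SequenceMatcher


def align_columns(ordered):
    """Align columns by row; `ordered` is a list of (name, strings), reference first."""
    ref_name, ref = ordered[0]
    labels = list(ref)                    # one label per output row
    rows = [{ref_name: s} for s in ref]   # one dict of cells per output row
    for name, col in ordered[1:]:
        matcher = SequenceMatcher(None, labels, col, autojunk=False)
        new_labels, new_rows = [], []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Matching run: put the strings into the rows that already exist.
                for i, j in zip(range(i1, i2), range(j1, j2)):
                    rows[i][name] = col[j]
            # Existing rows are always kept, in their current order.
            new_labels.extend(labels[i1:i2])
            new_rows.extend(rows[i1:i2])
            if tag in ("insert", "replace"):
                # Unmatched strings of this column get fresh rows.
                for s in col[j1:j2]:
                    new_labels.append(s)
                    new_rows.append({name: s})
        labels, rows = new_labels, new_rows
    names = [name for name, _ in ordered]
    return [[row.get(name, "") for name in names] for row in rows]


ordered = [
    ("CL2", ["foo", "baz", "qux", "foo", "bar"]),
    ("CL1", ["foo", "bar", "baz", "qux"]),
    ("CL3", ["bar", "qux"]),
]
table = align_columns(ordered)
for row in table:
    print(" ".join(cell.ljust(3) for cell in row).rstrip())
```

For this example the rows come out as in the sought output, with the second row empty in CL2. Other inputs, especially ones with many duplicated strings, may be aligned differently, because the matching picks the longest contiguous run first rather than following the rules of the pseudocode.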