Figuring out how many characters are in a file is straightforward on the Linux command line: use the ls -l command, which shows the file's size in bytes (one byte per character for plain ASCII text).
However, if you want a count of how many times each character appears in your file, you're going to need a considerably more complicated command or a script. This post covers several different options.
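For text containing multibyte UTF-8 characters (é, for example, takes two bytes), the byte count overstates the number of characters. As a minimal sketch, the standard wc command can report both figures: -c counts bytes and -m counts characters. The function name char_and_byte_counts is a hypothetical helper, and the character count only differs from the byte count when the current locale is a UTF-8 one.

```shell
#!/bin/bash
# Hypothetical helper: print a file's byte count and character count side by side.
char_and_byte_counts() {
  local file=$1
  # wc -c counts bytes: the same figure ls -l shows as the file size.
  # wc -m counts characters; it is lower for multibyte text, but only
  # when the current locale is a UTF-8 one.
  printf 'bytes: %s\nchars: %s\n' "$(wc -c < "$file")" "$(wc -m < "$file")"
}
```

Run it as char_and_byte_counts myfile. For an all-ASCII file the two numbers match.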
Counting how many times each character appears in a file
To count how many times each character appears in a file, you need to string together a series of commands that break the file into individual characters and sort them before counting how many of each there are.
To do that, you can use a command like this one:
$ cat myfile | sed 's/\(.\)/\n\1/g' | sort | uniq -c | column
 24        58 c     112 i     132 o       7 T
254         2 C       3 I       2 O      30 u
  1 '      50 d       4 j      29 p      23 v
 25 ,     163 e       5 k       1 P       9 w
 20 .       2 E      60 l       2 q       4 x
142 a      21 f      48 m      90 r      36 y
  5 A      16 g       2 M       1 R       3 z
 23 b       1 G     117 n     147 s
  1 B      51 h       1 N     119 t
The sed command separates the file into single-character chunks by inserting a newline before every character, so each character lands on its own line (this relies on GNU sed, which accepts \n in the replacement text). That output is then sorted by the sort command. After that, the uniq -c command counts each group of identical characters, and the column command creates the multi-column output. Because the results are based on the file content, only characters that actually occur in the file are listed.
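The sort step is there because uniq -c only merges identical lines that sit next to each other. The small sketch below makes this visible. It assumes GNU sed, and the function names are hypothetical.

```shell
#!/bin/bash
# Split text into one character per line using the article's sed expression
# (assumes GNU sed, which understands \n in the replacement text).
split_chars() {
  sed 's/\(.\)/\n\1/g'
}

# uniq -c only merges *adjacent* identical lines...
unsorted_counts() { split_chars | uniq -c; }

# ...so sorting first brings every copy of a character together.
sorted_counts() { split_chars | sort | uniq -c; }
```

Running printf 'abca\n' | unsorted_counts lists a twice, each time with a count of 1. Running printf 'abca\n' | sorted_counts reports a once, with a count of 2.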
Notice that the sort command makes the output list the file's characters in alphabetical order, with the lowercase and uppercase versions of each letter next to each other. The first two entries show only a count. They represent linefeeds (24) and spaces (254), which are invisible in the output.
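If the unlabeled entries are confusing, you can rename them after counting. The sketch below does this. It assumes GNU sed and uniq's usual output format (a right-aligned count, one space, then the line). The function name count_chars_labeled and the <space> and <newline> labels are hypothetical choices.

```shell
#!/bin/bash
# Hypothetical variant of the pipeline that names the two invisible entries.
# A space shows up in uniq's output as "count, space, space"; a linefeed
# shows up as an empty line, i.e. "count, space" and nothing else.
count_chars_labeled() {
  sed 's/\(.\)/\n\1/g' | sort | uniq -c |
    sed -e 's/^\( *[0-9][0-9]*\)  $/\1 <space>/' \
        -e 's/^\( *[0-9][0-9]*\) $/\1 <newline>/'
}
```

With this helper, count_chars_labeled < myfile would report the spaces as 254 <space> and the linefeeds as 24 <newline>.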
If you want to display the characters in frequency order instead, add a second sort command with the -g (general numeric) option.
$ cat myfile | sed 's/\(.\)/\n\1/g' | sort | uniq -c | sort -g | column
  1 '       2 O       9 w      30 u     117 n
  1 B       2 q      16 g      36 y     119 t
  1 G       3 I      20 .      48 m     132 o
  1 N       3 z      21 f      50 d     142 a
  1 P       4 j      23 b      51 h     147 s
  1 R       4 x      23 v      58 c     163 e
  2 C       5 A      24        60 l     254
  2 E       5 k      25 ,      90 r
  2 M       7 T      29 p     112 i
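By default, sort compares lines as text, one character at a time. With bare numbers, that puts "112" ahead of "20" because "1" sorts before "2". The -g option compares the lines as numeric values instead. Here is a minimal illustration that uses bare numbers rather than the padded uniq -c output:

```shell
#!/bin/bash
# Three sample counts, one per line
counts=$'20\n112\n3'
# Text order: compared character by character, so "112" < "20" < "3"
text_order=$(printf '%s\n' "$counts" | sort)
# Numeric order: compared as values, so 3 < 20 < 112
numeric_order=$(printf '%s\n' "$counts" | sort -g)
echo "text order:    ${text_order//$'\n'/ }"
echo "numeric order: ${numeric_order//$'\n'/ }"
```

For whole-number counts like these, sort -n gives the same result as sort -g.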
To reverse the listing so that the most frequently used characters appear first, add an r (reverse) option to that last sort command.
$ cat myfile | sed 's/\(.\)/\n\1/g' | sort | uniq -c | sort -gr | column
254        60 l      24         5 A       2 C
163 e      58 c      23 v       4 x       1 R
147 s      51 h      23 b       4 j       1 P
142 a      50 d      21 f       3 z       1 N
132 o      48 m      20 .       3 I       1 G
119 t      36 y      16 g       2 q       1 B
117 n      30 u       9 w       2 O       1 '
112 i      29 p       7 T       2 M
 90 r      25 ,       5 k       2 E
As you probably guessed, the character at the top of the list is the space character. The second most frequently used character in the file is an “e”, which is no surprise either. Capital letters cluster near the bottom of the list because they're used far less often.
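The reversed list puts the most common characters first, so trimming it with head shows just the leaders. The sketch below assumes GNU sed. The function name top_chars and its default of five entries are arbitrary, hypothetical choices.

```shell
#!/bin/bash
# Hypothetical helper: show only the N most frequent characters (default 5)
# by cutting the reversed frequency list short with head.
top_chars() {
  local n=${1:-5}
  sed 's/\(.\)/\n\1/g' | sort | uniq -c | sort -gr | head -n "$n"
}
```

For the file above, top_chars 3 < myfile would list the space, e and s (254, 163 and 147 occurrences).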
If you don't want to distinguish between uppercase and lowercase letters, you can insert a tr (translate) command into the command string like this:
$ cat myfile | sed 's/\(.\)/\n\1/g' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -gr | column
254       115 i      36 y      21 f       3 z
165 e      91 r      30 u      20 .       2 q
147 s      60 l      30 p      17 g       1 '
147 a      60 c      25 ,       9 w
134 o      51 h      24 b       5 k
126 t      50 m      24         4 x
118 n      50 d      23 v       4 j
Swap the positions of the '[:upper:]' and '[:lower:]' arguments to display the results in uppercase.
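You can also wrap the tr step in a function that takes the direction as an argument. The sketch below assumes GNU sed. The function name count_chars_nocase and its upper argument are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: fold case before counting.
# Pass "upper" to report letters in uppercase; anything else folds to lowercase.
count_chars_nocase() {
  if [ "${1:-lower}" = upper ]; then
    tr '[:lower:]' '[:upper:]'
  else
    tr '[:upper:]' '[:lower:]'
  fi | sed 's/\(.\)/\n\1/g' | sort | uniq -c | sort -gr
}
```

Use count_chars_nocase < myfile for lowercase results, or count_chars_nocase upper < myfile for uppercase ones.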
Counting character-by-character in a word or phrase
You can also use a command similar to those shown above to count how many times each character appears in a single word or phrase. Here's an example:
$ echo "Hello, World!" | sed 's/\(.\)/\n\1/g' | sort | uniq -c | sort -gr | column
  3 l       1 r       1 d       1
  2 o       1 H       1 ,       1
  1 W       1 e       1 !
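For a phrase given on the command line, a function saves retyping the pipeline. This sketch swaps in a different technique for the sed step, grep -o ., which prints each one-character match on its own line. Because . never matches the newline, the entry for the line ending disappears, although the space is still counted. The function name count_phrase is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: count the characters in a phrase passed as an argument.
# grep -o . emits one line per character; the trailing newline is never
# matched, so it does not show up as a blank entry in the counts.
count_phrase() {
  printf '%s\n' "$1" | grep -o . | sort | uniq -c | sort -gr
}
```

Try it with count_phrase 'Hello, World!' | column.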
Using an alias
While the commands shown above are clever, they're not easy to remember or type, and an alias can help with that. Once you decide what kind of output you prefer, turn the command into an alias like this:
$ alias CountChars="sed 's/\(.\)/\n\1/g' | sort | uniq -c | sort -gr | column"
Save the alias in your .bashrc file so that you can use it whenever you need it. Then use it in commands like these:
$ cat myfile | CountChars
254        60 l      24         5 A       2 C
163 e      58 c      23 v       4 x       1 R
147 s      51 h      23 b       4 j       1 P
142 a      50 d      21 f       3 z       1 N
132 o      48 m      20 .       3 I       1 G
119 t      36 y      16 g       2 q       1 B
117 n      30 u       9 w       2 O       1 '
112 i      29 p       7 T       2 M
 90 r      25 ,       5 k       2 E
$ echo "Hello, World!" | CountChars
  3 l       1 r       1 d       1
  2 o       1 H       1 ,       1
  1 W       1 e       1 !
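An alternative to the alias is a shell function with the same name. A function can also be pasted into a script and used there, whereas bash ignores aliases in non-interactive shells unless shopt -s expand_aliases is set. The sketch below is meant for .bashrc and assumes GNU sed. Calling column only when the output goes to a terminal is a design choice of this sketch, not part of the original alias.

```shell
#!/bin/bash
# Function version of the CountChars alias.
CountChars() {
  # Lay the results out in columns only when writing to a terminal, so the
  # output stays one entry per line when it is piped somewhere else.
  sed 's/\(.\)/\n\1/g' | sort | uniq -c | sort -gr |
    if [ -t 1 ]; then column; else cat; fi
}
```

Use it exactly like the alias (cat myfile | CountChars), but define only one of the two. If the CountChars alias already exists when bash reads the function definition, bash expands the alias inside the definition and the definition breaks. Run unalias CountChars first.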
Using a script
If you want to see only alphabetic characters, you can use a script like the one shown below. It first changes all the letters to lowercase. It then loops through the alphabet, uses awk to count how many times each letter appears, and displays a count only if it's greater than zero. It only works with a string provided as an argument.
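#!/bin/bash
# make argument all lowercase
string=$(echo "$1" | tr '[:upper:]' '[:lower:]')
# for each letter, split the string on it with awk: the number of fields
# is one more than the number of occurrences, so NF-1 is the count
for char in {a..z}
do
  count=$(awk -F"${char}" '{print NF-1}' <<< "${string}")
  if [ "$count" != 0 ]; then
    echo "$char:$count"
  fi
done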
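Save the script as CountByChar, make it executable with chmod +x CountByChar and put it in a directory on your PATH. Then run it like this:
$ CountByChar "Hello, World!"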
d:1
e:1
h:1
l:3
o:2
r:1
w:1
Because the script loops through the letters from a to z, the characters are always listed in alphabetical order. You can pipe the output to the column command if you want fewer lines of output.
$ CountByChar "Hello, World!" | column
d:1     e:1     h:1     l:3     o:2     r:1     w:1
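The script starts a separate awk process for each of the 26 letters. A one-pass alternative reads the string once and tallies its letters in a bash associative array. This sketch assumes bash 4 or later, and the function name count_by_char is hypothetical. Like the original, it lowercases its argument and reports only letters that appear.

```shell
#!/bin/bash
# Hypothetical one-pass alternative to CountByChar using a bash associative
# array (bash 4 or later) instead of running awk once per letter.
count_by_char() {
  local string char i
  local -A counts=()
  # lowercase the argument, as the original script does with tr
  string=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  for (( i = 0; i < ${#string}; i++ )); do
    char=${string:i:1}
    # count letters only, matching the original a..z loop
    if [[ $char == [a-z] ]]; then
      counts[$char]=$(( ${counts[$char]:-0} + 1 ))
    fi
  done
  # associative arrays keep no particular order, so walk a..z to print the
  # letters alphabetically, skipping the ones that never appeared
  for char in {a..z}; do
    if [[ -n ${counts[$char]:-} ]]; then
      echo "$char:${counts[$char]}"
    fi
  done
}
```

Running count_by_char 'Hello, World!' prints the same d:1 through w:1 lines as CountByChar.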
Wrap-up
Whether you’re looking for character counts in files or in phrases, there are some handy options available. Turning the complex commands into aliases is probably the easiest way to make the task simple.
Copyright © 2022 IDG Communications, Inc.