Counting individual characters on Linux

October 26, 2022


Finding out how many characters are in a file is easy on the Linux command line: use the ls -l command, which reports the file's size in bytes (equal to the character count for plain ASCII text).
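As a small aside, the size shown by ls -l is a byte count, so it matches the character count only when every character occupies one byte. The sketch below, which uses a temporary file created purely for illustration, compares the byte count (wc -c) with the character count (wc -m):

```shell
# Create a small temporary sample file (for illustration only).
sample=$(mktemp)
printf 'hello\n' > "$sample"

# Byte count: the same figure ls -l shows in its size column.
bytes=$(wc -c < "$sample")

# Character count: can be smaller than the byte count when the file
# holds multibyte (for example UTF-8) characters and the locale is UTF-8.
chars=$(wc -m < "$sample")

echo "bytes: $bytes, characters: $chars"
rm -f "$sample"
```

For a file containing only ASCII text, both numbers should agree.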

However, if you want a count of how many times each character appears in your file, you’re going to need a considerably more complicated command or a script. This post covers several different options.

Counting how many times each character appears in a file

To count how many of each character are included in a file, you need to string together a series of commands that will consider each character and use a sort command before counting how many of each character are included.

To do that, you can use a command like this one:

$ cat myfile | sed 's/\(.\)/\n\1/g' | sort | uniq -c | column
     24              58 c           112 i           132 o             7 T
    254               2 C             3 I             2 O            30 u
      1 '            50 d             4 j            29 p            23 v
      25 ,           163 e             5 k             1 P             9 w
     20 .             2 E            60 l             2 q             4 x
    142 a            21 f            48 m            90 r            36 y
      5 A            16 g             2 M             1 R             3 z
     23 b             1 G           117 n           147 s
      1 B            51 h             1 N           119 t

The sed command separates the file into single-character chunks, one per line. That output is then sorted by the sort command. After that, each group of identical characters is counted by the uniq -c command, and the column command arranges the results into multiple columns. Because the results are based on the file content, no characters are listed other than those in the file.
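To make the stages easier to follow, here is a minimal sketch that runs them on a one-word input. The function name count_chars is made up for this example, and the sed expression assumes GNU sed, which treats \n in the replacement as a newline.

```shell
# Split input into one character per line, then count each distinct character.
# Assumes GNU sed, which interprets \n in the replacement text as a newline.
count_chars() {
  sed 's/\(.\)/\n\1/g' | sort | uniq -c
}

# Stage 1 only: each character lands on its own line,
# preceded by one empty line per input line.
printf 'banana\n' | sed 's/\(.\)/\n\1/g'

# All stages: counts for a, b and n, plus one entry for the empty line.
printf 'banana\n' | count_chars
```

Piping the result through column, as in the command above, only changes the layout, not the counts.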

Notice that the output lists the characters in the selected file in alphanumeric order because of the sort command. The first two entries show no visible character because they represent linefeeds and spaces, which are only recognizable in context.
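If the invisible entries are confusing, one option is to label them. The following is only a sketch: the function name label_whitespace and the <newline>, <space> and <tab> labels are invented here, and it assumes uniq -c prints the count, a single space, and then the line.

```shell
# Count characters and replace invisible ones with readable labels.
# Assumes GNU sed (\n in the replacement) and the usual "count line"
# output format of uniq -c.
label_whitespace() {
  sed 's/\(.\)/\n\1/g' | sort | uniq -c |
    awk '{
      count = $1
      # Everything after the count and its separating space is the character.
      ch = substr($0, index($0, count) + length(count) + 1)
      if (ch == "")        ch = "<newline>"
      else if (ch == " ")  ch = "<space>"
      else if (ch == "\t") ch = "<tab>"
      print count, ch
    }'
}

printf 'a b\n' | label_whitespace
```

The empty-line entry is labeled <newline> because the sed step emits one empty line for each line of input.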

If you want to display the characters in frequency order instead, all you need to do is add a second sort command using the -g (general numeric) option.

$ cat myfile | sed 's/\(.\)/\n\1/g' | sort | uniq -c | sort -g | column
      1 '             2 O             9 w            30 u           117 n
      1 B             2 q            16 g            36 y           119 t
      1 G             3 I            20 .            48 m           132 o
      1 N             3 z            21 f            50 d           142 a
      1 P             4 j            23 b            51 h           147 s
      1 R             4 x            23 v            58 c           163 e
      2 C             5 A            24              60 l           254
       2 E             5 k            25 ,            90 r
      2 M             7 T            29 p           112 i
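The -g option handles general numeric values such as floating-point numbers. Because the counts here are plain integers, the POSIX -n option should order them by count just as well. A minimal sketch, with by_frequency as a made-up function name:

```shell
# Order "count character" lines by the leading integer count.
# sort -n is used here in place of sort -g; for whole-number counts
# both compare the counts numerically.
by_frequency() {
  sed 's/\(.\)/\n\1/g' | sort | uniq -c | sort -n
}

printf 'mississippi\n' | by_frequency
```

The least frequent characters appear first and the most frequent last.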

To reverse the listing to show the most frequently used characters first, add an r (reverse) option to that last sort command.

$ cat myfile | sed 's/\(.\)/\n\1/g' | sort | uniq -c | sort -gr | column
    254              60 l            24               5 A             2 C
    163 e            58 c            23 v             4 x             1 R
    147 s            51 h            23 b             4 j             1 P
    142 a            50 d            21 f             3 z             1 N
    132 o            48 m            20 .             3 I             1 G
    119 t            36 y            16 g             2 q             1 B
    117 n            30 u             9 w             2 O             1 '
    112 i            29 p             7 T             2 M
      90 r            25 ,             5 k             2 E

The character at the top of the list is, as I assume you guessed, the space character. The second most often used character in the file is an “e”. No surprise there either. In addition, capital letters are listed last since they are not frequently used.

Note that if you don't want to distinguish between uppercase and lowercase letters, you can insert a tr (translate) command into the command string like this:

$ cat myfile | sed 's/\(.\)/\n\1/g' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -gr | column
    254             115 i            36 y            21 f             3 z
    165 e            91 r            30 u            20 .             2 q
    147 s            60 l            30 p            17 g             1 '
    147 a            60 c            25 ,             9 w
     134 o            51 h            24 b             5 k
    126 t            50 m            24               4 x
    118 n            50 d            23 v             4 j

Switch the positions of the “upper” and “lower” arguments to display the results all in uppercase.
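As a sketch of that uppercase variant, the function below (count_upper is a hypothetical name) folds everything to uppercase before counting. Here tr runs before sed rather than after it; since tr only maps letters, the counts should be the same either way.

```shell
# Fold all letters to uppercase, then count characters, most frequent first.
# Assumes GNU sed (\n in the replacement) and a sort that supports -g.
count_upper() {
  tr '[:lower:]' '[:upper:]' | sed 's/\(.\)/\n\1/g' | sort | uniq -c | sort -gr
}

printf 'Hello, World!\n' | count_upper
```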

Counting character-by-character in a word or phrase

You can also use a command similar to those shown above to count how many times each letter appears in a single word or phrase. Here’s an example:

$ echo "Hello, World!" | sed 's/\(.\)/\n\1/g' | sort | uniq -c | sort -gr | column
      3 l             1 r             1 d             1
      2 o             1 H             1 ,             1
      1 W             1 e             1 !
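A different way to split a phrase, not used in the commands above, is fold -w 1, which wraps the input at one column per line. The sketch below assumes ASCII input (multibyte characters may not be handled well by fold) and uses the made-up function name count_phrase. Unlike the sed approach, it adds no empty-line entry.

```shell
# Count the characters in a phrase passed as the first argument.
# fold -w 1 puts each character on its own line (ASCII input assumed).
count_phrase() {
  printf '%s\n' "$1" | fold -w 1 | sort | uniq -c | sort -gr
}

count_phrase "Hello, World!"
```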

Using an alias

While the commands shown above are clever, they’re not easy to remember or type. Creating an alias can help with this. Once you decide what kind of output you prefer, turn the command into an alias like this:

$ alias CountChars="sed 's/\(.\)/\n\1/g' | sort | uniq -c | sort -gr | column"

Save the alias in your .bashrc file so that you can use it as needed. Then use it in commands like these:

$ cat myfile | CountChars
    254              60 l            24               5 A             2 C
    163 e            58 c            23 v             4 x             1 R
    147 s            51 h            23 b             4 j             1 P
    142 a            50 d            21 f             3 z             1 N
    132 o            48 m            20 .             3 I             1 G
    119 t            36 y            16 g             2 q             1 B
    117 n            30 u             9 w             2 O             1 '
    112 i            29 p             7 T             2 M
      90 r            25 ,             5 k             2 E
$ echo "Hello, World!" | CountChars
      3 l             1 r             1 d             1
      2 o             1 H             1 ,             1
      1 W             1 e             1 !
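A shell function is a possible alternative to the alias. Bash does not expand aliases in non-interactive scripts by default, while a function works there too and can accept file names as arguments. This is only a sketch: the name countchars is hypothetical, and the column step is left out so the raw counts can be piped wherever needed.

```shell
# Count characters from the named files, or from standard input
# when no file names are given. Assumes GNU sed and sort -g support.
countchars() {
  cat -- "$@" | sed 's/\(.\)/\n\1/g' | sort | uniq -c | sort -gr
}

echo "Hello, World!" | countchars
```

Append "| column" to a call when the multi-column layout is wanted.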

Using a script

If you want to see only alphabetic characters, you can use a script like the one shown below. It first changes all the letters to lowercase before it runs through the alphabet, uses awk to count the number of times each letter appears and then displays the counts only if they’re larger than 0. It only works with whatever string is provided as an argument.

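#!/bin/bash

# make argument all lowercase
string=$(echo "$1" | tr '[:upper:]' '[:lower:]')

# for each letter, split the string on that letter;
# the number of fields minus one is the number of occurrences
for char in {a..z}
do
  count=$(awk -F"${char}" '{print NF-1}' <<< "${string}")
  # show only the letters that actually appear
  if [ "$count" -gt 0 ]; then
    echo "$char:$count"
  fi
done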
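The awk step in the script relies on field splitting: when a letter is used as the field separator, a line containing that letter N times is split into N+1 fields, so NF-1 is the number of occurrences. A minimal illustration with a fixed string:

```shell
# Splitting on "l" yields the fields "he", "", "o, wor" and "d!":
# four fields, so three separators.
echo 'hello, world!' | awk -F'l' '{print NF-1}'

# A letter that never appears leaves the line as a single field,
# so the result is zero.
echo 'hello, world!' | awk -F'z' '{print NF-1}'
```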

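Save the script as CountByChar, make it executable with chmod +x, place it in a directory on your PATH, and run it like this:

$ CountByChar "Hello, World!"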
d:1
e:1
h:1
l:3
o:2
r:1
w:1

Note that characters will always be listed in alphabetical order. You can pipe the output to the column command if you want fewer lines of output.

$ CountByChar "Hello, World!" | column
d:1     e:1     h:1     l:3     o:2     r:1     w:1
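The script starts awk once for every letter of the alphabet. For longer strings, a single pass might be preferable. The following sketch is not the script above but a variation on it, with count_by_char as a hypothetical function name; it tallies the letters in one awk run and sorts the result alphabetically.

```shell
# Count lowercase letters in the first argument with a single awk pass.
count_by_char() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' |
    awk '{
      # Walk the line one character at a time and tally letters only.
      for (i = 1; i <= length($0); i++) {
        c = substr($0, i, 1)
        if (c ~ /[a-z]/) count[c]++
      }
    }
    END {
      for (c in count) print c ":" count[c]
    }' | sort
}

count_by_char "Hello, World!"
```

The output format matches the script's letter:count lines, so it can be piped to column in the same way.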

Wrap-up

Whether you’re looking for character counts in files or phrases, there are some handy options available. Turning the complex ones into aliases is probably the best way to make the task easy.
