Tuesday, June 14, 2022

Removing duplicate characters from a string on Linux with awk


The awk command makes it easy to remove duplicate characters from a string, even when those characters aren't sequential, especially when the process is turned into a script.

The awk command that we'll be using works by running through each letter in the string. In a more common command, you might see awk doing something like this:

$ echo one:two:three | awk 'BEGIN { FS = ":" } ; { print $2 }'
two

The FS portion of that command specifies the field separator, the character that's used to separate the fields in the string so that they can be processed individually.
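For comparison, here is a small sketch of two ways to set the field separator: assigning FS in a BEGIN block, as above, or passing it with awk's standard -F option. It also prints NF, the number of fields in the current line. The shell variable names via_begin and via_opt are just for illustration.

```shell
#!/bin/bash
# Set FS in a BEGIN block, as in the example above
via_begin=$(echo one:two:three | awk 'BEGIN { FS = ":" } { print NF, $3 }')

# Set FS with the -F command-line option instead
via_opt=$(echo one:two:three | awk -F: '{ print NF, $3 }')

echo "$via_begin"   # prints "3 three"
echo "$via_opt"     # prints "3 three"
```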

What our script does, however, is use a field separator of "" (i.e., no character). This tells awk that there are no field separators. In other words, each character is treated as if it were itself a field. Here are a couple of examples:

$ echo one:two:three | awk 'BEGIN { FS = "" } ; { print $2 }'
n
$ echo one:two:three | awk 'BEGIN { FS = "" } ; { print $4 }'
:

Note that the commands above end up displaying the second and fourth characters in the string, not the second and fourth "fields," and that no distinction is made between blanks, letters and various punctuation characters.
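To see every field at once, the sketch below loops from 1 to NF and prints each character on its own line, wrapped in brackets so that a blank is visible. The function name show_chars is hypothetical. Keep in mind that an empty FS is an extension supported by GNU awk and mawk; POSIX leaves its behavior unspecified, so other awk implementations may differ.

```shell
#!/bin/bash
# Hypothetical helper: print each character of a string on its own line.
# With FS set to the empty string, every character becomes a field,
# so NF equals the number of characters in the line.
show_chars() {
  printf '%s\n' "$1" | awk 'BEGIN { FS = "" }
  {
    for (i = 1; i <= NF; i++)
      printf "%d: [%s]\n", i, $i
  }'
}

show_chars "a :b"
# prints:
# 1: [a]
# 2: [ ]
# 3: [:]
# 4: [b]
```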

A bash script that uses awk to remove duplicate characters might look like this:

#!/bin/bash

echo -n "Enter string: "
read -r string

awk -v FS="" '{
  for(i=1;i<=NF;i++)
    str=(++a[$i]==1?str $i:str)
}
END {print str}' <<< "$string"

That script prompts for a string and then uses awk to run through it one character at a time. The array a counts how many times each character has been seen, so ++a[$i]==1 is true only the first time a character turns up, and only then is that character appended to the string (str). The characters are otherwise left in their original positions, with no sorting or further processing. Here's an example of running it:

$ ./rmdups
Enter string: Let’s go fly a kite!
Let’s goflyaki!

Notice that each character appears only once in the "Let’s goflyaki!" result. The final result of the process is displayed by the print statement in the END portion of the awk command.
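Because str and the array a are never reset, and the result is printed only in END, feeding the awk command several lines of input produces one combined result with duplicates removed across all the lines. If you would rather deduplicate each line separately, a sketch like this one clears the state at the start of every line and prints inside the main block instead. The function name rmdups_lines is hypothetical.

```shell
#!/bin/bash
# Hypothetical per-line variant: reset the state at the start of each
# input line and print the result for that line right away, instead of
# accumulating everything and printing once in END.
rmdups_lines() {
  awk -v FS="" '{
    str = ""
    split("", a)    # empties the array a (a portable idiom)
    for (i = 1; i <= NF; i++)
      str = (++a[$i] == 1 ? str $i : str)
    print str
  }'
}

printf 'aabb\nbbcc\n' | rmdups_lines
# prints:
# ab
# bc
```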

If you want to see how the script works by watching the string grow as characters are added, you can use this version of the script instead:

#!/bin/bash

echo -n "Enter string: "
read -r string

awk -v FS="" '{
  for(i=1;i<=NF;i++)
  {
    str=(++a[$i]==1?str $i:str)
    print str                    # <== watch it grow
  }
}

END {print str}' <<< "$string"

Running the script with the extra print command, you'll see output like this:

$ ./rmdups2
Enter string: Let’s go fly a kite!
L
Le
Let
Let’
Let’s
Let’s
Let’s g
Let’s go
Let’s go
Let’s gof
Let’s gofl
Let’s gofly
Let’s gofly
Let’s goflya
Let’s goflya
Let’s goflyak
Let’s goflyaki
Let’s goflyaki
Let’s goflyaki
Let’s goflyaki!
Let’s goflyaki!

Notice that the string grows only when the current character isn't already included in it.
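The "only if not already included" test can be written in more than one way. The sketch below wraps the article's ternary expression in a function that takes the string as an argument rather than prompting for it, and sets it beside the common awk idiom !seen[x]++, which expresses the same first-occurrence check. Both function names, rmdups_str and rmdups_idiom, are hypothetical.

```shell
#!/bin/bash
# Two hypothetical functions that take the string as an argument instead
# of prompting for it. Both keep a character only on its first appearance.

# Version 1: the ternary expression used in the scripts above.
# a[$i] counts occurrences, and ++a[$i]==1 is true only the first time.
rmdups_str() {
  awk -v FS="" '{
    for (i = 1; i <= NF; i++)
      str = (++a[$i] == 1 ? str $i : str)
  }
  END { print str }' <<< "$1"
}

# Version 2: the "!seen[x]++" idiom. The post-increment yields the old
# count, which is 0 (false) the first time, so the negated test is true.
rmdups_idiom() {
  awk -v FS="" '{
    for (i = 1; i <= NF; i++)
      if (!seen[$i]++)
        str = str $i
  }
  END { print str }' <<< "$1"
}

rmdups_str "mississippi"      # prints "misp"
rmdups_idiom "mississippi"    # prints "misp"
```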

You could also implement the script as a standalone awk script, saved here as rmdups.awk:

#!/usr/bin/awk -f

BEGIN { FS = "" }
{
  for(i=1;i<=NF;i++)
    str=(++a[$i]==1?str $i:str)
}
END {print str}

After making the file executable, you could then run it like this:

$ chmod +x rmdups.awk
$ echo "Let’s go fly a kite!" | ./rmdups.awk
Let’s goflyaki!
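If you'd rather not depend on a shebang line or on the exact location of awk, you can also hand the program file to awk with its -f option. The sketch below writes the program to a temporary file created by mktemp (the variable names prog and result are just for illustration), runs it, and cleans up afterward.

```shell
#!/bin/bash
# Save the awk program to a temporary file and run it with "awk -f".
# This approach needs neither a shebang line nor execute permission.
prog=$(mktemp)
cat > "$prog" <<'EOF'
BEGIN { FS = "" }
{
  for (i = 1; i <= NF; i++)
    str = (++a[$i] == 1 ? str $i : str)
}
END { print str }
EOF

result=$(echo "bookkeeper" | awk -f "$prog")
rm -f "$prog"
echo "$result"    # prints "bokepr"
```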

Wrap-Up

Whenever processing duplicated characters more than once would be a serious waste of processing power, an awk command like the one shown in this post can remove them quite easily.

Join the Network World communities on Facebook and LinkedIn to comment on topics that are top of mind.

Copyright © 2022 IDG Communications, Inc.
