Tip: Remove duplicate lines with uniq

Get to know your textutils

Jacek Artymiak (jacek@artymiak.com), Freelance author and consultant
Jacek Artymiak works as a freelance consultant, developer, and writer. Since 1991 he's been developing software for many commercial and free variants of UNIX and BSD operating systems (AIX, HP-UX, IRIX, Solaris, Linux, FreeBSD, NetBSD, OpenBSD, and others), as well as MS-DOS, Microsoft Windows, Mac OS, and Mac OS X. Jacek specializes in business and financial application development, Web design, network security, computer graphics, animation, and multimedia. He's a prolific writer on technology subjects and the coauthor of "Install, Configure, and Customize Slackware Linux" (Prima Tech, 2000) and "StarOffice for Linux Bible" (IDG Books, 2000). Many of Jacek's software projects can be found at SourceForge. You can learn more about him at his personal Web site and contact him at jacek@artymiak.com.

Summary:  Duplicate lines don't often cause a problem, but sometimes they really do. And when they do, there's little need to spend an afternoon working up a filter for them, when the uniq command is at your very fingertips. Find out how it can save you time and headaches.

Date:  03 Apr 2003
Level:  Introductory

After sorting, you may discover that some lines are duplicated. Sometimes this duplicate information is not needed and can be removed to save space on disk. The input does not have to be sorted, but remember that uniq compares each line only with the one before it, so it collapses only runs of two or more consecutive identical lines into a single line. To remove every duplicate, sort the text first. The following example illustrates how it works in practice:


Listing 1. Removing duplicate lines with uniq
$ cat happybirthday.txt
Happy Birthday to You!
Happy Birthday to You!
Happy Birthday Dear Tux!
Happy Birthday to You!

$ sort happybirthday.txt
Happy Birthday Dear Tux!
Happy Birthday to You!
Happy Birthday to You!
Happy Birthday to You!

$ sort happybirthday.txt | uniq
Happy Birthday Dear Tux!
Happy Birthday to You!
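
To see why the sort step matters, here is a minimal sketch that runs uniq on the same four lines without sorting them first. The song function is a hypothetical helper that stands in for cat happybirthday.txt, so the example does not depend on the file being present.

```shell
# Hypothetical helper: prints the four lines of happybirthday.txt from Listing 1.
song() {
  printf '%s\n' \
    'Happy Birthday to You!' \
    'Happy Birthday to You!' \
    'Happy Birthday Dear Tux!' \
    'Happy Birthday to You!'
}

# No sort here: uniq collapses only the adjacent pair at the top.
# The final line is kept because a different line sits between it and the pair.
unsorted=$(song | uniq)
printf '%s\n' "$unsorted"
```

This should print three lines rather than two, with Happy Birthday to You! appearing both first and last.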

Be warned that it is a bad idea to use uniq or any other tool to remove duplicate lines from files containing financial or other important data. In such cases, a duplicate line almost always means another transaction for the same amount, and removing it would cause a lot of trouble for the accounting department. Do not do it!

More on uniq

This series offers an introduction to text utilities, which complements information found in the man and info pages. You'll learn even more if you open a new terminal window and type either man uniq or info uniq -- or you can open a new browser window and view the uniq man page at gnu.org.

What if you wanted to make your job a little easier and display only the lines that occur once, or only the lines that are repeated? You can do this with the -u (unique) and -d (duplicate) options, like this:


Listing 2. Using the -u and -d options
$ sort happybirthday.txt | uniq -u
Happy Birthday Dear Tux!

$ sort happybirthday.txt | uniq -d
Happy Birthday to You!

You can also get some statistics out of uniq with the -c option, which prefixes each line with the number of times it occurred:


Listing 3. Using the -c option
$ sort happybirthday.txt | uniq -uc
      1 Happy Birthday Dear Tux!

$ sort happybirthday.txt | uniq -dc
      3 Happy Birthday to You!
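
Leaving out -u and -d makes -c report a count for every distinct line. A common idiom, not taken from this tip itself, pipes that output through sort -rn to rank lines by frequency. The sketch below assumes the same four-line file; the song function is a hypothetical stand-in for cat happybirthday.txt.

```shell
# Hypothetical helper: prints the four lines of happybirthday.txt from Listing 1.
song() {
  printf '%s\n' \
    'Happy Birthday to You!' \
    'Happy Birthday to You!' \
    'Happy Birthday Dear Tux!' \
    'Happy Birthday to You!'
}

# sort groups identical lines, uniq -c counts each group,
# and sort -rn puts the highest count first.
ranked=$(song | sort | uniq -c | sort -rn)
printf '%s\n' "$ranked"
```

The line repeated three times should come first. The width of the padding in front of each count may differ between uniq implementations.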

If uniq compared just full lines it would still be useful, but that's not the end of this command's functionality. Especially handy is its ability to skip a given number of blank-separated fields at the start of each line before comparing, using the -f option followed by the number of fields to skip. This is very useful when you are looking at system logs. Quite often, some entries are repeated many times, which makes the logs difficult to read. Plain uniq won't help, because every entry begins with a different timestamp. But if you tell it to skip the timestamp fields (the traditional syslog timestamp of month, day, and time takes up three fields), your logs suddenly become more manageable. Try uniq -f 3 /var/log/messages and see for yourself.
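
As a rough illustration of -f, the sketch below runs uniq on a few made-up log lines in the traditional syslog layout, where the month, day, and time occupy the first three blank-separated fields. The host name, process names, and messages are invented for the example and do not come from a real system.

```shell
# Hypothetical sample: four invented syslog-style entries.
sample_log() {
  printf '%s\n' \
    'Apr  3 10:15:01 tux sshd[411]: Connection refused' \
    'Apr  3 10:15:07 tux sshd[411]: Connection refused' \
    'Apr  3 10:15:09 tux sshd[411]: Connection refused' \
    'Apr  3 10:16:30 tux kernel: eth0 link up'
}

# Skip the three timestamp fields before comparing.
# The first line of each run of matching entries is the one that gets printed.
condensed=$(sample_log | uniq -f 3)
printf '%s\n' "$condensed"
```

Plain uniq would print all four lines here, because the timestamps differ. With -f 3, the three sshd entries should collapse into the first one.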

There is also another option, -s, which works just like -f but skips the given number of characters instead of fields. You can use -f and -s together; uniq skips fields first, then characters. And what if you wanted to use only a preset number of characters for comparison? Try the -w option, a GNU extension that takes the number of characters to compare.
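
The two short pipelines below sketch -s and -w on invented input. The -w option is assumed to be available, as it is in GNU uniq; it is not part of the POSIX definition of the command, so other implementations may reject it.

```shell
# -s 2 ignores the first two characters of each line
# (a one-digit ID and a space in this made-up data).
by_name=$(printf '%s\n' '1 apple' '2 apple' '3 pear' | uniq -s 2)
printf '%s\n' "$by_name"

# -w 5 compares only the first five characters,
# so the two lines starting with "error" count as duplicates.
by_prefix=$(printf '%s\n' 'error: disk full' 'error: disk failing' 'warn: low battery' | uniq -w 5)
printf '%s\n' "$by_prefix"
```

In both cases uniq keeps the first line of each run, so the surviving line shows the first ID or the first full message it saw.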

Questions or comments? I'd love to hear from you -- send mail to jacek@artymiak.com.

Next time, we'll take a look at nl. See you then!

