Let’s count them words!

Every word has a story and all stories are made of words. This story is about counting words.

Can we count how many unique words there are in Urdu? 

Well, RekhtaDictionary has more than 3.5 lakh (350,000) of them. That’s an overwhelming number. These words include compound words, idioms and phrases. Also, many of them are grammatical or dialect-based variations of one another.

So, maybe we should ask a more practical question: how many unique words have been used in Urdu Poetry? In all the ghazals that we have on Rekhta, there are roughly 65 thousand unique words used by all the poets. Some of them are used quite frequently and some are seldom used.

We asked the computer to take all the ghazals and count how many times each word has been used in this corpus. It only took a few seconds to do that. Then we asked which is the most commonly occurring word in Urdu Poetry, hoping to gain some groundbreaking insight with the answer. The computer replied and the answer was… totally anticlimactic! Can you guess what it is? Well, it is ‘hai / है / ہے’. Not quite profound, eh? We were not exactly interested in words like hai. So let us ignore them.
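For the curious, here is a minimal sketch of how such a count could be done in Python. It is not the exact script we ran: the three lines in `toy_lines` are invented stand-ins for the corpus, and the helper name `count_words` and the plain whitespace tokenization are assumptions made for illustration.

```python
from collections import Counter

# Toy stand-in for the ghazal corpus: a few invented romanized lines,
# not actual verses from Rekhta.
toy_lines = [
    "dil hai to dard hai",
    "raat hai aur dil hai",
    "Gam hai to dil bhii hai",
]


def count_words(lines):
    """Count how often each whitespace-separated token occurs (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        # Assumption: words are separated by spaces and need no further cleanup.
        counts.update(line.split())
    return counts


word_counts = count_words(toy_lines)
print(len(word_counts))            # number of unique words in the toy corpus
print(word_counts.most_common(3))  # most frequent words first
```

Even in this tiny example, hai comes out on top, which is exactly the kind of answer we got from the real corpus.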

Words that are used this often in a language are typically filtered out in most Natural Language Processing contexts, and they have a name: ‘stop-words’. So we set out to find a list of stop words for Urdu on the internet, but could not find a decent one. No worries, we can make our own. Sorting the list of words by occurrence and manually picking out the stop words, we made this list, which we have published publicly in case any Urdu NLP researcher finds it useful. It mainly consists of common auxiliary verbs, pronouns and prepositions. After this tangent, let us go back to our original question.
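Filtering with a stop-word list can be sketched in a few lines of Python. Everything here is illustrative: `toy_stop_words` is a hypothetical four-word sample rather than the published list, the counts in `toy_counts` are invented, and `remove_stop_words` is a name chosen for this example.

```python
from collections import Counter

# Hypothetical miniature stop-word list; the published list is much longer.
toy_stop_words = {"hai", "to", "aur", "bhii"}

# Invented word counts standing in for the real corpus statistics.
toy_counts = Counter({"hai": 6, "dil": 3, "to": 2, "dard": 1, "Gam": 1})


def remove_stop_words(counts, stop_words):
    """Return a new Counter without the entries that are stop words."""
    return Counter({word: n for word, n in counts.items() if word not in stop_words})


content_counts = remove_stop_words(toy_counts, toy_stop_words)
print(content_counts.most_common(1))  # the top word once stop words are ignored
```

Keeping the stop words in a set makes each membership check cheap, so the same approach should scale to a corpus with tens of thousands of unique words.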

Which word, except for these stop-words, is the most commonly used word in Urdu Poetry? Try to guess once again please.

The answer: dil / दिल / دِل

Makes sense, right? Now we are getting somewhere! Which is the second most used word? It is Gam / ग़म / غم. Right on! Confirms the stereotype! Looks like we are onto something.

Here are the 50 most common words ignoring stop words and verbs in ghazals: dil, Gam, aa.nkh, nazar, baat, zindagii, ishq, duniyaa, mohabbat, yaad, raat, din, KHudaa, KHvaab, shab, shahr, dard, rang, log, vaqt, dar, gul, husn, naam, safar, havaa, haath, raah, vafaa, shaam, jaan, yaar, KHaak, umr, kaam, phuul, dam, haal, KHabar, duur, roz, may, KHayaal, manzil, dariyaa, shauq, suurat, lab, bahaar, zaKHm.

Different poets might use slightly different distributions of words, and word clouds are typically used to convey information about the distribution of words in a specific context. The more a word is used, the bigger it is on the cloud. So, let us make these word clouds for some prominent poets and see if they offer some insight into their commonalities and distinctiveness.
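The "more frequent means bigger" rule can be sketched as a simple mapping from counts to font sizes. This is only an illustration of the idea, not how our clouds were actually rendered: the counts below are invented, the name `font_sizes` and the 10 to 60 point range are assumptions, and a real word-cloud tool would also take care of placing the words.

```python
def font_sizes(counts, min_size=10.0, max_size=60.0):
    """Map each word's count linearly onto a font size (hypothetical helper)."""
    lowest = min(counts.values())
    highest = max(counts.values())
    if highest == lowest:
        # All words are equally frequent, so draw them all at the same size.
        return {word: max_size for word in counts}
    scale = (max_size - min_size) / (highest - lowest)
    return {word: min_size + (n - lowest) * scale for word, n in counts.items()}


# Invented counts for one imaginary poet, for illustration only.
toy_poet_counts = {"dil": 30, "Gam": 20, "shahr": 10}

sizes = font_sizes(toy_poet_counts)
for word, size in sorted(sizes.items(), key=lambda item: -item[1]):
    print(f"{word}: {size:.0f}pt")
```

A linear scale is the simplest choice. When a few words dominate, scaling by the logarithm of the count is a common alternative that keeps rarer words readable.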

In fact, this could be a fun game. Below are word clouds for Ghalib, Iqbal, Firaq, Faiz, Faraz, Ibn-e-Insha, Jaun Eliya and Farhat Ehsas, but not in that order. The game is to guess which one belongs to which poet. The pen-names have been mostly removed from the word clouds. See if you can match each word cloud to its poet. There are enough hints within the word clouds. Answers are at the end of this blog.

We can see that these word clouds do convey information about the themes these poets employ most. While dil is quite prominent in all of them, for Farhat Ehsas jism and badan are even more important. KHudii is a tell-tale Iqbal trademark, while jii and chaa.nd give Insha away. Chashm instead of aa.nkh is prominent with Ghalib. The use of shahr in ghazals seems to be increasing with time, as is clear from the cases of Faraz, Jaun and Farhat.

In conclusion, exploring the words used in Urdu poetry not only unveils common themes but also deepens our appreciation for the language’s richness. The patterns we discovered are just the beginning. We invite you to delve further into the world of words and share your findings with us. Happy word hunting!



Answers: 1. Ibn-e-Insha 2. Firaq 3. Faiz 4. Iqbal 5. Jaun Eliya 6. Ghalib 7. Farhat Ehsas 8. Faraz