Paper Title | : | Spatial distribution analysis of unigrams and bigrams of Hindi literary document |
Author Name | : | Sifatullah Siddiqi |
Abstract | : | This paper presents a spatial distribution analysis of the famous Hindi literary work “Godan” by the great novelist Munshi Premchand. We analyze the spatial distribution of different kinds of words (unigrams) and word pairs (bigrams) in the document. Single words are divided into stop words, keywords and non-keywords, while word pairs are divided into stop-phrases, key phrases and non-key phrases. Our proposition is that the spatial distribution pattern differs between types of unigrams and bigrams, and that unigrams and bigrams of the same type show significantly similar spatial distribution patterns. We select many example words from the text and generate their spatial distribution graphs to support this assertion. |
Keywords | : | Stop words, Keywords, Key phrase, Spatial distribution analysis, Hindi. |
Cite this article | : | Sifatullah Siddiqi. Spatial distribution analysis of unigrams and bigrams of Hindi literary document. International Journal of Advanced Computer Research. 2018;8(35):97-109. DOI:10.19101/IJACR.2018.835003 |
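
The abstract does not say how the spatial distribution graphs are built. As a minimal sketch, assuming "spatial distribution" means the token positions at which a unigram or bigram occurs in the text, the positions and the gaps between successive occurrences could be collected as follows. The function names `occurrence_positions` and `gaps` and the toy English token stream are hypothetical and are not taken from the paper or from "Godan".

```python
from collections import defaultdict


def occurrence_positions(tokens, n=1):
    """Map each n-gram (a tuple of n tokens) to the token indices where it starts."""
    positions = defaultdict(list)
    for i in range(len(tokens) - n + 1):
        positions[tuple(tokens[i:i + n])].append(i)
    return dict(positions)


def gaps(position_list):
    """Distances between successive occurrences of one n-gram."""
    return [b - a for a, b in zip(position_list, position_list[1:])]


# Toy token stream for illustration only (assumption: whitespace tokenization).
tokens = "the farmer sold the cow and the farmer wept".split()

unigrams = occurrence_positions(tokens, n=1)
bigrams = occurrence_positions(tokens, n=2)

the_positions = unigrams[("the",)]                    # [0, 3, 6]
the_gaps = gaps(the_positions)                        # [3, 3]
farmer_bigram_positions = bigrams[("the", "farmer")]  # [0, 6]
```

Plotting each position list against the length of the text would give one possible form of a spatial distribution graph. Evenly spaced occurrences, as for "the" here, are what one might expect from a stop word, whereas clustered occurrences may point to a keyword or key phrase.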