International Journal of Advanced Computer Research ISSN (Print): 2249-7277    ISSN (Online): 2277-7970 Volume-8 Issue-35 March-2018

Paper Title : Spatial distribution analysis of unigrams and bigrams of Hindi literary document
Author Name : Sifatullah Siddiqi
Abstract :

This paper presents a spatial distribution analysis of the well-known Hindi novel “Godan” by Munshi Premchand. We analyse how different kinds of single words (unigrams) and word pairs (bigrams) are distributed across the document. Single words are divided into stop words, keywords and non-keywords, while word pairs are divided into stop-phrases, key phrases and non-key phrases. Our proposition is that different types of unigrams and bigrams have different spatial distribution patterns in the text, and that unigrams and bigrams of the same type have significantly similar patterns. To support this proposition, we select a number of example words from the text and plot their spatial distribution graphs.
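
As a rough illustration of what a spatial distribution analysis involves, the sketch below records the token positions at which a unigram or bigram occurs and summarises how evenly those occurrences are spaced. It is a minimal sketch under stated assumptions, not the paper's procedure: the function names, the toy transliterated tokens, and the spread measure (standard deviation of the gaps divided by the mean gap) are all chosen here for illustration.

```python
from statistics import mean, pstdev


def occurrence_positions(tokens, term):
    """Return the token indices where `term` (a tuple of 1 or 2 words) starts."""
    n = len(term)
    return [i for i in range(len(tokens) - n + 1)
            if tuple(tokens[i:i + n]) == term]


def gap_spread(positions):
    """Standard deviation of the gaps between successive occurrences,
    divided by the mean gap. Values near 0 mean evenly spaced occurrences;
    larger values mean the occurrences are bunched together.
    Returns None when there are fewer than two gaps (assumed convention)."""
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    if len(gaps) < 2:
        return None
    return pstdev(gaps) / mean(gaps)


# Hypothetical toy input; a real run would tokenise the full novel.
tokens = "hori ka gaay hori ka ghar aur hori ka khet".split()

unigram_positions = occurrence_positions(tokens, ("hori",))
bigram_positions = occurrence_positions(tokens, ("hori", "ka"))
print(unigram_positions, gap_spread(unigram_positions))
print(bigram_positions, gap_spread(bigram_positions))
```

Plotting the position lists against token index would give one possible form of distribution graph for each term; the spread value is only one of several ways to summarise such a plot.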

Keywords :

Stop words, Keywords, Key phrase, Spatial distribution analysis, Hindi.

Cite this article :

Sifatullah Siddiqi. Spatial distribution analysis of unigrams and bigrams of Hindi literary document. International Journal of Advanced Computer Research. 2018; 8(35):97-109. DOI:10.19101/IJACR.2018.835003
