Heuristic Initialization And Similarity Integration Based Model for Improving Extractive Multi-Document Summarization


Nasreen J. Kadhim, Dheyaa Abdulameer Mohammed




Heuristic initialization, integrations of similarity measures, Recall-Oriented Understudy for Gisting Evaluation (ROUGE), optimization-based model


Currently, the importance of the automatic multi-document summarization task stems from the rapid growth of information on the Internet. Automatic document summarization technology is progressing and may offer a solution to the problem of information overload, but automatic text summarization systems still face the challenge of producing high-quality summaries. In this paper, the design of a generic text summarization model based on sentence extraction has been redirected toward a more semantic measure reflecting two significant objectives, content coverage and diversity, when generating summaries from multiple documents, formulated as an explicit optimization model. The two proposed models have then been coupled and defined as a single-objective optimization problem. In addition, different integrations of similarity measures have been introduced and applied to the proposed model: besides single similarity measures based on the Cosine, Dice and Jaccard measures of text similarity, integrations of double and triple similarity measures are used. The proposed optimization model has been solved using a Genetic Algorithm. Moreover, a heuristic initialization has been proposed and injected into the adopted evolutionary algorithm to harness its strength. Document sets supplied by the Document Understanding Conference 2002 (DUC2002) have been used as the evaluation dataset, and the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) toolkit has been used as the evaluation metric, both for measuring the performance of the proposed method and for comparing it against other baseline systems.
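The abstract does not give the exact formulas, so the following Python sketch only illustrates the general idea under stated assumptions. It uses Cosine, Dice and Jaccard similarity over bag-of-words sentence vectors; an "integrated" similarity taken as the unweighted mean of the chosen measures (an assumption); a fitness that linearly weights coverage (mean similarity of the selected sentences to the document-set centroid) against diversity (mean pairwise dissimilarity) with a hypothetical weight `lam`; and a simple genetic algorithm whose initial population is partly seeded with the sentences most similar to the centroid, as one possible form of heuristic initialization. The names `integrated`, `fitness` and `ga_summarize`, the fixed summary length `k`, and all GA parameters are illustrative, not the authors' formulation.

```python
import random
from collections import Counter
from itertools import combinations
from math import sqrt


def terms(sentence: str) -> Counter:
    """Bag-of-words term counts (no stemming or stop-word removal)."""
    words = (w.strip(".,;:!?\"'").lower() for w in sentence.split())
    return Counter(w for w in words if w)


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def dice(a: Counter, b: Counter) -> float:
    sa, sb = set(a), set(b)
    return 2 * len(sa & sb) / (len(sa) + len(sb)) if sa or sb else 0.0


def jaccard(a: Counter, b: Counter) -> float:
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa or sb else 0.0


def integrated(a: Counter, b: Counter, measures=(cosine, dice, jaccard)) -> float:
    # Assumption: an "integration" of measures is their unweighted mean.
    return sum(m(a, b) for m in measures) / len(measures)


def fitness(ind, vecs, centroid, lam=0.5) -> float:
    # Hypothetical single objective: lam * coverage + (1 - lam) * diversity.
    chosen = [vecs[i] for i in ind]
    coverage = sum(integrated(v, centroid) for v in chosen) / len(chosen)
    pairs = list(combinations(chosen, 2))
    diversity = sum(1 - integrated(x, y) for x, y in pairs) / len(pairs) if pairs else 1.0
    return lam * coverage + (1 - lam) * diversity


def ga_summarize(sentences, k=2, pop_size=20, gens=50, seed_frac=0.5, rng=None):
    rng = rng or random.Random(0)
    vecs = [terms(s) for s in sentences]
    centroid = sum(vecs, Counter())  # term counts of the whole document set
    n = len(sentences)
    k = max(1, min(k, n))

    def score(ind):
        return fitness(ind, vecs, centroid)

    # Heuristic initialization (assumed form): seed part of the population
    # with the sentences most similar to the centroid, the rest at random.
    ranked = sorted(range(n), key=lambda i: integrated(vecs[i], centroid), reverse=True)
    pool = ranked[: min(n, 2 * k)]
    pop = [frozenset(ranked[:k])]
    while len(pop) < max(1, int(pop_size * seed_frac)):
        pop.append(frozenset(rng.sample(pool, k)))
    while len(pop) < pop_size:
        pop.append(frozenset(rng.sample(range(n), k)))

    for _ in range(gens):
        pop.sort(key=score, reverse=True)
        elite = pop[: max(2, pop_size // 2)]  # elitist truncation selection
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            # Crossover: draw k distinct indices from the union of both parents.
            child = set(rng.sample(sorted(p1 | p2), k))
            if rng.random() < 0.2:  # mutation: swap one selected sentence out
                out = rng.choice(sorted(child))
                outside = [i for i in range(n) if i not in child]
                child.discard(out)
                child.add(rng.choice(outside) if outside else out)
            children.append(frozenset(child))
        pop = elite + children

    best = max(pop, key=score)
    return [sentences[i] for i in sorted(best)]
```

Calling `ga_summarize(sentences, k=2)` returns two distinct sentences in their original document order. The single, double and triple similarity variants would correspond to changing the `measures` tuple used by `integrated`; the paper's actual weighting, length constraint and genetic operators may differ.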
Comparison results against other baselines show that the proposed optimization-based model outperforms the other baseline approaches in terms of ROUGE-1 and ROUGE-2 scores, recording a score of 0.4542 for ROUGE-1 and 0.1623 for ROUGE-2.
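ROUGE-N recall, as commonly defined, counts how many reference n-grams also appear in the candidate summary (with clipped counts), divided by the total number of reference n-grams. The Python sketch below is a simplified re-implementation for illustration only: it does no stemming or stop-word removal and is not the official ROUGE toolkit used in the paper, so its scores are not comparable to the 0.4542 and 0.1623 reported above. The function names `ngrams` and `rouge_n_recall` are hypothetical.

```python
from collections import Counter


def ngrams(text: str, n: int) -> Counter:
    """Lower-cased, punctuation-stripped n-gram counts (no stemming)."""
    toks = [t.strip(".,;:!?\"'").lower() for t in text.split()]
    toks = [t for t in toks if t]
    return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))


def rouge_n_recall(candidate: str, references: list[str], n: int = 1) -> float:
    """Simplified ROUGE-N recall over one or more reference summaries."""
    cand = ngrams(candidate, n)
    matched = total = 0
    for ref in references:
        ref_counts = ngrams(ref, n)
        # Clipped match: a reference n-gram counts at most as often as it
        # appears in the candidate.
        matched += sum(min(c, cand[g]) for g, c in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0


print(rouge_n_recall("the cat sat on the mat", ["the cat is on the mat"], n=1))  # → 0.8333333333333334
print(rouge_n_recall("the cat sat on the mat", ["the cat is on the mat"], n=2))  # → 0.6
```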


