This article includes a list of references, but its sources remain unclear because it has insufficient inline citations. (June 2012) (Learn how and when to remove this template message)
Text simplification is an operation used in natural language processing to modify, enhance, classify or otherwise process an existing corpus of human-readable text in such a way that the grammar and structure of the prose is greatly simplified, while the underlying meaning and information remains the same. Text simplification is an important area of research, because natural human languages ordinarily contain large vocabularies and complex compound constructions that are not easily processed through automation. In terms of reducing language diversity, semantic compression can be employed to limit and simplify a set of words used in given texts.
Text Simplification is illustrated with an example from Siddharthan (2006). The first sentence contains two relative clauses and one conjoined verb phrase. A text simplification system aims to simplify the first sentence to the second sentence.
- Also contributing to the firmness in copper, the analyst noted, was a report by Chicago purchasing agents, which precedes the full purchasing agents report that is due out today and gives an indication of what the full report might hold.
- Also contributing to the firmness in copper, the analyst noted, was a report by Chicago purchasing agents. The Chicago report precedes the full purchasing agents report. The Chicago report gives an indication of what the full report might hold. The full report is due out today.
One approach to text simplification is lexical simplification via lexical substitution, a two-step process consisting of identifying complex words and replacing them with simpler synonyms. A key challenge here is identifying complex words, which is performed by a machine learning classifier trained on labelled data. An improvement over classical methods of applying binary labels to words as simple or complex is to ask labellers to sort words in order of complexity; this results in higher consistency of resultant labels.
- Siddharthan, Advaith (28 March 2006). "Syntactic Simplification and Text Cohesion". Research on Language and Computation. 4 (1): 77–109. doi:10.1007/s11168-006-9011-1.
- Gooding, Sian; Kochmar, Ekaterina; Sarkar, Advait; Blackwell, Alan (August 2019). "Comparative judgments are more consistent than binary classification for labelling word complexity". Proceedings of the 13th Linguistic Annotation Workshop: 208–214. doi:10.18653/v1/W19-4024. Retrieved 22 November 2019.
- Wei Xu, Chris Callison-Burch and Courtney Napoles. "Problems in Current Text Simplification Research". In Transactions of the Association for Computational Linguistics (TACL), Volume 3, 2015, Pages 283–297.
- Advaith Siddharthan. "Syntactic Simplification and Text Cohesion". In Research on Language and Computation, Volume 4, Issue 1, Jun 2006, Pages 77–109, Springer Science, the Netherlands.
- Siddhartha Jonnalagadda, Luis Tari, Joerg Hakenberg, Chitta Baral and Graciela Gonzalez. Towards Effective Sentence Simplification for Automatic Processing of Biomedical Text. In Proc. of the NAACL-HLT 2009, Boulder, USA, June.