Trimmer Algorithm
Natural Language Processing for the Web

1. Natural Language Processing for the Web. Prof. Kathleen McKeown, 722 CEPSR, 939-7118. Office Hours: Wed, 1-2; Mon 3-4. TA: Fadi Biadsy, 702 CEPSR, 939-7111. Office Hours: Thurs 6-8.

2. Logistics. Remaining classes: CS Conference Room, except April 3rd, back in 223 Mudd. Invited speakers: 7th Floor Interschool Lab. CS account: apply for one now, http://www.cs.columbia.edu/crf/accounts. Presentations, Discussants: need two presenters for next week. If you haven’t already signed up, sign up on sheet going around.

3. Today. Overview. Single doc summarization systems: Trimmer (Zajic et al), Kathy; Cut and Paste (Jing and McKeown), Sigfried Gold; Statistical Sentence Compression (Knight and Marcu), Kathy. Tools: Parsers, POS taggers, Barry Schiffman. Evaluation: Pyramids (Nenkova and Passonneau), Joshua Nankin; Rouge (Lin and Hovy), Kathy.

4. Sentence extraction. Sparck Jones: ‘what you see is what you get’, some of what is on view in the source text is transferred to constitute the summary.

5. Background. Sentence extraction the main approach. Some more sophisticated features for extraction: lexical chains, anaphoric reference. Machine learning model for learning an extraction summarizer: Kupiec, SIGIR 95.

6. Today’s systems. How can we edit the selected text?

7. Karen Sparck Jones. Automatic Summarizing: Factors and Directions.

8. Sparck Jones claims. Need more power than text extraction and more flexibility than fact extraction (p. 4). In order to develop effective procedures it is necessary to identify and respond to the context factors, i.e. input, purpose and output factors, that bear on summarising and its evaluation. (p. 1) It is important to recognize the role of context factors because the idea of a general-purpose summary is manifestly an ignis fatuus. (p. 5) Similarly, the notion of a basic summary, i.e., one reflective of the source, makes hidden fact assumptions, for example that the subject knowledge of the output’s readers will be on a par with that of the readers for whom the source was intended. (p. 5) I believe that the right direction to follow should start with intermediate source processing, as exemplified by sentence parsing to logical form, with local anaphor resolutions.

9. Questions (from Sparck Jones). Does subject matter of the source influence summary style (e.g., chemical abstracts vs. sports reports)? Should we take the reader into account and how? Is the state of the art sufficiently mature to allow summarization from intermediate representations and still allow robust processing of domain independent material?

10. For the next two classes. Consider the papers we read in light of Sparck Jones’ remarks on the influence of context. Input: source form, subject type, unit. Purpose: situation, audience, use. Output: material, format, style.

11. Trimmer Algorithm.

12. Headline Ambiguity.
Iraqi Head Seeks Arms
Juvenile Court to Try Shooting Defendant
Teacher Strikes Idle Kids
Kids Make Nutritious Snacks
British Left Waffles on Falkland Islands
Red Tape Holds Up New Bridges
Bush Wins on Budget, but More Lies Ahead
Hospitals are Sued by 7 Foot Doctors
Ban on nude dancing on Governor’s desk
Local high school dropouts cut in half
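
Illustration (not from the slides). As a rough sketch of the sentence extraction idea from the Background slide, where sentences from the source are transferred unchanged into the summary, the Python below scores each sentence by the average document frequency of its content words and keeps the top k in source order. This frequency scoring is a simple stand-in chosen for brevity; it is not the trained classifier of Kupiec (SIGIR 95), nor Trimmer. The stopword list, the sentence splitter, and all function names are assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "are", "on", "for", "it"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def extract_summary(text, k=1):
    """Return the k highest-scoring sentences, unedited, in source order."""
    sentences = split_sentences(text)
    # Word frequencies over the whole document.
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        # Average frequency, so long sentences are not favoured by length alone.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]),
                    reverse=True)
    # Restore source order for the selected sentences.
    return [sentences[i] for i in sorted(ranked[:k])]
```

Because the output is copied verbatim from the source, this kind of extractor has no way to shorten or repair a selected sentence, which is the gap the editing systems on today's list are meant to address.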
