
1 Sentiment Classification using Word Sub-Sequences and Dependency Sub-Trees
Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), May 18th-20th, 2005
Shotaro Matsumoto, Hiroya Takamura and Manabu Okumura, Tokyo Institute of Technology

2 Table of Contents
1. Motivation
2. Our Approach
3. Experiments
4. Results and Discussion
5. Conclusion and Future Work

3 Table of Contents
1. Motivation
   - Background
   - Document Sentiment Classification
   - Early Studies
   - Issue
   - Objective
2. Our Approach
3. Experiments
4. Results and Discussion
5. Conclusion and Future Work

4 Background
- Online grass-roots reviews are increasing rapidly
- They contain useful reputation information
- There are too many such documents to read them all
- Mining reputation from these documents is therefore important

5 Document sentiment classification
The task of classifying an overall document according to the positive or negative polarity of its opinion (desirable or undesirable).

6 Two steps for the classification
1. Feature extraction: convert a document into a feature vector that preserves the features of the original document
2. Binary classification: classify the feature vector as having positive or negative sentiment polarity
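The two steps can be sketched end to end on toy data. This is an illustrative sketch only: the paper uses SVMs, but here a simple perceptron stands in for the binary classifier, and the vocabulary, documents, and labels are invented for the example.

```python
def to_vector(doc, vocab):
    """Step 1: feature extraction - map a document to a bag-of-words vector."""
    words = doc.lower().split()
    return [words.count(w) for w in vocab]

def train_perceptron(vectors, labels, epochs=20):
    """Step 2: learn a linear binary classifier (labels are +1/-1).
    A perceptron stands in for the SVM used in the paper."""
    w = [0.0] * len(vectors[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Toy training data (invented for illustration)
docs = ["good great film", "great acting good story", "bad boring film", "boring bad plot"]
labels = [1, 1, -1, -1]
vocab = sorted({w for d in docs for w in d.split()})
vectors = [to_vector(d, vocab) for d in docs]
w, b = train_perceptron(vectors, labels)
print(predict(w, b, to_vector("good story", vocab)))   # 1 (positive)
```

The same vector/classifier split applies unchanged when the features are frequent sub-patterns instead of unigram counts.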

7 Early Studies
[Pang 02]
- Features: unigrams in the document
- Classifiers: Naïve Bayes, ME model, Support Vector Machines (SVMs)
- Showed that SVMs are superior to the others
[Pang 04]
- Features: unigrams obtained from the summary
- Classifier: SVMs
[Mullen 04]
- Features: unigrams, unigrams of lemmatized words, prior knowledge from the Internet and a thesaurus
- Classifier: SVMs
- Obtained better results than [Pang 02]

8 Issue
Features in early studies: a document is represented as a bag of words, i.e., a text is regarded as a set of words.
→ Word order and syntactic relations between words in a sentence, intuitively important for the classification, are discarded.

9 Objective
We propose a method for extracting word order and syntactic relations as features. We use frequent sub-patterns in sentences as these features.

10 Table of Contents
1. Motivation
2. Our Approach
   - Overview
   - Word Sub-Sequence
   - Dependency Sub-Tree
   - Frequent Sub-Pattern
3. Experiments
4. Results and Discussion
5. Conclusion and Future Work

11 Overview
- We use a word sequence and a dependency tree as structured representations of a sentence
- We extract frequent sub-patterns from sentences as features for the classification

12 Word Sub-Sequence
A word sequence S
- Simply the sequence of words that represents a sentence
- Preserves word order within the sentence
A word sub-sequence S' of a word sequence S
- Obtained by removing zero or more words from the original sequence
- Preserves the word order of the original sentence
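The definition above can be made concrete by enumerating the sub-sequences of a short sentence; removing zero or more words yields every order-preserving subsequence. A minimal sketch on an invented example sentence:

```python
from itertools import combinations

def subsequences(words, max_len=None):
    """All non-empty, order-preserving word sub-sequences of `words`."""
    n = len(words)
    out = []
    for k in range(1, n + 1):
        if max_len and k > max_len:
            break
        for idx in combinations(range(n), k):
            out.append(tuple(words[i] for i in idx))
    return out

subs = subsequences("this film is good".split())
print(len(subs))                   # 2^4 - 1 = 15 non-empty sub-sequences
print(("film", "good") in subs)    # True: gaps allowed, order preserved
```

Note that the count grows exponentially with sentence length, which is why the method restricts attention to frequent sub-patterns (and to clauses rather than whole sentences).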

13 Dependency Sub-Tree
A dependency tree D
- Expresses dependencies between words in the sentence as child-parent relationships of nodes
- Preserves syntactic relations between words in the sentence
A dependency sub-tree D' of a dependency tree D
- Obtained by removing zero or more nodes from the original tree
- Preserves syntactic relations between words in the original sentence
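One simple way to make "contains a sub-tree" operational is an induced-sub-tree check, the class of ordered sub-trees that FREQT-style miners handle: every child-parent edge of the pattern must be a direct child-parent edge of the sentence tree, with children matched in order. This is a toy sketch under that assumption, not the authors' code; the tree for "this film is good" is hand-built.

```python
# Trees are (word, [children]) tuples.

def matches_at(pattern, node):
    """Does `pattern` match rooted at `node`, with children in order?"""
    pword, pkids = pattern
    nword, nkids = node
    if pword != nword:
        return False
    i = 0
    for pk in pkids:                 # each pattern child must match some
        while i < len(nkids) and not matches_at(pk, nkids[i]):
            i += 1                   # later child of `node`, in order
        if i == len(nkids):
            return False
        i += 1
    return True

def contains(tree, pattern):
    """Does `pattern` occur as a sub-tree anywhere in `tree`?"""
    if matches_at(pattern, tree):
        return True
    return any(contains(kid, pattern) for kid in tree[1])

# Hand-built dependency tree for "this film is good" (head word "is")
tree = ("is", [("film", [("this", [])]), ("good", [])])
print(contains(tree, ("is", [("film", []), ("good", [])])))  # True
print(contains(tree, ("is", [("good", []), ("film", [])])))  # False (order)
```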

14 Frequent Sub-Pattern
The number of all sub-patterns (sub-sequences or sub-trees) is too large
→ Use only frequent sub-patterns
Definitions
- A sentence contains a pattern if and only if the pattern is a sub-sequence or a sub-tree of the sentence
- The support of a pattern is the number of sentences in a dataset that contain the pattern
- A pattern is frequent if its support is at least a given support threshold (in these experiments, the threshold was fixed at 10)
As implementations for mining frequent sub-patterns, we use Kudo's PrefixSpan and FREQT.
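The support definition above can be sketched directly for sub-sequence patterns: count, per pattern, how many sentences contain it, and keep only patterns at or above the threshold. The experiments used a threshold of 10; this toy uses 2 and a hand-picked sentence set (a naive enumeration, not the PrefixSpan algorithm itself).

```python
from itertools import combinations

def is_subsequence(pattern, words):
    """True if `pattern` occurs in `words` with order preserved, gaps allowed."""
    it = iter(words)
    return all(w in it for w in pattern)

def frequent_patterns(sentences, threshold=2, max_len=2):
    sents = [s.split() for s in sentences]
    candidates = set()
    for words in sents:
        for k in range(1, max_len + 1):
            for idx in combinations(range(len(words)), k):
                candidates.add(tuple(words[i] for i in idx))
    # support = number of sentences containing the pattern
    supports = {p: sum(is_subsequence(p, s) for s in sents) for p in candidates}
    return {p: c for p, c in supports.items() if c >= threshold}

freq = frequent_patterns([
    "this film is good",
    "the film is great",
    "a good story",
])
print(freq[("film", "is")])   # contained in 2 sentences
```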

15 Table of Contents
1. Motivation
2. Our Approach
3. Experiments
   - Movie review dataset
   - Features
   - Classifiers and Tests
4. Results and Discussion
5. Conclusion and Future Work

16 Movie review dataset
Dataset 1: used in [Pang 02] and [Mullen 04]
- 690 positive and 690 negative reviews
- Written in English
- 3-fold cross-validation
Dataset 2: used in [Pang 04]
- 1000 positive and 1000 negative reviews
- Written in English
- 10-fold cross-validation

17 Features
We employ the following features and their combinations for the classification.
Bag-of-words features
- Unigrams (e.g. "good", "film"): uni. Only unigram patterns that appear in at least 2 distinct sentences are used.
- Bigrams (e.g. "very good", "film is"): bi. Only bigram patterns that appear in at least 2 distinct sentences are used.
Frequent sub-pattern features
- Word sub-sequences: seq
- Dependency sub-trees: dep
Features of lemmatized words
- Extracted in the same way as uni, bi, seq and dep, giving unil, bil, seql and depl
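The "appears in at least 2 distinct sentences" filter for the bag-of-words features can be sketched as follows; each sentence contributes each n-gram at most once, so the count is sentence frequency, not raw frequency. The sentences are invented for the example.

```python
from collections import Counter

def ngram_features(sentences, n, min_sentences=2):
    """Keep only n-grams that occur in at least `min_sentences` sentences."""
    seen = Counter()
    for s in sentences:
        words = s.lower().split()
        grams = {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
        seen.update(grams)            # a sentence counts each gram at most once
    return {g for g, c in seen.items() if c >= min_sentences}

sentences = ["this film is good", "the film is great", "is this film boring"]
uni = ngram_features(sentences, 1)
bi = ngram_features(sentences, 2)
print(("film",) in uni)        # "film" occurs in all 3 sentences -> kept
print(("film", "is") in bi)    # occurs in 2 distinct sentences -> kept
```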

18 Classifiers and Tests (1/2)
Classifier
- Method: SVMs, a binary classifier based on supervised learning
- Kernel function: linear kernel
- Performance depends closely on the learning parameter C (the soft-margin parameter)
→ We carry out three kinds of experiments

19 Classifiers and Tests (2/2)
Test 1: fix C at 1
- The result is used for comparison with the early studies
Test 2: best accuracy with C ∈ {e^-2.0, e^-1.5, ..., e^2.0}
- Observe the potential performance of the features
- Use the result to find the most effective combination of bag-of-words features
Test 3: predict a proper value of C from the training data
- Observe the practical performance of the features
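The C grid in Test 2 is 9 values evenly spaced by 0.5 in log space; written out explicitly:

```python
import math

# C in {e^-2.0, e^-1.5, ..., e^2.0}: exponents step by 0.5
C_grid = [math.exp(k * 0.5) for k in range(-4, 5)]
print(len(C_grid))                                  # 9 candidate values
print(round(C_grid[0], 4), round(C_grid[-1], 4))    # 0.1353 7.3891
```

In Test 2 the grid value with the best cross-validation accuracy is selected per feature set.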

20 Table of Contents
1. Motivation
2. Our Approach
3. Experiments
4. Results and Discussion
   - Results
   - Discussion
5. Conclusion and Future Work

21 Results (1/2)
Results for dataset 1
- vs. [Pang 02]: 82.9% → 87.3% (error reduction: 26%)
- vs. [Mullen 04]: 84.6% → 87.3% (error reduction: 18%)

22 Results (2/2)
Results for dataset 2
- vs. [Pang 04]: 87.1% → 92.9% (error reduction: 45%)

23 Discussion
- From the results of Test 1, our method proved to be effective
- Accuracy by features: bow + dep ≈ bow + dep + seq (93%) >> bow + seq (89%) > bow (87%)
- Lemmatized features are not always more effective than the original ones

24 Table of Contents
1. Motivation
2. Our Approach
3. Experiments
4. Results and Discussion
5. Conclusion and Future Work
   - Conclusion
   - Future Work

25 Conclusion
- We proposed a method for incorporating word order and syntactic relations between words in a sentence into document sentiment classification, using frequent word sub-sequences and dependency sub-trees as features.
- Experimental results on movie review datasets show that our classifiers obtained the best results yet published on these datasets.

26 Future Work (1/2)
Negative/interrogative sentences
(1) Affirmative sentence: This film is good.
(2) Negative sentence: This film is not good.
(3) Interrogative sentence: Is this film good?
All sub-patterns in sentence (1) are also contained in sentence (2). Similarly, there is a large overlap of patterns between (1) and (3). Distinguishing these sentence types would address this problem.
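The overlap claim for sentences (1) and (2) can be verified mechanically: since (1) is itself a word sub-sequence of (2), every sub-pattern of (1) is also a sub-pattern of (2).

```python
from itertools import combinations

def subsequences(words):
    """The set of all non-empty word sub-sequences."""
    return {tuple(words[i] for i in idx)
            for k in range(1, len(words) + 1)
            for idx in combinations(range(len(words)), k)}

affirmative = subsequences("this film is good".split())
negative = subsequences("this film is not good".split())
print(affirmative <= negative)   # True: sub-patterns alone cannot separate them
```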

27 Future Work (2/2)
Incorporating discourse structure in a document
Example (positive movie review): "The scenario is simplistic. But I love this film."
From the word "but", we would know that "I love this film" is a more important sentence than "The scenario is simplistic" for sentiment classification.

28 Thank you

29 Examples of Weighted Patterns
- A positive (+) weight indicates positive sentiment polarity
- A negative (-) weight indicates negative sentiment polarity
- The absolute value of each weight indicates how large the feature's contribution is

30 A Word Sequence = A Clause
- Sentences are too long to be used for mining frequent sub-sequences
- Instead of sentences, we used clauses of sentences as word sequences
- We split a sentence into a main clause and subordinate clauses using parse-tree information (illustrated by a figure in the original slide)
- In addition, we removed stopwords: conjunctions, prepositions, numbers, etc.
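The stopword-removal step can be sketched as a simple token filter; the stopword list here is illustrative, not the list actually used in the experiments.

```python
# Illustrative stopword list (conjunctions, prepositions, articles)
STOPWORDS = {"and", "but", "or", "of", "in", "on", "to", "a", "the"}

def preprocess(clause):
    """Lowercase, tokenize, and drop stopwords and numbers before mining."""
    return [w for w in clause.lower().split()
            if w not in STOPWORDS and not w.isdigit()]

print(preprocess("But I love this film"))   # ['i', 'love', 'this', 'film']
```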

31 References
- [Pang 02] Bo Pang, Lillian Lee and Shivakumar Vaithyanathan. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP 2002.
- [Pang 04] Bo Pang and Lillian Lee. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. ACL 2004.
- [Mullen 04] Tony Mullen and Nigel Collier. Sentiment Analysis using Support Vector Machines with Diverse Information Sources. EMNLP 2004.
