Recommend for a Reason: Unlocking the Power of Unsupervised Aspect-Sentiment Co-Extraction
Zeyu Li^1, Wei Cheng^2, Reema Kshetramade^1, John Houser^1, Haifeng Chen^2, and Wei Wang^1

^1 Department of Computer Science, University of California, Los Angeles
^2 NEC Labs America

{zyli,weiwang}@cs.ucla.edu, {reemakshe,johnhouser}@ucla.edu, {weicheng,haifeng}@nec-labs.com
arXiv:2109.03821v1 [cs.IR] 7 Sep 2021

Abstract

Compliments and concerns in reviews are valuable for understanding users' shopping interests and their opinions with respect to specific aspects of certain items. Existing review-based recommenders favor large and complex language encoders that can only learn latent and uninterpretable text representations. They lack explicit user-attention and item-property modeling, which could provide valuable information beyond the ability to recommend items. Therefore, we propose a tightly coupled two-stage approach consisting of an Aspect-Sentiment Pair Extractor (ASPE) and an Attention-Property-aware Rating Estimator (APRE). The unsupervised ASPE mines Aspect-Sentiment pairs (AS-pairs), and APRE predicts ratings using AS-pairs as concrete aspect-level evidence. Extensive experiments on seven real-world Amazon Review Datasets demonstrate that ASPE effectively extracts AS-pairs, which enables APRE to deliver superior accuracy over the leading baselines.

1 Introduction

Reviews and ratings are valuable assets for the recommender systems of e-commerce websites since they directly describe users' subjective feelings about their purchases. Learning user preferences from such feedback is straightforward and efficacious. Previous research on review-based recommendation has been fruitful (Chin et al., 2018; Chen et al., 2018; Bauman et al., 2017; Liu et al., 2019). Cutting-edge natural language processing (NLP) techniques are applied to extract latent user sentiments, item properties, and the complicated interactions between the two.

However, existing approaches leave room for improvement. Firstly, they dismiss the phenomenon that users may hold different attentions toward various properties of the merchandise. An item property is the combination of an aspect of the item and the characteristic associated with it. Users may show strong attention to certain properties but indifference to others. The attended advantageous or disadvantageous properties can dominate a user's attitude and, consequently, decide their generosity in rating.

Table 1 exemplifies the impact of user attitude using three real reviews of a headset. Three aspects are covered: microphone quality, comfortableness, and sound quality. The microphone quality is controversial: R2 and R3 criticize it but R1 praises it. The sole disagreement between R1 and R2, on the microphone, which is the major concern of R2, results in divergent ratings (5 stars vs. 3 stars). In contrast, R3 neglects that disadvantage and grades highly (5 stars) for the superior comfortableness conveyed by the metaphor of a "pillow".

Secondly, understanding user motivations at the level of granular item properties provides valuable information beyond the ability to recommend items. This requires aspect-based NLP techniques to extract explicit and definitive aspects. However, existing aspect-based models mainly use latent or implicit aspects (Chin et al., 2018) whose real semantics are unjustifiable. Similar to Latent Dirichlet Allocation (LDA, Blei et al., 2003), the semantics of the derived aspects (topics) mutually overlap (Huang et al., 2020b). These models undermine aspect distinctiveness and lead to uninterpretable and sometimes counterintuitive results. The root of the problem is the lack of large review corpora with aspect and sentiment annotations: the existing ones are either too small or too domain-specific (Wang and Pan, 2018) to be applied to general use cases. Progress on sentiment term extraction (Dai and Song, 2019; Tian et al., 2020; Chen et al., 2020a) takes advantage of neural networks and linguistic knowledge and makes it partially possible to use unsupervised term annotation to tackle this lack-of-large-corpus issue.

In this paper, we seek to understand how reviews and ratings are affected by users' perception
Reviews | Microphone | Comfort | Sound
R1 [5 stars]: Comfortable. Very high quality sound. ... Mic is good too. There is an switch to mute your mic. ... I wear glasses and these are comfortable with my glasses on. ... | good (satisfied) | comfortable | high quality (praising)
R2 [3 stars]: I love the comfort, sound, and style but the mic is complete junk! | complete junk (angry) | love | love
R3 [5 stars]: ... But this one feels like a pillow, there's nothing wrong with the audio and it does the job. ... con is that the included microphone is pretty bad. | pretty bad (unsatisfied) | like a pillow (enjoyable) | nothing wrong

Table 1: Example reviews of a headset with three aspects, namely microphone quality, comfort level, and sound quality, highlighted specifically. The extracted sentiments are on the right. R1 vs. R2: different users react differently (microphone quality) to the same item due to distinct personal attentions and, consequently, give divergent ratings. R1 vs. R3: a user can still rate an item highly due to special attention on particular aspects (comfort level) regardless of certain unsatisfactory or indifferent properties (microphone and sound qualities).
of item properties in a fine-grained way and discuss how to utilize these findings transparently and effectively in rating prediction. We propose a two-stage recommender with an unsupervised Aspect-Sentiment Pair Extractor (ASPE) and an Attention-Property-aware Rating Estimator (APRE). ASPE extracts (aspect, sentiment) pairs (AS-pairs) from reviews. The pairs are fed into APRE as explicit carriers of user attention and item properties, indicating both the frequencies and the sentiments of aspect mentions. APRE encodes the text with a contextualized encoder and processes the implicit text features and the annotated AS-pairs with a dual-channel rating regressor. Together, ASPE and APRE extract explicit aspect-based attentions and properties and solve rating prediction with strong performance.

Aspect-level user attitude differs from user preference. User attitudes, produced by the interactions between user attentions and item properties, are sophisticated, granular sentiments and rationales for interpretation (see Sections 4.4 and A.3.5). Preferences, on the contrary, are coarse sentiments such as like, dislike, or neutral. Preference-based models may infer that R1 and R3 are written by headset lovers because of the high ratings. Attitude-based methods, instead, further understand that it is the comfortableness that matters to R3 rather than the item being a headset. Aspect-level attitude modeling is more accurate, informative, and personalized than preference modeling.

Note. Due to page limits, some supportive materials, marked by "†", are presented in the Supplementary Materials. We strongly recommend readers check out these materials. The source code of our work is available on GitHub at https://github.com/zyli93/ASPE-APRE.

2 Related Work

Our work is related to four lines of literature located at the intersection of ABSA and Recommender Systems.

2.1 Aspect-based Sentiment Analysis

Aspect-based sentiment analysis (ABSA) (Xu et al., 2020; Wang et al., 2018) predicts sentiments toward aspects mentioned in text. Natural language is modeled by graphs such as Pointwise Mutual Information (PMI) graphs and dependency graphs (Zhang et al., 2019; Wang et al., 2020). Phan and Ogunbona (2020) and Tang et al. (2020) utilize contextualized language encoding to capture the context of aspect terms. Chen et al. (2020b) focuses on the consistency of the emotion surrounding the aspects, and Du et al. (2020) equips pre-trained BERT with domain-awareness of sentiments. Our work is informed by this progress and utilizes PMI, dependency trees, and BERT for syntax feature extraction and language encoding.

2.2 Aspect or Sentiment Terms Extraction

Aspect and sentiment term extraction is a presupposition of ABSA. However, manually annotating training data, which requires the hard labor of experts, is only feasible on small datasets in particular domains such as Laptop and Restaurant (Pontiki et al., 2014, 2015), which are overused in ABSA. Recently, RINANTE (Dai and Song, 2019) and SDRN (Chen et al., 2020a) automatically extract both kinds of terms using rule-guided data augmentation and double-channel opinion-relation co-extraction, respectively. However, these supervised approaches are too domain-specific to generalize to out-of-domain or open-domain corpora. Conducting domain adaptation from small labeled corpora to unlabeled open corpora only produces suboptimal results (Wang and Pan, 2018). SKEP (Tian et al., 2020) exploits an unsupervised PMI+seed strategy to coarsely label sentimentally polarized tokens as sentiment terms, showing that the unsupervised method is advantageous when annotated corpora are insufficient in the domain of interest.

Compared to the above models, our ASPE has two merits: it is (1) unsupervised and hence free from expensive data labeling, and (2) generalizable to different domains by combining three different labeling methods.

[Figure 1: Pipeline of ASPE. Review text feeds three sentiment-term labelers (Neural Net → ST_NN, PMI → ST_PMI, Lexicon → ST_Lex), whose outputs are merged into the sentiment term set ST; dependency parsing with dependency-based rules (amod, nsubj+acomp) produces AS-pair candidates, which are filtered and merged into the final AS-pair extractions.]
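The filtering-and-merging step of this pipeline can be sketched in a few lines of Python. This is our illustrative skeleton, not the released implementation; the tiny hand-written term sets below stand in for the trained labelers described in Section 3.2.

```python
def build_sentiment_terms(st_pmi, st_nn, st_lex):
    """Union of the three labelers' outputs: ST = ST_PMI | ST_NN | ST_Lex."""
    return set(st_pmi) | set(st_nn) | set(st_lex)

def filter_and_merge(candidates, st):
    """Keep dependency-rule candidates whose sentiment term is in ST,
    merging duplicate (aspect, sentiment) pairs."""
    pairs, seen = [], set()
    for aspect, sentiment in candidates:
        key = (aspect.lower(), sentiment.lower())
        if sentiment.lower() in st and key not in seen:
            seen.add(key)
            pairs.append((aspect, sentiment))
    return pairs

# Toy run: "one" is not a sentiment term, so (headset, one) is dropped,
# and the duplicated (sound, amazing) candidate is merged.
st = build_sentiment_terms({"amazing"}, {"great", "good"}, {"excellent"})
extracted = filter_and_merge(
    [("sound", "amazing"), ("headset", "one"), ("sound", "amazing")], st)
```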
2.3 Aspect-based Recommendation

Aspect-based recommendation is a relevant task with a major difference: specific terms indicating sentiments are not extracted; only the aspects are needed (Hou et al., 2019; Guan et al., 2019; Huang et al., 2020a; Chin et al., 2018). Some disadvantages are as follows. Firstly, the aspect extraction tools are usually outdated and inaccurate, such as LDA (Hou et al., 2019), TF-IDF (Guan et al., 2019), and word embedding-based similarity (Huang et al., 2020a). Secondly, the representation of sentiment is scalar-based, which is coarser than the embedding-based representation used in our work.

2.4 Rating Prediction

Rating prediction is an important task in recommendation. Related approaches utilize text mining algorithms to build user and item representations and predict ratings (Kim et al., 2016; Zheng et al., 2017; Chen et al., 2018; Chin et al., 2018; Liu et al., 2019; Bauman et al., 2017). However, the text features learned are latent and unable to provide explicit hints for explaining user interests.

3 ASPE and APRE

3.1 Problem Formulation

Review-based rating prediction involves two major entities: users and items. A user u writes a review r_{u,t} for an item t and rates a score s_{u,t}. Let R_u denote all reviews written by u and R_t denote all reviews received by t. A rating regressor takes in a review-and-rate event (u, t) together with the review sets R_u and R_t to estimate the rating score s_{u,t}.

3.2 Unsupervised ASPE

We combine three separate methods to label AS-pairs without the need for supervision, namely PMI-based, neural network-based (NN-based), and language knowledge- or lexicon-based methods. The framework is visualized in Figure 1.

3.2.1 Sentiment Terms Extraction

PMI-based method  Pointwise Mutual Information (PMI) originates from information theory and has been adapted to NLP (Zhang et al., 2019; Tian et al., 2020) to measure statistical word associations in corpora. It determines the sentiment polarities of words using a small number of carefully selected positive and negative seeds (s^+ and s^-) (Tian et al., 2020). It first extracts candidate sentiment terms satisfying the part-of-speech patterns of Turney (2002) and then measures the polarity of each candidate term w by

    Pol(w) = \sum_{s^+} \mathrm{PMI}(w, s^+) - \sum_{s^-} \mathrm{PMI}(w, s^-).   (1)

Given a sliding window-based context sampler ctx, the PMI(·, ·) between two words is defined by

    \mathrm{PMI}(w_1, w_2) = \log \frac{p(w_1, w_2)}{p(w_1)\, p(w_2)},   (2)

where p(·), the probability estimated by token counts, is defined by p(w_1, w_2) = |{ctx | w_1, w_2 ∈ ctx}| / (total #ctx) and p(w_1) = |{ctx | w_1 ∈ ctx}| / (total #ctx). Afterward, we collect the top-q sentiment tokens with the strongest polarities, both positive and negative, as ST_PMI.

NN-based method  As discussed in Section 2, co-extraction models (Dai and Song, 2019) can accurately label AS-pairs only in the training domain. For sentiment terms with consistent semantics across domains, such as "good" and "great", NN methods can still provide robust extraction recall. In this work, we take a pretrained SDRN (Chen et al., 2020a) as the NN-based method to generate ST_NN. The pretrained SDRN is considered an off-the-shelf tool, similar to the pretrained BERT, which is irrelevant to our rating prediction data. Therefore, we argue that ASPE is unsupervised for open-domain rating prediction.
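The PMI-based scoring of Eqs. (1)–(2) can be made concrete with a short counting sketch. This is our illustration under simplifying assumptions (whole contexts as windows, no smoothing, zero PMI for unseen co-occurrences), not the authors' implementation:

```python
import math
from collections import Counter
from itertools import combinations

def pmi_polarities(contexts, pos_seeds, neg_seeds):
    """Score candidate terms by Eq. (1): the sum of PMI with positive seeds
    minus the sum of PMI with negative seeds, over sliding-window contexts."""
    n = len(contexts)
    single = Counter()  # |{ctx | w in ctx}|
    joint = Counter()   # |{ctx | w1, w2 in ctx}|
    for ctx in contexts:
        words = set(ctx)
        single.update(words)
        joint.update(frozenset(p) for p in combinations(sorted(words), 2))

    def pmi(w1, w2):
        # Eq. (2): log p(w1, w2) / (p(w1) p(w2)); no co-occurrence -> 0
        c12 = joint[frozenset((w1, w2))]
        if c12 == 0:
            return 0.0
        return math.log((c12 / n) / ((single[w1] / n) * (single[w2] / n)))

    seeds = set(pos_seeds) | set(neg_seeds)
    return {w: sum(pmi(w, s) for s in pos_seeds)
               - sum(pmi(w, s) for s in neg_seeds)
            for w in single if w not in seeds}

# Terms co-occurring with "good" score positive; with "bad", negative.
scores = pmi_polarities(
    [["amazing", "good"], ["amazing", "good"], ["junk", "bad"], ["junk", "bad"]],
    pos_seeds=["good"], neg_seeds=["bad"])
```

Collecting the top-q terms on each end of the resulting score distribution yields ST_PMI.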
Knowledge-based method  PMI- and NN-based methods have shortcomings: the PMI-based method depends on the seed selection, and the accuracy of the NN-based method deteriorates when the applied domain is distant from its training data. As compensation, we integrate a sentiment lexicon ST_Lex summarized by linguists, since expert knowledge is widely used in unsupervised learning. Examples of linguistic lexicons include SentiWordNet (Baccianella et al., 2010) and Opinion Lexicon (Hu and Liu, 2004); the latter is used in this work.

Building sentiment term set  The three sentiment term subsets are joined to build the overall sentiment set used in AS-pair generation: ST = ST_PMI ∪ ST_NN ∪ ST_Lex. The three sets compensate for each other's discrepancies and expand the coverage of terms, as shown in Table 10†.

[Figure 2: Two dependency-based rules for AS-pair candidate extraction, with the effective dependency relations and the aspect and sentiment candidates highlighted. amod rule: "Amazing sound and quality, all in one headset." yields the candidates (sound, amazing) and (quality, amazing). nsubj+acomp rule: "Sound quality is superior and comfort is excellent." yields (Sound quality, superior) and (comfort, excellent).]
3.2.2 Syntactic AS-pairs Extraction

To extract AS-pairs, we first label AS-pair candidates using dependency parsing and then filter out non-sentiment-carrying candidates using ST.¹ Dependency parsing extracts the syntactic relations between words. Some nouns are considered potential aspects and are modified by adjectives through two types of dependency relations shown in Figure 2: amod and nsubj+acomp. The pairs of nouns and their modifying adjectives compose the AS-pair candidates. Similar techniques are widely used in unsupervised aspect extraction models (Tulkens and van Cranenburgh, 2020; Dai and Song, 2019). The AS-pair candidates are noisy since not all adjectives in them bear sentiment inclination. ST is therefore used to filter out non-sentiment-carrying candidates whose adjective is not in ST; the remaining candidates form the AS-pair set.

Admittedly, the dependency-based extraction of (noun, adj.) pairs is suboptimal and causes missing aspect or sentiment terms; an implicit module is designed to remedy this issue. Open-domain AS-pair co-extraction is blocked by the lack of public labeled data and is left for future work.
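The two rules of Figure 2 can be sketched as follows. This is our illustration; the paper does not commit to a specific parser, so we assume spaCy-style dependency annotations given here as plain dicts, and the item-pronoun list is hypothetical.

```python
ITEM_PRONOUNS = {"it", "they", "this", "one"}  # assumed list for ItemTok

def extract_candidates(tokens):
    """AS-pair candidates via the two rules of Figure 2. Each token is
    {"text", "pos", "dep", "head"}, with "head" the index of its syntactic
    head, as produced by a dependency parser such as spaCy."""
    pairs = []
    for tok in tokens:
        # Rule 1 (amod): "Amazing sound and quality" -> (sound, Amazing),
        # propagated to nouns conjoined with the modified head (quality).
        if tok["dep"] == "amod" and tokens[tok["head"]]["pos"] == "NOUN":
            heads = [tok["head"]] + [
                j for j, t in enumerate(tokens)
                if t["dep"] == "conj" and t["head"] == tok["head"]]
            pairs += [(tokens[h]["text"], tok["text"]) for h in heads]
        # Rule 2 (nsubj+acomp): "comfort is excellent" -> (comfort, excellent);
        # a pronoun subject ("it is excellent") becomes the ItemTok aspect.
        if tok["dep"] == "acomp":
            for sub in tokens:
                if sub["dep"] == "nsubj" and sub["head"] == tok["head"]:
                    aspect = ("ItemTok" if sub["text"].lower() in ITEM_PRONOUNS
                              else sub["text"])
                    pairs.append((aspect, tok["text"]))
    return pairs
```

Candidates whose adjective is absent from ST are then discarded, as described above.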
We introduce ItemTok as a special aspect token for the nsubj+acomp rule where the nsubj is a pronoun of the item, such as "it" and "they". Infrequent aspect terms with fewer than c occurrences are ignored to reduce sparsity. We use WordNet synsets (Miller, 1995) to merge synonym aspects; the aspect with the most synonyms is selected as the representative of that aspect set.

¹ Section A.2.1† explains this procedure in detail with the pseudocode of Algorithm 1†.

Discussion  ASPE differs from Aspect Extraction (AE) (Tulkens and van Cranenburgh, 2020; Luo et al., 2019; Wei et al., 2020; Ma et al., 2019; Angelidis and Lapata, 2018; Xu et al., 2018; Shu et al., 2017; He et al., 2017a), which extracts aspects only and infers sentiment polarities in {pos, neg, (neu)}. AS-pair co-extraction, in contrast, offers more diversified emotional signals than the bipolar sentiment measurement of AE.

3.3 APRE

APRE, depicted in Figure 3, predicts ratings given the reviews and their corresponding AS-pairs. It first encodes the language into embeddings, then learns explicit and implicit features, and finally computes the score regression. One distinctive feature of APRE is that it explicitly models aspect information by incorporating a d_a-dimensional aspect representation a_i ∈ R^{d_a} in each side of the substructures for review encoding. Let A^(u) = {a_1^(u), ..., a_k^(u)} denote the k aspect embeddings for users and A^(t) those for items; k is decided by the number of unique aspects in the AS-pair set.

Language encoding  The reviews are encoded into low-dimensional token embedding sequences by a fixed pre-trained BERT (Devlin et al., 2019), a powerful transformer-based contextualized language encoder. For each review r in R_u or R_t, the resulting encoding H_0 ∈ R^{(|r|+2)×d_e} consists of (|r|+2) d_e-dimensional contextualized vectors:
IMPLICIT
Reviews Encoder
EXPLICIT
can nicely handle both sentiments and frequencies
vt Gt
AAAC63icbVJNb9MwGHbC1whfHRy5RFRFrQRVgpjGcWMHdgCpCLpNakr0xnUaa44d2U61yvJf4MIBhLjyh7jxb3Daoq3tXsn24+d5/PrjdVYxqnQU/fX8Gzdv3b6zcze4d//Bw0et3ccnStQSkyEWTMizDBRhlJOhppqRs0oSKDNGTrPzo0Y/nRGpqOCf9bwi4xKmnOYUg3ZUuut5nUSTC20WfZabY2u/xEEnyQSbqHnpBlM4Jl0azOjo/aextVsG04WeTU39QjbaWsZD28h1rxGAVQVsmDOi4XJyNev/TTHn9jKtKoEx8wEukueHs+nLgRDM2u41t+it55vZtA7Wbe+sTXXaakf9aBHhNohXoI1WMUhbf5KJwHVJuMYMlBrFUaXHBqSmmBEbJLUiFeBzmJKRgxxKosZmUSsbdhwzCXMhXeM6XLBXVxgoVXNg5yxBF2pTa8jrtFGt8zdjQ3lVa8LxcqO8ZqEWYVP4cEIlwZrNHQAsqTtriAuQgLX7HoF7hHjzytvg5FU/3utHH1+3D96unmMHPUXPUBfFaB8doGM0QEOEvcL76n33fvil/83/6f9aWn1vteYJWgv/9z+otO+D
AAAC63icbVJNb9MwGHbC1whfHRy5RFRFrQRVgpjGcWMHdgCpCLpNakr0xnUaa44d2U61yvJf4MIBhLjyh7jxb3DaonXtXsn24+d5/Pr1R1YxqnQU/fX8Gzdv3b6zcze4d//Bw0et3ccnStQSkyEWTMizDBRhlJOhppqRs0oSKDNGTrPzo0Y/nRGpqOCf9bwi4xKmnOYUg3ZUuut5nUSTC20WfZabY2u/xEEnyQSbqHnpBlM4Jl0azOjo/aextVsG04WeTU39QjbalYyHtpHrXiMAqwrYMGdEw+VkPev/TTHn9jKtKoEx8wEukueHs+nLgRDM2u41p+gF6+lmNtWbpb2zjkxb7agfLSLcBvEKtNEqBmnrTzIRuC4J15iBUqM4qvTYgNQUM2KDpFakAnwOUzJykENJ1Ngs3sqGHcdMwlxI17gOF+z6CgOlaip2zhJ0oTa1hrxOG9U6fzM2lFe1JhwvN8prFmoRNg8fTqgkWLO5A4AldbWGuAAJWLvvEbhLiDePvA1OXvXjvX708XX74O3qOnbQU/QMdVGM9tEBOkYDNETYK7yv3nfvh1/63/yf/q+l1fdWa56gK+H//gerhe+C
of term mentions. Aspects that are not mentioned
rating pred. Fim (·) ŝu,t rating pred. Fex (·)
AAAB9XicbVBNS8NAEJ34WetX1aOXxSJ4kJKIoseiF48V7Ae0tWy2m3bpZhN2J0oJ+R9ePCji1f/izX/jts1BWx8MPN6bYWaeH0th0HW/naXlldW19cJGcXNre2e3tLffMFGiGa+zSEa65VPDpVC8jgIlb8Wa09CXvOmPbiZ+85FrIyJ1j+OYd0M6UCIQjKKVHjpDiqnJemlySjDrlcpuxZ2CLBIvJ2XIUeuVvjr9iCUhV8gkNabtuTF2U6pRMMmzYicxPKZsRAe8bamiITfddHp1Ro6t0idBpG0pJFP190RKQ2PGoW87Q4pDM+9NxP+8doLBVTcVKk6QKzZbFCSSYEQmEZC+0JyhHFtCmRb2VsKGVFOGNqiiDcGbf3mRNM4q3kXFvTsvV6/zOApwCEdwAh5cQhVuoQZ1YKDhGV7hzXlyXpx352PWuuTkMwfwB87nD6Xzkpo=
AAADUnicbVJbb9MwFHZaLiMM1sEjLxFVUStBlSAQPG5Mgj1sUhF0m9SE6MR1GmtOHMVO1cryb0RCvPBDeOEBcHrR1qZHinP8fd+5WSfKGRXSdX9Zjeadu/fu7z2wH+4/enzQOnxyIXhZYDLEnPHiKgJBGM3IUFLJyFVeEEgjRi6j65OKv5ySQlCefZXznAQpTDIaUwzSQOGhlXR8SWZSLc4oVqdaf/Psjh9xNhbz1PxUYpBwKVCjk7MvgdY1gepCT4eqfFlU3EbGY13RZa8igOUJbIkjIuHmcjvruijOMn2TVqTAmDqHmf/ieDp5NeCcad3dMUVvM99Uh3K7t096B1gFu7XYdYcpyAQDUx91uAyjqamOx1z27DpJZmsybLXdvrswp+54K6eNVjYIWz/8McdlSjKJGQgx8txcBgoKSTEj2vZLQXLA1zAhI+NmkBIRqMVKaKdjkLET88J8mXQW6O0IBamoZjPKqmexzVXgLm5Uyvh9oGiWl5JkeFkoLpkjuVPtlzOmBcGSzY0DuKCmVwcnUACWZgtt8wje9sh15+J133vbdz+/aR99WD3HHnqGnqMu8tA7dIRO0QANEba+W7+tv9a/xs/Gn6bVbC6lDWsV8xRtWHP/P6YwF8o=
AAADNHicbVJbi9NAFJ7E21pvXX30JVgqLWhJRNHHXRd0EYWKdnehieFkMmmHncyEzKRsGeZH+eIP8UUEHxTx1d/gpKns9nIgycn3feecby5JwahUvv/dcS9dvnL12s711o2bt27fae/ePZKiKjEZYcFEeZKAJIxyMlJUMXJSlATyhJHj5PSg5o9npJRU8I9qXpAohwmnGcWgLBTvOm+6oSJnSi/eSaYPjfkUtLphIlgq57n96KlF4kagxwdvP0TGbAh0D/om1tWjsuZWOu6bmq76NQGsmMKaOCEKzn8udv0/FHNuztvKHBjT7+AsfLg/mzweCsGM6W1ZRX+138zEat3ba7MFrIv9jdrGYZiDmmJg+pWJmyqa2+E4Faoftzv+wF+Et5kEy6SDljGM21/DVOAqJ1xhBlKOA79QkYZSUcyIHVZJUgA+hQkZ25RDTmSkF4duvK5FUi8TpX248hboxQoNuazdW2XtWa5zNbiNG1cqexFpyotKEY6bQVnFPCW8+gZ5KS0JVmxuE8AltV49PIUSsLL3rGU3IVhf8mZy9GQQPBv475929l4ut2MH3UcPUA8F6DnaQ4doiEYIO5+db85P55f7xf3h/nb/NFLXWdbcQyvh/v0HzqwOTA==
(a)
vu
AAAC1XicbVLLjtMwFHXCawivAks2EVVRK0GVIBAsZ5jNLEAqj05HakJ04zqtNY4d2U7VyvIChNjyb+z4Bz4Cpy2aTjtXsn18zvH14zqvGFU6iv54/rXrN27eOrgd3Ll77/6D1sNHp0rUEpMhFkzIsxwUYZSToaaakbNKEihzRkb5+XGjj+ZEKir4F72sSFrClNOCYtCOylp/O4kmC21WfV6YE2u/xkEnyQWbqGXpBjNzTLY2mPHx+8+ptXsG04WezUz9XDbapYxHtpHrXiMAq2awY86JhovJdtb/m2LO7UVaVQJj5gMskmdH8+mLgRDM2u4Vt+gF2+nmNquzVjvqR6sI90G8AW20iUHW+p1MBK5LwjVmoNQ4jiqdGpCaYkZskNSKVIDPYUrGDnIoiUrNqio27DhmEhZCusZ1uGK3VxgoVXM45yxBz9Su1pBXaeNaF29TQ3lVa8LxeqOiZqEWYVPicEIlwZotHQAsqTtriGcgAWv3EQL3CPHulffB6ct+/LoffXzVPny3eY4D9AQ9RV0UozfoEJ2gARoi7H3yFt4377s/8q3/w/+5tvreZs1jdCn8X/8AmdHnYw==
IMPLICIT EXPLICIT # of reviews
by r will have hu,r = 0. To completely picture
Gu
AAAC63icbVJNb9MwGHbC1whfHRy5RFRFrQRVgpjGcWMHdgCpCLpNakr0xnUaa44d2U61yvJf4MIBhLjyh7jxb3Daoq3tXsn24+d5/PrjdVYxqnQU/fX8Gzdv3b6zcze4d//Bw0et3ccnStQSkyEWTMizDBRhlJOhppqRs0oSKDNGTrPzo0Y/nRGpqOCf9bwi4xKmnOYUg3ZUuut5nUSTC20WfZabY2u/xEEnyQSbqHnpBlM4Jl0azOjo/aextVsG04WeTU39QjbaWsZD28h1rxGAVQVsmDOi4XJyNev/TTHn9jKtKoEx8wEukueHs+nLgRDM2u41t+it55vZtA7Wbe+s49JWO+pHiwi3QbwCbbSKQdr6k0wErkvCNWag1CiOKj02IDXFjNggqRWpAJ/DlIwc5FASNTaLWtmw45hJmAvpGtfhgr26wkCpmgM7Zwm6UJtaQ16njWqdvxkbyqtaE46XG+U1C7UIm8KHEyoJ1mzuAGBJ3VlDXIAErN33CNwjxJtX3gYnr/rxXj/6+Lp98Hb1HDvoKXqGuihG++gAHaMBGiLsFd5X77v3wy/9b/5P/9fS6nurNU/QWvi//wGqOO+E
# of aspects
user u’s attentions to all aspects, we aggregate all
AAACYXicbVFNSwMxEM2uX7V+rfXoZbEUFKTsiqLHqpceFawKbV1m02wbmv0gmZWWsH/Smxcv/hGzbQ/VOpDk8d6bTGYSZoIr9LxPy15b39jcqmxXd3b39g+cw9qzSnNJWYemIpWvISgmeMI6yFGw10wyiEPBXsLxfam/vDOpeJo84TRj/RiGCY84BTRU4EwaPWQT1LM9jHS7KN78aqMXpmKgprE59Mgwwdygi2JF06dwVgQ6P5el9uuy26KU87Oi2gORjWDZGzh1r+nNwl0F/gLUySIeAuejN0hpHrMEqQClur6XYV+DRE4FMyVyxTKgYxiyroEJxEz19WxChdswzMCNUmlWgu6MXc7QEKuyJeOMAUfqr1aS/2ndHKObvuZJliNL6LxQlAsXU7cctzvgklEUUwOASm7e6tIRSKBoPqVqhuD/bXkVPF80/aum93hZb90txlEhx+SEnBKfXJMWaZMH0iGUfFnr1p61b33b27Zj1+ZW21rkHJFfYR//AMkkuHU=
(a)
weighted REVIEW-WISE weighted ↵u,r
avg. pool avg. pool
AAACc3icbVFNTxsxEPVuaUtDaUMrceFQizQSSBDtIlB75OPCoQdQCSAl29WsM0ssvB+yZ6tG1v6B/jxu/Asuvdeb5NCQjmT76b03Y884KZU0FASPnv9i5eWr16tvWmtv19+9b298uDZFpQX2RaEKfZuAQSVz7JMkhbelRsgShTfJ/Vmj3/xEbWSRX9GkxCiDu1ymUgA5Km7/7g4Jf5Gd7klqz+v6R9jqDpNCjcwkc4cdOyaeGezg7Nv3qK6XDHYHduvYVnu60RYqntSNXO02AqhyDAvmYYIEMxy3O0EvmAZfBuEcdNg8LuL2w3BUiCrDnIQCYwZhUFJkQZMUCl3tymAJ4h7ucOBgDhmayE5nVvOuY0Y8LbRbOfEp+2+Ghcw0/TlnBjQ2z7WG/J82qCj9GlmZlxVhLmYXpZXiVPDmA/hIahSkJg6A0NK9lYsxaBDkvqnlhhA+b3kZXB/0wqNecHnYOT6dj2OVbbFttsNC9oUds3N2wfpMsCdv0/vkce+Pv+Vv+59nVt+b53xkC+Hv/wUisr9U
u,r
AGGREGATION
reviews from u, i.e. Ru , using review-wise ag-
review-wise agg review-wise agg (a)
gregation weighted by αu,r given in the equation
v u,r (a)
below. αu,r indicates the significance of each re-
AAADF3icbVLLjtMwFHXCayiP6cCSTURV1EpMlSAQLGeYBbMAqQg6M1JTohvXaa1x7Mh2qqks/wUbfoUNCxBiCzv+BqctmmnaK8U5Pufc6+tHWjCqdBj+9fxr12/cvLVzu3Hn7r37u829BydKlBKTARZMyLMUFGGUk4GmmpGzQhLIU0ZO0/OjSj+dEamo4B/1vCCjHCacZhSDdlSy5+23Y00utFmMaWaOrf0UNdpxKthYzXP3M1PHJEuDGR69/TCydsNgOtC1iSmfykpbq3hoK7nsVgKwYgo1c0o0XE6uVv2/KObcXpZVOTBm3sFF/ORwNtnvC8Gs7WzZRXe93swmut7bG7uFrJLDRi112WDSbIW9cBHBJohWoIVW0U+af+KxwGVOuMYMlBpGYaFHBqSmmBHbiEtFCsDnMCFDBznkRI3M4l5t0HbMOMiEdB/XwYK9mmEgV1WHzpmDnqq6VpHbtGGps1cjQ3lRasLxcqGsZIEWQfVIgjGVBGs2dwCwpK7XAE9BAtbuKVWHENW3vAlOnvWiF73w/fPWwevVceygR+gx6qAIvUQH6Bj10QBh77P31fvu/fC/+N/8n/6vpdX3VjkP0Vr4v/8BviEB7A==
ASPECT-WISE
AAACNHicbVBLSwMxEM7WV62vqkcvi6VQQcquKHoseil4qWAf0NYlm2bb0OyDZFYsYX+UF3+IFxE8KOLV32C23YO2DiTz8X0zzMznRpxJsKxXI7e0vLK6ll8vbGxube8Ud/daMowFoU0S8lB0XCwpZwFtAgNOO5Gg2Hc5bbvjq1Rv31MhWRjcwiSifR8PA+YxgkFTTvG63AP6AGr6u56qJ8mdXSj33JAP5MTXSY0048wKVJIU5iRVwUeJo+JjkTjFklW1pmEuAjsDJZRFwyk+9wYhiX0aAOFYyq5tRdBXWAAjnOpRsaQRJmM8pF0NA+xT2VfToxOzrJmB6YVCvwDMKfu7Q2FfpmvqSh/DSM5rKfmf1o3Bu+grFkQx0IDMBnkxNyE0UwfNAROUAJ9ogIlgeleTjLDABLTPBW2CPX/yImidVO2zqnVzWqpdZnbk0QE6RBVko3NUQ3XUQE1E0CN6Qe/ow3gy3oxP42tWmjOynn30J4zvH2/brSM=
h(a)
u,r concat AGGREGATION
…
hcnn
AAACkHicbVFdb9MwFHXC1ygMynjkJaKqtEmoSgaIPYDY2MuEeBiCbpPaEN24N6s1x4nsG0Rl+ffwf3jj3+C0QaItV7J9fM6xr31vXkthKI5/B+Gt23fu3tu533vwcPfR4/6TvQtTNZrjmFey0lc5GJRC4ZgESbyqNUKZS7zMb05b/fI7aiMq9ZUWNaYlXCtRCA7kqaz/czgl/EF2OeeFPXPuW9IbTvNKzsyi9IudeyZbGezk9NOX1Lktg92HA5fZ5oVutbUbT1wrNwetALKew4Y5R4Jus37p35xcKeey/iAexcuItkHSgQHr4jzr/5rOKt6UqIhLMGaSxDWlFjQJLtHnagzWwG/gGiceKijRpHZZUBcNPTOLikr7oShasv+esFCa9p3eWQLNzabWkv/TJg0VR6kVqm4IFV8lKhoZURW13YlmQiMnufAAuBb+rRGfgwZOvoc9X4Rk88vb4OJwlLwexZ9fDY4/dOXYYc/Yc7bPEvaGHbMzds7GjAe7wcvgbfAu3AuPwvfhycoaBt2Zp2wtwo9/AIvgykk=
“comfort”
“mic quality”
User-specific
aspect repr.
view’s contribution to the overall understanding of
pool
AAACTXicbVHLSgMxFM3UR2t9jbp0M1gKFaTMiKLLqpsuK9gHtLVkMpk2mHmQ3BFLmB90I7jzL9y4UETMtF3Y1gtJDufcm3tz4sacSbDtNyO3srq2ni9sFDe3tnd2zb39lowSQWiTRDwSHRdLyllIm8CA004sKA5cTtvuw02mtx+pkCwK72Ac036AhyHzGcGgqYHplXtAn0BNdtdX9TS9d4rlnhtxT44DfaiRZgbTBJWmS5qq4ON0oJITkRbn77pKMzXRqlmyq/YkrGXgzEAJzaIxMF97XkSSgIZAOJay69gx9BUWwAinuk8iaYzJAx7SroYhDqjsq4kbqVXWjGf5kdArBGvC/q1QOJDZ+DozwDCSi1pG/qd1E/Av+4qFcQI0JNNGfsItiKzMWstjghLgYw0wEUzPapERFpiA/oCiNsFZfPIyaJ1WnfOqfXtWql3P7CigQ3SEKshBF6iG6qiBmoigZ/SOPtGX8WJ8GN/GzzQ1Z8xqDtBc5PK/13C2pA==
matrix A(u)
cnn+relu avg&max
“sound quality”
u’s attention to aspect a
pool pool pool
h1[CLS]
AAACHXicbVC7TsMwFHXKq5RXgZEloqrEVCWoCMaKLh0YiqAPqQmR4zitVech+wZRRfkRFn6FhQGEGFgQf4PbZoCWI9k+Oude+d7jxpxJMIxvrbCyura+UdwsbW3v7O6V9w+6MkoEoR0S8Uj0XSwpZyHtAANO+7GgOHA57bnj5tTv3VMhWRTewiSmdoCHIfMZwaAkp1yvWkAfIJ3drp+2suzOLFluxD05CdSTjpTgzP100Ly6sbPMKVeMmjGDvkzMnFRQjrZT/rS8iCQBDYFwLOXANGKwUyyAEU6zkpVIGmMyxkM6UDTEAZV2Otsu06tK8XQ/EuqEoM/U3x0pDuR0VlUZYBjJRW8q/ucNEvAv7JSFcQI0JPOP/ITrEOnTqHSPCUqATxTBRDA1q05GWGACKtCSCsFcXHmZdE9r5lnNuK5XGpd5HEV0hI7RCTLROWqgFmqjDiLoET2jV/SmPWkv2rv2MS8taHnPIfoD7esHQTujQw==
sound aspect comfort aspect
(a)
[CLS]
token repr. seq. H1 exp(tanh(wTex [hu,r ; a(u) ]))
AAAB/HicbVDLSsNAFJ34rPUV7dLNYBFclUQUXRbddFnBPqCNZTKdtEMnkzBzI4ZQf8WNC0Xc+iHu/BunaRbaeuBeDufcy9w5fiy4Bsf5tlZW19Y3Nktb5e2d3b19++CwraNEUdaikYhU1yeaCS5ZCzgI1o0VI6EvWMef3Mz8zgNTmkfyDtKYeSEZSR5wSsBIA7vSB/YIWd79IGtMp/fuwK46NScHXiZuQaqoQHNgf/WHEU1CJoEKonXPdWLwMqKAU8Gm5X6iWUzohIxYz1BJQqa9LD9+ik+MMsRBpExJwLn6eyMjodZp6JvJkMBYL3oz8T+vl0Bw5WVcxgkwSecPBYnAEOFZEnjIFaMgUkMIVdzciumYKELB5FU2IbiLX14m7bOae1Fzbs+r9esijhI6QsfoFLnoEtVRAzVRC1GUomf0it6sJ+vFerc+5qMrVrFTQX9gff4AZVSVPg==
H1
AAAB/HicbVDLSsNAFJ34rPUV7dLNYBFclUQUXRbddFnBPqCNZTKdtEMnkzBzI4ZQf8WNC0Xc+iHu/BunaRbaeuBeDufcy9w5fiy4Bsf5tlZW19Y3Nktb5e2d3b19++CwraNEUdaikYhU1yeaCS5ZCzgI1o0VI6EvWMef3Mz8zgNTmkfyDtKYeSEZSR5wSsBIA7vSB/YIWd79IGtMp/fuwK46NScHXiZuQaqoQHNgf/WHEU1CJoEKonXPdWLwMqKAU8Gm5X6iWUzohIxYz1BJQqa9LD9+ik+MMsRBpExJwLn6eyMjodZp6JvJkMBYL3oz8T+vl0Bw5WVcxgkwSecPBYnAEOFZEnjIFaMgUkMIVdzciumYKELB5FU2IbiLX14m7bOae1Fzbs+r9esijhI6QsfoFLnoEtVRAzVRC1GUomf0it6sJ+vFerc+5qMrVrFTQX9gff4AZVSVPg==
token (a)
… αu,r =P (a)
,
H0
AAADAXicbVJNb9MwGHbC1ygf6+DAgUtEVdRKUCUItB03dmAHkIqg26QmRG9cp7Xm2JHtVKssc+GvcOEAQlz5F9z4Nzht0dZ2rxTn8fM872u/trOSUaXD8K/nX7t+4+atrduNO3fv3d9u7jw4VqKSmAywYEKeZqAIo5wMNNWMnJaSQJExcpKdHdb6yZRIRQX/qGclSQoYc5pTDNpR6Y73qB1rcq7NfMxyc2Ttp6jRjjPBRmpWuJ+ZOCZdGMzw8O2HxNoNg+lA16ameiZrbaXiga3lqlsLwMoJrJkzouFicrnq/0Ux5/airCqAMfMOzuOnB9Px874QzNrOFV10V+tNbarX9/bG1uRmbthIm62wF84j2ATRErTQMvpp8088ErgqCNeYgVLDKCx1YkBqihmxjbhSpAR8BmMydJBDQVRi5jdog7ZjRkEupPu4Dubs5QwDharbcM4C9EStazV5lTasdL6XGMrLShOOFwvlFQu0COrnEIyoJFizmQOAJXV7DfAEJGDtHk19CNF6y5vg+EUvetUL379s7b9eHscWeoyeoA6K0C7aR0eojwYIe5+9r95374f/xf/m//R/Lay+t8x5iFbC//0P5tf4Zw==
r0 ∈Ru exp(tanh(wTex [hu,r0 ; a(u) ]))
Contextualized Language Encoder
ct
r
nd
lity
]
is
is
S]
fort
and
…
erio
elle
where [·; ·] denotes the concatenation of tensors.
[CL
Sou
qua
com
sup
exc
[
Figure 3: Pipeline of APRE, including a user review encoder in the orange dashed box and an item review encoder in the top blue box, each containing an implicit channel (left) and an aspect-based explicit channel (right). Internal details of the item encoder are identical to the counterpart of the user encoder and hence omitted.

H⁰ = {h⁰_[CLS], h⁰_1, ..., h⁰_|r|, h⁰_[SEP]}. [CLS] and [SEP] are two special tokens indicating the starts and separators of sentences. We use a trainable linear transformation, h¹_i = W_ad^T h⁰_i + b_ad, to adapt the BERT output representation H⁰ to our task as H¹, where W_ad ∈ R^{d_e × d_f}, b_ad ∈ R^{d_f}, and d_f is the transformed dimension of internal features. BERT encodes the token semantics based upon the context, which resolves the polysemy of certain sentiment terms: e.g., "cheap" is positive for price but negative for quality. This step transforms the sentiment encoding to attention-property modeling.

Explicit aspect-level attitude modeling   For aspect a among the k total aspects, we pull out all the contextualized representations of the sentiment words[2] that modify a, and aggregate their representations into a single embedding of aspect a in r as

    h^(a)_{u,r} = Σ_j h¹_j,   w_j ∈ S_T ∩ r and w_j modifies a.

An observation by Chen et al. (2020b) suggests that users tend to use semantically consistent words for the same aspect in reviews; therefore, sum-pooling is an appropriate aggregation.

[2] BERT uses the WordPiece tokenizer, which can break an out-of-vocabulary word into shorter word pieces. If a sentiment word is broken into word pieces, we use the representation of the first word piece produced.

w_ex ∈ R^{d_f + d_a} is a trainable weight. With the usefulness distribution of α_{u,r}, we aggregate the h^(a)_{u,r} of r ∈ R_u by weighted average pooling:

    g^(a)_u = Σ_{r ∈ R_u} α^(a)_{u,r} h^(a)_{u,r}.

Now we obtain the user attention representation for aspect a, g^(a)_u ∈ R^{d_f}. We use G_u ∈ R^{d_f × k} to denote the matrix of the g^(a)_u. The item-tower architecture is omitted in Figure 3 since the item property modeling shares the identical computing procedure; it generates the item property representations g^(a)_t of G_t. Mutual attention (Liu et al., 2019; Tay et al., 2018; Dong et al., 2020) is not utilized since the generation of the user attention encodings G_u is independent of the item properties, and vice versa.

Implicit review representation   It is acknowledged by the existing works in Section 2 that implicit semantic modeling is critical because some emotions are conveyed without explicit sentiment word mentions. For example, "But this one feels like a pillow . . ." in R3 of Table 1 does not contain any sentiment tokens but expresses a strong satisfaction with the comfortableness, which would be missed by the extractive annotation-based ASPE. In APRE, we combine a global feature h¹_[CLS]; a local context feature h_cnn ∈ R^{n_c} learned by a convolutional neural network (CNN) of output channel size n_c and kernel size n_k with max pooling; and two token-level features, the average and max pooling of H¹, to build a comprehensive multi-granularity review representation v_{u,r}:

    v_{u,r} = [h¹_[CLS]; h_cnn; MaxPool(H¹); AvgPool(H¹)],
first word piece produced. hcnn = MaxPool(ReLU(ConvNN_1D(H1 ))).We apply review-wise aggregation without aspects Baseline models Thirteen baselines in tradi-
for latent review embedding v u tional and deep learning categories are compared
with the proposed framework. The pre-deep learn-
exp(tanh(wTim v u,r )) ing traditional approaches predict ratings solely
βu,r = P T
,
r0 ∈Ru exp(tanh(w im v u,r0 ))
based upon the entity IDs. Table 3 introduces their
X basic profiles which are extended in Section A.3.3†.
vu = βu,r v u,r ,
Specially, AHN-B refers to AHN using pretrained
r∈Ru
BERT as the input embedding encoder. It is in-
(a) cluded to test the impact of the input encoders.
where βu,r is the counterpart of αu,r in the implicit
channel, wim ∈ Rdim is a trainable parameter, and Evaluation metric We use Mean Square Error
dim = 3df + nc . Using similar steps, we can also (MSE) for performance evaluation. Given a test set
obtain v t for the item implicit embeddings. Rtest , the MSE is defined by
1 X
Rating regression and optimization Implicit MSE = (ŝu,r − su,r )2 .
|Rtest |
features v u and v t and explicit features Gu and (u,r)∈Rtest
Gt compose the input to the rating predictor to
Reproducibility We provide instructions to re-
estimate the score su,t by
produce AS-pair extraction of ASPE and rating pre-
diction of baselines and APRE in Section A.3.1†.
ŝu,t = bu + bt + Fim ([v u ; v t ]) + hγ, Fex ([Gu ; Gt ])i .
| {z } | {z } | {z } The source code of our models is publicly available
biases implicit feature explicit feature
on GitHub4 .
Fim : R2dim → R and Fex : R2df ×k → Rk are 4.2 AS-pair Extraction of ASPE
multi-layer fully-connected neural networks with We present the extraction performance of unsuper-
ReLU activation and dropout to avoid overfitting. vised ASPE. The distributions of the frequencies
They model user attention and item property inter- of extracted AS-pairs in Figure 5 follow the trend
actions in explicit and implicit channels, respec- of Zipf’s Law with a deviation common to natural
tively. h·, ·i denotes inner-product. γ ∈ Rk and languages (Li, 1992), meaning that ASPE performs
{bu , bt } ∈ R are trainable parameters. The opti- consistently across domains. We show the qualita-
mization function of the trainable parameter set Θ tive results of term extraction separately.
with an L2 regularization weighted by λ is
Sentiment terms Generally, the AS-pair statis-
J(Θ) =
X
2
(su,t − ŝu,t ) + L2 -reg(λ). tics given in Table 9† on different datasets are quan-
ru,t ∈R
titatively consistent with the data statistics in Ta-
ble 2† regardless of domain. Figure 4 is a Venn
J(Θ) is optimized by back-propagation learning diagram showing the sources of the sentiment terms
methods such as Adam (Kingma and Ba, 2014). extracted by ASPE from AM. All three methods
are efficacious and contribute uniquely, which can
4 Experiments also be verified by Table 10† in Section A.3.2†.
Aspect terms Table 4 presents the most frequent
4.1 Experimental Setup
aspect terms of all datasets. ItemTok is ranked
Datasets We use seven datasets from Amazon top as users tend to describe overall feelings about
Review Datasets (He and McAuley, 2016)3 includ- items. Domain-specific terms (e.g., car in AM) and
ing AutoMotive (AM), Digital Music (DM), Musi- general terms (e.g., price, quality, and size) are in-
cal Instruments (MI), Pet Supplies (PS), Sport and termingled illustrating the comprehensive coverage
Outdoors (SO), Toys and Games (TG), and Tools and the high accuracy of the result of ASPE.
and Home improvement (TH). Their statistics are
shown in Table 2. 4.3 Rating Prediction of APRE
We use 8:1:1 as the train, validation, and test Comparisons with baselines For the task of
ratio for all experiments. Users and items with less review-based rating prediction, a percentage in-
than 5 reviews and reviews with less than 5 words 3
https://jmcauley.ucsd.edu/data/amazon
4
are removed to reduce data sparsity. https://github.com/zyli93/ASPE-APREDataset Abbr. #Reviews #Users #Items Density Ttl. #W #R/U #R/T #W/R
AutoMotive AM 20,413 2,928 1,835 3.419×10−3 1.77M 6.274 10.011 96.583
Digital Music DM 111,323 14,138 11,707 6.053×10−4 5.69M 7.087 8.558 56.828
Musical Instruments MI 10,226 1,429 900 7.156×10−3 0.96M 6.440 10.226 103.958
Pet Supplies PS 157,376 19,854 8,510 8.383×10−4 14.23M 7.134 16.644 100.469
Sports and Outdoors SO 295,434 35,590 18,357 4.070×10−4 26.38M 7.471 14.484 99.199
Toys and Games TG 167,155 19,409 11,924 6.500×10−4 17.16M 7.751 12.616 114.047
Tools and Home improv. TH 134,129 16,633 10,217 7.103×10−4 15.02M 7.258 11.815 124.429
Table 2: The statistics of the seven real-world datasets. (W: Words; U: Users; T: iTems; R: Reviews.)
Model Reference Cat. U/T ID Review AM DM MI PS SO TG TH
MF - Trad. X ItemTok song ItemTok ItemTok ItemTok ItemTok ItemTok
WRMF Hu et al. (2008) Trad. X product ItemTok sound dog knife toy light
time album guitar food quality game tool
FM Rendle (2010) Trad. X
car music string cat product piece quality
ConvMF Kim et al. (2016) Deep X X
look time quality toy size quality price
NeuMF He et al. (2017b) Deep X price sound tone time price child product
D-CNN Zheng et al. (2017) Deep X quality voice price product look color bulb
D-Attn Seo et al. (2017) Deep X light track pedal price bag part battery
NARRE Chen et al. (2018) Deep X X oil lyric tuner treat fit fun size
ANR Chin et al. (2018) Deep X battery version cable water light size flashlight
MPCN Tay et al. (2018) Deep X X
DAML Liu et al. (2019) Deep X Table 4: High frequency aspects of the corpora.
AHN Dong et al. (2020) Deep X X
AHN-B Same as AHN Deep X X
the original word2vec-based AHN5 because the
Table 3: Basics of compared baselines. Models’ in- weights of word2vec vectors are trainable while
put is marked by “X”. “U” and “T” denote Users
the BERT embeddings are fixed, which reduces
and iTems. D-CNN represents DeepCoNN. AHN-B de-
notes the variant of AHN with BERT embeddings.
the parameter capacity. Within baseline models,
deep-learning-based models are generally stronger
than entity ID-based traditional methods and recent
NN terms 105
ones tend to perform better.
Frequency of AS-pairs
PMI terms 169 2734 104
94
26 228 103 AM Ablation study Ablation studies answer the
972 DM
102 MI question of which channel, explicit or implicit, con-
PS
SO tributes to the superior performance and to what
101
5558 TG extent? We measure their contributions by rows of
100 TH
100 101 102 103 104 w/o EX and w/o IM in Table 5. w/o EX presents
Lexicon Frequency Rank of AS-pairs the best MSEs of an APRE variant without explicit
features under the default settings. The impact of
Figure 4: Sources of sen- Figure 5: Freq. rank vs.
timent terms from AM. frequency of AS-pairs AS-pairs is nullified. w/o IM, in contrast, shows
the best MSEs of an APRE variant only leveraging
the explicit channel while removing the implicit
crease above 1% in performance is considered sig- one (without implicit). We observe that the opti-
nificant (Chin et al., 2018; Tay et al., 2018). Ac- mal performances of the single-channel variants all
cording to Table 5, our model outperforms all base- fall behind those of the dual-channel model, which
line models including the AHN-B on all datasets reflects positive contributions from both channels.
by a minimum of 1.337% on MI and a maximum w/o IM has lower MSEs than w/o EX on several
of 4.061% on TG, which are significant improve- datasets showing that the explicit channel can sup-
ments. It demonstrates (1) the superior capability ply comparatively more performance improvement
of our model to make accurate rating predictions in than the implicit channel. It also suggests that the
different domains (Ours vs. the rest); (2) the perfor- costly latent review encoding can be less effective
mance improvement is NOT because of the use of
5
BERT (Ours vs. AHN-B). AHN-B underperforms The authors of AHN also confirmed this observation.0.83 0.83
than the aspect-sentiment-level user and item profiling, which is a useful finding.

Models   AM     DM     MI     PS     SO     TG     TH
TRADITIONAL MODELS
MF       1.986  1.715  2.085  2.048  2.084  1.471  1.631
WRMF     1.327  0.537  1.358  1.629  1.371  1.068  1.216
FM       1.082  0.436  1.146  1.458  1.212  0.922  1.050
DEEP-LEARNING-BASED MODELS
ConvMF   1.046  0.407  1.075  1.458  1.026  0.986  1.104
NeuMF    0.901  0.396  0.903  1.294  0.893  0.841  1.072
D-Attn   0.816  0.403  0.835  1.264  0.897  0.887  0.980
D-CNN    0.809  0.390  0.861  1.250  0.894  0.835  0.975
NARRE    0.826  0.374  0.837  1.425  0.990  0.908  0.958
MPCN     0.815  0.447  0.842  1.300  0.929  0.898  0.969
ANR      0.806  0.381  0.845  1.327  0.906  0.844  0.981
DAML     0.829  0.372  0.837  1.247  0.893  0.820  0.962
AHN-B    0.810  0.385  0.840  1.270  0.896  0.829  0.976
AHN      0.802  0.376  0.834  1.252  0.887  0.822  0.967
OUR MODELS AND PERCENTAGE IMPROVEMENTS
Ours     0.791  0.359  0.823  1.218  0.863  0.788  0.936
∆ (%)    1.390  3.621  1.337  2.381  2.784  4.061  2.350
Val.     0.790  0.362  0.821  1.216  0.860  0.790  0.933
ABLATION STUDIES
w/o EX   0.814  0.379  0.833  1.244  0.882  0.796  0.965
w/o IM   0.798  0.374  0.863  1.226  0.873  0.798  0.956

Table 5: MSE of the baselines, our model (Ours for test and Val. for validation), and the variants. The ∆ row calculates the percentage improvements over the best baselines. All reported improvements over the best baselines are statistically significant with p-value < 0.01.

Hyper-parameter sensitivity   A number of hyper-parameter settings are of interest, e.g., the dropout rate, the learning rate (LR), the internal feature dimensions (d_a, d_f, n_c, and n_k), and the regularization weight λ of the L2-reg in J(Θ). We run each set of sensitivity-search experiments 10 times and report the average performances. We tune the dropout rate in [0, 0.1, 0.2, 0.3, 0.4, 0.5] and the LR[6] in [0.0001, 0.0005, 0.001, 0.005, 0.01] with the other hyper-parameters set to default, and report in Figure 6 the minimum MSEs and the epoch numbers (Ep.) on AM. For dropout, we find the balance of its effects on avoiding overfitting and reducing active parameters at 0.2; larger dropout rates need more training epochs. For the LR, we also target a balance between the training instability of large LRs and the overfitting concern of small LRs, and thus 0.001 is selected; larger LRs plateau earlier with fewer epochs while smaller LRs plateau later with more. Figure 7† analyzes the hyper-parameter sensitivities to changes of the internal feature dimensions (d_a, d_f, and n_c), the CNN kernel size n_k, and the L2-reg weight λ.

[6] The reported LRs are initial values since Adam and a LR scheduler adjust the LR dynamically along the training.

Figure 6: Hyper-parameter searching and sensitivity. (a) Dropout vs. MSE and Ep. (b) Init. LR vs. MSE and Ep.

Efficiency   A brief run-time analysis of APRE is given in Table 6. The model can run fast with all data in GPU memory, such as on AM and MI, which demonstrates the efficiency of our model and the room for improvement on the run time of the datasets that cannot fit in the GPU memory. The efficiency of ASPE is less critical since it only runs once for each dataset.

AM     DM     MI    PS     SO     TG     TH
127s*  31min  90s*  36min  90min  51min  35min

Table 6: Per-epoch run time of APRE on the seven datasets. The run times of AM and MI, denoted by "*", are disproportional to their sizes since these datasets can fit into the GPU memory for acceleration.

4.4 Case Study for Interpretation

Finally, we showcase an interpretation procedure of the rating estimation for an instance in AM: how does APRE predict u*'s rating for a smart driving assistant t* using the output AS-pairs of ASPE? We select seven example aspect categories with all review snippets mentioning those categories. Each category is a set of similar aspect terms, e.g., {look, design} and {beep, sound}. Without loss of generality, we refer to the categories as aspects. Table 7 presents the aspects and review snippets given by u* and received by t* with AS-pair annotations. Three aspects, {battery, install, look}, are shared (yellow rows). Each side has two unique aspects never mentioned by the reviews of the other side: {material, smell} of u* (green rows) and {price, sound} of t* (blue rows).

APRE measures the aspect-level contributions of the user-attention and item-property interactions by the last term of the ŝ_{u,t} prediction, i.e., ⟨γ, F_ex([G_u; G_t])⟩. The contribution on the i-th aspect is calculated by the i-th dimension of γ times
the i-th value of F_ex([G_u; G_t]), which is shown in Table 8. The top two rows summarize the attentions of u* and the properties of t*. "Inferred Impact" states the interactional effects of the user attentions and item properties based on our assumption that attended aspects bear stronger impacts on the final prediction. On the overlapping aspects, the inferior property of battery produces the only negative score (-0.008), whereas the advantages on install and look create positive scores (0.019 and 0.015), which is consistent with the inferred impact. The other aspects, either unknown to the user attentions or to the item properties, contribute relatively less: t*'s unappealing price accounts for the small score of 0.009, and the mixed property of sound accounts for the 0.006.

This case study demonstrates the usefulness of the numbers that add up to ŝ_{u,t}. Although small in scale, they carry significant information about the valued or disliked aspects in u*'s perception of t*. This process of decomposition is a great way to interpret a model prediction at an aspect-level granularity, which is a capacity that the other baseline models do not enjoy.

From reviews given by user u*. All aspects attended (✓).
battery:  [To t1] "After leaving this attached to my car for two days of non-use I have a dead battery. Never had a dead battery . . . , so I am blaming this device."
install:  [To t2] "This was unbelievably easy to install. I have done . . . . The real key . . . the installation is so easy." [To t3] "There were many installation options, but once . . . , they clicked on easily."
look:     [To t3] "It was not perfect and not shiny, but it did look better." [To t4] "It takes some elbow grease, but the results are remarkable."
material: [To t5] "The plastic however is very thin and the cap is pretty cheap." [To t6] "Great value. . . . . They are very hard plastic, so they don't mark up panels."
smell:    [To t7] "This has a terrible smell that really lingers awhile. It goes on green. . . ."

From reviews received by item t*.
battery:  [From u1] "The reason this won't work on an iPhone 4 or . . . because it uses low power Bluetooth, . . . ." (✗)
install:  [From u2] "Your mileage and gas mileage and cost of fuel is tabulated for each trip - Installation is pretty simple - but it . . . ." (✓)
look:     [From u3] "Driving habits, fuel efficiency, and engine health are nice features. The overall design is nice and easy to navigate." (✓)
price:    [From u4] "In fact, there are similar products to this available at a much lower price that do work with . . ." (✗)
sound:    [From u5] "The Link device makes an audible sound when you go over 70 mpg, brake hard, or accelerate too fast." (✓) [From u6] "Also, the beep the link device makes . . . sounds really cheapy." (✗)

Table 7: Examples of reviews given by u* and received by t*, with Aspect-Sentiment pair mentions as well as other sentiment evidence on the seven example aspects.

In Section A.3.5†, another case study indicates that a certain imperfect item property without user attention only inconsiderably affects the rating, although the aspect is mentioned by the user's reviews.
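The per-aspect attribution described above, in which the explicit term ⟨γ, F_ex([G_u; G_t])⟩ splits into the products γ_i · F_ex(·)_i reported in Table 8, can be sketched numerically. The following minimal illustration uses toy dimensions, random weights, and a single ReLU layer standing in for F_ex; none of these values come from the paper's trained model:

```python
import random

random.seed(0)

# Toy sizes (hypothetical, not the paper's settings): k aspects, feature dim d_f.
ASPECTS = ["material", "smell", "battery", "install", "look", "price", "sound"]
k, d_f = len(ASPECTS), 8

def rand_vec(n):
    return [random.gauss(0.0, 0.1) for _ in range(n)]

# Flattened [G_u; G_t]: per-aspect user-attention and item-property features.
x = rand_vec(2 * d_f * k)

# F_ex: R^{2*d_f*k} -> R^k. One ReLU layer stands in for APRE's
# multi-layer fully-connected network.
W = [rand_vec(2 * d_f * k) for _ in range(k)]
b = rand_vec(k)
f_ex = [max(sum(w_i * x_i for w_i, x_i in zip(row, x)) + b_j, 0.0)
        for row, b_j in zip(W, b)]

gamma = rand_vec(k)  # the trainable weight vector gamma in R^k

# Explicit term of the rating prediction: <gamma, F_ex([G_u; G_t])>.
explicit_term = sum(g * f for g, f in zip(gamma, f_ex))

# Per-aspect contribution (as in Table 8): gamma_i * F_ex(.)_i.
# The k contributions add back up to the explicit term exactly.
contributions = {a: g * f for a, g, f in zip(ASPECTS, gamma, f_ex)}
assert abs(sum(contributions.values()) - explicit_term) < 1e-9

for name, c in sorted(contributions.items(), key=lambda p: -p[1]):
    print(f"{name:>8s}: {c:+.4f}")
```

Because the inner product is a plain sum of k scalar products, the printed contributions always add back up to the explicit term exactly, which is what makes this aspect-level attribution a decomposition rather than an approximation.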
Aspects                  material  smell  battery  install  look  price  sound
Attn. of u*              ✓         ✓      ✓        ✓        ✓     n/a    n/a
Prop. of t*              n/a       n/a    ✗        ✓        ✓     ✗      ✓/✗
Inferred Impact          Unk.      Unk.   Neg.     Pos.     Pos.  Unk.   Unk.
γ_i F_ex(·)_i (×10⁻²)    1.0       0.8    -0.8     1.9      1.5   0.9    0.6

Table 8: Summaries of the attentions and properties, the inferred impacts, and the learned aspect-level contributions.

5 Conclusion

In this work, we propose a tightly coupled two-stage review-based rating predictor consisting of an Aspect-Sentiment Pair Extractor (ASPE) and an Attention-Property-aware Rating Estimator (APRE). ASPE extracts aspect-sentiment pairs (AS-pairs) from reviews, and APRE learns explicit user attentions and item properties as well as implicit sentence semantics to predict the rating. Extensive quantitative and qualitative experimental results demonstrate that ASPE accurately and comprehensively extracts AS-pairs without using domain-specific training data, and that APRE outperforms the state-of-the-art recommender frameworks and explains the prediction results by taking advantage of the extracted AS-pairs.

Several challenges are left open, such as fully or weakly supervised open-domain AS-pair extraction and an end-to-end design for AS-pair extraction and rating prediction. We leave these problems for future work.

Acknowledgement

We would like to thank the reviewers for their helpful comments. The work was partially supported by NSF DGE-1829071 and NSF IIS-2106859.

Broader Impact Statement

This paper proposes a rating prediction model that has a great potential to be widely applied to recommender systems with reviews due to its high accuracy. In the meantime, it tries to relieve the unjustifiability issue of black-box neural networks by suggesting what aspects of an item a user may feel satisfied or dissatisfied with. The recommender system can better understand the rationale behind users' reviews so that the merits of items can be carried forward while the defects can be fixed. As far as we are concerned, this work is the first that takes care of both rating prediction and rationale understanding utilizing NLP techniques.

We then address the generalizability and deployment issues. The reported experiments are conducted on different domains in English with distinct review styles and diverse user populations. We can observe that our model performs consistently, which supports its generalizability. Ranging from smaller to larger datasets, we have not noticed any potential deployment issues. Instead, we notice that stronger computational resources can greatly speed up the training and inference and scale up the problem size while keeping the major execution pipeline unchanged.

In terms of the potential harms and misuses, we believe they and their consequences involve two perspectives: (1) the harm of generating inaccurate or suboptimal results from this recommender; and (2) the risk of misuse (attack) of this model to reveal user identities. For point (1), the potential risk of suboptimal results has little impact on the major function of online shopping websites since recommenders are only in charge of suggestive content. For point (2), our model does not involve user and item ID modeling. Also, we aggregate the user reviews in the representation space so that user identity is hard to infer through reverse-engineering attacks. In all, we believe our model has little risk of causing dysfunction of online shopping platforms or leakage of user identities.

References

Stefanos Angelidis and Mirella Lapata. 2018. Summarizing opinions: Aspect extraction meets sentiment prediction and they are both weakly supervised. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP).

Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. 2010. SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In Proceedings of the Seventh International Conference on Language Resources and Evaluation.

Konstantin Bauman, Bing Liu, and Alexander Tuzhilin. 2017. Aspect based recommendations: Recommending items with the most valuable aspects based on user reviews. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research.

Chong Chen, Min Zhang, Yiqun Liu, and Shaoping Ma. 2018. Neural attentional rating regression with review-level explanations. In Proceedings of the 2018 World Wide Web Conference.

Shaowei Chen, Jie Liu, Yu Wang, Wenzheng Zhang, and Ziming Chi. 2020a. Synchronous double-channel recurrent network for aspect-opinion pair extraction. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.

Xiao Chen, Changlong Sun, Jingjing Wang, Shoushan Li, Luo Si, Min Zhang, and Guodong Zhou. 2020b. Aspect sentiment classification with document-level sentiment preference modeling. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.

Jin Yao Chin, Kaiqi Zhao, Shafiq Joty, and Gao Cong. 2018. ANR: Aspect-based neural recommender. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management.

Hongliang Dai and Yangqiu Song. 2019. Neural aspect and opinion term extraction with mined rules as weak supervision. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT.

Xin Dong, Jingchao Ni, Wei Cheng, Zhengzhang Chen, Bo Zong, Dongjin Song, Yanchi Liu, Haifeng Chen, and Gerard de Melo. 2020. Asymmetrical hierarchical networks with attentive interactions for interpretable review-based recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence.

Chunning Du, Haifeng Sun, Jingyu Wang, Qi Qi, and Jianxin Liao. 2020. Adversarial and domain-aware BERT for cross-domain sentiment analysis. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.

Xinyu Guan, Zhiyong Cheng, Xiangnan He, Yongfeng Zhang, Zhibo Zhu, Qinke Peng, and Tat-Seng Chua. 2019. Attentive aspect modeling for review-aware recommendation. ACM Transactions on Information Systems (TOIS), 37(3):1–27.