ABSTRACT
In this era of information explosion, conflicts often arise when information is provided by multiple sources. The traditional truth discovery task aims to identify the truth, i.e., the most trustworthy information, from conflicting sources in different scenarios. In such tasks, the truth is regarded as a fixed value or a set of fixed values. However, in a number of real-world cases, the existence of an objective truth cannot be guaranteed, and we can only identify one or more reliable facts from opinions. Unlike the traditional truth discovery task, we address this uncertainty by introducing the concept of the trustworthy opinion of an entity, treating it as a random variable, and using its distribution to describe consistency or controversy. This is particularly difficult for data that can be measured numerically, i.e., quantitative information. In this study, we focus on quantitative opinions, propose an uncertainty-aware approach called Kernel Density Estimation from Multiple Sources (KDEm) to estimate their probability distribution, and summarize trustworthy information based on this distribution. Experiments indicate that KDEm not only performs well on the classical numeric truth discovery task, but also shows good performance on multi-modality detection and anomaly detection in the uncertain-opinion setting.
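The idea of describing an opinion by its distribution can be illustrated with a minimal sketch. This is not the KDEm algorithm itself, whose details the abstract does not give; it is a plain weighted Gaussian kernel density estimate. The function names, the hand-picked bandwidth, the uniform source weights (a hypothetical stand-in for source reliability), and the claim values are all assumptions made for illustration.

```python
import numpy as np


def weighted_kde(claims, weights, grid, bandwidth):
    """Evaluate a weighted Gaussian kernel density estimate on `grid`."""
    claims = np.asarray(claims, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the density integrates to 1
    # Standardized distance from every grid point to every claim.
    z = (grid[:, None] - claims[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernels @ weights


def local_modes(grid, density, min_height):
    """Return grid points that are strict local maxima of at least `min_height`."""
    interior = density[1:-1]
    is_peak = (
        (interior > density[:-2])
        & (interior > density[2:])
        & (interior >= min_height)
    )
    return grid[1:-1][is_peak]


# Hypothetical claims about one entity from six sources (illustrative values only).
claims = [10.0, 10.2, 9.9, 20.1, 19.8, 55.0]
weights = [1.0] * len(claims)  # assumption: all sources equally reliable
grid = np.linspace(0.0, 60.0, 6001)

density = weighted_kde(claims, weights, grid, bandwidth=0.5)
# Keep only peaks at least half as high as the tallest one (an arbitrary cutoff).
modes = local_modes(grid, density, min_height=0.5 * density.max())
print(modes)
```

In this toy setting the estimated density has two prominent peaks, one per cluster of agreeing claims, which corresponds to a controversial (multi-modal) opinion, while the isolated claim at 55.0 only produces a low peak that falls below the cutoff and could be flagged as an anomaly.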
Index Terms
- From Truth Discovery to Trustworthy Opinion Discovery: An Uncertainty-Aware Quantitative Modeling Approach