DOI: 10.1145/2187836.2187943

Information integration over time in unreliable and uncertain environments

Published: 16 April 2012


Often an interesting true value, such as a stock price, sports score, or current temperature, is only available through the observations of noisy and potentially conflicting sources. Several techniques have been proposed to reconcile these conflicts by computing a weighted consensus based on source reliabilities, but these techniques focus on static values. When the real-world entity evolves over time, the noisy sources can delay, or even miss, reporting some of the real-world updates. This temporal aspect introduces two key challenges for consensus-based approaches: (i) due to delays, the mapping between a source's noisy observation and the real-world update it observes is unknown, and (ii) missed updates may translate to missing values for the consensus problem, even if the mapping is known. To overcome these challenges, we propose a formal approach that models the history of updates of the real-world entity as a hidden semi-Markov model (HSMM). The noisy sources are modeled as observations of the hidden state, but the mapping between a hidden state (i.e., a real-world update) and an observation (i.e., a source value) is unknown. We propose algorithms based on Gibbs sampling and EM to jointly infer both the history of real-world updates and the unknown mapping between them and the source values. Using experiments on real-world datasets, we demonstrate how our history-based techniques improve upon history-agnostic consensus-based approaches.
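The two temporal challenges above (an unknown observation-to-update mapping caused by delays, and missing values caused by missed updates) can be made concrete with a small sketch. It is not the paper's HSMM with Gibbs sampling or EM. It assumes the times of the real-world updates are already known, assumes Gaussian value noise and exponentially distributed reporting delays, and uses a simple hard-EM style alternation. The hypothetical `align` function finds the best order-preserving mapping of one source's observations to updates by dynamic programming (skipped updates become missing values). The hypothetical `estimate` function recomputes each update as a reliability-weighted mean, with reliability taken as 1/sigma^2 of each source's residuals. All names (`Obs`, `align`, `estimate`, `delay_rate`) and the example data are illustrative assumptions.

```python
# Hypothetical sketch, NOT the paper's HSMM / Gibbs sampling / EM algorithm.
# Assumes known update times, Gaussian value noise and exponential reporting delays.
import math
from dataclasses import dataclass


@dataclass
class Obs:
    time: float   # when the source published the value
    value: float  # the (noisy) value it published


def align(update_times, values, obs, sigma, delay_rate):
    """Map each observation to a distinct update, preserving order.

    Observation i may map to update j only if update_times[j] <= obs[i].time,
    and mapped indices must strictly increase: a source reports updates in
    order but may miss some. Returns the best-scoring list of update indices,
    or None if no valid mapping exists.
    """
    n, m = len(obs), len(update_times)
    if n == 0:
        return []
    best = [[-math.inf] * m for _ in range(n)]  # best[i][j]: obs[:i+1] mapped, obs[i] -> j
    back = [[-1] * m for _ in range(n)]
    for i, o in enumerate(obs):
        for j in range(m):
            delay = o.time - update_times[j]
            if delay < 0:
                continue  # a source cannot report an update before it happens
            # Log-score of value noise plus reporting delay (constants dropped).
            local = -(o.value - values[j]) ** 2 / (2 * sigma ** 2) - delay_rate * delay
            if i == 0:
                best[i][j] = local
                continue
            k = max(range(j), key=lambda k: best[i - 1][k], default=-1)
            if k >= 0 and best[i - 1][k] > -math.inf:
                best[i][j] = best[i - 1][k] + local
                back[i][j] = k
    j = max(range(m), key=lambda j: best[n - 1][j], default=None)
    if j is None or best[n - 1][j] == -math.inf:
        return None
    path = [j]
    for i in range(n - 1, 0, -1):  # follow back-pointers to recover the mapping
        j = back[i][j]
        path.append(j)
    return path[::-1]


def estimate(update_times, sources, iters=10, delay_rate=1.0):
    """Hard-EM style alternation between alignment and value/noise estimation."""
    m = len(update_times)
    # Initialise each update with the earliest observation made at or after it.
    values = []
    for t in update_times:
        later = [o for obs in sources for o in obs if o.time >= t]
        values.append(min(later, key=lambda o: o.time).value if later else 0.0)
    sigmas = [1.0] * len(sources)
    mappings = []
    for _ in range(iters):
        # Alignment step: best mapping for every source given current estimates.
        mappings = [align(update_times, values, obs, s, delay_rate)
                    for obs, s in zip(sources, sigmas)]
        # Value step: reliability-weighted (1 / sigma^2) mean per update.
        num, den = [0.0] * m, [0.0] * m
        for obs, mp, s in zip(sources, mappings, sigmas):
            for o, j in zip(obs, mp or []):
                num[j] += o.value / s ** 2
                den[j] += 1 / s ** 2
        # An update that no source reported is a missing value: keep the old estimate.
        values = [num[j] / den[j] if den[j] > 0 else values[j] for j in range(m)]
        # Reliability step: each source's noise level from its residuals (floored).
        for idx, (obs, mp) in enumerate(zip(sources, mappings)):
            if mp:
                sq = [(o.value - values[j]) ** 2 for o, j in zip(obs, mp)]
                sigmas[idx] = max(math.sqrt(sum(sq) / len(sq)), 1e-3)
    return values, sigmas, mappings


# Usage on hypothetical data: three real-world updates at t = 0, 10, 20.
update_times = [0.0, 10.0, 20.0]
sources = [
    [Obs(1, 100), Obs(11, 105), Obs(21, 103)],  # prompt, accurate source
    [Obs(5, 100), Obs(26, 103)],                # delayed source that misses an update
    [Obs(2, 101), Obs(12, 104), Obs(22, 103)],  # prompt but noisier source
]
values, sigmas, mappings = estimate(update_times, sources)
print([round(v, 2) for v in values], mappings)
```

On this toy input the sketch should map the delayed source's two reports to the first and third updates and recover values close to 100, 105 and 103. Weighting by 1/sigma^2 lets sources that agree closely with the consensus dominate quickly; the floor on sigma keeps the weights finite.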






Information & Contributors


Published In

ACM Other conferences
WWW '12: Proceedings of the 21st international conference on World Wide Web
April 2012
1078 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.


  • Univ. de Lyon: Université de Lyon



Association for Computing Machinery

New York, NY, United States




Author Tags

  1. information integration
  2. probabilistic model
  3. semi-Markov


  • Research-article


WWW 2012: 21st World Wide Web Conference 2012
April 16 - 20, 2012
Lyon, France

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%



Bibliometrics & Citations


Article Metrics

  • Downloads (last 12 months): 3
  • Downloads (last 6 weeks): 0
Reflects downloads up to 08 Mar 2025.



Cited By

  • (2024) Generalizing truth discovery by incorporating multi-truth features. Computing 106(5):1557-1583. doi:10.1007/s00607-024-01288-9. Online publication date: 22-Apr-2024.
  • (2018) Exploring change. Proceedings of the VLDB Endowment 12(2):85-98. doi:10.14778/3282495.3282496. Online publication date: 1-Oct-2018.
  • (2018) Leveraging the Crowd to Detect and Reduce the Spread of Fake News and Misinformation. Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 324-332. doi:10.1145/3159652.3159734. Online publication date: 2-Feb-2018.
  • (2018) Dynamic Truth Discovery on Numerical Data. 2018 IEEE International Conference on Data Mining (ICDM), pp. 817-826. doi:10.1109/ICDM.2018.00097. Online publication date: Nov-2018.
  • (2017) Distilling Information Reliability and Source Trustworthiness from Digital Traces. Proceedings of the 26th International Conference on World Wide Web, pp. 847-855. doi:10.1145/3038912.3052672. Online publication date: 3-Apr-2017.
  • (2017) Profiling Entities over Time in the Presence of Unreliable Sources. IEEE Transactions on Knowledge and Data Engineering 29(7):1522-1535. doi:10.1109/TKDE.2017.2684804. Online publication date: 1-Jul-2017.
  • (2017) Towards Scalable and Dynamic Social Sensing Using A Distributed Computing Framework. 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), pp. 966-976. doi:10.1109/ICDCS.2017.196. Online publication date: Jun-2017.
  • (2017) Constraint-aware dynamic truth discovery in big data social media sensing. 2017 IEEE International Conference on Big Data (Big Data), pp. 57-66. doi:10.1109/BigData.2017.8257911. Online publication date: Dec-2017.
  • (2017) Exploring Scalability and Time-Sensitiveness in Reliable Social Sensing With Accuracy Assessment. IEEE Access 5:14405-14418. doi:10.1109/ACCESS.2017.2707480. Online publication date: 2017.
  • (2016) A Time Machine for Information. ACM SIGMOD Record 45(2):23-32. doi:10.1145/3003665.3003671. Online publication date: 28-Sep-2016.
