ABSTRACT
To ease content enrichment, exchange, and sharing, web-scale collaborative platforms such as Wikipedia or Google Docs enable unbounded interactions among a large number of contributors, without prior knowledge of their level of expertise and reliability. Version control is then essential for keeping track of the evolution of the shared content and its provenance. In such environments, uncertainty is ubiquitous due to factors such as unreliable sources, incomplete and imprecise contributions, and possible malicious editing and vandalism. To handle this uncertainty, we use a probabilistic XML model as a basic component of our version control framework. Each version of a shared document is represented by an XML tree, and the whole document, together with its different versions, is modeled as a probabilistic XML document. Uncertainty is evaluated using the probabilistic model and the reliability measure associated with each source, each contributor, or each editing event, resulting in an uncertainty measure on each version and each part of the document. We show that standard version control operations can be implemented directly as operations on the probabilistic XML model; efficiency with respect to deterministic version control systems is demonstrated on real-world datasets.
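To make the idea concrete, the following is a minimal illustrative sketch, not the implementation described in the paper. It assumes that each node of the shared tree is guarded by a conjunction of literals over editing events, that events are independent, and that each event carries a reliability value used as its probability. The names `Node`, `node_probability`, `extract_version`, and the events `e1` to `e3` are hypothetical.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class Node:
    """A node of the shared document, guarded by a conjunction of event literals."""
    label: str
    # Each literal is (event_name, required_truth_value); empty means "always present".
    condition: tuple = ()
    children: list = field(default_factory=list)


def node_probability(node, reliability):
    """Probability that the node's own condition holds.

    Assumes independent events and at most one literal per event in the
    condition. Ancestor conditions are not taken into account here.
    """
    return prod(
        reliability[event] if wanted else 1.0 - reliability[event]
        for event, wanted in node.condition
    )


def extract_version(node, valuation):
    """Return the deterministic tree (nested tuples) selected by a truth
    assignment over events, or None if the node is absent in that version."""
    if any(valuation.get(event, False) != wanted for event, wanted in node.condition):
        return None
    kept = [extract_version(child, valuation) for child in node.children]
    return (node.label, [k for k in kept if k is not None])


# Hypothetical reliabilities attached to three editing events.
reliability = {"e1": 0.9, "e2": 0.6, "e3": 0.2}

doc = Node("article", children=[
    Node("title", condition=(("e1", True),)),
    Node("section-old", condition=(("e1", True), ("e2", False))),  # removed by e2
    Node("section-new", condition=(("e2", True),)),                # added by e2
    Node("spam", condition=(("e3", True),)),                       # low-trust edit
])

# The version in which e1 and e2 are accepted and e3 is rejected.
version_2 = extract_version(doc, {"e1": True, "e2": True, "e3": False})

# Uncertainty on one part of the document: the old section survives only
# if e1 happened and e2 did not.
p_old = node_probability(doc.children[1], reliability)
```

In this sketch, selecting a version amounts to choosing a truth assignment over events, and the uncertainty on a part of the document follows from the reliabilities of the events that guard it.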
Index Terms
- Uncertain version control in open collaborative editing of tree-structured documents