Topics
Survey Articles
To get an overview of data exchange, data integration, and data streams, participants are requested to read the following three survey articles, which summarize invited tutorials and talks given by the DEIS'10 organizers at the ACM Symposium on Principles of Database Systems (PODS) in 2002, 2005, and 2007.
Data Exchange:
[Slides of the DEIS'10 tutorial given by Phokion Kolaitis]
Schema mappings, data exchange, and metadata management. Phokion G. Kolaitis. Invited talk at PODS'05: 24th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 61-75, Baltimore, Maryland, USA, June 2005.
Data Integration:
[Slides of the DEIS'10 tutorial given by Maurizio Lenzerini]
Data Integration: A Theoretical Perspective. Maurizio Lenzerini. Invited tutorial at PODS'02: 21st ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 233-246, Madison, Wisconsin, USA, June 2002.
Data Streams:
[Slides of the DEIS'10 tutorial given by Nicole Schweikardt]
Machine models and lower bounds for query processing. Nicole Schweikardt. Invited tutorial at PODS'07: 26th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 41-52, Beijing, China, June 2007.
Additional information can be found in the following monographs and survey articles:
- Relational and XML Data Exchange by Marcelo Arenas, Pablo Barcelo, Leonid Libkin, and Filip Murlak. Morgan and Claypool Publishers, Synthesis Lectures on Data Management, 2010, Vol. 2, No. 1, Pages 1-112.
- Data Streams: Algorithms and Applications by S. Muthukrishnan. Foundations and Trends in Theoretical Computer Science, 1(2), 2005.
- Logical Foundations of Relational Data Exchange. Pablo Barcelo. SIGMOD Record 38(1): 49-58 (2009).
- Composition and Inversion of Schema Mappings. Marcelo Arenas, Jorge Perez, Juan Reutter, and Cristian Riveros. SIGMOD Record 38(3): 17-28 (2009).
- Machine models for query processing. Nicole Schweikardt. SIGMOD Record 38(2): 18-28 (2009).
Specialized Topics for Participant Presentations
Each participant will be asked to study the relevant literature on one of the
specialized topics listed in the bibliography below. The organizers of DEIS'10
will assign the topics based on the interests and expertise that each participant
indicated in the submitted letter of interest.
Bibliography
The chase procedure and its applications to data exchange
[Slides of the DEIS'10 presentation given by Adrian Onet]
- Task: Give an overview of the chase procedure, starting with some of the original papers about the chase (e.g., "A proof procedure for data dependencies" by Beeri and Vardi, JACM 1984) and perhaps ending with "The Chase Revisited" by Deutsch, Nash, and Remmel, PODS 2008. The presentation should include some of the applications of the chase procedure to data exchange (e.g., the construction of canonical universal solutions).
Algorithms for computing the core of universal solutions
[Slides of the DEIS'10 presentation given by Vadim Savenkov]
- R. Fagin, P.G. Kolaitis, and L. Popa: Data exchange: getting to the core. ACM Trans. Database Syst. 30(1), pp. 174-210, 2005.
- G. Gottlob and A. Nash: Efficient core computation in data exchange. J. ACM 55(2), 2008.
- R. Pichler and V. Savenkov: Towards practical feasibility of core computation in data exchange. Theor. Comput. Sci. 411(7-9), pp. 935-957, 2010.
- B. Marnette: Generalized schema mappings: from termination to tractability. In Proc. PODS'09 (Symposium on Principles of Database Systems), pp. 13-22, 2009.
- B. ten Cate, L. Chiticariu, P.G. Kolaitis, and W.C. Tan: Laconic Schema Mappings: Computing the Core with SQL Queries. Proc. VLDB'09 (International Conference on Very Large Data Bases), volume 2(1): pp. 1006-1017, 2009.
The inverse operator on schema mappings and its uses in data exchange
[Slides of the DEIS'10 presentation given by Jorge Perez]
- Task: Give a balanced overview of the various approaches to the inverse operator presented at recent database theory conferences, including the original inverse operator by Fagin, quasi-inverses, maximum recoveries, and maximum extended recoveries.
Semantics of query answering in data exchange / closed world reasoning
[Slides of the DEIS'10 presentation given by Andre Hernich]
- R. Fagin, P.G. Kolaitis, R.J. Miller, and L. Popa: Data exchange: semantics and query answering. Theor. Comput. Sci. 336(1):89-124, 2005.
- L. Libkin: Data exchange and incomplete information. In Proc. PODS'06 (Symposium on Principles of Database Systems), pp. 60-69, 2006.
- A. Hernich and N. Schweikardt: CWA-solutions for data exchange settings with target dependencies. In Proc. PODS'07 (Symposium on Principles of Database Systems), pp. 113-122, 2007.
- L. Libkin and C. Sirangelo: Data exchange and schema mappings in open and closed worlds. In Proc. PODS'08 (Symposium on Principles of Database Systems), pp. 139-148, 2008.
- F.N. Afrati and P.G. Kolaitis: Answering aggregate queries in data exchange. In Proc. PODS'08 (Symposium on Principles of Database Systems), pp. 129-138, 2008.
- A. Hernich: Answering Non-Monotonic Queries in Relational Data Exchange. In Proc. ICDT'10 (International Conference on Database Theory), pp. 143-154, 2010.
Integrity Constraints in Data Exchange
[Slides of the DEIS'10 presentation given by Victor Didier Gutierrez Basulto]
- R. Fagin, P.G. Kolaitis, R.J. Miller, and L. Popa: Data exchange: semantics and query answering. Theor. Comput. Sci. 336(1):89-124, 2005. (with emphasis on weakly acyclic sets of tgds and egds)
- A. Nash, P.A. Bernstein, and S. Melnik: Composition of mappings given by embedded dependencies. ACM Trans. Database Syst. 32(1): 4, 2007.
- M. Arenas, R. Fagin, and A. Nash: Composition with target constraints. In Proc. ICDT 2010 (International Conference on Database Theory), pp. 129-142, 2010.
Query answering in data integration
[Slides of the DEIS'10 presentation given by Piotr Wieczorek]
- S. Abiteboul and O.M. Duschka: Complexity of Answering Queries Using Materialized Views. In Proc. PODS'98 (Symposium on Principles of Database Systems), pp. 254-263, 1998.
- O.M. Duschka, M.R. Genesereth, and A.Y. Levy: Recursive Query Plans for Data Integration. J. Log. Program. 43(1), pp. 49-73, 2000.
- R. Pottinger and A.Y. Halevy: MiniCon: A scalable algorithm for answering queries using views. VLDB J. 10(2-3), pp. 182-198, 2001.
- A. Deutsch, B. Ludäscher, and A. Nash: Rewriting queries using views with access patterns under integrity constraints. Theor. Comput. Sci. 371(3), pp. 200-226, 2007.
Data Integration: Consistent Query Answering
[Slides of the DEIS'10 presentation given by Slawomir Staworko]
- Task: Give an overview of database repairs and consistent query answering, with emphasis on the semantics and the complexity, and also try to make a connection to possible applications of inconsistent databases to data exchange and data integration.
Description Logics for Data Integration
[Slides of the DEIS'10 presentation given by Yazmin Angelica Ibanez-Garcia]
- C. Beeri, A.Y. Levy, and M.-C. Rousset: Rewriting Queries Using Views in Description Logics. In Proc. PODS'97 (Symposium on Principles of Database Systems), pp. 99-108, 1997.
- A. Poggi, D. Lembo, D. Calvanese, G. De Giacomo, M. Lenzerini, and R. Rosati: Linking Data to Ontologies. J. Data Semantics 10, pp. 133-173, 2008.
- D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati: Tractable reasoning and efficient query answering in description logics: The DL-Lite family. J. of Automated Reasoning 39(3), pp. 385-429, 2007.
- R. Kontchakov, C. Lutz, D. Toman, F. Wolter, and M. Zakharyaschev: Combined FO Rewritability for Conjunctive Query Answering in DL-Lite. Description Logics'09 (International Workshop on Description Logics), 2009.
- A. Cali, G. Gottlob, and T. Lukasiewicz: A general datalog-based framework for tractable query answering over ontologies. In Proc. PODS'09 (Symposium on Principles of Database Systems), pp. 77-86, 2009.
Data cleaning for data integration
[Slides of the DEIS'10 presentation given by Ekaterini Ioannou]
- N. Koudas, S. Sarawagi, and D. Srivastava: Record linkage: similarity measures and algorithms. In Proc. SIGMOD'06 (SIGMOD Conference), pp. 802-803, 2006.
- X. Dong, A.Y. Halevy, and J. Madhavan: Reference Reconciliation in Complex Information Spaces. In Proc. SIGMOD'05 (SIGMOD Conference), pp. 85-96, 2005.
- W.E. Winkler: Overview of Record Linkage and Current Research Directions. Research Report Series (Statistics #2006-2), Statistical Research Division, U.S. Census Bureau, Washington, DC 20233. Available at www.census.gov/srd/papers/pdf/rrs2006-02.pdf.
View-based query processing
[Slides of the DEIS'10 presentation given by Paolo Guagliardo]
- D. Calvanese, G. De Giacomo, M. Lenzerini, and M.Y. Vardi: Rewriting of Regular Expressions and Regular Path Queries. J. Comput. Syst. Sci. 64(3), pp. 443-465, 2002.
- D. Calvanese, G. De Giacomo, M. Lenzerini, and M.Y. Vardi: View-based query processing: On the relationship between rewriting, answering and losslessness. Theor. Comput. Sci. 371(3), pp. 169-182, 2007.
- A. Nash, L. Segoufin, and V. Vianu: Views and queries: Determinacy and rewriting. ACM Trans. Database Syst. 35(3), 2010.
- M. Marx: Queries determined by views: pack your views. In Proc. PODS'07 (Symposium on Principles of Database Systems), pp. 23-30, 2007.
- T.D. Millstein, A.Y. Halevy, and M. Friedman: Query containment for data integration systems. J. Comput. Syst. Sci. 66(1), pp. 20-39, 2003.
Analyzing, comparing and debugging schema mappings
[Slides of the DEIS'10 presentation given by Emanuel Sallinger]
- L. Chiticariu and W.C. Tan: Debugging schema mappings with routes. In Proc. VLDB'06 (International Conference on Very Large Data Bases), pp. 79-90, 2006.
- R. Fagin, P.G. Kolaitis, A. Nash, and L. Popa: Towards a theory of schema-mapping optimization. In Proc. PODS'08 (Symposium on Principles of Database Systems), pp. 33-42, 2008.
- G. Gottlob, R. Pichler, and V. Savenkov: Normalization and optimization of schema mappings. In Proc. VLDB'09 (International Conference on Very Large Data Bases), volume 2(1), pp. 1102-1113, 2009.
- M. Arenas, J. Perez, J.L. Reutter, and C. Riveros: Foundations of schema-mapping management. In Proc. PODS'10 (Symposium on Principles of Database Systems), pp. 227-238, 2010.
Probabilistic data integration and probabilistic data exchange
[Slides of the DEIS'10 presentation given by Livia Predoiu]
- Task: Give an overview of recent work on probabilistic description logics and probabilistic ontology mappings, and, in addition, cover some other approaches to probabilistic data integration and exchange, such as "Data Integration with Uncertainty" by Dong, Halevy, and Yu (VLDB Journal 2009) and "Probabilistic Data Exchange" by Fagin, Kimelfeld, and Kolaitis (ICDT 2010).
Learning and discovering queries and mappings
[Slides of the DEIS'10 presentation given by Marie Jacob]
- G. Gottlob and P. Senellart: Schema mapping discovery from data instances. J. ACM 57(2), 2010.
- A. Das Sarma, X. Dong, and A.Y. Halevy: Bootstrapping pay-as-you-go data integration systems. In Proc. SIGMOD'08 (SIGMOD Conference), pp. 861-874, 2008.
- R.J. Miller, L.M. Haas, and M.A. Hernandez: Schema Mapping as Query Discovery. In Proc. VLDB'00 (International Conference on Very Large Data Bases), pp. 77-88, 2000. Extended Version: University of Toronto Technical Report, CRSG-412.
- L. Chiticariu, P.G. Kolaitis, and L. Popa: Interactive generation of integrated schemas. In Proc. SIGMOD'08 (SIGMOD Conference), pp. 833-846, 2008.
- R. Dhamankar, Y. Lee, A. Doan, A.Y. Halevy, and P. Domingos: iMAP: Discovering Complex Mappings between Database Schemas. In Proc. SIGMOD'04 (SIGMOD Conference), pp. 383-394, 2004.
Peer Data Management Systems
[Slides of the DEIS'10 presentation given by Armin Roth]
- L. Serafini, F. Giunchiglia, J. Mylopoulos, and P.A. Bernstein: Local Relational Model: A Logical Formalization of Database Coordination. In Proc. CONTEXT'03 (International and Interdisciplinary Conference, CONTEXT), pp. 286-299, 2003.
- M. Arenas, V. Kantere, A. Kementsietsidis, I. Kiringa, R.J. Miller, and J. Mylopoulos: The Hyperion project: from data integration to data coordination. SIGMOD Record 32(3), pp. 53-58, 2003.
- S. Abiteboul, O. Benjelloun, and T. Milo: The Active XML project: an overview. VLDB J. 17(5), pp. 1019-1040, 2008.
- K. Hose, A. Roth, A. Zeitz, K.-U. Sattler, and F. Naumann: A research agenda for query processing in large-scale peer data management systems. Inf. Syst. 33(7-8), pp. 597-610, 2008.
Theory of Peer Data Management
[Slides of the DEIS'10 presentation given by Sebastian Skritek]
- A.Y. Halevy, Z.G. Ives, D. Suciu, and I. Tatarinov: Schema mediation for large-scale semantic data sharing. VLDB J. 14(1), pp. 68-83, 2005.
- D. Calvanese, G. De Giacomo, M. Lenzerini, and R. Rosati: Logical Foundations of Peer-To-Peer Data Integration. In Proc. PODS'04 (Symposium on Principles of Database Systems), pp. 241-251, 2004.
- S. Abiteboul, O. Benjelloun, and T. Milo: Positive Active XML. In Proc. PODS'04 (Symposium on Principles of Database Systems), pp. 35-45, 2004.
- I. Tatarinov and A.Y. Halevy: Efficient Query Reformulation in Peer-Data Management Systems. In Proc. SIGMOD'04 (SIGMOD Conference), pp. 539-550, 2004.
- T.J. Green, G. Karvounarakis, Z.G. Ives, and V. Tannen: Update Exchange with Mappings and Provenance. In Proc. VLDB'07 (International Conference on Very Large Data Bases), pp. 675-686, 2007. Technical report available at http://repository.upenn.edu/cis_reports/763/.
XML Data Exchange
[Slides of the DEIS'10 presentation given by Amelie Gheerbrant]
- A. Fuxman, M.A. Hernandez, C.T. Howard Ho, R.J. Miller, P. Papotti, and L. Popa: Nested mappings: schema mapping reloaded. In Proc. VLDB'06 (International Conference on Very Large Data Bases), pp. 67-78, 2006.
- H. Jiang, Howard Ho, L. Popa, and W.-S. Han: Mapping-driven XML transformation. In Proc. WWW'07 (International Conference on World Wide Web), pp. 1063-1072, 2007.
- M. Arenas and L. Libkin: XML data exchange: Consistency and query answering. J. ACM 55(2), 2008.
- S. Amano, L. Libkin, and F. Murlak: XML schema mappings. In Proc. PODS'09 (Symposium on Principles of Database Systems), pp. 33-43, 2009.
XML Data Integration
[Slides of the DEIS'10 presentation given by Lucja Kot]
- S. Amano, C. David, L. Libkin, and F. Murlak: On the tradeoff between mapping and querying power in XML data exchange. In Proc. ICDT'10 (International Conference on Database Theory), pp. 155-164, 2010.
- S. Abiteboul, L. Segoufin, and V. Vianu: Representing and querying XML with incomplete information. ACM Trans. Database Syst. 31(1), pp. 208-254, 2006.
- C. David, L. Libkin, and F. Murlak: Certain answers for XML queries. In Proc. PODS'10 (Symposium on Principles of Database Systems), pp. 191-202, 2010.
- P. Barcelo, L. Libkin, A. Poggi, and C. Sirangelo: XML with incomplete information: models, properties, and query answering. In Proc. PODS'09 (Symposium on Principles of Database Systems), pp. 237-246, 2009.
Stream-based processing of XML documents
[Slides of the DEIS'10 presentation given by Cristian Riveros]
- L. Segoufin and V. Vianu: Validating Streaming XML Documents. In Proc. PODS'02 (Symposium on Principles of Database Systems), pp. 53-64, 2002.
- L. Segoufin and C. Sirangelo: Constant-memory validation of streaming XML documents against DTDs. In Proc. ICDT'07 (International Conference on Database Theory), pp. 299-313, 2007.
- Z. Bar-Yossef, M. Fontoura, and V. Josifovski: On the memory requirements of XPath evaluation over XML streams. J. Comput. Syst. Sci. 73(3), pp. 391-441, 2007.
- M. Grohe, C. Koch, and N. Schweikardt: Tight lower bounds for query processing on streaming and external memory data. Theor. Comput. Sci., 380(1-2), pp. 199-217, 2007.
- M. Shalem and Z. Bar-Yossef: The Space Complexity of Processing XML Twig Queries Over Indexed Documents. In Proc. ICDE'08 (International Conference on Data Engineering), pp. 824-832, 2008. Full version available at http://webee.technion.ac.il/people/zivby/
Data stream management systems and query languages
[Slides of the DEIS'10 presentation given by Sandra Geisler]
- M. Stonebraker, U. Cetintemel, and S.B. Zdonik: The 8 requirements of real-time stream processing. SIGMOD Record 34(4), pp. 42-47, 2005.
- Y. Ahmad and U. Cetintemel: Data Stream Management Architectures and Prototypes. Pages 639-643 in: Ling Liu, M. Tamer Özsu (Eds.): Encyclopedia of Database Systems. Springer US, 2009.
- M. Cherniack and S.B. Zdonik: Stream-Oriented Query Languages and Operators. Pages 2848-2854 in: Ling Liu, M. Tamer Özsu (Eds.): Encyclopedia of Database Systems. Springer US, 2009.
- B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom: Models and Issues in Data Stream Systems. In Proc. PODS'02 (Symposium on Principles of Database Systems), pp. 1-16, 2002.
- A. Arasu, B. Babcock, S. Babu, J. Cieslewicz, M. Datar, K. Ito, R. Motwani, U. Srivastava, and J. Widom: STREAM: The Stanford Data Stream Management System. Technical Report, Stanford InfoLab, 2004. Available at http://ilpubs.stanford.edu:8090/641/.
- A. Biem, E. Bouillet, H. Feng, A. Ranganathan, A. Riabov, O. Verscheure, H.N. Koutsopoulos, and C. Moran: IBM infosphere streams for scalable, real-time, intelligent transportation services. In Proc. SIGMOD'10 (SIGMOD Conference), pp. 1093-1104, 2010.
Basic algorithmic techniques for processing data streams
[Slides of the DEIS'10 presentation given by Mariano Zelke]
- B. Lahiri and S. Tirthapura: Stream Sampling. Pages 2838-2842 in: Ling Liu, M. Tamer Özsu (Eds.): Encyclopedia of Database Systems. Springer US, 2009.
- N. Alon, Y. Matias, and M. Szegedy: The space complexity of approximating the frequency moments. J. Comput. Syst. Sci. 58, pp. 137-147, 1999.
- S. Muthukrishnan: Data Streams: Algorithms and Applications. Foundations and Trends in Theoretical Computer Science, 1(2), 2005.
Querying and mining data streams
[Slides of the DEIS'10 presentation given by Elena Ikonomovska]
- E. Vee: Stream Similarity Mining. Pages 2842-2847 in: Ling Liu, M. Tamer Özsu (Eds.): Encyclopedia of Database Systems. Springer US, 2009.
- G. Cormode and S. Muthukrishnan: What's hot and what's not: tracking most frequent items dynamically. ACM Trans. Database Syst. 30(1), pp. 249-278, 2005.
- M. Datar and S. Muthukrishnan: Estimating Rarity and Similarity over Data Stream Windows. In Proc. ESA'02 (European Symposium on Algorithms), pp. 323-334, 2002.
- S. Guha: Tight results for clustering and summarizing data streams. In Proc. ICDT'09 (International Conference on Database Theory), pp. 268-275, 2009.
Distributed Processing of Data Streams and Large Data Sets
[Slides of the DEIS'10 presentation given by Marwan Hassani]
- M.N. Garofalakis: Distributed Data Streams. Pages 883-890 in: Ling Liu, M. Tamer Özsu (Eds.): Encyclopedia of Database Systems. Springer US, 2009.
- M. Cherniack, H. Balakrishnan, M. Balazinska, D. Carney, U. Cetintemel, Y. Xing, and S.B. Zdonik: Scalable Distributed Stream Processing. In Proc. CIDR'03 (Conference on Innovative Data Systems Research), 2003.
- G. Cormode, S. Muthukrishnan, and K. Yi: Algorithms for distributed functional monitoring. In Proc. SODA'08 (Symposium on Discrete Algorithms), pp. 1076-1085, 2008.
- J. Feldman, S. Muthukrishnan, A. Sidiropoulos, C. Stein, and Z. Svitkina: On distributing symmetric streaming computations. In Proc. SODA'08 (Symposium on Discrete Algorithms), pp. 710-719, 2008.
- J. Dean and S. Ghemawat: MapReduce: simplified data processing on large clusters. Communications of the ACM 51(1), pp. 107-113, 2008.