Organizers
General Co-Chairs:
Zhiyong Peng (Wuhan University)
Bin Cui (Peking University)
Hongzhi Wang (Harbin Institute of Technology)
TPC Co-Chairs:
Jiawei Jiang (Wuhan University)
Yuanyuan Zhu (Wuhan University)
Sheng Wang (Wuhan University)
Meeting schedule
Date: 2023-07-29 | Location: Ruihua Hotel, Chicago Room
Time | Talk | Session Chair
14:00-14:45 | Keynote 1: To Support Complex Structures in RDBMSs, Jeffrey Xu Yu (The Chinese University of Hong Kong) | Bin Cui (Peking University)
14:45-15:30 | Keynote 2: The Value of Distributed RDBMS, Zhenkun Yang (OceanBase, Ant Group) | Hongzhi Wang (Harbin Institute of Technology)
15:30-15:50 | Coffee Break |
15:50-17:30 | Young Scholar Forum | Yuanyuan Zhu (Wuhan University)
 | Architecting Heterogeneity-aware Big Data Processing System, Bo Tang (Southern University of Science and Technology) |
 | Advances in Learned Query Optimizer, Rong Zhu (Alibaba Group) |
 | Efficient big graph data analysis: mesoscopic and macroscopic perspectives, Long Yuan (Nanjing University of Science and Technology) |
 | Discussion on Incremental Graph Processing, Shufeng Gong (Northeastern University) |
The Chinese University of Hong Kong
Keynote: To Support Complex Structures in RDBMSs
Abstract: RDBMSs have been extensively studied for decades to manage large datasets, and there has been a long history of supporting complex structures in RDBMSs since the early 1970s. In this talk, we review some approaches to handling complex structures in RDBMSs. As it becomes important to support large graph processing through algorithm design and system development, we revisit the issue of how to support graph processing in RDBMSs. Our work is motivated by the fact that in real applications many relations are closely related to a large graph stored in an RDBMS, and there is a need to conduct graph and data analytics together in an integrated system. To support graph analytics in RDBMSs, we discuss new relational algebra operations that can be defined by the basic relational algebra operations with group-by and aggregation, and we discuss new SQL recursive queries that are ensured to have a fixpoint with the new operations. In addition, we discuss how to support such new operations and new SQL recursive queries with SparkSQL and GraphX on Spark, and how to optimize SQL queries by separating communication from computation in a distributed system in order to achieve high efficiency.
BIO: Dr. Jeffrey Xu Yu is a Professor in the Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong. His current main research interests include graph algorithms, graph processing systems, and query processing in database systems. Dr. Yu served as an Information Director and a member of the ACM SIGMOD executive committee (2007-2011), an associate editor of IEEE TKDE (2004-2008), and an associate editor of the VLDB Journal (2007-2013). Currently he serves as an associate editor of ACM TODS, WWW Journal, etc. Dr. Yu has served on many organizing committees and program committees of international conferences and workshops, including as PC Co-chair of APWeb'04, WAIM'06, APWeb/WAIM'07, WISE'09, PAKDD'10, DASFAA'11, ICDM'12, NDBC'13, ADMA'14, CIKM'15, BigComp'17, DSAA'19, CIKM'19, and DASFAA'20, and as conference General Co-chair of APWeb'13, ICDM'18, and ADC'22.
OceanBase, Ant Group
Keynote: The Value of Distributed RDBMS
Abstract: In 1970, the relational model was invented by Dr. E. F. Codd, followed by the emergence of the SQL language. RDBMSs gradually matured in the 1980s and became the cornerstone of various information systems. Mainly due to technical difficulties, all RDBMSs were centralized systems. In the 1990s, with the development of the Internet, centralized RDBMSs, with their very limited scalability, were unable to meet the high-concurrency and massive-data requirements of Internet services, and database sharding became the only feasible solution. However, sharding implies poor inter-shard transaction performance and a costly and risky refactoring of the business system. With scale-out ability and agile flexibility, a distributed RDBMS is the ideal solution for high-concurrency access and massive data systems. This talk presents the advantages of distributed RDBMSs and introduces the practice of the OceanBase distributed relational database system.
BIO: Dr. Zhenkun Yang received his bachelor's and master's degrees from the Department of Mathematics and his Ph.D. from the Department of Computer Science, Peking University. He then became an associate professor and later a full professor of computer science at Peking University. He received the Cheung Kong Scholar Award, Peking University, in 1999. He was the fourth-listed recipient of the first-class National Science and Technology Progress Award of China in 1995. He also won the first-class Science and Technology Progress Award of Beijing in 1996, the National Youth Science and Technology Award of China in 1998, the Qiushi Eminent Youth Award of the China Association for Science and Technology in 1998, and the Wusi Youth Award of Beijing in 2000. He won the Wangxuan Award of the China Computer Federation in 2022. He is the first inventor of 20+ patents. In 2010, he initiated the development of the OceanBase distributed relational database at Alibaba. OceanBase passed the TPC-C benchmark test in 2019, breaking the performance record held by Oracle for 9 years, and passed the TPC-H benchmark test in 2021, breaking the performance record (@30,000GB). Today, OceanBase serves many banks and insurance, energy, and communication companies, as well as government sectors, both in China and abroad. Dr. Zhenkun Yang is now the chief scientist of OceanBase.
Southern University of Science and Technology
Keynote: Architecting Heterogeneity-aware Big Data Processing System
Abstract: With the rapid development of modern hardware, how to architect an efficient and effective big data processing system on heterogeneous computing hardware has become a hot topic in the database and systems communities. In this talk, I will briefly introduce the research progress of the database research group at Southern University of Science and Technology. In particular, I will cover three topics: (1) Learning-based Progressive Cardinality Estimation; (2) GHive: Accelerating Analytical Query Processing in Apache Hive via CPU-GPU Heterogeneous Computing; and (3) QEVIS: Multi-grained Visualization of Distributed Query Execution. Finally, I will highlight the lessons we learned during this research journey.
BIO: Dr. Bo Tang is an assistant professor in the Department of Computer Science and Engineering, Southern University of Science and Technology. His main research interests are big data analytics and database systems. His research has been published at top-tier data management conferences and journals (e.g., SIGMOD, PVLDB). Some of his research outputs have been used in the products of Microsoft, Huawei, ByteDance, etc.
Alibaba Group
Keynote: Advances in Learned Query Optimizer
Abstract: Learned techniques for query optimizers are at the forefront of AI4DB. Query optimization provides the most suitable testbed for applying ML techniques, and learned query optimizers have demonstrated their superiority with ample evidence. In this talk, we aim to provide a wide and deep review and analysis of learned query optimizers, ranging from algorithm design to real-world applications and system deployment. Based on this, we summarize some design principles and point out several future directions.
BIO: Rong Zhu is a research scientist in the Data Analytics and Intelligence Lab (DAIL), Alibaba DAMO Academy. He also serves as an adjunct industry mentor at The Chinese University of Hong Kong, Shenzhen. He received his Ph.D. and B.S. degrees from Harbin Institute of Technology in 2019 and 2013, respectively. His research interests lie in the intersection of databases, machine learning, and systems, with an emphasis on AI4DB. He has published nearly 30 papers in top-tier conferences and journals, including VLDB, ICDE, TKDE, and ICLR, and has given tutorials at EDBT and CIKM. He received the Second Prize of the Natural Science Award from the China Ministry of Education in 2019, the Outstanding Doctoral Dissertation Nomination Award from CCF in 2020, and the ACM SIGMOD China Rising Star Award in 2022.
Nanjing University of Science and Technology
Keynote: Efficient big graph data analysis: mesoscopic and macroscopic perspectives
Abstract: A graph is an abstract data structure consisting of vertices and edges, which can naturally model multi-source heterogeneous data. Graph models and graph data analysis have therefore become one of the most important topics in big data research and are widely used in real applications. Graph analysis can be conducted from different perspectives. In this talk, I will introduce the typical research problems and relevant research progress in mesoscopic and macroscopic graph analysis, and discuss future research directions in this important and growing research area.
BIO: Long Yuan is a Professor at the School of Computer Science and Engineering, Nanjing University of Science and Technology. His research interests include graph data management and analysis, especially efficient algorithm design for big graph data. He has published more than 40 papers in top conferences and journals such as VLDB, WWW, ICDE, VLDBJ, and TKDE. He received the DASFAA 2023 Best Student Paper Award and the ACM SIGMOD China Rising Star Award. He has been invited to serve as a program committee member of many top conferences in databases and data mining, such as VLDB, ICDE, The Web Conference, NeurIPS, AAAI, CIKM, and WSDM.
Northeastern University
Keynote: Discussion on Incremental Graph Processing
Abstract: By reusing the memoized state of previous computation, incremental graph computation can reduce unnecessary re-computation, but keeping that state may require a lot of memory. How can dynamic graphs be processed incrementally and efficiently with as few intermediate results as possible? A small change may propagate over the whole graph and lead to large-scale iterative computation. How can incremental computation be constrained to a smaller scope? We will discuss how to solve these two problems.
BIO: Shufeng Gong received his Ph.D. degree in computer science from Northeastern University, China, in 2021. He is a lecturer at Northeastern University. His research interests include cloud computing, distributed graph processing, and stream processing.