The emergence of Big Data has brought about a paradigm shift in many fields of data analytics. Multimedia is typical big data: not just big in volume, but also unstructured, noisy, redundant, and heterogeneous. Problems that were not apparent before are now becoming critical for big multimedia analysis, e.g., the scalability and high computation cost of sophisticated algorithms, the incompleteness and shortage of well-annotated raw data, the heterogeneity in integrating data from different sources, and the difficulty in discovering valuable knowledge from noisy and redundant data. This workshop aims to provide a premier forum for renowned researchers to share their insightful opinions and discuss cutting-edge research on the scientific and technological challenges of multimedia big data analysis. The workshop will feature three types of sessions: keynotes, invited talks, and a panel discussion.
2017-05-14 (Day 1): SIGMM China Symposium
Keynote Speech 1
14:00-14:45 | Ramesh Jain, UC Irvine, ACM Fellow | Cybernetics for Personal Health
Keynote Speech 2
14:45-15:30 | Shih-Fu Chang, Columbia University, Chair of ACM SIGMM | New Frontiers of Large Scale Multimedia Information Retrieval
15:30-15:45 | Tea Break
Invited Talks
15:45-16:15 | Wenwu Zhu, Tsinghua University, EIC of TMM | When AI meets Multimedia: Cross-media Inference
16:15-16:45 | Chong Wah Ngo, City University of Hong Kong, ACM Distinguished Scientist | Food Recognition: Challenge and Opportunity
16:45-17:15 | Xian-Sheng Hua, Alibaba, ACM Distinguished Scientist | Big Visual Data Computing on Cloud and into Industries
17:15-17:45 | Tao Mei, MSRA, ACM Distinguished Scientist | Vision and Language: Bridging Vision and Language with Deep Learning
Keynote 1: Ramesh Jain (University of California Irvine), ACM/IEEE/IAPR/AAAI/SPIE Fellow
Talk title: Cybernetics for Personal Health
Abstract: A person’s health is the result of her genetics, lifestyle, and environment. A cybernetic approach may help people manage lifestyle and environment for many chronic conditions, such as diabetes. Advances in smart phones, sensors, and wearable technology are now making it possible to analyze and understand an individual’s lifestyle from mostly passively collected objective data streams, to build her model, and to predict important health events in her life. Wearable/mobile sensors, smart homes, social networks, e-mail, calendar systems, and environmental sensors continuously generate data streams that can be used as lifestyle data. By assimilating and aggregating these multi-sensory data streams, we may create an accurate chronicle of a person’s life. By correlating life events with other events, and using a novel causality exploration framework, one can build a model of the person. Such a model is the objective characterization of a person’s health, lifestyle, social life, and other aspects. We illustrate how to build an objective personal model for a person. It is possible to build a model of the person that could result in actionable insights and alerts in everyday life as well as provide predictive and preventive guidance for serious health events. We are studying this cybernetic approach considering Type 2 Diabetes as a concrete example.
Bio: Ramesh joined the University of California, Irvine as the first Bren Professor in the Bren School of Information and Computer Sciences in 2005. Ramesh has been an active researcher in experiential computing, multimedia information systems, machine vision, and intelligent systems. While a professor of computer science and engineering at the University of Michigan, Ann Arbor and the University of California, San Diego, he founded and directed artificial intelligence and visual computing labs. He was also the founding Editor-in-Chief of IEEE MultiMedia magazine and the Machine Vision and Applications journal and served on the editorial boards of several magazines and journals. He has co-authored more than 450 research papers in well-respected journals and conference proceedings. His co-authored and co-edited books include two textbooks: Machine Vision (published in 1995) and Multimedia Computing (published in 2014). Ramesh has been elected Fellow of ACM, IEEE, IAPR, AAAI, and SPIE. He is the recipient of several awards including the ACM SIGMM Technical Achievement Award 2010.
Ramesh co-founded multiple companies (Imageware, Virage, Praja, Seraja, mChron, and Krumbs), managed them in initial stages, and then turned them over to professional management. Currently, he is working with Krumbs — a Visual Web company. He enjoys working with companies, is involved in research, and enjoys writing. His current research is in Social Life Networks, Objective Self, and Visual Web.
Keynote 2: Shih-Fu Chang (Columbia University), IEEE/AAAS Fellow, ACM SIGMM Chair
Talk title: New Frontiers of Large Scale Multimedia Information Retrieval
Abstract: Multimedia information retrieval aims to automatically extract useful information from large collections of images, videos, and combinations with other media. It’s now possible to search for information over millions of products with just an example image on a mobile device. Intelligent apps are being deployed today to automatically generate captions of images or videos at a sophistication level that could not be imagined before. In this talk, I will review core technologies involved in past achievements and discuss opportunities ahead. First, I will discuss research on extracting dynamic event information from unconstrained videos, touching on the challenges of event detection and localization, and fusion of video information in a large camera network with social media content. Second, instead of relying on fixed categories used in most retrieval systems, we introduce a new paradigm to automatically discover new multimedia entities and relations when entering new application domains. Last, to support emerging applications requiring higher level information, I will discuss efforts in understanding sentiments and emotions expressed in images on social media and their differences across languages and cultures.
Bio: Shih-Fu Chang is the Sr. Executive Vice Dean and the Richard Dicker Professor of The Fu Foundation School of Engineering and Applied Science at Columbia University. His research is focused on multimedia information retrieval, computer vision, machine learning, and signal processing. A primary goal of his work is to develop intelligent systems that can harness rich information from the vast amount of visual data, such as that emerging on the Web, collected through pervasive sensing, or stored in gigantic archives. A consistent theme of his research is turning unstructured multimedia data into searchable information. His work on content-based visual search in the early 1990s, VisualSEEk and VideoQ, set the foundation of this vibrant area. Over the years, he has continued to create innovative techniques for image/video recognition, multimodal analysis, visual information ontology, image authentication, and compact hashing for large-scale image databases. He also applies these novel capabilities to multi-source news video search, mobile search, 3D object search, and brain machine interfaces. The impact of his work can be seen in more than 300 peer-reviewed publications, numerous paper awards, more than 30 issued patents, and technologies licensed to six companies. For his long-term pioneering contributions, he has been awarded the IEEE Signal Processing Society Technical Achievement Award, the ACM Multimedia Special Interest Group Technical Achievement Award, an Honorary Doctorate from the University of Amsterdam, the IEEE Kiyo Tomiyasu Award, and the IBM Faculty Award. For his dedicated contributions to education, he received the Great Teacher Award from the Society of Columbia Graduates. He served as Chair of the Columbia Electrical Engineering Department (2007-2010), the Editor-in-Chief of the IEEE Signal Processing Magazine (2006-2008), and advisor for several international research institutions and companies.
In his current capacity in Columbia Engineering, he plays a key role in the School's strategic planning, special research initiatives, international collaboration, and faculty development. He is a Fellow of the American Association for the Advancement of Science (AAAS) and IEEE.
Invited talk 1: Wenwu Zhu (Tsinghua University)
Talk title: When AI meets Multimedia: Cross-media Inference
Bio: Wenwu Zhu is currently a Professor and Deputy Head of the Computer Science Department at Tsinghua University. Prior to his current post, he was a Senior Researcher and Research Manager at Microsoft Research Asia. He was the Chief Scientist and Director at Intel Research China from 2004 to 2008. He worked at Bell Labs New Jersey as a Member of Technical Staff during 1996-1999. He received the Ph.D. degree in Electrical and Computer Engineering from New York University Polytechnic School of Engineering in 1996.
Wenwu Zhu is an IEEE Fellow, SPIE Fellow, and ACM Distinguished Scientist. He has published over 200 refereed papers in the areas of multimedia computing, communications and networking. He is the inventor or co-inventor of over 50 patents. He has received six Best Paper Awards, including those from ACM Multimedia 2012 and IEEE Transactions on Circuits and Systems for Video Technology in 2001. He also received the State Natural Science Award (2nd place) in China in 2012. His current research interests are in the areas of multimedia big data computing, Cyber-Physical-Human big data computing, and multimedia communications and networking. He has served on various editorial boards, including as Guest Editor for the Proceedings of the IEEE, IEEE T-CSVT, and IEEE JSAC, and as Associate Editor for IEEE Transactions on Mobile Computing, IEEE Transactions on Multimedia, IEEE Transactions on Circuits and Systems for Video Technology, and IEEE Transactions on Big Data. He serves on the steering committee for IEEE Transactions on Multimedia (2015-present) and served on the steering committee for IEEE Transactions on Mobile Computing (2007-2010). He served as TPC Co-chair for ACM Multimedia 2014 and IEEE ISCAS 2013.
Invited talk 2: Chong Wah Ngo (City University of Hong Kong)
Talk title: Food Recognition: Challenge and Opportunity
Abstract: In multimedia, dish recognition is regarded as a difficult problem because the shape and color of food vary widely with different cooking and cutting methods. As a result, while there is a large number of cooking recipes posted on the Internet, finding the right recipe for a food picture remains a challenge. The problem is also shared among health-related applications. For example, food-log management, which records daily food intake, often requires manual input of food/ingredients for nutrition estimation. This talk will share with you the challenge of recognizing ingredients, in Chinese dishes particularly, for recipe retrieval. Recognizing Chinese dishes is challenging due to non-standardized cooking methods, diverse appearances of dishes, and wild composition of ingredients. I will introduce two deep neural architectures that explore the mutual but fuzzy relationship between food and ingredients for recognition. The learnt deep features are used for cross-modal retrieval of food and recipes.
Bio: Chong-Wah Ngo is a Professor in the Dept. of Computer Science, City University of Hong Kong. He received his Ph.D. in Computer Science from the Hong Kong University of Science & Technology (HKUST), and his MSc and BSc, both in Computer Engineering, from Nanyang Technological University in Singapore. His research interests are in multimedia search and computing. He has served on the technical program committees of numerous multimedia and information retrieval conferences including ACM Multimedia (MM), ACM SIGIR, International Conf. on Multimedia Retrieval (ICMR) and International Conf. on Multimedia and Expo (ICME). He was an Associate Editor of IEEE Trans. on Multimedia (2011-14), and is on the editorial board of Journal of Multimedia Data Engineering and Management, and Journal of Advances in Multimedia. He served as program area chair of ICME 2015 and ICPR 2014, program co-chair of ACM ICIMCS 2011, ACM MMM 2012 and ACM ICMR 2012, and conference chair of PCM 2014 and ACM ICMR 2015. He is the founding director of VIREO (VIdeo REtrieval grOup). He served as the chairman of the ACM Hong Kong Chapter during 2008 and 2009.
Invited talk 3: Xian-Sheng Hua (Alibaba)
Talk title: Big Visual Data Computing on Cloud and into Industries
Bio: Dr Xian-Sheng Hua became a Researcher and Senior Director of Alibaba Group in April of 2015, leading the multimedia technology team in the Search Division. Before that, he had been a senior researcher at Microsoft Research Redmond since 2013, working on Web-scale image and video understanding and search, as well as related applications. From 2011, he was a Principal Research and Development Lead in Multimedia Search for the Microsoft search engine, Bing, where he led a team that designed and delivered leading-edge media understanding, indexing and searching features. He joined Microsoft Research Asia in 2001 as a researcher. Since then, his research interests have been in the areas of multimedia search, advertising, understanding, and mining, as well as pattern recognition and machine learning. He has authored or co-authored more than 250 research papers in these areas and has filed more than 90 patents. Dr Hua received his BS in 1996 and PhD in applied mathematics in 2001 from Peking University, Beijing. He served or is now serving as an associate editor of IEEE Transactions on Multimedia, an associate editor of ACM Transactions on Intelligent Systems and Technology, an editorial board member of Advances in Multimedia and Multimedia Tools and Applications, and an editor of Scholarpedia (multimedia category). He has served as vice program chair, workshop organizer, senior TPC member, area chair, demonstration, tutorial, and special session chair, and PC member for many other international conferences. He served as a program co-chair for IEEE ICME 2013, ACM Multimedia 2012, and IEEE ICME 2012, as well as on the Technical Directions Board of the IEEE Signal Processing Society. He was honored as one of the recipients of the prestigious 2008 MIT Technology Review TR35 Young Innovator Award for his outstanding contributions to video search.
He won the Best Paper and Best Demonstration Awards at ACM Multimedia 2007, the Best Poster Award at the IEEE International Workshop on Multimedia Signal Processing 2008, the Best Student Paper Award at the ACM Conference on Information and Knowledge Management 2009, the Best Paper Award at the International Conference on MultiMedia Modeling 2010, the Best Demonstration Award at ICME 2014, and the Best Paper Award of IEEE Trans. on CSVT in 2014. He was named one of Global Entrepreneur's "Business Elites of People under 40 to Watch" in 2009. He is a Fellow of the IEEE and an ACM Distinguished Scientist.
Invited talk 4: Tao Mei (Microsoft Research Asia)
Talk title: Vision and Language: Bridging Vision and Language with Deep Learning
Abstract: Visual recognition has been a fundamental challenge in computer vision for decades. Thanks to the recent development of deep learning techniques, researchers are striving to bridge vision (image and video) and natural language, which has become an emerging research area. We will present a few recent advances bridging vision and language with deep learning techniques, including image and video captioning, image and video chatting, storytelling, vision and language grounding, datasets, grand challenges, and open issues.
Bio: Tao Mei is a Lead Researcher with Microsoft Research, Beijing, China. His current research interests include multimedia analysis and retrieval, and computer vision. In particular, he is interested in applying the techniques from these areas to a broad range of multimedia and vision applications, such as video analytics, deep learning, personal media, multimedia search, and social and mobile applications. Tao has shipped a dozen inventions and technologies to Microsoft products, such as Bing, Office, MSN, OneDrive, Azure, and XiaoIce. He has authored or co-authored over 100 papers in journals and conferences and 10 book chapters, and has edited four books. He holds more than 16 granted U.S. patents and has more than 20 pending.
Tao was the recipient (together with his interns) of 10 paper awards from prestigious multimedia journals and conferences, including the IEEE Communications Society MMTC Best Journal Paper Award in 2015, the IEEE Circuits and Systems Society Circuits and Systems for Video Technology Best Paper Award in 2014, the IEEE Trans. on Multimedia Prize Paper Award in 2013, Best Paper Awards at ACM Multimedia in 2009 and 2007, and the Best Student Paper Award at IEEE VCIP in 2012. He was the principal designer of the automatic video search system that achieved the best performance in the worldwide TRECVID evaluation in 2007. He received the Microsoft Gold Star Award in 2010, the Spot Award in 2014, and the Special Stock Award in 2016. He is an Editorial Board Member of IEEE Trans. on Multimedia (TMM), ACM Trans. on Multimedia Computing, Communications, and Applications (TOMM), Machine Vision and Applications (MVA), and Multimedia Systems (MMSJ), and was an Associate Editor of Neurocomputing and a Guest Editor of eight international journals. He is the General Co-chair of ACM ICIMCS 2013, the Program Co-chair of ACM Multimedia 2018, IEEE ICME 2015, IEEE MMSP 2015 and MMM 2013, and the Area Chair for a dozen international conferences. He is a Senior Member of the IEEE and the ACM.
Tao received B.E. and Ph.D. degrees from the University of Science and Technology of China, Hefei, China, in 2001 and 2006, respectively. He is an Adjunct Professor (PhD advisor) at the University of Science and Technology of China and Sun Yat-Sen University.
Panel Discussion
Tentative title: Challenges and Opportunities when Multimedia Meets Big Data
Tentative list of panelists: all of the invited speakers listed above.