Multimodal information processing and analysis
Theodoros Giannakopoulos
Modern multimedia databases can contain millions of files, such as videos, digital music collections, image archives and measurements from wearables. High-level semantic descriptions of such multimedia content are crucial for several application domains, such as hybrid recommender and personalization systems, multimodal security applications, environmental monitoring and health monitoring systems. Until recently, machine learning and data mining research focused either on structured types of data (such as text or metadata) or on artificial supervised and/or unsupervised benchmarks that differ significantly from real-world, multimodal use cases. Recent advances in computer vision, speech recognition and deep learning have provided the scientific community with the “tools” to address a wide range of machine learning applications that take multiple modalities into consideration.
This course provides an introduction to the processing and analysis of all basic modalities from which high-level semantic information can be extracted using modern machine learning: audio, speech, music, images and video. The goal of the course is to give students hands-on experience with real-world multimodal signal analysis tasks such as music information retrieval, video characterization and retrieval, video segmentation, emotion recognition and multimodal content-based recommendation. Basic concepts from related courses are introduced, such as speech and audio processing and analysis, computer vision and image processing. However, the overall goal of the course is to provide “horizontal”, practical knowledge of modern multimodal information analysis tasks.