Call for Projects: Joint Research Collaboration with Microsoft Research Asia

  1. Purpose

Selection of joint research collaboration projects* with Microsoft Research Asia

* Conducted as a 'global company-linked' task under the 2021 Global Core Talent Development Support Program (Ministry of Science and ICT). This project counts toward the maximum number of concurrently allowable R&D tasks under Article 64(2)4 of the Enforcement Decree of the National R&D Innovation Act (limit on the number of R&D tasks).

 

  2. Operating Principles

Projects are selected through an open call for creative ideas matching the research themes chosen by Microsoft Research Asia

During the project, Microsoft Research Asia matches each team with its experts for joint research collaboration

Joint researchers* are dispatched to Microsoft Research Asia for six months

* Researchers (master's/doctoral students) at the joint research institution (a Korean university)

 

  3. Program Overview

A. Project Support

– Funding scale: approx. KRW 1.5–1.6 billion in total*, 15 projects

* Budget from the Global Core Talent Development Support Program (MSIT): KRW 1.35 billion

– Research areas:

 

Advanced Machine Learning algorithms

We call for research proposals targeting the development of low-sample-complexity, fast, and lightweight algorithms for speech, natural language, computer vision, and games.

Topics:

  • Neural speech synthesis
  • Off-policy reinforcement learning
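To make the off-policy topic concrete, here is a minimal sketch of off-policy evaluation via inverse propensity scoring, a standard building block of off-policy reinforcement learning; the log format and function name are hypothetical illustrations, not part of the call.

```python
def ips_estimate(logs, target_policy):
    """Estimate the target policy's expected reward from logged data
    using inverse propensity scoring (importance sampling).

    logs: list of (action, logged_prob, reward) tuples collected by a
          behavior policy; target_policy maps action -> probability.
    """
    total = 0.0
    for action, logged_prob, reward in logs:
        weight = target_policy[action] / logged_prob  # importance weight
        total += weight * reward
    return total / len(logs)

# Behavior policy picked actions uniformly (prob 0.5 each);
# action "a" always pays 1.0, action "b" pays 0.0.
logs = [("a", 0.5, 1.0), ("b", 0.5, 0.0), ("a", 0.5, 1.0), ("b", 0.5, 0.0)]
# A target policy that always picks "a" should be valued near 1.0.
print(ips_estimate(logs, {"a": 1.0, "b": 0.0}))  # -> 1.0
```

Reweighting logged rewards this way lets a new policy be evaluated without deploying it, which is the core difficulty off-policy RL methods must handle at scale.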

 

Next Generation Recommendation Systems

Information overload has become a huge challenge for online users. To alleviate this problem, recommendation systems play an increasingly important role in Internet services, and they remain a hot topic in both industry and academia. In this theme, we encourage researchers to collaborate and tackle important challenges in recommendation systems, including data heterogeneity, data sparsity, and lack of interpretability. We welcome research proposals that aim to improve recommendation performance by leveraging recent progress in deep learning, natural language understanding, and knowledge graphs. Priority will be given to proposals that leverage MIND (https://msnews.github.io/), a large-scale news recommendation dataset shared by Microsoft, or contribute to Microsoft Recommenders (https://github.com/microsoft/recommenders), an open-source repository that helps developers build their own recommendation systems more efficiently. We believe that personalized recommendation systems will continue to develop in various directions, including effectiveness, diversity, computational efficiency, and interpretability, ultimately addressing the problem of information overload.

Topics:

  • Deep learning based recommendation
  • Deep learning based user representation
  • Responsible recommendation systems
  • Reinforcement learning based recommendation
  • Knowledge aware recommendation
  • Graph learning based recommendation
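As a toy illustration of embedding-based recommendation, the sketch below fits user and item embeddings to interaction data with plain SGD (classic matrix factorization, the ancestor of the deep models above); all names and data here are hypothetical.

```python
import random

def train_mf(ratings, n_users, n_items, k=4, lr=0.05, epochs=200, seed=0):
    """Factorize (user, item, rating) triples into user/item embeddings
    with SGD on squared error -- the core idea behind embedding-based
    recommenders."""
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(U[u][f] * V[i][f] for f in range(k))
            err = r - pred
            for f in range(k):  # simultaneous update of both factors
                U[u][f], V[i][f] = (U[u][f] + lr * err * V[i][f],
                                    V[i][f] + lr * err * U[u][f])
    return U, V

def predict(U, V, u, i):
    return sum(a * b for a, b in zip(U[u], V[i]))

# Tiny toy interactions: user 0 strongly prefers item 0 over item 1.
ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0), (1, 1, 2.0)]
U, V = train_mf(ratings, n_users=2, n_items=2)
assert predict(U, V, 0, 0) > predict(U, V, 0, 1)
```

Deep-learning-based recommenders replace the inner product with learned neural interaction functions, but the embedding-fitting loop is structurally the same.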

 

AI-Powered Media Experiences

Going forward, we will see an increasing number of people working remotely. Even without the global pandemic, working remotely would be a great option for many people for a variety of reasons, for example, reducing carbon emissions from transportation. At Microsoft, we have a vision to provide intelligent media services that help users collaborate and hold discussions effectively and inclusively while working remotely. AI and multimedia computing are the two pillars of the strategy to realize this vision. For example, computer vision and audio enhancement technologies could power new media experiences for online meetings, as we have developed in Microsoft Teams. AI-based optimization of real-time communication (RTC), as a paradigm shift in software development, could also fundamentally improve user experiences in audio, video, and screen sharing. At Microsoft Research Asia, we have invested in core AI and media computing technologies for many years. We would like to collaborate closely with professors and researchers in academia to advance fundamental research in AI-powered rich media communication. The collaborative topics include, but are not limited to, real-time computer vision, audio enhancement, augmented reality, media compression, and AI-based RTC optimization.

Topics:

  • Real-time computer vision
  • ML-based audio enhancement
  • ML-based video codec

 

Neural Machine Translation

Machine translation is a crown jewel of natural language processing, and research on it has greatly advanced the field as a whole; in particular, machine translation technologies built on the Transformer have achieved great success in many natural language understanding and generation tasks. At the same time, machine translation itself still faces many open problems, such as low-resource language translation, multilingual translation, translation robustness, and document-level coherence and consistency. We welcome research proposals exploring these issues.

Topics:

  • Pre-training for machine translation
  • Low-resource language translation
  • Multilingual machine translation
  • Context-aware/document-level translation
  • Translation robustness
  • Novel machine translation modeling
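As an illustration of the decoding side of NMT, here is a toy beam-search sketch; for simplicity the per-step token probabilities are fixed tables, whereas a real Transformer decoder conditions them on the decoded prefix and the source sentence.

```python
import math

def beam_search(step_probs, beam_size=2):
    """Toy beam-search decoder. step_probs[t] maps each candidate token
    to its probability at step t (context-independent here for brevity)."""
    beams = [((), 0.0)]  # (token sequence, accumulated log-probability)
    for probs in step_probs:
        candidates = []
        for seq, score in beams:
            for tok, p in probs.items():
                candidates.append((seq + (tok,), score + math.log(p)))
        # Keep only the top `beam_size` partial hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    return beams[0][0]

steps = [{"the": 0.6, "a": 0.4}, {"cat": 0.7, "dog": 0.3}]
print(beam_search(steps))  # -> ('the', 'cat')
```

Beam width trades decoding cost against search quality, one of the practical knobs behind the efficiency concerns discussed in this call.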

 

Spoken Language Processing

Spoken language processing draws on the latest advances and techniques from multiple fields: computer science, electrical engineering, acoustics, linguistics, mathematics, psychology, and beyond. Given a raw speech signal, spoken language processing tries to enhance it by removing noise, separate the signal for each speaker, recognize it as text, analyze its semantic meaning, generate a proper response, and synthesize that response into speech output. With the rapid development of deep learning and the increasing availability of training data, remarkable progress is being made in spoken language processing; for example, automatic speech recognition has achieved human parity in specific domains. However, big challenges remain for further exploration, such as how to leverage large-scale unlabeled speech data, how to improve model accuracy in low-resource scenarios, and how to deal with the cocktail party problem in complex scenarios. To push this direction forward, we welcome research proposals exploring these issues.

Topics:

  • Pre-training for Speech
  • Automatic Speech Recognition
  • Multilingual ASR
  • Low Resource ASR
  • Speech Separation
  • Speech Translation
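As a small concrete piece of the ASR pipeline, here is a sketch of the label-collapsing rule used in CTC decoding (merge repeated labels, then drop blanks); the frame labels below are a toy example.

```python
def ctc_collapse(frames, blank="_"):
    """Collapse a frame-level label sequence the way CTC decoding does:
    merge consecutive repeats, then drop blank symbols."""
    out = []
    prev = None
    for label in frames:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return "".join(out)

# Frame-level outputs from an acoustic model, one label per time frame;
# the blank between the two "l" runs preserves the double letter.
print(ctc_collapse(list("hh_e_ll_llo")))  # -> "hello"
```

The blank symbol is what lets CTC represent repeated characters and frame-to-character alignment without explicit segmentation labels.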

 

Efficient Multi-Modal Pre-Training

Recent advances in multi-modal pre-training have explored the power of pre-training on large-scale data to boost performance on many multi-modal tasks (e.g., visual question answering, image-text retrieval, video localization, speech recognition). Despite the great success of pre-trained models, their large number of parameters and heavy computation requirements have limited their application in real-world scenarios and their exploration by the broader research community. To address this challenge and further advance multi-modal research, we welcome research proposals that aim to design fast, lightweight, and efficient pre-training models for multi-modality. We believe efficient multi-modal pre-training models will bring the latest pre-training work into real-world scenarios in both the research and industry communities.

Topics:

  • Knowledge distillation for multi-modal pre-training
  • Model compression in multi-modal pre-training
  • Quantization in multi-modal pre-training
  • Multi-modal pre-training model optimization
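To illustrate the knowledge-distillation topic, here is a minimal sketch of the temperature-softened distillation objective (KL divergence between teacher and student output distributions); a real training setup would combine this with the task loss, and the logits below are toy values.

```python
import math

def softmax(logits, T=1.0):
    """Softmax with temperature T; higher T produces softer targets."""
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distill_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) over temperature-softened distributions,
    the standard knowledge-distillation objective."""
    p = softmax(teacher_logits, T)  # soft teacher targets
    q = softmax(student_logits, T)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

t = [2.0, 0.5, -1.0]
# A student matching the teacher incurs zero loss; a mismatched one does not.
assert distill_loss(t, t) < 1e-9
assert distill_loss([0.0, 0.0, 0.0], t) > distill_loss(t, t)
```

Softening with T > 1 exposes the teacher's relative preferences among non-top classes, which is the extra signal distillation transfers to a smaller student.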

 

Multimodal NLP

Learning joint representations of vision and language could lead to the next AI breakthroughs. Motivated by this, we propose multimodal NLP as a research theme and call for collaborations with professors and researchers from the academic community. The goal is to develop cutting-edge language-centered multimodality models for various tasks, such as commonsense knowledge learning from visual contents, text-to-image/video retrieval and generation, and image/video-based QA, reasoning, and captioning. From a research perspective, we hope the collaborations can lead to impactful research papers or achieve state-of-the-art results on the latest research-driven leaderboards.

Topics:

  • Language-centered Vision-Language Pre-training
  • Commonsense Knowledge Learning from Visual Contents
  • Text-to-Image/Video Retrieval and Generation
  • Image/Video-based QA, Reasoning and Captioning
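As one concrete form of vision-language pre-training, here is a minimal sketch of a symmetric contrastive (InfoNCE-style) loss over matched image/text embedding pairs; the embeddings are toy vectors rather than encoder outputs.

```python
import math

def contrastive_loss(image_embs, text_embs, temperature=0.1):
    """Symmetric InfoNCE loss: each image should score its own caption
    highest among all captions in the batch, and vice versa."""
    def norm(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]
    imgs = [norm(v) for v in image_embs]
    txts = [norm(v) for v in text_embs]
    n = len(imgs)
    # Cosine-similarity logits scaled by temperature.
    sims = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
            for i in imgs]
    def xent(row, target):  # cross-entropy of one softmax row
        m = max(row)
        logz = m + math.log(sum(math.exp(x - m) for x in row))
        return logz - row[target]
    i2t = sum(xent(sims[k], k) for k in range(n)) / n
    t2i = sum(xent([sims[j][k] for j in range(n)], k) for k in range(n)) / n
    return (i2t + t2i) / 2

# Correctly paired embeddings yield lower loss than shuffled pairs.
aligned = contrastive_loss([[1, 0], [0, 1]], [[1, 0], [0, 1]])
shuffled = contrastive_loss([[1, 0], [0, 1]], [[0, 1], [1, 0]])
assert aligned < shuffled
```

Minimizing this loss pulls matched image/text pairs together in a shared space, which is what enables zero-shot retrieval and downstream QA transfer.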

 

Multi-Modality Learning, Understanding, and Generation

Multi-modality information, e.g., language and vision, provides rich relations between modalities. There are opportunities to understand each modality better, to learn representations better in a supervised or unsupervised manner, and even to generate text and images. This is potentially useful for search engines (e.g., Google and Bing) and for content generation and editing (e.g., Office and Visual Studio). Motivated by this, we propose multi-modality learning, understanding, and generation as a research theme and call for collaborations with professors and researchers from the academic community. The goal is to develop cutting-edge models for various tasks over multiple modalities. From a research perspective, we hope the collaborations can lead to impactful research papers or achieve state-of-the-art results on the latest research-driven leaderboards.

Topics:

  • Multi-modality pretraining
  • Multi-modality understanding
  • Multi-modality content generation
  • Multi-modality process with universal networks
  • Image Retrieval, QA and Captioning
  • Video Retrieval, QA and Captioning

 

Large-scale Distributed Machine Learning System

The computational requirements for training DNNs have grown rapidly, now requiring hundreds of zettaFLOPs of computation to train a single state-of-the-art model such as GPT-3. To support this massive computation, it is now common practice to parallelize the training process across hundreds or thousands of machines, so large-scale distributed systems are becoming more and more critical. However, there are many challenges in achieving linear speedup and high resource utilization in such a system, which we divide largely into three categories. First, we need a more efficient networking system that enables many accelerators across machines to collaborate tightly with each other. Inter-accelerator collaboration is increasingly important for supporting large models and for splitting heavy operations to incorporate more cores and larger memory bandwidth efficiently. Second, we need to utilize cluster resources in an elastic way for higher resource efficiency and fault tolerance. Distributed training is very fragile, as it synchronizes many machines frequently and repeatedly over a long period, and it also tends to under-utilize accelerators as more of them are used in parallel, which is detrimental to overall cluster efficiency. We tackle this by leveraging elastic, dynamic re-adjustment of resource usage that seamlessly mitigates failures and further optimizes overall resource efficiency. Third, since DNN structures evolve every year, we need a framework that automatically finds optimization opportunities on the accelerator for an arbitrarily given DNN architecture. This requires careful co-design of hardware, compiler, scheduler, and libraries to reveal all kinds of hidden opportunities for further acceleration.

Topics:

  • Elastic Distributed ML system
  • System Benchmarking and Diagnosis
  • Collaborative Communication Library Acceleration
  • Hardware Acceleration for ML System
  • Kernel Optimization and Op Sharing
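To illustrate the communication pattern at the heart of data-parallel training, here is a minimal sketch of the gradient averaging that an all-reduce performs before each optimizer step; real collective libraries implement this with ring or tree algorithms over the network, and all names here are illustrative.

```python
def allreduce_mean(worker_grads):
    """Average per-worker gradients element-wise -- the reduction that a
    collective communication library performs in data-parallel training."""
    n = len(worker_grads)
    return [sum(g[i] for g in worker_grads) / n
            for i in range(len(worker_grads[0]))]

def sgd_step(params, worker_grads, lr=0.1):
    """One synchronous SGD step: reduce gradients, then update parameters."""
    g = allreduce_mean(worker_grads)
    return [p - lr * gi for p, gi in zip(params, g)]

# Two workers computed gradients on different data shards.
params = [1.0, 2.0]
grads = [[1.0, 0.0], [3.0, 2.0]]  # worker 0, worker 1
print(sgd_step(params, grads))  # -> [0.8, 1.9]
```

Because every worker must wait for this reduction at every step, its latency and any straggler directly bound training throughput, which motivates the communication and elasticity topics above.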

 

Enhancing Generalization for Reinforcement Learning Based Real Time Communications

Leveraging RL to boost RTC is a critical problem, best approached from a neutral standpoint. To accelerate the training process, an RTC Gym is needed to simulate realistic network conditions and user behavior. However, models trained in such a Gym commonly fail to perform well in the real world, which is known as the generalization issue. The root causes include: 1) not enough data for training; 2) no clear description or mathematical analysis of the data; and 3) no reasonable method in the Gym for generalizing the training data. MSRA has built an open global RTC research platform, OpenNetLab (ONL). ONL provides a suite of toolkits for RL-based RTC, including a gym, an emulator, and real testbeds. We are investigating how to better coordinate the tools in ONL for RL-based RTC. Currently, ONL supports collecting and storing Internet data, and we are seeking collaborations in the following areas: 1) Data analysis: processing ONL data and filtering features of network status and user behavior. 2) Data generalization: updating/generalizing the Gym training set based on daily analysis results. 3) Gym testing: quantifying the sim-to-real gaps.

Topics:

  • Realistic network/workload model
  • Gym for RTC application
  • ML/RL algorithm
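As one hypothetical way to quantify sim-to-real gaps, the sketch below compares a rate-control policy's average reward on simulated versus real bandwidth traces; the reward shape, traces, and function names are illustrative assumptions, not part of OpenNetLab.

```python
def avg_reward(policy, traces):
    """Run a rate-control policy over bandwidth traces and average its
    reward: throughput achieved minus a penalty for exceeding capacity."""
    total, steps = 0.0, 0
    for trace in traces:
        for capacity in trace:
            rate = policy(capacity)
            # Achieved throughput, minus a 2x penalty on overshoot
            # (standing in for induced loss and delay).
            reward = min(rate, capacity) - 2.0 * max(0.0, rate - capacity)
            total += reward
            steps += 1
    return total / steps

def sim_to_real_gap(policy, sim_traces, real_traces):
    """Gap = |average reward in simulation - average reward on real traces|."""
    return abs(avg_reward(policy, sim_traces) - avg_reward(policy, real_traces))

# A fixed-rate policy tuned for smooth simulated links looks worse on a
# bursty real trace; the gap metric quantifies that mismatch.
policy = lambda obs: 1.0
sim = [[1.0, 1.0, 1.0]]
real = [[1.0, 0.2, 1.5]]
print(round(sim_to_real_gap(policy, sim, real), 3))  # -> 0.8
```

Shrinking this gap, by enriching the Gym's trace distribution until simulated and real rewards agree, is exactly the data-generalization goal described above.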

 

Machine Learning-based Program Synthesis

Program synthesis is the task of enabling computers to automatically generate programs from various specifications (such as natural language descriptions, program sketches, and input/output examples). It has many promising applications: not only improving the productivity of software developers, but also providing friendlier human-computer interaction paradigms for the general public. A key challenge in program synthesis is the exponential program space, which makes the complexity of traditional program synthesis algorithms too high to be useful in practice. In this era of big data, there is an opportunity to address this challenge by learning from large-scale labeled and unlabeled data. We welcome research proposals that aim to contribute machine learning models and algorithms for the program synthesis problem.

Topics:

  • Program synthesis applications in different domains
  • Semantic parsing of natural language specifications
  • Unsupervised learning from big corpora of programs
  • Compositional generalization of machine learning models
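To make the exponential-search challenge concrete, here is a minimal bottom-up enumerative synthesizer over a tiny hypothetical arithmetic DSL; learned models would guide or prune exactly this kind of search.

```python
import itertools

def synthesize(examples, max_depth=2):
    """Bottom-up enumerative synthesis: grow expressions from the input x
    and small constants until one matches every input/output example."""
    # Each candidate is (source text, evaluation function).
    terminals = [("x", lambda x: x)] + [
        (str(c), (lambda c: lambda x: c)(c)) for c in (1, 2, 3)
    ]
    pool = list(terminals)
    for _ in range(max_depth):
        new = []
        for (sa, fa), (sb, fb) in itertools.product(pool, repeat=2):
            new.append((f"({sa} + {sb})",
                        (lambda fa, fb: lambda x: fa(x) + fb(x))(fa, fb)))
            new.append((f"({sa} * {sb})",
                        (lambda fa, fb: lambda x: fa(x) * fb(x))(fa, fb)))
        pool += new
        for src, fn in pool:
            if all(fn(i) == o for i, o in examples):
                return src
    return None

# Find a program consistent with f(1) = 3, f(2) = 5, f(3) = 7.
print(synthesize([(1, 3), (2, 5), (3, 7)]))
```

Even this toy grammar grows the candidate pool quadratically per depth level, which is why learned ranking and pruning of the search space is central to this theme.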

 

3D Content Creation for Learning and from Learning

While deep learning has demonstrated its capability for 2D image/video generation, deep-learning-based 3D content creation still faces several challenges. One challenge is the limited amount of 3D data in existing datasets, which constrains research and development of deep learning algorithms in this field. Another is how to fuse deep-learning-based 3D content creation methods with existing interactive modeling approaches for various 3D modeling/animation tasks. In this research theme, we aim to develop datasets and labeling tools, as well as algorithms, to tackle these two challenges.

Topics:

  • High quality 3D object database acquisition
  • Data labeling tools and system
  • Learning based 3D content creation algorithms/network design

 

Spatial/Human Understanding in Video

We are developing both cloud-based and edge-based intelligence engines that can turn raw video data into insights to facilitate various applications and services such as business intelligence (retail stores, offices), smart home intelligence, video augmented reality, etc. We take a human-centric approach that focuses on understanding humans, human attributes, and human activities in the scene. Successful applicants are required to commit to the open-source release of their outcomes as part of a collective Human SDK open-source project.

Topics:

  • Object/human detection and tracking
  • Object/human re-identification
  • Human pose estimation
  • Action recognition
  • 2D/3D scene understanding
  • Multi-modality (audio/visual/language) human understanding
  • Unsupervised/supervised model adaptation (to domains and downstream tasks)

 

Infuse AI to Empower Heterogeneous Devices and New Applications on the Edge

With recent advances in software and hardware, there is a computing paradigm shift from centralized cloud computing to distributed computing on the edge. Together with breakthroughs in AI, we advocate intelligent edge computing that infuses AI to empower heterogeneous devices and diverse AI applications on the edge. We call for proposals on intelligent edge computing that 1) advance the state of the art through top publications, prototypes, and open-source code; 2) build and deploy real edge systems to solve real problems and gain experience in the wild; and 3) leverage the Azure cloud to connect and enable new-generation devices and applications.

Topics:

  • Affordable AI models tailored for diverse hardware (model compression and optimization, AutoML etc.)
  • Efficient software stack best utilizing heterogeneous resources (system optimization, resource management and scheduling, software/hardware co-design etc.)
  • Privacy and security (user privacy, data and model protection etc.)
  • Learning on the edge (distributed learning, continuous learning, collaborative learning etc.)
  • New applications and scenarios (AIoT, AR, VR, gaming, 5G etc.)
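As a small illustration of the model compression/optimization topic, here is a sketch of uniform affine quantization of float weights to 8-bit integers, the basic step behind low-precision edge inference; the rounding scheme shown is the textbook form, and all names are illustrative.

```python
def quantize(weights, bits=8):
    """Uniform affine quantization: map floats in [min, max] to integers
    in [0, 2^bits - 1], returning the codes plus the scale/offset needed
    to reconstruct approximate floats."""
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels or 1.0  # avoid zero scale for constant weights
    q = [round((w - lo) / scale) for w in weights]
    return q, scale, lo

def dequantize(q, scale, lo):
    """Reconstruct approximate float weights from integer codes."""
    return [lo + qi * scale for qi in q]

w = [-1.0, -0.25, 0.0, 0.6, 1.0]
q, scale, lo = quantize(w)
w_hat = dequantize(q, scale, lo)
# 8-bit quantization keeps each weight within half a quantization step.
assert max(abs(a - b) for a, b in zip(w, w_hat)) <= scale / 2 + 1e-12
```

Storing 8-bit codes instead of 32-bit floats cuts model size 4x and enables integer arithmetic on edge accelerators, at the cost of this bounded rounding error.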

 

– Support period: May 1, 2021 – April 30, 2022

– Funding: project expenses (corporate portion: USD 10K; government portion: KRW 80–100 million) – KRW 90–120 million in total

* Calculation and use of government-portion project funds follow the ICT and Broadcasting R&D Management Regulations (further guidance to follow from IITP)

* The corporate portion is governed by a separate corporate-project contract

– Selection review: expert review (written) by Microsoft Research Asia

* Review results are communicated individually to selected projects only and are not made public

* Korean universities selected as joint research institutions must complete the procedures required to carry out the government task, such as concluding an agreement with IITP (each university signs on as a joint research institution)

 

B. Joint Researcher Dispatch

– Joint researchers: selected through a separate review

– Dispatch period: 6 months (planned for September 2021 – March 2022)

– Host institution: Microsoft Research Asia (Beijing, China)

 

  4. Eligibility

Each project team consists of students (2–5) and an advising professor

Students: full-time master's/doctoral students enrolled in an IT-related graduate program at a Korean university

* Korean nationals only (students on leave of absence and postdoctoral researchers are excluded)

Professor: a full-time faculty member in an IT-related department at a Korean university who can lead the project and supervise the students' research throughout the support period

 

  5. Application Procedure

Project call announcement -> proposal submission (online, using the application form, written entirely in English) -> selection review -> notification of selection -> agreement conclusion and payment of project funds

* Budgets in the application form should be based on the corporate portion of USD 10K; government-portion project funds will be announced separately after selection notification

 

  6. Notes for Applicants

A project team may apply in only one research area

Applications that do not meet the eligibility requirements may be excluded from the selection review

* Individuals and institutions currently restricted from participating in national R&D programs may not apply

 

  7. How to Apply

Application method: by email ([email protected])

Application deadline: March 20, 2021 (Sat), 17:00

* Submitted documents will not be returned

 

  8. Contact

Program contact: Miran Lee, Executive Director, Microsoft Research (010-3600-4226, [email protected])

 
