Behind the Code: Meet Young-Suk Lee
Young-Suk Lee first learned about machine translation while studying English linguistics and literature. Now as a machine translation developer, she works on the multilingual systems that power Watson Language Translator. We talked to her about her first exposure to machine learning tech, the themes that run through her multiple projects, how she draws inspiration from her colleagues, and where she sees IBM using language innovation to solve the world’s biggest problems.
What is your current role at IBM?
I am a Research Staff Member in the Multilingual Natural Language Processing group led by Radu Florian.
I play a significant role in three projects with very different goals. I develop multilingual machine translation systems for Watson Language Translator, which focuses on productization of multilingual translation systems.
I’m involved in external customer engagement for patent translation, which focuses on customer support.
I do R&D in the research challenge Fundamental Advances in Semantic Parsing, which focuses on developing state-of-the-art technologies and systems for AMR (abstract meaning representation) parsing.
What does a day at IBM entail?
I work as if every day is my first day of work at IBM: I wake up every morning excited about the projects I work on, and a bit nervous about the things to accomplish and the new skills to learn. The first thing I do every morning is determine the project I will be working on, which depends on the discussions I have had with my management and the urgency of the task.
The most rewarding accomplishments so far this year, which I see grow organically with a strong ecosystem, include multilingual translation between English and languages of the Indian subcontinent (Bengali, Gujarati, Malayalam, Nepali, Sinhala, Tamil, and Telugu). The system adopts multitask learning to achieve optimal performance for each language pair, while sharing the data across all languages to deliver a system greater than the sum of its parts.
For the AMR project, I have developed a technique for creating synthetic data. This technique, combined with transfer learning in my domain adaptation work for QALD (Question Answering over Linked Data), eventually led to state-of-the-art performance for an AMR parser without any additional human-annotated data.
A common theme in all of my research, regardless of the application area, is to develop unsupervised data acquisition techniques scalable to any domain and language, which create business value by saving time and money for resource creation.
The only thing that has changed in recent years is that I try to allocate as much time as possible to interacting and working with my colleagues and others, primarily because I have learned that relationships are something we have to work on to improve, and the greatest accomplishments are most often the outcome of great teamwork.
And I am thankful every day that I have a job I love and enjoy SO MUCH! I feel like I am the most fortunate person in the world and shout out loud every morning.
Tell us about your background and why you decided to pursue a career in tech.
I came into tech from a humanities background, to work on machine translation. I grew up in South Korea and majored in English literature in college. I was exposed to machine translation while pursuing my master’s degree in English linguistics and became passionate about it.
I was a full-time teaching assistant at the Language Research Institute at Seoul National University, South Korea. The institute hosted a monthly interdisciplinary luncheon where professors of computer science and humanities got together and discussed current topics in language and computers. One of the professors, the only professor in the country working on machine translation back then, described the machine translation system developed by his team.
I got fascinated by the idea that machines can translate from one language to another, which is hard even for a human. We had mandatory education in English for 6 years in middle and high school, and another 4 years in college. And yet we struggle so much communicating in English. So I wanted to learn more about the technology and algorithms behind machine translation. I was very fortunate to be in an extraordinary school where I got exposed to the new world of AI.
As I was finishing the master’s program, I started applying to graduate schools in the U.S. with the ultimate goal of studying machine translation. I decided to go to the University of Pennsylvania, which offered me a full scholarship (the William Penn fellowship) for my graduate studies and had the strongest natural language processing program. While pursuing my PhD, I received an MSE in computer science and engineering, which eventually made it easier for me to switch my career from linguistics to computer science.
Who is your role model in tech?
Pretty much every manager and colleague I have interacted with in the Multilingual Natural Language Processing group has served as my role model. The technical competence of every manager I have had in my almost 20 years of service at IBM has reminded me of why IBM Research is a premier research lab, weathering the constantly changing landscapes in the tech world.
All of the colleagues I have collaborated with aspire to be the top technical person with the utmost integrity. By collaborating with them I have come to realize that all great accomplishments are the outcomes of teamwork, and synergy among team members is what leads to the greatest results.
What tech trends do you see for 2020?
The unprecedented pandemic is demanding ever more user-centric multilingual AI applications that run on hybrid cloud for real-time global communications. High-performance, often close to human-performance, multilingual natural language processing applications available from Watson IBM Cloud, such as Watson Language Translator, Watson Assistant, and the question-answering system GAAMA, will accelerate global internet communications, overcoming language barriers.
IBM has already responded to the pandemic by integrating high-performance multilingual natural language processing capabilities. Examples include the multilingual COVID-19 response chatbot launched by the Government of India, which uses Watson Language Translator (WLT) and Watson Assistant, and the use of Go Ahead Ask Me Anything (GAAMA) technology to extract answers from the COVID-19 Open Research Dataset (CORD-19) of scientific articles, just to name a few. I see rapid innovation in this space by IBM to solve one of the world’s most important problems.