• 1932 November 18 (b.) - 2010 September 14


A Czech-American researcher in information theory, automatic speech recognition, and natural language processing, he was well known for his oft-quoted statement, "Every time I fire a linguist, the performance of the speech recognizer goes up".
He was born Bedřich Jelínek in Kladno, Czechoslovakia, before the outbreak of World War II, to Vilém and Trude Jelinek. His father was Jewish; his mother was born in Switzerland to Czech Catholic parents and had converted to Judaism. His father, a dentist, had planned early for an escape to England; he arranged for a passport, visa, and the shipping of his dentistry materials. The couple planned to send their son to an English private school. However, Vilém decided at the last minute to stay and was eventually sent to the Theresienstadt concentration camp, where he died in 1945. The family was forced to move to Prague in 1941, but Bedřich, along with his sister and mother, escaped the concentration camps thanks to his mother's background.
After the war, he entered the gymnasium, a type of school that emphasizes academic learning and provides advanced secondary education in some parts of Europe and the CIS, comparable to British grammar schools, sixth form colleges and U.S. preparatory high schools. He did this despite having missed several years of schooling because education of Jewish children had been forbidden since 1942. His mother, anxious that her son should get a good education, made great efforts to arrange their emigration, especially when it became clear that he would not be allowed even to attempt the graduation examination. He immigrated with his family to the United States in the early years of the communist regime. He studied engineering in evening classes at the City College of New York and received stipends from the National Committee for a Free Europe that allowed him to study engineering at the Massachusetts Institute of Technology.
About his choice of specialty, he said: "Fortunately, to electrical engineering there belonged a discipline whose aim was not the construction of physical systems: the theory of information". In 1961, he married Czech screenwriter Milena Jelinek. He obtained his Ph.D. in 1962, with Robert Fano as his adviser. Having developed an interest in linguistics, he planned to work with Charles F. Hockett at Cornell University after completing his graduate studies. These plans fell through, and during the next ten years he continued to study information theory while teaching at Cornell.
Having previously worked at IBM during a sabbatical, he began full-time work at IBM Research in 1972, at first on leave from Cornell and permanently from 1974. He remained there for over twenty years. Although at first he had been offered a regular research job, upon his arrival he learned that Josef Raviv had recently been promoted to head of the newly opened IBM Haifa Research Laboratory, and Jelinek became head of the Continuous Speech Recognition group at the Thomas J. Watson Research Center. At IBM, his team revolutionized approaches to computer speech recognition and machine translation. Despite his team's successes in this area, his work remained little known in his home country because Czech scientists were not allowed to participate in key conferences.
He had begun to develop an interest in linguistics after the immigration of his wife, who initially enrolled in the MIT linguistics program with the help of Roman Jakobson. He often accompanied her to Chomsky's lectures, and even discussed the possibility of changing orientation with his adviser. Fano was "really upset", and after the failure of his project with Hockett at Cornell, Jelinek did not return to this field of research until starting work at IBM. The scope of research at IBM was considerably different from that of most other teams.
According to Liberman, "While [Jelinek] was leading IBM's effort to solve the general dictation problem during the decade or so following 1972, most other U.S. companies and academic researchers were working on very limited problems, or were staying out of the field entirely". Steve Young said in 2010, "He was not a pioneer of speech recognition; he was the pioneer of speech recognition."
He regarded speech recognition as an information theory problem (a noisy channel, in this case the acoustic signal), which some observers considered a daring approach. The concept of perplexity was introduced in their first model, New Raleigh Grammar, which was published in 1976 as the paper "Continuous Speech Recognition by Statistical Methods" in the journal Proceedings of the IEEE. According to Young, the basic noisy channel approach "reduced the speech recognition problem to one of producing two statistical models". Whereas New Raleigh Grammar was a hidden Markov model, their next model, called Tangora, was broader and involved n-grams, specifically trigrams. Even though "it was obvious to everyone that this model was hopelessly impoverished", it was not improved upon until he presented another paper in 1999. The same trigram approach was applied to phones in single words. Although the identification of parts of speech turned out not to be very useful for speech recognition, tagging methods developed during these projects are now used in various NLP applications.
The incremental research techniques developed at IBM eventually became dominant in the field after DARPA, in the mid-1980s, returned to NLP research and imposed that methodology on participating teams, along with shared common goals, data, and precise evaluation metrics. The Continuous Speech Recognition Group's research, which required large amounts of data to train the algorithms, eventually led to the creation of the Linguistic Data Consortium.
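For readers who want the formulas behind the noisy channel approach, trigrams, and perplexity mentioned above, they are commonly written as below. This is the standard textbook form rather than a quotation from the 1976 paper, and the symbols are notation assumed here: A stands for the acoustic observations, W for a candidate word sequence w_1 ... w_N, and N for the number of words.

```latex
\begin{align*}
% Noisy-channel decoding: pick the word sequence most probable given the acoustics
\hat{W} &= \arg\max_{W} P(W \mid A) = \arg\max_{W} P(A \mid W)\, P(W) \\
% Trigram approximation of the language model P(W)
P(W) &\approx \prod_{i=1}^{N} P(w_i \mid w_{i-2}, w_{i-1}) \\
% Perplexity of the language model on the word sequence
\mathrm{PP}(W) &= P(w_1, \dots, w_N)^{-1/N}
\end{align*}
```

In this notation, P(A | W) is the acoustic model and P(W) is the language model, which together correspond to the "two statistical models" in Young's description. A lower perplexity means the language model finds the word sequence less surprising.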
In the 1980s, although the broader problem of speech recognition remained unsolved, they sought to apply the methods they had developed to other problems: machine translation and stock value prediction. A group of IBM researchers went on to work for Renaissance Technologies. He wrote, "The performance of the Renaissance fund is legendary, but I have no idea whether any methods we pioneered at IBM have ever been used. My former colleagues will not tell me: theirs is a very hush-hush operation!" Methods very similar to those developed for speech recognition are at the base of most machine translation systems in use today. Observers have said that Pierce's paradigm, according to which engineering achievements in this area would be built on scientific progress, has been inverted, with engineering achievements forming the basis of a number of scientific findings.
After the 1989 fall of communism, he helped establish scientific relationships with his home country, regularly visiting to lecture and helping to persuade IBM to establish a computing center at Charles University. In 1993, he retired from IBM and went to Johns Hopkins University to be the Director of the Center for Language and Speech Processing. He was also the Julian Sinclair Smith Professor of Electrical and Computer Engineering. He worked there for 17 years and was still working there when he died of a heart attack at the close of an otherwise normal workday. He was survived by his wife, daughter and son.
His works won "best paper" awards on several occasions, and he received a number of company awards while he worked at IBM. He received the Society Award for "outstanding technical contributions and leadership" from the IEEE Signal Processing Society for 1997, and the ESCA Medal for Scientific Achievement in 1999. He was a recipient of an IEEE Third Millennium Medal in 2000, the ELRA's first Antonio Zampolli Prize in 2004, the 2005 James L. Flanagan Speech and Audio Processing Award, and the 2009 Lifetime Achievement Award from the Association for Computational Linguistics. He received an honoris causa Ph.D. from Charles University in 2001, was elected to the National Academy of Engineering in 2006 and was made one of twelve inaugural fellows of the International Speech Communication Association in 2008.
Among the numerous works he authored or co-authored are:
  • (1968) "Probabilistic Information Theory: Discrete and memoryless models". McGraw-Hill series in systems science. New York: McGraw-Hill. 689p. LCCN 68-11611.
  • (1969) "Fast sequential decoding algorithm using a stack". IBM Journal of Research and Development 13(6):675–685. doi:10.1147/rd.136.0675.
  • (1976) "Continuous speech recognition by statistical methods". Proceedings of the IEEE 64(4):532–556. doi:10.1109/PROC.1976.10159.
  • with P. Brown, J. Cocke, S. Della Pietra, V. Della Pietra, R. Mercer and P. Roossin (1988). "A statistical approach to language translation". In Dénes Vargha, ed. Coling 88: Proceedings of the 12th conference on Computational linguistics, volume 1. Budapest: John von Neumann Society for Computing Sciences. pp. 71–76. doi:10.3115/991635.991651. ISBN 963-8431-56-3.
  • (1997) "Statistical Methods for Speech Recognition". Cambridge, Mass.: MIT Press. 283p. ISBN 0-262-10066-5.
  • with Peng Xu and Ahmad Emami (2003). "Training Connectionist Models for the Structured Language Model". In Michael Collins and Mark Steedman, eds. EMNLP '03: Proceedings of the 2003 conference on Empirical methods in natural language processing. East Stroudsburg, Penn.: Association for Computational Linguistics. pp. 160–167. ISBN 1-932432-13-2. doi:10.3115/1119355.1119376. (won "best paper" award)
  • Date of Birth:

    1932 November 18
  • Date of Death:

    2010 September 14
  • Noted For:

    Led the IBM team that revolutionized approaches to computer speech recognition and machine translation; regarded as the pioneer of speech recognition