EdUHK Researcher Builds Corpus to Support Learning and Teaching of Cantonese

Hong Kong is a multilingual society, but nearly 90% of the population speak Cantonese as a first language. Cantonese is used in both formal and informal settings. Many non-local people living and working in Hong Kong therefore need to learn Cantonese in order to integrate into the local community. Despite its dominant status, however, Cantonese has never been formalised and incorporated into the school curriculum. Consequently, learning and teaching materials and methods vary considerably. Dr Andy Chin Chi-on, Head and Associate Professor at the Department of Linguistics and Modern Language Studies, The Education University of Hong Kong (EdUHK), proposed a research programme that adopts a more scientific and objective approach to promoting the learning and teaching of Cantonese.

Studies in the past five decades have enriched our understanding of the lexicon, phonology and grammar of Cantonese; yet some deeper issues, such as pragmatics, semantics and discourse, remain to be explored. This kind of research requires a significant amount of authentic and natural language data. The research team thus proposed the construction of a Cantonese corpus to expand the scope of Cantonese linguistic research. One major advantage of using a corpus in language studies is that it provides objective, unbiased quantitative and qualitative data for research and other applications, including the compilation of language materials and natural language processing tasks such as speech-to-text and text-to-speech conversion.

The research project started in 2011 with the support of an EdUHK internal research grant and the Early Career Scheme of the Research Grants Council. Dr Chin constructed the corpus in two phases; it contains about one million Chinese characters. The corpus data was collected by transcribing the dialogues of 80 black-and-white movies produced between the 1950s and 1970s, and is now available online.

The corpus won the Gold Medal and Special Award at the Silicon Valley International Invention Festival in 2019. Dr Chin has also developed mobile apps containing the corpus data. The CanPro app, which enables learners to practise Cantonese pronunciation through commonly used expressions in the corpus, won a Silver Medal at the 2021 Inventions Geneva Evaluation Days. Another mobile app called ‘Learn Cantonese with Big Data’, supported by the Language Fund of the Standing Committee on Language Education and Research, was launched in March 2022. One major feature of this app is the provision of linguistic information that Cantonese learners might find relevant and useful, such as collocations in verb-noun and classifier-noun structures, which cannot be obtained without corpus data.


The Education University of Hong Kong

Nestled in a scenic mountain range, just one hour from Hong Kong’s business districts, The Education University of Hong Kong (EdUHK) offers tranquility and world-class scholarship in a vibrant, inclusive community. EdUHK is a publicly funded tertiary institution dedicated to the advancement of teaching and learning through diverse academic and research programmes on teacher education and complementary disciplines, including social sciences and humanities. The University places great emphasis on research capability with the aim of contributing to the advancement of knowledge, scholarship and innovation. EdUHK is committed to creating a sustainable impact on social progress and human betterment and to defining the education landscape not only for Hong Kong but also for the Asia-Pacific region. Ranked 3rd in Asia and 16th in the world in Education (QS World University Rankings by Subject 2021), EdUHK will continue to make an impact locally, regionally and internationally through high-quality research and scholarship. Adopting an Education-plus approach, its primary mission is to lead educational innovation, and to promote and support the strategic development of teaching, teacher education and disciplines complementary to education by preparing outstanding and morally responsible educators and professionals while supporting their lifelong learning.
