Vocabulary acquisition is fundamental to learning English, as it underpins listening, speaking, reading, and writing. In this paper, we use web crawler technology to build an English webpage corpus (EWC) and derive a word frequency list from it. Comparing the EWC word frequency list with that of the British National Corpus (BNC), we find that word frequency distributions are time-sensitive. We also estimate primary school students’ English word recognition rates by comparing the vocabulary lists of current Chinese primary school English textbooks with the word frequency lists of several corpora: EWC, BNC, SUBTLEX-US, and the Subtitle corpus of Children’s BBC (CBBC). The results show that primary school students’ word recognition rates are relatively low in both general and specific language registers. Based on these results, we propose several word-selection strategies for compiling English textbooks for Chinese primary school students.


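Summary: Vocabulary is one of the foundations of language learning and an important prerequisite for developing a learner's listening, speaking, reading, and writing skills. Which words the selected texts should cover is therefore a basic question in textbook compilation. To address this question, we use web crawlers and related techniques to build an English webpage corpus (EWC) and analyze its word frequencies. Comparing the word frequencies of EWC with those of the British National Corpus (BNC), we find that word frequency distributions are time-sensitive to some degree. By comparing the vocabulary lists of current primary school English textbooks in China with the word frequency lists of EWC, BNC, SUBTLEX-US, and CBBC, we estimate primary school students' English word recognition rates in general reading. The analysis shows that their recognition rates are relatively low for both general and specific registers. Based on these quantitative analyses, we propose several word-selection strategies for compiling primary school English textbooks in China.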
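As an illustration of the kind of comparison summarized above, the following minimal sketch computes a word recognition rate as token-weighted coverage: the share of running words in a corpus that also appear in a textbook vocabulary list. This definition is an assumption made for illustration, since the abstract does not give the exact formula. The function name `recognition_rate`, the toy sentence, and the toy vocabulary are all hypothetical and are not taken from EWC, BNC, SUBTLEX-US, or CBBC.

```python
from collections import Counter


def recognition_rate(freq, known_words):
    """Return the share of corpus tokens whose word form is in known_words.

    freq        -- mapping from word to its corpus frequency (hypothetical input)
    known_words -- set of lowercase words a learner is assumed to know
    """
    total = sum(freq.values())
    if total == 0:
        return 0.0  # an empty corpus has no tokens to recognize
    covered = sum(count for word, count in freq.items()
                  if word.lower() in known_words)
    return covered / total


# Toy data, invented for this example only.
corpus_tokens = "the cat sat on the mat while the committee deliberated".split()
freq = Counter(corpus_tokens)                       # word frequency list
textbook_vocab = {"the", "cat", "sat", "on", "mat"}  # assumed textbook word list

rate = recognition_rate(freq, textbook_vocab)
print(f"word recognition rate: {rate:.0%}")
```

Weighting by token frequency, rather than counting distinct word types, reflects how often a reader would actually meet a known word in running text; a type-based rate would be an equally plausible reading of the abstract.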


