Corpus Linguistics: a data mining methodology or a branch of Linguistics?


  • Tania M.G. Shepherd UERJ/FAPERJ/CNPq


Keywords: Corpus Linguistics, corpus-driven approach, corpus-based approach.


This paper problematizes empirical research that draws on corpora, in light of the role played by Corpus Linguistics. It opens with a discussion of the status of Corpus Linguistics itself, i.e., whether it can be considered part of Linguistics proper or whether it is no more than a methodology that uses the computer to investigate linguistic phenomena. Without favoring either position, the discussion focuses on practical examples of corpus-based and corpus-driven approaches drawn from recent studies of both Portuguese and English as a Foreign Language. The paper thus argues that the status of “branch of Linguistics” or of linguistic “data mining methodology” may be established at the outset of the analysis of digital data. Whichever role is chosen, the results make important contributions to the ways we view linguistic phenomena.

Author Biography

Tania M.G. Shepherd, UERJ/FAPERJ/CNPq

Adjunct Professor of English at the Universidade do Estado do Rio de Janeiro, where she serves as English language coordinator and teaches in the undergraduate program and the Master's program in Linguistics. A researcher with the Prociencia Program (UERJ/FAPERJ) since 2003 and with CNPq since March 2009, she is vice-coordinator of the CNPq Corpus Linguistics Research Group (GrPesq de Linguística de Corpus), coordinated by Professor Tony Berber Sardinha. Under his supervision, and as a result of her postdoctoral research, she is developing an online tool for the automatic correction of errors in English. She publishes on the interface between language studies and technology.



How to Cite

Shepherd, T. M. (2009). Corpus Linguistics: a data mining methodology or a branch of Linguistics? MATRAGA - Journal Published by the Graduate Program in Letters at Rio De Janeiro State University (UERJ), 16(24). Retrieved from