Jan Wijffels
2018-Jan-15 18:29 UTC
[R] [R-pkgs] Natural Language Processing for non-English languages with udpipe
Dear R users,

I'm happy to announce the release of version 0.3 of the udpipe R package on CRAN (https://CRAN.R-project.org/package=udpipe).

The udpipe R package is a Natural Language Processing toolkit that provides language-agnostic 'tokenization', 'parts of speech tagging', 'lemmatization', 'morphological feature tagging' and 'dependency parsing' of raw text. Next to text parsing, the R package also allows you to train annotation models based on data of 'treebanks' in 'CoNLL-U' format as provided at http://universaldependencies.org/format.html.

The R package provides direct access to language models trained on more than 50 languages. The following languages are directly available:

afrikaans, ancient_greek-proiel, ancient_greek, arabic, basque, belarusian, bulgarian, catalan, chinese, coptic, croatian, czech-cac, czech-cltt, czech, danish, dutch-lassysmall, dutch, english-lines, english-partut, english, estonian, finnish-ftb, finnish, french-partut, french-sequoia, french, galician-treegal, galician, german, gothic, greek, hebrew, hindi, hungarian, indonesian, irish, italian, japanese, kazakh, korean, latin-ittb, latin-proiel, latin, latvian, lithuanian, norwegian-bokmaal, norwegian-nynorsk, old_church_slavonic, persian, polish, portuguese-br, portuguese, romanian, russian-syntagrus, russian, sanskrit, serbian, slovak, slovenian-sst, slovenian, spanish-ancora, spanish, swedish-lines, swedish, tamil, turkish, ukrainian, urdu, uyghur, vietnamese.

We hope that the package will allow other R users to build natural language applications on top of the resulting parts of speech tags, tokens, morphological features and dependency parsing output.
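To give an idea of what annotating text looks like, here is a minimal sketch. It assumes the function names udpipe_download_model, udpipe_load_model and udpipe_annotate as documented in the package manual, an internet connection for the model download, and a made-up Dutch example sentence. The column names of the resulting data.frame may differ between package versions, so please check the manual of the version you have installed.

```r
# install.packages("udpipe")  # one-time install from CRAN
library(udpipe)

# Download a pre-trained model (needs internet access).
# "dutch" is one of the language names from the list of available models.
dl <- udpipe_download_model(language = "dutch")

# Load the downloaded model file into memory.
model <- udpipe_load_model(file = dl$file_model)

# Example sentence, made up purely for illustration.
txt <- "Ik ging op reis en ik nam mee: mijn laptop, mijn zonnebril en goed humeur."

# Tokenise, tag, lemmatise and parse the text, then convert the
# annotation to a data.frame with one row per token.
anno <- udpipe_annotate(model, x = txt)
tokens <- as.data.frame(anno)

# Show the token, its lemma, the universal part-of-speech tag
# and the dependency relation.
print(tokens[, c("token", "lemma", "upos", "dep_rel")])
```

Replacing "dutch" with another name from the list of available models should be all that is needed to annotate text in that language.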
We hope in particular that applications will arise which are not limited to English only (like the textrank R package or the cleanNLP package, to name a few).

Note that the package has no external software dependencies (neither Java nor Python) and depends on only 2 R packages (Rcpp and data.table), which makes it easy to install on any platform.

The package is available on CRAN at https://CRAN.R-project.org/package=udpipe and is developed at https://github.com/bnosac/udpipe. A small docusaurus website is available at https://bnosac.github.io/udpipe/en.

We hope you enjoy using it, and we would like to thank Milan Straka for all the effort put into UDPipe, as well as everyone involved in http://universaldependencies.org.

All the best,
Jan

Jan Wijffels
Statistician
www.bnosac.be | +32 486 611708