search for: corpora

Displaying 20 results from an estimated 24 matches for "corpora".

2012 Jun 12
0
Fwd: [Corpora-List] ACM SIGIR 2012 Workshop on Open Source Information Retrieval
This might be an interesting option for some of you! Regards, Parth. ---------- Forwarded message ---------- From: Andrew Trotman <andrew at cs.otago.ac.nz> Date: Tue, Jun 12, 2012 at 5:12 AM Subject: [Corpora-List] ACM SIGIR 2012 Workshop on Open Source Information Retrieval To: corpora at uib.no ACM SIGIR 2012 WORKSHOP ON OPEN SOURCE INFORMATION RETRIEVAL 16 August 2012, Portland, Oregon, USA http://opensearchlab.otago.ac.nz/ NEWS Grant Ingersoll and Jamie Callan confirmed...
2011 Sep 02
2
Classifying large text corpora using R
...d I have been struggling with this issue for a long time. Please consider helping me out, directly or by pointing me to any other software/website that you think may be more appropriate. Many thanks in advance. -- View this message in context: http://r.789695.n4.nabble.com/Classifying-large-text-corpora-using-R-tp3786787p3786787.html Sent from the R help mailing list archive at Nabble.com.
2009 Aug 13
0
Efficiently Extracting Meta Data from TM Corpora
I'm using text miner (the "tm" package) to process large numbers of blog and message board postings (about 245,000). Does anyone have any advice for how to efficiently extract the meta data from a corpus of this size? TM does a great job of using MPI for many functions (e.g. tmMap) which greatly speed up the processing. However, the "meta" function that I need does not
2011 Nov 17
3
merging corpora and metadata
Greetings! I lose all my metadata after concatenating corpora. This is an example of what happens: > meta(corpus.1) MetaID cid fid selfirst selend fname 1 0 1 11 2169 2518 WCPD-2001-01-29-Pg217.scrb 2 0 1 14 9189 9702 WCPD-2003-01-13-Pg39.scrb 3 0 1 14 2109 2577 WCPD-2003-0...
2009 Sep 15
2
S3 objects in S4 slots
Hello, I am the maintainer of the stringkernels package and have come across a problem with using S3 objects in my S4 classes. Specifically, I have an S4 class with a slot that takes a text corpus as a list of character vectors. tm (version 0.5) saves corpora as lists with a class attribute of c("VCorpus", "Corpus", "list"). I don't actually need the class-specific attributes, I only care about the list itself. Here's a simplified example of my problem: > setClass("testclass", representation(slot="...
2018 Oct 04
2
Indexing Chinese?
...acters and words? Searching online mostly returns results from a decade ago or more, with nothing very conclusive. How close is this to possible? For the time being I'm doing some pre-processing on long strings of Chinese, breaking on punctuation in order to avoid errors. But I have some large corpora of Chinese texts that in the future I'd like to index properly. Thanks, Eric
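The pre-processing step described in this post (breaking long strings of Chinese on punctuation) could look roughly like the following. This is a minimal Python sketch, not the poster's actual code: the punctuation set `CJK_PUNCT` and the function name `split_on_punctuation` are assumptions made for illustration, and the sample sentence is invented.

```python
import re

# Assumed set of full-width punctuation marks to break on; extend as needed.
CJK_PUNCT = "，。！？；：、"
_SPLIT_RE = re.compile("[" + re.escape(CJK_PUNCT) + "]+")

def split_on_punctuation(text):
    """Break a long Chinese string into punctuation-free segments."""
    # Drop empty pieces left behind by leading or trailing punctuation.
    return [seg.strip() for seg in _SPLIT_RE.split(text) if seg.strip()]

# Invented sample sentence, used only to exercise the function.
segments = split_on_punctuation("我喜欢读书，也喜欢写作。你呢？")
print(segments)
```

Splitting on punctuation only shortens the strings handed to the indexer; it does not segment them into words, which is the open question the post raises.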
2011 Sep 02
1
[PATCH 0/7] hivex + hivexml: Add byte runs for nodes and values
...required several new ABI functions, which required new ABI return types. One benefit to the byte run functions is additional sanity checks, which have revealed new data or parsing errors when run on M57 patents images. An example error: Image: Charlie, 2009-12-11, available at <http://digitalcorpora.org/corpora/scenarios/m57-patents-scenario>. hive: C:/WINDOWS/system32/config/SECURITY Address 12624 is processed as a value, but it has a node signature. Alex Nelson (7): generator: Add new return type to ABI: RSize hivex: Split value_key function into value_key and value_key_len genera...
2012 Sep 20
3
(no subject)
...d b from your memory, rm(a); rm(b) # how do you generate the following table only from freq.list.a and freq.list.b, i.e., without any reference to a and b themselves? Before you complain about this question as being unrealistic, consider the possibility that you generated the frequency lists of two corpora (here, a and b) that are so large that you cannot combine them into one (a.and.b<-c(a, b)) and generate a frequency list of that combined vector (table(a.and.b)) ... joint.freqs a b d e f g i j 3 1 3 1 5 5 1 1 joint.freqs<-vector(length=length(sort(unique(c(names(freq.list.a), names(freq.lis...
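The question in this post (building a joint frequency table from two existing frequency lists, without going back to the original vectors) amounts to adding counts key by key. The thread itself is about R; the following is only a Python sketch of the same idea, and the sample frequency lists are hypothetical stand-ins for `freq.list.a` and `freq.list.b`, not the poster's data.

```python
from collections import Counter

def merge_freq_lists(freq_a, freq_b):
    """Add two frequency tables key by key, without touching the raw tokens."""
    merged = Counter(freq_a)
    merged.update(freq_b)  # update() adds counts rather than replacing them
    return dict(sorted(merged.items()))

# Hypothetical frequency lists standing in for freq.list.a and freq.list.b.
freq_list_a = {"a": 1, "b": 1, "d": 2, "e": 1}
freq_list_b = {"a": 2, "d": 1, "f": 5}

joint_freqs = merge_freq_lists(freq_list_a, freq_list_b)
print(joint_freqs)
```

Because only the two tables are held in memory, this works even when the underlying corpora are too large to concatenate.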
2016 Jun 03
2
Custom assembler subset
...ke to restrain the compiler that I build on my local box from > > picking all but a particular set of opcodes. Is there a way to accomplish > > this in a straightforward way? > > Can you elaborate a bit on what you're really trying to achieve? > Well, I need to construct a corpus. I would like a way to restrain the output binaries and trust that the generated binaries use an assembler subset that I can work with to be able to more precisely measure the results of my work. > > One starting point could be to introduce new subtarget features, and > use them as predi...
2005 May 27
1
logistic regression
Hi I am working on corpora of automatically recognized utterances, looking for features that predict error in the hypothesis the recognizer is proposing. I am using the glm functions to do logistic regression. I do this type of thing: * logistic.model = glm(formula = similarity ~., family = binomial, data = data...
2012 Mar 11
1
CRAN (and crantastic) updates this week
...A set of tools to analyze texts. Includes, amongst others, functions for automatic language detection, hyphenation, several indices of lexical diversity (e.g., type token ratio, HD-D/vocd-D, MTLD) and readability (e.g., Flesch, SMOG, LIX, Dale-Chall). Basic import functions for language corpora are also provided, to enable frequency analyses (supports Celex and Leipzig Corpora Collection file formats). #' Note: For full functionality a local installation of TreeTagger is recommended. Be encouraged to send feedback to the author(s)! * networkDynamic (0.2-2) Maintainer: Ayn...
2002 Aug 19
4
Format converters
...e, "Don't do this", and "Let them switch to Ogg Vorbis" are not productive. In that case she would simply switch to the corpus codec, e.g., MP3 at 192 kbps. This is not as far-fetched as it might seem. In Europe and the USA, lots of money is currently spent on building large corpora of small languages. These languages are spoken in jungles (Amazonia, South-east Asia), tundras (Northern Siberia, Kamchatka), and high mountain areas (the Himalayas). Furthermore, large corpora (>100GB) of natural speech are collected by volunteers carrying minidisc equipment. All thi...
2017 Oct 27
2
Why does LLVM 3.8.0 recognize fputs_unlocked as a vararg function?
Considering F represents the function fputs_unlocked() in an LLVM pass, => F->isVarArg() returns true => F->getNumParams() returns 0 => *F returns declare i32 @fputs_unlocked(...) The signature of fputs_unlocked from man page is: int fputs_unlocked(const char *s, FILE *stream); Can anybody explain why fputs_unlocked() is recognized as a vararg method while it accepts two fixed
2018 Mar 12
0
llvm bit code binding use case
...hey have been normalized is a feasible solution, because def-use chains tend to be rather small, and equivalence is either by identical expressions or not. Constant folding is one way to achieve this, by tagging the SSA value associated with a particular property. So my questions are: Is there a corpus of known source code targets that I can easily compile to both LLVM bitcode and an executable on Linux? Is there an API and tool whereby I can compile an LLVM plugin or utility that will load a bitcode, transform it to the appropriate data structures and LLVM IR, and feed that IR to my plugin or...
2018 May 17
0
Backend Plugins?
Kenneth Adam Miller wrote: > By address, I mean the selected location in the binary. I need the > ground truth for other static analyses. That's not determined until instructions are encoded for the object file, which is pretty deep down in MC. I can imagine a couple of ways to get the info you want, but it's not pretty. You could emit a label for every instruction, and then work
2018 Oct 04
0
Indexing Chinese?
...line mostly returns > results from a decade ago or more, with nothing very conclusive. How > close is this to possible? > > For the time being I'm doing some pre-processing on long strings of > Chinese, breaking on punctuation in order to avoid errors. But I have > some large corpora of Chinese texts that in the future I'd like to index > properly. > > Thanks, > Eric > >
2011 Jul 19
2
read.csv help
Hi, I'm a new R user and I'm having trouble with the read.csv command. It somehow treats the first column as a row name field even though it's not a row name. There are no missing columns/entries and I'm not sure how to resolve this. The format of my data is A, B, C, D,......(3984 columns) 12, 13, 41,......(all numeric). It either treats column A as rownames or if I explicitly
2018 May 17
2
Backend Plugins?
On Thu, May 17, 2018, 3:31 PM Friedman, Eli <efriedma at codeaurora.org> wrote: > On 5/17/2018 12:22 PM, Kenneth Adam Miller wrote: > > > On Thu, May 17, 2018 at 3:09 PM, Friedman, Eli <efriedma at codeaurora.org> > wrote: > >> On 5/17/2018 10:10 AM, Kenneth Adam Miller via llvm-dev wrote: >> >>> Hello, >>> >>> >>>
2016 Jun 01
2
Custom assembler subset
Hello all, I would like to restrain the compiler that I build on my local box from picking all but a particular set of opcodes. Is there a way to accomplish this in a straightforward way? I'm pretty sure that there is a list of opcodes to semantics mappings. In addition, is there a way to look at an associative mapping of LLVM IR to opcode, and/or vice versa? -------------- next part
2007 Mar 26
1
Problem in loading all packages all at once
...","CoCoRaw","cocorresp","coda","coin","colorspace","combinat","compositions","concor","concord","cond","conf.design","connectedness","copula","corpcor","corpora","covRobust","coxrobust","cramer","crossdes","crq","c sampling","cslogistic","CTFS","ctv") TEMP <- c(TEMP,"CVThresh","cwhmath","cwhplot","cwhprint","cwhst...