Displaying 20 results from an estimated 24 matches for "corpora".
2012 Jun 12
0
Fwd: [Corpora-List] ACM SIGIR 2012 Workshop on Open Source Information Retrieval
This might be an interesting option for some of you!
Regards,
Parth.
---------- Forwarded message ----------
From: Andrew Trotman <andrew at cs.otago.ac.nz>
Date: Tue, Jun 12, 2012 at 5:12 AM
Subject: [Corpora-List] ACM SIGIR 2012 Workshop on Open Source Information
Retrieval
To: corpora at uib.no
ACM SIGIR 2012 WORKSHOP ON OPEN SOURCE INFORMATION RETRIEVAL
16 August 2012, Portland, Oregon, USA
http://opensearchlab.otago.ac.nz/
NEWS
Grant Ingersoll and Jamie Callan confirmed...
2011 Sep 02
2
Classifying large text corpora using R
...d I have been struggling with this issue for a long time. Please consider
helping me out, directly or by pointing me to any other software/website
that you think may be more appropriate.
Many thanks in advance.
--
View this message in context: http://r.789695.n4.nabble.com/Classifying-large-text-corpora-using-R-tp3786787p3786787.html
Sent from the R help mailing list archive at Nabble.com.
2009 Aug 13
0
Efficiently Extracting Meta Data from TM Corpora
I'm using text miner (the "tm" package) to process large numbers of blog and message board postings (about 245,000). Does anyone have any advice for how to efficiently extract the meta data from a corpus of this size?
TM does a great job of using MPI for many functions (e.g. tmMap) which greatly speed up the processing. However, the "meta" function that I need does not
2011 Nov 17
3
merging corpora and metadata
Greetings!
I lose all my metadata after concatenating corpora. This is an
example of what happens:
> meta(corpus.1)
MetaID cid fid selfirst selend fname
1 0 1 11 2169 2518 WCPD-2001-01-29-Pg217.scrb
2 0 1 14 9189 9702 WCPD-2003-01-13-Pg39.scrb
3 0 1 14 2109 2577 WCPD-2003-0...
2009 Sep 15
2
S3 objects in S4 slots
Hello,
I am the maintainer of the stringkernels package and have come across
a problem with using S3 objects in my S4 classes.
Specifically, I have an S4 class with a slot that takes a text corpus
as a list of character vectors. tm (version 0.5) saves corpora as
lists with a class attribute of c("VCorpus", "Corpus", "list"). I
don't actually need the class-specific attributes, I only care about
the list itself.
Here's a simplified example of my problem:
> setClass("testclass", representation(slot="...
2018 Oct 04
2
Indexing Chinese?
...acters and words? Searching online mostly returns
results from a decade ago or more, with nothing very conclusive. How
close is this to possible?
For the time being I'm doing some pre-processing on long strings of
Chinese, breaking on punctuation in order to avoid errors. But I have
some large corpora of Chinese texts that in the future I'd like to index
properly.
Thanks,
Eric
2011 Sep 02
1
[PATCH 0/7] hivex + hivexml: Add byte runs for nodes and values
...required several new ABI
functions, which required new ABI return types. One benefit to the byte
run functions is additional sanity checks, which have revealed new data
or parsing errors when run on M57 patents images. An example error:
Image: Charlie, 2009-12-11, available at <http://digitalcorpora.org/corpora/scenarios/m57-patents-scenario>.
hive: C:/WINDOWS/system32/config/SECURITY
Address 12624 is processed as a value, but it has a node signature.
Alex Nelson (7):
generator: Add new return type to ABI: RSize
hivex: Split value_key function into value_key and value_key_len
genera...
2012 Sep 20
3
(no subject)
...d b from your
memory,
rm(a); rm(b)
# how do you generate the following table only from freq.list.a and
freq.list.b, i.e., without any reference to a and b themselves? Before
you complain about this question as being unrealistic, consider the
possibility that you generated the frequency lists of two corpora
(here, a and b) that are so large that you cannot combine them into
one (a.and.b<-c(a, b)) and generate a frequency list of that combined
vector (table(a.and.b)) ...
joint.freqs
a b d e f g i j
3 1 3 1 5 5 1 1
joint.freqs<-vector(length=length(sort(unique(c(names(freq.list.a),
names(freq.lis...
2016 Jun 03
2
Custom assembler subset
...ke to restrain the compiler that I build on my local box from
> > picking all but a particular set of opcodes. Is there a way to accomplish
> > this in a straightforward way?
>
> Can you elaborate a bit on what you're really trying to achieve?
>
Well, I need to construct a corpus. I would like a way to restrain the
output binaries and trust that the generated binaries use an assembler
subset that I can work with to be able to more precisely measure the
results of my work.
>
> One starting point could be to introduce new subtarget features, and
> use them as predi...
2005 May 27
1
logistic regression
Hi
I am working on corpora of automatically recognized utterances, looking
for features that predict error in the hypothesis the recognizer is
proposing.
I am using the glm functions to do logistic regression. I do this type
of thing:
* logistic.model = glm(formula = similarity ~., family = binomial,
data = data...
2012 Mar 11
1
CRAN (and crantastic) updates this week
...A set of tools to analyze texts. Includes, amongst others, functions
for automatic language detection, hyphenation, several indices of
lexical diversity (e.g., type token ratio, HD-D/vocd-D, MTLD) and
readability (e.g., Flesch, SMOG, LIX, Dale-Chall). Basic import
functions for language corpora are also provided, to enable
frequency analyses (supports Celex and Leipzig Corpora Collection
file formats). Note: For full functionality a local installation
of TreeTagger is recommended. Be encouraged to send feedback to the
author(s)!
* networkDynamic (0.2-2)
Maintainer: Ayn...
2002 Aug 19
4
Format converters
...e, "Don't do this", and "Let them switch to Ogg Vorbis" are
not productive. In that case she would simply switch to the corpus
codec, e.g., MP3 at 192 kbps.
This is not as far-fetched as it might seem. In Europe and the USA, lots
of money is currently spent on building large corpora of small
languages. These languages are spoken in jungles (Amazonia, South-east
Asia), tundras (Northern Siberia, Kamchatka), and high mountain areas
(the Himalayas). Furthermore, large corpora (>100GB) of natural speech are
collected by volunteers carrying minidisc equipment. All thi...
2017 Oct 27
2
Why does LLVM 3.8.0 recognize fputs_unlocked as a vararg function?
Considering F represents the function fputs_unlocked() in an LLVM pass,
=> F->isVarArg() returns true
=> F->getNumParams() returns 0
=> *F returns declare i32 @fputs_unlocked(...)
The signature of fputs_unlocked from man page is:
int fputs_unlocked(const char *s, FILE *stream);
Can anybody explain why fputs_unlocked() is recognized as a vararg method
while it accepts two fixed
2018 Mar 12
0
llvm bit code binding use case
...hey have been
normalized is a feasible solution, because def use chains tend to be rather
small, and equivalence is either by identical expressions or not. Constant
folding is one way to achieve this, by tagging the ssa value associated
with a particular property.
So my questions are:
Is there a corpus of known source code targets for which I can compile to
both llvm bitcode and executable on linux easily?
Is there an api and tool whereby I can compile a llvm plugin or utility
that will load a bitcode, transform it to the appropriate data structures
and LLVM IR, and feed that IR to my plugin or...
2018 May 17
0
Backend Plugins?
Kenneth Adam Miller wrote:
> By address, I mean the selected location in the binary. I need the
> ground truth for other static analyses.
That's not determined until instructions are encoded for the object
file, which is pretty deep down in MC. I can imagine a couple of ways
to get the info you want, but it's not pretty. You could emit a label
for every instruction, and then work
2018 Oct 04
0
Indexing Chinese?
...line mostly returns
> results from a decade ago or more, with nothing very conclusive. How
> close is this to possible?
>
> For the time being I'm doing some pre-processing on long strings of
> Chinese, breaking on punctuation in order to avoid errors. But I have
> some large corpora of Chinese texts that in the future I'd like to index
> properly.
>
> Thanks,
> Eric
>
>
2011 Jul 19
2
read.csv help
Hi,
I'm a new R user and I'm having trouble with the read.csv command. It
somehow treats the first column as a row name field even though it's not a
row name. There are no missing columns/entries and I'm not sure how to
resolve this.
The format of my data is
A, B, C, D,......(3984 columns)
12, 13, 41,......(all numeric)
it either treats column A as rownames or if I explicitly
2018 May 17
2
Backend Plugins?
On Thu, May 17, 2018, 3:31 PM Friedman, Eli <efriedma at codeaurora.org> wrote:
> On 5/17/2018 12:22 PM, Kenneth Adam Miller wrote:
>
>
> On Thu, May 17, 2018 at 3:09 PM, Friedman, Eli <efriedma at codeaurora.org>
> wrote:
>
>> On 5/17/2018 10:10 AM, Kenneth Adam Miller via llvm-dev wrote:
>>
>>> Hello,
>>>
>>>
>>>
2016 Jun 01
2
Custom assembler subset
Hello all,
I would like to restrain the compiler that I build on my local box from
picking all but a particular set of opcodes. Is there a way to accomplish
this in a straightforward way? I'm pretty sure that there is a list of
opcodes to semantics mappings.
In addition, is there a way to look at an associative mapping of LLVM IR to
opcode, and/or vice versa?
2007 Mar 26
1
Problem in loading all packages all at once
...","CoCoRaw","cocorresp","coda","coin","colorspace","combinat","compositions","concor","concord","cond","conf.design","connectedness","copula","corpcor","corpora","covRobust","coxrobust","cramer","crossdes","crq","csampling","cslogistic","CTFS","ctv")
TEMP <-
c(TEMP,"CVThresh","cwhmath","cwhplot","cwhprint","cwhst...