thr3ads.net - CentOS - [CentOS] recursively count the words occurrence in the text files [Dec 2010]


S Mathias

2010-Dec-30 18:34 UTC

[CentOS] recursively count the words occurrence in the text files

I just can't google for it:

I'm searching for a "bash" "one liner" (awk, perl, or anything else) for this:

There are text files in several directories:

mkdir one
mkdir two
mkdir three

echo "word1 word2 word3" > one/asf.txt
echo "word2 word4, word5" > one/asfcxv saf.txt
echo "word1. word2" > one/dsgsdg.txt

echo "word6, word3!" > two/sdgsd dsf.txt
echo "word6" > two/ergd.txt

echo "asdf, word2" > three/werdf.txt
echo "word7, word8 word9 word10" > three/qwerb erfsdgdsg.txt
echo "word4 word3" > three/web erg as.txt
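As written, the commands whose file names contain a space do not create those files, because an unquoted space ends the redirection target. A minimal sketch of the same setup with those names quoted, assuming a POSIX shell with "mktemp -d" available (the scratch directory is an addition, not part of the original post):

```shell
#!/bin/sh
# Work in a fresh scratch directory (an assumption, so nothing existing is touched).
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1
mkdir one two three

# Quote every target path: an unquoted space would end the redirection target.
# Single quotes around the text also avoid interactive history expansion of '!'.
echo 'word1 word2 word3'         > one/asf.txt
echo 'word2 word4, word5'        > "one/asfcxv saf.txt"
echo 'word1. word2'              > one/dsgsdg.txt

echo 'word6, word3!'             > "two/sdgsd dsf.txt"
echo 'word6'                     > two/ergd.txt

echo 'asdf, word2'               > three/werdf.txt
echo 'word7, word8 word9 word10' > "three/qwerb erfsdgdsg.txt"
echo 'word4 word3'               > "three/web erg as.txt"
```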

So I want something that does the magic* "recursively":

$ SOMEMAGIC > output.txt
$ cat output.txt
asdf 1
word1 2
word2 4
word3 3
word4 2
word5 1
word6 2
word7 1
word8 1
word9 1
word10 1
$



*recursively count how many times each word occurs in the text files, printing one line per word like "word1 2".

Can anyone point me to a howto or link? [re: I just can't google for it :\]

Stephen Harris

2010-Dec-30 18:56 UTC


On Thu, Dec 30, 2010 at 10:34:58AM -0800, S Mathias wrote:
> I just can't google for it:
I'm a little concerned about the number of "schoolbook" questions showing up on this list recently. However...
> echo "word1 word2 word3" > one/asf.txt
> echo "word2 word4, word5" > one/asfcxv saf.txt
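Yeah, that line won't work like you think (nor will the others with a space in the file name): the shell redirects into one/asfcxv and hands saf.txt to echo as an extra argument. Quote the file name.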
> $ SOMEMAGIC > output.txt
I'd do something like
  cat */* | tr -c '[:alnum:]' '\012' | grep -v '^$' |
    sort | uniq -c
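For these sample files the character class matters: with [:alpha:], tr also turns the digits in word1 through word10 into newlines, so every word collapses to plain "word", while [:alnum:] keeps them. A sketch of just the counting stage, fed from stdin so it needs no files, with an extra awk step (not in the reply) that swaps uniq -c's "count word" output into the "word count" layout the question asks for; count_words is a hypothetical name:

```shell
#!/bin/sh
# count_words: hypothetical helper; reads text on stdin and prints one
# "word count" line per distinct word, sorted by word.
count_words() {
  tr -c '[:alnum:]' '\n' |   # every non-alphanumeric byte becomes a newline
    grep -v '^$' |           # drop empty lines left by runs of separators
    sort | uniq -c |         # uniq -c prints "count word"
    awk '{ print $2, $1 }'   # swap into the requested "word count" layout
}

# Try it on a few of the sample lines.
printf '%s\n' 'word1 word2 word3' 'word1. word2' 'word7, word8 word9 word10' |
  count_words
# prints:
# word1 2
# word10 1
# word2 2
# word3 1
# word7 1
# word8 1
# word9 1
```

Note that sort compares words as strings, so word10 comes out between word1 and word2 rather than last as in the question's listing.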

If there is more than one layer of subdirectories, replace the "cat */*" with
  find . -type f -exec cat {} \;
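For the recursive case, a sketch wrapping the find variant in a hypothetical count_words_tree function. It uses "-exec cat {} +", which hands many file names to each cat call instead of one call per file as with "\;"; either way find runs cat directly, without a shell, so names containing spaces pass through intact. The demo tree and its file names are made up for illustration:

```shell
#!/bin/sh
# count_words_tree DIR: hypothetical helper; prints "word count" for every
# word in every regular file below DIR, at any depth.
count_words_tree() {
  # '{} +' batches file names into each cat call; no shell word splitting
  # is involved, so names with spaces arrive intact.
  find "$1" -type f -exec cat {} + |
    tr -c '[:alnum:]' '\n' |
    grep -v '^$' |
    sort | uniq -c |
    awk '{ print $2, $1 }'
}

# Demo on a small, made-up nested tree in a scratch directory.
demo=$(mktemp -d) || exit 1
mkdir -p "$demo/tree/a/b"
echo 'word1 word2'   > "$demo/tree/a/first file.txt"
echo 'word2, word3!' > "$demo/tree/a/b/second.txt"

count_words_tree "$demo/tree"
# prints:
# word1 1
# word2 2
# word3 1
```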

-- 

rgds
Stephen
