Hey Listees,

I am trying to write a shell script to sort and compare my blacklist
for squidGuard with the nightly updates that come down in a tarball.
It should be rather simple, but I'm not too great at this. The script is
to run nightly: it will download the latest blacklist tarball, untar
it, and then add any new entries to the existing blacklist. The
blacklists work by having a folder for each filtered category, so the
folder "db" contains the subfolders "adult", "gambling", "drugs", etc.,
and each subfolder has two files, "domains" and "urls" (pretty
self-explanatory). I haven't tested this script yet, as I have only
gotten as far as writing it. This is what I have so far:

#!/bin/bash
#This will be running from home directory

wget http://www.blacklistsite.com/blacklist.tar
tar -cxf blacklist.tar
cd BL

find ./ -type d -maxdepth 1 | while read FOLDER; do
    SQUIDDB="usr/local/squidGuard/db/$FOLDER"
    sort_db($SQUIDDB)
    comm -3 $SQUIDDB/domains $FOLDER/domains > $SQUIDDB/domains.missing
    comm -3 $SQUIDDB/urls $FOLDER/urls > $SQUIDDB/urls.missing
    cat $SQUIDDB/domains.missing >> $SQUIDDB/domains
    cat $SQUIDDB/urls.missing >> $SQUIDDB/urls
    rm $SQUIDDB/domains.missing
    rm $SQUIDDB/urls.missing
    sort_db($SQUIDDB)
done

sort_db(){
    sort -f $1/domains > $1/domains.sorted
    sort -f $1/urls > $1/urls.sorted
    rm $1/domains
    rm $1/urls
    mv $1/doamins.sorted $1/domains
    mv $1/urls.sorted $1/urls
}

Is it obvious I'm new to this? Hehe. I would also love to hear how
people would do this in a more efficient manner, because obviously this
is pretty sloppy and, as I said, I haven't tested it yet, so it might not
even run?!

Thanks, James ;)
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GIT/MU/U dpu s: a--> C++>$ U+> L++> B-> P+> E?> W+++>$ N
K W++ O M++>$ V-
PS+++ PE++ Y+ PGP t 5 X+ R- tv+ b+> DI D+++ G+ e(+++++) h--(++) r++ z++
------END GEEK CODE BLOCK------
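
For anyone reading the script above later: a few things would likely stop it from running as posted. Bash functions have to be defined before the loop that calls them and are called like commands (sort_db "$dir", not sort_db($dir)), the "doamins.sorted" name is misspelled in the mv, and comm -3 prints lines unique to either file (the second column tab-indented) rather than only the new entries. Below is a rough sketch of the same comm-based idea with those points addressed. The names DB_ROOT, NEW_ROOT, sort_list, add_missing and update_all are made up for illustration, the absolute path /usr/local/squidGuard/db is an assumption based on the relative path in the post, and sorting in the C locale is just one way to keep sort and comm in agreement.

```shell
#!/bin/bash
# Sketch only. DB_ROOT and NEW_ROOT are assumed locations, not from the original post.
DB_ROOT="${DB_ROOT:-/usr/local/squidGuard/db}"   # live squidGuard lists (assumed absolute path)
NEW_ROOT="${NEW_ROOT:-$HOME/BL}"                 # where the tarball was unpacked (assumed)

# Sort one list in place. The C locale keeps sort and comm using the same
# ordering; -u drops duplicate lines.
sort_list() {
    LC_ALL=C sort -u "$1" > "$1.sorted" && mv "$1.sorted" "$1"
}

# $1 = existing list, $2 = freshly downloaded list
add_missing() {
    local current=$1 incoming=$2
    sort_list "$current"
    # comm -13 suppresses columns 1 and 3, leaving only the lines that are
    # unique to the second input, i.e. the entries we do not have yet.
    LC_ALL=C sort -u "$incoming" | LC_ALL=C comm -13 "$current" - > "$current.missing"
    cat "$current.missing" >> "$current"
    rm -f "$current.missing"
    sort_list "$current"
}

# Walk every category directory in the unpacked tarball.
update_all() {
    local dir category list
    for dir in "$NEW_ROOT"/*/; do
        category=$(basename "$dir")
        for list in domains urls; do
            # Only touch lists that exist on both sides.
            if [ -f "$dir$list" ] && [ -f "$DB_ROOT/$category/$list" ]; then
                add_missing "$DB_ROOT/$category/$list" "$dir$list"
            fi
        done
    done
}
```

Calling update_all after the tarball has been unpacked would run the merge for each category. The glob replaces the find loop mainly so that "./" itself is not treated as a category.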
on 5-13-2009 4:21 AM James Bensley spake the following:
> I am trying to write a shell script to sort and compare my blacklist
> for squidGuard with the nightly updates that come down in a tarball.

Are you looking to have a custom blacklist, or do you just want to know what changed?

> to run nightly, it will download the latest blacklist tarball, un tar
> it and then add any new entries to the existing black list.

if you're already going to the effort of downloading the entire blacklist every night, why not dump the old database, and just insert the newly downloaded one?

> tar -cxf blacklist.tar

this will suck your computer into a vortex of doom. I recommend either creating a tarball, or extracting one, but not both at the same time. :)

In all honesty, you might be better targeting this query to squidGuard users, as this may be something they do regularly.
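
To spell out the tar point: -c asks tar to create an archive and -x asks it to extract one, and these are separate modes that tar will not combine in one invocation, so only -x (with -f for the file name) is wanted here. A minimal sketch of the download-and-unpack step follows. The URL is the placeholder from the original post, and the function names and the work directory argument are invented for the example.

```shell
#!/bin/bash
# Sketch only. URL is the placeholder from the original post.
URL="http://www.blacklistsite.com/blacklist.tar"

# $1 = tarball, $2 = directory to unpack into
unpack() {
    # -x extracts, -f names the archive, -C picks the target directory.
    # -c (create) is deliberately absent: it is a different mode from -x.
    tar -xf "$1" -C "$2"
}

# $1 = tarball URL, $2 = scratch directory (hypothetical, caller's choice)
fetch_and_unpack() {
    local url=$1 workdir=$2
    mkdir -p "$workdir" || return 1
    # -O fixes the output name so nightly reruns overwrite the old download
    # instead of leaving blacklist.tar.1, blacklist.tar.2, and so on.
    wget -q -O "$workdir/blacklist.tar" "$url" || return 1
    unpack "$workdir/blacklist.tar" "$workdir"
}
```

Something like fetch_and_unpack "$URL" "$HOME/blacklist-work" would then leave the BL directory under the chosen work directory, assuming the tarball really does unpack to BL as the original script expects.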

> if you're already going to the effort of downloading the entire
> blacklist every night, why not dump the old database, and just insert
> the newly downloaded one?

Because we also add our own entries to the current blacklist, so we are just adding any new entries from the nightly updates from our blacklist provider.

>> tar -cxf blacklist.tar
>
> this will suck your computer into a vortex of doom. I recommend either
> creating a tarball, or extracting one, but not both at the same time. :)

It's OK, the blacklist is text, so it's a 10 MB tarball of text. It takes about 30 seconds to download and it will take about 2 minutes for the script to run ;)

> In all honesty, you might be better targeting this query to squidGuard
> users, as this may be something they do regularly.

It should be simple text manipulation :( Nonetheless, a good idea; I will post my question there.

Thanks!
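
Since the goal described here is "everything we already have, plus anything new from the provider", one possibly simpler route is to take the union of the two files in a single sort call and skip comm entirely. This is only a sketch of that idea: merge_lists is a made-up name, and it assumes one entry per line and that byte-order (C locale) sorting is acceptable for the lists.

```shell
#!/bin/bash
# Sketch only: merge a nightly list into the live list, keeping local additions.
# $1 = live list (contains our own entries), $2 = nightly list from the provider
merge_lists() {
    # sort reads both files, -u drops duplicate lines, and -o writes the result
    # back over the live list. sort reads all input before opening the -o file,
    # so naming an input file as the output is safe.
    LC_ALL=C sort -u -o "$1" "$1" "$2"
}
```

For example, merge_lists /usr/local/squidGuard/db/adult/domains BL/adult/domains (paths assumed from the original script) would leave the live file sorted, de-duplicated, and containing both the local entries and the provider's.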