Hi, Joseph:
What are your priorities regarding the US lobbying database?
At "http://www.senate.gov/legislative/Public_Disclosure/LDA_reports.htm",
I see 4 links:
* Search the Lobbying Database (LD-1, LD-2)
* Download a Lobbying Documents Database
* Search the Contributions Database (LD-203)
* Downloadable Contributions Databases
Am I correct that the two "Search" links are to databases that
contain lots of nonsense, and the task is to download the "Lobbying
Documents" and maybe also the "Contributions" database, run a number of
checks, screen out the nonsense and create search capabilities similar
to what is offered at this web site but without the garbage?
I downloaded one file from
"http://www.senate.gov/legislative/Public_Disclosure/database_download.htm".
I see that it's "xml" inside. I have not worked with XML much before,
but it doesn't look too difficult just from a casual perusal -- and R
has an "XML" package.
Also, do you have a list of publications by others who have done things
with these data? I'd like to contact them to find out what tools they
would be willing to share, the priorities they would suggest for a
project like this, etc. The project already exists on R-Forge at
"https://r-forge.r-project.org/R/?group_id=84". Currently, that only
contains a very brief statement of intent. However, that's clear
evidence of what I've done so far, and it's available in an environment
that would support collaboration from others who might be interested in
contributing.
I thought I'd first ask interested researchers for their input on
priorities and the circumstances under which they might use and even
contribute to a project like this. I also plan to review the 41 packages
contributed to the Comprehensive R Archive Network (CRAN) with
"political science" mentioned on a help page. Some of those identify
political science professors, whom I plan to contact with similar
questions. After I've done this, I plan to send a broader invitation to
"R-help at r-project.org" to see if I can get volunteers there. With a
modest amount of luck, this will generate both advice on the most
important things to consider here AND volunteers to help produce the
tools needed to make it all happen.
Comments?
Best Wishes,
Spencer
p.s. A journey of a thousand miles can be achieved in a year at 3 miles
per day or 20 miles per week.
#######################################
Does the database you identified
(http://www.senate.gov/legislative/Public_Disclosure/LDA_reports.htm)
pertain to all branches of government (Senate, House, executive) or only
the US Senate?
I ask, because I'd like a terse name for the project like
"USSenateLobbying" or just "USlobbying". Which more accurately
describes these data? Or would you recommend something different? (The
name should not include blank space, though it can include a period
".".)
I recommend we create the desired software using, at least in
part, the free, open source software language R (www.r-project.org). I
propose we structure the code in a "package" to be developed on a
subversion repository, R-Forge (r-forge.r-project.org), and submitted to
the Comprehensive R Archive Network (CRAN). I have substantial
experience with R, CRAN, and R package creation, including using R-Forge.
Thanks,
Spencer
p.s. R is the language of choice for a large and growing number of
people engaged in new statistical algorithm development, with almost
3700 contributed packages currently downloadable from any of 84 mirrors
in 38 countries. I like it partly because it promotes good development
practices encouraging simultaneous development of documentation and
code. Creating a package on R-Forge makes it easy to involve a team of
volunteers, none of whom ever need to meet face to face. We can start
as soon as we have a name. After initiation, we can notify developers
of other R packages designed for political science applications to seek
their suggestions and possible collaboration. With a little luck we may
be able to obtain help from professors and similar researchers at
Harvard, Stanford and elsewhere.
--
Spencer Graves, PE, PhD
President and Chief Technology Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph: 408-655-4567
web: www.structuremonitoring.com