Hi all,

Sorry for the rather uninformative subject, but the error I get is not very informative either. When using the XML and RCurl packages to retrieve the content of an HTML page, htmlTreeParse fails, printing out the beginning of the HTML:

Error in htmlTreeParse(getURL(url)) :
  File <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de" lang="de">
<head>
<title>Deutsches Krebsforschungszentrum</title>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<meta http-equiv="imagetoolbar" content="no" />
<meta name="MSSmartTagsPreventParsing" content="true" />
<meta name="revisit-after" content="5 days" />
<meta name="language" content="de" />
<meta lang="de" content="" xml:lang="de" name="keywords">
<meta lang="de" xml:lang="de" name="description" content="Das Deutsche Krebsforschungszentrum hat die Aufgabe, die Mechanismen der Krebsentstehung systematisch zu erforschen und Risikofaktoren f??r Krebserkrankungen zu erfassen. Aus den Ergebnissen dieser grundlegenden Arbeiten sollen neue Ans?

This code reproduces the error:

library(RCurl)
library(XML)
url <- "www.dkfz.de/en/genetics/pages/projects/bioinformatics/Custom_Chip_Definition_File.html"
htmlTreeParse(getURL(url))

The issue seems to originate in htmlTreeParse, as getURL alone works and returns the expected content. I checked whether it could be an encoding issue, and as far as I can tell it is not. Moreover, using

htmlParse(paste("http://", url, sep=""))

works. Note that

htmlTreeParse(getURL(paste("http://", url, sep="")))

fails too; the "http://" prefix matters only for htmlParse, so that it identifies the string as a URL.
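One possibility that may be worth testing, stated here as an assumption rather than a diagnosis: the "File <!DOCTYPE ..." message would be consistent with the downloaded string being treated as a file name instead of as document content. htmlTreeParse has an asText argument for exactly that distinction. A minimal sketch, in which `page` is a made-up stand-in for the string returned by getURL() (the real page is not reproduced here):

```r
library(XML)

# Hypothetical stand-in for the result of getURL(url); not the real page content.
page <- paste(
  "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" ",
  "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n",
  "<html><head><title>Deutsches Krebsforschungszentrum</title></head>",
  "<body><p>placeholder</p></body></html>",
  sep = ""
)

# asText = TRUE states explicitly that `page` is the document itself,
# not a file name or URL, so no guessing is left to the parser.
doc <- htmlTreeParse(page, asText = TRUE, useInternalNodes = TRUE)

# Pull out the <title> text to confirm that the string was parsed as HTML.
title <- xpathSApply(doc, "//title", xmlValue)
print(title)
```

If the assumption holds, the same argument could be added to the failing call, e.g. htmlTreeParse(getURL(url, userpwd = "user:password"), asText = TRUE), where "user:password" is only a placeholder. That would keep the extra getURL arguments, but it is untested against the page above.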
This issue is rather new, and as I've been using the same versions of XML and RCurl, I suppose it might have to do with some of the content of the website having been updated, but given the error, I can't quite figure out what is raising it. Although it works on that simple example, using htmlParse is not really a workaround, as I need to pass additional arguments to the getURL call (such as userpwd), which I can't provide to htmlParse.

Any hints would be greatly appreciated,

Cheers,

Nico

sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] XML_3.9-4      RCurl_1.91-1   bitops_1.0-4.1

loaded via a namespace (and not attached):
[1] tools_2.15.0

---------------------------------------------------------------
Nicolas Delhomme

Nathaniel Street Lab
Department of Plant Physiology
Umeå Plant Science Center

Tel: +46 90 786 7989
Email: nicolas.delhomme at plantphys.umu.se
SLU - Umeå universitet
Umeå S-901 87 Sweden
---------------------------------------------------------------