santiago gil
2013-Apr-12 19:20 UTC
[R] Problem with handling of attributes in xmlToList in XML package
Hello all,

I have a problem with the way attributes are handled by the function xmlToList(), and I haven't been able to figure it out for days now. Say I have a document (produced by nmap) like this:

> mydoc <- '<host starttime="1365204834" endtime="1365205860"><status state="up" reason="echo-reply" reason_ttl="127"/><address addr="XXX.XXX.XXX.XXX" addrtype="ipv4"/>
+ <ports><port protocol="tcp" portid="135"><state state="open" reason="syn-ack" reason_ttl="127"/><service name="msrpc" product="Microsoft Windows RPC" ostype="Windows" method="probed" conf="10"><cpe>cpe:/o:microsoft:windows</cpe></service></port>
+ <port protocol="tcp" portid="139"><state state="open" reason="syn-ack" reason_ttl="127"/><service name="netbios-ssn" method="probed" conf="10"/></port>
+ </ports>
+ <times srtt="647" rttvar="71" to="100000"/>
+ </host>'

I want to store this as a list of lists, so I do:

> mytree <- xmlTreeParse(mydoc)
> myroot <- xmlRoot(mytree)
> mylist <- xmlToList(myroot)

Now my problem is that when I want to fetch the attributes of the services running on each port, the behavior is not consistent:

> mylist[["ports"]][[1]][["service"]]$.attrs["name"]
   name
"msrpc"
> mylist[["ports"]][[2]][["service"]]$.attrs["name"]
Error in trash_list[["ports"]][[2]][["service"]]$.attrs :
  $ operator is invalid for atomic vectors

I understand that the two <service> nodes are not defined the same way in the document (the first has a <cpe> child, the second has only attributes), but I think the behavior should still be consistent. I've tried many combinations of parameters for xmlTreeParse(), but nothing has helped. I can't find a way to retrieve the name of the service consistently, regardless of whether the node has children or not.

Any tips?

All the best,
S.G.
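For readers who run into the same thing, here is a minimal sketch of two possible workarounds. It is not an answer from the thread. It assumes that xmlToList() returns the bare named character vector of attributes for a node without children, and a list with a ".attrs" element for a node with children, which is what the error above suggests. The document is a trimmed copy of the one in the post, and get_attr() is a hypothetical helper, not part of the XML package.

```r
library(XML)

# Trimmed copy of the nmap output from the post (only the <ports> part).
mydoc <- paste0(
  '<host><ports>',
  '<port protocol="tcp" portid="135">',
  '<service name="msrpc" method="probed" conf="10">',
  '<cpe>cpe:/o:microsoft:windows</cpe></service></port>',
  '<port protocol="tcp" portid="139">',
  '<service name="netbios-ssn" method="probed" conf="10"/></port>',
  '</ports></host>'
)

# Option 1: skip xmlToList() and read the attribute straight from the nodes.
doc <- xmlParse(mydoc, asText = TRUE)
service_names <- xpathSApply(doc, "//port/service", xmlGetAttr, "name")

# Option 2: keep xmlToList(), but normalise the two shapes it can return.
# get_attr() is a hypothetical helper written for this sketch.
get_attr <- function(x, attr_name) {
  # Node with children: the attributes sit in the ".attrs" element.
  if (is.list(x)) x <- x[[".attrs"]]
  # No attributes at all, or the requested one is missing.
  if (is.null(x) || !(attr_name %in% names(x))) return(NA_character_)
  unname(x[[attr_name]])
}

services <- lapply(getNodeSet(doc, "//port/service"), xmlToList)
list_names <- vapply(services, get_attr, character(1), attr_name = "name")

print(service_names)
print(list_names)
```

Option 1 avoids the inconsistency entirely because it never converts the tree to a list. Option 2 may be preferable if the rest of the code already depends on the list-of-lists structure.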