arturs.onzuls at gmail.com
2010-Nov-17 18:19 UTC
[R] Working with "necessary" columns in R (CSV)
Hi all. It would be great if someone could help me solve my homework task. Here is the situation: I have a .pcap file, which I convert to CSV using tcpdump:

  tcpdump -tt -n -r x.pcap > x.csv

The CSV file looks like this:

  12890084,761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: UDP, length 12
  12890084,761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: TCP, length 12
  12890084,761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: HTML, length 12
  ...

There are 100000 rows in total.

Now I need to open the CSV in R and solve 5 problems, but I must work only with "UDP" packets (not TCP, HTML, ...). For example, I need to count how many UDP packets there are, find the maximum and minimum time among the UDP packets, and so on. I see only two options: either scan the file for "UDP" (but how?), or split this CSV, keep only the needed rows, and work with them. Please help.
On Wed, Nov 17, 2010 at 08:19:53PM +0200, arturs.onzuls at gmail.com wrote:
> Now, I need to open the CSV in R and solve 5 problems, but I need to work only
> with "UDP" packets (not TCP, HTML, ...).
[...]

You can read the file into R and extract only the UDP rows, for example

  all <- read.csv("x.csv", stringsAsFactors=FALSE, header=FALSE) # assuming there is no header
  udp <- all[grep(" UDP$", all[, 2]), ]  # keep rows whose 2nd column ends in " UDP"

Using a concatenation of three copies of your 3 rows, we get

  all
          V1                                                     V2         V3
  1 12890084  761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: UDP  length 12
  2 12890084  761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: TCP  length 12
  3 12890084 761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: HTML  length 12
  4 12890084  761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: UDP  length 12
  5 12890084  761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: TCP  length 12
  6 12890084 761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: HTML  length 12
  7 12890084  761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: UDP  length 12
  8 12890084  761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: TCP  length 12
  9 12890084 761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: HTML  length 12

  udp
          V1                                                    V2         V3
  1 12890084 761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: UDP  length 12
  4 12890084 761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: UDP  length 12
  7 12890084 761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: UDP  length 12

Note that there are only three columns, since each line of your input has only three comma-separated fields. If you change the export to .csv so that, for example, column 2 contains only the protocol name, you could use

  table(all[, 2])

to get the number of occurrences of each protocol,

  sum(all[, 2] == "UDP")

to get the number of UDP rows, or

  udp <- all[all[, 2] == "UDP", ]

to extract only the UDP rows. If you cannot change the export to .csv, you can use the function strsplit().

Petr Savicky.
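The strsplit() route mentioned at the end of the reply could look roughly like the sketch below. It is not from the thread; it assumes every line has the layout shown in the question (time, "IP", source, ">", destination, "PROTO,", "length", n) and that the comma inside the timestamp is a decimal separator. That comma is also why read.csv() puts the integer part of the time in V1 and the fractional part at the start of V2; splitting on whitespace instead keeps the timestamp in a single field, so min and max time can be computed directly. The helper name parse_tcpdump() is hypothetical.

```r
# Minimal sketch (not from the thread): parse tcpdump text lines with strsplit()
# and summarise the UDP packets.
# Assumptions: every line looks like the sample in the question,
#   <time> IP <src> > <dst>: <PROTO>, length <n>
# and the comma inside <time> is a decimal separator.
# parse_tcpdump() is a hypothetical helper name.
parse_tcpdump <- function(lines) {
  # split each line into whitespace-separated fields
  fields <- strsplit(lines, "[[:space:]]+")
  # pick field i of every line; lines with fewer fields give NA
  field <- function(i) vapply(fields, `[`, character(1), i)
  data.frame(
    # field 1, e.g. "12890084,761659" -> 12890084.761659
    time   = as.numeric(sub(",", ".", field(1), fixed = TRUE)),
    # field 6, e.g. "UDP," -> "UDP"
    proto  = sub(",$", "", field(6)),
    # field 8 is the packet length
    length = as.integer(field(8)),
    stringsAsFactors = FALSE
  )
}

# Three copies of the three sample rows; for the real data use
# lines <- readLines("x.csv")
sample_rows <- c(
  "12890084,761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: UDP, length 12",
  "12890084,761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: TCP, length 12",
  "12890084,761659 IP 10.10.20.20.47808 > 10.10.20.255.47808: HTML, length 12"
)
lines <- rep(sample_rows, 3)

pk  <- parse_tcpdump(lines)
print(table(pk$proto))           # packets per protocol
udp <- pk[pk$proto %in% "UDP", ] # %in% also drops rows whose protocol is NA
print(nrow(udp))                 # number of UDP packets
print(range(udp$time))           # min and max UDP time
```

The field positions 1, 6 and 8 are taken from the sample lines only. Lines with a different layout either carry some other word in position 6 or get NA there, and in both cases they are left out of udp.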