Dear all,

I am about to do some text analysis using R. However, the data is so noisy that I need to clean it first, and I don't have much experience with text cleaning. Would anyone be able to help? If you could share similar code that you have written before, it would be greatly appreciated.

My content is mainly feedback data collected through:

*Phone call record*: (usually the structure looks like the example below)
*Email:* ordinary email correspondence, which usually carries a lot of quoted history and also footers such as "if you are not the intended recipient..." etc.

I know it's quite a complex problem and cannot be solved by a single answer, so some tips would also be very welcome. I will ......

One example of the data:

#########################################
Fyna. <g-ccdfa at adfae.com>
24/06/2012 09:15 AM

To
<g-ccdfa at adfae.com>
cc
<g-ccdfa at adfae.com>
Subject
ase Mewrr asdffID:dde_20120624_15988015_11653024 * (keep this part)*

CUSTOMER DETAILS
Name : Mr dffa
Company : da
Address : ff
Home No. :
Office No. :
Payphone Ext :
Mobile No. :
Fax No. :
Email :

CASE DETAILS
Division : * dsaf (RIM) (keep this part)*
Category 1 : * dsaf (RIM) (keep this part)*
Category 2 : * dsaf (RIM) (keep this part)*
Category 3 :
Veh Reg Num :

COMMENTS
24/06/2012 09:15:23 AM (Name) - Location @Ddaferdsdaf Rd
Caller feedback Content.. ("*This part I need to keep*")

NFORMANT STATES
Date & Time : 24/06/2012 09:15:31 AM
CSO ID : dasf

https://MSCCasdfEB/LsdfA/Madsf.htm?pardsnDc?0pAsdoE9.=cS0eiIcp9m
############################################################

--
View this message in context: http://r.789695.n4.nabble.com/clean-Email-format-data-tp4634491.html
Sent from the R help mailing list archive at Nabble.com.
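
A possible starting point for a record like the example above, as a minimal base-R sketch rather than a finished solution. It assumes each record has already been read into a character vector with one element per line (for instance with `readLines()`), that fields look like `Key : value`, that the case ID follows the literal text `ID:` on the subject line, and that the comment block runs from the `COMMENTS` heading to the next all-capitals heading. The names `msg`, `get_field`, `get_comments` and `parse_record` are made up for illustration, and the sample lines are a shortened copy of the example with the "(keep this part)" annotations removed.

```r
# Hypothetical sample record, one element per line (real line breaks may differ).
msg <- c(
  "Subject",
  "ase Mewrr asdffID:dde_20120624_15988015_11653024",
  "CASE DETAILS",
  "Division : dsaf (RIM)",
  "Category 1 : dsaf (RIM)",
  "Category 2 : dsaf (RIM)",
  "Category 3 :",
  "Veh Reg Num :",
  "COMMENTS",
  "24/06/2012 09:15:23 AM (Name) - Location @Ddaferdsdaf Rd",
  "Caller feedback Content..",
  "NFORMANT STATES",
  "Date & Time : 24/06/2012 09:15:31 AM",
  "CSO ID : dasf"
)

# Value of the first "Key : value" line; NA if the key is absent,
# "" if the key is present but empty.
get_field <- function(lines, key) {
  pattern <- paste0("^[[:space:]]*", key, "[[:space:]]*:")
  hit <- grep(pattern, lines, value = TRUE)
  if (length(hit) == 0) return(NA_character_)
  trimws(sub("^[^:]*:", "", hit[1]))
}

# Lines between the COMMENTS heading and the next all-capitals heading
# (assumed rule for where the comment block ends).
get_comments <- function(lines) {
  clean <- trimws(lines)
  start <- match("COMMENTS", clean)
  if (is.na(start)) return(character(0))
  headings <- which(grepl("^[[:upper:]][[:upper:] ]+$", clean))
  later <- headings[headings > start]
  end <- if (length(later) > 0) later[1] - 1 else length(clean)
  if (end <= start) return(character(0))
  body <- clean[(start + 1):end]
  body[nzchar(body)]  # drop blank lines
}

# Collect the parts marked "keep" in the example into one list.
parse_record <- function(lines) {
  id_line <- grep("ID:", lines, value = TRUE, fixed = TRUE)
  id <- if (length(id_line) > 0) {
    sub(".*ID:([^[:space:]]+).*", "\\1", id_line[1])
  } else {
    NA_character_
  }
  list(
    id        = id,
    division  = get_field(lines, "Division"),
    category1 = get_field(lines, "Category 1"),
    category2 = get_field(lines, "Category 2"),
    comments  = paste(get_comments(lines), collapse = " ")
  )
}

rec <- parse_record(msg)
str(rec)
```

The comment block here still includes the timestamp/location line; whether to keep or drop it depends on the analysis. Ordinary emails with quoted history and disclaimers would need different rules (for example, cutting everything after the first quoted-reply marker), which this sketch does not attempt.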