Removing Non-ASCII Characters With RegEx

I had some large XML files that contained invalid characters and would not process. I ran a script over them that replaces every character outside the printable ASCII range (other than line breaks and tabs) with an empty string. Here is the snippet.

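using System.IO;
using System.Text.RegularExpressions;

string source = File.ReadAllText(@"C:\SourceFile.xml");

// Keep printable ASCII (\x20-\x7E) plus tab, CR and LF; strip everything else.
// The pattern is a bare character class: .NET's Regex does not parse sed-style s/.../g.
string dest = Regex.Replace(source, @"[^\x20-\x7E\r\n\t]", string.Empty);

File.WriteAllText(@"C:\DestFile.xml", dest);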
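
To see what a filter of this kind removes, here is a minimal sketch that runs on an in-memory string instead of a file. The class name AsciiFilterDemo, the helper StripNonAscii and the sample text are made up for illustration. The pattern is written as a plain .NET character class that allows printable ASCII plus tab, carriage return and line feed, which is an assumption about what should be kept.

```csharp
using System;
using System.Text.RegularExpressions;

static class AsciiFilterDemo
{
    // Hypothetical helper: keeps printable ASCII (\x20-\x7E) plus tab, CR and LF,
    // and removes every other character.
    public static string StripNonAscii(string input) =>
        Regex.Replace(input, @"[^\x20-\x7E\r\n\t]", string.Empty);

    static void Main()
    {
        // \u00E9 (e with acute accent) and \u0001 (a control character)
        // both fall outside the allowed set.
        string sample = "<name>Caf\u00E9\u0001</name>\n";

        string cleaned = StripNonAscii(sample);

        Console.WriteLine(cleaned == "<name>Caf</name>\n"); // prints True
    }
}
```

Note that this deletes accented letters outright rather than transliterating them, so "Café" becomes "Caf". That is fine for getting a broken file to parse, but it does lose data.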

For the record, I got the RegEx here
