8 Replies - 40710 Views - Last Post: 10 January 2011 - 09:36 AM

#1 cygnusX

How to decode quoted-printable encoded string?

Posted 03 April 2008 - 04:20 AM

The code below decodes base-64 encoded text, but how do I handle quoted-printable?

			Match match = Regex.Match(header, @"=\?(?<charset>.*?)\?(?<encoding>[qQbB])\?(?<value>.*?)\?=");

			if (match.Success)
			{
				string charSet = match.Groups["charset"].Value;
				string encoding = match.Groups["encoding"].Value.ToUpper();
				string value = match.Groups["value"].Value;

				byte[] bytes;

				if (encoding.Equals("B")) // the string is base-64
					bytes = Convert.FromBase64String(value);
				else // the string is quoted-printable ???
					throw new NotImplementedException(); // this is the part I am missing

				return Encoding.GetEncoding(charSet).GetString(bytes);
			}
			return header;



Replies To: How to decode quoted-printable encoded string?

#2 PsychoCoder

Re: How to decode quoted-printable encoded string?

Posted 03 April 2008 - 05:39 AM

What do you mean by "quoted printable"?

#6 cygnusX

Re: How to decode quoted-printable encoded string?

Posted 03 April 2008 - 06:42 AM

This is taken from the MIME (Multipurpose Internet Mail Extensions) specification:

"The Quoted-Printable encoding is intended to represent data that largely consists of octets that correspond to printable characters in the US-ASCII character set. It encodes the data in such a way that the resulting octets are unlikely to be modified by mail transport. If the data being encoded are mostly US-ASCII text, the encoded form of the data remains largely recognizable by humans. A body which is entirely US-ASCII may also be encoded in Quoted-Printable to ensure the integrity of the data should the message pass through a character-translating, and/or line-wrapping gateway..."

Hm, actually I need to decode text encoded with the "Q" encoding (RFC 2047), not "quoted-printable":

The "Q" encoding is similar to the "Quoted-Printable" content-
transfer-encoding defined in RFC 2045. It is designed to allow text
containing mostly ASCII characters to be decipherable on an ASCII
terminal without decoding.

(1) Any 8-bit value may be represented by a "=" followed by two
hexadecimal digits. For example, if the character set in use
were ISO-8859-1, the "=" character would thus be encoded as
"=3D", and a SPACE by "=20". (Upper case should be used for
hexadecimal digits "A" through "F".)

(2) The 8-bit hexadecimal value 20 (e.g., ISO-8859-1 SPACE) may be
represented as "_" (underscore, ASCII 95.). (This character may
not pass through some internetwork mail gateways, but its use
will greatly enhance readability of "Q" encoded data with mail
readers that do not support this encoding.) Note that the "_"
always represents hexadecimal 20, even if the SPACE character
occupies a different code position in the character set in use.

(3) 8-bit values which correspond to printable ASCII characters other
than "=", "?", and "_" (underscore), MAY be represented as those
characters. (But see section 5 for restrictions.) In
particular, SPACE and TAB MUST NOT be represented as themselves
within encoded words.
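
The three rules above can be sketched in C# roughly as follows. This is only a minimal sketch: the names `QEncodingSketch`, `QToBytes` and `DecodeQ` are made up for illustration, and it assumes the value has already been extracted from the `=?charset?Q?value?=` wrapper and contains only ASCII characters. Passing a malformed "=" through unchanged is also an assumption, not something the specification text above prescribes.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class QEncodingSketch
{
    // Hypothetical helper: converts the value part of a "Q" encoded-word to raw bytes.
    public static byte[] QToBytes(string value)
    {
        var bytes = new List<byte>(value.Length);
        for (int i = 0; i < value.Length; i++)
        {
            char c = value[i];
            byte b;
            if (c == '_')
            {
                // Rule 2: "_" always stands for hexadecimal 20.
                bytes.Add(0x20);
            }
            else if (c == '=' && i + 2 < value.Length
                     && byte.TryParse(value.Substring(i + 1, 2), NumberStyles.AllowHexSpecifier,
                                      CultureInfo.InvariantCulture, out b))
            {
                // Rule 1: "=" followed by two hex digits is one 8-bit value.
                bytes.Add(b);
                i += 2;
            }
            else
            {
                // Rule 3: any other printable ASCII character represents itself.
                // Assumption: a malformed "=" is passed through unchanged.
                bytes.Add((byte)c);
            }
        }
        return bytes.ToArray();
    }

    // Decodes all bytes in one call, using the charset named in the encoded-word.
    public static string DecodeQ(string charset, string value)
    {
        return Encoding.GetEncoding(charset).GetString(QToBytes(value));
    }
}
```

For example, `QEncodingSketch.DecodeQ("iso-8859-1", "a=3Db_c")` should give `a=b c`. Collecting the bytes first and decoding them once keeps the charset handling in a single place.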

This post has been edited by cygnusX: 03 April 2008 - 06:50 AM


#10 klaas114

Re: How to decode quoted-printable encoded string?

Posted 08 July 2009 - 03:09 AM

I've added Q-encoded string decoding to your code:
		private string charSet = "";
		private string DecodeMime(string mimeString)
		{
			// Example: mimeString = "=?Windows-1252?Q?Registered_Member_News=3A_WPC09_to_feature_Windows_7=2C_?==?Windows-1252?Q?_Office=2C_Exchange=2C_more=85?="
			// In this example two Q-encoded strings are defined!
			string encodedString = mimeString;
			string decodedString = "";
			while (encodedString.Length != 0)
			{
				Match match = Regex.Match(encodedString, @"=\?(?<charset>.*?)\?(?<encoding>[qQbB])\?(?<value>.*?)\?=");
				if (match.Success)
				{
					charSet = match.Groups["charset"].Value;
					string encoding = match.Groups["encoding"].Value.ToUpper();
					string value = match.Groups["value"].Value;

					
					if (encoding.ToLower().Equals("b")) //if the string is base-64
					{
						byte[] bytes = Convert.FromBase64String(value);

						decodedString += Encoding.GetEncoding(charSet).GetString(bytes);
					}
					else if (encoding.ToLower().Equals("q")) //if string is Q-encoded
					{
						// Replace '_' with a space first, so an underscore encoded as =5F survives
						value = value.Replace('_', ' ');

						// Parse looking for =XX where XX is hexadecimal
						Regex re = new Regex(
							"(\\=([0-9A-F][0-9A-F]))",
							RegexOptions.IgnoreCase
						);
						decodedString += re.Replace(value, new MatchEvaluator(HexDecoderEvaluator));
					}
					else
					{
						// SNH No decoder defined
						// Match should NOT be successful
						return mimeString;
					}
					
					// When multiple entries, subtract the currently decoded part
					encodedString = encodedString.Substring(match.Length);

				}
				else
				{
					// Unable to decode (not mime encoded)
					return mimeString;
				}
			}

			// Successful
			return decodedString;
		}
		private string HexDecoderEvaluator(Match m)
		{
			
			string hex = m.Groups[2].Value;
			int iHex = Convert.ToInt32(hex, 16);

			// Return the string in the charset defined
			byte[] bytes = new byte[1];
			bytes[0] = Convert.ToByte(iHex);
			return Encoding.GetEncoding(charSet).GetString(bytes);

			// This will not work properly on "=85" in example string
			// char c = (char)iHex;
			// return c.ToString();
		}




#11 ali_selaidin

Re: How to decode quoted-printable encoded string?

Posted 11 September 2009 - 02:51 AM

Hm, for some reason this code cannot decode UTF-8, not sure why. Can someone help me with that? It works for other encodings but not for UTF-8.

#12 Guest_Matthew1471*

Re: How to decode quoted-printable encoded string?

Posted 15 February 2010 - 02:30 PM

ali_selaidin, on 11 September 2009 - 01:51 AM, said:

Hm, for some reason this code cannot decode UTF-8, not sure why. Can someone help me with that? It works for other encodings but not for UTF-8.


Try utf-8 instead of UTF-8 :)

I have also fixed a small bug in this.

Replace:

// When multiple entries, subtract the currently decoded part
encodedString = encodedString.Substring(match.Length);

With:

// When multiple entries, subtract the currently decoded part
encodedString = encodedString.Substring(match.Index + match.Length+1);

This fixes a bug where the regular expression matches in the middle of the string: the remaining encodedString gets corrupted, which throws off the second .Match call, which then returns your undecoded string in error.
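
For anyone comparing the offsets in this thread: `match.Index + match.Length` is already the position of the first character after the matched encoded-word, so any extra `+ 1` or `+ 2` skips whatever separator characters happen to follow in a particular input. The sketch below only illustrates that arithmetic. The class `OffsetDemo` and its `extra` parameter are hypothetical and not part of the code posted above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OffsetDemo
{
    // Same pattern as used in the thread.
    private static readonly Regex EncodedWord =
        new Regex(@"=\?(?<charset>.*?)\?(?<encoding>[qQbB])\?(?<value>.*?)\?=");

    // Returns the text left over after the first encoded-word,
    // skipping `extra` additional characters (clamped to the string length).
    public static string RestAfterFirstMatch(string input, int extra)
    {
        Match m = EncodedWord.Match(input);
        if (!m.Success)
            return input;

        // Index + Length is the first character after the match.
        int start = Math.Min(m.Index + m.Length + extra, input.Length);
        return input.Substring(start);
    }
}
```

With `"Fwd: =?utf-8?Q?a?= =?utf-8?Q?b?="` and `extra` set to 0, the remainder is `" =?utf-8?Q?b?="`, so the separating space is kept. With two encoded-words directly next to each other, as in the example comment of the DecodeMime code, an `extra` of 1 would cut off the leading `=` of the second word, so the second match would fail.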

#13 Guest_Chris*

Re: How to decode quoted-printable encoded string?

Posted 26 March 2010 - 05:09 AM

encodedString = encodedString.Substring(match.Index + match.Length + 1);

In my tests, the match was followed by \r\n (two characters), so

encodedString = encodedString.Substring(match.Index + match.Length + 2);

worked

#14 Guest_Steve*

Re: How to decode quoted-printable encoded string?

Posted 29 September 2010 - 09:33 AM

Updated to use anonymous delegate, which removes the global variable. Also works with un-encoded text before, between, and after encoded-words. Enjoy!

Thanks for doing the groundwork for me! Your initial code proved to be very useful.

public static string DecodeEncodedWordValue(string mimeString)
{
    var regex = new Regex(@"=\?(?<charset>.*?)\?(?<encoding>[qQbB])\?(?<value>.*?)\?=");
    var encodedString = mimeString;
    var decodedString = string.Empty;

    while (encodedString.Length > 0)
    {
        var match = regex.Match(encodedString);
        if (match.Success)
        {
            // If the match isn't at the start of the string, copy the initial few chars to the output
            decodedString += encodedString.Substring(0, match.Index);

            var charset = match.Groups["charset"].Value;
            var encoding = match.Groups["encoding"].Value.ToUpper();
            var value = match.Groups["value"].Value;

            if (encoding.Equals("B"))
            {
                // Encoded value is Base-64
                var bytes = Convert.FromBase64String(value);
                decodedString += Encoding.GetEncoding(charset).GetString(bytes);
            }
            else if (encoding.Equals("Q"))
            {
                // Encoded value uses the "Q" encoding
                // Replace '_' with a space first, so an underscore encoded as =5F survives
                value = value.Replace('_', ' ');

                // Parse looking for =XX where XX is hexadecimal
                var regx = new Regex("(\\=([0-9A-F][0-9A-F]))", RegexOptions.IgnoreCase);
                decodedString += regx.Replace(value, new MatchEvaluator(delegate(Match m)
                {
                    var hex = m.Groups[2].Value;
                    var iHex = Convert.ToInt32(hex, 16);

                    // Return the string in the charset defined
                    var bytes = new byte[1];
                    bytes[0] = Convert.ToByte(iHex);
                    return Encoding.GetEncoding(charset).GetString(bytes);
                }));
            }
            else
            {
                // Encoded value not known, return original string
                // (Match should not be successful in this case, so this code may never get hit)
                decodedString += encodedString;
                break;
            }

            // Trim off up to and including the match, then we'll loop and try matching again.
            encodedString = encodedString.Substring(match.Index + match.Length);
        }
        else
        {
            // No match, not encoded, return original string
            decodedString += encodedString;
            break;
        }
    }
    return decodedString;
}



#15 Guest_Kula*

Re: How to decode quoted-printable encoded string?

Posted 10 January 2011 - 09:36 AM

Guys,

This code works well for all types of decoding except quoted-printable Chinese characters (GB18030, GB2312). When I try to decode them, I get '?????' as output. Other charsets work fine.

Example quoted-printable strings:

=?GB18030?Q?Re: =A8=A4=A1=E4=A1=C1?wueqin@163.com=A6=CC?=A8=AE=A8=BA?t?=

=?GB18030?Q?=D2=AA=D5=CB?=


Thanks,
Kula
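
A likely explanation, offered as an assumption rather than something tested against these exact headers: the code in this thread decodes every =XX byte on its own, while GB2312, GB18030 and UTF-8 use more than one byte for many characters. A single byte taken out of a multi-byte sequence is not a valid character, so the decoder substitutes a placeholder for it. The sketch below shows the difference using UTF-8, which is always available. The class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class MultiByteDemo
{
    // Decodes each byte separately, the way a per-match =XX evaluator does.
    public static string ByteAtATime(string charset, byte[] data)
    {
        Encoding enc = Encoding.GetEncoding(charset);
        var sb = new StringBuilder();
        foreach (byte b in data)
            sb.Append(enc.GetString(new[] { b }));
        return sb.ToString();
    }

    // Decodes the whole run in one call, so multi-byte sequences stay intact.
    public static string AllAtOnce(string charset, byte[] data)
    {
        return Encoding.GetEncoding(charset).GetString(data);
    }
}
```

For the bytes of `=C3=A9` (the UTF-8 form of "é"), `AllAtOnce("utf-8", new byte[] { 0xC3, 0xA9 })` yields the single character, while `ByteAtATime` does not. The same reasoning would suggest collecting all bytes of an encoded-word into one array and calling `GetString` once. On newer .NET versions, code page encodings such as GB18030 may also require registering `CodePagesEncodingProvider.Instance` via `Encoding.RegisterProvider` first, as far as I know.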
