3 Replies - 2007 Views - Last Post: 11 July 2009 - 12:43 PM

#1 KuroTsuto

  • D.I.C Head

Reputation: 42
  • Posts: 182
  • Joined: 13-February 09

[Solved...ish] MIME Decoding... And Such

Posted 10 July 2009 - 03:52 PM

Hey, this project is a personal one and has nothing to do with work or school. The only code in question belongs to a freely distributed PHP class, so I have no code of my own to display. If you would like, however, I can post the class's code.

All that jazz aside, I'll tell you about my problem. For my project, I am looking to parse the data of an email retrieved via the IMAP protocol into an array containing the headers and their data as well as the main body of the message and the email's associated attachments... Crazy shwag, right?
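To make the target concrete, here is a minimal sketch of that header/body split in plain PHP with no extensions. The function name split_raw_message() and the sample message are hypothetical, and the sketch ignores MIME parts and attachments entirely; treat it as an illustration of the array shape, not a full parser.

```php
<?php
// Hypothetical helper (not part of Harish Chauhan's classes): splits a raw
// RFC 822 message into a header array and a body string.
function split_raw_message($raw)
{
    // Headers end at the first blank line; accept CRLF or bare LF.
    $pieces = preg_split("/\r?\n\r?\n/", $raw, 2);
    $headerBlock = $pieces[0];
    $body = isset($pieces[1]) ? $pieces[1] : '';

    // Unfold continuation lines (a line break followed by space or tab).
    $headerBlock = preg_replace("/\r?\n[ \t]+/", ' ', $headerBlock);

    $headers = array();
    foreach (preg_split("/\r?\n/", trim($headerBlock)) as $line) {
        $pos = strpos($line, ':');
        if ($pos === false) {
            continue; // not a "Name: value" line, skip it
        }
        $name = strtolower(trim(substr($line, 0, $pos)));
        $value = trim(substr($line, $pos + 1));
        // Repeated headers (e.g. Received) are collected into a list.
        if (isset($headers[$name])) {
            $headers[$name] = array_merge((array) $headers[$name], array($value));
        } else {
            $headers[$name] = $value;
        }
    }

    return array('headers' => $headers, 'body' => $body);
}

// Made-up sample message for illustration only.
$raw = "From: alice@example.com\r\n"
     . "Subject: a folded\r\n subject line\r\n"
     . "Received: hop one\r\n"
     . "Received: hop two\r\n"
     . "\r\n"
     . "Hello there.\r\n";

$message = split_raw_message($raw);
print_r($message['headers']);
```

Header names are lowercased so lookups do not depend on how the sender capitalised them, which is the same choice the MIME Decode class makes.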

Currently I am using the IMAP and MIME Decode classes by Harish Chauhan... I believe that this would be the simplest solution to my challenge (assuming that I could actually get it working right). The classes are fairly beautiful in themselves; however, I'm having horrible issues with the MIME Decode class in that I cannot get it to return the headers in an array, even when I have tried the code provided by Harish himself. The classes were apparently written about three years ago, so it could simply be that some of the functions are no longer compatible with newer versions of PHP (or the newer versions' default settings), but I don't really have the experience to track down the source of this issue and correct it.

And thus my question: both of Chauhan's classes have virtually no documentation beyond the comments in the code, and many of the comments are sketchy and unhelpful; try as I might to find a good tutorial or better documentation on the net, my searches have failed to turn up anything worthwhile. Is anyone aware of a working integration or example of these classes? Alternatively, I'm open to other approaches... I would be happy to use PHP's built-in IMAP functions along with some other MIME decoder/parser if someone could point me towards a good one.

Thanks!
~KuroTsuto

This post has been edited by KuroTsuto: 11 July 2009 - 12:43 PM



Replies To: [Solved...ish] MIME Decoding... And Such

#2 Martyr2

  • Programming Theoretician

Reputation: 4405
  • Posts: 12,262
  • Joined: 18-April 07

Re: [Solved...ish] MIME Decoding... And Such

Posted 11 July 2009 - 11:18 AM

Well, to help us out, you can try showing us the MIME decode class along with any error messages you are getting. We understand that it may not be working, but we need to know how it is not working, and PHP errors can tell us that. Also, if you are attempting to alter his code to return this array, show us the changes you are trying to make.

Sounds like you have plenty of code you can show us. ;)
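
If no errors show up at all, notices and warnings may simply be suppressed by the server's settings. A minimal sketch, assuming you can edit the top of your test script; the $result object is a hypothetical stand-in for whatever the decoder hands back:

```php
<?php
// Report every class of error and print it to the page instead of only
// logging it (or dropping it).
error_reporting(E_ALL);
ini_set('display_errors', '1');

// Hypothetical stand-in for a decoder result that was never populated.
$result = new stdClass;

// isset() is a quiet way to check whether a property was actually filled
// in before relying on it.
var_dump(isset($result->headers));
```

With these two settings in place, reads of undefined variables or properties should surface as messages rather than passing silently.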

#3 KuroTsuto

Re: [Solved...ish] MIME Decoding... And Such

Posted 11 July 2009 - 12:36 PM

Aye, the code there is... But I have left the class itself unmodified: returning such an array is supposed to be part of its functionality, and when it is executed I receive not a single PHP error! Bummer, right? Anyway, here's the MIME Decode class (note that the original author's comments aren't always correct...):

[code] class MIMEDECODE
{

/**
* The raw email to decode
* @var string
*/
var $_input;

/**
* The header part of the input
* @var string
*/
var $_header;

/**
* The body part of the input
* @var string
*/
var $_body;

/**
* If an error occurs, this is used to store the message
* @var string
*/
var $_error;

/**
* Flag to determine whether to include bodies in the
* returned object.
* @var boolean
*/
var $_include_bodies;

/**
* Flag to determine whether to decode bodies
* @var boolean
*/
var $_decode_bodies;

/**
* Flag to determine whether to decode headers
* @var boolean
*/
var $_decode_headers;

/**
* If invoked from a class, $this will be set. This has problematic
* connotations for calling decode() statically. Hence this variable
* is used to determine if we are indeed being called statically or
* via an object.
*/
var $mailMimeDecode;

/**
* Constructor.
*
* Sets up the object, initialise the variables, and splits and
* stores the header and body of the input.
*
* @param string The input to decode
* @access public
*/
function MIMEDECODE($input)
{
list($header, $body) = $this->_split_body_header($input);

$this->_input = $input;
$this->_header = $header;
$this->_body = $body;
$this->_decode_bodies = true;
$this->_include_bodies = true;

$this->mailMimeDecode = true;
}

/**
* Begins the decoding process. If called statically
* it will create an object and call the decode() method
* of it.
*
* @param array An array of various parameters that determine
* various things:
* include_bodies - Whether to include the body in the returned
* object.
* decode_bodies - Whether to decode the bodies
* of the parts. (Transfer encoding)
* decode_headers - Whether to decode headers
* input - If called statically, this will be treated
* as the input
* @return object Decoded results
* @access public
*/
function decode($params = null)
{

// Have we been called statically? If so, create an object and pass details to that.
if (!isset($this->mailMimeDecode) AND isset($params['input'])) {

$obj = new MIMEDECODE($params['input']);
$structure = $obj->decode($params);

// Called statically but no input
} elseif (!isset($this->mailMimeDecode)) {
return $this->_error='Called statically and no input given';

// Called via an object
} else {

//$this->_include_bodies = isset($params['include_bodies']) ? $params['include_bodies'] : false;
//$this->_decode_bodies = isset($params['decode_bodies']) ? $params['decode_bodies'] : false;
//$this->_decode_headers = isset($params['decode_headers']) ? $params['decode_headers'] : false;

$structure = $this->_decode($this->_header, $this->_body);
if ($structure === false) {
$structure =$this->_error;
}
}

return $structure;
}

/**
* Performs the decoding. Decodes the body string passed to it
* If it finds certain content-types it will call itself in a
* recursive fashion
*
* @param string Header section
* @param string Body section
* @return object Results of decoding process
* @access private
*/
function _decode($headers, $body, $default_ctype = 'text/plain')
{
$return = new stdClass;
$headers = $this->_parseHeaders($headers);

foreach ($headers as $value) {
if (isset($return->headers[strtolower($value['name'])]) AND !is_array($return->headers[strtolower($value['name'])])) {
$return->headers[strtolower($value['name'])] = array($return->headers[strtolower($value['name'])]);
$return->headers[strtolower($value['name'])][] = $value['value'];

} elseif (isset($return->headers[strtolower($value['name'])])) {
$return->headers[strtolower($value['name'])][] = $value['value'];

} else {
$return->headers[strtolower($value['name'])] = $value['value'];
}
}

// foreach replaces the older reset()/each() idiom, which newer PHP versions no longer support
foreach ($headers as $key => $value) {
$headers[$key]['name'] = strtolower($headers[$key]['name']);
switch ($headers[$key]['name']) {

case 'content-type':
$content_type = $this->_parseHeaderValue($headers[$key]['value']);

if (preg_match('/([0-9a-z+.-]+)\/([0-9a-z+.-]+)/i', $content_type['value'], $regs)) {
$return->ctype_primary = $regs[1];
$return->ctype_secondary = $regs[2];
}

if (isset($content_type['other'])) {
foreach ($content_type['other'] as $p_name => $p_value) {
$return->ctype_parameters[$p_name] = $p_value;
}
}
break;

case 'content-disposition':
$content_disposition = $this->_parseHeaderValue($headers[$key]['value']);
$return->disposition = $content_disposition['value'];
if (isset($content_disposition['other'])) {
foreach ($content_disposition['other'] as $p_name => $p_value) {
$return->d_parameters[$p_name] = $p_value;
}
}
break;

case 'content-transfer-encoding':
$content_transfer_encoding = $this->_parseHeaderValue($headers[$key]['value']);
break;
}
}

if (isset($content_type)) {
switch (trim(strtolower($content_type['value']))) {
case 'text/plain':
$encoding = isset($content_transfer_encoding) ? $content_transfer_encoding['value'] : '7bit';
$this->_include_bodies ? $return->body = ($this->_decode_bodies ? $this->_decodeBody($body, $encoding) : $body) : null;
break;
case 'text/html':
$encoding = isset($content_transfer_encoding) ? $content_transfer_encoding['value'] : '7bit';
$this->_include_bodies ? $return->body = ($this->_decode_bodies ? $this->_decodeBody($body, $encoding) : $body) : null;
break;

case 'multipart/parallel':
case 'multipart/report': // RFC1892
case 'multipart/signed': // PGP
case 'multipart/digest':
case 'multipart/alternative':
case 'multipart/related':
case 'multipart/mixed':
if(!isset($content_type['other']['boundary'])){
$this->_error = 'No boundary found for ' . $content_type['value'] . ' part';
return false;
}

$default_ctype = (strtolower($content_type['value']) === 'multipart/digest') ? 'message/rfc822' : 'text/plain';

$parts = $this->_boundarySplit($body, $content_type['other']['boundary']);
for ($i = 0; $i < count($parts); $i++) {
list($part_header, $part_body) = $this->_split_body_header($parts[$i]);
$part = $this->_decode($part_header, $part_body, $default_ctype);
if($part === false)
$part = $this->raiseError($this->_error);
$return->parts[] = $part;
}
break;

case 'message/rfc822':
// "=& new" is deprecated as of PHP 5.3; objects are already handled by reference
$obj = new MIMEDECODE($body);
$return->parts[] = $obj->decode(array('include_bodies' => $this->_include_bodies));
unset($obj);
break;

default:
if(!isset($content_transfer_encoding['value']))
$content_transfer_encoding['value'] = '7bit';
$this->_include_bodies ? $return->body = ($this->_decode_bodies ? $this->_decodeBody($body, $content_transfer_encoding['value']) : $body) : null;
break;
}

} else {
$ctype = explode('/', $default_ctype);
$return->ctype_primary = $ctype[0];
$return->ctype_secondary = $ctype[1];
$this->_include_bodies ? $return->body = ($this->_decode_bodies ? $this->_decodeBody($body) : $body) : null;
}

return $return;
}

/**
* Given the output of the above function, this will return an
* array of references to the parts, indexed by mime number.
*
* @param object $structure The structure to go through
* @param string $mime_number Internal use only.
* @return array Mime numbers
*/
function &getMimeNumbers(&$structure, $no_refs = false, $mime_number = '', $prepend = '')
{
$return = array();
if (!empty($structure->parts)) {
if ($mime_number != '') {
$structure->mime_id = $prepend . $mime_number;
$return[$prepend . $mime_number] = &$structure;
}
for ($i = 0; $i < count($structure->parts); $i++) {


if (!empty($structure->headers['content-type']) AND substr(strtolower($structure->headers['content-type']), 0, 8) == 'message/') {
$prepend = $prepend . $mime_number . '.';
$_mime_number = '';
} else {
$_mime_number = ($mime_number == '' ? $i + 1 : sprintf('%s.%s', $mime_number, $i + 1));
}

$arr = &MIMEDECODE::getMimeNumbers($structure->parts[$i], $no_refs, $_mime_number, $prepend);
foreach ($arr as $key => $val) {
$no_refs ? $return[$key] = '' : $return[$key] = &$arr[$key];
}
}
} else {
if ($mime_number == '') {
$mime_number = '1';
}
$structure->mime_id = $prepend . $mime_number;
$no_refs ? $return[$prepend . $mime_number] = '' : $return[$prepend . $mime_number] = &$structure;
}

return $return;
}

/**
* Given a string containing a header and body
* section, this function will split them (at the first
* blank line) and return them.
*
* @param string Input to split apart
* @return array Contains header and body section
* @access private
*/
function _split_body_header($input)
{
if (preg_match("/^(.*?)\r?\n\r?\n(.*)/s", $input, $match)) {
return array($match[1], $match[2]);
}
$this->_error = 'Could not split header and body';
return false;
}

/**
* Parse headers given in $input and return
* as assoc array.
*
* @param string Headers to parse
* @return array Contains parsed headers
* @access private
*/
function _parseHeaders($input)
{

if ($input !== '') {
// Unfold the input
$input = preg_replace("/\r?\n/", "\r\n", $input);
$input = preg_replace("/\r\n(\t| )+/", ' ', $input);
$headers = explode("\r\n", trim($input));

foreach ($headers as $value) {
$hdr_name = substr($value, 0, $pos = strpos($value, ':'));
$hdr_value = substr($value, $pos+1);
if($hdr_value[0] == ' ')
$hdr_value = substr($hdr_value, 1);

$return[] = array(
'name' => $hdr_name,
'value' => $this->_decode_headers ? $this->_decodeHeader($hdr_value) : $hdr_value
);
}
} else {
$return = array();
}

return $return;
}

/**
* Function to parse a header value,
* extract first part, and any secondary
* parts (after;) This function is not as
* robust as it could be. Eg. header comments
* in the wrong place will probably break it.
*
* @param string Header value to parse
* @return array Contains parsed result
* @access private
*/
function _parseHeaderValue($input)
{

if (($pos = strpos($input, ';')) !== false) {

$return['value'] = trim(substr($input, 0, $pos));
$input = trim(substr($input, $pos+1));

if (strlen($input) > 0) {

// This splits on a semi-colon, if there's no preceding backslash
// Can't handle if it's in double quotes however. (Of course anyone
// sending that needs a good slap).
$parameters = preg_split('/\s*(?<!\\\\);\s*/i', $input);

for ($i = 0; $i < count($parameters); $i++) {
$param_name = substr($parameters[$i], 0, $pos = strpos($parameters[$i], '='));
$param_value = substr($parameters[$i], $pos + 1);
if ($param_value[0] == '"') {
$param_value = substr($param_value, 1, -1);
}
$return['other'][$param_name] = $param_value;
$return['other'][strtolower($param_name)] = $param_value;
}
}
} else {
$return['value'] = trim($input);
}

return $return;
}

/**
* This function splits the input based
* on the given boundary
*
* @param string Input to parse
* @return array Contains array of resulting mime parts
* @access private
*/
function _boundarySplit($input, $boundary)
{
$boundary=trim($boundary);
if(substr($boundary,-1)=='"')
$boundary=substr_replace($boundary,"" ,-1 );
if(substr($boundary,0,1)=='"')
$boundary=substr_replace($boundary,"" ,0,1);
$boundary=trim($boundary);
$tmp = explode('--'.$boundary,$input);//boundary
for ($i=1; $i<count($tmp)-1; $i++) {
$parts[] = $tmp[$i];
}

return $parts;
}

/**
* Given a header, this function will decode it
* according to RFC2047. Probably not *exactly*
* conformant, but it does pass all the given
* examples (in RFC2047).
*
* @param string Input header value to decode
* @return string Decoded header value
* @access private
*/
function _decodeHeader($input)
{
// Remove white space between encoded-words
$input = preg_replace('/(=\?[^?]+\?(q|b)\?[^?]*\?=)(\s)+=\?/i', '\1=?', $input);

// For each encoded-word...
while (preg_match('/(=\?([^?]+)\?(q|b)\?([^?]*)\?=)/i', $input, $matches)) {

$encoded = $matches[1];
$charset = $matches[2];
$encoding = $matches[3];
$text = $matches[4];

switch (strtolower($encoding)) {
case 'b':
$text = base64_decode($text);
break;

case 'q':
$text = str_replace('_', ' ', $text);
preg_match_all('/=([a-f0-9]{2})/i', $text, $matches);
foreach($matches[1] as $value)
$text = str_replace('='.$value, chr(hexdec($value)), $text);
break;
}

$input = str_replace($encoded, $text, $input);
}

return $input;
}

/**
* Given a body string and an encoding type,
* this function will decode and return it.
*
* @param string Input body to decode
* @param string Encoding type to use.
* @return string Decoded body
* @access private
*/
function _decodeBody($input, $encoding = '7bit')
{
switch ($encoding) {
case '7bit':
return $input;
break;

case 'quoted-printable':
return $this->_quotedPrintableDecode($input);
break;

case 'base64':
return base64_decode($input);
break;

default:
return $input;
}
}

/**
* Given a quoted-printable string, this
* function will decode and return it.
*
* @param string Input body to decode
* @return string Decoded body
* @access private
*/
function _quotedPrintableDecode($input)
{
// Remove soft line breaks
$input = preg_replace("/=\r?\n/", '', $input);

// Replace encoded characters
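// Uses a callback instead of the /e modifier (closures require PHP 5.3+)
$input = preg_replace_callback('/=([a-f0-9]{2})/i', function ($m) { return chr(hexdec($m[1])); }, $input);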

return $input;
}

/**
* Checks the input for uuencoded files and returns
* an array of them. Can be called statically, eg:
*
* $files =& MIMEDECODE::uudecode($some_text);
*
* It will check for the begin 666 ... end syntax
* however and won't just blindly decode whatever you
* pass it.
*
* @param string Input body to look for attachments in
* @return array Decoded bodies, filenames and permissions
* @access public
* @author Unknown
*/
function &uudecode($input)
{
// Find all uuencoded sections
preg_match_all("/begin ([0-7]{3}) (.+)\r?\n(.+)\r?\nend/Us", $input, $matches);

for ($j = 0; $j < count($matches[3]); $j++) {

$str = $matches[3][$j];
$filename = $matches[2][$j];
$fileperm = $matches[1][$j];

$file = '';
$str = preg_split("/\r?\n/", trim($str));
$strlen = count($str);

for ($i = 0; $i < $strlen; $i++) {
$pos = 1;
$d = 0;
$len=(int)(((ord(substr($str[$i],0,1)) -32) - ' ') & 077);

while (($d + 3 <= $len) AND ($pos + 4 <= strlen($str[$i]))) {
 &nb

#4 KuroTsuto

Re: [Solved...ish] MIME Decoding... And Such

Posted 11 July 2009 - 12:43 PM

Hmm... that post cut off about half my message, and a good chunk of the class... Bummer. But I believe I found a solution:

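// Note: '\r\n' in single quotes is a literal backslash sequence, not a CRLF
$mimedecoder = new MIMEDECODE($email,'\r\n');
// Keep what decode() returns so the headers can be read from it
$structure = $mimedecoder->decode();
print_r($structure->headers);
$msg=$mimedecoder->get_parsed_message();
print_r($msg);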



That probably doesn't mean all that much since you only got to see about half of the class, but maybe it will help someone in the future ;). I don't know why it works, but it does... Apparently, even though get_parsed_message() invokes decode(), get_parsed_message() only seems to return the body, while decode() parses the mail into an array... Bizarre, but it will work. I still might look for a simpler solution.
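
For anyone picking this up later, here is a rough sketch of walking the object that decode() returns in order to pull out the body and attachments. The helper collect_parts() is hypothetical and not part of the class. The property names (parts, body, ctype_primary, ctype_secondary, disposition, d_parameters) come from the _decode() method quoted in the earlier post, and the structure below is built by hand as a stand-in for a real decode() result. Treating a missing disposition as 'inline' is an assumption of this sketch.

```php
<?php
// Hypothetical helper: recursively collects every leaf part that has a body.
function collect_parts($structure, &$found = array())
{
    if (!empty($structure->parts)) {
        foreach ($structure->parts as $part) {
            if (is_object($part)) { // a failed part may not be an object
                collect_parts($part, $found);
            }
        }
    } elseif (isset($structure->body)) {
        $primary = isset($structure->ctype_primary) ? $structure->ctype_primary : 'text';
        $secondary = isset($structure->ctype_secondary) ? $structure->ctype_secondary : 'plain';
        $found[] = array(
            'type' => $primary . '/' . $secondary,
            // Assumption: a part without a Content-Disposition is inline.
            'disposition' => isset($structure->disposition) ? $structure->disposition : 'inline',
            'filename' => isset($structure->d_parameters['filename'])
                ? $structure->d_parameters['filename'] : null,
            'body' => $structure->body,
        );
    }
    return $found;
}

// Hand-built stand-in for a decoded multipart/mixed message.
$text = new stdClass;
$text->ctype_primary = 'text';
$text->ctype_secondary = 'plain';
$text->body = 'Hello there.';

$file = new stdClass;
$file->ctype_primary = 'application';
$file->ctype_secondary = 'pdf';
$file->disposition = 'attachment';
$file->d_parameters = array('filename' => 'report.pdf');
$file->body = 'placeholder bytes';

$root = new stdClass;
$root->ctype_primary = 'multipart';
$root->ctype_secondary = 'mixed';
$root->parts = array($text, $file);

$parts = collect_parts($root);
foreach ($parts as $p) {
    echo $p['type'], ' (', $p['disposition'], ")\n";
}
```

Flattening the tree into a list like this makes it easy to separate the displayable text from the attachments by checking the 'disposition' and 'filename' entries.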

Cheers, and thanks :)
~KuroTsuto
