Selecting HTML elements by class name?

using cURL and DOMDocument Object


3 Replies - 5432 Views - Last Post: 27 July 2010 - 02:30 PM

#1 DeathStory

Posted 27 July 2010 - 11:58 AM

Hello. I am currently working on an iPhone web app for the website MyLifeIsAverage.com. MLIA does not have an open API yet, and its RSS feed is very short, containing only the past 10 or so posts. So I decided it would be best to use cURL to fetch the actual HTML of the pages, so that I have access to any page I want, based on the URL I pass to cURL.

So here is what my source looks like so far:

<?php

$URL = 'http://mylifeisaverage.com';

$ch = curl_init($URL);                          //initialize cURL
curl_setopt($ch, CURLOPT_HEADER, 0);            //do not return http header in string
curl_setopt($ch, CURLOPT_TIMEOUT, 30);          //time out in 30 seconds
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);    //return HTML as a string
$result = curl_exec($ch);                       //execute the cURL statements

$htmlDoc = new DOMDocument;                     //declare the DOMDocument Object
@$htmlDoc->loadHTML($result);                   //load $result into the DOMDocument Object as HTML

?>


This all executes fine with no errors whatsoever. *I have the @ on the last line because loadHTML() generates a warning, since the HTML from the website isn't perfect,* but I can echo the result and get the front page of the website. But here is where I get lost.
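Instead of silencing loadHTML() with @, libxml can be told to collect parse errors in an internal buffer, so the warnings stay out of the output but can still be inspected. A minimal sketch; the malformed snippet below is made up and only stands in for $result:

```php
<?php
// Collect libxml parse errors internally instead of raising PHP warnings.
$previous = libxml_use_internal_errors(true);

// Made-up stand-in for $result: the <span> is never closed.
$result = '<div class="sc"><span>Unclosed span</div>';

$htmlDoc = new DOMDocument();
$loaded = $htmlDoc->loadHTML($result);

// Each LibXMLError describes a problem the HTML parser recovered from.
foreach (libxml_get_errors() as $error) {
    echo 'Line ', $error->line, ': ', trim($error->message), "\n";
}

// Empty the error buffer and restore the previous setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);
```

The document is still built (loadHTML() returns true), so the rest of the script can use $htmlDoc exactly as before.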

As you can see, I'm using the DOMDocument object to parse my HTML. I chose it because I'm familiar with it; it's my favorite way to parse XML. The main section of the HTML I need is this:

<div id="s_2244325" class="story s">
    <div class="sc">
        Today I saw a commercial for the Toy Story 3 video game. All of the people in the commercial playing the video game were teenagers, not young children. MLIA
    </div>
    <div class="sf">
        <span class="left">
            <a href="/story/2244325">#2244325</a> Comments:&nbsp;0 <span class="sv votes">Vote: <a class="up active" href="javascript:vote(2244325,1,'6c086d3490d1d22f183df05cf9381f108ba9ed43b2022d4928c700fa7661381d')">average</a>&nbsp;<span class="v_pos">228</span> <a class="down active" href="javascript:vote(2244325,-1,'6c086d3490d1d22f183df05cf9381f108ba9ed43b2022d4928c700fa7661381d')">meh</a>&nbsp;<span class="v_neg">60</span></span>
        </span>
        <span class="right">
            
            
        </span>
    </div>
</div>


There are about 10-15 more of these on each page, and the only differences between them are the id on the div with the class "story s" and the text inside some of the inner div tags.

My question is: how do I select just the text between the <div class="sc"> tags? I have tried multiple things, but I always end up with blank responses. Should I just call getElementsByTagName() with 'div' as the parameter and know which positions in the resulting list to grab?

Any help would be much appreciated! (:

If this were XML or an RSS feed I would already have this figured out...
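One way to answer this with the same DOMDocument object is DOMXPath, which can match on the class attribute directly. A minimal sketch, assuming the markup shown above; an abridged copy is inlined so it runs without a network request, and the wrapping id="stories" container is an assumption taken from later in this thread. The concat()/normalize-space() idiom matches sc as a whole class name, so class="sc other" would match but class="scx" would not.

```php
<?php
// Abridged copy of the story markup from the post, used instead of a live fetch.
$html = <<<HTML
<div id="stories">
  <div id="s_2244325" class="story s">
    <div class="sc">
      Today I saw a commercial for the Toy Story 3 video game.
    </div>
    <div class="sf"><span class="left">#2244325</span></div>
  </div>
</div>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // the real page is not perfectly valid HTML
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Select divs whose class attribute contains the whole token "sc".
$query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' sc ')]";

$texts = array();
foreach ($xpath->query($query) as $node) {
    $texts[] = trim($node->textContent);   // strip the surrounding whitespace
}

echo implode("\n", $texts), "\n";
```

Unlike picking items by position, this keeps working if the number of divs per story changes.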


Replies To: Selecting HTML elements by class name?

#2 girasquid

Re: Selecting HTML elements by class name?

Posted 27 July 2010 - 12:21 PM

You say:

DeathStory, on 27 July 2010 - 10:58 AM, said:

If this were XML or an RSS feed I would already have this figured out...


So why not treat the output you're working with as XML, and deal with it however you would XML? As a general rule, you can parse (and interact with) HTML as if it's XML.
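In practice, the "deal with it as XML" part works once the markup has gone through the HTML parser: the resulting DOM can be handed to SimpleXML with simplexml_import_dom() and queried with XPath, the way RSS feeds are often handled. A minimal sketch, with an inline snippet (an assumption) standing in for the cURL result:

```php
<?php
// Stand-in for the fetched page: one story in the markup from the first post.
$html = '<div class="story s">'
      . '<div class="sc">Today I saw a commercial for the Toy Story 3 video game.</div>'
      . '<div class="sf"><span class="left">#2244325</span></div></div>';

libxml_use_internal_errors(true);   // tolerate imperfect HTML without warnings
$doc = new DOMDocument();
$doc->loadHTML($html);              // the HTML parser builds the DOM tree
libxml_clear_errors();

// Hand the same tree to SimpleXML.
$sx = simplexml_import_dom($doc);

// Exact match on the class attribute; this only works because the
// attribute value is exactly "sc" with no other class names.
$matches = $sx->xpath("//div[@class='sc']");

foreach ($matches as $div) {
    echo trim((string) $div), "\n";
}
```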

#3 DeathStory

Re: Selecting HTML elements by class name?

Posted 27 July 2010 - 12:38 PM

The problem was that the tag I needed was the div tag, which is repeated many, many times within each parent, and I needed to select only the ones with a certain class, but DOMDocument has no function to do that. I solved my own problem, though.

Here is the code I came up with:

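<?php

// Hardcoded positions of the wanted divs, counted among all
// div elements inside the element with id="stories".
$posts = array('1' => 1, '2' => 4, '3' => 7, '4' => 10,
               '5' => 13, '6' => 16, '7' => 19, '8' => 21, '9' => 24,
               '10' => 27);

$data = 'http://mylifeisaverage.com';

// Fetch the front page as a string.
$ch = curl_init($data);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec($ch);
curl_close($ch);

// Parse the (imperfect) HTML; @ hides the parser warnings.
$htmlDoc = new DOMDocument;
@$htmlDoc->loadHTML($result);

// Parent element with id="stories".
$parent = $htmlDoc->getElementById('stories');

// Print the text of each div at the chosen positions.
foreach($posts as $post) {
    echo $parent->getElementsByTagName('div')->item($post)->nodeValue;
    echo "<br />";
}

?>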
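The hardcoded positions break if the page layout changes, for example if a story gains an extra div. A sketch of an alternative that stays within the plain DOMDocument API: walk every div under the parent and keep only those whose class attribute contains sc. The helper hasClass() is hypothetical, and the inline markup stands in for the fetched page.

```php
<?php
// Hypothetical helper: true if $el's class attribute contains $token as a whole class name.
function hasClass(DOMElement $el, string $token): bool
{
    $classes = preg_split('/\s+/', trim($el->getAttribute('class')));
    return in_array($token, $classes, true);
}

// Stand-in for the fetched page: two stories inside #stories.
$html = '<div id="stories">'
      . '<div class="story s"><div class="sc">First story</div><div class="sf">meta</div></div>'
      . '<div class="story s"><div class="sc">Second story</div><div class="sf">meta</div></div>'
      . '</div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser errors instead of printing warnings
$doc->loadHTML($html);
libxml_clear_errors();

$parent = $doc->getElementById('stories');

// Keep the text of every div under #stories that carries the "sc" class.
$stories = array();
foreach ($parent->getElementsByTagName('div') as $div) {
    if (hasClass($div, 'sc')) {
        $stories[] = trim($div->nodeValue);
    }
}

foreach ($stories as $story) {
    echo $story, "<br />\n";
}
```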

This post has been edited by DeathStory: 27 July 2010 - 12:39 PM


#4 Dormilich

Re: Selecting HTML elements by class name?

Posted 27 July 2010 - 02:30 PM

girasquid, on 27 July 2010 - 07:21 PM, said:

As a general rule, you can parse (and interact with) HTML as if it's XML.

Nope. Any XML parser will choke on (nearly all) valid HTML. You may work with the DOM tree as if it were XML, though.

There is a reason why XHTML was created.
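A small sketch of the difference, using a made-up fragment that is valid HTML but not well-formed XML (a void <br> with no closing tag, and the &nbsp; entity, which XML does not predefine). The strict XML parser rejects it, while the HTML parser recovers and produces a DOM tree that can then be walked with the usual DOM methods.

```php
<?php
// Valid HTML, but not well-formed XML.
$html = '<p>Comments:&nbsp;0<br>Vote</p>';

libxml_use_internal_errors(true);    // collect parser errors instead of printing warnings

$asXml = new DOMDocument();
$xmlOk = $asXml->loadXML($html);     // strict XML parser

$asHtml = new DOMDocument();
$htmlOk = $asHtml->loadHTML($html);  // forgiving HTML parser

libxml_clear_errors();

var_dump($xmlOk, $htmlOk);

// The HTML-derived tree supports the same DOM calls used for XML.
echo $asHtml->getElementsByTagName('br')->length, "\n";
```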

This post has been edited by Dormilich: 27 July 2010 - 02:31 PM
