Hacker News Saved Stories

For the Favorites section of my site, I wanted to centralize the stars and bookmarks I had been collecting across the web. Most sites and services offer APIs that make gathering this data easy. Hacker News, not so much.

Luckily, Hacker News' markup is pretty simple to navigate, so I wanted to share a quick script that grabs your "Saved Stories" from HN. The script depends on @fabpot's Goutte, loaded here through Composer's autoloader.

<?php

require_once __DIR__.'/vendor/autoload.php';

$username = 'YourUsername';
$password = 'YourPassword';
$client = new \Goutte\Client();

$baseUrl = 'https://news.ycombinator.com/';
$savedStoriesUrl = $baseUrl . 'saved?id=' . $username;

/** @var $crawler \Symfony\Component\DomCrawler\Crawler */
$crawler = $client->request('GET', $baseUrl);

/** @var $link \Symfony\Component\DomCrawler\Link */
$link    = $crawler->selectLink('login')->link();
$crawler = $client->click($link);

/** @var $form \Symfony\Component\DomCrawler\Form */
$form    = $crawler->selectButton('login')->form();
$crawler = $client->submit($form, array(
    'u' => $username,
    'p' => $password,
));

/** @var $crawler \Symfony\Component\DomCrawler\Crawler */
$crawler = $client->request('GET', $savedStoriesUrl);

/** @var $linkRows \Symfony\Component\DomCrawler\Crawler */
$linkRows = $crawler->filter('td.title a');

/** @var $commentRows \Symfony\Component\DomCrawler\Crawler */
$commentRows = $crawler->filter('td.subtext a');

$stories = array();

foreach ($linkRows as $i => $row) {
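    /** @var $row \DOMElement */

    // Assumes each subtext cell holds two links (submitter, then comments),
    // so the comment link for story $i sits at position 2 * $i + 1.
    $commentLinkPosition = (($i * 2) + 1);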

    /** @var $comment \Symfony\Component\DomCrawler\Crawler */
    $comment = $commentRows->eq($commentLinkPosition);

    // Skip the trailing "More" pagination link and rows without a comment link.
    if ($comment->count() > 0 && $row->nodeValue !== 'More') {
        $stories[] = array(
            'title'       => $row->nodeValue,
            'link'        => $row->getAttribute('href'),
            'comment'     => $comment->text(),
            'commentLink' => $baseUrl . $comment->attr('href'),
        );
    }
}

print_r($stories);
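
The pairing logic in the loop is easy to get wrong, so it can help to try it offline first. Below is a minimal sketch that runs the same index arithmetic against a hand-written HTML fragment, using PHP's built-in DOMDocument and DOMXPath instead of Goutte. The markup, URLs, and usernames are hypothetical stand-ins for the shape the script relies on (one title link per story, two links per subtext cell, and a trailing "More" link); the real page may differ.

```php
<?php

// Hypothetical markup: a simplified stand-in for the saved-stories table.
$html = <<<HTML
<table>
  <tr><td class="title"><a href="http://example.com/a">Story A</a></td></tr>
  <tr><td class="subtext"><a href="user?id=alice">alice</a> | <a href="item?id=1">12 comments</a></td></tr>
  <tr><td class="title"><a href="http://example.com/b">Story B</a></td></tr>
  <tr><td class="subtext"><a href="user?id=bob">bob</a> | <a href="item?id=2">discuss</a></td></tr>
  <tr><td class="title"><a href="saved?id=YourUsername">More</a></td></tr>
</table>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// XPath equivalents of the 'td.title a' and 'td.subtext a' selectors
// for this fragment, where each cell has exactly one class.
$titleLinks   = $xpath->query('//td[@class="title"]/a');
$subtextLinks = $xpath->query('//td[@class="subtext"]/a');

$baseUrl = 'https://news.ycombinator.com/';
$stories = array();

for ($i = 0; $i < $titleLinks->length; $i++) {
    $titleLink = $titleLinks->item($i);

    // Two links per subtext cell: the comment link is the second one.
    $comment = $subtextLinks->item(($i * 2) + 1);

    // item() returns null past the end, which filters out the "More" row.
    if ($comment !== null && $titleLink->nodeValue !== 'More') {
        $stories[] = array(
            'title'       => $titleLink->nodeValue,
            'link'        => $titleLink->getAttribute('href'),
            'comment'     => $comment->nodeValue,
            'commentLink' => $baseUrl . $comment->getAttribute('href'),
        );
    }
}

print_r($stories);
```

If the markup ever changes so that a subtext cell holds more or fewer than two links, the `2 * $i + 1` offset will pair titles with the wrong links, so a fragment like this is a cheap place to check the assumption before pointing the scraper at the live site.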