Unleash the power of node.js for Shell Scripting (Part 2)

Ready for our first proper node.js Script!


In a previous post, we learned about some tools that helped us create a script in node.js. It is now time to put this into practice by implementing a script that connects to a few online newspapers, searches in the news for specific keywords and returns those articles.

Our new script will need to accept the following parameters:

    • A file with the list of newspapers (one URL per line)
    • A file with a list of keywords (a keyword per line)

First, let’s create the following files: news_watcher.js and package.json. Remember to make news_watcher.js executable (for example, with chmod +x news_watcher.js). We will use three external modules, which need to be added to package.json as dependencies (see Part 1 for details).

Before any dependencies are added, the initial package.json should look like this:

{
  "name": "news-logger",
  "version": "0.1.0",
  "description": "Access multiple newspapers and find news using specific keywords",
  "author": "Raul Martin"
}

Then, you need to add the dependencies as follows:

npm install cheerio --save
npm install request --save
npm install commander --save

As you can see, we will use the Cheerio, Request and Commander modules. You already know about Commander (see Part 1 if you don’t). We’ll use Request to fetch content from URLs; it hands the response to a callback function. Finally, Cheerio is a great library that builds a DOM from a string and lets you use a subset of jQuery’s functionality on it, which makes it very useful for manipulating HTML and for web scraping.

Here’s what I came up with:

#!/usr/bin/env node
/*jshint node: true */
"use strict";

//get the external modules
var fs = require("fs"),
  request = require('request'),
  cheerio = require('cheerio'),
  program = require('commander');

//set the params options
program
  .version('0.1.0')
  .usage('--newspapers newspapers.txt --keywords keywords.txt')
  .option('-n, --newspapers <file>', 'Newspapers list separated by \'\\n\'')
  .option('-k, --keywords <file>', 'Keywords list separated by \'\\n\'')
  .parse(process.argv);

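//Commander 6 and earlier expose the options as properties of program;
//with Commander 7 or later, read them from program.opts() instead
var newspapersFile = program.newspapers,
  keywordsFile = program.keywords;

//print the usage information and exit if a parameter is missing
if (!newspapersFile || !keywordsFile) {
  program.help();
}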

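//read a file and return its non-empty lines, trimmed
var file2Array = function(fileName){
  return (
    fs.readFileSync(fileName, "utf8")
      .split("\n")
      .map(function(value) {
        return value.trim();
      })
      .filter(function(element) {
        return !!element;
      })
  );
};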

//prefix relative links with the newspaper URL
var getFullUrl = function(baseUrl, url) {
  url = url.trim();
  if (url.indexOf("http") !== 0) {
    url = baseUrl + url;
  }
  return url;
};

//The script

var duplicateControl = {},
  completed_request = 0,
  result = [],
  newspapers = file2Array(newspapersFile),
  keywords = file2Array(keywordsFile)
    .map(function(value) {
      return value.toLowerCase();
    });

//store an article once, using its URL to detect duplicates
var addNews = function(title, url, keyword) {
  if (!duplicateControl[url]) {
    duplicateControl[url] = true;
    result.push({
      'url': url,
      'title': title.trim(),
      'keyword': keyword
    });
  }
};

var processRequest = function(url, error, response, html) {
  if (!error && response.statusCode === 200) {
    var $ = cheerio.load(html);

    $('a').each(function() {
      var a = $(this),
        text = a.text().toLowerCase(),
        href = a.attr('href');

      //skip anchors without a link
      if (!href) {
        return;
      }

      href = getFullUrl(url, href);

      //using every to stop after I match with a keyword
      keywords.every(function(keyword) {
        if (text.indexOf(keyword) !== -1) {
          addNews(a.text(), href, keyword);
          return false;
        }
        return true;
      });
    });
  }

  //count failed requests too, so the results are always printed
  completed_request++;
  if (completed_request === newspapers.length) {
    console.log(result);
  }
};

//launch one request per newspaper
newspapers.forEach(function(url) {
  request(url, processRequest.bind(null, url));
});
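
The link-handling step can be tried without any network access. The sketch below is not part of the original script: findMatches and its input shape (an array of { text, href } objects) are hypothetical names chosen for illustration. It assumes Node's built-in URL class to resolve relative links, which avoids the doubled paths that plain baseUrl + url concatenation can produce when the newspaper URL already contains a path.

```javascript
// Hypothetical helper, not part of the original script: given the links
// found on one page, return the ones whose text contains a keyword.
function findMatches(baseUrl, links, keywords) {
  var seen = new Set();   // URLs already reported (duplicate control)
  var matches = [];

  links.forEach(function (link) {
    if (!link.href) {
      return;             // anchor without an href attribute
    }

    var url;
    try {
      // Resolve relative links ("/a", "b.html") against the page URL.
      url = new URL(link.href.trim(), baseUrl).href;
    } catch (e) {
      return;             // skip links that cannot be parsed
    }

    var text = link.text.toLowerCase();
    // find() stops at the first matching keyword, like every() + return false.
    var keyword = keywords.find(function (k) {
      return text.indexOf(k) !== -1;
    });

    if (keyword !== undefined && !seen.has(url)) {
      seen.add(url);
      matches.push({ url: url, title: link.text.trim(), keyword: keyword });
    }
  });

  return matches;
}

// Made-up sample data, for illustration only.
var sample = findMatches('http://example.com/news/', [
  { text: 'Election results', href: '/politics/results' },
  { text: 'Election results (again)', href: 'http://example.com/politics/results' },
  { text: 'Weather', href: 'weather.html' }
], ['election']);

console.log(sample);
```

Keeping this logic in a plain function, separate from the HTTP call, also makes it easy to cover with unit tests.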

To be able to test this script, we can start with the following input files:





And finally, here is what happens when you run it!


Usage: news_watcher --newspapers newspapers.txt --keywords keywords.txt

Options:

-h, --help               output usage information
-V, --version            output the version number
-n, --newspapers <file>  Newspapers list separated by '\n'
-k, --keywords <file>    Keywords list separated by '\n'

./news_watcher.js -n newspapers.txt -k keywords.txt

[ { url:



Potential Improvements

      1. You can add a promise library, for example promised-io. Then you won’t need the hacky counter condition to decide when to print the results.
      2. You could print the result directly in a human-readable form. I like printing it as an array so that I can reuse it as input to another node.js script (see Pipes section).
      3. The newspapers.txt file could hold a structure with the URL and a specific link selector for each newspaper, so that only the news section is scanned (less noise).
      4. You should consider error handling.
      5. You can add the Logentries logger in order to get your logs in your Logentries account (https://github.com/logentries/le_node).
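
As a rough illustration of the first and fourth points, here is a minimal sketch that waits for every newspaper before printing, without a request counter. It uses JavaScript's native Promise rather than promised-io, and fetchPage is a hypothetical stand-in for the real HTTP call so that the example runs offline; fetchAll is likewise a name invented for this sketch.

```javascript
// Hypothetical stand-in for an HTTP request: resolves with fake HTML,
// or rejects when the URL contains "bad" to simulate a failure.
function fetchPage(url) {
  return new Promise(function (resolve, reject) {
    setTimeout(function () {
      if (url.indexOf('bad') !== -1) {
        reject(new Error('Cannot reach ' + url));
      } else {
        resolve('<a href="/story">Story from ' + url + '</a>');
      }
    }, 10);
  });
}

// Fetch every URL and resolve once all of them have finished.
// A failure is turned into an { url, error } entry instead of
// aborting the whole batch.
function fetchAll(urls, fetcher) {
  return Promise.all(urls.map(function (url) {
    return fetcher(url).then(
      function (html) { return { url: url, html: html }; },
      function (error) { return { url: url, error: error.message }; }
    );
  }));
}

var pending = fetchAll(
  ['http://example.com', 'http://bad.example.com'],
  fetchPage
);

// Runs exactly once, after both requests have settled.
pending.then(function (pages) {
  console.log(pages);
});
```

Because each failure is converted into a regular entry, one unreachable newspaper does not prevent the others from being reported.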


If you need a scripting language to run from your command line and you feel comfortable with Node.js, I think I have given you a really interesting option to create, run and even share your scripts. Practically speaking, I use this a lot to create quick tests and benchmarking scripts, as I know I can quickly leverage JavaScript’s capabilities and bring complex algorithms to my shell.

The other great aspect is being able to use all those external modules listed on the official npm page.

Here is a list of the ones that I like and use regularly:

Finally, if you ever end up doing katas to improve your programming skills, this is a really nice way to get going fast! (Don’t forget your unit tests.)

