I have a node.js application that is talking HTTPS with some server I don't have access to... How can I capture the traffic so that I can see the contents of the requests and responses?

Yes, this has become a series! Welcome to Part 2, where we focus on node.js applications. If you haven't already, now would be a great time to skim the introduction in Part 1, as I'm not going to repeat it here.

The history

When I was first trying to solve this problem, I came across this issue in the node.js GitHub repository: Optionally log master secrets for TLS connections. It was a feature request asking the node.js developers to implement this functionality.

One user, jhford, had published a potential solution in his comment:

https://github.com/jhford/node-https-wireshark

The solution looked promising, but no one had posted an example of how to use it. Of course, I set out to try.

My experience

At the time, I was trying to help my development teams get a grasp on some seemingly random authentication issues they were experiencing between yarn and a private NPM registry. (For the uninitiated, yarn is essentially a souped-up version of the npm command line tool with added security features.)

I thought that if I could see the requests flowing between yarn on our Jenkins CI server and the Artifactory server that hosts the NPM registry, I could nail down whether the failure was yarn's fault or Artifactory's fault.

So, I wrote a script that I could invoke inside the Jenkins build to attempt to inject Mr. jhford's code into yarn and start a wire capture with tcpdump before yarn was invoked. Of course, it was a dirty hack, so I had to give it a cool-sounding name:

yarnshark.sh

#!/bin/bash

# Directory holding the real yarn.js (following symlinks).
YARN_RUNTIME_LOCATION="$(dirname "$(readlink -f "$(which yarn)")")"

# Download the key logger next to yarn.js unless it is already there.
if [ ! -f "$YARN_RUNTIME_LOCATION/sslkeylogger.js" ]; then
  curl -s "https://raw.githubusercontent.com/forestjohnsonpeoplenet/node-https-wireshark/master/index.js" > "$YARN_RUNTIME_LOCATION/sslkeylogger.js"
fi

# Back up yarn.js so it can be restored at the end.
cp "$YARN_RUNTIME_LOCATION/yarn.js" "$YARN_RUNTIME_LOCATION/yarn.js.bak"

# Find the line that loads yarn's CLI; the injected lines go right before it.
YARN_CLI_LINE_NUMBER="$(grep -n -e "^ *var cli = require" "$YARN_RUNTIME_LOCATION/yarn.js" | head -n 1 | cut -d: -f1)"
YARN_CLI_LINE_NUMBER=$((YARN_CLI_LINE_NUMBER - 1))

FIRST_HALF=$(head -n "$YARN_CLI_LINE_NUMBER" "$YARN_RUNTIME_LOCATION/yarn.js")
LAST_HALF=$(tail -n "+$((YARN_CLI_LINE_NUMBER + 1))" "$YARN_RUNTIME_LOCATION/yarn.js")

# Rebuild yarn.js with the key logger required before the CLI starts.
echo "$FIRST_HALF" > "$YARN_RUNTIME_LOCATION/yarn.js"
echo "require(\"./sslkeylogger\")" >> "$YARN_RUNTIME_LOCATION/yarn.js"
echo "console.log(\"This yarn is logging HTTPS session keys using https://github.com/forestjohnsonpeoplenet/node-https-wireshark\")" >> "$YARN_RUNTIME_LOCATION/yarn.js"
echo "$LAST_HALF" >> "$YARN_RUNTIME_LOCATION/yarn.js"

#echo "$YARN_RUNTIME_LOCATION/yarn.js"
#cat "$YARN_RUNTIME_LOCATION/yarn.js"

# Capture all traffic in the background while yarn runs.
tcpdump -i any -s 65535 -w yarn.pcap &
TCPDUMP_PID=$!

# Run yarn with the original arguments; the key logger writes to SSLKEYLOGFILE.
SSLKEYLOGFILE="$(pwd)/SSLKEYLOG" yarn "$@"

kill "$TCPDUMP_PID"

# Put everything back the way it was.
rm "$YARN_RUNTIME_LOCATION/sslkeylogger.js"
rm "$YARN_RUNTIME_LOCATION/yarn.js"
mv "$YARN_RUNTIME_LOCATION/yarn.js.bak" "$YARN_RUNTIME_LOCATION/yarn.js"

This script is lazy-man's bash for:

"Download sslkeylogger.js from github, inject

require("./sslkeylogger")
console.log("This yarn is logging HTTPS session keys using https://github.com/forestjohnsonpeoplenet/node-https-wireshark")

into the code right before the app kicks off, then run tcpdump, run yarn, and finally stop tcpdumping and put everything back the way it was."

So, the user would call ./yarnshark.sh install instead of yarn install and then collect the capture file and SSLKEYLOG file.
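For anyone who finds the head/tail juggling in the script hard to follow, the injection step can be sketched in JavaScript instead. This is only an illustration of the idea, not what yarnshark.sh runs: the function name injectBefore and the three-line stand-in for yarn.js are made up, and the marker pattern is the same one the script greps for.

```javascript
// Sketch: insert extra lines immediately before the first line that matches a marker.
// Mirrors the grep/head/tail steps of yarnshark.sh; injectBefore is a hypothetical name.
function injectBefore(source, marker, linesToInject) {
  const lines = source.split('\n');
  const index = lines.findIndex((line) => marker.test(line));
  if (index === -1) {
    // Unlike the bash version, fail loudly instead of writing a broken file.
    throw new Error('marker line not found; refusing to modify the source');
  }
  lines.splice(index, 0, ...linesToInject);
  return lines.join('\n');
}

// Made-up stand-in for yarn.js, just to show the effect.
const original = [
  "'use strict';",
  "var cli = require('../lib/cli');",
  'cli.main();',
].join('\n');

const patched = injectBefore(original, /^ *var cli = require/, [
  'require("./sslkeylogger")',
]);
console.log(patched);
```

The key logger has to be required before the CLI module loads, which is why the injected line goes above the marker rather than at the end of the file.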

Unfortunately, it didn't work at first, and I almost gave up (live commentary of my despair can be seen here).

But after banging my head on it for a while, I began to discover issues with jhford's original code -- and after fixing them on my own fork of the repository, it worked! I could see the captured HTTP requests clear as day in Wireshark after loading in the pcap file and SSLKEYLOG file. (See part 1)
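If Wireshark still shows only encrypted records after loading both files, one thing worth checking is whether the SSLKEYLOG file is well-formed. Below is a rough sanity check, assuming the file uses the NSS key log format that Wireshark reads (lines such as CLIENT_RANDOM followed by two hex fields). The function name checkKeyLogLine and the sample values are made up for illustration.

```javascript
// Sketch: sanity-check one line of an NSS-style key log file.
// Assumes each entry is "<label> <hex> <hex>"; checkKeyLogLine is a hypothetical name.
function checkKeyLogLine(line) {
  const trimmed = line.trim();
  // Blank lines and comments carry no key material.
  if (trimmed === '' || trimmed.startsWith('#')) {
    return { ok: true, kind: 'skip' };
  }
  const parts = trimmed.split(/\s+/);
  if (parts.length !== 3) {
    return { ok: false, reason: 'expected label, client random, and secret' };
  }
  const [label, id, secret] = parts;
  const isHex = (s) => /^[0-9a-fA-F]+$/.test(s) && s.length % 2 === 0;
  if (!isHex(id) || !isHex(secret)) {
    return { ok: false, reason: 'fields after the label must be hex' };
  }
  // For CLIENT_RANDOM entries: 32-byte client random and 48-byte master secret.
  if (label === 'CLIENT_RANDOM' && (id.length !== 64 || secret.length !== 96)) {
    return { ok: false, reason: 'CLIENT_RANDOM entry has unexpected field lengths' };
  }
  return { ok: true, kind: label };
}

// Made-up sample data; a real file would come from the SSLKEYLOGFILE path.
const sampleLines = [
  '# made-up sample data',
  'CLIENT_RANDOM ' + '0a'.repeat(32) + ' ' + '1b'.repeat(48),
  'CLIENT_RANDOM deadbeef',
];
sampleLines.forEach((line, i) => {
  const result = checkKeyLogLine(line);
  if (!result.ok) console.log(`line ${i + 1}: ${result.reason}`);
});
```

An empty or truncated key log file is a quick hint that the logger never hooked the connection at all, as opposed to Wireshark misreading a good file.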

The authentication failures turned out to be a perennial issue with yarn not always honoring authentication settings.

The aftermath

So, we have a way to capture and decrypt HTTPS traffic from node apps. But it will only work if the node app is the HTTP client, not the server. Specifically, it will only work if the app or one of its dependencies does require('https'); ... https.request(....)
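To make the client-side hook more concrete, here is a minimal sketch of the same idea using the 'keylog' event that newer node.js releases emit on https.Agent. This is not the code from the repository above, and it assumes your node version supports that event (older releases, like the one I was dealing with, do not). The helper name attachKeyLogger is hypothetical.

```javascript
const fs = require('fs');
const https = require('https');

// Sketch: append TLS key material seen by an https.Agent to a key log file.
// attachKeyLogger is a hypothetical helper name, not part of node.js.
function attachKeyLogger(agent, keyLogPath) {
  agent.on('keylog', (line) => {
    // `line` is a Buffer holding one entry in NSS key log format.
    fs.appendFileSync(keyLogPath, line, { mode: 0o600 });
  });
  return agent;
}

// Only log keys when SSLKEYLOGFILE is set, the same variable the script above uses.
if (process.env.SSLKEYLOGFILE) {
  attachKeyLogger(https.globalAgent, process.env.SSLKEYLOGFILE);
}
```

Hooking https.globalAgent only covers requests that use the default agent; code that constructs its own https.Agent would need the listener attached to that agent too.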

A few weeks after I published my result, someone else contributed a new solution to the original GitHub issue in the form of a fully fledged npm package. It looks like their solution focuses on the opposite scenario, where the node app is the server. As a disclaimer, I haven't tried this one, so I have no idea if it works or not.
