Identifying individuals in public events with social networks and computer vision

May 5, 2018

Do you usually participate in public events such as demonstrations, rallies, strikes, illegal voting events, violent revolutions or the like? Are you a fugitive ex-secret agent? Then you should read this and, maybe, worry a little bit.

So tonight I was doing a marathon of the Jason Bourne movies, and I came to scenes like this, where the CIA uses a lot of cameras and OSINT/SOCMINT tools, some of them with facial recognition systems, to automatically detect Mr. Bourne. I started thinking about the truth versus the myth behind that, so I finally decided to replicate it myself with a small proof of concept.

So, jokes aside, today I want to talk about how easy it is (or may be) to identify people at public events using their social network information: for example, through the Twitter hashtag of the event, the Twitter API, and a little bit of computer vision / machine learning. In Python.

 

RETRIEVING DATA FROM TWITTER

As I said, big public events like large demonstrations or strikes use a Twitter hashtag for communicating what's happening in real time. At public events you can expect a lot of people commenting on a hashtag, and also a lot of them posting photos of themselves there. The thing is that some of those who upload photos may, voluntarily or not, include other people such as friends or anonymous pedestrians in their pictures. Of course these "third persons" may be susceptible to detection, confirming their presence at that event. So I've arrived at the conclusion that a hashtag-based search of photos may be an interesting way of detecting people at public events, and that's the approach I've decided to take.

So, the first thing that we are going to need here is to be able to talk to Twitter to retrieve information such as all the tweets related to a particular hashtag.

You can install tweepy with:

pip install tweepy

A simple script for getting our profile information with tweepy may be something like:

import tweepy

# your app credentials from the Twitter developer portal
consumer_key = '00'
consumer_secret = '00'
access_token = '00'
access_token_secret = '00'

# authenticate against the Twitter API
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

api = tweepy.API(auth)

# fetch the profile of the authenticated user
user = api.me()

print('Name: ' + user.name)
print('Location: ' + user.location)
print('Friends: ' + str(user.friends_count))

What the program does is take our developer credentials and use them to authenticate against the platform; then it shows the personal information of the user. Note how we access a property of the user structure with the dot notation.

That will give something like:

But that isn't useful for what we want. We need to get tweets and parse them for interesting information. Let's retrieve a tweet and see what kind of data we can extract from it:

Something like:

stuff = api.user_timeline(screen_name='devilafant',
                          count=2, include_rts=True)

for t in stuff:
    print(t._json)

Will print all the information inside the “status” data structure that we get after asking for a tweet in JSON format, so we’ll be able to see some interesting fields:

Here we can see that the amount of information we get is huge. After looking at it, I did some research about the fields inside the status structure and came up with the most interesting ones for this case study:

  • status.user.screen_name
  • status.text
  • status.user.location
  • status.entities['media'][0]['media_url']

Here we see that we can retrieve information about a tweet in two forms. The first is directly accessing the field as a "property" with the dot notation: status.user.screen_name gives us the author's username, status.text the tweet text, and status.user.location the location of the tweet if available. But we can also retrieve information the "JSON-like" way, looking up a field or key directly in the structure; the last expression gives us the URL of the attached image, which is the most interesting part here. All the other fields except, maybe, the screen_name are optional. You can quickly see that we can use a library like wget in Python to download the media_url from Twitter and then process it; that's what we'll do later.
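To make those two access styles concrete, here is a minimal sketch; the status object is assumed to be one of the items returned by the API (for example one of the t objects from the user_timeline snippet above), and since entities['media'] is a list we iterate over it:

# attribute-style ("property") access
print(status.user.screen_name)   # author username
print(status.text)               # tweet text
print(status.user.location)      # user location, may be empty

# dict-style ("JSON-like") access into the entities structure
if 'media' in status.entities:
    for media in status.entities['media']:
        print(media['media_url'])  # URL of the attached image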

I've decided to retrieve these particular fields because, of course, the image is what we want, and we may want to relate that image to a particular username if we want to start a complete OSINT gathering process. The status text may give us some information about other third persons involved, or about the context of the picture, and the user location is also interesting, especially if we are trying to analyze a global event such as the world day of something or a worldwide protest.

Now that we know how to extract fields from a tweet, we need to know how to start watching a stream of tweets and “mining” all the interesting information. The twitter api can help with that via tweepy’s streaming function.

from tweepy import Stream
from tweepy.streaming import StreamListener

class TWListener(StreamListener):

    def on_status(self, status):
        # called for every new tweet matching our filter
        print(status.text)

    def on_error(self, status):
        print(status)
        return True

hashtag = '#example'  # the event hashtag we want to track

twitter_stream = Stream(auth, TWListener())
twitter_stream.filter(track=[hashtag])

What's happening here is basically this: we first define a TWListener using StreamListener from the tweepy library, and we define a function called on_status that will perform some user-defined action for EVERY status retrieved in REAL TIME from the hashtag; here I simply tell the program to show the text of that status. We can also define an on_error function, no need to explain what it does. Then we use our previously obtained auth data together with our TWListener class to call Stream from tweepy. Instead of getting ALL of the tweets in our timeline, we filter those matching a user-defined hashtag, stored inside the hashtag variable. I personally suggest you play with trending topics. Here's a video showing how it works.

As we can see, we keep receiving "live" information about all the tweets that are being posted in real time using that hashtag.
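If you want to play with trending topics, as suggested above, a minimal sketch could look like this; it assumes the older tweepy 3.x-style API used throughout this post and WOEID 1 for worldwide trends:

# fetch the worldwide trending topics and reuse one of them as a stream filter
trends = api.trends_place(1)  # 1 is the WOEID for "worldwide"
topics = [t['name'] for t in trends[0]['trends']]
print(topics)

# e.g. follow the first trending topic instead of a fixed hashtag
twitter_stream = Stream(auth, TWListener())
twitter_stream.filter(track=[topics[0]])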

RETRIEVING IMAGES FROM TWITTER

The next thing we need to learn is how to physically download the images, as we'll need to analyze them to detect the faces of our targets. As we saw, the media_url field contains the URL of the image. Once we have the URL, we only need something like wget to put that image on our hard disk.

pip install wget

Will get us the library that does that. Now the procedure is simple: we just call wget with that field for every status that contains an image. Since Python makes this easy, we can check whether a post contains an image with "if 'media' in status.entities"; status.entities stores the media information if available.

I wrote the following and it worked for me:

import wget
from tweepy.streaming import StreamListener

class TWListener(StreamListener):

    def on_status(self, status):
        # only process tweets that actually contain media
        if 'media' in status.entities:
            for image in status.entities['media']:
                print(image['media_url'])

                # download the image into our web root
                filename = wget.download(image['media_url'],
                                         out='/var/www/html/sample')
                print("IMAGE DOWNLOADED")
                print(filename)
        else:
            print("no image")

What the script does at this point is go through every new post, check if it contains an image and, if so, download that image into the "/var/www/html" folder. Why that folder? Because we may want to display these images later through a web app, but don't worry, any other folder will do the job just as well.

Here’s a video showing how it works in real time:

 

EXTRACTING THE FACE FROM AN IMAGE

If we are going to search for individuals in our platform using these images, the first question we have to ask ourselves is: how do we recognize a person? This question has an easy answer for now: we'll detect the face of each individual appearing in our downloaded photos and then take one of these approaches:

  • We’ll compare the face to all other (known) faces we have to see who the individual is.
  • We’ll compare the image to the image of the individual we are looking for to get the degree of similarity.
  • We’ll launch the image to a (trained) neural network to see who the individual is.

In the PoC for this example I wrote a small "sub-system" for cropping the face out of a photo and then storing it in another folder.

I've used the Haar cascade classifier from the Python OpenCV library for the face detection and then the extraction. As I said, I used a Haar classifier. But what is that?

That classifier is an algorithm based on a machine learning system created by Paul Viola and Michael Jones that was trained with a lot of images of faces (positive) and a lot of images of non-faces (negative), so "it knows" how to detect whether a particular image contains a face (and where it is, of course) or not.

That algorithm starts by extracting the Haar features from each image as shown here:

Each window is placed on the picture to calculate a single feature. This feature is a single value obtained by subtracting the sum of pixels under the white part of the window from the sum of the pixels under the black part of the window.

Now, all possible sizes of each window are placed on all possible locations of each image to calculate plenty of features.

For example, in the image above we are extracting two features. The first one focuses on the property that the region of the eyes is often darker than the area of the nose and cheeks. The second feature relies on the property that the eyes are darker than the bridge of the nose.

But most of the features calculated this way are irrelevant. For example, when the windows are applied to the cheek they become irrelevant, because none of those areas is notably darker or lighter than the others; all sectors there look the same.

So we promptly discard irrelevant features and keep only those relevant with a fancy technique called Adaboost. AdaBoost is a training process for face detection, which selects only those features known to improve the classification (face/non-face) accuracy of our classifier.

In the end, the algorithm considers the fact that generally: most of the region in an image is a non-face region. Considering this, it’s a better idea to have a simple method to check if a window is a non-face region, and if it’s not, discard it right away and don’t process it again. So we can focus mostly on the area where a face is.

Extracted from here.

TL;DR

We are using a pre-trained classifier to know if an image contains a face or not, and where it is. The system was trained with a lot of faces and non-faces. It knows whether the picture contains a face because of features like darker shapes (the eyes), lighter areas (the cheeks) and so on. We'll just load it using a library. There are other approaches like LBP cascades, but we'll use this one because it's more accurate for what we want.
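To make the idea of a single Haar-like feature concrete, here is a toy sketch, not part of the PoC; a real detector computes these values through integral images for speed, but the arithmetic is the same white-minus-black subtraction described above:

import numpy as np

def two_rect_haar_feature(gray, x, y, w, h):
    # gray: a grayscale image as a numpy array, e.g. cv2.imread(path, 0)
    # Toy two-rectangle Haar-like feature over the window at (x, y), size w x h:
    # the sum of the pixels under one half minus the sum under the other half.
    window = gray[y:y + h, x:x + w].astype(np.int64)
    white = window[:, :w // 2].sum()
    black = window[:, w // 2:].sum()
    return white - black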

So I wrote this code using the Haar classifier:

import cv2

def facecrop(image):
    # load the image and convert it to grayscale for the detector
    test1 = cv2.imread(image)
    gray_img = cv2.cvtColor(test1, cv2.COLOR_BGR2GRAY)

    # load the pre-trained Haar cascade for frontal faces
    haar_face_cascade = cv2.CascadeClassifier('data/haarcascade_frontalface_alt.xml')

    faces = haar_face_cascade.detectMultiScale(gray_img,
                                               scaleFactor=1.1, minNeighbors=5)
    facesfound = []  # if you want to do some storage

    for f in faces:
        print("face detected")
        x, y, w, h = [v for v in f]
        # draw a rectangle around the detected face
        cv2.rectangle(test1, (x, y), (x + w, y + h), (255, 255, 255))

        # crop the face region; this sub-image can be saved to the faces folder
        sub_face = test1[y:y + h, x:x + w]

        cv2.imshow('aaa', test1)
        cv2.waitKey(500)
        cv2.destroyAllWindows()

The code I just presented first loads the Haar classifier and then runs it against one image. The module detects ALL the faces in the photo; for each one of them it prints a message, draws a rectangle and creates a sub-image containing only the face that has been detected. It shows that image on screen for half a second, and then of course we could save that sub-image into our faces folder (not implemented in this example).

You can see it working here:

 

BUILDING AN AUTOMATED EXTRACTION SYSTEM

So now that we know how to parse live tweets using the hashtag of a public event, detect whether those tweets contain photos, download the photos, and even detect and store the faces found in them, we may want to have all of this fully automated before proceeding. So basically I've created the following database.

 

The final result we want is the face, as that is what we will be working with. Going bottom-up from there: a face is part of a photo, that photo was extracted from an event using a hashtag and belongs to some user, and the photo is also part of a tweet that contains some text (status) and some location (if we are lucky enough).
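As a rough illustration of those relationships, a minimal schema could look like the following; the table and column names are my assumptions for illustration, not necessarily the ones used in the PoC:

# MySQL-flavoured DDL kept as a Python string so it can be fed to a DB driver
SCHEMA = """
CREATE TABLE tweet (
    id          BIGINT PRIMARY KEY,
    screen_name VARCHAR(64),
    status_text TEXT,
    location    VARCHAR(128),
    hashtag     VARCHAR(128)
);
CREATE TABLE photo (
    id       INT AUTO_INCREMENT PRIMARY KEY,
    tweet_id BIGINT,
    path     VARCHAR(255),
    FOREIGN KEY (tweet_id) REFERENCES tweet(id)
);
CREATE TABLE face (
    id       INT AUTO_INCREMENT PRIMARY KEY,
    photo_id INT,
    path     VARCHAR(255),
    FOREIGN KEY (photo_id) REFERENCES photo(id)
);
"""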

What I've built for the PoC is a script that does all the tweet parsing and image cropping and then sends everything via a JSON POST request to an API server written with Flask, which just loads that information into a MySQL DB.

You can see the code here:

You can also watch a video of the system working here (I chose a screenshot so you can appreciate the indentation more easily):

From the code above we can see that facecrop does all the face cropping and storing work for us. What I wanted to show you here is how easily we send all the information to our API server via a simple JSON POST request, using Python requests on one side and Flask on the other.
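Since that snippet is only shown in the video, here is a minimal sketch of what the client side of that POST could look like; the endpoint URL and field names are assumptions, and status, hashtag, filename and face_filename are assumed to come from the earlier snippets (face_filename being a hypothetical path returned by facecrop):

import requests

# hypothetical payload: everything we extracted for one detected face
payload = {
    'screen_name': status.user.screen_name,
    'status_text': status.text,
    'location': status.user.location,
    'hashtag': hashtag,
    'photo_path': filename,       # image saved by wget
    'face_path': face_filename,   # cropped face saved by facecrop
}

# send it to our Flask API server as JSON
r = requests.post('http://localhost:5000/api/faces', json=payload)
print(r.status_code, r.text)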

The code on the other side handling this may be something like:
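The original server-side code isn't reproduced in this post, so here is a minimal Flask sketch of what that endpoint could look like; the route, credentials and table are assumptions, and for brevity it writes everything into a single denormalized table instead of the three tables sketched earlier:

from flask import Flask, request, jsonify
import MySQLdb

app = Flask(__name__)

@app.route('/api/faces', methods=['POST'])
def add_face():
    data = request.get_json()
    db = MySQLdb.connect(host='localhost', user='halefalcon',
                         passwd='secret', db='halefalcon')
    cur = db.cursor()
    # store the tweet metadata plus the paths of the photo and the cropped face
    cur.execute(
        "INSERT INTO face_data (screen_name, status_text, location, "
        "hashtag, photo_path, face_path) VALUES (%s, %s, %s, %s, %s, %s)",
        (data['screen_name'], data['status_text'], data['location'],
         data['hashtag'], data['photo_path'], data['face_path']))
    db.commit()
    db.close()
    return jsonify({'stored': True})

if __name__ == '__main__':
    app.run()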

Anyway, you can get the full system here.

And a video showing the client-server architecture working with the faces:

 

(BASIC) IMAGE COMPARISON, MANHATTAN & ZERO DISTANCE

Now that we have different faces and we can relate them to a username, let’s start searching!

In this PoC I've been playing with two basic methods of image comparison: the Manhattan distance between two images, and the zero norm.

I mainly use the Manhattan distance for comparing two images and computing a distance between them, where 0 means the maximum degree of similarity.

The Manhattan distance function computes the distance that would be traveled to get from one data point to the other if a grid-like path is followed. The Manhattan distance between two items is the sum of the absolute differences of their corresponding components; for example, the Manhattan distance between (1, 2) and (4, 0) is |1 - 4| + |2 - 0| = 5.

The zero norm (L0) method, on the other hand, counts how many normalized values are non-zero in the difference between the two images.

Note that I apply both methods at the pixel level.

I’ve used this function:

import numpy as np
from numpy.linalg import norm

def normalize(arr):
    # scale pixel values to the [0, 255] range
    rng = arr.max() - arr.min()
    return (arr - arr.min()) * 255.0 / rng

def compare_images(img1, img2):
    # normalize to compensate for exposure difference
    img1 = normalize(img1.astype(float))
    img2 = normalize(img2.astype(float))
    # calculate the difference and its norms
    diff = img1 - img2
    m_norm = np.sum(np.abs(diff))   # Manhattan norm
    z_norm = norm(diff.ravel(), 0)  # Zero norm (number of non-zero pixels)
    return (m_norm, z_norm)
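A hypothetical usage example, assuming two cropped faces on disk (the paths are placeholders); note that both images must have the same dimensions before they can be subtracted:

import cv2

ref = cv2.imread('faces/reference.png', cv2.IMREAD_GRAYSCALE)
cand = cv2.imread('faces/candidate.png', cv2.IMREAD_GRAYSCALE)

# resize the candidate so the elementwise difference is well defined
cand = cv2.resize(cand, (ref.shape[1], ref.shape[0]))

m_norm, z_norm = compare_images(ref, cand)
print("Manhattan norm:", m_norm, "per pixel:", m_norm / ref.size)
print("Zero norm:", z_norm, "per pixel:", float(z_norm) / ref.size)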

So if I compare the same image with itself:

And if I compare the image with another one, the other being a face of the same individual:

Finally, if we compare the image with an image of another face, from a different individual, we get:

So, as we can see, this time the Manhattan difference is higher and the zero norm is lower. As I said before, we'll be trusting the Manhattan distance this time, but note that this method is pretty ineffective! It's useful basically for detecting matches between images that are essentially the same, or really, really similar. Also note that having the face already "cropped" in our filesystem helps a lot in avoiding false positives.

FACE RECOGNITION WITH NEURAL NETWORKS

The other approach for detecting people in images is to use a trained recognition system. Basically, a recognizer for image classification is trained with a lot of images corresponding to one individual and another ton of images corresponding to another one; at the end it should be able to distinguish between individual A and individual B.

In this approach I've used a recognizer based on Local Binary Pattern Histograms (LBPH).

As a brief definition:

Local binary patterns (LBP) is a type of visual descriptor used for classification in computer vision. LBP is the particular case of the Texture Spectrum model proposed in 1990. LBP was first described in 1994. It has since been found to be a powerful feature for texture classification; it has further been determined that when LBP is combined with the Histogram of oriented gradients (HOG) descriptor, it improves the detection performance considerably on some datasets. A comparison of several improvements of the original LBP in the field of background subtraction was made in 2015 by Silva et al. A full survey of the different versions of LBP can be found in Bouwmans et al.

But this is not a computer vision/mathematics post, this is a hacking post, so I just learned how it works and how to use it, and applied it to my pipeline. I strongly suggest you read this and this.
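To get an intuition for what a local binary pattern is, here is a toy sketch of the basic LBP code of a single pixel; it is illustrative only, since OpenCV's LBPH recognizer computes these codes (and the per-region histograms) for us:

def lbp_code(gray, x, y):
    # gray: a grayscale image as a 2-D array, e.g. cv2.imread(path, 0)
    # Threshold the 8 neighbours of pixel (x, y) against the centre pixel
    # and pack the comparison results into an 8-bit code.
    center = gray[y, x]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dy, dx) in enumerate(offsets):
        if gray[y + dy, x + dx] >= center:
            code |= 1 << bit
    return code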

I've implemented this using OpenCV (cv2), numpy and PIL in Python.

The trainer module goes like this:

import os
import cv2
import numpy as np
from PIL import Image

path = 'dataSet'
faceCascade = cv2.CascadeClassifier('Classifiers/face.xml')
recognizer = cv2.createLBPHFaceRecognizer()

def get_images_and_labels(path):
    image_paths = [os.path.join(path, f) for f in os.listdir(path)]
    images = []
    labels = []
    for image_path in image_paths:
        # load each training image in grayscale
        image_pil = Image.open(image_path).convert('L')
        image = np.array(image_pil, 'uint8')
        nbr = 2  # the integer label we associate with this individual

        print(nbr)

        # detect the face region in the training image
        faces = faceCascade.detectMultiScale(image)

        for (x, y, w, h) in faces:
            images.append(image[y: y + h, x: x + w])
            labels.append(nbr)

            cv2.imshow("Adding", image[y: y + h, x: x + w])
            cv2.waitKey(10)
    return images, labels

images, labels = get_images_and_labels(path)
cv2.imshow('test', images[0])
cv2.waitKey(0)

recognizer.train(images, np.array(labels))
recognizer.save('trainer/trainer.yml')

With that code we are basically feeding a lot of images to the recognizer so it can be trained using the approach we just presented. It trains on each image and associates a label with it (2 in this case, the value of nbr); this is important, because we'll associate every individual with a label. The label can be an integer, for example.

Note that we save the model we just created using a yml format.

We'll then use the trained model to identify an unknown image, with a detector module such as:

import cv2

recognizer = cv2.createLBPHFaceRecognizer()
recognizer.load('trainer/trainer.yml')
cascadePath = "Classifiers/face.xml"
faceCascade = cv2.CascadeClassifier(cascadePath)
path = 'dataSet'

font = cv2.cv.InitFont(cv2.cv.CV_FONT_HERSHEY_SIMPLEX, 1, 1, 0, 1, 1)
im = cv2.imread("test-data/iniesta.jpg")

gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
faces = faceCascade.detectMultiScale(gray, scaleFactor=1.2,
                                     minNeighbors=5, minSize=(50, 50),
                                     flags=cv2.CASCADE_SCALE_IMAGE)

for (x, y, w, h) in faces:

    # ask the trained recognizer who this face belongs to
    nbr_predicted, conf = recognizer.predict(gray[y:y + h, x:x + w])
    cv2.rectangle(im, (x - 50, y - 50), (x + w + 50, y + h + 50), (225, 0, 0), 2)
    print(conf)
    print(nbr_predicted)

    # map the numeric labels back to the individuals used for training
    if nbr_predicted == 1:
        nbr_predicted = 'elvis'
    if nbr_predicted == 2:
        nbr_predicted = 'iniesta'
    if nbr_predicted is None:
        print("nada")

    if conf < 40:
        print("not so accurate")
    else:
        print("probably right person")

    # draw the predicted label and the confidence value on the image
    cv2.cv.PutText(cv2.cv.fromarray(im),
                   str(nbr_predicted) + " probability: " + str(conf),
                   (x, y + h), font, 255)
    cv2.imshow('im', im)
    cv2.waitKey(0)

As we can see, we load the trained model, read an image, extract the faces from it, and then for each face we use the recognizer to determine whose face it is. Note that the conf variable can help us know how accurate the detection is. Also note that here I set label 1 for Elvis and label 2 for Iniesta, as I used data of both publicly known persons to train the recognizer.

Let’s see some results:

SOME CONCLUSIONS

I've come to some conclusions during the time I spent on this "weekend project". Basically, I saw that it is very easy to associate a public event with a virtual one via Twitter/Instagram hashtags, and that one can get a lot of information about a public event in a matter of seconds using these platforms. On the second point, I learned that working with social network data is not a hard job; I could even automate a whole data extraction process in a couple of hours or less! Now comes the hard part: image processing is an incredibly interesting topic, but it's not as easy as working with social data and clicking a couple of buttons, and there are a lot of different methodologies for comparing and recognizing images. As for the face recognition part, I have to say that:

Simple methods such as the Manhattan distance will help us a lot when looking for almost identical images; still, we'll have to spend a lot of time (depending on the event and its activity) comparing every image with the image of the individual we are looking for. On the other hand, a trained-recognizer approach may be hard when it comes to the training itself. One strategy for that part may be to train the model with a lot of images of random people and then with a lot of images of the individual we are looking for; in any case, we need many images of that individual for this to work, and we may get those images from social networks by doing some OSINT, but...

A solution to the problem of not having enough images of the individual we are looking for may be to use a video of them: we can extract a lot of different frames of an individual from a video, and a security camera tape of the individual may help a lot. We also have to consider that we may need a lot of storage and CPU power to run a system like this, as we'll need to extract a lot of data and process it quickly to train the model and compare images in real time.

Finally, another conclusion to keep in mind is the following: when working with trained recognizers, or even functions like the Manhattan and zero norms, we need to get fuzzy and work with lower and upper limits; in the average case we'll have to establish a threshold such as "if a detection comes with conf > 80 then we can be 'sure' it's a match".
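A minimal sketch of such a fuzzy decision rule (the thresholds here are placeholders and follow the convention used above, where a higher score means a better match; tune them against your own data and recognizer):

def classify_detection(conf, lower=40, upper=80):
    # conf: similarity/confidence score returned by the recognizer
    if conf > upper:
        return "match"
    elif conf > lower:
        return "possible match, review manually"
    else:
        return "no match"

print(classify_detection(85))  # match
print(classify_detection(55))  # possible match, review manually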

HALEFALCON, AN API BASED AUTOMATED SYSTEM

I’ve been working on an automated system based on this approach. I’ll be uploading the full source code here:

https://github.com/dc170/HaleFalcon

I hope you enjoyed this post, and now you know, don’t get caught!