Aligning, resizing and normalization

It’s time to start implementing support for recognizing gestures with neural networks. As I mentioned in the previous post, I had seen some potential problems. After two days of work, I can finally write that those problems are solved. In this post, I’ll describe how I solved the problem of varying drawing area locations. In the second part, I’m going to describe the implemented resizing strategy. At the end, in the third part, I’ll write a few words about flattening gestures and normalizing their values.

I would like to remind you that all of the sources are available in my GitHub repository. Moreover, in the README file, you can find a short project description with its main assumptions.

Aligning

Depending on the location of the drawing area, jQuery returns different values for events performed with the mouse. It means that if on page A a login form is placed at the top, and on page B a login form is embedded in a pop-up in the middle of the page, then identical gestures drawn on both pages will consist of completely different points. The coordinates of those points matter to the neural network, so we have to find a way to normalize them.

The very first solution I found is to move the whole gesture to the center of the coordinate system. What does it mean? Let’s consider it in a few really easy steps. For all of the points of a gesture:

  1. Find the maximum and minimum values of x and y.
  2. Find the center of the gesture [mX, mY] using the following equations: mX = (x_max + x_min) / 2, mY = (y_max + y_min) / 2.
  3. Move all of the gesture’s points by the vector [-mX, -mY].

After going through those 3 steps, our gesture is placed at the center of the coordinate system.

To make it clearer, take a look at the class responsible for aligning, named CentricPointsAligner:
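A minimal sketch of that class could look as follows (the Point type and the align signature are assumptions; only the class name comes from the project):

    import java.util.List;
    import java.util.stream.Collectors;

    class Point {
        final double x;
        final double y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    class CentricPointsAligner {
        List<Point> align(List<Point> points) {
            // Step 1: find the extreme values of x and y
            double minX = points.stream().mapToDouble(p -> p.x).min().orElse(0);
            double maxX = points.stream().mapToDouble(p -> p.x).max().orElse(0);
            double minY = points.stream().mapToDouble(p -> p.y).min().orElse(0);
            double maxY = points.stream().mapToDouble(p -> p.y).max().orElse(0);

            // Step 2: compute the center of the gesture
            double mX = (maxX + minX) / 2;
            double mY = (maxY + minY) / 2;

            // Step 3: translate every point by the vector [-mX, -mY]
            return points.stream()
                    .map(p -> new Point(p.x - mX, p.y - mY))
                    .collect(Collectors.toList());
        }
    }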

Resizing

The neural network requires a constant number of inputs, and I had to decide how many of them the proxy application should support. To avoid picking an arbitrary number, I drew plenty of gestures and checked how many points were required to build them. I started with lines, then moved on to squares and triangles, and ended up with a circle. I made up my mind that 200 inputs will be enough for the network to represent a gesture. At this point, one very important note should be made: I thought only about gesture representation; the efficiency of the NN was not considered at all. It is possible that in the next few days it will turn out that the network cannot process all the data.

If we know that we need 200 points for each gesture, then we should provide a strategy to resize them. Sometimes a gesture will have fewer points, sometimes more.

Expanding a gesture

The expanding strategy is very easy. The algorithm counts how many points we have and how many are missing. When those numbers are ready, it starts to generate new points between those already known. A gesture is decoupled into a series of ranges, and in those ranges new points are added (the red ones).
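For illustration, here is a sketch of that idea assuming plain linear interpolation (the method name and the even distribution of missing points across the gaps are assumptions):

    import java.util.ArrayList;
    import java.util.List;

    // Expands a gesture to the target length by interpolating new points
    // between existing neighbors; assumes at least two input points.
    static List<Point> expand(List<Point> points, int target) {
        int missing = target - points.size();
        int gaps = points.size() - 1;
        List<Point> result = new ArrayList<>();
        for (int i = 0; i < gaps; i++) {
            Point a = points.get(i);
            Point b = points.get(i + 1);
            result.add(a);
            // Spread the missing points as evenly as possible across the gaps
            int toInsert = missing / gaps + (i < missing % gaps ? 1 : 0);
            for (int j = 1; j <= toInsert; j++) {
                double t = (double) j / (toInsert + 1);
                result.add(new Point(a.x + t * (b.x - a.x), a.y + t * (b.y - a.y)));
            }
        }
        result.add(points.get(points.size() - 1));
        return result;
    }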

Reducing a gesture

What about gestures that are longer than 200 points?

Firstly, we eliminate duplicated points. Duplicated points are points that are neighbors on the list of points and have the same x and y coordinates. It is important not to remove identical points that are not located next to each other.

Secondly, if there are still too many points, then we remove points whose coordinates differ from their predecessor’s, both vertically and horizontally, by at most 1 px.
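Both passes can be sketched like this (comparing each point with its direct predecessor is an assumption about the exact neighbor rule):

    import java.util.ArrayList;
    import java.util.List;

    static List<Point> reduce(List<Point> points, int target) {
        List<Point> result = new ArrayList<>(points);
        // Pass 1: drop exact duplicates that are direct neighbors on the list
        for (int i = result.size() - 1; i > 0; i--) {
            if (result.get(i).x == result.get(i - 1).x
                    && result.get(i).y == result.get(i - 1).y) {
                result.remove(i);
            }
        }
        // Pass 2: while still too long, drop points that differ from their
        // predecessor by at most 1 px both vertically and horizontally
        for (int i = result.size() - 1; i > 0 && result.size() > target; i--) {
            if (Math.abs(result.get(i).x - result.get(i - 1).x) <= 1
                    && Math.abs(result.get(i).y - result.get(i - 1).y) <= 1) {
                result.remove(i);
            }
        }
        return result;
    }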

Take a look at the test; it may be more descriptive:
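An illustrative JUnit sketch of the first rule could look like this (the RepeatingResizer constructor and the resize signature are assumptions, not quoted from the project):

    import static org.junit.Assert.assertEquals;
    import java.util.Arrays;
    import java.util.List;
    import org.junit.Test;

    public class RepeatingResizerTest {

        @Test
        public void shouldRemoveOnlyNeighboringDuplicates() {
            List<Point> input = Arrays.asList(
                    new Point(1, 1), new Point(1, 1), // neighboring duplicates
                    new Point(5, 5),
                    new Point(1, 1));                 // identical, but not a neighbor
            List<Point> result = new RepeatingResizer(3).resize(input);
            assertEquals(3, result.size());
        }
    }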

Both expanding and reducing are implemented in the RepeatingResizer class.

Normalization

Neural networks are designed to accept floating-point numbers as their input. Usually these inputs should fall in either the -1 to +1 or the 0 to +1 range for maximum efficiency. The choice of range is often dictated by the choice of activation function, as certain activation functions have a positive range and others have both a negative and a positive range.

For the purpose of normalization, I used tools provided by Encog Framework:
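Assuming Encog’s NormalizeArray helper and a 0 to +1 target range (the formattedInputSupplier interface is an assumption), this boils down to something like:

    import org.encog.util.arrayutil.NormalizeArray;

    double[] input = formattedInputSupplier.get();

    // Map every value into the 0..1 range expected by the network
    NormalizeArray normalizer = new NormalizeArray();
    normalizer.setNormalizedLow(0);
    normalizer.setNormalizedHigh(1);
    double[] normalized = normalizer.process(input);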


In the first line of the code above, we can see that a formattedInputSupplier object is used. What is it? What does it do?

The neural network takes a flat array of values as its input. This implies that we cannot pass a two-dimensional array into it, let alone a raw list of points. The common solution, used e.g. in MNIST digit recognition, is to transform the data into a one-dimensional array. It is important to stay consistent across all of the data passed into the network. The FlattenPointsSupplier class (the formattedInputSupplier object) performs the described transformation.
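A plausible shape of that class (the body is a guess; only the class name comes from the post):

    import java.util.List;

    class FlattenPointsSupplier {

        private final List<Point> points;

        FlattenPointsSupplier(List<Point> points) {
            this.points = points;
        }

        // Turns N points into a flat array of 2 * N values,
        // always in the same x-then-y order to stay consistent
        double[] get() {
            double[] flat = new double[points.size() * 2];
            for (int i = 0; i < points.size(); i++) {
                flat[2 * i] = points.get(i).x;
                flat[2 * i + 1] = points.get(i).y;
            }
            return flat;
        }
    }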

When the list of points is transformed, we can pass it through the normalization process.
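Putting the pieces together, the whole preprocessing could be wired up roughly like this (the RepeatingResizer usage is an assumption, as above):

    // Align, resize to 200 points, flatten, and normalize a raw gesture
    List<Point> aligned = new CentricPointsAligner().align(gesture);
    List<Point> resized = new RepeatingResizer(200).resize(aligned);
    double[] flat = new FlattenPointsSupplier(resized).get();

    NormalizeArray normalizer = new NormalizeArray();
    normalizer.setNormalizedLow(0);
    normalizer.setNormalizedHigh(1);
    double[] networkInput = normalizer.process(flat);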


To sum everything up, the Aksesi Proxy is able to normalize gestures received from different pages with different drawing area locations. Moreover, it is possible to resize a gesture to the required length. Day by day, we are getting closer to the neural network implementation.
