Predicting Website Credibility Using a DNN

Over the last few weeks, I’ve been working on a deep neural network (DNN) that predicts website credibility (i.e. how “reliable” a website is). The features include basic properties of the website, such as its domain, along with a bag-of-words model.
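The post doesn’t show how the bag-of-words features are built, so here is a minimal sketch of the general idea. It assumes a simple lowercase tokenizer and raw word counts; the names BagOfWords, Tokenize, BuildVocabulary and Vectorize are hypothetical and not taken from the actual model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical bag-of-words helper (not the model's actual pipeline):
// builds a vocabulary from training texts, then turns any text into word counts.
public static class BagOfWords
{
    // Lowercase the text and split it into alphanumeric tokens
    public static IEnumerable<string> Tokenize(string text) =>
        Regex.Matches(text.ToLowerInvariant(), "[a-z0-9]+").Cast<Match>().Select(m => m.Value);

    // Map each distinct word to a column index, in order of first appearance
    public static Dictionary<string, int> BuildVocabulary(IEnumerable<string> documents)
    {
        var vocabulary = new Dictionary<string, int>();
        foreach (string doc in documents)
            foreach (string word in Tokenize(doc))
                if (!vocabulary.ContainsKey(word))
                    vocabulary[word] = vocabulary.Count;
        return vocabulary;
    }

    // Count how often each vocabulary word occurs; words outside the vocabulary are ignored
    public static double[] Vectorize(string text, Dictionary<string, int> vocabulary)
    {
        var vector = new double[vocabulary.Count];
        foreach (string word in Tokenize(text))
            if (vocabulary.TryGetValue(word, out int index))
                vector[index] += 1;
        return vector;
    }
}
```

For example, a vocabulary built from “Free money now” and “The study was peer reviewed” has eight words, and Vectorize("Free free MONEY", vocab) gives [2,1,0,0,0,0,0,0]. Each count becomes one input node of the network, alongside features such as the domain.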

Website Credibility

Website credibility depends on many factors, and often there is no clear right or wrong answer. Wikipedia, for example, is a notorious source because anyone can edit it. It contains a lot of correct information, yet it is still considered unreliable.

Although there is no exact answer, we can often estimate credibility from features such as the author, the “purpose” of the text, and even the date. (More can be found here.)


Encode Categories into Int Arrays

Encoding categories into int arrays by hand was harder than I expected, and it quickly gets overwhelming when there are a lot of features.

The basic logic:

  1. Split the data into a two-dimensional string array.
  2. Find every column whose value in the first row doesn’t parse as a double. (This does not work when a category can itself be parsed as a number, e.g. category “1”.)
  3. Loop over every item in each of those columns and add it to the list if it doesn’t already exist.
  4. Encode each category as an int array by setting a single element to 1, i.e. result[the index of the category] = 1;

Let’s get to the code, shall we?

// Needs: using System; using System.Collections.Generic; using System.Linq;
// TypeIntsList (a list of TypeInts) and TypeToIntGrid (a WPF DataGrid) are members
// of the window class in Neural Network GUI Demo.
void GetStringOptions(string[] lines, char separator)
{
    // 2D jagged array: one row per data line, one column per feature
    string[][] splitedData = new string[lines.Length][];

    for (int i = 0; i < lines.Length; ++i)
    {
        splitedData[i] = lines[i].Split(separator);
    }

    // A column is treated as categorical if its first-row value does not parse as a double
    List<int> stringColumns = new List<int>();

    for (int i = 0; i < splitedData[0].Length; ++i)
    {
        if (!double.TryParse(splitedData[0][i], out _))
            stringColumns.Add(i);
    }

    foreach (int i in stringColumns)
    {
        int options = 0;
        int startingIndex = TypeIntsList.Count; // index of this column's first category in TypeIntsList

        for (int j = 0; j < splitedData.Length; ++j)
        {
            // Add each value once per column (matching on Index as well as Name, so equal
            // values in different columns are kept apart) and count the distinct options
            if (!TypeIntsList.Any(x => x.Index == i && x.Name == splitedData[j][i]))
            {
                TypeIntsList.Add(new TypeInts() { Name = splitedData[j][i], ValueString = "", Index = i });
                options++;
                Console.WriteLine(splitedData[j][i]); // debug output
            }
        }

        // Give each option an int array with a single 1 to activate a different input
        for (int option = 0; option < options; option++)
        {
            int[] encoding = new int[options];
            encoding[option] = 1;
            TypeIntsList[startingIndex + option].ValueString = string.Join(",", encoding);
        }
    }

    TypeToIntGrid.Items.Refresh(); // refreshes the grid (new items won't show without this)
}

Example output:

Iris-setosa       ->   [1,0,0]
Iris-versicolor   ->   [0,1,0]
Iris-virginica    ->   [0,0,1]
Auto-Encode Categories

I have to admit that this isn’t the most elegant way to do it; I will try to improve it in the future (in other words, tomorrow).

It is also quite limited in what it can handle. A binary category such as male/female, for example, would be encoded as [1,0] and [0,1], using two inputs where one would do. I would personally use a single input of 1 or -1 instead.
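A minimal sketch of that 1/-1 alternative, assuming the column has exactly two distinct values. BinaryEncoder and Encode are hypothetical names; the demo itself doesn’t do this yet.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: encodes a two-category column as a single input,
// +1 for the first category seen and -1 for the second (e.g. male -> 1, female -> -1).
public static class BinaryEncoder
{
    public static Dictionary<string, int> Encode(IEnumerable<string> column)
    {
        // Collect distinct values in order of first appearance
        var categories = new List<string>();
        foreach (string value in column)
            if (!categories.Contains(value))
                categories.Add(value);

        if (categories.Count != 2)
            throw new ArgumentException("Expected exactly two distinct categories.");

        return new Dictionary<string, int>
        {
            [categories[0]] = 1,
            [categories[1]] = -1,
        };
    }
}
```

Columns with three or more categories would still go through the one-hot encoding above; only the two-category case collapses to one input.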

I tried to combine the two for loops inside foreach (int i in stringColumns), but it seems I can’t know the size of the array without first looping over the column to find out how many options there are.
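One possible way around this (a sketch, not what the demo currently does) is to store only each category’s index during the single pass over the column, and build the int arrays on demand once the total count is known. A Dictionary also replaces the linear TypeIntsList.Any lookup. OneHotEncoder, Count and Encode are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical one-hot encoder: a single pass records each category's index,
// and the int array is only built later, when its length is finally known.
public class OneHotEncoder
{
    private readonly Dictionary<string, int> indexOf = new Dictionary<string, int>();

    // Number of distinct categories found in the column
    public int Count => indexOf.Count;

    // Single pass: assign indices in order of first appearance
    public OneHotEncoder(IEnumerable<string> column)
    {
        foreach (string value in column)
            if (!indexOf.ContainsKey(value))
                indexOf[value] = indexOf.Count;
    }

    // Build the int array for one category, with a 1 at that category's index
    public int[] Encode(string category)
    {
        if (!indexOf.TryGetValue(category, out int index))
            throw new KeyNotFoundException($"Unknown category: {category}");
        int[] result = new int[indexOf.Count];
        result[index] = 1;
        return result;
    }
}
```

With the Iris labels, new OneHotEncoder(labels).Encode("Iris-versicolor") joined with "," gives "0,1,0", the same as the example output above. The count still requires a full pass, but the second loop only runs when an encoding is actually needed.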

This code is part of Neural Network GUI Demo; its full source code can be found here on GitHub.

Neural Network GUI Demo

Recently, I’ve been messing around with neural networks, especially with the Iris data set. Yet building a neural network from scratch seemed like a bit too much work… So I started this project: Neural Network GUI Demo. (Yes, it is a demo.)

I will be updating this as well as Endless Launcher.

Supported Features:

  • User-defined numbers of input, hidden, and output nodes
  • Epoch, learning rate, momentum, weight decay, and exit error values
  • Separator and data reading
  • Neural network with a single hidden layer
  • Automatic encoding of categorical inputs
  • And a random text box that I use as a console
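To show how the training options typically fit together, here is a sketch of one common formulation. It is an assumption, not necessarily how the demo or the quaetrix code applies them: each step moves a weight against its gradient (learning rate), adds a fraction of the previous step (momentum), and shrinks the weight slightly (weight decay). Training stops once the error falls below the exit error or the epoch limit is reached. Trainer, Update and FitLine are hypothetical, and FitLine fits a single weight rather than a full network.

```csharp
using System;

// Hypothetical sketch of learning rate, momentum, weight decay, epochs and exit error.
public static class Trainer
{
    // One gradient-descent step applied element-wise.
    // gradients are dError/dWeight; previousDeltas is updated in place for the next step.
    public static void Update(double[] weights, double[] gradients, double[] previousDeltas,
                              double learnRate, double momentum, double weightDecay)
    {
        for (int i = 0; i < weights.Length; i++)
        {
            double delta = -learnRate * gradients[i] + momentum * previousDeltas[i];
            weights[i] += delta - weightDecay * weights[i]; // decay uses the old weight
            previousDeltas[i] = delta;
        }
    }

    // Fits y = w * x by full-batch gradient descent; returns early once the
    // mean squared error drops below exitError, otherwise after maxEpochs.
    public static (double weight, int epochs) FitLine(double[] xs, double[] ys, int maxEpochs,
        double learnRate, double momentum, double weightDecay, double exitError)
    {
        double[] w = { 0.0 };
        double[] previous = { 0.0 };
        for (int epoch = 0; epoch < maxEpochs; epoch++)
        {
            double mse = 0, gradient = 0;
            for (int k = 0; k < xs.Length; k++)
            {
                double error = w[0] * xs[k] - ys[k];
                mse += error * error / xs.Length;
                gradient += 2 * error * xs[k] / xs.Length; // d(mse)/dw
            }
            if (mse < exitError)
                return (w[0], epoch);
            Update(w, new[] { gradient }, previous, learnRate, momentum, weightDecay);
        }
        return (w[0], maxEpochs);
    }
}
```

Fitting xs = {1, 2, 3} and ys = {2, 4, 6} with a learning rate of 0.05, momentum of 0.5, no weight decay and an exit error of 1e-6 stops well before 1000 epochs with the weight close to 2. A non-zero weight decay pulls the weight toward 0, trading some training error for smaller weights.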

Note: There are no labels for the time being. This is for development purposes, as it is much easier to move a text box than a text box AND a label.

(I’m totally not being lazy _(:зゝ∠)_)

This project is open source under the GPL-3.0 license; the source code can be found here on GitHub.

Special Thanks:

A lot of the neural network code is from http://quaetrix.com/Build2014.html

Endless Launcher WPF

Endless Launcher WPF – A New Start

A while ago, I stopped developing Endless Launcher because it was executed far too poorly: the config was stored in the registry, and magnets relied on drag and drop… So I started a new project, Endless Launcher WPF.

Changes:

  • UI (completely changed) – Thanks to a friend of mine.
  • JSON config – And a public static class
  • Better magnets – A LOT more complicated
  • Better performance
  • Uses WPF instead of WinForms
  • Open source on GitHub
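For anyone curious how a JSON config behind a public static class can look, here is a minimal sketch. It assumes System.Text.Json (.NET Core 3.0 or later); the launcher may well use a different JSON library, and LauncherSettings, GameDirectory, MaxMemoryMb, Config, Load and Save are hypothetical names, not the project’s actual code.

```csharp
using System;
using System.IO;
using System.Text.Json;

// Hypothetical settings shape; the real Endless Launcher WPF config may differ.
public class LauncherSettings
{
    public string GameDirectory { get; set; } = "";
    public int MaxMemoryMb { get; set; } = 1024;
}

// Public static class holding the loaded config, so any window can read it
// without passing it around or touching the registry.
public static class Config
{
    public static LauncherSettings Settings { get; private set; } = new LauncherSettings();

    // Load settings from a JSON file, falling back to defaults if the file doesn't exist yet
    public static void Load(string path)
    {
        Settings = File.Exists(path)
            ? JsonSerializer.Deserialize<LauncherSettings>(File.ReadAllText(path)) ?? new LauncherSettings()
            : new LauncherSettings();
    }

    // Write the current settings back to disk as indented JSON
    public static void Save(string path)
    {
        var options = new JsonSerializerOptions { WriteIndented = true };
        File.WriteAllText(path, JsonSerializer.Serialize(Settings, options));
    }
}
```

Usage would be Config.Load(path) at startup, reading or changing Config.Settings anywhere in the app, and Config.Save(path) on exit. Unlike registry values, the JSON file can be opened and edited by hand.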

Endless Launcher is a project I started 3 years ago. It moved from Java to VB6, then to C# WinForms, and eventually to WPF. Although I don’t play Minecraft anymore, Endless Launcher WPF is still something I will continue to work on.

(Don’t forget to check out the source code on GitHub.)