Project Waifu: Speaker Verification

Project Waifu

Project Waifu is a long-term machine learning/deep learning project I will be working on. I will not reveal too much about it, but here’s the first part of the pipeline: speaker verification.

Text-Independent Speaker Verification

Speaker verification is the process of checking whether a voice belongs to a particular speaker; in this case, the output is either 1 (the person we want to identify) or 0 (not that person). Many algorithms online use GMMs (Gaussian mixture models) and/or create profiles for speakers. For this project, an MLP (multi-layer perceptron, a regular feed-forward neural network) is used and, because of the way it is structured, the algorithm performs pretty well.

Predicting Website Credibility Using a DNN

Over the last few weeks I’ve been working on a deep neural net to predict website credibility (i.e. how “reliable” a site is). The inputs consist of basic website features, such as the domain, and a bag-of-words model.

Website Credibility

Website credibility is determined by a lot of things, and a lot of the time there isn’t a right or wrong answer. Wikipedia, for example, is a notorious source because it can be edited by anyone. Wikipedia does contain a lot of correct information, yet it is still often considered unreliable.

Although there is no exact answer, we can often predict the credibility through many features such as the author, the “purpose” of the text, and even the date. (More can be found here)

Encode Categories into Int Arrays

I had a hard time trying to encode categories into int arrays manually since it can get a bit overwhelming if there are a lot of features.

The basic logic here:

  1. Separate the data into a two-dimensional string array
  2. Find every column that doesn’t parse as a double (this will not work when the category itself parses as a number, e.g. a category named 1)
  3. Loop over every item in the column and add it to the list if it isn’t already there
  4. Encode each category as an int array by setting a single element of the array to one, i.e. result[the index of the category] = 1;

Let’s get to the code, shall we?

// Requires: using System; using System.Collections.Generic; using System.Linq;
// TypeIntsList, TypeInts and TypeToIntGrid are assumed to be defined elsewhere in the project.
void GetStringOptions(string[] lines, char separator)
{
    if (lines.Length == 0)
        return; // nothing to encode

    // Jagged array: one row per data line, one entry per feature
    string[][] splitedData = new string[lines.Length][];

    for (int i = 0; i < lines.Length; ++i)
    {
        splitedData[i] = lines[i].Split(separator);
    }

    // Columns whose first-row value does not parse as a double are treated as categories
    List<int> stringColumns = new List<int>();

    for (int i = 0; i < splitedData[0].Length; ++i)
    {
        if (!double.TryParse(splitedData[0][i], out _))
            stringColumns.Add(i);
    }

    foreach (int i in stringColumns)
    {
        int options = 0;
        int startingIndex = TypeIntsList.Count(); // index of this column's first category

        for (int j = 0; j < splitedData.Length; ++j)
        {
            string name = splitedData[j][i];

            // Add each new category once per column and count how many there are.
            // Checking Index as well keeps equal strings in different columns apart.
            if (!TypeIntsList.Any(x => x.Index == i && x.Name == name))
            {
                TypeIntsList.Add(new TypeInts() { Name = name, ValueString = "", Index = i });
                options++;
                Console.WriteLine(name);
            }
        }

        // Give every category an int array with a single 1 in its own position
        for (int option = 0; option < options; option++)
        {
            int[] oneHot = new int[options];
            oneHot[option] = 1;
            TypeIntsList[startingIndex + option].ValueString = string.Join(",", oneHot);
        }
    }

    TypeToIntGrid.Items.Refresh(); // refreshes the grid (new items won't show without this)
}

Example output:

Iris-setosa       ->   [1,0,0]
Iris-versicolor   ->   [0,1,0]
Iris-virginica    ->   [0,0,1]

I do have to admit that it isn’t the most elegant way of doing it; I will try to improve it in the future. (In other words, tomorrow.)

In addition, it is also quite limited in what it can handle well. For example, a two-valued category such as male/female is a poor fit for this encoding, since it would be output as [1,0] and [0,1]. Personally, I would use a single input of 1 or -1 instead.
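
As a rough sketch of that 1/-1 idea (the class and method names here are hypothetical and not part of the demo), a column with exactly two distinct values could be mapped like this:

```csharp
using System;
using System.Collections.Generic;

static class BinaryEncodingSketch
{
    // Hypothetical helper: maps a two-valued column to 1 / -1.
    // The first value seen becomes 1, the other becomes -1.
    public static Dictionary<string, int> EncodeBinary(IEnumerable<string> column)
    {
        var seen = new List<string>();
        foreach (string value in column)
        {
            if (!seen.Contains(value))
                seen.Add(value);
        }

        if (seen.Count != 2)
            throw new ArgumentException("Expected exactly two distinct values.");

        return new Dictionary<string, int>
        {
            { seen[0], 1 },
            { seen[1], -1 }
        };
    }
}
```

Calling EncodeBinary(new[] { "male", "female", "male" }) would map male to 1 and female to -1; which value gets 1 is arbitrary and only depends on which one appears first.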

I’ve tried to combine the two for loops within foreach (int i in stringColumns); yet it seems like I can’t get the size of the array without actually looping once and finding out how many options there are.
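
One possible way to tidy this up is sketched below, assuming a column is available as a plain sequence of strings. The size still has to come from one full pass over the column, but that pass can simply collect the distinct values, and the arrays are then sized from the list's Count. OneHotSketch and EncodeColumn are hypothetical names, not something from the demo.

```csharp
using System;
using System.Collections.Generic;

static class OneHotSketch
{
    // Hypothetical helper: maps each distinct value in a column to a one-hot int array.
    public static Dictionary<string, int[]> EncodeColumn(IEnumerable<string> column)
    {
        // Pass over the data: collect distinct values in first-seen order
        var categories = new List<string>();
        var seen = new HashSet<string>();
        foreach (string value in column)
        {
            if (seen.Add(value)) // Add returns false for duplicates
                categories.Add(value);
        }

        // Pass over the categories only: the array size is now known
        var encoded = new Dictionary<string, int[]>();
        for (int i = 0; i < categories.Count; i++)
        {
            int[] vector = new int[categories.Count]; // all zeros by default
            vector[i] = 1;
            encoded[categories[i]] = vector;
        }
        return encoded;
    }
}
```

The second loop runs over the categories rather than the data, so it stays cheap even for large files; the HashSet also avoids rescanning the whole list for every row.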

This code is part of Neural Network GUI Demo; its full source code can be found here on GitHub.