Why Unsupervised Learning Beats Neural Networks in Financial Analysis

What's up, guys, this is Irene Aldridge, and today we're going to talk about SVD, which is singular value decomposition, and its cousin PCA, principal component analysis. Now, a lot of you may be rolling your eyes right now and saying, "Oh my goodness, not SVD and PCA again," but the truth is, as we show in our new book, Big Data Science, SVD and PCA are optimal methodologies for doing big data. They're way better than many machine learning techniques, which are susceptible to overfitting. SVD and PCA handle overfitting much better than things like neural networks, and the reason is that SVD and PCA deliver an optimal decomposition. You can see the proof in our book; I'm not going to talk about the proof today. What I'm going to do today is show you what that means: how it works with data, and what it does.

In traditional analysis, there's a researcher, someone like you, and that researcher has to come up with a hypothesis. This is something a lot of PhD students have struggled with over the years: where do I get my idea? How do I get this hypothesis to work? What kind of idea do I need to come up with? The traditional response is, "Oh, you have to read a lot of literature, because that's where the ideas come from," et cetera. What you end up with is a lot of repetitive ideas that reinforce potential biases, or reinforce opinions that may not be reflective of the true reality out there.

To get to the true reality, SVD and PCA are the tools, because what these tools do is take your data and find the optimal factors in your data without your interference. They do it using what are called eigenvectors and eigenvalues, and "eigen" in German means intrinsic, so these are factors that are intrinsic to the data itself. And as we show in our book, you can prove that these factors are optimal, so these are the factors you're supposed to be working with; you don't even need to dream up any other factors. The added bonus is that these factors are linear, so your data is related to them linearly, and there is lower risk of overfitting compared to non-linear models, where you torture the data to the point where it finally fits, but then of course it doesn't fit out of sample.

So what I'm going to show you is an application we have on our website. I'm going to share my screen now. As you know, we have a book website called bigdatafinancebook.com. First, I'm going to show you the code. There's code on our website, again at bigdatafinancebook.com, under the Code section; if you go to Chapter 5 (you have to sign up for the website first), there's the code for implementing what I'm going to show you next. And for this next part you don't need to sign up: you just go to our website and type /eigenportraits, "eigen" for eigenvalues and "portraits" for portraits, because the easiest way to illustrate the power of SVD and PCA is to look at image decomposition.

Here we have an image, a meme (I have been collecting different memes related to AI for a while); it's a captain, I forget his name, and he's saying "artificial intelligence." We're going to use SVD to decompose this image. This image contains 550 columns, and after we decompose it we get what's called the scree plot of singular values; it's all described in detail in Chapter 5 of our book. This plot is the summary of the different components that comprise the image, and these components ultimately represent linear combinations of our columns.

You can think of this image as a two-dimensional data set: you have 550 columns and some number of rows, fewer than that in this case. Each column of the image is like a column in a table, because it's a two-dimensional array, and each pixel represents a value; in grayscale it represents a color from 0 to 255, where 0 is black and 255 is white. Anyway, if we take only the first five of these principal components, what we get is this image here, which is barely intelligible; there's something there, but not really. If we go to 20, right here, we're already getting an image that we can see. And if we use what are called Marchenko-Pastur limits, which are an optimal way of determining the number of important eigenvalues we should take into account, for this image it's 73.
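The rank-k reconstructions described above can be sketched with NumPy's SVD. This is a minimal sketch, not the code from the book's website; a random matrix stands in for the grayscale image (pixel values 0-255), and the `reconstruct` helper is an illustrative name.

```python
import numpy as np

# Stand-in for a grayscale image: rows x columns of pixel values in 0..255.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(400, 550)).astype(float)

# Full decomposition: image = U @ diag(s) @ Vt, singular values s sorted descending.
U, s, Vt = np.linalg.svd(image, full_matrices=False)

def reconstruct(k):
    """Rebuild the image from its first k singular values/vectors."""
    return U[:, :k] * s[:k] @ Vt[:k, :]

approx5 = reconstruct(5)     # barely intelligible
approx20 = reconstruct(20)   # recognizable
approx73 = reconstruct(73)   # close to the original in the first example

# Relative reconstruction error shrinks as k grows.
def rel_err(k):
    return np.linalg.norm(image - reconstruct(k)) / np.linalg.norm(image)

print(rel_err(5), rel_err(20), rel_err(73))
```

Keeping all `min(rows, cols)` components reproduces the image exactly; the point of the scree plot is that most of the structure lives in the first few components.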

We discuss Marchenko-Pastur in Chapter 7 of our book as well, including how to compute it, but you can see that with Marchenko-Pastur optimality we're already getting basically the entire image, very crisp. And that's the whole point of what I'm showing you here: you can do this for any kind of image. For example, I have a different image, which is actually three covers of my books (I have more books, by the way, but these are some of the books I've written). Now we're going to process them, doing the same thing. For these three covers there are 495 columns, and the optimal number of components according to Marchenko-Pastur is only 63.
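One common way to apply the Marchenko-Pastur idea is to keep only the eigenvalues of the sample covariance that exceed the distribution's upper edge, sigma^2 * (1 + sqrt(p/n))^2. The sketch below is a simplified illustration of that thresholding, not the book's exact procedure; in particular, the median-based noise-variance estimate is a crude stand-in.

```python
import numpy as np

def mp_num_factors(X):
    """Count singular values above the Marchenko-Pastur upper edge.

    Treats X (n rows x p columns) as n observations of p variables and
    keeps eigenvalues of the sample covariance exceeding
    sigma^2 * (1 + sqrt(p/n))^2, with sigma^2 crudely estimated as the
    median eigenvalue. Simplified sketch for illustration only.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)                 # demean each column
    s = np.linalg.svd(Xc, compute_uv=False)
    eigvals = s**2 / n                      # eigenvalues of sample covariance
    sigma2 = np.median(eigvals)             # crude noise-variance estimate
    lam_plus = sigma2 * (1 + np.sqrt(p / n))**2
    return int(np.sum(eigvals > lam_plus))

rng = np.random.default_rng(1)
# Pure noise: few, if any, eigenvalues clear the edge.
noise = rng.standard_normal((400, 495))
# Noise plus three strong common factors: those should be detected.
signal = rng.standard_normal((400, 3)) @ rng.standard_normal((3, 495)) * 5
print(mp_num_factors(noise), mp_num_factors(noise + signal))
```

The count returned for the signal-bearing matrix should exceed the pure-noise count by roughly the number of planted factors, which is the sense in which the limit picks out the "important" eigenvalues.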

This scree plot looks a lot more like a right angle. Here again is the reconstruction with 5 principal components, the reconstruction with 20 principal components, and the reconstruction with 63 principal components, which produces the best result, very close to the original. The color is off because we're doing it in grayscale, but this really shows how you can reproduce the content using just that many factors.

Now, what does this have to do with finance? Well, as we show in the book, in Chapter 5 and Chapter 6, you can build really solid portfolios that are consistent with traditional models like the CAPM and Ross's APT, because those are linear factor models, and here the factors are optimally determined from the big data itself. In that case you're producing much sounder outcomes than if you took some factors, guessed which one goes where, and tortured the data until it fit. So the bottom line is that this is a method superior to supervised learning; it belongs to unsupervised learning. It's a very solid idea, the true artificial intelligence that people are referring to, and something that's here to stay, and I hope you check it out. Also, don't forget to download the code; it's all available for you there. I'll see you soon. Thank you, and have a great day.
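The same decomposition applied to a returns matrix yields a linear factor model with data-driven factors, in the spirit of the APT-style models mentioned above. This is an illustrative sketch on simulated returns, not the book's portfolio construction; the factor count and scale parameters are made up for the example.

```python
import numpy as np

# Simulated daily returns for 50 assets over 500 days (illustrative only).
rng = np.random.default_rng(2)
n_days, n_assets, n_true = 500, 50, 3
common = rng.standard_normal((n_days, n_true)) * 0.01        # hidden common factors
loadings = rng.standard_normal((n_true, n_assets))
returns = common @ loadings + rng.standard_normal((n_days, n_assets)) * 0.005

# PCA via SVD of the demeaned return matrix: the singular vectors give
# factor loadings intrinsic to the data, no hypothesized factors required.
X = returns - returns.mean(axis=0)
U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 3                                    # number of factors to keep
factor_returns = U[:, :k] * s[:k]        # time series of the k data-driven factors
betas = Vt[:k, :]                        # each asset's exposure to each factor

# Share of return variance explained by the first k components.
explained = (s[:k]**2).sum() / (s**2).sum()
print(f"first {k} factors explain {explained:.1%} of return variance")
```

With the planted three-factor structure, the first three components recover most of the systematic variance; `factor_returns @ betas` is the low-rank approximation of the demeaned returns, playing the role of the linear factor model.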
