hey guys this is Aayushi from Edureka and in today's session we will be discussing an algorithm that helps us analyze past

trends and lets us forecast what is to unfold next so this algorithm is

time series analysis now let's quickly jump onto our agenda and let's see what

all we are going to cover in today's training so we'll start off this session

by understanding why do we need time series analysis and then we'll

understand what exactly it is now once we are clear with time series we'll then see

the different components that we need to take care of while we apply time series

then we'll also discuss when should you not use time series analysis or what are

the cases when you should not apply time series analysis moving ahead in the

session we'll also discuss what stationarity is and what tests

are used to check the stationarity of the data next we'll be

discussing the ARIMA model now ARIMA is one of the best models that has

been used in time series so we'll have a discussion on that and we'll finally go

ahead with the demo part wherein we'll implement all these things and help you

guys to forecast the future as well so I hope you guys are clear with the agenda

so kindly drop me a quick confirmation or you can just write it down in your

chat box so that I can proceed all right Monica says yes okay so they gave

me a thumbs up Naman Shivani all right since you guys are clear so

let's begin with the very first topic that is why should you use time series

analysis so first of all in time series analysis you just have one variable that

is time now you must have seen there are a lot of algorithms present then why do

we need one more algorithm that is time series so let me explain this to you with

an example now let's take an example of supervised learning so under

supervised learning we have linear regression or logistic regression so there we have

an independent variable and we have a dependent variable so there what we do

we deduce a function or you can say a mapping function of how one variable is

related to another and then we can go ahead with analysis part but in time

series analysis you just have one variable that is time so for example you

own a coffee shop it's quite a successful coffee shop in the town so

what do you do you try to see how many cups of coffee you sell every

month for that what you will do you add up all the sales of your coffee now

let's say you started this coffee shop in the first month that is the January

so what you'll do you record the data month wise and then you'll sum it up so

you will have all the data till the present month but what if you want to

know the sales the next month or the next year
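
A minimal sketch of recording sales month-wise, assuming pandas is available. The daily figures and the variable names are made up for illustration and are not from the session.

```python
import pandas as pd

# Hypothetical daily cups sold from January to March (illustrative values only).
days = pd.date_range("2023-01-01", "2023-03-31", freq="D")
daily_cups = pd.Series(10, index=days)  # pretend 10 cups are sold every day

# Sum the daily sales into one value per month ("MS" labels each month by its first day).
monthly_cups = daily_cups.resample("MS").sum()
print(monthly_cups)
```

The result is one observation per month at equal intervals, which is the shape of data the rest of the session works with.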

now imagine guys you just have one variable that is sales and you need to

predict that variable in accordance with time so in such cases where you just

have time and you need to predict the other variable you need time series

analysis now that we know why we need time series analysis let's move ahead and

understand what exactly time series is so time series is a set of observations

or you can say data points which are taken at a specified time now over here

at your x-axis you have the time and on the y-axis you have the magnitude of the

data so if you plot a time series on the x-axis you will always get

the time which is divided into equal intervals so you cannot create a time series

if one data point is at week level and the others are different these should be equal

intervals let's say a day a week a month a year a decade or a century so that is

the constant thing that a time series requires now let us see the importance of

time series analysis now first and foremost is business forecasting because

your past defines what is going to happen in the future so let's say you'll be

seeing a lot of traders on the Sensex who are trying to predict what will be

the price of the stock market tomorrow so that is nothing but business

forecasting you also see a lot of retailers who try to know how many

goods they are going to sell the next day so all of this can be

achieved with time series analysis now this is not just limited to one domain

like retail or finance but it is applicable almost everywhere now it

also helps us to analyze past behavior so here you can analyze in which month

the sales went up or when there was a dip so here you can easily understand your past

data so with every dip and a peak there is a business reason attached to it so

you can understand this with respect to time for example some festival is there

and you're selling chocolates so your sales will increase during a festival so

you need to think about the seasonality part also now don't worry guys we'll be

having a complete discussion on seasonality as well so now coming back it

also helps you to plan the future operations

so you can analyze the past and then you can forecast your future using this

algorithm that is time series analysis now apart from all these we can also

evaluate current accomplishments so this means you can determine

which goals you have met in the current scenario let's say you have predicted

okay I am going to sell around a hundred chocolates in a day but did you

actually do that so all of this can be analyzed using time series analysis

moving ahead let us see the different components of time series now most of

time series have trend seasonality and irregularity associated with them and

some of them have cyclic patterns also but it is not compulsory that every

pattern has to be present so let us discuss each one of them in detail now

the first is trend the trend is nothing but a movement to relatively higher or

lower values over a long period of time so when the time series analysis shows a

general pattern that is upward we call it an uptrend also if the trend exhibits

a lower pattern that is downward we call it a downtrend and if there is no

trend we call it a horizontal trend or you can say a stationary trend so

now let me explain this better with an example so there is a new township that

has been constructed okay and people are going to come and live over there so

what happens a hardware guy comes up and opens up a shop there so people who will be

coming in will definitely buy stuff from there now once all these houses are

settled or occupied the need for hardware reduces so the trend

may go down so let's say the sales were up in the first year and by another

year or maybe in six months they have gone down so that is the trend guys so for

some amount of time selling was high and then it went down but this is not a

repeating pattern this is not something that happens year on year a trend is

something that happens for some time and then it disappears then we have

seasonality so seasonality is basically upward or downward swings but

this is quite different it's a repeating pattern within a fixed time period so

for example Christmas happens every year on the 25th of December let's say you're in the

business of chocolates so year on year more and more chocolates are sold

in the last week of December now this is because Christmas is there and you've

been able to see this across years that is from the past two years four years

six years ten years and so on so it's a repeating pattern within a fixed time

period while in trend that is not the case now let me take another example

let's say ice cream this time so ice cream sales will go comparatively higher

in summers rather than in winter so that is again a seasonality
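
As a rough illustration of the two components described so far, here is a small sketch that builds a synthetic monthly series from an upward trend plus a pattern that repeats every 12 months. The numbers are invented for the example and assume only numpy.

```python
import numpy as np

months = np.arange(48)                            # four years of monthly points
trend = 100 + 2.0 * months                        # steady upward movement (an uptrend)
seasonal = 10 * np.sin(2 * np.pi * months / 12)   # swing that repeats every 12 months
series = trend + seasonal

# The seasonal part is the same one year apart, while the trend part keeps rising.
print(np.allclose(seasonal[:12], seasonal[12:24]))
print(trend[12] - trend[0])
```

The seasonal values line up exactly one year apart, which is the "repeating pattern within a fixed time period", whereas the trend only moves in one direction.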

then we have irregularity or it is also called as noise so these are erratic in

nature or you can say unsystematic it is also called as residual so this happens

basically for short duration and is non repeating so here let me give you an

example so let's say there is a natural disaster let's say there is a flood in

your town out of nowhere in one year now a lot of people are buying medicines and

ointments for relief but after some time when everything is settled the sales

of those ointments have gone down so this is something that no one could have

predicted it's going to happen erratically you don't know how many

sales are going to happen so you cannot foresee the event

that the flood is happening okay so this is some random variation so this is what

irregularity is now moving ahead we have cyclic so cyclic is basically repeating

up and down movements so this means they can go on for more than a year so they

don't have a fixed pattern so they can happen anytime let's say in two years

then fourth year then maybe in six months so they keep on repeating and

they are much harder to predict now moving ahead let's discuss when not to

apply time series analysis so first of all you cannot use time series analysis

when the values are constant so let me take the same coffee example over here

so let's say the number of coffees sold in the previous month was 500

then this month also the sales number is almost the same that is 500 and you want to

predict the number of sales in the next month now in such cases where the values

are constant as in our case the number of sales was 500 in the previous month

and then in this month also we have the same number and now we want to predict

it for the next month so in such cases where the values are constant time

series cannot be applied similarly if you have values in the form of functions

let's say you have sine of X or cos of X so for example in this case you have X

value and you can get the value by just putting it in the function so there is

no point of applying time series analysis where you can calculate the

values by just using a function now you can apply time series to these as well

but again there is no point of applying it if you have a formula before that or

the values are just constant so these are the cases when you should not apply

time series moving ahead let us see what stationarity is so no matter

how much you try to avoid the stationarity part guys it will

always be there in time series so here time series requires the data to be

stationary so for any kind of statistical model that we'll apply on time series the

data should be stationary so let's understand what exactly it is now most

of the models work on the assumption that time series is stationary

now if the time series has a particular behavior over time there is a very high

probability that it will follow the same in the future also

the theories and formulas that are related to stationary series are more

mature and easier to implement as compared to non-stationary series now

there are two major reasons behind the non-stationarity of a time series so first

is trend which is basically a varying mean over time secondly we have

seasonality so this is the variation at specific time frames but did you guys

get the answer to this question what exactly is stationarity or how exactly

stationarity is defined so stationarity basically has very strict criteria the

first one is it should have a constant mean now here the mean should be

constant over time secondly we have constant variance so again

the variance should be equal at different time intervals and thirdly we have auto

covariance that does not depend on time so for those of you who don't know what

mean is I'll not go into the details but I'll just explain it to you in a nutshell so

mean is basically the average then variance is the average squared distance from the

mean so the spread of the points around the mean should be equal over time and then we have

auto covariance that should not depend on time or it should be equal as well so

for example let's say you're standing at time t okay and your previous time

period was t minus 1 or t minus 2 let's say there are two previous time periods

so the covariance between the values at t minus 2 t minus 1 and t should depend only on

the gap between them and it should not depend on your time period

so that is nothing but auto covariance so when these three conditions are met

then we can say that the series is stationary and then we can apply time series

analysis over it now to check the stationarity in Python we have two

popular tests now first is rolling statistics and second is ADCF or you can

say augmented Dickey-Fuller test now in rolling statistics we can plot the

moving average or you can say moving variance and see if it varies with time

now by moving average or variance I mean that at any instant t we'll

take the average or variance of a time window let's say if you want to know for

the last year that is for last 12 months or anything and also guys this is more

of a visual technique so you cannot deploy this kind of stuff on production

but it is quite useful for POC purposes then we have ADCF or you can

say augmented Dickey-Fuller test in the world of data science so the Dickey-Fuller

test is another statistical test for checking stationarity now here you

have the null hypothesis which is that the time series is non-stationary and once you

perform this test you will get a result which comprises a test statistic and

some critical values for different confidence levels now here it is said

that if the test statistic is less than the critical value we can reject the

null hypothesis and say that the series is stationary so don't worry guys I will

be explaining this again when we go to a demo part but I hope you guys are clear

with what exactly stationarity is and how we can check it all right so

now let me just move on to my next topic so now I will discuss what exactly is

ARIMA model now ARIMA is one of the best models to work with time series data so

this is basically the combination of two models that is AR plus MA and it's quite

powerful guys so once you combine both of these models you get the ARIMA model

now your AR model stands for the auto regressive part and the MA model stands for

moving average so AR is a separate model MA is a separate model and what binds them

together is the integration part that is indicated by I so AR is nothing but the

correlation between the previous time period to the current so what does this

mean now let's take this into consideration that you are standing at a

time period t and there are previous time periods like t minus 1 t minus 2 t

minus 3 now if you find any correlation between t minus 3 and t that is nothing

but the auto regressive part then for the moving average part as I told you earlier there is always

some kind of noise or irregularity attached to a time series so we need to

figure out that noise in fact we need to average it out now whenever we try to

average it out the crests and troughs that are present in that noise smoothen out and we

can have an average forecast of that noise you can actually never predict when the

next customer is going to come in and buy a hundred items at once so we try to

smoothen it out by taking its average now the ARIMA model has three parameters it has

p it has q and it has d so p basically refers

to your auto regressive lags then q stands for moving average and d is the

order of differencing so we have one parameter for each of the models so if

we difference the series just once then the value of d would be one

if we difference it twice then we have the value d equal to

two so that is how we define these values p q and d and each of them has a

different method to it so if you want to find the value of p you will be using

a PACF graph that is nothing but a partial autocorrelation graph then to

find the q value we need to plot an ACF plot that is an autocorrelation plot and d I

have already told you to make the data stationary we use some kind of

differencing so the order of differencing defines the value d so I

guess enough of theory part so now let's quickly jump onto the demo and let's see

how you can implement all of these things so now we'll have a look at a

demo and we'll forecast the future so here we have a problem statement where there is an

airline which has the data of passengers across months so here what you need to

do you need to build a forecast to determine how many passengers

are going to board this airline at the month level in the future so here we

have month or you can say dates so here we have dates from 1949 till 1960 and we

have the number of passengers traveling per month so now we have this kind of

data and we need to analyze what will be the number of passengers if you have to

predict it for the next ten years so now let me just go to my Jupyter notebook and see

how my predictions look so guys this is my Jupyter notebook where I have

the code and we'll be implementing all the things that we have discussed till

now so first of all we'll be importing all the necessary libraries so here we

have imported numpy then we have imported pandas for the data analysis part

or you can say data processing then we have imported matplotlib for data

visualization creating plots and all those things then in order to display

matplotlib plots we have also written %matplotlib inline for

Jupyter notebook so you will not get a plot opening in a new window everything

will be there in your Jupyter notebook itself and then I have just defined the

size so now let me just run this next what I've done I have imported my

air passengers data using pandas so we have a function read_csv in pandas

that is represented with pd so we have stored this in a variable dataset

and then what we have done we have just parsed those strings into a datetime

format so here we have set our data month-wise so using pandas we have a

function to_datetime so over here you can specify the month and then you can

just set this as your index so here you have the index variable as month next what I

have done I have imported datetime and then I have just printed the top five

values so now let me just run this this is how my data looks I have

month as my index and then I have number of passengers as my second

column so this data I have already shown you in the presentation where I have the

data from 1949 until 1960 so I have just printed the head of it so now let me

print the tail so let's say I want to know the last five data entries so here

we have data till 1960 and we have the number of passengers next what we have

done we have simply plotted a graph between them so guys in time series we

have date and we have another variable so here my other variable is number of

air passengers so here we have date on my x-axis and number of passengers on my

y-axis and then we have simply plotted that graph so now let me just run this so

this is how your data looks so here if you notice you have a trend so our

next step is to check the stationarity so I'll give you 10 seconds guys and

think whether this data is stationary or not

so just think and give me a reply whether this data is stationary or not
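
Before moving on to the stationarity check, the loading and indexing steps described above can be sketched roughly as follows. This is only a sketch: the column names and the handful of rows are assumptions standing in for the real file, which is read here from an in-memory string so the example is self-contained.

```python
import io
import pandas as pd

# Stand-in for the airline CSV; the column names and these rows are assumptions.
csv_text = """Month,Passengers
1949-01,112
1949-02,118
1949-03,132
1949-04,129
1949-05,121
1949-06,135
"""

dataset = pd.read_csv(io.StringIO(csv_text))          # with a real file, pass its path instead
dataset["Month"] = pd.to_datetime(dataset["Month"])   # parse the strings as datetimes
indexed_dataset = dataset.set_index("Month")          # month becomes the index

print(indexed_dataset.head())   # first five rows
print(indexed_dataset.tail())   # last five rows
# indexed_dataset.plot() would draw passengers against time (needs matplotlib).
```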

right Shivani so this data is non-stationary so here you can see the

trend is going up so let's say if you want to calculate the mean at 1951 so

here your mean will lie somewhat over here and let's say we want to calculate

the mean of this year that is 1960 so here your mean will be somewhere here so

here you can see that you have an upward trend and the mean is not constant so

this tells you that your data is not stationary so now I have told you guys

that there are two tests which basically help you in checking the stationarity of the

data so here we have rolling statistics as well as ADCF let us go

through each one of them so I will be first going to the rolling statistics so

here we have the rolling mean and we have the rolling standard deviation so here as

you can see we have a window of 12 that is nothing but a window of 12 months

so let's say we start at Jan of 1949 then at Dec 1949 you take the average of the values from

Jan to Dec 1949 so this gives you the rolling mean at a yearly level and you

have to do the same with the standard deviation as well so in Python to

calculate mean and standard deviation you have the functions dot mean and dot

std so this will automatically calculate the mean and standard deviation so

now let me just run this so here if you notice your first 11

rows are NaN that is not a number now this is because guys the average needs all

12 values of the window so the average of the first 12 months is given over here in the 12th row and similarly you do the same for

the next ones next if you just scroll a little bit you see it's a long data set

and you have the same result for standard deviation as well so it's the

same procedure guys the value has been calculated and then just given over here so

here you must be having a question why only the first 11 values are NaN so over here we

have just given a window of 12 let's say you have data on a daily basis or you

have data at a day level then your window size would be 365 so here my data

is at monthly level so the focus will be on monthly only now similarly if you

have data at day level then probably your window can be 365 so I hope you get

the reason why I am giving the window as 12 and why we are calculating the mean and

standard deviation then what we have done we have simply plotted these rolling

statistics so here we have the original data which is just plotted in

the color blue then we have the mean data so here we have just plotted the

mean for what we have just calculated above and then we have given the color

red to it similarly we have plotted the same for standard deviation and we have

given a color black to it after that we have just given a legend we have given a

title to it and now let me just run this code for it so over here you can see we

have a plot somewhat like this so the blue line is my original data and as you

can see I have my mean in red and I have my rolling standard deviation in black

color so over here you can conclude that your mean and even your standard

deviation is not constant so our data is not stationary so guys this is my

rolling statistics method which is again a visual technique so here we have already

concluded that this is not a stationary data set now let me perform

Dickey-Fuller test as well so to perform the Dickey-Fuller test in Python you have to

write from statsmodels.tsa.stattools import adfuller now this is the

function which has been provided for the Dickey-Fuller test so here I have a

function that is adfuller I have passed the data set into it which is the number

of passengers and then I have just given autolag which is equal to AIC now

AIC is basically the Akaike information criterion now what does this AIC mean

so AIC compares the values a model gives with the actual values and penalizes

extra lags so here it is used to choose how many lags the test uses

so don't just worry about this guys for now just think about this

as a metric and see what happens when we just run this particular test so when we

run this we'll have values for the test statistic the p-value the number of

lags that have been used and the number of observations used and then we have

printed the values in a loop so now let me just run the cell as well

so this for statement will basically print all the values now I have a test

statistic value a p-value number of lags used number of observations and we have

critical values at different percentages so here to reject the null hypothesis

your p-value should always be small so here we have a very large value that is

0.9 and this should be somewhat below 0.05 so that would be a great thing also

the critical value should be more than the test statistic so here we

cannot reject the null hypothesis and we can say that the data is not stationary so

here also with the results of Dickey-Fuller

we got to know that the data is not stationary then what we'll do we'll

estimate the trend so here what we have done we have taken a log of the indexed

data set so the indexed data set is nothing but the data set which has the index as

time or the data which has been set month-wise so here we have just taken

a log and let me just run this for you now if you see here the numbers on your

y-axis have changed because the scale itself has changed as we have taken the

log but here your trend remains the same whereas the value of y has been

changed next let us calculate the moving average with the same window but keep in

mind guys that this time we'll be working with the log time series so again

we'll be having the window as 12 that is nothing but the twelve months

and then we'll be just plotting the graph with the log time series so here

data is already in the Log form so now let me just print it
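
The log transform and the 12-month rolling statistics can be sketched like this. The series below is a synthetic upward-trending stand-in, not the airline data, and the variable names are chosen for this example.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series with an upward trend (a stand-in, not the airline data).
index = pd.date_range("1949-01-01", periods=36, freq="MS")
passengers = pd.Series(np.linspace(100.0, 400.0, 36), index=index)

log_series = np.log(passengers)                          # change the scale of the values
moving_average = log_series.rolling(window=12).mean()    # 12-month rolling mean
moving_std = log_series.rolling(window=12).std()         # 12-month rolling standard deviation

print(moving_average.head(12))   # first 11 entries are NaN, the 12th is the first full window
```

The first 11 rows are NaN because a window of 12 needs 12 observations before it can produce a value.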

so here you can conclude that mean is not stationary but it is quite better

than the previous one but again it is not stationary because it's moving with

the time and this trend is again an upward trend so we can say that the data

is not stationary again next what we'll do we'll get the difference between the

moving average and the actual number of passengers so we have mean and the

actual time series that we have now why are we doing this now the reason is that

unless we perform all these transformations we will not get the time

series as stationary so now you must be having a question as to whether this is the

standard way to make a time series stationary no it's not guys because it

depends on your time series as in how you can make it stationary like

sometimes you have to take a log sometimes you might want to take a square of it

sometimes a cube root so it all depends on the data and what it holds so here we're

going with the log scale so we are going to take the moving average and then subtract it from the log values

so here we have the log scale and we have the moving average and then we have

just printed the head of it that is the top 12 values then what we have done we

have just removed the NaN values so that is done by just calling dropna and

in the brackets you can write inplace equals True and then just print the head of it

so now let me just run this so here we have the month and we have the number of

passengers so here we have the numbers which are basically the difference then

moving ahead I have purposely put the actual code of this ADCF test so ADCF

is the augmented Dickey-Fuller test so above I have just applied a simple ADCF

function but this is the whole code guys so you have to perform this whenever you

have to determine whether a time series is stationary or not so here I have defined

a function which is test_stationarity and I have performed both the tests I have

determined rolling statistics as well as performed the Dickey-Fuller test so over

here I have used the window as 12 and then I have plotted the rolling statistics as well I

have performed the Dickey-Fuller test as well so let me just run this and I'll

just run the function as well so now if you see you have the original data as

the blue line then you have standard deviation as the black line and you have

rolling mean as the red line so here you can visually notice that there is no such

trend or you can say it is much better than what we used to see earlier so here

we have the rolling standard deviation and we have the rolling mean

now let me see the ADCF results as well so here if you notice your p-value

is relatively low in earlier cases we used to have 0.9 something and here you have

a p-value of 0.02 now if you notice your critical value and your test statistic

values are almost equal which basically helps you to determine whether your data

is stationary or not so I hope by now you got the idea about the

Dickey-Fuller test and the rolling statistics test as to how you can

determine whether the data is stationary or not next what I have done I've

calculated the weighted average of time series now why I have done this because

we need to see the trend that is present inside a time series so that is why

we have calculated the weighted average of the time series so now let me just run

this and you'll get to know why I'm talking about this so as you can see

here as the time series progresses the average is also progressing towards

the higher side so here your trend is upward and keeps on increasing with

respect to time moving ahead let's see another transformation where we have a

log scale and then we'll subtract the weighted average from it so in a

previous scenario we have subtracted the simple mean but in this we'll be using the

weighted mean and then we'll check for stationarity so here we have just

subtracted them and then passed the variable to the test_stationarity

function that we have just defined over here so over here it will go through

both of the tests and then it will display the results so over here I'll

just run the cell so over here you can notice that your standard deviation is

quite flat it is not moving here and there and in fact you can also say that

this doesn't have any trend also if you notice the rolling mean it is quite

better than the previous one now let me go see the results of a VCF test as well

so over here you have a very low value of p that is p is equal to 0.005 so your

TS is again stationary which means that your time series is again stationary so

here you can use both of these transformations to check whether your

data is stationary or not so now we know that the data is stationary now what we'll

do we'll shift the values of the time series so that we can use it in the

forecasting so what we have done earlier we have subtracted the value of the mean

from the actual value now what we'll do we'll use the function called shift to

shift all of those values so here let me just run this plot so this is how the

plot looks now here we have taken a lag of one so here we have just shifted

the values by 1 or you can say differenced your time series once so why

is that if you remember I talked about the ARIMA model so the ARIMA model has three

models in it that is the AR model which stands for auto regressive then we have the

MA model that is for moving average and I is for the integration so the ARIMA model

basically takes three parameters and d there stands for the integration part or

you can say how many times you have differenced a time series so here

your value becomes one now next what I have done I have simply dropped the NaN

values so here if you just run this code you will see that the output is quite flat

so here the null hypothesis of the augmented Dickey-Fuller test

is rejected and hence we can say that your

time series is stationary now so here you can see that you again have blue as

the original data you have red as your rolling mean and you have black as your

standard deviation so visually also we see that there is no trend present it's

quite flat so here we can say that your time series is stationary now let us see

the components of the time series so here you first need to write from

statsmodels.tsa.seasonal import seasonal_decompose so here your seasonal

decompose segregates three components that is trend seasonal and residual so

here what we have done we have simply plotted these graphs and let us see how

all these graphs look let me just run this

so this is how your output looks this is my original data in which we saw

that there was a trend so this is my trend line so this is going upward and

you can say it's quite linear in nature along with that we have

seasonality also present at a high scale so we have a seasonality graph over here

and then we have the residuals as well so residuals are nothing guys but the

irregularities that are present in your data so they do not have any shape or

size and you cannot find out what is going to happen next so it's quite

irregular in nature now what we are going to do we'll check the noise if it's

stationary or not so over here we take the residual and we'll save it in a variable

that is decomposed log data and again I just pass it to the same function that

we have just created above which is test_stationarity and inside the test_stationarity

function we have two tests that is rolling statistics and the ADCF test so

now let me just run this cell and this is how your graph looks so looking

at the output visually you can say that this is not stationary that is why we

have to have your moving average parameter in place so that it smooths

it out to predict what will happen next

Now, we know the value of d, but how can you know the values of p and q, that is,

the value of the autoregressive lags and the value of the moving average? Here, as I

told you guys, we need to plot the ACF graph and the PACF graph. In order to calculate

the value of p we need to plot the PACF graph, and in order to calculate the

value of q we need to plot the ACF graph. ACF here basically refers to the

autocorrelation graph, and PACF stands for the partial autocorrelation graph.

In Python we first need to import these two functions, that is, from

statsmodels.tsa.stattools import acf, pacf. Then, using the functions

acf and pacf, we pass in our dataset, and for pacf we have preferred a method, that

is, OLS. There are various methods, but we usually prefer OLS, which is the

ordinary least squares method. Then what we have done is simply plot the ACF

graph and the PACF graph. So now let me just run this, and let's

determine how you can calculate the value of p and the value of q. So guys, this is my

autocorrelation graph, and this is my partial autocorrelation graph. Now, in

order to calculate the p and q values, you need to check the lag value

where the graph cuts off, or you can say drops to zero, for the

first time. If you look closely at the PACF graph, it touches the confidence level

over here, so your value of p is almost around two. Similarly,

if you look at the ACF graph, you see that it cuts off over here, or drops to zero

over here, so the value of q also becomes two. This is how you can

calculate the values of p and q using the PACF graph and the ACF graph.
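
To make the reading of these two graphs concrete, here is a rough NumPy-only sketch of what sits behind them. In practice the statsmodels `acf` and `pacf` functions do this work; the helper names below (`acf_values`, `pacf_ols`, `first_lag_inside_band`), the simulated AR(1) data, and the 1.96/sqrt(N) band are assumptions for illustration.

```python
import numpy as np


def acf_values(x, nlags):
    """Sample autocorrelation for lags 0..nlags."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array(
        [1.0] + [np.dot(x[k:], x[:-k]) / denom for k in range(1, nlags + 1)]
    )


def pacf_ols(x, nlags):
    """Partial autocorrelation via ordinary least squares, lags 0..nlags."""
    x = np.asarray(x, dtype=float)
    out = [1.0]
    for k in range(1, nlags + 1):
        # Regress x_t on an intercept and x_{t-1}, ..., x_{t-k};
        # the coefficient on x_{t-k} is the partial autocorrelation at lag k.
        y = x[k:]
        lags = np.column_stack([x[k - j: len(x) - j] for j in range(1, k + 1)])
        design = np.column_stack([np.ones(len(y)), lags])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        out.append(coef[-1])
    return np.array(out)


def first_lag_inside_band(values, n_obs):
    """First lag whose value falls inside the approximate 95% band."""
    bound = 1.96 / np.sqrt(n_obs)
    for lag, value in enumerate(values):
        if abs(value) < bound:
            return lag
    return None


# Simulated AR(1) series with coefficient 0.7 (illustrative data only).
rng = np.random.default_rng(7)
n = 500
noise = rng.normal(0, 1, n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + noise[t]

acf_vals = acf_values(x, nlags=10)
pacf_vals = pacf_ols(x, nlags=10)
print("PACF cut-off lag:", first_lag_inside_band(pacf_vals, n))
print("ACF cut-off lag:", first_lag_inside_band(acf_vals, n))
```

`first_lag_inside_band` mirrors the rule of thumb described above (the first lag where the curve drops into the confidence band). A common alternative reading is to take the last lag that is still outside the band, so it is worth trying neighbouring values of p and q as well.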

Next, we have the value of p, we have the value of q, and we have the value of d, so what we

can do is simply substitute these values in the ARIMA model. So here what I

have done is first import the ARIMA model, and then, using the function ARIMA,

I have the order listed over here. I have the p value as 2, I have differenced it once

so my d value becomes 1, and my q value is again 2. Here I have just plotted

the graph and then calculated the RSS, which is the residual sum of squares. So

let me just run this. Here you can see the residual sum of

squares is quite good, that is, 1.02. So here we have plugged in the

values of p, d, and q as 2, 1, and 2. Now you can also play around with these

p and q parameters. Let's say I want to change the parameters to (2, 1, 0).

If I do that, let me just run this again. Here, if you see, once I have

changed the values to (2, 1, 0), my RSS score has increased. The

greater the RSS, the worse it is for you. Now let me again change it to

(0, 1, 2). In that case also my RSS has increased, to 1.4.

So here you need to take care of the RSS part: the greater the RSS, the worse it

is for you. So we'll just revert back to (2, 1, 2), wherein we have the value

of p as 2, q as 2, and we have taken only one difference, so the value of d would be 1.

Now let's take the moving average model into consideration; there the p value is 0.

Next, for the AR model, what you have to do is

use (2, 1, 0), wherein you have the value of q as 0. So here I have (2, 1, 0), and

let me just run it for you. Here you can see that the RSS has

again reached 1.5. Now we have seen that with respect to AR, that is, your auto

regressive part, your RSS is 1.5. If I again go to MA, wherein I have the

values as (0, 1, 2), the RSS score is 1.4. So here we conclude that with respect to

the autoregressive part we have the RSS as 1.5, with respect to moving average we

have the RSS as 1.4, and if we combine both of them and make an ARIMA model out of it,

that is, this part, (2, 1, 2), we have a much lower RSS. So let me just run this as

well. Here, when I substitute the values as (2, 1, 2), that is, the p and q values are

equal to 2 and d we have taken as 1, your ARIMA model gives you an RSS of

1.02, which is quite good. Next, let's fit them in a combined

model, that is, ARIMA. Here we have seen that with respect to AR we have the RSS as

1.5, with respect to MA, that is, moving average, we have the RSS as 1.4, and when we

apply the combined model, that is, ARIMA, the RSS, or you can say the residual sum

of squares, drops to 1.02.

Now let's do some fitting on the time series with the data we have. Here we have just converted the fitted values

into a Series format, and then we have just printed the head of it. So now let

me just run this. Over here we have the month as well as the predictions.

Next, we'll find the cumulative sum, and then

we're going to have the predictions done for the fitted

values. Now, to calculate the cumulative sum we have the function called cumsum,

and then again we have just printed the head, so this is my result. Finally,

we're going to have the predictions done for the fitted values, and then we have

just printed the head of it. So now let me just run this. Next, also keep in

mind that after performing these transformations we also need the

exponential of the whole data so that it comes back to the original form from

where we started. In order to know the values in that form,

you need to take the exponent of it. So these are the three steps which are very

important for data transformation: we'll find the cumulative sum, we'll

do the predictions, and we'll also calculate the exponent of it so as

to get your data back in its original format.
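
The three back-transformation steps (cumulative sum, predictions on the log scale, exponent) can be sketched as below. The six passenger counts are a short illustrative sample, and the "fitted" differences are simply the true differences, so the round trip should reproduce the input exactly; with a real model the fitted differences would come from the ARIMA results instead.

```python
import numpy as np
import pandas as pd

# A short illustrative sample of monthly passenger counts.
index = pd.date_range("1949-01-01", periods=6, freq="MS")
passengers = pd.Series([112.0, 118.0, 132.0, 129.0, 121.0, 135.0], index=index)

log_series = np.log(passengers)

# Stand-in for the model's fitted values on the differenced log scale.
fitted_diff = (log_series - log_series.shift(1)).dropna()

# Step 1: the cumulative sum turns differences into offsets from the start.
fitted_cumsum = fitted_diff.cumsum()

# Step 2: add the offsets to the first log value to get log-scale predictions.
# fill_value=0 keeps the first month, which has no difference, unchanged.
fitted_log = pd.Series(log_series.iloc[0], index=log_series.index)
fitted_log = fitted_log.add(fitted_cumsum, fill_value=0)

# Step 3: exponentiate to return to the original passenger scale.
fitted_passengers = np.exp(fitted_log)
print(fitted_passengers)
```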

After that, we just plot the actual values against how our model has fitted. Now let me just run this. You can see

that the orange line is basically the model that we have fitted, and here you

can see that only the magnitude is varying, whereas the shape has been properly

captured by the ARIMA model. Now, how can we do predictions, guys? There is a

function in Python, that is, predict. Before predicting the values, let me

first see how many rows there are in my data; this is my

dataset name. So now let me just run this. Here we have the dataset from

1949 with the number of passengers, it goes on to 1960, and we have 144 rows

and one column. So we got to know that we have 144 rows. What if I want

to predict for the next ten years? What will my prediction be? Here you have

to see how many data points you want. Let's say

you want to predict for ten years, so the number of data points would be 120, that

is, 12 into 10. So if you want to predict for 10 years you have 120.

Using the plot_predict function I can actually predict the future. Here,

using this function, I give the first index of the time series and then the

number of data points you want the time series to run to. So

I have 144 rows plus 120, because I want it for 10 years; 144 plus 120 is

equal to 264, so I'll write it over here. Now let me just comment this out for now and

let me just run it. Over here, you can see that the blue line

is the forecasted value, and this gray part is your confidence interval. So now,

however you do the forecasting, the values are expected to stay within

this confidence band. So this is how you can see that for the next ten years you

have the prediction somewhat like this. This is how you can do prediction, and

if you don't want to see the graph, you can actually just ask for the data points. So

here I want the prediction for ten years, so I have just typed in steps

equal to 120, and you get the result in an array format. That is how you can

perform a lot of operations with this data and predict for, let's say, six

months, 12 months, next year, 10 years; it's totally up to you guys.
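
The arithmetic behind the forecast horizon is small enough to sketch directly. The variable names below are illustrative, and the date index assumes the monthly data ends in December 1960, as described above.

```python
import pandas as pd

n_observed = 144                     # monthly rows, 1949 through 1960
years_ahead = 10
n_steps = 12 * years_ahead           # 120 future data points
last_index = n_observed + n_steps    # 264, the end point passed to the plot

# Month-start dates for the forecast horizon, beginning right after the
# last observed month.
last_observed = pd.Timestamp("1960-12-01")
forecast_index = pd.date_range(
    last_observed + pd.DateOffset(months=1), periods=n_steps, freq="MS"
)

print(n_steps, last_index)
print(forecast_index[0], forecast_index[-1])
```

With a fitted results object from a recent statsmodels release, the numeric forecast would then be something like `results.forecast(steps=n_steps)`, with `results.get_forecast(steps=n_steps).conf_int()` giving the confidence band; changing `years_ahead` is all it takes to ask for six months or one year instead.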

Whatever topics I have covered, I hope they are clear to you. So now let me just go

back to the presentation and let's see what all we are left with. Here we

have just built a model wherein we have forecasted the demand for the next 10

years. In the dataset we have the date on a monthly basis and we have the

number of passengers. So that's all for today, guys. Now let me just recap what

all we have covered till now. We started off by discussing what exactly a

time series is, and we've also gone through the various components, that is, trend,

seasonality, cyclicity, and irregularity. Then we understood what stationarity is

and the different tests to check the stationarity of the data. Then we

discussed one of the best models used in time series analysis, that

is, the ARIMA model. Here we understood that the ARIMA model is a

combination of three models, that is, the AR model, which stands for auto

regression, MA for moving average, and I for the integration part. And

then we implemented all these things and forecasted the data

for the next ten years. So I hope you guys are clear with whatever concepts

I have taught in the session. Do you guys have any questions or any

doubts with respect to any of the topics that I have discussed till now? All right,

I don't see any doubts over here. All right, no problem guys, this takes time, so

just go home, practice, go through the code again, practice as much

as you can, and in case you have any doubt or any error you can always come

back to me, or you can simply ask me in my next session. I hope you guys found

the session informative. Well, thank you so much, bye-bye.

I hope you have enjoyed listening to this video. Please be kind enough to like

it, and you can comment any of your doubts and queries and

we will reply to them at the earliest. Do look out for more videos in our playlist

and subscribe to the Edureka channel to learn more. Happy learning!

# Time Series Analysis in Python | Time Series Forecasting | Data Science with Python | Edureka
