Hi
I'm using SQL Server 2005. The problem I have is as follows. I have several production lines and, as with everything, parts in the line tend to break. I have data on all the breaks that occurred in the last 2 years. What I want to do is predict the next break and the production line it's going to happen on. I would also like to go to a future date and check what possible breaks might occur on that date. I've run quite a few models but none of them helps me with future events. I think I might be using the wrong algorithm or I’m just not doing it right. Can somebody please suggest an algorithm and maybe point me to a web site that has a tutorial similar to my problem?
Thanks
Elmo
This sounds like an interesting topic. Would you please provide us with more information, such as how your current data is structured, what you are trying to predict, what algorithms you have tried, and why the results are not good enough? It would also be helpful if you could provide a small sample of your data.
Thanks,
|||
Hi
Thanks for the reply. I've mailed an Excel spreadsheet with some data on it to yiminwu@.online.microsoft.com. The things I would like to predict are as follows: Plant, Circuit, Start Date and End Date. I would also like to be able to go to a future date and see what possible breaks will happen on that day. The aim is to do preventative maintenance.
I’ve tried the following algorithms: Decision Trees and Neural Networks. The problem I have with these is that they predict that if something breaks on, say, Circuit 1, it is in Plant B. This is quite obvious. The thing is, I need a way to look at future dates. I also tried the Sequence Clustering algorithm to see if it could predict the next break, but it detected no sequence.
Elmo
|||Thanks a lot for your information. You need to transform your data a little for data mining algorithms to do meaningful predictions on your problem. Instead of using the plain date, you may transform it into various ranges like this:
Date interval (from start date to end date)    Date interval value for mining model
0-6 months                                     Brand New
6-12 months                                    Used for a while
12-24 months                                   Sort of Worn
> 2 years                                      Worn
The above is just an example. You should decide the best mapping based on your domain knowledge. The structure of your training data will then look like the following:
Column            Data type   Content type   Usage
PlantID           Long        Key
CircuitID         Long        Key
Device Interval   Text        Discrete       Input
Break             Boolean     Discrete       Predict
And your data will look like this:
PlantID   CircuitID   Device Interval   Break
1         1           Brand New         No
…         …           …                 …
1         1           Worn              Yes
In the above, I assume that PlantID and CircuitID together form the composite key of your production lines. You can then train the Microsoft Decision Trees or Microsoft Neural Network algorithm to predict whether your production lines will break.
Moreover, you may also bring in more information that affects the status of your production lines, such as humidity and temperature. As a reminder, you want to identify the features that: 1) contributed to the failure of your production lines; 2) differed between breaks. This basically requires you to use your domain knowledge to identify the key factors behind production line breaks.
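As a rough sketch of the date-to-range transformation described above, the snippet below maps a start/end date pair to one of the example labels. It is written in Python purely for illustration; the function names, the half-open bin boundaries (exactly 6 months falls into "Used for a while", exactly 24 months into "Worn") and the sample dates are all assumptions, not part of the original data.

```python
from datetime import date

# Upper bounds in months (exclusive) and labels, taken from the example
# mapping in this post. The boundary handling is an assumption.
AGE_BINS = [
    (6, "Brand New"),
    (12, "Used for a while"),
    (24, "Sort of Worn"),
]


def months_between(start, end):
    """Whole months elapsed from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the last month is not yet complete
    return months


def device_interval(start, end):
    """Map a date interval to a discrete label for the mining model."""
    months = months_between(start, end)
    for upper, label in AGE_BINS:
        if months < upper:
            return label
    return "Worn"


# Hypothetical rows in the (PlantID, CircuitID, Device Interval, Break) shape.
rows = [
    (1, 1, device_interval(date(2005, 1, 1), date(2005, 3, 15)), "No"),
    (1, 1, device_interval(date(2005, 1, 1), date(2007, 6, 1)), "Yes"),
]
for row in rows:
    print(row)
```

The resulting rows can then be loaded into a table and used as the case table for the mining structure. The bin edges should be adjusted to whatever your domain knowledge suggests.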
BTW, if possible, please post your sample data directly in your post in the future.
Good luck,
|||As Yimin says, this is not an algorithm selection issue but a data preparation issue. If you want to predict whether a break will happen in the next day, for instance, you need to create a variable that says "a break happened on this day" and variables describing the time period leading up to the break, e.g. the previous day, previous week, etc.
For example
Next Day Break (predict)
Today's Volume
7 day average volume
30 day average volume
days since last break
# of breaks in last month
# of breaks in last year
personnel info
product info
etc.
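To make the idea concrete, here is a minimal sketch of how a few of these variables could be derived for one production line and one day. It is plain Python for illustration only; the function name, the dictionary keys, the 30/365-day windows and the sample dates are assumptions, and the volume, personnel and product columns are left out because they depend on data not shown in this thread.

```python
from datetime import date, timedelta


def daily_features(break_dates, day):
    """Build one training row for `day` from a line's break history.

    break_dates: list of dates on which this production line broke.
    Only breaks on or before `day` feed the input columns, so the row
    never looks into the future except for the label to be predicted.
    """
    past = [b for b in break_dates if b <= day]
    last = max(past) if past else None
    return {
        "days_since_last_break": (day - last).days if last is not None else None,
        "breaks_last_30_days": sum(1 for b in past if (day - b).days < 30),
        "breaks_last_365_days": sum(1 for b in past if (day - b).days < 365),
        # The column to predict: did a break happen on the following day?
        "next_day_break": (day + timedelta(days=1)) in set(break_dates),
    }


# Hypothetical break history for a single line.
history = [date(2007, 1, 1), date(2007, 1, 20), date(2007, 2, 1)]
print(daily_features(history, date(2007, 1, 31)))
```

Running this for every line and every day in the two years of history gives one case per line per day, which is the shape the Decision Trees or Neural Network algorithm needs in order to say something about tomorrow.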
HTH
-Jamie
|||Hi
Sorry for only replying now, but I was not at work for most of last week.
Thank you both for the replies. I’m sure I’ll be able to sort it out now. If I can’t, I know where to find you.
Thanks again
Elmo