A good introductory book for managers and business analysts is:
Bigus, J.P. (1996), Data Mining with Neural Networks: Solving Business
Problems--from Application Development to Decision Support, NY:
McGraw-Hill.
For engineers and technically minded readers, we recommend
starting with: Fausett, L. (1994), Fundamentals of Neural Networks:
Architectures, Algorithms, and Applications, Englewood Cliffs, NJ:
Prentice Hall.
For financial specialists, bankers and traders we recommend starting
with: Azoff, E.M. (1994), Neural Network Time Series Forecasting
of Financial Markets, NY: John Wiley and Sons, Inc.
How can I improve my forecasting results?
There are two ways to improve results:
1) improve your input data (for more information, see the Preparing
Data Sets section in the Advanced Issues chapter)
2) improve network topology selection and network training (for
more information, see the Selecting Network Topology and Training
Network sections in the Advanced Issues chapter).
When are neural networks a bad choice for my forecasting?
Neural networks cannot create or digest information that is
not contained in your data. To train a neural network properly you
need a lot of data. Your data should contain input parameters
(signals, attributes, correlated values) that affect the target
value: a change in the input parameters should lead to a change in
the target value.
So, if you have only a small amount of historical data, or if you do
not know which parameters influence your target value, it is better
to use some other forecasting method.
In addition, there are some problems that in principle cannot
be solved by neural networks. Do not use neural networks (or
other numerical methods) for problems like:
- predicting random or pseudo-random numbers, such as lottery
  numbers;
- forecasting cash flow, sales volumes, etc. if your business
  isn’t stable and your market situation often changes dramatically;
- any problem where historical data are of no use because of unbiased,
  rapid and significant changes in the problem environment.
Data Analysis and Preprocessing
How much historical data do I need?
You definitely need to have more records in the training subset
than the total number of input columns.
The number of records needed for training depends on the complexity
of your problem and amount of noise in your data. There are no exact
rules. Typically, it’s recommended to have at least 10 times
as many records for training as input columns.
This may not be enough for problems with subtle and complex dependencies
in data. Try to add more data if your network has poor results.
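As a rough illustration of the two rules of thumb above, the following Python sketch classifies a training subset by size. The function name and the three labels are hypothetical and are not part of Alyuda Forecaster; the default ratio of 10 comes from the recommendation above.

```python
def check_training_size(n_records, n_input_columns, ratio=10):
    """Classify a training subset size using the rules of thumb above.

    Returns one of three illustrative labels (hypothetical names):
      "too_few"    - not more records than input columns
      "borderline" - more records than columns, but fewer than ratio * columns
      "ok"         - at least ratio * columns records
    """
    if n_input_columns <= 0:
        raise ValueError("n_input_columns must be positive")
    if n_records <= n_input_columns:
        return "too_few"
    if n_records < ratio * n_input_columns:
        return "borderline"
    return "ok"


if __name__ == "__main__":
    # Example: a data set with 12 input columns and varying record counts.
    for records in (8, 50, 200):
        print(records, check_training_size(records, n_input_columns=12))
```

Even an "ok" result is only a starting point: noisy data or complex dependencies may still call for more records.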
Why are some columns grayed out after Data Analysis, and why
can't they be selected as targets?
The grayed columns cannot be converted for use with neural networks.
These are typically text columns, date/time columns, or columns
that have a lot of misplaced or missing data.
You can control which columns are accepted or ignored in Expert
Mode:
- The handling of missing and misplaced values can be specified
  at the Data Analysis step.
- Columns identified as containing text can be treated
  as categorical by instructing Alyuda Forecaster to accept them.
- Date/time columns can be used only after specifying
  the required periodicity of their encoding.
What is a categorical column?
Each value of a categorical column represents a certain category.
For example, a column that contains only “Male” or “Female”
as its values is categorical. Typically, the number of different
values in a categorical column is much smaller than the number of records.
Categorical data should be encoded in a special way to be suitable
for a neural network.
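One common way to encode categorical data for a neural network is one-of-N (one-hot) encoding, where each category gets its own 0/1 input. The sketch below illustrates that general technique only; this FAQ does not say which encoding Alyuda Forecaster uses internally, and the function name is hypothetical.

```python
def one_of_n_encode(values):
    """Encode categorical values as one-of-N (one-hot) vectors.

    Categories are sorted so the output is deterministic.
    Returns (categories, encoded_rows).
    """
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)   # one slot per category
        row[index[value]] = 1         # mark the slot for this value
        encoded.append(row)
    return categories, encoded


if __name__ == "__main__":
    categories, rows = one_of_n_encode(["Male", "Female", "Female", "Male"])
    print(categories)
    for row in rows:
        print(row)
```

With this scheme no artificial ordering is imposed on the categories, which is the reason a plain numeric code is often a poor fit for them.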
You can manually mark a column as categorical in Expert Mode (using
the Details button at the Data Analysis Progress step). This can
be beneficial in some cases. For example, suppose your data has a column
“Model” with the values “1”, “2” and
“3”. By default, this column will be treated as
numeric, but it is more beneficial to encode it as a categorical
one.
How can I see which records and columns were removed
from analysis?
During the Data Analysis step, click the “Details” button
and you will see your data with grayed columns and rows. All colored
cells will be removed from further use. In the Details window you
can also see the reason for removing a record. Cells containing
missing data, misplaced data or outliers are painted in different colors.
You can control this process in Expert Mode, where you can
set your preferences for data analysis.
What is your algorithm for removing misplaced data?
If all values in one of your columns are numbers, with the exception
of several values, the Wizard will identify this column as numeric.
These several values will be identified as misplaced, and the records
containing them will be removed. The same is true for other types
of columns.
The main question in this algorithm is how many these “several”
values can be. If you suspect that your data may have misplaced
values, you need to give the Wizard a clue about how many misplaced
values there can be in your noisiest column. You can do this during the
Data Analysis step in Expert Mode.
There is no misplaced data handling in Standard Mode. All columns
are considered to be free of misplaced values, and if a numeric column
contains at least one text value, it will be considered a text one.
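The behavior described above can be sketched for the numeric case as follows. This is only an illustration under stated assumptions, not the Wizard's actual algorithm: the function names are hypothetical, and `max_misplaced_fraction` is a stand-in for whatever limit you set in Expert Mode, with 0.0 mimicking Standard Mode.

```python
def is_number(text):
    """Return True if the value parses as a float."""
    try:
        float(text)
        return True
    except (TypeError, ValueError):
        return False


def classify_column(values, max_misplaced_fraction=0.0):
    """Classify a column as "numeric" or "text" and list misplaced rows.

    max_misplaced_fraction is a hypothetical stand-in for the Expert Mode
    setting; 0.0 mimics Standard Mode, where a single text value makes
    the whole column a text column.
    Returns (column_type, indexes_of_misplaced_rows).
    """
    non_numeric_rows = [i for i, v in enumerate(values) if not is_number(v)]
    fraction = len(non_numeric_rows) / len(values) if values else 0.0
    if fraction <= max_misplaced_fraction:
        # Few enough exceptions: the column is numeric and the
        # exceptional rows are treated as misplaced.
        return "numeric", non_numeric_rows
    return "text", []


if __name__ == "__main__":
    column = ["1.5", "2", "n/a", "4", "5", "6", "7", "8", "9", "10"]
    print(classify_column(column))        # Standard Mode-like behavior
    print(classify_column(column, 0.2))   # tolerate up to 20% misplaced values
```

In this sketch the rows returned as misplaced are the ones that would be removed from further analysis.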
What is network training?
Network training means adjusting neural network weights. During
training, the network analyzes the data you have provided and changes
the weights between network units to reflect dependencies found in your
data.
What is the best training algorithm for my problem?
If your data have up to 10 input columns, the best training algorithm
will be Levenberg-Marquardt. It is fast and quite reliable.
If you have a data set with hundreds of thousands of records and
more, we recommend trying Incremental Back Propagation first.
In all other cases the choice depends entirely on your type of problem and
the dependencies inside your data. We recommend starting with Conjugate
Gradient Descent, then trying Quick Propagation and, as the last
step, Batch Back Propagation or Incremental Back Propagation.
Why was the absolute error disabled during the
Network Preparation step?
When your target column is not numeric, it is hard to define unambiguously
what the absolute error is. In such cases it is better to use only
relative errors, which are enough to fully control the training
process.
In Expert Mode you can use CCR (Correct Classification Rate) instead
of an error threshold.
What is “minimum improvement in error”?
Minimum improvement in error specifies the minimum error change
during each iteration (or during several last iterations). This
parameter is useful for detection of situations where the network
cannot further improve its performance and training should be stopped
to save time.
However, be careful with this parameter, because in certain
cases the error can decrease again after many “motionless”
iterations. It’s impossible to detect such cases automatically.
We recommend setting 10 iterations, which is enough for most
problems. For certainty you can set up to 100 iterations.
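A minimal sketch of such a stopping rule is shown below, assuming the error is recorded once per iteration. The function and parameter names are hypothetical; `window` plays the role of the number of iterations discussed above.

```python
def should_stop(error_history, min_improvement, window=10):
    """Return True when the error improved by less than min_improvement
    over the last `window` iterations.

    error_history holds one error value per completed iteration.
    """
    if len(error_history) <= window:
        return False  # not enough iterations to judge yet
    # Compare the latest error with the error `window` iterations earlier.
    improvement = error_history[-window - 1] - error_history[-1]
    return improvement < min_improvement


if __name__ == "__main__":
    errors = [1.0, 0.5, 0.4, 0.39, 0.389, 0.3889]
    print(should_stop(errors[:3], min_improvement=0.01, window=2))
    print(should_stop(errors, min_improvement=0.01, window=2))
```

A larger window makes the rule more tolerant of long flat stretches, at the cost of extra training time when the network really has stopped improving.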
How much time is required for network selection?
The time required for network selection depends on the number of
inputs, the amount of data, the complexity of the task and the capability of
your computer. Network selection can take from several seconds
to several hours.
How can I speed up network selection?
The first way is to select the “Rough search” method,
which is the quickest one but does not guarantee the best results.
The second way is to specify the minimum and maximum number of hidden
units your problem may require (Expert Mode only). This requires
some experience with neural networks and at least an approximate estimate
of the problem complexity.
How many hidden layers and units do I need?
In our experience, the majority of problems (ca. 80%) have a good
solution with 1 hidden layer, another part (ca. 20%) has a good
solution with 2 layers, and only 1-2% of problems need 3 layers
or more. More than two hidden layers are typically beneficial only
for special problems, such as ZIP code recognition.
If you have too few hidden units, you will get a large error
during forecasting, because the network does not have enough power to find and
encode the dependencies in your data. If you have too many hidden
units, the neural network tends to memorize your data rather than encode
dependencies, and this will also lead to a large error during forecasting.
For the majority of problems, there is only one way to find the best
number of hidden units: train several networks with different numbers
of hidden units and find the best network by comparing forecasting
errors on testing subset.
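The compare-on-the-testing-subset procedure can be sketched as a simple loop. This is a generic illustration, not the proprietary search used by Alyuda Forecaster. Both function names are hypothetical, and `toy_test_error` is a made-up stand-in for real training so that the example runs on its own.

```python
def select_hidden_units(candidates, train_and_score):
    """Train one network per candidate size and keep the best.

    train_and_score(n_hidden) must train a network with n_hidden hidden
    units and return its forecasting error on the testing subset.
    Returns (best_n_hidden, errors_by_size).
    """
    errors = {n: train_and_score(n) for n in candidates}
    best = min(errors, key=errors.get)  # smallest testing error wins
    return best, errors


def toy_test_error(n_hidden):
    """Stand-in for real training: a made-up U-shaped error curve.

    Too few units underfit and too many overfit, so the error is
    lowest somewhere in between (here, at 6 units by construction).
    """
    return 0.05 + 0.01 * (n_hidden - 6) ** 2


if __name__ == "__main__":
    best, errors = select_hidden_units(range(1, 13), toy_test_error)
    print("best number of hidden units:", best)
```

In practice `train_and_score` would wrap a real training run, which is why trying every candidate can take a long time.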
Alyuda Forecaster uses several proprietary algorithms to search
for the best number of hidden units. These algorithms, in our
view, strike the best balance between reducing the
search time and finding the best variant.
To search among all variants you can start an exhaustive search, but
be prepared to wait a long time.
How much time is required for training?
The time required for network training depends on the number of
inputs, the number of hidden units, the amount of data, the complexity of the
task and the capability of your computer. Complete network training
can take from several seconds to several hours.
Can I change the network parameters after training?
Yes, you can press the “Back” button and change network
parameters, but you will need to train your network again. The previous
network will be lost unless you saved it in a file.
How can I forecast several values at
once without entering them manually?
Alyuda Forecaster doesn’t have this feature.
How can I change the report format?
During the Reporting step, press the “Show Report” button.
You will see a report preview. Click “Save As…”
in the “File” menu and select the desired format in the
“Save as type” dropdown list.