Three-layer neural network for MNIST with Python
I'm writing my own code to implement a single-hidden-layer neural network and testing the model on the MNIST dataset. I'm getting a weird result (the NLL is unacceptably high), even though I've checked the code for two days without finding what's gone wrong.
Here are the global parameters:
    import math
    import numpy as np

    layers = np.array([784, 300, 10])
    learningrate = 0.01
    momentum = 0.01
    batch_size = 10000
    num_of_batch = len(train_label) // batch_size   # 50,000 / 10,000 = 5 mini-batches per epoch
    nepoch = 30
Softmax function definition:
    def softmax(x):
        x = np.exp(x)
        x_sum = np.sum(x, axis=1)           # shape = (nsamples,)
        for row_idx in range(len(x)):       # normalize each row so it sums to 1
            x[row_idx, :] /= x_sum[row_idx]
        return x
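For reference, the same normalization can be written without the explicit Python loop; this is only a sketch of an equivalent vectorized form (the row-wise max subtraction is an extra numerical-stability safeguard of mine, not part of the original code):

    def softmax_vectorized(x):
        # subtract the row-wise max before exponentiating (guards against overflow),
        # then divide each row by its sum; equivalent to the loop version above
        e = np.exp(x - np.max(x, axis=1, keepdims=True))
        return e / np.sum(e, axis=1, keepdims=True)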
Sigmoid function definition:
    def f(x):
        return 1.0 / (1 + np.exp(-x))
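The term a2*(1.0-a2) used in the backward pass below is just the derivative of this sigmoid evaluated at the activations; a quick finite-difference sanity check (the test point and step size here are arbitrary choices of mine):

    x0, eps = 0.3, 1e-6
    analytic = f(x0) * (1.0 - f(x0))                     # sigmoid'(x0) = f(x0) * (1 - f(x0))
    numeric = (f(x0 + eps) - f(x0 - eps)) / (2 * eps)    # central finite difference
    print(analytic, numeric)                             # both should be ~0.2445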
Initialize w and b:
    k = np.vectorize(math.sqrt)(layers[0:-2] * layers[1:])
    w1 = np.random.uniform(-0.5, 0.5, layers[0:2][::-1])
    b1 = np.random.uniform(-0.5, 0.5, (1, layers[1]))
    w2 = np.random.uniform(-0.5, 0.5, layers[1:3][::-1])
    b2 = np.random.uniform(-0.5, 0.5, (1, layers[2]))
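With these definitions, each weight matrix maps the previous layer onto the next (note that k is computed here but never used afterwards); a quick shape check, assuming the arrays above:

    print(w1.shape, b1.shape)   # (300, 784) (1, 300): hidden-layer weights and bias
    print(w2.shape, b2.shape)   # (10, 300)  (1, 10) : output-layer weights and bias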
And here is the core part for each mini-batch:
    err = []
    for idx in range(num_of_batch):
        # forward, vectorized over the mini-batch
        x = train_set[idx * batch_size:(idx + 1) * batch_size, :]
        y = Y[idx * batch_size:(idx + 1) * batch_size, :]   # Y: one-hot-encoded labels
        a1 = x
        a2 = f(np.dot(np.insert(a1, 0, 1, axis=1), np.insert(w1, 0, b1, axis=1).T))
        a3 = softmax(np.dot(np.insert(a2, 0, 1, axis=1), np.insert(w2, 0, b2, axis=1).T))
        # compute deltas
        d3 = a3 - y
        d2 = np.dot(d3, w2) * a2 * (1.0 - a2)
        # compute gradients
        D2 = np.dot(d3.T, a2)
        D1 = np.dot(d2.T, a1)
        # update parameters
        w1 = w1 - learningrate * (D1 / batch_size + momentum * w1)
        b1 = b1 - learningrate * (np.sum(d2, axis=0) / batch_size)
        w2 = w2 - learningrate * (D2 / batch_size + momentum * w2)
        b2 = b2 - learningrate * (np.sum(d3, axis=0) / batch_size)
        # negative log-likelihood for this mini-batch
        e = -np.sum(y * np.log(a3)) / batch_size
        err.append(e)
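Since the suspicion falls on the back-prop step, one way to test it independently of the training curve is a finite-difference gradient check on a tiny batch. The sketch below is my own debugging suggestion, not part of the original code; it reuses f, softmax, train_set and the one-hot labels Y from above, and the helper name, probed index, and step size are arbitrary:

    def nll(x, y, w1, b1, w2, b2):
        # same forward pass as in the training loop, returning the mean NLL
        a2 = f(np.dot(np.insert(x, 0, 1, axis=1), np.insert(w1, 0, b1, axis=1).T))
        a3 = softmax(np.dot(np.insert(a2, 0, 1, axis=1), np.insert(w2, 0, b2, axis=1).T))
        return -np.sum(y * np.log(a3)) / len(x)

    xs, ys = train_set[:5], Y[:5]      # a tiny batch keeps the check cheap
    eps, i, j = 1e-5, 0, 0             # probe a single entry of w2
    a2 = f(np.dot(np.insert(xs, 0, 1, axis=1), np.insert(w1, 0, b1, axis=1).T))
    a3 = softmax(np.dot(np.insert(a2, 0, 1, axis=1), np.insert(w2, 0, b2, axis=1).T))
    analytic = np.dot((a3 - ys).T, a2)[i, j] / len(xs)   # analytic mean gradient, as in D2 / batch_size
    w2p, w2m = w2.copy(), w2.copy()
    w2p[i, j] += eps
    w2m[i, j] -= eps
    numeric = (nll(xs, ys, w1, b1, w2p, b2) - nll(xs, ys, w1, b1, w2m, b2)) / (2 * eps)
    print(analytic, numeric)           # should agree to several decimal places

Repeating the same probe on an entry of w1 (against np.dot(d2.T, a1) divided by the batch size) exercises the hidden-layer delta d2, which is where back-prop mistakes usually hide.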
After one epoch (50,000 samples), I got the following sequence of e, which seems large:
    10000/50000    4.033538
    20000/50000    3.924567
    30000/50000    3.761105
    40000/50000    3.632708
    50000/50000    3.549212
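For scale: a classifier that simply predicted a uniform 1/10 for every digit would already score an NLL of ln(10) ≈ 2.303, so values above 3.5 after a full epoch do look too high; a one-line check for reference:

    print(-np.log(1.0 / 10))   # ≈ 2.302585, the NLL of uniform predictions over 10 classes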
I think the back-prop code should be correct, but I couldn't find what's going wrong. It has tortured me for two days.