lucene - Optimization of Solr indexing by removing redundancy

- July 15, 2014

I'm working on a production scenario that currently has little data but is expected to grow into the millions of records.
Scenario: I have folders, each containing data for multiple students (student_id, rol, etc.).

Now, one student's data can be in several different folders (yes, this is our requirement). In the current system, a student's details are indexed under each folder. Since there is little data, the duplication doesn't create a problem right now. But if we continue with the same process, the same student's data will be indexed multiple times (depending on the number of folders containing that student's data), thereby increasing redundancy and index size.

I want to minimize the index size and avoid data redundancy. Please provide a simple solution for achieving this in Solr.

As long as you have a uniqueKey field defined, a document with the same key as a previous document will overwrite the existing document, and you'll avoid having duplicates in the index.
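
One thing to watch for: an overwrite replaces the whole earlier document, so if each folder's copy is sent separately, only the last copy survives and the information about the other folders is lost. A minimal sketch of merging the per-folder records before indexing, assuming student_id is the uniqueKey; the `folder` and `folders` field names and the sample records are hypothetical, not taken from the question:

```python
def merge_by_student_id(records):
    """Collapse per-folder records into one document per student_id."""
    merged = {}
    for record in records:
        key = record["student_id"]
        # One document per student; "folders" is a hypothetical multi-valued field.
        doc = merged.setdefault(key, {"student_id": key, "folders": []})
        for field, value in record.items():
            if field == "folder":
                if value not in doc["folders"]:
                    doc["folders"].append(value)
            elif field != "student_id":
                doc[field] = value  # for scalar fields, the last value seen wins
    return list(merged.values())


# Hypothetical sample data: student s1 appears in folders A and B.
records = [
    {"student_id": "s1", "rol": "12", "folder": "A"},
    {"student_id": "s2", "rol": "15", "folder": "A"},
    {"student_id": "s1", "rol": "12", "folder": "B"},
]
docs = merge_by_student_id(records)
```

Each entry in `docs` could then be sent to Solr as a single document, so the index holds one document per student while still recording every folder the student appears in.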

If you don't have a unique value to identify students, you're going to have a hard time merging (outside of Solr as well), and you might have to write custom code to merge the entries appropriately outside of Solr.
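
As a rough illustration of what such custom code might look like, the sketch below derives a surrogate key by hashing a few normalized fields, so that records describing the same student end up with the same key. The field names `name` and `date_of_birth` are purely hypothetical (the question does not say which fields are available), and this approach will wrongly merge two different students who happen to share those values:

```python
import hashlib


def surrogate_key(record, fields=("name", "date_of_birth")):
    """Build a stable key from normalized field values (hypothetical fields)."""
    parts = []
    for field in fields:
        value = str(record.get(field, ""))
        # Lowercase and collapse whitespace so trivial differences don't matter.
        parts.append(" ".join(value.lower().split()))
    return hashlib.sha1("|".join(parts).encode("utf-8")).hexdigest()


# Hypothetical records: a and b differ only in case and spacing.
a = {"name": "  Asha  Rao", "date_of_birth": "2001-04-09"}
b = {"name": "asha rao", "date_of_birth": "2001-04-09"}
c = {"name": "Ravi Rao", "date_of_birth": "2001-04-09"}

key_a, key_b, key_c = surrogate_key(a), surrogate_key(b), surrogate_key(c)
```

The resulting key could be stored in the field declared as uniqueKey, at which point the overwrite behaviour described above applies again.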
