lucene - Optimization of Solr indexing by removing redundancy


I'm working on a production scenario that currently has little data, but it is going to grow into the millions.
Scenario: I have folders, each containing data for multiple students (student_id, roll, etc.).

Now, one student's data can be in different folders (yes, that is our requirement). In the current system, the details of a student are indexed under each folder. Since there is little data, the duplication doesn't create a problem right now. But if we continue with the same process, the same student's data will be indexed multiple times (depending on the number of folders containing that student's data), thereby increasing redundancy and index size.

I want to minimize the index size and I don't want data redundancy. Please suggest a simple solution for achieving this in Solr.

As long as you have a uniqueKey field defined, a document with the same key as a previous document will overwrite the existing document, and you'll avoid having duplicates in the index.
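One thing to watch for: because the later document replaces the earlier one, any per-folder information has to live inside the single student document. Here is a minimal sketch of collapsing the per-folder records before indexing, assuming student_id is the uniqueKey. The `roll` and `folders` field names and the sample records are made up for illustration, and `folders` would need to be a multi-valued field in your schema.

```python
import json


def collapse_by_student(records):
    """Merge per-folder records into one document per student_id."""
    docs = {}
    for rec in records:
        # The first time a student is seen, create its document.
        doc = docs.setdefault(
            rec["student_id"],
            {"student_id": rec["student_id"], "roll": rec["roll"], "folders": []},
        )
        # Record every folder the student appears in, without repeats.
        if rec["folder"] not in doc["folders"]:
            doc["folders"].append(rec["folder"])
    return list(docs.values())


# Hypothetical input: one record per (folder, student) pair.
records = [
    {"student_id": "s1", "roll": 12, "folder": "folderA"},
    {"student_id": "s2", "roll": 7, "folder": "folderA"},
    {"student_id": "s1", "roll": 12, "folder": "folderB"},
]

docs = collapse_by_student(records)
payload = json.dumps(docs)  # JSON array of documents, one per student
```

The resulting JSON array can then be sent to the core's update handler. Even if you skip this step and index folder by folder, the uniqueKey still keeps one document per student, but only the last folder's version survives.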

If you don't have a unique value to identify students, you're going to have a hard time merging (outside of Solr as well), and you might have to write custom code to merge the entries appropriately outside of Solr.
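If it comes to that, one possible shape for the custom code is to derive a key from the fields that together identify a student and use it as the uniqueKey value. This is only a sketch: `derived_key` is a hypothetical helper, and treating name plus date of birth as identifying is an assumption that may not hold for your data (two students can share both).

```python
import hashlib


def derived_key(name, date_of_birth):
    """Build a stable key from identifying fields (assumed: name + date of birth)."""
    # Normalize so that spacing and letter case do not produce different keys.
    normalized = "|".join(part.strip().lower() for part in (name, date_of_birth))
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


# The same student found in two folders yields the same key.
key_a = derived_key(" Asha Rao ", "2001-04-09")
key_b = derived_key("asha rao", "2001-04-09")
key_c = derived_key("Asha Rao", "2002-04-09")
```

Records with equal keys can then be merged into one document before indexing, in the same way as when a real student_id is available.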

