lucene - Optimization of Solr indexing by removing redundancy
I'm working on a production scenario. We currently have a small amount of data, but it is going to grow into the millions.
Scenario: I have folders, each containing data for multiple students (student_id, rol, etc.).
Now, one student's data can be in different folders (yes, this is our requirement). In the current system, a student's details are indexed under each folder. Since the data is small, the duplication doesn't create a problem right now. But if we continue with the same process, the same student's data will be indexed multiple times (depending on the number of folders containing that student's data), thereby increasing redundancy and index size.
I want to minimize the index size and avoid data redundancy. Please suggest a simple way to achieve this in Solr.
As long as you have a uniqueKey field defined, a document with the same key as a previous document will overwrite the existing document, and you'll avoid having duplicates in the index.
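To illustrate the overwrite behaviour, here is a minimal Python sketch. It is a toy in-memory model, not Solr's API: `TinyIndex` and its `add` method are hypothetical names, and the field values are made up. In Solr itself the key field is declared with the `<uniqueKey>` element in the schema.

```python
class TinyIndex:
    """Toy stand-in for an index with a unique key (hypothetical, not Solr's API)."""

    def __init__(self, unique_key):
        self.unique_key = unique_key
        self.docs = {}  # maps unique key value -> stored document

    def add(self, doc):
        # A document whose key already exists replaces the stored one,
        # so each key appears at most once in the index.
        self.docs[doc[self.unique_key]] = dict(doc)


index = TinyIndex("student_id")

# The same student (S1) is found in two folders and indexed twice.
index.add({"student_id": "S1", "rol": 12, "folder": "folderA"})
index.add({"student_id": "S2", "rol": 15, "folder": "folderA"})
index.add({"student_id": "S1", "rol": 12, "folder": "folderB"})

print(len(index.docs))             # 2
print(index.docs["S1"]["folder"])  # folderB
```

Note that in this model the last write wins, so only the most recently indexed folder survives for S1. If you need to know every folder a student appears in, you would probably want to collect the folder names into one multi-valued field before indexing.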
If you don't have a unique value to identify students, you're going to have a hard time merging (outside of Solr as well), and you might have to write custom code to merge the entries appropriately outside of Solr.
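A rough sketch of what such custom merge code could look like, in Python. Everything here is an assumption for illustration: the `name`, `rol` and `folder` fields, the helper names `merge_key` and `merge_records`, and above all the idea that a normalized name plus `rol` is enough to identify a student. Whether that holds depends entirely on your data.

```python
def merge_key(record):
    # Hypothetical identity rule: normalized name plus rol.
    # This is an assumption; it will wrongly merge two different
    # students who happen to share both values.
    return (record["name"].strip().lower(), record["rol"])


def merge_records(records):
    """Collapse records for the same student into one entry with a folder list."""
    merged = {}
    for rec in records:
        entry = merged.setdefault(
            merge_key(rec),
            {"name": rec["name"].strip(), "rol": rec["rol"], "folders": []},
        )
        # Remember every folder the student was seen in, without repeats.
        if rec["folder"] not in entry["folders"]:
            entry["folders"].append(rec["folder"])
    return list(merged.values())


records = [
    {"name": "Asha Rao", "rol": 12, "folder": "folderA"},
    {"name": "asha rao ", "rol": 12, "folder": "folderB"},
    {"name": "Ben Lee", "rol": 15, "folder": "folderA"},
]

merged = merge_records(records)
for entry in merged:
    print(entry)
```

The merged entries could then be indexed once each, with the folder list going into a multi-valued field, instead of indexing one document per folder.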