Questions tagged [lucene]

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

14 questions
16
votes
1 answer

How is machine learning incorporated into search engine design?

I am currently building a small in-house search engine based on Apache Lucene. Its purpose is simple - based on some keywords, it will suggest some articles written internally within our company. I am using a fairly standard TF-IDF scoring as a base…
8
votes
1 answer

Good technique for search text tokenization

We are looking for a way to tokenize some text in the same way as, or a similar way to, a search engine. The reason we are doing this is so that we can run some statistical analysis on the tokens. The language we are using is Python, so would…
Chris Dutrow
  • 463
  • 1
  • 4
  • 9
7
votes
1 answer

How important is index size when searching?

My company has recently begun using Apache Solr to search its data. As we learn how to use it, we have gone down the path of indexing multiple fields to get the results we need. Most of these are either N-Grammed or Edge-N-Grammed (N-grammed, but…
Michael K
  • 15,539
  • 9
  • 61
  • 93
4
votes
0 answers

Incorporating a custom algorithm in Solr/Lucene before indexing?

CURRENT FLOW: I am using a custom algorithm (presently in PHP) to rank the MySQL records before indexing them into Solr. WHAT I WANT: Is it possible to implement this algorithm (maybe in Java) inside the Lucene library? Is that a good approach? WHY I WANT TO DO…
Dimag Kharab
  • 149
  • 5
3
votes
2 answers

Text search - big data problem

I have a problem I was hoping I could get some advice on! I have a LOT of text as input (about 20GB worth, not MASSIVE but big enough). This is just free text, unstructured. I have a 'category list'. I want to process the text, and cross-reference…
Duncan
  • 131
  • 4
3
votes
3 answers

NLP - Queries using semantic wildcards in full text searching, maybe with Lucene?

Let's say I have a big corpus (for example in English or an arbitrary language), and I want to perform some semantic search on it. For example I have the query: "Be careful: [art] armada of [sg] is coming to [do sg]!" And the corpus contains the…
2
votes
2 answers

Lucene + Joins == RDBMS?

Now that Lucene supports joins (at indexing time and at querying time), can one use Lucene as a database (a NoSQL one, with eventual consistency)? Note: I have been pondering this for some time, and it is an idea that comes around again and again from…
Kaveh Shahbazian
  • 332
  • 1
  • 13
1
vote
2 answers

Creating a web application with full text search on dynamic data

Even after thorough requirements engineering we end up with users wanting to attach 'notes' to their otherwise well-structured data records, in other words: arbitrary key-value pairs. Their primary interest is to find records later based on this…
A.M.
  • 111
  • 2
1
vote
1 answer

Is Lucene.Net/SolrNet a good solution for searching a list of names with fuzzy matching?

At the moment, we're using SQL Server Full-Text Search, but it's too inflexible. The main thing we do is look up names of people from a database based on a search query. The searches need to be fast, and they need to be fuzzy. SQL Full Text Search…
NibblyPig
  • 2,995
  • 3
  • 16
  • 19
0
votes
1 answer

Problems with evaluating results of search engine by comparison

We're building a search engine at a client's place. To evaluate the results, the client is comparing top N results of our search engine to top N results of a competitor. And they want me to get at least some "X" percent common results with…
0
votes
3 answers

Strategy to update search index after fixing index generation

Describing the situation: I'm working on an application (based on the Spring Framework) using a search index (Lucene, if that matters) to make the content of that application searchable. Documents are added/updated in that index whenever the content of…
lucash
  • 288
  • 2
  • 6
0
votes
0 answers

Using Google for full text database search

I have a lot of text and I am storing it in Elasticsearch. Using Lucene, NLP and WordNet filters, the search is good but not as good as Google's, because none of these methods use AI for the search so that it can understand questions or some of the…
arisalexis
  • 409
  • 3
  • 10
0
votes
1 answer

Lucene vs Solr - Indexing PDF/Word documents residing on a NAS drive using .NET

Using ASP.NET, I want to implement full-text search using Lucene/Solr on a LARGE number of docs (Word, PDF, etc.) residing in a directory on a NAS drive. The NAS drive would be mapped as a network drive on the server. The list of documents get…
0
votes
1 answer

Lucene full text search of 6 million records

I want to implement Lucene for full-text search. I have a table with 6 million records in a SQL database. Each minute, around a thousand new rows will be added from the application. Index creation in Lucene takes a lot of time. Each time I delete or…