Questions tagged [natural-language-processing]

Natural language processing draws knowledge from a diverse collection of fields including computer science, linguistics, and statistics in order to extract pertinent information from the spoken or written word.

Most modern natural language processing relies on statistics and machine learning to determine the characteristics of a text. Text can be segmented into sentences and words, and grammar trees can be derived from it. Topics and entities can be identified in the text. Natural-language input can be transformed for output or passed as input to a later stage of processing.
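As a rough illustration of the most basic of these steps, the sketch below splits text into sentences and words and counts word frequencies using only the Python standard library. The function names (`split_sentences`, `tokenize`, `word_frequencies`) and the regular expressions are assumptions made for this example, not part of any particular NLP library. Real systems typically use trained models, since punctuation-based rules fail on abbreviations, quotations, and many languages.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Return lowercase word tokens, keeping internal apostrophes (e.g. "don't")."""
    return re.findall(r"[a-z0-9]+(?:'[a-z0-9]+)*", sentence.lower())


def word_frequencies(text):
    """Count how often each token occurs across all sentences of the text."""
    counts = Counter()
    for sentence in split_sentences(text):
        counts.update(tokenize(sentence))
    return counts


if __name__ == "__main__":
    sample = "Parsing is hard. Parsing natural language is harder!"
    print(split_sentences(sample))
    print(word_frequencies(sample).most_common(2))
```

The token counts produced this way could serve as simple features for a later stage, such as a classifier or a topic model.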

Libraries for NLP include:

Books containing fundamental information:

44 questions

144

votes

14 answers

Simple method for reliably detecting code in text?

GMail has this feature where it will warn you if you try to send an email that it thinks might have an attachment. Because GMail detected the string "see the attached" in the email, but no actual attachment, it warns me with an OK / Cancel dialog…

asked Jun 28 '11 at 08:04

Jeff Atwood

votes

2 answers

How to find hard to misspell given names?

Here is a question that I believe could be solved with some data mining and a sophisticated algorithm, but I don't quite know how. Any pointers as to what data sources to use and what algorithm to apply are welcome. Background: I'm a…

algorithms artificial-intelligence natural-language-processing data-mining

asked Dec 08 '15 at 22:29

user1202136

votes

2 answers

Persisting natural language processing parsed data

I've recently started experimenting with natural language processing (NLP) using Stanford's CoreNLP, and I'm wondering what are some of the standard ways to store NLP parsed data for something like a text mining application? One way I thought might…

database parsing persistence natural-language-processing

asked Oct 17 '12 at 20:59

user25791

votes

6 answers

How to teach a script to detect sarcasm?

I'm currently building a fun script, that basically matches given phrases and gives a predefined response based on the match-points. You can ask it to retrieve some information based on live feeds, run tasks, tell anecdotes or just chat with her. I…

algorithms natural-language-processing

asked Sep 11 '11 at 23:00

Kalle H. Väravas

votes

3 answers

What algorithm(s) can be used to achieve reasonably good next word prediction?

What is a good way of implementing "next-word prediction"? For example, the user types "I am" and the system suggests "a" and "not" (or possibly others) as the next word. I am aware of a method that uses Markov Chains and some training…

algorithms artificial-intelligence machine-learning natural-language-processing

asked May 12 '12 at 14:49

yati sagade

votes

2 answers

How do personal assistants typically generate sentences?

This is sort of a follow up to this question about NLG research directions in the linguistics field. How do personal assistant tools such as Siri, Google Now, or Cortana perform Natural Language Generation (NLG)? Specifically, the sentence text…

algorithms data-structures natural-language-processing

asked Jan 18 '15 at 18:44

Lance

votes

2 answers

Guess if a time is AM or PM

I'm currently in the process of writing a human date parser. By human date, I mean it should be able to interpret strings such as "tomorrow at 2" and return a valid date depending on the current time. The issue I'm facing is the automatic detection of…

language-agnostic natural-language-processing

asked Sep 13 '11 at 18:10

Vivien Barousse

votes

1 answer

Identifying plagiarized jokes?

I'd like to be able to identify duplicate jokes posted on a website. I can build up a reasonably large database of previously-posted jokes, and then I'd like to look at each new joke as it comes in and pick out the most "similar" jokes from the…

algorithms natural-language-processing

asked Dec 06 '15 at 03:33

Patrick Collins

votes

6 answers

Are there any algorithms for splitting or combining words into their more common form?

Are there any existing algorithms which can look through a list of words and split or combine words into their more common form? For example, I have a list of many business names in the health care industry. The word "healthcare" is often written…

natural-language-processing

asked Mar 19 '14 at 23:01

Buttons840

votes

2 answers

Database structure for word co-occurrence frequencies in a large corpus

I would like to store the frequencies with which words co-occur with each other over a variety of contexts in a large (> 1 billion tokens) text corpus. I need to store the word pair, the type of co-occurrence (e.g. word1 in the same sentence as…

architecture database text-processing natural-language-processing

asked May 20 '19 at 13:36

pgtn

votes

1 answer

How can I test a search engine for an uncommon human language?

We are writing a search engine from scratch in a quite uncommon language, Aramaic, mostly for learning purposes but also because few resources are available in the given language. The engine is/will be written in Python, and: It is a human language…

testing search search-engine natural-language-processing

asked Jul 11 '14 at 07:51

vallllll

votes

1 answer

Sentence Tree vs. Words List

I was recently tasked with building a Named Entity Recognizer as part of a project. The objective was to parse a given sentence and come up with all the possible combinations of the entities. One approach that was suggested was to keep a lookup table…

parsing text-processing natural-language-processing

asked Dec 21 '13 at 10:04

Rohit Jose

votes

1 answer

Designing the schema for a database of Spanish language words?

For a project I'm working on that will help people learn Spanish, I would like to create a standalone service to handle the retrieval of data about words. For this, I've captured and codified data for a few thousand words from Wiktionary. …

database-design data-structures natural-language-processing

asked Sep 01 '19 at 04:26

Al Avery

votes

0 answers

Software design strategy for a machine learning tool that outputs a subset of the text input (Information Extraction)?

Let's say I have thousands of pdfs that are each about 30k words written in conversational English. In each of the pdfs there is a name / names of a person/people who snowboard. There are also many other names. I need to extract the name(s) of the…

python machine-learning python-3.x natural-language-processing neural-networks

asked Feb 01 '18 at 18:05

Hiding

votes

1 answer

What approaches can I take to figure out the "relevancy" of certain terms in a string?

I'm not even sure "relevancy" is the most accurate word, so I'll just describe the problem: I'm building an app that needs to somehow parse product descriptions from a popular website (let's just say it's Amazon) and figure out which certifications…

parsing natural-language-processing keywords

asked Jan 14 '18 at 08:32

Benjewman
