Questions tagged [text-processing]

55 questions
24
votes
4 answers

How can I extract words from a sentence and determine what part of speech each is?

I want to write something that takes a sentence, identifies each word it contains, and determines what part of speech each word is. For example, "Hello World, I am a sentence" would return "verb noun, pronoun verb adjective noun". Ideally, I'd…
Vinny
  • 259
  • 1
  • 2
  • 5
9
votes
7 answers

Algorithm for determining transactions among weekly data series?

I'm trying to develop a small reporting tool (with an SQLite backend). I can best describe this tool as a "transaction" ledger. What I'm trying to do is keep track of "transactions" from a weekly data extract: "new" (or add) - resource is new to my app…
Swartz
  • 141
  • 3
9
votes
4 answers

How should I implement a command processing application?

I want to make a simple, proof-of-concept application (REPL) that takes a number and then processes commands on that number. Example: I start with 1. Then I write "add 2", it gives me 3. Then I write "multiply 7", it gives me 21. Then I want to…
Nini Michaels
  • 283
  • 1
  • 7
8
votes
2 answers

What is the best practice for handling whitespace when letting the user edit name=value pairs in a configuration?

For instance, you let the user define the notorious path variable. How do you interpret apppath = C:\Program Files\App? Ignoring the whitespace looks like a practice adopted from programming languages, and you leave it around the equals sign…
6
votes
2 answers

How does the Arabic typographic layout system work at a high level?

I have some Arabic content that is justified according to western conventions. I justified it because it is justified in ancient sources: However, the way Arabic text justification works is by stretching the cursive words, instead of the…
Lance
  • 2,537
  • 15
  • 34
6
votes
1 answer

Finding occurrences of useful words and phrases in strings

I am building an app that analyzes people's posts by pulling their Tweets and Facebook posts. I need to process all the posts and find useful phrases. By useful I mean any word or phrase that is a noun/adjective/verb that would…
6
votes
1 answer

Tools for modelling data and workflows using structured text files

Consider a case where I want to try out an idea for an application, but I want to avoid investing a lot of effort in coding the UI, workflows, database schema, etc. before I see that it's going to be useful to me (as an example of a potential user). My idea is stay…
Alexey
  • 1,199
  • 1
  • 9
  • 18
5
votes
1 answer

How advanced are author-recognition methods?

If a computer program analyses texts written by an author (long enough to be statistically significant), how much can it tell today about the author? Can the computer program even tell with "certainty"…
4
votes
2 answers

Database structure for word co-occurrence frequencies in a large corpus

I would like to store the frequencies with which words co-occur with each other over a variety of contexts in a large (> 1 billion tokens) text corpus. I need to store the word pair, the type of co-occurrence (e.g. word1 in the same sentence as…
4
votes
1 answer

Sentence Tree vs. Words List

I was recently tasked with building a Named Entity Recognizer as part of a project. The objective was to parse a given sentence and come up with all the possible combinations of the entities. One approach that was suggested was to keep a lookup table…
4
votes
1 answer

How to process an endless XML data stream

There is an endless data stream of XML messages (and "heartbeats") that I receive via a telnet connection and through a site-to-site VPN IPsec tunnel. I'm still pondering: what is the best/most elegant solution to process the XML messages without…
derphil
  • 859
  • 1
  • 8
  • 9
4
votes
1 answer

Text comparison algorithm using java-diff-utils

One of the features in our project is a comparison algorithm between two versions of a text that reports the % change between them. While researching, I came across the Google java-diff-utils project. Has anyone used this for…
java_mouse
  • 2,627
  • 15
  • 23
3
votes
1 answer

Integrating TeX into a Java desktop application

Looking to integrate TeX equations in a TeX-agnostic fashion, suitable for either ConTeXt or LaTeX, into a Java-based desktop Markdown editor. The possibilities are numerous, but I'm not sure what approach to take. JMathTex outputs to MathML, which…
Dave Jarvis
  • 743
  • 6
  • 28
3
votes
4 answers

Windows compatibility with Unix/Linux newline "\n"

A follow-up to Difference between '\n' and '\r\n'. It's been a few decades since the schism was introduced. Nowadays, when documents are being exchanged over the internet, typically with no prior knowledge of the client's preference of line endings,…
Ondra Žižka
  • 267
  • 3
  • 6
3
votes
1 answer

What method for storing a text file in memory (C, not C++) would allow me to open any format (UTF-8, binary, etc.) and a file of any size?

My first thought here is to use a dynamic array, but I am looking for something better. Currently I have the text files open into "chunks". Every word or group of spaces makes up a "chunk". Then I have a line number in this structure and a chunk…
Joe
  • 339
  • 4
  • 14