Questions tagged [data-mining]

Data mining is the process of locating patterns and trends within a given data set that may not be immediately obvious or intuitive.

27 questions
17
votes
2 answers

How to find hard-to-misspell given names?

Here is a question that I believe could be solved with some data mining and a sophisticated algorithm, but I don't quite know how. Any pointers as to what data sources to use and what algorithm to apply are welcome. Background: I'm a…
6
votes
2 answers

Clustering Strings on the basis of Common Substrings

I have around 10000+ strings and have to identify and group all the strings which look similar (I base the similarity on the number of common words between any two given strings). The more common words two strings share, the more similar they are.…
pk188
  • 61
  • 1
  • 2
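
One way to read the similarity measure described in this question is word-set overlap. The sketch below is a minimal illustration under that assumption, not the asker's or any answerer's method: it scores two strings by Jaccard similarity of their word sets and groups them greedily. The names `word_set`, `jaccard`, `group_similar` and the `threshold` value are hypothetical, chosen only for this example.

```python
def word_set(s):
    """Return the lowercased set of whitespace-separated words in s."""
    return set(s.lower().split())


def jaccard(a, b):
    """Shared words divided by total distinct words, from 0.0 to 1.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_similar(strings, threshold=0.5):
    """Greedy single-pass grouping (an assumed strategy for illustration).

    Each string joins the first group whose representative (its first
    member) is at least `threshold` similar; otherwise it starts a new group.
    """
    groups = []  # list of (representative word set, member strings)
    for s in strings:
        words = word_set(s)
        for rep, members in groups:
            if jaccard(rep, words) >= threshold:
                members.append(s)
                break
        else:
            # No existing group was similar enough.
            groups.append((words, [s]))
    return [members for _, members in groups]


if __name__ == "__main__":
    sample = ["red apple pie", "apple pie red", "blue car", "fast blue car"]
    for group in group_similar(sample):
        print(group)
```

The greedy pass compares each string only against group representatives, so the result depends on input order; a full pairwise comparison or a proper clustering algorithm may give better groups at higher cost.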
5
votes
5 answers

Data Mining Books

I'm passionate about data mining. I have read some books, such as Programming Collective Intelligence, and I would like to know of more good books, especially practical ones, about data mining in conjunction with AI. Any tips will be appreciated as well.…
BrainStorm
  • 169
  • 5
5
votes
3 answers

How do I cluster strings based on a relation between two strings?

If you don't know WEKA, you can try a theoretical answer. I don't need literal code/examples... I have a huge data set of strings in which I want to cluster the strings to find the most related ones; these could also be seen as duplicates. I…
Tamara Wijsman
  • 8,259
  • 14
  • 58
  • 94
3
votes
1 answer

Modern approaches to retrieve useful content from a web page?

What are the modern ways to (effectively) determine which parts of a page contain useful text, data tables, etc. and which do not (e.g. ads, navigation, etc.)? What valuable research, results, or papers have appeared in this field in recent years? Thank…
lithuak
  • 131
  • 2
3
votes
1 answer

Distinction between AI, ML, Neural Networks, Deep learning and Data mining

I have recently started exploring the field of machine learning (ML). I think I understand the difference between ML and AI at a high level, but I wanted to understand more accurately the differences between these commonly used concepts. After…
3
votes
1 answer

How to perform data mining efficiently (in PHP)?

My company has reached the point of working on a system that produces statistics based on data gathered from the database. How do you efficiently gather statistics from a database in a way that does not add too much overhead in…
GiamPy
  • 235
  • 2
  • 13
3
votes
0 answers

Mining (& searching in) GitHub projects

Is there any way to find every GitHub project which is GPL licensed and has over one million lines of C++ code? I am imagining that the GitHub API was designed to automate such requests, but I am not sure at all. In other words, is the GitHub API…
Basile Starynkevitch
  • 32,434
  • 6
  • 84
  • 125
3
votes
0 answers

How do I visualise the feature space partitioning in random forest?

I am learning about random forest and found this video https://www.youtube.com/watch?v=gdnIqGbqiYs&list=UUb9svouAi1XHRqlOs8LXbBg very useful. The first 8 minutes explain how to visualise how the feature space is partitioned as the tree is populated…
brucezepplin
  • 131
  • 2
2
votes
1 answer

Efficiently determining many-to-many subset relation

I'm doing market basket analysis. I have a set of transactions. Every transaction is a set of items that were bought. I then have a set of itemsets (i.e. sets of items) that I want to determine the support for. The support of an itemset is defined…
user76821
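
For readers new to the term, here is a naive baseline for the computation this question describes. It is a sketch under a stated assumption: the excerpt cuts off before giving its definition of support, so the code uses the common one (the number of transactions that contain every item of the itemset). The function name `support_counts` is hypothetical, and this brute-force version is only a reference point, not the efficient method the question asks for.

```python
def support_counts(transactions, itemsets):
    """Count, for each itemset, the transactions that contain all its items.

    Assumes the common definition of support as an absolute count; divide
    by len(transactions) to get the relative form instead.
    """
    tx_sets = [frozenset(t) for t in transactions]
    counts = {}
    for itemset in itemsets:
        key = frozenset(itemset)
        # `key <= t` is Python's subset test for sets.
        counts[key] = sum(1 for t in tx_sets if key <= t)
    return counts


if __name__ == "__main__":
    transactions = [{"milk", "bread"}, {"milk"}, {"bread", "eggs", "milk"}]
    itemsets = [{"milk"}, {"milk", "bread"}, {"eggs", "tea"}]
    for itemset, count in support_counts(transactions, itemsets).items():
        print(sorted(itemset), count)
```

This performs one subset test per (itemset, transaction) pair, so its cost grows with the product of the two collection sizes, which is exactly the many-to-many cost the question wants to avoid.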
2
votes
2 answers

How do I find the JavaScript that is invoked when I click on a button or a link in a web-page (part of a data mining project)?

I tried to use the 'inspect element' feature of the Firebug add-on for Firefox, but it doesn't give me any link to the JavaScript. For example, I got this from the Firebug add-on: text of the link But there is no link to the…
aste123
  • 399
  • 1
  • 3
  • 8
2
votes
2 answers

Suggesting albums based on friends' top ten lists

Just wondering if someone could suggest a good algorithm in the collaborative filtering vein that I could use to suggest music choices based on top ten lists. This is a personal project: I am a member of a private music blog where most of the users…
user976092
1
vote
1 answer

Pattern for SQL data mining app

We have an app that is used for data mining on our client database. Typical uses include getting a list of clients and their email addresses, running reports about user transactions between certain dates and returning clients that live in a…
woggles
  • 111
  • 2
1
vote
1 answer

Social Analytics in your current data

By now everyone is aware of the massive boom in social networking (Twitter, Facebook, LinkedIn), and obviously a big part of its business model revolves around being able to mine this data to create information that can be used to make money for…
Dan McGrath
  • 11,163
  • 6
  • 55
  • 81
1
vote
1 answer

Recommending new products using k-means clusters?

I'm trying to figure out the best way to recommend images based on past classifications using k-means clusters. What I have done is mapped the RGB values of a set of images, performed a k-means cluster analysis on those RGB values, and attached a…
crockpotveggies
  • 205
  • 1
  • 9