Data mining is the process of locating patterns and trends within a given data set that may not be immediately obvious or intuitive.
Questions tagged [data-mining]
27 questions
17
votes
2 answers
How to find hard-to-misspell given names?
Here is a question that I believe could be solved with some data mining and a sophisticated algorithm, but I don't quite know how. Any pointers as to what data sources to use and what algorithm to apply are welcome.
Background: I'm a…

user1202136
6
votes
2 answers
Clustering Strings on the basis of Common Substrings
I have more than 10,000 strings and have to identify and group all the strings that look similar (I base the similarity on the number of common words between any two given strings). The more common words two strings share, the more similar they are.…

pk188
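One possible way to make "similarity by number of common words" concrete is sketched below. It is not taken from the question or its answers: the Jaccard measure, the `threshold` value, and the function names are all assumptions chosen for illustration. Strings are linked whenever their word sets overlap enough, and linked strings end up in the same group.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Shared words divided by all distinct words; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_common_words(strings, threshold=0.5):
    """Group strings whose word sets reach the (assumed) similarity threshold."""
    words = [set(s.lower().split()) for s in strings]
    parent = list(range(len(strings)))  # union-find forest, one node per string

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link every pair that is similar enough (quadratic in the number of strings).
    for i, j in combinations(range(len(strings)), 2):
        if jaccard(words[i], words[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, s in enumerate(strings):
        groups.setdefault(find(i), []).append(s)
    return list(groups.values())


clusters = cluster_by_common_words(
    ["red apple pie", "apple pie red", "blue car", "fast blue car", "green"]
)
```

The pairwise loop compares every string with every other one, so for 10,000 strings it performs roughly 50 million comparisons. An inverted index from word to strings would be one way to skip pairs that share no words at all.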
5
votes
5 answers
Data Mining Books
I'm passionate about data mining. I have read some books, such as Programming Collective Intelligence, and I would like to know of more good books, especially practical ones, about data mining in conjunction with AI. Any tips will be appreciated as well.…

BrainStorm
5
votes
3 answers
How do I cluster strings based on a relation between two strings?
If you don't know WEKA, you can try a theoretical answer. I don't need literal code/examples...
I have a huge data set of strings that I want to cluster in order to find the most closely related ones; these could also be seen as duplicates. I…

Tamara Wijsman
3
votes
1 answer
Modern approaches to retrieve useful content from a web page?
What are the modern ways to (effectively) determine which parts of a page contain useful text, data tables, etc. and which do not (e.g. ads, navigation, etc.)?
What are the most valuable research results or papers in this field from recent years?
Thank…

lithuak
3
votes
1 answer
Distinction between AI, ML, Neural Networks, Deep learning and Data mining
I have recently started exploring the field of machine learning (ML). I think I understand the difference between ML and AI at a high level, but I want to understand more precisely the differences between these commonly used concepts.
After…

user3222249
3
votes
1 answer
How to perform data mining efficiently (in PHP)?
My company has reached the point where we need to work on a system that produces statistics from data gathered in the database.
How do you efficiently gather statistics from a database in a way that does not add too much overhead in…

GiamPy
3
votes
0 answers
Mining (and searching in) GitHub projects
Is there any way to find every GitHub project that:
- is GPL licensed
- has over one million lines of C++ code
I imagine that the GitHub API was designed to automate such requests, but I am not at all sure.
In other words, is the GitHub API…

Basile Starynkevitch
3
votes
0 answers
How do I visualise the feature space partitioning in random forest?
I am learning about random forest and found this video https://www.youtube.com/watch?v=gdnIqGbqiYs&list=UUb9svouAi1XHRqlOs8LXbBg very useful.
The first 8 minutes explain how to visualise how the feature space is partitioned as the tree is populated…

brucezepplin
2
votes
1 answer
Efficiently determining many-to-many subset relation
I'm doing market basket analysis. I have a set of transactions. Every transaction is a set of items that were bought. I also have a set of itemsets (an itemset being a set of items) whose support I want to determine. The support of an itemset is defined…
user76821
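The excerpt above is cut off before it defines support. The sketch below assumes the usual market basket definition, namely the number of transactions that contain every item of the itemset. The function name `support_counts` and the sample data are hypothetical, and this is the naive baseline rather than an efficient solution.

```python
def support_counts(transactions, itemsets):
    """Return, for each itemset, how many transactions contain all of its items."""
    baskets = [frozenset(t) for t in transactions]
    result = {}
    for itemset in itemsets:
        key = frozenset(itemset)
        # `key <= basket` is Python's subset test for sets.
        result[key] = sum(1 for basket in baskets if key <= basket)
    return result


transactions = [{"milk", "bread"}, {"milk"}, {"bread", "eggs", "milk"}]
itemsets = [{"milk"}, {"milk", "bread"}, {"eggs", "tea"}]
counts = support_counts(transactions, itemsets)
```

This checks every itemset against every transaction, which is exactly the many-to-many cost the question wants to reduce. Dividing each count by `len(transactions)` gives the relative support, if that is the definition in use.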
2
votes
2 answers
How do I find the JavaScript that is invoked when I click on a button or a link in a web page (part of a data mining project)?
I tried to use the 'Inspect Element' feature of the Firebug add-on for Firefox, but it doesn't give me any link to the JavaScript.
For example, I got this from the Firebug add-on:
text of the link
But there is no link to the…

aste123
2
votes
2 answers
Suggesting albums based on friends' top ten lists
I am wondering if someone could suggest a good algorithm in the collaborative filtering vein that I could use to suggest music choices based on top ten lists.
This is a personal project: I am a member of a private music blog where most of the users…
user976092
1
vote
1 answer
Pattern for SQL data mining app
We have an app that is used for data mining on our client database.
Typical uses include getting a list of clients and their email addresses, running reports about user transactions between certain dates and returning clients that live in a…

woggles
1
vote
1 answer
Social Analytics in your current data
By now everyone is aware of the massive boom in social networking (Twitter, Facebook, LinkedIn), and obviously a big part of these companies' business model revolves around being able to mine this data to create information that can be used to make money for…

Dan McGrath
1
vote
1 answer
Recommending new products using k-means clusters?
I'm trying to figure out the best way to recommend images based on past classifications using k-means clusters. So far I have mapped the RGB values of a set of images, performed a k-means cluster analysis on those RGB values, and attached a…

crockpotveggies
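For readers unfamiliar with the clustering step described above, here is a minimal sketch of k-means (Lloyd's algorithm) on RGB triples. It assumes each image has already been reduced to a single RGB tuple, which the excerpt does not confirm. The function name, the seeding rule (first `k` distinct points), and the sample colours are illustrative choices, not the asker's implementation.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm on equal-length numeric tuples."""
    # Assumed seeding: the first k distinct points, so runs are deterministic.
    centroids = list(dict.fromkeys(points))[:k]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(len(centroids)),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical per-image RGB values: two reddish images, two bluish ones.
image_colours = [(250, 10, 10), (10, 10, 250), (240, 20, 5), (5, 20, 240)]
labels, centroids = kmeans(image_colours, k=2)
```

Images that share a label fall in the same colour cluster, so one simple recommendation rule would be to suggest unseen images from the clusters a user has favoured before. Whether that matches the approach in the question cannot be told from the truncated excerpt.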