Data Mining Projects

DATA MINING

Technofist provides the latest IEEE software projects in data mining for final year students. Data mining is the process of analyzing large volumes of data in order to extract specific knowledge from it. The latest data mining projects at Technofist all work with large volumes of data.

Technofist is one of the best final year project institutes for CSE and IT engineering students implementing data mining based projects. All data mining projects at Technofist are based on the latest IEEE base papers. Here is a list of project ideas related to data mining. Third year and final year students can use these as academic projects. This list has been compiled from project ideas that have come up on the forum over the last few years. If you have questions regarding these projects, feel free to ask them in the replies below. These projects are implemented by data mining experts at Technofist.

TDM001
SOCIALQ&A: AN ONLINE SOCIAL NETWORK BASED QUESTION AND ANSWER SYSTEM

ABSTRACT - Question and Answer (Q&A) systems play a vital role in our daily life for information and knowledge sharing. Users post questions and pick questions to answer in the system. Due to the rapidly growing user population and the number of questions, it is unlikely for a user to stumble upon a question by chance that (s)he can answer. Also, altruism does not encourage all users to provide answers, not to mention high quality answers with a short answer wait time. The primary objective of this paper is to improve the performance of Q&A systems by actively forwarding questions to users who are capable and willing to answer the questions. To this end, we have designed and implemented SocialQ&A, an online social network based Q&A system.

Contact:
 +91-9008001602
 080-40969981

TDM002
EFFICIENT PROCESSING OF SKYLINE QUERIES USING MAPREDUCE

ABSTRACT - The skyline operator has attracted considerable attention recently due to its broad applications. However, computing a skyline is challenging today since we have to deal with big data. For data-intensive applications, the MapReduce framework has been widely used recently. In this paper, we propose an efficient parallel algorithm, SKY-MR+, for processing skyline queries using MapReduce. We first build a quadtree-based histogram for space partitioning by deciding whether to split each leaf node judiciously based on the benefit of splitting in terms of the estimated execution time. In addition, we apply the dominance power filtering method to effectively prune non-skyline points in advance.
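
To give a feel for what a skyline query computes, here is a minimal Python sketch. It is not the SKY-MR+ algorithm: it omits the quadtree histogram and dominance power filtering, and only mimics the MapReduce idea by computing a local skyline per partition and then merging. The function names, the round-robin partitioning, and the hotel data are assumptions made for illustration, and smaller values are assumed to be better in every dimension.

```python
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q if p is no worse in every dimension and strictly better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: Iterable[Point]) -> List[Point]:
    """Return the points that no other point dominates (simple nested-loop scan)."""
    result: List[Point] = []
    for p in points:
        if any(dominates(s, p) for s in result):
            continue  # p is dominated, so it cannot be a skyline point
        # Drop earlier candidates that the new point dominates.
        result = [s for s in result if not dominates(p, s)]
        result.append(p)
    return result


def partitioned_skyline(points: Sequence[Point], num_partitions: int) -> List[Point]:
    """Local skyline per partition ("map"), then a skyline of the survivors ("reduce")."""
    partitions = [points[i::num_partitions] for i in range(num_partitions)]
    local = [skyline(part) for part in partitions]
    merged = [p for part in local for p in part]
    return skyline(merged)


# Hypothetical data: (price, distance to the beach), lower is better for both.
hotels = [(50, 8), (60, 3), (80, 1), (70, 5), (90, 2), (55, 9)]
print(sorted(partitioned_skyline(hotels, num_partitions=3)))
```

A point that is dominated inside its own partition can never be in the global skyline, which is why the merge step only needs to look at the local survivors.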

TDM003
FIDOOP-DP: DATA PARTITIONING IN FREQUENT ITEMSET MINING ON HADOOP CLUSTERS

ABSTRACT - Traditional parallel algorithms for mining frequent itemsets aim to balance load by equally partitioning data among a group of computing nodes. We start this study by discovering a serious performance problem of the existing parallel Frequent Itemset Mining algorithms. Given a large dataset, data partitioning strategies in the existing solutions suffer from high communication and mining overhead induced by redundant transactions transmitted among computing nodes. We address this problem by developing a data partitioning approach called FiDoop-DP using the MapReduce programming model. The overarching goal of FiDoop-DP is to boost the performance of parallel Frequent Itemset Mining on Hadoop clusters.
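
The baseline this abstract starts from, splitting transactions equally and counting itemsets per node, can be sketched in a few lines of Python. This is only an illustration of equal partitioning with per-partition counting; it is not FiDoop-DP, and the function names, the `max_size` cap, and the basket data are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def local_counts(transactions, max_size):
    """Count every itemset of up to max_size items within one partition."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # canonical order so tuples compare equal
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    return counts


def frequent_itemsets(transactions, min_support, num_partitions=2, max_size=2):
    """Partition equally, count locally, sum the counts, keep itemsets meeting min_support."""
    partitions = [transactions[i::num_partitions] for i in range(num_partitions)]
    total = Counter()
    for part in partitions:
        total.update(local_counts(part, max_size))
    return {itemset: n for itemset, n in total.items() if n >= min_support}


# Hypothetical market-basket data.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
for itemset, count in sorted(frequent_itemsets(baskets, min_support=2).items()):
    print(itemset, count)
```

Enumerating all combinations is only workable for small `max_size`; real systems prune candidates, for example with Apriori or FP-Growth.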

TDM004
USER-CENTRIC SIMILARITY SEARCH

ABSTRACT - User preferences play a significant role in market analysis. In the database literature there has been extensive work on query primitives, such as the well-known top-k query that can be used for the ranking of products based on the preferences customers have expressed. Still, the fundamental operation that evaluates the similarity between products is typically done ignoring these preferences. Instead, products are depicted in a feature space based on their attributes, and similarity is computed via traditional distance metrics on that space. In this work we utilize the rankings of the products based on the opinions of their customers in order to map the products in a user-centric space where similarity calculations are performed.

TDM005
PRACTICAL PRIVACY-PRESERVING MAPREDUCE BASED K-MEANS CLUSTERING OVER LARGE-SCALE DATASET

ABSTRACT - Clustering techniques have been widely adopted in many real-world data analysis applications, such as customer behavior analysis, medical data analysis, and digital forensics. With the explosion of data in today’s big data era, a major trend for handling clustering over large-scale datasets is outsourcing it to HDFS platforms. This is because cloud computing offers not only reliable services with performance guarantees, but also savings on in-house IT infrastructure. However, as datasets used for clustering may contain sensitive information, e.g., patient health information, commercial data, and behavioral data, directly outsourcing them to any distributed servers inevitably raises privacy concerns.
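
For readers new to the clustering step itself, here is a minimal sketch of plain k-means in Python. It does not address privacy at all and is not the scheme from the paper; it only shows the assign and update loop that a MapReduce version would distribute, with assignment playing the role of the map step and centroid recomputation the reduce step. The function names, the starting centroids, and the sample points are assumptions made for the example.

```python
def assign(points, centroids):
    """Label each point with the index of its nearest centroid (squared Euclidean distance)."""
    def sqdist(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))
    return [min(range(len(centroids)), key=lambda j: sqdist(p, centroids[j]))
            for p in points]


def update(points, labels, centroids):
    """Recompute each centroid as the mean of its members; keep it unchanged if it has none."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            new_centroids.append(old)
            continue
        new_centroids.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(len(old)))
        )
    return new_centroids


def kmeans(points, initial_centroids, max_iter=100):
    """Alternate assignment and update until the centroids stop moving."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = assign(points, centroids)
    for _ in range(max_iter):
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
        labels = assign(points, centroids)
    return centroids, labels


# Hypothetical 2-D data with two obvious groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, labels = kmeans(points, initial_centroids=[(1.0, 1.0), (9.0, 8.0)])
print(labels)
```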

TDM006
SECURE BIG DATA STORAGE AND SHARING SCHEME FOR CLOUD TENANTS

ABSTRACT - The Cloud is increasingly being used to store and process big data for its tenants, and classical security mechanisms using encryption are neither sufficiently efficient nor suited to the task of protecting big data in the Cloud. In this paper, we present an alternative approach which divides big data into sequenced parts and stores them among multiple Cloud storage service providers. Instead of protecting the big data itself, the proposed scheme protects the mapping of the various data elements to each provider using a trapdoor function.

TDM007
SENTIMENT ANALYSIS OF TOP COLLEGES USING TWITTER DATA

ABSTRACT - In today’s world, opinions and reviews accessible to us are among the most critical factors in formulating our views and influencing the success of a brand, product or service. With the advent and growth of social media in the world, stakeholders often take to expressing their opinions on popular social media, namely Twitter. While Twitter data is extremely informative, it presents a challenge for analysis because of its humongous and disorganized nature. This paper is a thorough effort to dive into the novel domain of performing sentiment analysis of people’s opinions regarding top colleges in India. Besides taking additional preprocessing measures like the expansion of net lingo and removal of duplicate tweets ...
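
The two preprocessing measures named in this abstract, expanding net lingo and removing duplicate tweets, could look roughly like the Python sketch below. The slang dictionary, the tokenizing pattern, and the function names are assumptions made for illustration; the paper's actual word lists and pipeline are not given here.

```python
import re

# Hypothetical slang dictionary; a real project would use a much larger list.
SLANG = {"gr8": "great", "u": "you", "plz": "please", "thx": "thanks"}


def expand_lingo(text):
    """Lowercase the tweet and replace known slang tokens with their expansions."""
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    return " ".join(SLANG.get(token, token) for token in tokens)


def preprocess(tweets):
    """Expand net lingo, then drop tweets that are identical after normalization."""
    seen = set()
    cleaned = []
    for tweet in tweets:
        normalized = expand_lingo(tweet)
        if normalized not in seen:
            seen.add(normalized)
            cleaned.append(normalized)
    return cleaned


tweets = ["Campus is gr8", "campus is GR8", "thx for the tour"]
print(preprocess(tweets))
```

Deduplicating after normalization, rather than before, also catches tweets that differ only in case or slang spelling.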

TDM008
CONNECTING SOCIAL MEDIA TO E-COMMERCE: COLD-START PRODUCT RECOMMENDATION USING MICROBLOGGING INFORMATION

ABSTRACT - Unsupervised Cross-domain Sentiment Classification is the task of adapting a sentiment classifier trained on a particular domain (source domain), to a different domain (target domain), without requiring any labeled data for the target domain. By adapting an existing sentiment classifier to previously unseen target domains, we can avoid the cost for manual data annotation for the target domain. We model this problem as embedding learning, and construct three objective functions that capture: (a) distributional properties of pivots (i.e., common features that appear in both source and target domains), (b) label constraints in the source domain documents, and (c) geometric properties in the unlabeled documents in both source and target domains.

TDM009
FRAPPE: DETECTING MALICIOUS FACEBOOK APPLICATIONS

ABSTRACT - Communication technology now pervades every application area. The last decade has witnessed a drastic evolution in information and communication technology due to the introduction of social media networks. Business growth is further achieved via social media. Nevertheless, the increased use of online social networks (OSNs) such as Facebook, Twitter, and Instagram has led to growing privacy and security concerns. Third-party applications are one of the many reasons for Facebook's attractiveness. Regrettably, users are often unaware that many of the Facebook applications on their profiles are malicious.

TDM010
A NOVEL RECOMMENDATION MODEL REGULARIZED WITH USER TRUST AND ITEM RATINGS

ABSTRACT - We propose TrustSVD, a trust-based matrix factorization technique for recommendations. TrustSVD integrates multiple information sources into the recommendation model in order to reduce the data sparsity and cold start problems and their degradation of recommendation performance. An analysis of social trust data from four real-world data sets suggests that not only the explicit but also the implicit influence of both ratings and trust should be taken into consideration in a recommendation model.

TDM011
BUILDING AN INTRUSION DETECTION SYSTEM USING A FILTER-BASED FEATURE SELECTION ALGORITHM

ABSTRACT - Redundant and irrelevant features in data have caused a long-term problem in network traffic classification. These features not only slow down the process of classification but also prevent a classifier from making accurate decisions, especially when coping with big data. In this paper, we propose a mutual information based algorithm that analytically selects the optimal feature for classification. This mutual information based feature selection algorithm can handle linearly and nonlinearly dependent data features. Its effectiveness is evaluated in the cases of network intrusion detection.
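
As a rough illustration of filter-based feature selection, the Python sketch below scores each discrete feature by its mutual information with the class label and ranks the features by that score. It is a simplified stand-in, not the algorithm proposed in the paper, which is not specified in this abstract. The function names and the toy traffic records are assumptions made for the example.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information in bits between two equal-length sequences of discrete values."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_features(rows, labels):
    """Return feature indices ordered from most to least informative about the label."""
    num_features = len(rows[0])
    scores = [
        (mutual_information([row[j] for row in rows], labels), j)
        for j in range(num_features)
    ]
    return [j for score, j in sorted(scores, key=lambda s: (-s[0], s[1]))]


# Hypothetical traffic records: (protocol, packet size bucket).
rows = [("tcp", "low"), ("tcp", "high"), ("udp", "low"), ("udp", "high")]
labels = ["attack", "attack", "normal", "normal"]
print(rank_features(rows, labels))
```

In this toy data the protocol fully determines the label while the size bucket is independent of it, so the protocol column ranks first.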

TDM012
SENTIMENT ANALYSIS OF TOP COLLEGES USING TWITTER DATA

ABSTRACT - In today’s world, opinions and reviews accessible to us are among the most critical factors in formulating our views and influencing the success of a brand, product or service. With the advent and growth of social media in the world, stakeholders often take to expressing their opinions on popular social media, namely Twitter. While Twitter data is extremely informative, it presents a challenge for analysis because of its humongous and disorganized nature. This paper is a thorough effort to dive into the novel domain of performing sentiment analysis of people’s opinions regarding top colleges in India. Besides taking additional preprocessing measures like the expansion of net lingo and removal of duplicate tweets ...

 

