Kranthi206

HADOOP DATA DICTIONARY

1) Is it internal or external? From what system does it come?

The data is external. It comes from the third-party data platform Kaggle and was originally extracted from the Twitter network.

Kaggle is a platform for predictive modelling and analytics competitions in which statisticians and data miners compete to produce the best models for predicting and describing the datasets uploaded by companies and users. This crowdsourcing approach relies on the fact that there are countless strategies that can be applied to any predictive modelling task and it is impossible to know beforehand which technique or analyst will be most effective.

2) Is it going to change? What data are you going to use?

The data is static and will not change, because it is not a live feed. More precisely, it was real-time data when it was extracted from Twitter, but it is now a fixed snapshot and no further changes will be made to it for the duration of the project.

The data set we are using is "How ISIS uses Twitter?". It describes a list of users and their followers, along with the content they tweeted on the Twitter platform. By analysing the data, we aim to identify ISIS supporters and look for signals that could help predict attacks.

3) Data Description (describe the different data types)

Defining the data variables...
Field Name	Data Type	Field Length	Description
name	String	15	Display name shown on the user's Twitter homepage. 112 unique names in total.
username	String	15	Twitter username, usually similar to the actual name.
description	String	60	Subject of a tweet with video link
location	String	20	Location of the user
followers	Integer	3	Number of followers the user had at the time of an individual tweet
Numberstatuses	Integer	3	Number of statuses posted by an individual account
time	Date	10	Timestamp of the tweet
tweets	String	140	Content of the tweet, with a maximum of 140 characters
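
The field table above can also be written as a small schema and used to check incoming records before they are loaded. The following is a minimal sketch in Python, not part of the original project: the function name validate_record is hypothetical, every value is assumed to arrive as a raw string (for example, from a CSV row), and the "Field Length" of an Integer field is assumed to mean its maximum number of digits.

```python
# Field name -> (data type, field length), copied from the table above.
SCHEMA = {
    "name": ("String", 15),
    "username": ("String", 15),
    "description": ("String", 60),
    "location": ("String", 20),
    "followers": ("Integer", 3),
    "Numberstatuses": ("Integer", 3),
    "time": ("Date", 10),
    "tweets": ("String", 140),
}


def validate_record(record):
    """Return a list of problems found in one record (a dict of raw strings).

    Hypothetical helper: checks only presence, length, and integer-ness.
    It does not parse the 'time' field, since the table gives no date format.
    """
    problems = []
    for field, (data_type, max_length) in SCHEMA.items():
        if field not in record:
            problems.append(f"{field}: missing")
            continue
        value = record[field]
        if len(value) > max_length:
            problems.append(f"{field}: longer than {max_length} characters")
        if data_type == "Integer" and not value.isdigit():
            problems.append(f"{field}: not a non-negative integer")
    return problems


# Made-up sample values, for illustration only (not taken from the data set).
sample = {
    "name": "Example",
    "username": "example_user",
    "description": "",
    "location": "",
    "followers": "120",
    "Numberstatuses": "45",
    "time": "01/06/2015",
    "tweets": "Sample tweet text",
}
print(validate_record(sample))  # prints []
```

A record that breaks the table's limits, such as a followers value of "12345", would be reported as "followers: longer than 3 characters". That is a quick way to find out whether the stated field lengths actually hold for the real data.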