HADOOP DATA DICTIONARY

1) Is it internal or external? What system does it come from?

The data is external. It comes from the third-party data platform Kaggle, and was originally extracted from the Twitter network.

Kaggle is a platform for predictive modelling and analytics competitions in which statisticians and data miners compete to produce the best models for predicting and describing the datasets uploaded by companies and users. This crowdsourcing approach relies on the fact that there are countless strategies that can be applied to any predictive modelling task and it is impossible to know beforehand which technique or analyst will be most effective.

2) Is it going to change? What data are you going to use?

The data is static and will not change, because it is not real-time data. More precisely, it was originally real-time data extracted from Twitter, but no further changes will be made to it for the duration of the project.

The data set we are using is "How ISIS Uses Twitter". It describes a list of users and their followers, along with the content they tweeted on the Twitter platform. By analysing the data, we aim to identify ISIS supporters and predict attacks.

3) Data Description (Describe the different data types?)

The data variables are defined as follows:

Field Name      Data Type  Field Length  Description
name            String     15            Name shown on the Twitter home page; 112 unique names in total
username        String     15            Twitter username, which is similar to the actual name
description     String     60            Subject of a tweet with a video link
location        String     20            Location of the user
followers       Integer    3             Number of followers the person had at the time of an individual tweet
Numberstatuses  Integer    3             Number of statuses for the individual account
time            Date       10            Timestamp of the tweet
tweets          String     140           Content of the tweet, with a maximum of 140 characters
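
As a rough illustration of how this dictionary could be used, the sketch below encodes the field names, data types and field lengths from the table and checks a single record against them. It is only a minimal sketch under stated assumptions: the names Field, DATA_DICTIONARY and validate_row are hypothetical and not part of the project, each record is assumed to arrive as a mapping from field name to text value, and the Date field is checked for length only because the dictionary does not specify a date format.

```python
from typing import NamedTuple


class Field(NamedTuple):
    """One row of the data dictionary (hypothetical helper type)."""
    name: str    # field name as listed in the table
    dtype: str   # "String", "Integer" or "Date", as listed in the table
    length: int  # maximum field length from the table


# Field specifications copied from the data dictionary table.
DATA_DICTIONARY = [
    Field("name", "String", 15),
    Field("username", "String", 15),
    Field("description", "String", 60),
    Field("location", "String", 20),
    Field("followers", "Integer", 3),
    Field("Numberstatuses", "Integer", 3),
    Field("time", "Date", 10),
    Field("tweets", "String", 140),
]


def validate_row(row):
    """Return a list of problems found in one record (empty if none).

    Assumption: `row` maps field names to text values, e.g. a row read
    from a CSV export of the data set.
    """
    problems = []
    for field in DATA_DICTIONARY:
        value = row.get(field.name)
        if value is None:
            problems.append(f"{field.name}: missing")
            continue
        text = str(value)
        # Integer fields must contain digits only (counts are non-negative).
        if field.dtype == "Integer" and not text.isdigit():
            problems.append(f"{field.name}: not an integer")
        # Every field is checked against the length given in the table.
        if len(text) > field.length:
            problems.append(f"{field.name}: longer than {field.length}")
    return problems
```

Calling validate_row on a record returns an empty list when every field matches the dictionary, and otherwise one message per violation, in the same order as the fields appear in the table.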