Type: | Package |
Title: | Media News Extraction for Text Analysis |
Version: | 0.2.1 |
Date: | 2020-11-25 |
Author: | Vatsal Aima [aut, cre] |
Maintainer: | Vatsal Aima <vaima75@hotmail.com> |
Description: | Extracts textual data from different media channels, through their sources, based on the user's choice of keywords. These data can be used for text analysis to identify patterns in the respective media's reporting. The media channels used in this package are print media. The data (or news) used are publicly available to consumers. |
License: | LGPL-3 |
Encoding: | UTF-8 |
LazyData: | true |
OS_type: | windows |
Depends: | R (≥ 3.5.0) |
Imports: | rvest (≥ 0.3.5), xml2 (≥ 1.2.2), lubridate (≥ 1.7.4), stats (≥ 3.6.1), utils (≥ 3.6.1), stopwords (≥ 1.0) |
RoxygenNote: | 7.1.1 |
NeedsCompilation: | no |
Packaged: | 2020-11-26 07:38:24 UTC; vaima |
Repository: | CRAN |
Date/Publication: | 2020-11-26 10:50:10 UTC |
Text Cleaning: Custom Method
Description
Cleans text and optionally applies custom stopwords to remove unwanted words from the given data.
Usage
ClearText(Text, CustomList = c(""))
Arguments
Text |
A String or Character vector, user-defined. |
CustomList |
A Character vector (Optional), user-defined vector to
introduce stopwords ("english") in |
Value
Returns Character
Author(s)
Vatsal Aima, vaima75@hotmail.com
See Also
TOI_News_Articles, TOI_News_Dataset
Examples
################### Methodology #####################
###### For DataFrame ######
#### Creates Dataset based on keyword
NewsData = TOI_News_Articles("Goibibo")
## Identify any potential factor columns
vc = sapply(NewsData, is.factor)
## Convert factors to characters
NewsData[vc] = lapply(NewsData[vc], as.character)
## Clean text on specific character columns
## seq_len() avoids iterating over c(1, 0) when the dataset has no rows
for (i in seq_len(nrow(NewsData))) NewsData$News[i] = ClearText(NewsData$News[i])
######## For Character Variable #### Ex2 ####
para = "Moreover, the text data we get is noisy. But, if we can learn some
methods useful to extract important features from the noisy data, wouldn't
scandal that be amazing ? In this tuto23rial, you'll saadc@ruby.com
learn #world all ab33out regu12lar expressions from scratch. At first, 32324
detective you might find these confusing, or complicated, but after
https://anaconda.com/anaconda-enters-new-chapter/ expressions tricky,
scooby-doo doing practical hands-on exercises (done below)
you should feel bcc: @MikeQuindazzi quite comfortable with it.
In addition, we'll also cartoon-network learn about string 121manipulation
functions in R. This formidable combination of #DL #4IR #Robots
#ArtificialIntelligence string manipulation functions and regular
expressions will prepare you for text mining."
clearpara = ClearText(para,
CustomList = c("scooby-doo",
"cartoon-network",
"detective",
"scandal"))
########### For List #############
paraList = list(para, 1213, factor('aasd;kasdioasd'))
paraList = lapply(paraList, as.character)
for (x in seq_along(paraList)) paraList[[x]] = ClearText(paraList[[x]])
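To make the kind of cleaning shown above more concrete, here is a minimal base-R sketch of a comparable pass. It is not the implementation of ClearText: the helper name `sketch_clean`, the order of the steps, and the exact patterns are all assumptions made for illustration, and it handles only custom words, not the built-in stopword list.

```r
# Hypothetical helper (not part of the package): a rough approximation of a
# text-cleaning pass that also drops user-supplied words.
sketch_clean <- function(text, custom = character(0)) {
  x <- tolower(text)
  x <- gsub("https?://\\S+", " ", x)   # drop URLs
  x <- gsub("\\S+@\\S+", " ", x)       # drop e-mail addresses
  x <- gsub("[@#]\\S+", " ", x)        # drop @mentions and #hashtags
  # Remove custom words before punctuation is stripped, so hyphenated
  # entries such as "scooby-doo" still match. Empty strings are skipped.
  custom <- tolower(custom[nzchar(custom)])
  for (w in custom) {
    x <- gsub(w, " ", x, fixed = TRUE) # literal match, also inside longer words
  }
  x <- gsub("[0-9]+", "", x)           # drop digits embedded in words
  x <- gsub("[^a-z ]", " ", x)         # replace remaining non-letters with spaces
  x <- gsub("\\s+", " ", x)            # collapse runs of whitespace
  trimws(x)
}

cleaned <- sketch_clean(
  "Visit https://a.com now, tuto23rial #DL scooby-doo!",
  custom = "scooby-doo"
)
```

Because the custom words are matched literally, a short entry would also be removed from inside longer words; a real implementation would likely match on word boundaries instead.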
Extract Media News
Description
Creates a DataFrame or writes files to disk by extracting text data from the source based on the user's keywords.
Usage
TOI_News_Articles(
keywords,
AsDataFrame = TRUE,
start_date = NULL,
end_date = NULL
)
Arguments
keywords |
A String, user-defined. |
AsDataFrame |
Boolean value that determines whether the outcome is a DataFrame or files written to disk. If set to FALSE, the files are written to disk in the current working directory (default TRUE). |
start_date |
Date (Character) Value, provide the starting date FROM where
the data should be extracted. NOTE: only provide
|
end_date |
Date (Character) Value, provide the ending date TO where the data
should be extracted. NOTE: only provide |
Value
Returns a DataFrame or writes files to disk, based on the keywords
Author(s)
Vatsal Aima, vaima75@hotmail.com
See Also
Examples
#### Creates Dataset by filtering 31 days from current date
NewsDataset1 = TOI_News_Articles(keywords = "Politics In US",
start_date = Sys.Date() - 31,
end_date = Sys.Date())
# Creates Dataset by custom filtering through dates
NewsDataset2 = TOI_News_Articles(keywords = "BaseBall",
start_date = "2019-09-20",
end_date = "2019-10-20")
# Write files to disk
TOI_News_Articles(keywords = "AirLines", AsDataFrame = FALSE)
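The examples above pass dates either as Date objects (from `Sys.Date()`) or as "YYYY-MM-DD" strings. A caller may want to check a date range before requesting data; the following is a minimal sketch of such a check. The helper `check_date_range` is hypothetical and is not provided by the package, and the "YYYY-MM-DD" format is inferred from the examples only.

```r
# Hypothetical helper (not part of the package): normalise two date inputs
# and confirm they form a valid range.
check_date_range <- function(start_date, end_date) {
  # Strings are parsed as "YYYY-MM-DD"; Date objects pass through unchanged.
  start <- as.Date(start_date, format = "%Y-%m-%d")
  end <- as.Date(end_date, format = "%Y-%m-%d")
  if (is.na(start) || is.na(end)) {
    stop("dates must be Date objects or 'YYYY-MM-DD' strings")
  }
  if (start > end) {
    stop("start_date must not be after end_date")
  }
  list(start = start, end = end, days = as.integer(end - start))
}

rng <- check_date_range("2019-09-20", "2019-10-20")
```

The returned `start` and `end` could then be passed on as the `start_date` and `end_date` arguments.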
Creates Interim Dataset
Description
Creates an interim news dataset based on user-defined keywords for all possible links extracted from respective source.
Usage
TOI_News_Dataset(keywords)
TOI_News_Links(keywords)
Arguments
keywords |
A String, user-defined. |
Value
Returns DataFrame based on keywords
Functions
- TOI_News_Links
Extracts source links
Author(s)
Vatsal Aima, vaima75@hotmail.com
See Also
Examples
#### Creates Dataset based on keyword
NewsData = TOI_News_Dataset("Goibibo")
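As a rough illustration of what extracting source links involves, the sketch below pulls the `href` attributes out of an HTML fragment with xml2, one of the packages listed under Imports. The HTML string and the example.com URLs are made up for the example, and this is not necessarily how TOI_News_Links works internally.

```r
library(xml2)

# Made-up HTML fragment standing in for a search results page.
html <- paste0(
  "<html><body>",
  "<a href='https://example.com/a'>A</a>",
  "<a href='https://example.com/b'>B</a>",
  "<a href='https://example.com/a'>A again</a>",
  "<a>anchor without href</a>",
  "</body></html>"
)

page <- read_html(html)
anchors <- xml_find_all(page, ".//a")  # every <a> element in the document
links <- xml_attr(anchors, "href")     # NA where the attribute is missing
links <- unique(links[!is.na(links)])  # drop missing values and duplicates
```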
Emoji Data
Description
Emoji Data
Usage
emoji_Data
Format
Dataframe with columns:
- C1,C2,C3,C4,C5,C6,C7,C8
Unicode in text.
- Browser
Code applicable on Web Browser.
- Appl
Code applicable on Apple.
- Goog
Code applicable on Google.
- FB
Code applicable on Facebook.
- Wind
Code applicable on Windows Devices.
- Twtr
Code applicable on Twitter.
- Sams
Code applicable on Samsung.
- Gmail
Code applicable on Gmail.
- Joy,SB,DCM,KDDI
Code applicable on other Platforms.
- Description
Code description
Source
<https://unicode.org/emoji/charts/full-emoji-list.html>