Making Data Analysis Convenient and Customizable Through the Open-Source R Programming Language

About the R Project

The R programming language is one of the most popular tools for statistical work and an important tool in data science. It is an open-source language developed by a worldwide community of dedicated developers. Its functionality is organized into packages, including graphical libraries, which are continuously added and upgraded by contributors and are available for free from the R project website. This resource provides over 10,000 packages for programming in R. A popular interface for R is R Studio, a comprehensive environment for handling data, writing code, performing statistical modelling and producing output in graphical or textual format. The R console accepts commands as input and then evaluates and executes them. R does not recognize the typographic ("smart") quotes and dashes that word processors and websites often insert automatically, so whenever code is copied from an external source, the user should check it carefully and replace such characters with plain quotes and hyphens before running it in R.
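As a minimal sketch of working at the console, the R code below evaluates an expression, shows (commented out, because it needs an internet connection) how a CRAN package is installed and loaded, and repairs typographic quotes and dashes in pasted code. The helper `fix_smart_chars()` is a hypothetical function written only for this illustration; it is not part of base R or any package.

```r
# Typed at the ">" prompt, an expression is evaluated and its result printed.
2 + 3

# Packages are installed once from CRAN and then loaded in each session
# (commented out here because installation needs an internet connection):
# install.packages("ggplot2")
# library(ggplot2)

# Hypothetical helper: replace typographic characters that word processors
# insert automatically with the plain ASCII characters R expects.
fix_smart_chars <- function(code) {
  replacements <- c("\u201c" = "\"", "\u201d" = "\"",  # curly double quotes
                    "\u2018" = "'",  "\u2019" = "'",   # curly single quotes
                    "\u2013" = "-",  "\u2014" = "-")   # en and em dashes
  for (ch in names(replacements)) {
    code <- gsub(ch, replacements[[ch]], code, fixed = TRUE)
  }
  code
}

# A line of code as it might look after copying from a formatted document
pasted <- "x <- c(\u201ca\u201d, \u201cb\u201d)"
cleaned <- fix_smart_chars(pasted)
cleaned
eval(parse(text = cleaned))  # runs the repaired line, creating x
x
```

Running `parse()` on the original `pasted` string instead of `cleaned` would fail, which is exactly the problem described above.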

Important Features of R Programming for Dissertation Work

Dissertations that involve large volumes of data, whether taken from the public domain or extracted from other sources, make extensive use of R, from data mining to building statistical models and visualizations. In data science, R can be used for everything from simple data mining and statistical analysis to machine learning. Users can create their own objects, functions and packages in R. It runs on most operating systems, and because it is open-source software, anyone can install and use it. Because it is free, R is very widely used in academia, and it has also increasingly been adopted by industries working in data science.
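The R sketch below illustrates the three kinds of work mentioned above: creating an object, writing a small function and fitting a statistical model. It uses the `mtcars` data set that ships with R, so nothing needs to be downloaded; the function name `coef_var()` is a hypothetical example, not a standard R function.

```r
# An object: a numeric vector of fuel economy (miles per gallon)
fuel <- mtcars$mpg

# A user-defined function: the coefficient of variation (sd / mean)
coef_var <- function(v) sd(v) / mean(v)
coef_var(fuel)

# A statistical model: linear regression of fuel economy on car weight
fit <- lm(mpg ~ wt, data = mtcars)
summary(fit)   # coefficients, standard errors, R-squared and more
coef(fit)
```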

R supports both procedural programming and object-oriented programming based on generic functions, which is why it is described as a comprehensive programming language. Because thousands of ready-made functions are available in base R and its packages, programmers can build on existing functions instead of writing everything from scratch. R is an interpreted language, so the same code is portable across machines without recompilation, and because commands can be run and inspected one at a time, errors in the code are relatively easy to debug. R handles operations on vectors, arrays, data frames and other objects of variable size, and it provides robust options for handling and storing data. In addition, a large open community offers technical support for R programming.
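To make the distinction concrete, the following R sketch shows a procedural loop next to its vectorised equivalent, the S3 generic function `print()` dispatching to a method for a user-defined class, and operations on an array and a data frame. The `study` class and its constructor `new_study()` are hypothetical names chosen for the illustration.

```r
# Procedural style: accumulate a sum in a loop
total <- 0
for (i in 1:5) total <- total + i
total
sum(1:5)  # the vectorised built-in gives the same result

# Object-oriented style with S3 generic functions:
# a constructor for a hypothetical "study" class ...
new_study <- function(name, n) {
  structure(list(name = name, n = n), class = "study")
}
# ... and a method for the existing generic print()
print.study <- function(x, ...) {
  cat("Study:", x$name, "with", x$n, "participants\n")
  invisible(x)
}
s <- new_study("Pilot survey", 120)
print(s)  # print() dispatches to print.study()

# Arrays and data frames
arr <- array(1:24, dim = c(2, 3, 4))  # a 2 x 3 x 4 array
dim(arr)
scores <- data.frame(group = c("a", "b", "a"), score = c(10, 12, 14))
group_means <- aggregate(score ~ group, data = scores, FUN = mean)
group_means
```

Defining a function named `print.study()` is all that S3 dispatch requires: any object whose class attribute is `"study"` is then printed by that method, including when it is auto-printed at the console.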

R Graphical User Interface

The R GUI is the standard interface for working in R. The R console, shown in Figure 1, is its most essential part. This is the window where R scripts, instructions and operations are entered. Several tools are built into the console to make the interface easier to use, and the console appears whenever the R GUI is opened.

Fig. 1: R Console in the R Graphical User Interface

To start a new script in R, click the “File” menu at the top of the main panel of the R GUI and select the “New Script” option. To exit an active session, type q() after the R command prompt “>” shown in Figure 1.

R Studio

R Studio is a comprehensive integrated development environment (IDE) for R. It brings code editing, error notification, data viewing and the output of executed code together in a single window. It runs on several platforms and can also be accessed through a web browser. It includes a built-in option to check for and install updates to R packages, which reduces manual work. Because the data view is available in the same window, handling the data and writing code against it is convenient. A snapshot of the R Studio window is shown in Figure 2.

Fig. 2: R Studio window display on macOS

Key Components of R Studio That a User Should Know About

R Studio has four key components (panes) that are used when programming in R.

Source: This pane sits in the top-left corner of the window. It is a text editor for writing source scripts: multiple lines of code can be entered here without executing them, and the script can be saved as a file on local storage.

Console: This pane, in the bottom-left corner by default, is used for interactive work in R; each line of code is executed as soon as it is entered, before the next line is typed.

Workspace and History: The top-right pane contains the Workspace (named Environment in recent versions of R Studio) and History tabs. Together they list all objects, such as data sets and variables, loaded or created during the session, and the history of all commands executed.

Files, Plots, Packages and Help: The bottom-right pane has four tabs. The Files tab lets the user browse files and folders on the computer. The Plots tab displays any graphs and plots produced by the program. The Packages tab lists all installed packages, and, as the name suggests, the Help tab gives access to R's built-in documentation.
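Most of what these panes display can also be obtained by typing commands at the console. The R sketch below pairs each pane with base R functions that give roughly the same information; the mapping is an informal illustration rather than how R Studio itself is implemented. Commands that open a viewer, draw a plot or only work in an interactive session are commented out.

```r
# Workspace: objects created during the session
x <- c(4, 8, 15)
ls()

# History: recently executed commands (interactive sessions only)
# history()

# Files: the current working directory and its contents
getwd()
list.files()

# Packages: names of the installed packages
pkgs <- rownames(installed.packages())
head(pkgs)
"stats" %in% pkgs

# Help: open the documentation for a function
# help("mean")   # equivalent to ?mean

# Plots: in R Studio, a base-graphics plot appears in the Plots tab
# plot(x)
```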

Benefits and Limitations of the R Programming Language in Dissertation Work

Compared with other technologies, R has several distinctive strengths that make it a programming language of choice. Graphical libraries such as ggplot2 and plotly help users build attractive, customizable plots. Because R is a full programming language, there are very few restrictions on the plots, models or graphs a user can develop. R can also read many data formats and import data from databases, data files and even online sources, which makes it very convenient. Because R is open source, all of its features are available at no cost to the user. Its wide applicability and free availability have also produced extensive community support.
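As a hedged illustration of reading data and plotting it, the R sketch below writes the built-in `mtcars` data to a temporary CSV file, reads it back with `read.csv()`, and, if the ggplot2 package is installed, draws a customized scatter plot with a fitted regression line. The temporary file exists only for the example. `read.delim()` and `readRDS()`, mentioned in the comments, are base R readers; databases are usually reached through add-on packages such as DBI, which this sketch does not use.

```r
# Write a built-in data set to a temporary CSV file, then read it back
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path)
cars_data <- read.csv(csv_path, row.names = 1)  # first column holds row names
str(cars_data)
# Other formats use similar readers, e.g. read.delim() for tab-separated
# text and readRDS() for R's own binary format.

# Plot with ggplot2 only if it is installed (install.packages("ggplot2"))
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(cars_data, aes(x = wt, y = mpg)) +
    geom_point(colour = "steelblue") +
    geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
    labs(x = "Weight (1000 lbs)", y = "Miles per gallon",
         title = "Fuel economy versus weight")
  print(p)
}
```

Each `+` layer adds or customizes one part of the plot (points, fitted line, labels), which is what makes ggplot2 graphics easy to adapt.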

However, R also has a few limitations and can be difficult to use. Because it is a programming language without the menu-driven, built-in procedures offered by statistical tools such as SAS, SPSS and Stata, the user is expected to learn and understand programming in R. And although being open source means it has no cost, R is continuously upgraded and new functions are created on an ongoing basis, which makes it challenging for users to keep up with the latest versions and capabilities. A researcher working on a dissertation therefore needs a good understanding of the R programming language to use it to its full potential.
