Joachim Schork – Author & Founder of Statistics Globe
Hi, my name is Joachim Schork and I’m the guy behind Statistics Globe.
On this page, I’ll give you a brief overview of my background in statistics and explain why I started this website.
Over the last 15 years, I have spent a lot of time improving my skills in statistics and programming. My academic career started with a Bachelor’s degree in Educational Science at the University of Tübingen, Germany.
During my Bachelor’s studies, I fell in love with statistics for the first time. I focused as much as possible on statistical research methods for monitoring students’ skills and assessing educational interventions.
Due to my growing interest in statistics and the corresponding software tools, I decided to focus even more on this field in my Master’s studies. For that reason, I moved to Trier University, where I completed a Master of Survey Statistics and an EMOS Certificate in 2017.
Afterwards, I started a job as a Microdata Expert at STATEC, the national statistical institute of Luxembourg.
This job position was definitely another boost for my statistical skills, since I was able to work on many different data sets such as the Statistics on Income and Living Conditions (SILC), the Labour Force Survey (LFS), or the Business and Leisure Tourism Survey.
In discussions with colleagues and at research conferences, I noticed how important it is to exchange ideas with other statisticians and with researchers from different fields, and this was definitely one of the main reasons why I started Statistics Globe.
While working at STATEC, I also started a side business in online marketing and web design. This business was initially a part-time job, but at the end of 2019 I decided to quit my job at STATEC to dedicate all my time to my own business.
Since then, Statistics Globe has not only been a way to exchange ideas with other programmers and researchers; it also allows me to combine both of my interests (statistics and web design).
Social Media & Contact
As already mentioned, a major goal of this website is to exchange ideas with other statisticians, data scientists, programmers, and researchers from any field. So please let me know in the comments or on social media if you have any questions or topics you want to discuss!
In case you would like to follow me on social media or if you would like to contact me via email, you can find all details below.
- YouTube
- Facebook – Statistics Globe Page
- Facebook – R Programming Group for Discussions & Questions
- Facebook – Python Programming Group for Discussions & Questions
- LinkedIn – Statistics Globe Page
- LinkedIn – R Programming Group for Discussions & Questions
- LinkedIn – Python Programming Group for Discussions & Questions
Hello Jo
We like your portal “Statistics Globe” and I am a frequent visitor.
Could you do some tutorials on debugging functions in R? If you do, please include a layman’s explanation of how to interpret the output of debugging functions, because it is equally technical.
Thanks
Hey Shrinivas,
Nice to hear from you!
I don’t have a tutorial on debug functions yet, but you may have a look at this article: https://statisticsglobe.com/errors-warnings-r
It contains a list of some of the most common error messages in R and explains how to fix these errors.
I hope that helps!
Joachim
Hello Joachim, cool website. Is there a way to convert observations with values from 1% to 100% into a scale from 1 to 5? In other words: everything in this percentage column that lies between 1% and 20% is converted to a 1, everything between 21% and 40% becomes a 2, and so on. I would happily send a tip via PayPal for that.
Hi Daniele,
thank you very much for the kind words!
You can convert your data as follows:
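A minimal sketch with base R's cut(), assuming the percentages are stored as plain numbers (not as text such as "15%") in a column named percent; the data frame my_data and its values are hypothetical. The breaks 0, 20, 40, 60, 80, 100 create right-closed intervals, so (0, 20] becomes 1, (20, 40] becomes 2, and so on, and labels = FALSE returns these class numbers directly instead of a factor.

```r
# Hypothetical example data: one column with percentages between 1 and 100
my_data <- data.frame(percent = c(5, 20, 21, 40, 55, 78, 99, 100))

# Right-closed intervals (0,20], (20,40], (40,60], (60,80], (80,100];
# labels = FALSE returns the interval number (1 to 5) instead of a factor
my_data$scale <- cut(my_data$percent,
                     breaks = c(0, 20, 40, 60, 80, 100),
                     labels = FALSE)

my_data
```

With these breaks, a value such as 20.5 falls into class 2; adjust the breaks if your percentages need different cut points.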
I hope that helps!
If you would like to support me, feel free to have a look at my Patreon account (completely voluntary, of course! 🙂 ): https://www.patreon.com/statisticsglobe
Best regards and happy holidays!
Joachim
Congratulations, Joachim, on your great work on statisticsglobe.com, and a huge ‘Thank you’, as it is really helpful in my studies of statistics and R programming. Very good and detailed explanations, yet kept super simple. I just watched your video on ‘Join Data with dplyr in R’: what a great way of explaining! This 9-minute video has saved me hours of reading, and your graphical explanations have made it much clearer to me. THANK YOU SO MUCH!
Carsten,
Thank you so much for this amazing feedback! I’m very happy to hear that you enjoy my content and that it helps to improve your statistics and R programming skills.
Don’t hesitate to let me know in the comments in case you have any questions in your future learning progress.
Regards,
Joachim
Dear Joachim,
You are doing a great job. It’s really very helpful for everyone.
Hi Kamal,
Thanks a lot for your kind words! It’s very motivating to get such positive feedback! 🙂
Regards
Joachim
Fuzzy clustering:
I want to do time-series fuzzy clustering in R.
I want to use my own data instead of CharTraj in this code, but in the same format as CharTraj.
data <-read.csv(file.choose(),sep = ',')
library("dtwclust")
data("uciCT")
# Calculate autocorrelation up to 50th lag
acf_fun <- function(series, ...) {
lapply(series, function(x) {
as.numeric(acf(x, lag.max = 50, plot = FALSE)$acf)
})
}
# Fuzzy c-means
fc <- tsclust(CharTraj[1:25], type = "f", k = 4L,
preproc = acf_fun, distance = "L2",
seed = 42)
# Fuzzy membership matrix
fc@fcluster
Example results:
## cluster_1 cluster_2 cluster_3 cluster_4
## A.V1 0.944079794 0.010596054 0.020895926 0.0244282262
## A.V2 0.973024707 0.004558053 0.009814713 0.0126025278
## A.V3 0.910457782 0.013363454 0.026818391 0.0493603740
## A.V4 0.487954179 0.212700292 0.219111649 0.0802338802
## A.V5 0.557762811 0.172923239 0.188579412 0.0807345380
## B.V1 0.128665544 0.034803979 0.082738850 0.7537916278
## B.V2 0.010999524 0.002277317 0.004997756 0.9817254027
## B.V3 0.197222739 0.033052784 0.061935472 0.7077890056
## B.V4 0.166409909 0.031366546 0.050323544 0.7519000007
## B.V5 0.427121633 0.235092628 0.187510917 0.1502748225
## C.V1 0.311652169 0.047492672 0.197978128 0.4428770302
## C.V2 0.007458354 0.002748052 0.986187858 0.0036057365
## C.V3 0.075206881 0.051338895 0.840850637 0.0326035878
## C.V4 0.340863672 0.055549042 0.357239701 0.2463475850
## C.V5 0.015607418 0.006151640 0.970146090 0.0080948526
## D.V1 0.017714824 0.958605028 0.016256793 0.0074233544
## D.V2 0.047929862 0.903236104 0.030495920 0.0183381136
## D.V3 0.002225743 0.994942451 0.001865065 0.0009667418
## D.V4 0.004954758 0.988846881 0.004040801 0.0021575597
## D.V5 0.018867912 0.954708141 0.017683168 0.0087407796
Hi Ahmed,
Thank you for your comment. Do you have a specific question? 🙂
Regards
Joachim
How can I do time-series fuzzy clustering for data with 25 columns?
In addition, which code converts my data from columns to lists?
Or
which code can I use to create a data frame where a column is a list (R Create Data Frame where a Column is a List | Different Variable Types)?
I’m not an expert for Fuzzy Clustering, but this tutorial seems to explain it quite well: https://cran.r-project.org/web/packages/ppclust/vignettes/fcm.html
Here you can learn how to convert columns to lists: https://statisticsglobe.com/convert-data-frame-columns-to-list-elements-in-r
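A short sketch of the conversion, assuming each of the 25 columns is one time series; the simulated data frame my_data is a hypothetical stand-in for the CSV file. Because a data frame is internally a list of columns, as.list() already gives the list-of-series format that CharTraj uses. The tsclust() call in the comment simply mirrors the code from the question.

```r
# Hypothetical stand-in for the data read with read.csv():
# 100 time points (rows) for 25 series (columns)
set.seed(42)
my_data <- as.data.frame(matrix(rnorm(100 * 25), ncol = 25))

# A data frame is a list of columns, so as.list() returns
# one numeric vector per column, named after the columns
series <- as.list(my_data)

# If instead each ROW is one series, split the matrix by rows
m <- as.matrix(my_data)
series_by_row <- split(m, row(m))

# The list can then replace CharTraj[1:25] in the code from the question, e.g.
# tsclust(series, type = "f", k = 4L, preproc = acf_fun,
#         distance = "L2", seed = 42)
length(series)
```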
Regards
Joachim
Hi Joachim, your content is very helpful :). I have a large data set (with 11 variables) in a .txt file, and when I use read.table("data.txt", ….) I do not like how R has read the data. Please see head(data) below. How can I ensure that the data are read properly as a data frame with 11 variables? Your ideas are greatly appreciated, thanks
combined head(data)
SNP.Name.Sample.ID.Allele1...Forward.Allele2...Forward.Allele1...Top.Allele2...Top.Allele1...AB.Allele2...AB.GC.Score.X.Y
1 1-65462706-C-G-rs43237859 JERGBRF000000051557 C C C C A A 0 2250 523
2 1-69673871-C-T-rs209885271 JERGBRF000000051557 C C G G B B 0 639 2749
3 1-69756947-C-G-rs469945562 JERGBRF000000051557 C C C C A A 0 1405 455.4
4 1-69832336-G-A-rs209887380 JERGBRF000000051557 G G G G B B 0 1159 6483
5 11-19037605-G-A-rs381309800 JERGBRF000000051557 A G A G A B 0 1878 2468
6 11-19079043-C-T-rs208164936 JERGBRF000000051557 T C A G A B 0 1710 1730
Hi again Elsie,
Thank you for the kind words! 🙂
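Regarding your question: It seems like you have to change the separator, i.e. the sep argument of the read.table function. Since R has put all variables into a single column, your file is probably tab-delimited, so you could try read.table("data.txt", header = TRUE, sep = "\t") or read.delim("data.txt"), which uses a tab as separator by default.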
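A self-contained sketch of the tab-separated case. It assumes data.txt is tab-delimited, which the single combined column name in the head(data) output suggests; the small temporary file below is a hypothetical stand-in built from the first three rows shown in the question.

```r
# Hypothetical stand-in for data.txt: tab-delimited, header row + 3 data rows
path <- tempfile(fileext = ".txt")
header <- c("SNP Name", "Sample ID", "Allele1 - Forward", "Allele2 - Forward",
            "Allele1 - Top", "Allele2 - Top", "Allele1 - AB", "Allele2 - AB",
            "GC Score", "X", "Y")
rows <- list(
  c("1-65462706-C-G-rs43237859", "JERGBRF000000051557",
    "C", "C", "C", "C", "A", "A", "0", "2250", "523"),
  c("1-69673871-C-T-rs209885271", "JERGBRF000000051557",
    "C", "C", "G", "G", "B", "B", "0", "639", "2749"),
  c("1-69756947-C-G-rs469945562", "JERGBRF000000051557",
    "C", "C", "C", "C", "A", "A", "0", "1405", "455.4")
)
writeLines(c(paste(header, collapse = "\t"),
             vapply(rows, paste, character(1), collapse = "\t")), path)

# sep = "\t" splits each line at the tabs instead of reading it as one field
snp_data <- read.table(path, header = TRUE, sep = "\t")

# read.delim() does the same: its defaults are header = TRUE and sep = "\t"
snp_data2 <- read.delim(path)

str(snp_data)
```

With header = TRUE, R turns the column names into syntactic names, so "Allele1 - Forward" becomes Allele1...Forward; adding check.names = FALSE keeps the original names instead.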
Does this help?
Joachim
Hi Joachim, I want to make a time series plot of 8 lines and one scatter plot in the same graph. My data is in long format, i.e. 2 columns: one for the dates and the other for the output (lines and scatter plot points). How do I do this? I know how to plot the lines and how to draw a scatter plot separately, but how can I combine both in one plot?
Hi Mahesh,
I recommend using the ggplot2 package. With this package, you can add lines and points to the same graph using the geom_line and geom_point functions.
You can find more info in the following tutorials:
https://statisticsglobe.com/draw-ggplot2-plot-with-lines-and-points-in-r
https://statisticsglobe.com/draw-multiple-time-series-in-same-plot-in-r
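A minimal sketch of this approach. Two columns alone cannot tell R which values belong to which of the 8 lines, so the example assumes a third column, series, that names each series; the simulated long_data and the series label "observed" (drawn as points) are hypothetical.

```r
library(ggplot2)

# Hypothetical long data: date, value, and a column naming the series.
# "line_1" to "line_8" are drawn as lines, "observed" as points.
set.seed(1)
dates <- seq(as.Date("2021-01-01"), by = "month", length.out = 12)
series_names <- c(paste0("line_", 1:8), "observed")
long_data <- data.frame(
  date   = rep(dates, times = length(series_names)),
  value  = rnorm(12 * length(series_names)),
  series = rep(series_names, each = 12)
)

p <- ggplot(long_data, aes(x = date, y = value)) +
  # one coloured line per series, excluding the point series
  geom_line(data = subset(long_data, series != "observed"),
            aes(colour = series)) +
  # the remaining series as a scatter layer on the same axes
  geom_point(data = subset(long_data, series == "observed"))

print(p)
```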
I hope that helps!
Joachim
Hi Jo, are there any good packages in R that can be used to interpret statistical output?
Hi Oluwafemi,
This depends very much on the type of analysis and your specific output. What exactly does your output look like?
Regards
Joachim
Hi Joachim. I am rolling two dice with the faces 1:4. Initially, the exercise was to add them up, sum(dice); now I’m asked to modify the function to return the product instead. I’ve tried replacing sum with prod, but it does not seem to work: I get numbers like 5 and 7, which cannot possibly be the product of any two numbers from 1 through 4.
Hey Reco,
Could you share the R code you have used so far?
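For comparison, a minimal sketch of a function that returns the product of two four-sided dice. The original code isn’t shown, so the function name roll and the use of sample() with replace = TRUE are assumptions. All possible products of two numbers from 1 to 4 are 1, 2, 3, 4, 6, 8, 9, 12, and 16, so results like 5 or 7 suggest the sum is still being computed somewhere (for example, an older version of the function is still loaded in the session).

```r
# Hypothetical dice function: two rolls of a four-sided die, multiplied
roll <- function() {
  die  <- 1:4
  dice <- sample(die, size = 2, replace = TRUE)  # two independent rolls
  prod(dice)                                     # product instead of sum(dice)
}

set.seed(1)
rolls <- replicate(10, roll())
rolls
```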
Regards
Joachim
Great job. It’s really very helpful for everyone. Keep it up!
Thank you very much Admasu, glad you think so! 🙂
Hello Joachim Schork, I’m Andrés from Colombia (South America). Thank you so much for this wonderful and comprehensive work of yours; it is truly one of a kind and so helpful for statistics enthusiasts and students worldwide. I discovered your page during the pandemic, and it was one of the best things that happened during these trying times. You make me aspire to be like you. Thank you so much for the hard work you have put into creating this site. I hope you know that you are loved and that we really appreciate your work and knowledge!
Wow, thank you so much for this wonderful feedback and the very kind words Andres! It makes me very happy to hear that! 🙂
I wish you the best of luck for your learning progress and just let me know in case you have any questions.
Greetings from Germany to beautiful Colombia (I’ve been in Bogotá and Cartagena once, and it was a great experience!)
Hi Joachim Schork. Once again, your site helped me quickly find a solution to a problem. Your simple and understandable examples often help out. THANKS FOR YOUR WORK!!!
If you are ever in Russia, come visit 🙂
Hi mate,
This is great to hear! Thank you so much for this wonderful feedback! 🙂
I have always wanted to travel to Russia; hopefully I can do so soon.
Regards
Joachim
Hi Joachim.
I have created a heatmap, but the order of the x-axis and y-axis variables does not match the order in my Excel sheet. How can I reorder the heatmap?
thanks
Hey Muhammad,
Thank you for the interesting question. It has inspired me to write a new tutorial on this topic. You can find it here: https://statisticsglobe.com/order-rows-columns-heatmap-r
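One possible approach with base R’s heatmap(), which may differ from the method in the linked tutorial: by default, heatmap() reorders rows and columns according to a hierarchical clustering dendrogram, and setting Rowv = NA and Colv = NA keeps the order of the input matrix. The matrix mat and its labels are hypothetical.

```r
# Hypothetical matrix in the same row/column order as the spreadsheet
mat <- matrix(1:12, nrow = 3,
              dimnames = list(c("Sample_C", "Sample_A", "Sample_B"),
                              c("Week_1", "Week_2", "Week_3", "Week_4")))

# Rowv = NA and Colv = NA switch off the dendrogram reordering,
# so rows and columns are plotted in the order of the input matrix
hm <- heatmap(mat, Rowv = NA, Colv = NA, scale = "none")

hm$rowInd  # row order used for plotting
hm$colInd  # column order used for plotting
```

Note that heatmap() draws the first row at the bottom of the plot; if the first spreadsheet row should appear at the top, you could pass mat[nrow(mat):1, ] instead.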
Regards,
Joachim
I need to solve this problem, but I can’t.
Can you please help me out?
Q. s1 = rnorm(100)
Calculate the square root of the positive values and store them together with their index.
Hey Sunandan,
You can remove negative values from your data as explained here: https://statisticsglobe.com/remove-negative-values-from-vector-data-frame-r
And then you can calculate the square root as explained here: https://statisticsglobe.com/square-root-in-r-sqrt
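Putting both steps together, a minimal sketch that keeps each positive value’s index alongside its square root; the seed only makes the example reproducible, and the object names pos_index and result are illustrative.

```r
set.seed(123)      # makes the random numbers reproducible
s1 <- rnorm(100)

# Positions (indices) of the positive values
pos_index <- which(s1 > 0)

# Store each index together with the square root of its value
result <- data.frame(index = pos_index,
                     sqrt_value = sqrt(s1[pos_index]))
head(result)
```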
Regards,
Joachim
Hi, Joachim
I have been using RStudio for climate data analysis with .nc (NetCDF) files. However, I get the error "incorrect number of dimensions" when creating an image plot. Could you help me? Here is the code:
Ak = ncvar_get(rai, "pr", start = c(lonind[1],latind[1],1), count = c(length(lonind),length(latind),-1))
CLIMATE = image.plot(Ak[, , 1])
Hey Twork,
Is the error message returned after the first line or the second line of code? What is returned when you run head(Ak) and dim(Ak) ?
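A likely cause, sketched under an assumption: if Ak has only two dimensions, Ak[, , 1] fails with exactly this error. ncdf4’s ncvar_get() drops dimensions of length 1 by default (collapse_degen = TRUE), so this can happen, for example, when the file contains a single time step. The matrix below is a hypothetical stand-in for Ak, and base R’s image() stands in for fields::image.plot(). Passing collapse_degen = FALSE to ncvar_get() would keep the length-1 dimension instead.

```r
# Hypothetical stand-in for Ak: a 2-D result (lon x lat) without a time dimension
Ak <- array(runif(20 * 10), dim = c(20, 10))
dim(Ak)

# Three subscripts on a 2-D object reproduce the error
err <- tryCatch(Ak[, , 1], error = function(e) conditionMessage(e))
err

# Take the first time step only if a third dimension exists
first_slice <- if (length(dim(Ak)) == 3) Ak[, , 1] else Ak
image(first_slice)
```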
Regards,
Joachim
Dear Joachim,
I find your website very useful, but I’m stuck on an R analysis. I have a data frame with a column of countries (150+), and each country has several rows for different years. To create a loop for future calculations for each country, how can I index my data by country? I need one index per country, with the same index for all rows of the same country.
Thank you in advance !
Best regards,
Camille
Hi Camille,
First of all, thank you very much for the very kind words! Glad you find my tutorials helpful! 🙂
Regarding your question, could you please illustrate with an example what the first few rows of your current data look like, and what the data should look like at the end? This would help me understand your question better.
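A minimal sketch of one common approach, assuming a column named country; the data frame my_data and its values are hypothetical. match() against unique() numbers the countries in order of first appearance, while as.integer(factor(my_data$country)) would number them alphabetically instead.

```r
# Hypothetical panel data: several rows (years) per country
my_data <- data.frame(
  country = c("France", "France", "Brazil", "Brazil", "Japan"),
  year    = c(2019, 2020, 2019, 2020, 2020),
  value   = c(1.2, 1.5, 3.1, 2.8, 0.9)
)

# One integer index per country; all rows of a country share that index
my_data$country_id <- match(my_data$country, unique(my_data$country))

# Example loop: one calculation (here: the mean value) per country index
country_means <- numeric(0)
for (id in unique(my_data$country_id)) {
  rows <- my_data[my_data$country_id == id, ]
  country_means[rows$country[1]] <- mean(rows$value)
}
country_means
```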
Regards,
Joachim
Hi Joachim,
Love how you are helping people out! 🙂
I’m not really a statistician, but I work in genetics and am now stuck. I need to do a 5-fold cross-validation with my model, which runs in BGLR. I do not code, so I’m really lost on how to figure this out.
Thank you in advance!
Regards,
Kathrine
Hey Kathrine,
Thanks a lot for the very kind words, glad you like my website! 🙂
Unfortunately, I’m not an expert on this topic. However, I have recently created a Facebook discussion group where people can ask questions about R programming and statistics. Could you post your question there? This way, others can contribute/read as well: https://www.facebook.com/groups/statisticsglobe
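As a rough orientation, a sketch of the general 5-fold cross-validation scheme in base R. It uses simulated marker data and lm() as a stand-in model, because no BGLR code is shown here; the variable names and the data are hypothetical, and the comment inside the loop only outlines where a BGLR fit would go (please check the BGLR documentation for the exact call).

```r
# Hypothetical data: 100 individuals, 20 marker columns (coded 0/1/2), one trait
set.seed(2024)
n <- 100
X <- matrix(rbinom(n * 20, size = 2, prob = 0.3), nrow = n)
y <- as.vector(X[, 1:5] %*% rep(0.5, 5) + rnorm(n))

# Randomly assign each individual to one of 5 folds of equal size
k <- 5
folds <- sample(rep(1:k, length.out = n))

predicted <- numeric(n)
for (fold in 1:k) {
  test <- folds == fold
  # Stand-in model: lm() fitted on the 4 training folds only.
  # For BGLR, a common pattern is to copy y, set the test values to NA,
  # fit the model on that copy and take the test predictions from the fit.
  train_df <- data.frame(y = y[!test], X[!test, ])
  fit <- lm(y ~ ., data = train_df)
  predicted[test] <- predict(fit, newdata = data.frame(X[test, , drop = FALSE]))
}

# Predictive ability per fold: correlation of observed and predicted values
cv_cor <- sapply(1:k, function(f) cor(y[folds == f], predicted[folds == f]))
cv_cor
mean(cv_cor)
```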
Regards,
Joachim
How are you, sir?
I am looking for code to compare five dissimilarity measures by the adjusted Rand index on my data, to find the best one for my data.
Hey Ahmed,
Please excuse the late response. I was on a long holiday so unfortunately I wasn’t able to reply sooner. Still need help with your code?
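In case it is still useful, a sketch in base R. It assumes that the data come with known group labels to compare against, uses the iris data as a stand-in, treats the five dist() methods (euclidean, manhattan, maximum, canberra, and minkowski with p = 3) as the five dissimilarity measures, and clusters with average-linkage hclust(); all of these choices are assumptions. The helper adjusted_rand_index() is hypothetical and implements the Hubert and Arabie formula; the mclust package also provides adjustedRandIndex().

```r
# Adjusted Rand index (Hubert and Arabie) between two partitions x and y
adjusted_rand_index <- function(x, y) {
  tab <- table(x, y)
  n_pairs <- function(n) n * (n - 1) / 2       # same as choose(n, 2)
  sum_cells <- sum(n_pairs(tab))
  sum_rows  <- sum(n_pairs(rowSums(tab)))
  sum_cols  <- sum(n_pairs(colSums(tab)))
  expected  <- sum_rows * sum_cols / n_pairs(sum(tab))
  max_index <- (sum_rows + sum_cols) / 2
  (sum_cells - expected) / (max_index - expected)
}

# Stand-in data with known groups
x     <- as.matrix(iris[, 1:4])
truth <- iris$Species

# Five dissimilarity measures available in base R's dist()
measures <- c("euclidean", "manhattan", "maximum", "canberra", "minkowski")

ari <- sapply(measures, function(m) {
  d  <- dist(x, method = m, p = 3)             # p is only used by "minkowski"
  cl <- cutree(hclust(d, method = "average"), k = 3)
  adjusted_rand_index(truth, cl)
})

# Higher ARI = closer agreement with the known groups
sort(ari, decreasing = TRUE)
```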
Regards,
Joachim
lm_info <- lm(t0.5 ~ AUC, data = data_all), (na.rm = TRUE)
Above is my command in Python/R, and it shows the error below. Can you suggest a solution?
Error: unexpected ',' in "lm_info <- lm(t0.5 ~ AUC, data = data_all),"
Hey Azeem,
Is it an R code or a Python code?
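Assuming it is R code: the error comes from the extra , (na.rm = TRUE) after the closing parenthesis of lm(), which R cannot parse. lm() has no na.rm argument; missing values are handled through na.action, whose default is usually na.omit (taken from options("na.action")). A minimal sketch with a hypothetical data_all:

```r
# Hypothetical data with missing values in both variables
data_all <- data.frame(
  AUC  = c(1.2, 2.5, NA, 4.1, 5.3, 6.0),
  t0.5 = c(0.8, 1.9, 2.7, NA, 4.4, 5.1)
)

# All arguments go inside lm(); rows with NA are dropped via na.action
lm_info <- lm(t0.5 ~ AUC, data = data_all, na.action = na.omit)
summary(lm_info)
```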
Regards,
Cansu
Hi Joachim,
I really appreciate how you are helping people here with Statistics in R.
I have a question about the following:
GMAT scores of applicants to a Business Analytics course
Interval    Frequency
501-550     25
551-600     40
601-650     42
651-700     31
701-750     26
751-800     21
What is the median class of the above GMAT scores?
After the second round of applications, this data table was updated: 13 students were added to the interval 551-600 and 22 students were added to 701-750. What is the modal class of the new data?
I am able to do this with the conventional method, but how would one do this in R?
I am very new to R, so I would really appreciate it if you could help me with this. Thanks!
Hello Sherry,
First of all, thank you so much for your kind words. I am not very familiar with the concepts you mentioned, but based on some research, I came up with the following solution.
For the median class, you can do the following:
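A minimal sketch in base R (the vector names are illustrative): the median class is the first class whose cumulative frequency reaches N/2, here 185/2 = 92.5.

```r
# Class intervals and frequencies from the question
intervals <- c("501-550", "551-600", "601-650", "651-700", "701-750", "751-800")
freq      <- c(25, 40, 42, 31, 26, 21)

# Cumulative frequencies: 25, 65, 107, 138, 164, 185
cum_freq <- cumsum(freq)

# Median class: first class whose cumulative frequency reaches N / 2
median_class <- intervals[which(cum_freq >= sum(freq) / 2)[1]]
median_class   # "601-650"
```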
For modal class, you can do the following in R:
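A minimal sketch in base R (the vector names are illustrative): add the new applicants to their classes and pick the class with the highest frequency. Note that which.max() returns the first class if two classes tie.

```r
# Class intervals and frequencies from the question
intervals <- c("501-550", "551-600", "601-650", "651-700", "701-750", "751-800")
freq      <- c(25, 40, 42, 31, 26, 21)

# Second round: add the new applicants to the respective classes
new_freq <- freq
new_freq[intervals == "551-600"] <- new_freq[intervals == "551-600"] + 13
new_freq[intervals == "701-750"] <- new_freq[intervals == "701-750"] + 22
# new_freq is now 25, 53, 42, 31, 48, 21

# Modal class: the class with the highest frequency
modal_class <- intervals[which.max(new_freq)]
modal_class   # "551-600"
```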
I hope I didn’t make a conceptual mistake; if I did, feel free to adapt the code. If you need anything else, let me know.
Best,
Cansu