Joachim Schork – Author & Founder of Statistics Globe

Hi, my name is Joachim Schork and I’m the guy behind Statistics Globe.

On this page, I’ll give you a brief overview of my background in statistics and explain why I started this platform.

Over the last 15 years, I have spent a lot of time improving my skills in statistics and programming. My academic career started with a Bachelor in Educational Science at the University of Tübingen, Germany.

During my Bachelor’s studies, I fell in love with statistics for the first time. I focused as much as possible on statistical research methods for monitoring students’ skills and assessing educational interventions.

Due to my increasing interest in statistics and the corresponding software tools such as R and Python, I decided to focus even more on this field in my Master’s studies. For that reason, I moved to Trier University, where I completed a Master of Survey Statistics and an EMOS Certificate in 2017.

Afterward, I started a job as a Microdata Expert located at STATEC, the national statistical institute of Luxembourg.

This position was definitely another boost for my statistical skills, since I was able to work with many different data sets such as the Statistics on Income and Living Conditions (SILC), the Labour Force Survey (LFS), and the Business and Leisure Tourism Survey.

In discussions with colleagues and at research conferences, I noticed how important it is to exchange ideas with other statisticians and researchers from different fields, and this was definitely one of the main reasons why I started Statistics Globe.

While working at STATEC, I also started a side business for online marketing and web design. This business was initially a part-time job, but at the end of 2019 I decided to quit my job at STATEC to dedicate all my time to my own business.

Since then, Statistics Globe has been not only a way to exchange ideas with other programmers and researchers, but also a way to combine both of my interests (statistics and web design).

Over the years, Statistics Globe has evolved from a one-person company into a thriving team of statistics and data science enthusiasts. Today, the platform attracts hundreds of thousands of visitors every month, making it a prominent resource in the field.

Furthermore, we’ve expanded our offerings to include comprehensive online courses and consulting services in addition to our free tutorials. These additions aim to empower individuals to enhance their own data science capabilities or support their projects with our expertise.

Social Media & Contact

As already mentioned, a major goal of this platform is to exchange ideas with other statisticians, data scientists, programmers, and researchers from any field. So please let me know if you have questions, topics you wish to discuss, or if you’re interested in any of our services!

In case you would like to follow me on social media or if you would like to contact me via email, you can find all the details below.

46 Comments. Leave new

SHRINIVAS
November 4, 2020 1:40 am

Hello Jo

We like your portal “Statistics Globe” and I am a frequent visitor.
Could you do some tutorials on debug functions in R? If you do, please also include a layman’s explanation of how to interpret the explanations that debug functions give, because these are equally technical.

Thanks

Reply
- Joachim
  November 4, 2020 1:35 pm
  
  Hey Shrinivas,
  
  Nice to hear from you!
  
  I don’t have a tutorial on debug functions yet, but you may have a look at this article: https://statisticsglobe.com/errors-warnings-r
  
  It contains a list of some of the most common error messages in R and provides instructions on how to fix these errors.
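
  As a small illustration of how an error message can be caught and read, the following sketch uses base R's tryCatch() and conditionMessage(). The function safe_log is a hypothetical helper made up for this example; it is not taken from the linked article.

  ```r
  # Hypothetical helper: returns log(x), or NA if log() raises an error
  safe_log <- function(x) {
    tryCatch(
      log(x),
      error = function(e) {
        # conditionMessage() extracts the human-readable part of the error
        message("Caught error: ", conditionMessage(e))
        NA_real_
      }
    )
  }

  safe_log(10)     # works as usual
  safe_log("ten")  # log() fails for character input; the handler returns NA
  ```

  For stepping through a function interactively, base R also provides debug(), browser(), and traceback().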
  
  I hope that helps!
  
  Joachim
  
  Reply
Daniele
December 20, 2020 8:59 pm

Hello Joachim, cool site. Is there a way to convert observations with results from 1% to 100% into a scale from 1 to 5? In other words, everything in the percentage column that falls within 1-20% is converted to a 1, everything between 21-40% becomes a 2, and so on. I would send a tip via PayPal right away for that.

Reply
- Joachim
  December 21, 2020 7:39 am
  Hi Daniele,
  
  Thank you very much for the kind words!
  
  You can convert your data as follows:
  # Create example data
  set.seed(287346)
  x <- paste0(round(runif(20, 1, 100)), "%")
  x
  # [1] "32%" "35%" "64%" "70%" "13%" "23%" "70%" "94%" "95%" "68%" "42%" "32%" "39%" "15%" "54%" "57%" "22%" "34%" "61%" "71%"
  
  # Convert data into groups
  x_new <- rep(NA, length(x))
  x_new[x %in% paste0(1:20, "%")] <- 1
  x_new[x %in% paste0(21:40, "%")] <- 2
  x_new[x %in% paste0(41:60, "%")] <- 3
  x_new[x %in% paste0(61:80, "%")] <- 4
  x_new[x %in% paste0(81:100, "%")] <- 5
  x_new
  # [1] 2 2 4 4 1 2 4 5 5 4 3 2 2 1 3 3 2 2 4 4
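
  A more compact alternative is sketched below. It assumes the percentages can be converted to numbers by stripping the "%" sign, and it uses base R's cut() with breaks every 20 points. The variable name x_num is made up for this example.

  ```r
  # Same example data as above
  set.seed(287346)
  x <- paste0(round(runif(20, 1, 100)), "%")

  # Strip the percent sign and convert to numeric
  x_num <- as.numeric(sub("%", "", x, fixed = TRUE))

  # Bin into five groups: (0,20], (20,40], (40,60], (60,80], (80,100]
  x_new <- cut(x_num, breaks = seq(0, 100, by = 20), labels = FALSE)
  x_new
  ```

  Because the intervals are closed on the right, 20% falls into group 1 and 21% into group 2, which matches the grouping described in the question.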
  I hope this helps!
  
  If you would like to support me, feel free to have a look at my Patreon account (this is of course completely voluntary! 🙂 ): https://www.patreon.com/statisticsglobe
  
  Best regards and happy holidays!
  
  Joachim
  Reply
Carsten Grube
January 30, 2021 2:49 pm

Congratulations, Joachim, on your great work on statisticsglobe.com – and a huge ‘Thank you’, as it is really helpful for my studies of statistics and R programming. The explanations are very good and detailed, and yet super simple. I have just seen your video on ‘Join Data with dplyr in R’ – what a great way of explaining. This 9-minute video has saved me hours of reading, and your graphical explanations have made the topic much clearer to me – THANK YOU SO MUCH!

Reply
- Joachim
  January 30, 2021 3:00 pm
  
  Carsten,
  
  Thank you so much for this amazing feedback! I’m very happy to hear that you enjoy my content and that it helps to improve your statistics and R programming skills.
  
  Don’t hesitate to let me know in the comments in case you have any questions in your future learning progress.
  
  Regards,
  
  Joachim
  
  Reply
Dr. Kamal Nain Kapoor
January 30, 2021 6:33 pm

Dear Joachim,
You are doing a great job. It’s really very helpful for all.

Reply
- Joachim
  February 1, 2021 7:10 am
  
  Hi Kamal,
  
  Thanks a lot for your kind words! It’s very motivating to get such positive feedback! 🙂
  
  Regards
  
  Joachim
  
  Reply
ahmed
March 24, 2021 9:34 pm

Fuzzy clustering:
I want to do fuzzy clustering of time-series data in R.

I would like to use my own data instead of CharTraj in the code below, but in the same format as CharTraj.

data <-read.csv(file.choose(),sep = ',')

library("dtwclust")
data("uciCT")

# Calculate autocorrelation up to 50th lag
acf_fun <- function(series, ...) {
  lapply(series, function(x) {
    as.numeric(acf(x, lag.max = 50, plot = FALSE)$acf)
  })
}

# Fuzzy c-means
fc <- tsclust(CharTraj[1:25], type = "f", k = 4L,
              preproc = acf_fun, distance = "L2",
              seed = 42)

# Fuzzy membership matrix
fc@fcluster

example results

## cluster_1 cluster_2 cluster_3 cluster_4
## A.V1 0.944079794 0.010596054 0.020895926 0.0244282262
## A.V2 0.973024707 0.004558053 0.009814713 0.0126025278
## A.V3 0.910457782 0.013363454 0.026818391 0.0493603740
## A.V4 0.487954179 0.212700292 0.219111649 0.0802338802
## A.V5 0.557762811 0.172923239 0.188579412 0.0807345380
## B.V1 0.128665544 0.034803979 0.082738850 0.7537916278
## B.V2 0.010999524 0.002277317 0.004997756 0.9817254027
## B.V3 0.197222739 0.033052784 0.061935472 0.7077890056
## B.V4 0.166409909 0.031366546 0.050323544 0.7519000007
## B.V5 0.427121633 0.235092628 0.187510917 0.1502748225
## C.V1 0.311652169 0.047492672 0.197978128 0.4428770302
## C.V2 0.007458354 0.002748052 0.986187858 0.0036057365
## C.V3 0.075206881 0.051338895 0.840850637 0.0326035878
## C.V4 0.340863672 0.055549042 0.357239701 0.2463475850
## C.V5 0.015607418 0.006151640 0.970146090 0.0080948526
## D.V1 0.017714824 0.958605028 0.016256793 0.0074233544
## D.V2 0.047929862 0.903236104 0.030495920 0.0183381136
## D.V3 0.002225743 0.994942451 0.001865065 0.0009667418
## D.V4 0.004954758 0.988846881 0.004040801 0.0021575597
## D.V5 0.018867912 0.954708141 0.017683168 0.0087407796

Reply
- Joachim
  March 25, 2021 6:50 am
  
  Hi Ahmed,
  
  Thank you for your comment. Do you have a specific question? 🙂
  
  Regards
  
  Joachim
  
  Reply
ahmed
March 25, 2021 6:06 pm

How can I do fuzzy clustering of time-series data when my data has 25 columns?

In addition, which code converts the columns of my data into lists?
Or
which code can I use to create a data frame where a column is a list (R Create Data Frame where a Column is a List | Different Variable Types)?

Reply
- Joachim
  March 26, 2021 8:35 am
  
  I’m not an expert for Fuzzy Clustering, but this tutorial seems to explain it quite well: https://cran.r-project.org/web/packages/ppclust/vignettes/fcm.html
  
  Here you can learn how to convert columns to lists: https://statisticsglobe.com/convert-data-frame-columns-to-list-elements-in-r
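
  As a minimal sketch of that conversion, the code below turns the columns of a data frame into a list and applies the autocorrelation preprocessing from the question. The example data (three random-walk series of length 60) is made up; your own data would have 25 columns.

  ```r
  # Hypothetical example data: 3 time series stored as columns
  set.seed(1)
  data <- data.frame(ts1 = cumsum(rnorm(60)),
                     ts2 = cumsum(rnorm(60)),
                     ts3 = cumsum(rnorm(60)))

  # A data frame is already a list of columns; as.list() makes that explicit
  series_list <- as.list(data)

  # Same preprocessing idea as in the question: autocorrelation up to lag 50
  acf_fun <- function(series, ...) {
    lapply(series, function(x) {
      as.numeric(acf(x, lag.max = 50, plot = FALSE)$acf)
    })
  }
  acf_list <- acf_fun(series_list)
  lengths(acf_list)  # one value per lag (lag 0 to lag 50) for each series
  ```

  The resulting series_list could then be tried in place of CharTraj[1:25] in the tsclust() call from the question. I have not tested that step here.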
  
  Regards
  
  Joachim
  
  Reply
Elsie
April 20, 2021 8:42 pm

Hi Joachim, your content is very helpful :). I have a large data set (with 11 variables) in a .txt file, and when I use read.table("data.txt", ...), I do not like how R reads the data. Please see head(data) below. How can I ensure that the data is read properly as a data frame or table with 11 columns/variables? Your ideas are greatly appreciated, thanks

combined head(data)
SNP.Name.Sample.ID.Allele1...Forward.Allele2...Forward.Allele1...Top.Allele2...Top.Allele1...AB.Allele2...AB.GC.Score.X.Y
1 1-65462706-C-G-rs43237859 JERGBRF000000051557 C C C C A A 0 2250 523
2 1-69673871-C-T-rs209885271 JERGBRF000000051557 C C G G B B 0 639 2749
3 1-69756947-C-G-rs469945562 JERGBRF000000051557 C C C C A A 0 1405 455.4
4 1-69832336-G-A-rs209887380 JERGBRF000000051557 G G G G B B 0 1159 6483
5 11-19037605-G-A-rs381309800 JERGBRF000000051557 A G A G A B 0 1878 2468
6 11-19079043-C-T-rs208164936 JERGBRF000000051557 T C A G A B 0 1710 1730

Reply
- Joachim
  April 23, 2021 9:53 am
  Hi again Elsie,
  
  Thank you for the kind words! 🙂
  
  Regarding your question: It seems like you have to change the separator, i.e. the sep argument within the read.table function.
  
  For example, you could try the following codes:
  read.table ("data.txt", sep = ",")
  or
  read.table ("data.txt", sep = ";")
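
  To illustrate the effect of the sep argument, the sketch below writes a tiny file and reads it twice. The file contents and the tab separator are assumptions for this example; your file may use a different delimiter.

  ```r
  # Hypothetical tab-separated file with a header row, written to a temp file
  tmp <- tempfile(fileext = ".txt")
  writeLines(c("SNP.Name\tSample.ID\tX\tY",
               "snp1\tsampleA\t2250\t523",
               "snp2\tsampleA\t639\t2749"),
             tmp)

  # Separator that does not occur in the file: each line ends up in one column
  wrong <- read.table(tmp, header = TRUE, sep = ",")
  ncol(wrong)

  # Matching separator: one column per variable
  ok <- read.table(tmp, header = TRUE, sep = "\t")
  ncol(ok)
  str(ok)
  ```

  If the number of columns is 1 after reading, the separator most likely does not match the file.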
  Does this help?
  
  Joachim
  Reply
Mahesh Doshi
May 19, 2021 10:35 am

Hi Joachim, I want to make a time series plot of 8 lines and one scatter plot on the same graph. My data is in long format, i.e. it has 2 columns: one for the dates and the other for the output (lines and scatter plot points). How do I do this? I know how to plot the lines and how to draw a scatter plot separately, but how do I do both in one plot?

Reply
- Joachim
  May 20, 2021 8:23 am
  
  Hi Mahesh,
  
  I recommend using the ggplot2 package. Based on this package, you can add lines and points in the same graph using the geom_line and geom_point functions.
  
  You can find more info in the following tutorials:
  
  https://statisticsglobe.com/draw-ggplot2-plot-with-lines-and-points-in-r
  https://statisticsglobe.com/draw-multiple-time-series-in-same-plot-in-r
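
  A minimal sketch of this idea is shown below. The data frames lines_df and points_df and their column names are made up for illustration; with your data, the series column would be whatever variable distinguishes the 8 lines.

  ```r
  library(ggplot2)

  # Hypothetical long-format data: 8 line series plus one series shown as points
  set.seed(123)
  dates <- seq(as.Date("2021-01-01"), by = "month", length.out = 12)
  lines_df <- data.frame(
    date   = rep(dates, times = 8),
    series = rep(paste0("line_", 1:8), each = 12),
    value  = rnorm(96)
  )
  points_df <- data.frame(date = dates, value = rnorm(12, mean = 3))

  # Each geom gets its own data set, so lines and points share one panel
  p <- ggplot() +
    geom_line(data = lines_df, aes(x = date, y = value, colour = series)) +
    geom_point(data = points_df, aes(x = date, y = value))

  # print(p) draws the plot
  ```

  If everything is stored in a single data frame, the same can be achieved by passing a subset of that data frame to the data argument of each geom.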
  
  I hope that helps!
  
  Joachim
  
  Reply
- Oluwafemi
  July 18, 2021 3:06 pm
  
  Hi Jo, please are there any good packages in R that can be used to interpret the statistical output?
  
  Reply
  - Joachim
    August 2, 2021 10:02 am
    
    Hi Oluwafemi,
    
    This depends very much on the type of analysis and your specific output. What exactly does your output look like?
    
    Regards
    
    Joachim
    
    Reply
Reco
June 30, 2021 5:32 pm

Hi Joachim. I am rolling two dice with faces 1:4. Initially the exercise was to add the results, sum(dice); now I’m being asked to modify the function to get the product. I’ve tried replacing sum with prod, but it does not seem to work. I get numbers like 5 and 7, which cannot possibly be the product of any two numbers from 1 through 4.

Reply
- Joachim
  July 1, 2021 1:15 pm
  
  Hey Reco,
  
  Could you share the R code you have used so far?
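
  In the meantime, here is a minimal sketch of how the product of two dice could be computed. The function roll_product and its arguments are made up for this example and may differ from the function in your exercise.

  ```r
  # Hypothetical helper: roll n_dice dice with faces 1:4 and return the product
  roll_product <- function(n_dice = 2, faces = 1:4) {
    dice <- sample(faces, size = n_dice, replace = TRUE)
    prod(dice)  # prod() multiplies the values, sum() would add them
  }

  set.seed(42)
  rolls <- replicate(1000, roll_product())
  sort(unique(rolls))  # only products of two numbers from 1:4 can occur
  ```

  Values such as 5 or 7 would suggest that sum() is still being called somewhere, or that the product is taken over something other than the two rolled values.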
  
  Regards
  
  Joachim
  
  Reply
Admasu
August 2, 2021 2:56 pm

A great job. It’s really very helpful for all. Keep it up!

Reply
- Joachim
  August 2, 2021 3:00 pm
  
  Thank you very much Admasu, glad you think so! 🙂
  
  Reply
Andres
September 4, 2021 4:47 pm

Hello Joachim Schork, I’m Andrés from Colombia (South America). Thank you so much for this wonderful and comprehensive work of yours. It truly is one of a kind and so helpful for statistics enthusiasts and students worldwide. I discovered your page during the pandemic, and it was one of the best things that happened during these trying times. You make me want to aspire to be like you. Thank you so much for the hard work that you have put into creating this site. I hope you know that you are loved and that we are very appreciative of your work and knowledge!

Reply
- Joachim
  September 6, 2021 6:37 am
  
  Wow, thank you so much for this wonderful feedback and the very kind words Andres! It makes me very happy to hear that! 🙂
  
  I wish you the best of luck for your learning progress and just let me know in case you have any questions.
  
  Greetings from Germany to beautiful Colombia (I’ve been in Bogotá and Cartagena once, and it was a great experience!)
  
  Reply
Алексей Соловьев
October 8, 2021 7:38 am

Hi Joachim Schork. Once again, your site helped quickly find a solution to a problem. Your simple and understandable examples often help out. THANKS FOR YOUR WORK !!!
If you will be in Russia, come visit 🙂

Reply
- Joachim
  October 8, 2021 8:18 am
  
  Hi mate,
  
  This is great to hear! Thank you so much for this wonderful feedback! 🙂
  
  I always wanted to travel in Russia, hopefully I can do this soon.
  
  Regards
  
  Joachim
  
  Reply
Muhammad Nafees
December 2, 2021 4:27 pm

Hi Joachim.
I have created a heatmap, but the sequences of x-axis and y-axis variables are not organized like in the excel sheet. How can I organize the heatmap?
thanks

Reply
- Joachim
  December 3, 2021 7:44 am
  
  Hey Muhammad,
  
  Thank you for the interesting question. It has inspired me to write a new tutorial on this topic. You can find it here: https://statisticsglobe.com/order-rows-columns-heatmap-r
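
  Independently of that tutorial, one common reason for a changed order is that character variables are converted to factors with alphabetically sorted levels. The sketch below, which uses made-up data and column names, fixes the levels to the order of first appearance. Whether this applies depends on how your heatmap was created.

  ```r
  # Hypothetical long-format heatmap data, in the order it appears in the sheet
  df <- data.frame(
    row_var = c("zinc", "iron", "copper", "zinc", "iron", "copper"),
    col_var = c("week2", "week2", "week2", "week10", "week10", "week10"),
    value   = c(1.2, 0.4, 2.1, 0.9, 1.7, 0.3)
  )

  # Default factor levels are sorted alphabetically
  levels(factor(df$row_var))

  # Fix the levels to the order of first appearance instead
  df$row_var <- factor(df$row_var, levels = unique(df$row_var))
  df$col_var <- factor(df$col_var, levels = unique(df$col_var))
  levels(df$row_var)
  levels(df$col_var)
  ```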
  
  Regards,
  Joachim
  
  Reply
Sunandan
December 14, 2021 9:09 pm

I need to solve this problem, but I can’t.

Can you please help me out?
Q. s1 = rnorm(100)
Calculate the square root of the positive values and store it together with the index.

Reply
- Joachim
  December 15, 2021 7:39 am
  
  Hey Sunandan,
  
  You can remove negative values from your data as explained here: https://statisticsglobe.com/remove-negative-values-from-vector-data-frame-r
  
  And then you can calculate the square root as explained here: https://statisticsglobe.com/square-root-in-r-sqrt
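
  Put together, a minimal sketch could look as follows. It uses which() instead of removing values, so that the original positions are kept. The names idx and res are made up for this example.

  ```r
  set.seed(1)
  s1 <- rnorm(100)

  # Positions of the positive values in s1
  idx <- which(s1 > 0)

  # Square roots of the positive values, stored together with their index
  res <- data.frame(index = idx, sqrt_value = sqrt(s1[idx]))
  head(res)
  ```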
  
  Regards,
  Joachim
  
  Reply
Twork
April 18, 2022 11:31 am

Hi, Joachim

I have been using RStudio for climate data analysis with files in .nc format. However, I got the error “incorrect number of dimensions” when drawing an image plot. Could you help me? Here is the code:
Ak = ncvar_get(rai, "pr", start = c(lonind[1],latind[1],1), count = c(length(lonind),length(latind),-1))
CLIMATE = image.plot(Ak[, , 1])

Reply
- Joachim
  April 19, 2022 7:19 am
  
  Hey Twork,
  
  Is the error message returned after the first line or the second line of code? What is returned when you run head(Ak) and dim(Ak) ?
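
  For illustration, the following sketch shows how this error can arise when an object has fewer dimensions than the indexing assumes. The arrays Ak_3d and Ak_2d are made-up stand-ins for what ncvar_get() might return; I don't know the dimensions of your actual data.

  ```r
  # Hypothetical stand-ins for the object returned by ncvar_get()
  Ak_3d <- array(1:24, dim = c(4, 3, 2))  # lon x lat x time
  Ak_2d <- matrix(1:12, nrow = 4)         # lon x lat only

  dim(Ak_3d)
  dim(Ak_2d)

  # Three indices work for a 3-dimensional array
  slice <- Ak_3d[, , 1]

  # The same indexing on a 2-dimensional matrix raises the error
  msg <- tryCatch(Ak_2d[, , 1], error = function(e) conditionMessage(e))
  msg
  ```

  As far as I know, ncvar_get() drops dimensions of length one by default (argument collapse_degen), so a file with a single time step could yield a matrix. In that case, image.plot(Ak) without the third index might be enough.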
  
  Regards,
  Joachim
  
  Reply
Camille
April 25, 2022 1:13 pm

Dear Joachim,

I find your website very useful, but I’m stuck on an R analysis. I have a data frame with a column of countries (150+), and each country has several rows for different years. In order to create a loop for future calculations for each country, how can I index my data by country? I need one index per country, and the same index for all rows belonging to the same country.

Thank you in advance !

Best regards,
Camille

Reply
- Joachim
  April 25, 2022 2:15 pm
  
  Hi Camille,
  
  First of all, thank you very much for the very kind words! Glad you find my tutorials helpful! 🙂
  
  Regarding your question, could you please illustrate with an example what the first few rows of your current data look like, and what the data should look like at the end? This would help to understand your question better.
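
  In the meantime, assuming the goal is one integer ID per country that is shared by all rows of that country, a sketch could look like this. The data frame and its column names are made up for this example.

  ```r
  # Hypothetical data: several yearly rows per country
  df <- data.frame(
    country = c("France", "France", "Peru", "Kenya", "Peru", "Kenya"),
    year    = c(2019, 2020, 2019, 2019, 2020, 2020)
  )

  # One integer per country, identical for all rows of that country
  df$country_id <- match(df$country, unique(df$country))
  df

  # Loop over the countries via the index
  for (i in sort(unique(df$country_id))) {
    sub_df <- df[df$country_id == i, ]
    # ... per-country calculations would go here ...
  }
  ```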
  
  Regards,
  Joachim
  
  Reply
Kathrine
August 17, 2022 7:45 am

Hi Joachim,

Love how you are helping people out! 🙂

I’m not really a statistician, but I work in genetics and am now stuck. I need to do a 5-fold cross-validation with my model, which I run with BGLR. I do not code, so I’m really lost as to how to figure this out.

Thank you in advance!

Regards,
Kathrine

Reply
- Joachim
  August 17, 2022 7:46 am
  
  Hey Kathrine,
  
  Thanks a lot for the very kind words, glad you like my website! 🙂
  
  Unfortunately, I’m not an expert on this topic. However, I have recently created a Facebook discussion group where people can ask questions about R programming and statistics. Could you post your question there? This way, others can contribute/read as well: https://www.facebook.com/groups/statisticsglobe
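
  As a generic starting point, the sketch below shows the mechanics of a 5-fold cross-validation in base R. It is not specific to BGLR: the data is simulated, and lm() is only a stand-in for the actual model.

  ```r
  # Hypothetical data; lm() is only a stand-in for the actual model
  set.seed(2022)
  n   <- 100
  dat <- data.frame(x = rnorm(n))
  dat$y <- 2 * dat$x + rnorm(n)

  # Randomly assign each observation to one of 5 folds of equal size
  k     <- 5
  folds <- sample(rep(1:k, length.out = n))

  # Fit on 4 folds, predict the held-out fold, and store the predictions
  pred <- rep(NA_real_, n)
  for (f in 1:k) {
    test_rows <- folds == f
    fit <- lm(y ~ x, data = dat[!test_rows, ])
    pred[test_rows] <- predict(fit, newdata = dat[test_rows, ])
  }

  # Correlation between cross-validated predictions and observed values
  cor(pred, dat$y)
  ```

  For BGLR, my understanding is that a common approach is to set the phenotypes of the held-out fold to NA before fitting and to read the predictions from the fitted object. Please check the package documentation for the details.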
  
  Regards,
  Joachim
  
  Reply
ahmed
October 7, 2022 4:18 pm

how are you sir
I am looking for code to compare five dissimilarity measures using the adjusted Rand index on my data, in order to find the one that fits my data best.

Reply
- Joachim
  November 14, 2022 11:55 am
  
  Hey Ahmed,
  
  Please excuse the late response. I was on a long holiday so unfortunately I wasn’t able to reply sooner. Still need help with your code?
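
  In case it is still useful, here is a minimal sketch of such a comparison. Everything in it is an assumption for illustration: the iris data with Species as reference partition, hierarchical clustering with average linkage, and five measures from dist(). The function adj_rand is a small base R implementation of the adjusted Rand index written for this example.

  ```r
  # Adjusted Rand index between two partitions (base R implementation)
  adj_rand <- function(a, b) {
    tab  <- table(a, b)
    n2   <- choose(sum(tab), 2)
    s_ij <- sum(choose(tab, 2))
    s_a  <- sum(choose(rowSums(tab), 2))
    s_b  <- sum(choose(colSums(tab), 2))
    expected <- s_a * s_b / n2
    (s_ij - expected) / ((s_a + s_b) / 2 - expected)
  }

  # Hypothetical setup: iris measurements, with Species as reference partition
  x     <- scale(iris[, 1:4])
  truth <- iris$Species

  # Five dissimilarity measures available in dist()
  measures <- c("euclidean", "manhattan", "maximum", "canberra", "minkowski")

  ari <- sapply(measures, function(m) {
    d  <- dist(x, method = m, p = 3)  # p is only used by "minkowski"
    cl <- cutree(hclust(d, method = "average"), k = 3)
    adj_rand(cl, truth)
  })
  sort(ari, decreasing = TRUE)
  ```

  The measure with the highest value agrees best with the reference partition under these particular choices. A different clustering method or number of clusters may change the ranking.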
  
  Regards,
  Joachim
  
  Reply
azeem khan
December 17, 2022 5:53 pm

lm_info <- lm(t0.5 ~ AUC, data = data_all), (na.rm = TRUE)

Above is my command in Python R, and it shows the error below. Can you suggest a solution?

Error: unexpected ',' in "lm_info <- lm(t0.5 ~ AUC, data = data_all),"

Reply
- Cansu (Statistics Globe)
  December 19, 2022 4:12 pm
  
  Hey Azeem,
  
  Is this R code or Python code?
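
  If it is R code, the error is most likely caused by the part ", (na.rm = TRUE)", which sits outside the closing parenthesis of lm(). In addition, lm() has no na.rm argument; missing values are handled through na.action. A minimal sketch with made-up data (only the column names are taken from your command):

  ```r
  # Hypothetical data with missing values, using the column names from the question
  data_all <- data.frame(
    AUC  = c(10.2, 12.5, NA, 15.1, 18.3, 20.0),
    t0.5 = c(1.1, 1.4, 1.6, NA, 2.2, 2.5)
  )

  # Rows with missing values are dropped via na.action
  lm_info <- lm(t0.5 ~ AUC, data = data_all, na.action = na.omit)
  summary(lm_info)
  nobs(lm_info)  # number of complete rows used for the fit
  ```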
  
  Regards,
  Cansu
  
  Reply
Sherry
August 1, 2023 11:27 am

Hi Joachim,
I really appreciate how you are helping people here with Statistics in R.

I have a doubt with:

GMAT scores of applicants to a Business Analytics course

Interval    Corresponding Frequency
501-550     25
551-600     40
601-650     42
651-700     31
701-750     26
751-800     21

What is the median class of the above GMAT scores?

After the second round of applications, this data table was updated. 13 students were added to the interval 551-600 and 22 students were added to 701-750. What is the modal class of the new data?
I am able to do this by the conventional method. But how would one do this in R?

I am very new to R, so I would really appreciate it if you could help me with this. Thanks!

Reply
- Cansu (Statistics Globe)
  August 8, 2023 9:25 am
  Hello Sherry,
  
  First of all, thank you so much for your kind words. I am not so familiar with the concepts that you mentioned. Based on my small research, I could come up with the following solution.
  
  For the median class, you can do the following:
  # define data
  intervals <- c("501-550", "551-600", "601-650", "651-700", "701-750", "751-800")
  frequencies <- c(25, 40, 42, 31, 26, 21)
  
  # calculate cumulative frequencies
  cumulative_frequencies <- cumsum(frequencies)
  cumulative_frequencies
  # [1] 25 65 107 138 164 185
  
  # find median position
  median_position <- sum(frequencies) / 2
  # [1] 92.5
  
  # position of median class
  median_class_index <- which(cumulative_frequencies >= median_position)[1]
  
  # retrieve respective interval
  median_class <- intervals[median_class_index]
  median_class
  # [1] "601-650"
  For modal class, you can do the following in R:
  # Update data
  frequencies[2] <- frequencies[2] + 13  # For 551-600
  frequencies[5] <- frequencies[5] + 22  # For 701-750
  
  # position of modal class
  modal_class_index <- which.max(frequencies)
  
  # retrieve respective interval
  modal_class <- intervals[modal_class_index]
  modal_class
  # [1] "551-600"
  I hope I didn’t make a conceptual mistake. You can play with the code to correct it, if so. If you need something additional, let me know.
  
  Best,
  Cansu
  Reply
Adekunle Odunayo Ajiboye
October 11, 2023 8:17 pm

Hi. Thanks so much for the code you put up here. I don’t know why the data points are not showing on my PCA plots. I have 20 samples with 34034 gene counts each.

Reply
- Cansu (Statistics Globe)
  October 12, 2023 9:15 am
  
  Hello,
  
  It is so hard to help with this little information. Could you please explain your dataset and share the code that you use?
  
  Best,
  Cansu
  
  Reply
stinkyfoot gopherson
January 12, 2024 4:17 pm

hi jo can i get a shout out in your next video?

Reply
- Joachim (Statistics Globe)
  January 15, 2024 6:08 am
  
  Haha, thanks for the comment, but unfortunately that’s not possible. 😀
  
  Reply
