Talk:Distributed file system for cloud/Archive 1


First contact

Hello,

I'm your reviewer for this module, so I will be giving you regular feedback on your article. As your article is in English, perhaps it would be better if we exchanged in English? It's as you prefer.

On my first reading, I found your article interesting. It is rather well written and contains interesting information. However, your two examples (GFS and HDFS) take up a disproportionate share of the article as a whole. I think these are not really two examples but two full descriptions of what is available.

I wonder why you chose to write your article in English. Was it a deliberate choice? Is it because it's easier to find links to other information? Or because some specific terms are difficult to translate into French?


Specific remarks:

  • It could be useful to add links for generic terms such as cloud computing, scalability, availability, cluster… so that readers can reach information more specific to their needs. At the same time, be careful with your links: if a word or an acronym appears in red, the link is broken, as in this example:

"It started running on the DPD-10 in the end of 1973."

  • Since I wrote on the draft's talk page in French, you have added some references, especially in the History paragraph. That's an improvement.
  • Your article needs more illustrations, but I guess you know that. I also find your first illustration, in the load balancing paragraph, hard to read.
  • Some acronyms could be defined at the end of the article rather than in the text, for example:

NFS (network file system).

I could raise other points, but as this is still a draft, I'll wait to see what comes next.

Bruno Mérigard (talk) 17:33, 6 January 2014 (UTC)

Answer

Hi, thank you for reading the article and for your comments. I appreciate that you are communicating with us in English. We chose to write in English because most scientific content is published in English. It was our personal choice, also made to improve our level of English.

We will take your remarks into consideration and will of course be interested in your next comments. Rabab —Preceding undated comment added 21:18, 6 January 2014 (UTC)


Second contact

Hi, no problem continuing in English. I'm still wondering whether I will present your article in English too.

I agree that it's easier to find information in English, and it's important to improve our English for our future jobs.

Before I start, I want to say again that, in my view, your article is interesting and rather well written.

I have some advice and corrections for you.

General remarks:

Be careful not to write overly long sentences.

Your illustrations should be readable. Your two pictures are difficult to read.

You should add more illustrations to make the article clearer.

Some other specific terms should have a wiki link to a definition, as you did for “file system” and “batch processing”.

Introduction

Why don't you mention that a DFSC could also be used to share data such as photos, pictures or other files?

I don’t clearly see this possibility in your article.

Maybe I am mistaken and this is not a subject for this article?

Bruno Mérigard (talk) 14:26, 8 January 2014 (UTC)

History

Your first sentence doesn't tell us anything, especially here.

Don’t forget to check your links. This one on DPD-10 doesn’t work properly.

I think there is also a tense mistake in your fourth sentence (start/became).

Supporting techniques

This part seems to send your reader to other articles without giving enough information here.

At the same time, it contains no references.

I think this part could be developed a little, because it actually covers a lot of ground and you will have to describe these techniques later anyway.

Application

Since I wondered whether a DFSC is also used to share files (such as photos or videos) with other people, I looked for this here but found nothing about it.

Try to avoid long sentences where possible.

Bruno Mérigard (talk) 14:33, 8 January 2014 (UTC)

Client-server architecture

An illustration of how it works would be very useful.

Your reference number 6, “Andrew, S.Tanenbaum; Maarten, Van Steen (2006) Distributed file systems principles and paradigms”, points to a printed book. It's a pity that readers cannot check this information online; they will have to find it elsewhere.

You compare it with Unix, but this may not help readers who don't know how Unix works. That is another reason an illustration would be useful here.

Cluster-Based architectures

As with the previous part, an illustration would be useful, especially since your reference is a printed book (link number 7). Perhaps a single illustration comparing client-server and cluster-based architectures would be sufficient?

You need to fix link number 8: it sends us to the Lille University site.

When I click on reference number 9, it doesn't take me to the bibliography, probably because it's a PDF file. Perhaps it should be linked differently?

Design principles and examples

Something is somewhat confusing, in my view: you introduce two architectures (one of which is used by GFS), and then you discuss design principles for GFS and HDFS as if these were simply two examples. Yet these two examples, together with load balancing and rebalancing, take up about 35% of your article.

Perhaps it would have been better to start "Architectures" with a description of the two major distributed file systems (GFS and HDFS), and then to cover the definitions and the techniques used in both.

I will continue later.

Bruno Mérigard (talk) 15:03, 8 January 2014 (UTC)

Answer to the second contact

Hello,

I agree with all the points you raised. We do need to add clearer illustrations. In the design principles and examples sections, we hadn't noticed that we mentioned GFS and HDFS before introducing them.

We are sorry about the broken references. We'll revise the text to correct them and make it coherent.

Rabab —Preceding undated comment added 15:26, 8 January 2014 (UTC)

Third contact

Thanks a lot for taking my views into consideration. Here is more advice.

Design principles

In this part, you cite five hypotheses from the article by Krzyzanowski, but I can't read it unless I go directly to the bibliography.

Google file system

Your example is actually as detailed as the main article you reference. This part is very interesting, but in my view its title ("example") is not appropriate.

I told you that link number 9 doesn't work properly. In fact, link number 11 is the same link and it works, so you can change link number 9 to match link number 11.

I would really expect at least one illustration for this part, which I find quite long. It would probably benefit from being split into several subsections, depending on what you want to explain.

I imagine at least three subsections:

  • First to present GFS
  • Second to speak about fault tolerance
  • Third to describe the file access process

I found a mistake just before link number 14: “… they have stored. [lelivre] GFS is a scalable distributed file system …”

I am not sure that a phrase like “Now, let's detail the file access process” is appropriate in an article.

I will continue.

Bruno Mérigard (talk) 15:38, 9 January 2014 (UTC)

Answer to the third contact

Hello Bruno,

Thank you again for your responsiveness. Your advice is helpful, and we are taking your feedback into account. I will discuss it with my colleague so that we can fix the Google File System part.

Narjes —Preceding undated comment added 18:56, 9 January 2014 (UTC)


Fourth contact

Hello again.

Here is some more advice.

Hadoop distributed file system

At the beginning of this section, we would expect to find the differences from GFS. You tell us that Apache offers another system that is almost the same as GFS. So what is the point of HDFS, or why do you describe it?

I find this part poorly structured. You often mix the definitions, functions and architecture of several components. I would expect to find:

  • what HDFS is, or how it compares with GFS
  • the structure or architecture of HDFS

After that, it would be easier to understand how HDFS works between the NameNode, DataNodes, files and clients. Maybe the lack of an illustration also makes it harder?

I didn't understand the data rebalancing problem in the last paragraph of this section. I don't think it's useful to say that it will be developed later.

Be careful with your links: links 20 and 21 point to the Lille University site.

Bruno Mérigard (talk) 12:41, 10 January 2014 (UTC)

Load balancing

Is "load balancing" an example or a function we find in every DFS?

Same question for load rebalancing.

I ask because I found them in the “examples” sections, alongside GFS and HDFS.

Link 22 doesn't work properly.

I also think your illustration is not very clear. We see a series of servers that we can't relate to anything.

Load rebalancing

How can we verify that failure is the norm? Why is it the norm? Without a source, it could be read as your own assertion.

I wonder whether there is a mistake (cause and effect?) in your sentence “In a cloud computing environment, failure is the norm, and chunkservers may be upgraded, replaced, and added in the system.”

Link 23 doesn't work properly. I suggest checking all your links: linking to the Lille University site is not useful, and some links are broken.

Bruno Mérigard (talk) 13:01, 10 January 2014 (UTC)

Communication

You write that “Several works have been done in order to improve communications” without giving any reference. It could be seen as an unsupported assertion.

Of course, an illustration would be useful. — Preceding unsigned comment added by Bruno Mérigard (talk • contribs) 21:56, 9 January 2014 (UTC)

Security keys

This short introduction is a good idea. It would be useful to cite a specific article for this short section. You have probably planned to, since I saw some words in square brackets in the last sentence.

Perhaps this article? https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6238281

You could also add an illustration showing the three concepts that guarantee information security.

Bruno Mérigard (talk) 14:01, 10 January 2014 (UTC)

Confidentiality

I don't clearly understand your sentence “The risk of an unsecured environment is realized if the service provider can locate consumer's data in the cloud”. Perhaps the word “realized” is not appropriate? I think you mean that in an unsecured environment, there is a risk that the service provider can locate the consumer's data in the cloud.

I think the way you present the three conditions could be confusing. I suggest presenting them like this:

The risk of losing confidentiality is significant in an unsecured environment if the service provider:

  • can locate consumer's data in the cloud
  • has the privilege to access and retrieve consumer's data and
  • can understand the meaning of the data (types of data, …).


I will continue. — Preceding unsigned comment added by Bruno Mérigard (talk • contribs) 12:02, 10 January 2014 (UTC)

Availability

You say that availability is generally handled by replication. Could you give us a reference? Could you link the word "replication" to this article, which is more relevant? https://en.wikipedia.org/wiki/Replication_%28computing%29 — Preceding unsigned comment added by Bruno Mérigard (talk • contribs) 13:47, 10 January 2014 (UTC)


Integrity

Compared with the rest of your article, I think this section could be developed further.

You list what exists (solutions and the factors affecting integrity), but compared with the other parts of your article, you don't describe them in enough detail.

I am not sure this is exhaustive, because you only discuss cryptography.

Cloud-based Synchronization of Distributed File System

I am actually quite surprised to find this section here. I wonder whether it is also a kind of architectural solution, or perhaps an application? In that case, you would have three major applications:

  • first, sharing a huge number of files across many computers to run software;
  • second, sharing large amounts of information used as databases;
  • third, sharing files as described here.

Economic aspects

Maybe it would be interesting to show the growth of investment in this technology. — Preceding unsigned comment added by Bruno Mérigard (talk • contribs) 14:50, 10 January 2014 (UTC)

Last things

I didn't check your English, because your article is still a draft. I invite you to check all your links. Note that a single reference in your bibliography doesn't need two or three links, as in this example:

(en) Farag Azzedin, « Towards A Scalable HDFS Architecture », [Collaboration Technologies and Systems (CTS), 2013 International Conference on], 2013, p. 155-161 (DOI 10.1109/CTS.2013.6567222, read online) Information and Computer Science Department King Fahd University of Petroleum and Minerals

You also need to put the full stop after your reference, not before. See reference number 44, which I have corrected.

I have suggested many things. Of course, you can follow them according to your own judgment. — Preceding unsigned comment added by Bruno Mérigard (talk • contribs) 15:18, 1 January 2014 (UTC)

Answer to the third and fourth contact

Hi Bruno

We are working on all the points you raised, as we agree with them, but we haven't finished addressing them all. We apologize again for the references; you can check that they all work correctly now. We added a link for the book "Distributed systems:Designs and principals" in case you want to check something. Regarding your comment about "Security keys", what kind of illustration do you have in mind?

You wrote: "You also need to put the point after your reference and not before". For English references, I think the full stop goes before the reference, as we can see in English articles (I've checked several articles on Wikipedia).

We'll work on the remaining points and keep you posted.

Rabab —Preceding undated comment added 22:28, 10 January 2014 (UTC)


Fifth contact

Hi Rabab, no problem. It's hard to write a correct article without mistakes on the first try.

  • About your links, especially in your bibliography: there is no need for two or three links per reference; one to the text is enough. A link to the Lille University site is not useful. I think you haven't finished correcting them in your bibliography.
  • The illustration I suggested for security keys would show availability, integrity and confidentiality in a single diagram. It would be a general overview rather than a detailed explanatory figure. In fact, it depends on the other illustrations you add to the rest of your article. Another possibility is to illustrate the confidentiality section, because I think it's the most important one and the one for which you have the most information.

Something else

Hello Rabab, just two small things.

Regarding the section “Cloud-based Synchronization of Distributed File System”, do you see it as a standalone part, an application, or an example of an architecture?

In your outline, I see that “Load balancing and rebalancing” appears as a subsection of “Hadoop distributed file system”. Is that right? I suggest reorganizing your article to avoid very deeply nested sections. — Preceding unsigned comment added by Bruno Mérigard (talk • contribs) 15:48, 13 January 2014 (UTC)


Answer

Hi Bruno,

Sorry for the confusion: the "load balancing" section is not meant to be part of the Hadoop section. It was just an error. As for the section “Cloud-based Synchronization of Distributed File System”, we think it should be an independent part. Do you disagree?

Rabab —Preceding undated comment added 06:30, 16 January 2014 (UTC)

My view

Hello Rabab,

  • No problem about the position of "load balancing" in your article. I had guessed it was a mistake.
  • Just one question: I see four parts in your "cluster based architecture" section, one of them on design principles, but only one part in the "client server architecture" section. Is that because most of the possibilities lie in cluster-based architecture?
  • Your choice to have a specific part on Cloud-based Synchronization of Distributed File System is not a problem. But since I think it sits somewhere between the architectural solutions and the applications, I would place this part between the two.
  • Do you intend to add some illustrations?

Answer

Hello Bruno,

We agree that the sections are unbalanced, but Gilles Grimaud said that's not a problem. We think "cluster based architecture" is more complex than "client server architecture" and therefore needs more details and explanation.

We will try to follow your suggestion about the Synchronization section.

We are currently working on some new illustrations.

Narjes —Preceding undated comment added 14:15, 17 January 2014 (UTC)


My view

Hello Narjes,

I was actually referring to the number of solutions in cluster-based architecture, not the amount of detail. I think it's a pity that you describe only one solution for one architecture model against four for the other. I didn't know that G. Grimaud had told you something about that. — Preceding unsigned comment added by Bruno Mérigard (talk • contribs) 15:00, 17 January 2014 (UTC)

Answer

Hi Bruno,

Sorry Bruno, but we don't understand your point. You mention four solutions in "cluster based architecture": what do you mean by "solution"? In your previous comment you said "part". Maybe you mean the examples "Google file system" and "Hadoop distributed file system"?

Could you give us more details? Thank you.

Narjes —Preceding undated comment added 18:21, 17 January 2014 (UTC)


My view

OK, I was a little confused, sorry.

In section 2.2, "cluster based architectures", you describe:

  • 2.2.1 design principles of cluster based architectures
  • 2.2.4.1 what is load balancing
  • 2.2.4.2 what is load rebalancing

and you finish by describing in detail two examples of this type of architecture:

  • 2.2.2 GFS and
  • 2.2.3 Hadoop.

Do you agree?

If so, your arrangement bothers me, because a part usually ends with examples or applications of what was described before. — Preceding unsigned comment added by Bruno Mérigard (talk • contribs) 17:53, 17 January 2014 (UTC)


Answer

Yes, we agree, you're right. We will change that. As for the references that lead to Lille University, I think that's expected, because we access these articles through our university account. I don't think there is a way around it. I checked other articles (even yours), and all their references link to Lille University.

Rabab —Preceding undated comment added 22:33, 17 January 2014 (UTC)

Answer

Hi Bruno,

OK, that's clear now. The issue is that "Load balancing" and "rebalancing" are two approaches used mainly in cluster environments. I just talked to Rabab and we are trying to find a solution. Of course, feel free to suggest anything that would help with your presentation.

Narjes —Preceding undated comment added 23:11, 17 January 2014 (UTC)


My view

Hello Narjes and Rabab,

OK Rabab, I see what you mean about your links. The point is that our articles are meant for everybody, everywhere. Many sources are not free, but if your reader is not allowed to access the Lille University site, the problem is worse, because the reader has no way at all to reach the source through that link.

I think your links should show everyone where the sources are, even if they are not free. Then readers could get access through their university, their employer or some other way.

About the arrangement of your article, it's difficult for me to say more, because from the beginning I have said that your two examples take up too large a share of your article.

In fact, one could wonder whether the main aim of your article is to describe your two examples rather than anything else. My approach would have been totally different if I had been the writer, but it's your article.

Bruno Mérigard (talk) 08:50, 18 January 2014 (UTC)

Answer

Hi Bruno,

Yes, the reader should normally be able to access all the references. But how can we make links show which sources are free (or not)?

Regarding the arrangement of the article, we agree that the examples take up a large part. That is because, when we started writing, the main articles did not give nearly enough detail. However, the main articles have recently been updated and I think they are richer now.

Moreover, we don't understand why you wonder whether the goal is to describe the two examples more than anything else. The title of the article is "Distributed file system in the clouds"; GFS and HDFS are the most widely used, and we already mention this in the text.

We hope we can convince you about the arrangement; otherwise, we can discuss it further.

You asked us about "Cloud-based Synchronization of Distributed File System". We think it's a standalone part. Does it bother you as it is?

Also, you asked how we can verify that failure is the norm. I added some references; it's not our own assertion.

As for the sentence "In a cloud computing environment, failure is the norm, and chunkservers may be upgraded, replaced, and added in the system." in the load rebalancing part, you wondered whether it contains a mistake. It doesn't. When components fail, chunkservers are replaced or new chunkservers are added to the system. We don't see why you think it's a mistake. Please explain your view.

Regarding your comment about "Cloud-based Synchronization of Distributed File System", we don't understand what you meant. You propose three applications, but we don't see why you want to separate them when they are all about sharing data.

One more point: you asked why we don't discuss sharing data such as photos, pictures or other files. Sharing data includes all types of data (pictures, …), so I don't think we need to list every type explicitly.


Rabab


It's natural for the two examples to take up a large part of our article, because in those two parts we detail the main architectures of file systems for the cloud. We can't detail every other architecture, and at the same time we can't describe one without referring to its source.


Narjes —Preceding undated comment added 16:08, 18 January 2014 (UTC)

My view

OK.

For the links, here is an example:

http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=5982251&queryText%3DDistributed+File+System

Regarding your arrangement, I was just a little surprised by the length of your two descriptions compared with the other parts. You needed to explain GFS and Hadoop in depth, and your research led you to do so. Why not. In that case, though, it would be interesting to list the other systems, just for information.

About "Cloud-based Synchronization of Distributed File System", I have already answered: a standalone part is fine, but in the end I thought it was not well placed.

When I read "In a cloud computing environment, failure is the norm", I understood that problems are usual in cloud computing environments. Is that the intended meaning?

Bruno Mérigard (talk) 15:54, 18 January 2014 (UTC)

Answer

OK, we can add other examples.

"In a cloud computing environment, failure is the norm": it means that component failures are the norm rather than the exception.

We didn't understand the point of the link you gave. — Preceding unsigned comment added by Bouziane.Rabab (talk • contribs) 16:10, 18 January 2014 (UTC)

Rabab

I gave you an example just to show a link to IEEE Xplore that doesn't go through Lille University.

About other examples, it was just an idea to list all the systems without describing them; just to say: the other systems are… but they aren't widely used.

Bruno Mérigard (talk) 16:51, 18 January 2014 (UTC)

Two things

Hello Narjes and Rabab,

Thank you for taking most of my suggestions into consideration. Just two things. First, try to use a larger font size in your illustrations; they are often difficult to read.

Second, about links, here is another example.

Among your sources, you wrote, for example:

| title = Distributed File System as a basis of Data-Intensive Computing

| periodical = Application of Information and Communication Technologies (AICT), 2012 6th International Conference on

| layurl = http://ieeexplore.ieee.org.docproxy.univ-lille1.fr/xpl/mostRecentIssue.jsp?punumber=6385344

and I found that this link would be better:

| layurl = http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=6398484&queryText%3DDistributed+File+System+as+a+basis+of+Data-Intensive+Computing

Bruno Mérigard (talk) 08:45, 19 January 2014 (UTC)

Question

Hi Bruno

We want to add other examples, as we discussed before. In your opinion, where should we add this part? We want it to be in the right place, and maybe there is a place you would prefer.

Rabab

My answer

Hello Rabab,

I think 2.2.4, after Hadoop, could be a good place, but if you have another idea…

Bruno Mérigard (talk) 08:01, 20 January 2014 (UTC)