Lars Francke
2010-Nov-18 23:28 UTC
[Puppet Users] Managing a "complex" directory structure
Hi,

I'm trying to manage our Hadoop cluster with Puppet, but there are a few challenges. The one I'm facing now is managing the following.

I've got an array variable depending on the type of server:

    $hadoop_disks = ['/mnt/disk1', '/mnt/disk2', ...]

Depending on the classes I include for each role, there needs to be a different directory structure on all those disks:

    Namenode + Datanode = /mnt/diskX/hadoop/dfs
    Jobtracker + Tasktracker = /mnt/diskX/hadoop/mapred

Each directory (/hadoop, /hadoop/dfs, /hadoop/mapred) has different permissions, and both roles can be on the same server (Namenode + Datanode).

I've tried multiple different things but I wasn't able to find a solution that works. This is what I thought about doing:

base class:

    define hadoop_main_directory() {
      file { "${name}/hadoop":
        ensure => directory,
        owner  => "root",
        group  => "hadoop",
      }
    }

    define hadoop_sub_directory($path, $user) {
      file { "${name}/hadoop/${path}":
        ensure  => directory,
        owner   => $user,
        group   => "hadoop",
        require => Hadoop_main_directory[$name],
      }
    }

And in each of the four classes a definition like

    hadoop_sub_directory { $hadoop_disks:
      path => "dfs",
      user => "hdfs",   # the define's parameter is $user, not owner
    }

But I guess that doesn't work because a resource may be managed multiple times.

Any ideas how to solve this? I can provide more details. Our configuration is also on GitHub [1], but it's not working right now and probably not very pretty. It's the first time I've used Puppet and I'm learning on the go...

Cheers,
Lars

[1] https://github.com/lfrancke/gbif-puppet

-- 
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To post to this group, send email to puppet-users@googlegroups.com.
To unsubscribe from this group, send email to puppet-users+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
Joshua Anderson
2010-Nov-19 07:30 UTC
Re: [Puppet Users] Managing a "complex" directory structure
Hi Lars,

Take a look at virtual resources: http://projects.puppetlabs.com/projects/1/wiki/Virtual_Resources

-Josh

On Nov 18, 2010, at 3:28 PM, Lars Francke wrote:
> [...]
jcbollinger
2010-Nov-19 14:04 UTC
[Puppet Users] Re: Managing a "complex" directory structure
On Nov 18, 5:28 pm, Lars Francke <lars.fran...@gmail.com> wrote:
[...]
> But I guess that doesn't work because a resource may be managed multiple times.
>
> Any ideas how to solve this?

There are two basic approaches you can use, separately or together, depending on how granularly you want to manage these resources.

1) As Joshua mentioned, virtual resources are one of them. They allow you to declare (once) all the resources any of your systems might need, and then "realize" per-system those you do need, possibly multiple times each.

2) Factor out common resources into separate classes. You cannot declare the same resource multiple times in one configuration, but you can include a class any number of times. This is a good general practice even when you're not currently having trouble, because it helps avoid future problems.

Good Luck,

John
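A minimal sketch of John's second approach applied to the layout from the original question. It assumes `$hadoop_disks` is set at top scope (as in the question) so that classes can see it; the names `hadoop::disk_root`, `hadoop::disk_roots`, `hadoop::dfs_dir` and `hadoop::dfs_dirs` are hypothetical. Because `include` may be repeated, each directory is declared exactly once even when Namenode and Datanode share a node.

```puppet
# Sketch of John's approach 2; all class/define names are hypothetical.
# Assumes $hadoop_disks is set at top scope, e.g.
#   $hadoop_disks = ['/mnt/disk1', '/mnt/disk2']

# The /hadoop parent on one disk; the title is the disk mount point.
define hadoop::disk_root() {
  file { "${name}/hadoop":
    ensure => directory,
    owner  => 'root',
    group  => 'hadoop',
  }
}

# Declares each parent directory exactly once, however often it is included.
class hadoop::disk_roots {
  hadoop::disk_root { $hadoop_disks: }
}

# The dfs subdirectory on one disk. This is a different resource type from
# hadoop::disk_root, so reusing the disk paths as titles does not collide.
define hadoop::dfs_dir() {
  file { "${name}/hadoop/dfs":
    ensure  => directory,
    owner   => 'hdfs',
    group   => 'hadoop',
    require => File["${name}/hadoop"],
  }
}

# Everything the dfs roles share, in one includable class.
class hadoop::dfs_dirs {
  include hadoop::disk_roots
  hadoop::dfs_dir { $hadoop_disks: }
}

# Namenode and datanode may share a node: repeated includes are harmless,
# so no directory is ever declared twice.
class hadoop::namenode {
  include hadoop::dfs_dirs
}

class hadoop::datanode {
  include hadoop::dfs_dirs
}
```

A `hadoop::mapred_dirs` class for the Jobtracker and Tasktracker roles would follow the same pattern with its own define.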
Lars Francke
2010-Nov-20 09:46 UTC
Re: [Puppet Users] Managing a "complex" directory structure
Hi,

thank you very much for the replies.

> Take a look at virtual resources: http://projects.puppetlabs.com/projects/1/wiki/Virtual_Resources

I did take a look at them, but I couldn't really figure out how to use them properly. I can't remember what the problem was, but I think part of it was the array I used for the resources. I'll try it again and will report back.

Cheers,
Lars
Lars Francke
2010-Nov-26 00:22 UTC
Re: [Puppet Users] Managing a "complex" directory structure
I don't understand how I'd convert the definitions from my original mail into virtual resources.

Simplified:

    $disks = ['/a', '/b']

    define foo() {
      file { "${name}/foo": }
    }

    foo { $disks: }

    define bar($path) {
      file { "${name}/foo/${path}":
        require => Foo[$name],
      }
    }

    bar { $disks:
      path => "bar",
    }

And I require those things in multiple classes. I think this works, but apart from being ugly it does not work when requiring foo from multiple classes. I also cannot require bar multiple times with different paths because the name is the same (the array) and I don't know how to get around that. Which kinda defeats the variability of the define.

I also don't understand how to convert foo to a virtual resource definition. I need to change it to this:

    @foo { $disks: }

and then in bar just add this: Foo <| |> ?

Any help would be really appreciated. I must have read the documentation four bajillion times now, but parts of it still make no sense to me. I especially have trouble understanding virtual resources, calling a define/resource with an array, and requiring virtual resources and defines.

Cheers,
Lars
jcbollinger
2010-Nov-29 17:30 UTC
[Puppet Users] Re: Managing a "complex" directory structure
On Nov 25, 6:22 pm, Lars Francke <lars.fran...@gmail.com> wrote:
> I don't understand how I'd convert the definitions from my original
> mail into virtual resources.

You wouldn't convert the definitions themselves, but rather their instantiations. Alternatively, you might convert the definitions to *use* virtual resources, but instantiate them non-virtually. For example,

    define foo() {
      @file { "${name}/foo": }
    }

    define bar(...) {
      realize File["${name}/foo"]
      ...
    }

    class baz {
      foo { "bat": }
      bar { "bat": }
    }

Having now looked a little more deeply at your problem, however, I think that's putting the cart before the horse. You need to redesign / refactor (more on that below).

> Simplified:
> [...]
> And I require those things in multiple classes.
> I think this works but apart from being ugly it does not work when
> requiring foo from multiple classes. I also cannot require bar
> multiple times with different paths because the name is the same (the
> array) and I don't know how to get around that. Which kinda defeats
> the variability of the define.

You should not think of defines as macros or functions, or even as a special kind of class. They are essentially custom resource types, akin to File and Service. As such, (a) the titles provided in their instantiations must each identify a specific logical object to manage, (b) you cannot instantiate a define multiple times with the same title, and thus (c) one set of their parameters must provide all the needed details for one object. All that is just like any other resource declaration.

> I also don't understand how to convert foo to a virtual resource
> definition. I need to change it to this:
> @foo { $disks: }
>
> and then in bar just add this: Foo <| |> ?

Yes, that's about right.
You can also put selection predicates inside the brackets to limit which Foos are realized. Alternatively, you can use the "realize" function instead of bracket notation if you know exactly which virtual Foos you want to realize.

But none of that is going to solve your particular problem, because even if you instantiate your defines virtually, you still can provide only one set of parameters for each title within the scope of each node. Basically, this part of your design concept (define "bar") does not fit the Puppet model.

I think the bar / hadoop_sub_directory define needs to be removed altogether. You may be able to replace some or all of its intended function with ordinary File resources in conjunction with suitably-scoped File property defaults. You may simply need to be a little more verbose and / or repetitious in your manifests. You may need or want to refactor some of the classes that use these defines.

> Any help would be really appreciated. I must have read the
> documentation four bajillion times now but parts of it still make
> no sense to me. I especially have trouble understanding Virtual
> resources,

Getting your head around virtual resources can take some effort, but once you've got it, they're really not that hard. A virtual resource declaration (including a virtual definition instantiation) is identical to an ordinary one except that 1) it has an @ sigil in front of it, and 2) its effect is only to set the properties of the declared resource, not to instruct Puppet (as non-virtual declarations do) to *apply* the resource to the current node. To make that useful in light of (2), a set of manifests in which a virtual declaration is included can instruct Puppet to apply the resource either by means of the <| |> notation or by calling the "realize" function. The two realization methods have identical effects on affected resources.
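A small sketch of the two realization methods described above, using plain File resources. The paths, the `hadoop_disk` tag and the two class names are hypothetical. One class realizes a single resource by reference; the other uses a collector with a `tag` predicate. Including both on one node is harmless.

```puppet
# Virtual declarations: nothing is applied until something realizes them.
# Paths and the tag value are hypothetical.
@file { '/mnt/disk1/hadoop':
  ensure => directory,
  owner  => 'root',
  group  => 'hadoop',
  tag    => 'hadoop_disk',
}

@file { '/mnt/disk2/hadoop':
  ensure => directory,
  owner  => 'root',
  group  => 'hadoop',
  tag    => 'hadoop_disk',
}

class hadoop_disk1_only {
  # Realize exactly one known resource by reference.
  realize(File['/mnt/disk1/hadoop'])
}

class hadoop_all_disks {
  # Realize every virtual File whose tags include 'hadoop_disk'.
  File <| tag == 'hadoop_disk' |>
}

# /mnt/disk1/hadoop is realized twice here but enters the catalog once.
include hadoop_disk1_only
include hadoop_all_disks
```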
Each realization is an instruction to Puppet that the specified resource should be included, once only, in the catalog for the relevant node. Multiple realizations are thus redundant, but consistent, as each one tells Puppet the same thing.

> calling a define/resource with an array

You may help yourself by thinking in terms of "declaring" a resource or definition instance, rather than in terms of "calling" these. "Calling" is associated (at least for me) with functions, and these don't work like functions. In any event, declaring a resource or definition instance using an array for the title is just shorthand for multiple declarations, one for each element of the array, each with identical properties except for the title. In the body of a definition, the title may be referenced as $name, and therefore it may influence the properties of resources declared within. That in no way changes the semantics and requirements of resource declarations.

> and requiring
> virtual resources and defines.

If you mean "realizing" virtual declarations, I've covered that above as best I can. If you actually do mean "requiring" them, then the only difference from non-virtual resources is that you can only "require" a virtual resource for a target node where that resource is realized.

HTH,

John
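To illustrate the point about array titles: the array declaration below is shorthand for one declaration per element, and inside the define `$name` holds one element at a time. The `hadoop_root` define and the paths are hypothetical. The commented-out lines show what the shorthand stands for, and why a second declaration with the same titles fails regardless of its parameters.

```puppet
# Hypothetical define: one /hadoop directory under the disk given as title.
define hadoop_root($owner = 'root') {
  file { "${name}/hadoop":
    ensure => directory,
    owner  => $owner,
    group  => 'hadoop',
  }
}

# Shorthand: one instance per array element, same parameters for each...
hadoop_root { ['/mnt/disk1', '/mnt/disk2']: }

# ...which means the same as these two separate declarations.
# (Commented out: declaring them as well would duplicate the titles.)
# hadoop_root { '/mnt/disk1': }
# hadoop_root { '/mnt/disk2': }

# Likewise, this would fail as a duplicate declaration even though the
# parameters differ, because the titles are already taken:
# hadoop_root { ['/mnt/disk1', '/mnt/disk2']: owner => 'hdfs' }
```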
Lars Francke
2010-Nov-30 00:15 UTC
Re: [Puppet Users] Re: Managing a "complex" directory structure
Hello,

> But none of that is going to solve your particular problem, because
> even if you instantiate your defines virtually, you still can provide
> only one set of parameters for each title within the scope of each
> node. Basically, this part of your design concept (define "bar") does
> not fit the Puppet model.

That's what I suspected: my brain has not yet been able to comprehend the Puppet model completely.

> I think the bar / hadoop_sub_directory define needs to be removed
> altogether. You may be able to replace some or all of its intended
> function with ordinary File resources in conjunction with suitably-scoped
> File property defaults. You may simply need to be a little
> more verbose and / or repetitious in your manifests. You may need or
> want to refactor some of the classes that use these defines.

This is the part I don't understand. The more verbose I've been, the more errors I got because of duplicate definitions etc. Which only supports the point that I still don't "get" the Puppet model :)

> HTH,

I really do appreciate that you took the time to write such a detailed response. I will try once more to understand what I'm doing wrong and I'll re-read your mail a few times, but I've spent so much time on this seemingly simple directory issue that I might just have to revert to a simple script or Fabric to do this job.

But as I said: thank you for the help. I'll try to learn something from it now.

Cheers,
Lars
jcbollinger
2010-Nov-30 14:53 UTC
[Puppet Users] Re: Managing a "complex" directory structure
On Nov 29, 6:15 pm, Lars Francke <lars.fran...@gmail.com> wrote:
> This is the part I don't understand. The more verbose I've been the
> more errors I got because of duplicate definitions etc. Which only
> supports the point that I still don't "get" the Puppet model :)

Wrapping up a bunch of things in a define may reduce the number of error *messages* simply because the error shifts from the usages of a bunch of separate resources to the usage of one define. That doesn't change the nature of the error.

One of the things that sometimes gives Puppet newbies trouble is that it employs a declarative model rather than an imperative one. In other words, it's an expert system, not a scripting language. The Puppet language is all about describing the state that your system should have, and Puppet later uses that description to ensure that your system is in that state. This is very powerful and flexible, but sometimes confusing, too. It is the core reason why you cannot declare duplicate resources.

Perhaps it would help to start by writing your manifests without using any definitions and without using any variables other than facts, at least for these directories you are managing. This will require that every directory and file you are explicitly managing be named in exactly one File resource. Use recurse => true for directories where it is appropriate to do so (i.e. you don't necessarily need to explicitly manage every single file). By all means, do organize these resources into classes in a manner that makes sense to you. The objective there is to create a set of manifests that get the job done, and which you can then use, if you wish, as a basis for further development.

When every file you are directly managing is explicitly named, it should be easy to ensure that there are no duplicates. By classifying the resources in that form, it should be trivial to ensure that you have no overlapping classes. And if you later refine those initial manifests to shorten them by re-introducing variables and definitions, then it should be clearer what variables you want and what definitions would be useful.

For what it's worth, I think it's a good idea in general to hold Puppet definitions in reserve for places where you really need them or where they provide a big win.

Good Luck,

John
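A sketch of the fully spelled-out starting point suggested above, for a hypothetical two-disk node: no defines and no variables, every managed directory named in exactly one File resource, and the shared parents in their own class. The class names, the node name and the `mapred` owner are assumptions for illustration. File resources automatically require a managed parent directory, so no explicit `require` is written here.

```puppet
# Hypothetical two-disk node; everything spelled out, no defines/variables.

# Shared by every role: the /hadoop parent on each disk.
class hadoop::dirs::base {
  file { '/mnt/disk1/hadoop':
    ensure => directory,
    owner  => 'root',
    group  => 'hadoop',
  }
  file { '/mnt/disk2/hadoop':
    ensure => directory,
    owner  => 'root',
    group  => 'hadoop',
  }
}

# Shared by namenode and datanode.
class hadoop::dirs::dfs {
  include hadoop::dirs::base
  file { ['/mnt/disk1/hadoop/dfs', '/mnt/disk2/hadoop/dfs']:
    ensure => directory,
    owner  => 'hdfs',
    group  => 'hadoop',
  }
}

# Shared by jobtracker and tasktracker; the 'mapred' owner is an assumption.
class hadoop::dirs::mapred {
  include hadoop::dirs::base
  file { ['/mnt/disk1/hadoop/mapred', '/mnt/disk2/hadoop/mapred']:
    ensure => directory,
    owner  => 'mapred',
    group  => 'hadoop',
  }
}

# Both subtrees on one node: hadoop::dirs::base is evaluated only once.
node 'hadoop-worker1' {
  include hadoop::dirs::dfs
  include hadoop::dirs::mapred
}
```

Once this works, the repeated disk paths are the obvious candidates for the variables and definitions John mentions re-introducing.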
Lars Francke
2010-Dec-03 17:04 UTC
Re: [Puppet Users] Re: Managing a "complex" directory structure
Hi!

Again, thank you for the detailed answer.

> One of the things that sometimes gives Puppet newbies trouble is that
> it employs a declarative model rather than an imperative one. In
> other words, it's an expert system, not a scripting language. The
> Puppet language is all about describing the state that your system
> should have, and Puppet later uses that description to ensure that
> your system is in that state. This is very powerful and flexible, but
> sometimes confusing, too. It is the core reason why you cannot
> declare duplicate resources.

I think I got all that. I have one problem though: even spelling it all out won't work. Or to put it better: I don't know how. Add to that that we're changing our infrastructure quite frequently at the moment, so I'd have to switch quite a lot of code back and forth if I got it to work.

What I don't understand: I have an array

    $disks = ["/a", "/b"]

And I can use that as the title of resources to define one resource for each member of the array. So far so good.

    file { $disks: }

works as expected and is expanded to:

    file { "/a": }
    file { "/b": }

which are two distinct titles, so they work as expected. But those aren't the paths I need to manage. What I want is:

    file { "${disks}/foo": }

being expanded to:

    file { "/a/foo": }
    file { "/b/foo": }

What really happens:

    file { "/a/b/foo": }

It's obvious to me why this happens (the whole array is interpolated into a single string), but I'd still love a way to do what I want, because I think that would solve all my problems (I might be wrong here, obviously).

As I said earlier: I'm at a point where I don't know how to solve my problem even if I am as verbose as I can. I've just pushed my current (non-working) configuration to GitHub:
https://github.com/lfrancke/gbif-puppet/blob/master/modules/hadoop/manifests/init.pp#L22-58

I've spelled out all the main directories there in a virtual resource definition, but I have no idea how to go on from there. Some machines need only two of those, some need all. The datanode [1] and namenode [2] classes need "/dfs" subdirectories everywhere, again in two different configurations (2 vs. 6 disks), and the tasktracker [3] and jobtracker [4] need a "/mapreduce" directory in there. Because those classes all need the common parent directory, I made it virtual. But how do I realize it? I can't just list the realizations in the classes because they are different depending on the node.

If you could take another look at it, that would be great, but I'd understand if you have better things to do with your time. I've fallen back to manually creating that directory structure using Fabric for now. But I'd obviously like to get it working in Puppet to avoid this manual step.

I'm afraid I'm just a hopeless case ;-)

Cheers,
Lars

[1] https://github.com/lfrancke/gbif-puppet/blob/master/modules/hadoop/manifests/datanode.pp
[2] https://github.com/lfrancke/gbif-puppet/blob/master/modules/hadoop/manifests/namenode.pp
[3] https://github.com/lfrancke/gbif-puppet/blob/master/modules/hadoop/manifests/tasktracker.pp
[4] https://github.com/lfrancke/gbif-puppet/blob/master/modules/hadoop/manifests/jobtracker.pp
Nan Liu
2010-Dec-03 21:50 UTC
Re: [Puppet Users] Re: Managing a "complex" directory structure
On Fri, Dec 3, 2010 at 9:04 AM, Lars Francke <lars.francke@gmail.com> wrote:
> [...]
> But what I want is:
> file { "${disks}/foo": }
>
> Being expanded to:
> file { "/a/foo": }
> file { "/b/foo": }
>
> What really happens:
> file { "/a/b/foo": }

A custom function that expands the array would be a bit more universal, but you can still do this with a Puppet defined resource type and no Ruby code:

    define hadoop::mount {
      file { "/mnt/${name}/hadoop":
        ensure  => directory,
        owner   => "root",
        group   => "hadoop",
        require => Package["hadoop-0.20"]
        ;
      }
    }

    hadoop::mount { ["disk1", "disk2", "disk3"]: }

    notice: /Stage[main]//Hadoop::Mount[disk2]/File[/mnt/disk2/hadoop]/ensure: is absent, should be directory (noop)
    notice: /Stage[main]//Hadoop::Mount[disk1]/File[/mnt/disk1/hadoop]/ensure: is absent, should be directory (noop)
    notice: /Stage[main]//Hadoop::Mount[disk3]/File[/mnt/disk3/hadoop]/ensure: is absent, should be directory (noop)

Both "mnt" and "hadoop" can be made variables of the defined resource, so it supports generic file paths instead of the fixed "/mnt/${name}/hadoop".

> [...]
> But how do I realize it? I can't just list the
> realizations in the classes because they are different depending on
> the node.

The difference between the systems can be expressed as a custom fact, as a variable set at top scope when defining the nodes, or with a parameterized class (available in 2.6):

    class demo ($disk) {
      case $disk {
        "2": {
          # add resources specific to 2-disk systems
        }
        "6": {
          # add resources specific to 6-disk systems
        }
      }
      # common to all systems using class demo
    }

    node server1 {
      class { "demo": disk => "2" }
    }

    node server2 {
      class { "demo": disk => "6" }
    }

Thanks,

Nan
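Nan's remark that "mnt" and "hadoop" can be made variables might look like the sketch below. The parameter names `$prefix` and `$subdir`, their defaults, and the `/srv` layout are assumptions. The `Package["hadoop-0.20"]` requirement is left out so the fragment stands on its own; it would go back in wherever that package is managed.

```puppet
# Generalized version of hadoop::mount; parameter names are hypothetical.
define hadoop::mount($prefix = '/mnt', $subdir = 'hadoop') {
  file { "${prefix}/${name}/${subdir}":
    ensure => directory,
    owner  => 'root',
    group  => 'hadoop',
  }
}

# Same result as above: /mnt/disk1/hadoop, /mnt/disk2/hadoop, /mnt/disk3/hadoop
hadoop::mount { ['disk1', 'disk2', 'disk3']: }

# A different layout with titles that do not clash with the ones above:
# manages /srv/data1/hadoop and /srv/data2/hadoop
hadoop::mount { ['data1', 'data2']: prefix => '/srv' }
```

Titles must still be unique per node, so the same disk names cannot be declared a second time with a different `subdir`; a second subdirectory per disk still needs its own define or a shared class, as John described.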
Lars Francke
2010-Dec-08 15:11 UTC
Re: [Puppet Users] Re: Managing a "complex" directory structure
Hello,

sorry it took me so long to reply. Thank you for the answer. That has helped me a lot and I think I'm on the right path now. It is just a lot more verbose than I had hoped :)

> A custom function that expands the array would be a bit more
> universal, but you can still do this with puppet define resource type
> with no ruby code:
>
> define hadoop::mount {
>   file { "/mnt/${name}/hadoop":
>     ensure => directory,
>     owner => "root",
>     group => "hadoop",
>     require => Package["hadoop-0.20"]
>     ;
>   }
> }
>
> hadoop::mount {
>   ["disk1", "disk2", "disk3"]:
> }

This in combination with virtual resources seems to work [1]. It is pretty complicated and unintuitive, though, as I can't require and realize a virtual resource at the same time [2], so I have to add placebo definitions that realize the resources so I can require them and subscribe to them. I'm still in the process of converting everything to virtual resources, so it doesn't work at the moment, but I'll see tomorrow if I can get it working. I can slim it down afterwards if I find better ways to do things.

> The difference between the system can be written as a custom fact, a
> variable set at top scope when defining the nodes, or use parametrized
> class (available in 2.6).
> [...]

That looks interesting. I've read about those new parameterized classes but skipped them for now. Thanks for the hint. I'll take a look at them later, once I've got the basics working again.

Cheers,
Lars

[1] https://github.com/lfrancke/gbif-puppet/blob/master/modules/hadoop/manifests/init.pp#L22-66
[2] https://github.com/lfrancke/gbif-puppet/blob/master/modules/hadoop/manifests/namenode.pp#L19-24
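Realizing and requiring in the same class should not need placebo definitions: as John noted, a virtual resource can be required on any node where it is realized. A minimal sketch with a hypothetical path and class name (not taken from the gbif-puppet repository), realizing the parent and then referring to it like any other resource:

```puppet
# Hypothetical virtual parent, declared once somewhere shared.
@file { '/mnt/disk1/hadoop':
  ensure => directory,
  owner  => 'root',
  group  => 'hadoop',
}

# Hypothetical role class: realize the parent, then depend on it normally.
class hadoop::namenode_dirs {
  realize(File['/mnt/disk1/hadoop'])

  file { '/mnt/disk1/hadoop/dfs':
    ensure  => directory,
    owner   => 'hdfs',
    group   => 'hadoop',
    # File would also autorequire its managed parent; the explicit
    # relationship stands in for the require/subscribe Lars needs.
    require => File['/mnt/disk1/hadoop'],
  }

  # Alternative (Puppet 2.6+ chaining): a collector used in a chain also
  # realizes the virtual resources it matches.
  # File <| title == '/mnt/disk1/hadoop' |> -> File['/mnt/disk1/hadoop/dfs']
}

include hadoop::namenode_dirs
```

Other role classes can realize the same parent again; as with any realization, it still enters the catalog only once.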